Big Data and Customer Journey Analytics

One of the key challenges facing CMOs today is establishing a long-term relationship or partnership with their customers, one that will result in continued growth of the brand. Thanks to Big Data tools and technology, it is possible to address this challenge through Customer Journey Analytics.

Embedded below are two great presentations from McKinsey & Company on the topic and a YouTube video featuring David Edelman, Partner and Global Co-Leader of McKinsey Digital, Marketing & Sales.

Top 5 Reasons Why Big Data and Analytics Projects Fail – Part 1

A lot has been written about the success and failure of Big Data and Analytics projects in recent times. Unfortunately, most of the articles and blog posts on this subject fail to highlight the real reasons why Big Data projects fail. Given below are the top 5 reasons, in my opinion, why most Big Data and Analytics projects fail. They are:

1. Failure to define the use case in objective terms

2. Failure to use the right technology

3. Failure to focus on business requirements first, technology next

4. Failure to leverage all available data sets and assets

5. Failure to effectively use the power of advanced analytics

In my next post, I will elaborate on these five reasons why Big Data projects fail and recommend ways you can avoid these pitfalls.

How to Profit from Big Divide of Big Data

When it comes to implementing Big Data and Analytics solutions for enterprise clients and large business organizations, I see a big divide between the functionality that business users want and demand from their CTO or IT organization and the functionality that the CTO or IT organization can deliver. I would like to call this gap the “Big Divide of Big Data”.

Given the exponential increase in the number of sources and the volume of Big Data being generated, thanks to the digital and sensor-based productivity revolution, this gap is growing wider with every passing day. One wonders if the twain shall ever meet and if this divide can ever be bridged. This is one of the key challenges facing the CIOs and CTOs of most large business organizations.

What is a problem for CIOs and CTOs is an opportunity for vendors and solution providers. I see a great opportunity for System Integrators (SIs) and consulting organizations when it comes to bridging this big divide of Big Data. This can be done by developing industry- or vertical-specific platform solutions for leveraging Big Data and Analytics.

I see this already happening in the Health Care and Life Sciences industry and expect this trend to catch on in all other industries too, especially Banking/Financial Services, Insurance and Retail. Thanks to the availability of industry-specific, platform-based solutions, large and medium-sized enterprises will be able to leverage Big Data and Analytics without making heavy upfront investments.

What do you think? Please share your opinion or respond on Twitter; my ID is @HKotadia.


Google Flu Trends: Importance of Veracity, the 4th “V” in Big Data

A lot has been written recently criticizing Google’s Flu Trends, a flu tracker service that uses aggregated Google search data on specific search terms to estimate current flu activity around the world in near real-time. For more, read How does this work?

Science magazine has recently published an article titled “The Parable of Google Flu: Traps in Big Data Analysis” and Steve Lohr has published a great piece in the Bits blog of The New York Times titled “Google Flu Trends: The Limits of Big Data”.

It is important to note that the over-estimation of flu activity in Google Flu Trends is NOT a limitation of Big Data or of the Analytics used for estimating flu activity, as some of the writers have suggested. Rather, it highlights the importance of the fourth “V” of Big Data: Veracity.

It is often mentioned that Big Data has three defining attributes, the three Vs as they are called, namely Data Volume, Data Variety and Data Velocity (for more, check out the TDWI Best Practices Report titled Big Data Analytics). But this definition of Big Data misses a very important dimension, namely Data Veracity.

I think Google Flu Trends estimates would be much more realistic if we were to incorporate Data Veracity, the fourth dimension of Big Data, into the estimation models and adjust the estimates based on a “Veracity Score”.

In other words, the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics. Rather, we need to incorporate the Data Veracity element into the estimation model.
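To make the idea of adjusting estimates by a “Veracity Score” a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: I treat the score as a weight between 0 and 1 attached to each data source, and combine the sources with a simple weighted average. The function name, the sources and the numbers are all hypothetical, and this is not how Google Flu Trends actually works.

```python
def veracity_adjusted_estimate(signals):
    """Combine (estimate, veracity_score) pairs into one weighted estimate.

    Assumption: each veracity score is a number between 0 and 1 that says
    how much we trust the source behind the estimate.
    """
    for _, score in signals:
        if not 0.0 <= score <= 1.0:
            raise ValueError("veracity scores must lie between 0 and 1")
    total_weight = sum(score for _, score in signals)
    if total_weight == 0:
        raise ValueError("at least one signal needs a non-zero veracity score")
    # Sources we trust more pull the combined estimate towards their value.
    return sum(estimate * score for estimate, score in signals) / total_weight


# Hypothetical numbers, for illustration only (percent of population with flu).
signals = [
    (10.0, 0.25),  # estimate from search queries, low veracity
    (6.0, 0.75),   # estimate from clinic reports, higher veracity
]
adjusted = veracity_adjusted_estimate(signals)
print(adjusted)
```

With these made-up inputs, the combined estimate lands closer to the more trusted source than to the search-based one, which is the whole point of scoring veracity.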

What do you think? Do you agree that the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics?

Big Data and Analytics: What to expect in 2014

Last year was a great one for technology professionals, consultants and practitioners working in the area of Big Data and Analytics. And 2014 promises to be even better. Given below are three main reasons why 2014 will be a spectacular year for those of us in Big Data and Analytics:

1) Investments in very large Big Data projects:

Many large and medium-sized enterprises ‘experimented’ with Big Data last year by undertaking ‘Pilot’ projects to demonstrate business value. They are now ready to take the next major step in their Big Data journey by undertaking major projects, and are planning to invest a substantial amount of money in Big Data and Analytics initiatives in 2014.

2) Maturing of Big Data tools and technology:

Another reason CIOs/CTOs of large and medium-sized enterprises are planning to make substantial investments in Big Data and Analytics initiatives in 2014 is the ‘maturing’ of Big Data and Analytics tools and technology. This is the result of large-scale investments by venture capital firms in this area during the last two years.

3) Rise of industry-specific, platform-based solutions for Big Data and Analytics:

Another trend that is gaining traction is industry- or vertical-specific solutions to leverage Big Data and Analytics. In no industry is this trend more evident than in Health Care and Life Sciences. I expect this trend to catch on in all other industries too, especially Banking/Financial Services, Insurance and Retail. And thanks to the availability of industry-specific, platform-based solutions, large and medium-sized enterprises will be able to leverage Big Data and Analytics without making heavy upfront investments.

What do you think? Do you agree that 2014 will be a spectacular year for Big Data and Analytics professionals? Please share your opinion or respond on Twitter; my ID is @HKotadia.


From Data to Information to Insights: Changing Role of #CIO

There was a time not too long ago (the late 1970s, to be precise) when companies used to have an Electronic Data Processing department and EDP Managers to manage the data processing function. The EDP department evolved into the Management Information Systems department in the early 1980s, and the MIS Manager ran the show when it came to ‘computerization’ initiatives, as they used to be called back then.

As technology evolved from mainframes to client-server computing, and with the rise of personal computers (PCs) or desktops, adoption of information technology across the organization picked up pace and IT was no longer limited to key business processes such as accounting and inventory management.

With the expansion of computer networks and the growth of the internet in the 1990s, an increasing number of business processes were ‘computerized’. This resulted in the decentralization of information technology, and companies started moving away from having a ‘centralized’ EDP or MIS department for managing their IT functions. This decentralization of MIS (or IT) received a boost from the development of ‘business friendly’ software applications that just needed to be customized to meet business requirements rather than developed from scratch. Case in point: ERP or CRM systems.

But although key functions of the MIS department were decentralized due to the rapid expansion of information technology within the organization, there was a need for a role to monitor and guide the adoption and use of information technology tools across the organization, in order to make sure that the multitude of systems being developed ‘talked’ to each other and followed consistent organizational standards for development and usage. The role of Chief Information Officer (or CIO) evolved to meet this need, and most large and medium-sized companies have had a CIO function or role since the mid-1990s.

The role of CIOs gained importance with the expansion of the internet and the growth of web-enabled business applications during the dot-com boom. The CIO’s function became critical during the outsourcing boom that followed the dot-com bust in the early 2000s, as a growing number of CEOs relied upon their CIOs (and their ‘twin’ brothers, the CTOs) not only to manage the growing complexity of enterprise IT but also to manage it in a cost-effective manner through outsourcing. As a result, the CIO’s role became critical within most large and medium-sized companies over the last ten years.

But as the ‘short’ history of information technology tells us, the only thing that is constant in IT is change, and that change comes at an even faster pace with each passing year. To confirm, just consider the changes that have taken place in IT over the past few years. We have seen IT evolve at an even faster pace, thanks to the rapid growth and adoption of cloud computing, mobile devices and social media, and an explosion in the rate at which data is being generated by end users, resulting in the phenomenon now known as ‘Big Data’.

Add to this the fact that technology is becoming even more business and end-user friendly. As an example, because of cloud computing and the software as a service (SaaS) model, some of the work that was traditionally done by the CIO’s or CTO’s organization is now being done by outside service providers with help from functional executives in user departments. To cite another example, CMOs or key executives in the marketing departments of large and medium-sized companies are working directly with vendors, with minimal input from the CIO or CTO of their organization, to leverage social media for multi-channel marketing or customer engagement. This growing ‘consumerization’ of technology has eroded the clout or influence CIOs had in their respective organizations just a few years back.

So the question is: how can CIOs regain the past glory of their role? In my opinion, the best way for Chief Information Officers (or CIOs) to do so is to adapt to and grow with changing technology and evolve to become Chief Insights Officers, still CIOs! As more and more applications are migrated to the cloud, with enterprise apps and data residing in a cloud managed mostly by third-party service providers, the ‘traditional’ functions performed by CIOs are being performed by third-party vendors. But what has not changed is the need for ‘quality’ information in a timely manner to aid decision making across the organization.

With more data being generated outside the organization than within it, such as social media data, there is a need for someone at a senior level not only to monitor but also to guide the organization on how data are collected, stored and, most importantly, analyzed in a timely manner to aid decision making. Given the volume, velocity and variety of data being generated, it is no longer enough just to prepare ‘simple’ reports. Organizations need to derive critical insights in real time from available data, from both internal and external sources. And in my opinion, no one is better prepared to take on this challenge than the good old Chief Information Officer.

So, dear Chief Information Officer, are you ready to take on the role of Chief Insights Officer?

Big Data in Retail Industry

Here’s a great data visualization on Big Data in the retail industry. Also, an FT video on the subject:

The Retailer’s Guide to Big Data

Source: Monetate Marketing Infographics


Where The Big Data Jobs Are and How Much They Pay

I am often asked, especially by those aspiring to a career in Big Data, how to find a suitable job in Big Data and how much such jobs pay.

Given below is an excellent visualization by Chris Dannen that answers some of these questions, such as where the jobs are and how much they pay. I hope you find this data visualization useful.

In my future posts, I will highlight some of the skills necessary to get one of these jobs and how to get the necessary training on a low budget or even for free. So you may want to bookmark my blog site URL address: or sign up for email updates using this link:

Source: Big Data Jobs Around The Nation (And What They Pay)

Key Big Data Terms You Should Know

Given below is a listing of key Big Data terms that you should know, each with a very brief explanation in simple language. I hope you find it useful.

1. Hadoop: System for processing very large data sets
2. HDFS or Hadoop Distributed File System: For storage of large volumes of data (key elements: NameNode and DataNodes)
3. MapReduce: Think of it as the assembly-level language for distributed computing. Used for computation in Hadoop (in Hadoop 1.x, jobs are run by the JobTracker and TaskTrackers)
4. Pig: Developed by Yahoo. It is a higher-level language than MapReduce
5. Hive: Higher-level language developed by Facebook with SQL-like syntax
6. Apache HBase: For real-time access to Hadoop data
7. Accumulo: Sorted, distributed key/value store similar to HBase, with additional features like cell-level security
8. Avro: Data serialization format (an alternative to Protocol Buffers etc.)
9. Apache ZooKeeper: Distributed coordination system
10. HCatalog: Table and metadata layer that makes the Hive metastore available to Pig and other tools
11. Oozie: Workflow scheduling system developed by Yahoo
12. Flume: Log aggregation system
13. Whirr: For automating the deployment of Hadoop clusters on cloud services
14. Sqoop: For transferring structured data between relational databases and Hadoop
15. Mahout: Machine learning on top of MapReduce
16. Bigtop: Packages and tests multiple Hadoop sub-systems so that they work as a whole
17. Crunch: Java API that runs on top of MapReduce, for tedious tasks like joining and data aggregation
18. Giraph: Used for large-scale distributed graph processing
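
Since MapReduce sits underneath many of the tools in this list, a small sketch of the idea may help. The Python below is a toy, single-machine illustration of the map, shuffle and reduce steps using the classic word count example. It does not use Hadoop or any Hadoop API, and the function names and sample lines are my own, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


# Made-up input lines; in Hadoop these would be blocks of a file in HDFS.
lines = ["big data big insights", "data veracity"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many machines and the framework handles the shuffle. Higher-level tools such as Pig and Hive exist largely so that you do not have to write these steps by hand.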

Also, embedded below is an excellent TechTalk by Jakob Homan of LinkedIn on the subject explaining these tech terms.

Master Data Management (MDM): Key to Big Data Success

With the hype surrounding Big Data and the current focus on tools and technology such as Hadoop, it is easy to forget that the success of any technology project rests more on strategy and less on technology and tools. That’s true even in the case of Big Data solutions.

Architects and managers implementing Big Data solutions would do well to remember that in order to truly leverage and derive insights from Big Data, it is important to have a Master Data Management (MDM) solution in place with a repository of relevant non-transactional data entities (also known as master data).

For example, if an organization wants to leverage social media data for better sales, marketing or customer support, it is important that a master database of all customers and prospects is in place with information on social media profiles/handles for each customer. Master Data Management (MDM) “comprises a set of processes, governance, policies, standards and tools that consistently defines and manages the master data of an organization” (for more, see this).
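
To illustrate the social media example, here is a minimal sketch in Python of how a master customer repository lets you tie incoming social media mentions back to known customers. Everything in it is hypothetical: the customer IDs, the handles, the field names and the helper functions are made up for illustration, and a real MDM solution would involve far more (governance, matching rules, data quality checks).

```python
# Hypothetical master data: one record per customer, including a social handle.
master_customers = {
    "C001": {"name": "Customer A", "twitter": "@customer_a"},
    "C002": {"name": "Customer B", "twitter": "@customer_b"},
    "C003": {"name": "Customer C", "twitter": None},  # handle not yet captured
}

# Hypothetical social media mentions collected from outside the organization.
mentions = [
    {"handle": "@Customer_A", "text": "Great support today"},
    {"handle": "@someone_else", "text": "Is the store open on Sunday?"},
]


def index_by_handle(master):
    """Build a lookup from lower-cased handle to customer ID."""
    return {
        record["twitter"].lower(): customer_id
        for customer_id, record in master.items()
        if record.get("twitter")
    }


def link_mentions(mentions, master):
    """Split mentions into those matched to a master record and the rest."""
    handle_index = index_by_handle(master)
    matched, unmatched = [], []
    for mention in mentions:
        customer_id = handle_index.get(mention["handle"].lower())
        if customer_id is None:
            unmatched.append(mention)
        else:
            matched.append({"customer_id": customer_id, **mention})
    return matched, unmatched


matched, unmatched = link_mentions(mentions, master_customers)
print(matched)
print(unmatched)
```

Without the master record for a customer (or without the handle on it), the mention simply falls into the unmatched pile and cannot be used for sales, marketing or support follow-up.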

Trying to implement a Big Data solution without a repository of relevant master data is a recipe for disaster, in my opinion. What do you think? Do you agree that MDM is key to Big Data success? Please share your thoughts: