May 14, 2014

Why are data analysts so inefficient?

Image by Falkor, Krypt3ia

In an attempt to produce better products, explore new markets and beat the competition, companies spend huge amounts of time and money to create data processes that empower researchers, analysts and data scientists. While these processes take many different forms, most companies still aren't using data efficiently, and much of the time and money they spend on data is wasted.

Taiichi Ohno, who is widely considered the father of lean manufacturing, defined what he believed to be the seven primary sources of waste in manufacturing. From five of these factors (conveyance, inventory, waiting, over-processing and correction) as well as one of our own (opacity), our list of "The Six Biggest Data Time Wasters" was born. In this series, we'll examine these time wasters and offer solutions on how to eliminate them.


Conveyance

Analysts and IT professionals waste time moving data back and forth between models, users and physical locations. Examples include moving databases between users, linking spreadsheets, copying and pasting data from one location to another, or syncing FTP sites to access data across multiple countries.

While many organizations try to address these needs by placing spreadsheets on mirrored shared drives or creating SQL databases, this often makes the data conveyance process both inflexible and unstable. One of our customers originally set up a series of five interconnected spreadsheet models that suffered a total process failure after one analyst moved his output data over by just one column.
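Positional links between spreadsheets and scripts break in exactly this way. As a minimal sketch (the column names and figures are invented for illustration), keying values by header name instead of position makes a one-column shift harmless:

```python
import csv
import io

# A hypothetical upstream export: an analyst has inserted a new
# "region" column, shifting every position-based reference by one.
upstream = io.StringIO(
    "region,product,revenue\n"
    "EMEA,widgets,1200\n"
    "APAC,gadgets,950\n"
)

# Positional access like row[1] would now silently read the wrong
# field. DictReader keys each value by its header name, so the
# inserted column does not disturb downstream calculations.
rows = list(csv.DictReader(upstream))
total = sum(int(r["revenue"]) for r in rows)
```

The same principle applies in any tool: downstream consumers that reference data by name rather than location survive layout changes upstream.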

Here is a blog post with recommendations on how to reduce conveyance waste.


Inventory

Collecting and holding unused data in inventory costs money. While the direct cost of storing data continues to fall, the human investment in updating and maintaining unused data, as well as the opportunity cost of not using that data, both continue to rise. Storing data that is unused or unusable can also muddle the entire data ecosystem, making it harder to find the data you actually need.

The magnitude of inventory costs varies, with examples ranging from preserving large legacy databases all the way down to updating a small data set with superfluous data points. And while groups of all sizes face this issue, it is much more pronounced and costly within large organizations. One large ($10 billion+) energy company we talked to was spending millions a year to collect and maintain unused data whose value had never been determined.
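One lightweight way to surface this kind of inventory waste is to periodically flag data files that nobody has touched in a long time, as candidates for archiving or review. A hypothetical sketch (the directory layout and one-year cutoff are assumptions, not a prescription):

```python
import os
import tempfile
import time

def stale_files(directory, max_idle_days=365):
    """List files in a directory not modified within max_idle_days."""
    cutoff = time.time() - max_idle_days * 86400
    return sorted(
        name for name in os.listdir(directory)
        if os.path.getmtime(os.path.join(directory, name)) < cutoff
    )

# Demo: a freshly created directory contains no stale files yet.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "new_extract.csv"), "w").close()
    result = stale_files(d)
```

In practice, last-access logs from a database or file server give a more reliable signal than modification times, but even a crude sweep like this makes the unused inventory visible.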

Here is a blog post with recommendations on how to reduce inventory waste.


Waiting

"Waiting" can be classified into two core categories - human and technological. On the human side, organizations with "data bureaucracies" can inadvertently create work-process bottlenecks that waste enormous amounts of time and frustrate data users. Many of our customers have historically assigned analysts (or in some cases interns) to manage specific data sets, only to find that when that person is busy, on vacation or has left the company, the data is not updated.

When it comes to technology, many organizations have updating or ETL processes that refresh too infrequently. Because of the clunkiness of this type of setup, data users can be forced to wait for the data to update, which might not happen for several hours or even days.
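A simple mitigation is to check data freshness explicitly before building anything on top of it, rather than assuming the refresh has run. A minimal sketch (the file-based setup and six-hour threshold are assumptions for illustration):

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 6 * 60 * 60  # tolerate data up to six hours old

def is_fresh(path, max_age=MAX_AGE_SECONDS):
    """Return True if the file was modified within max_age seconds."""
    age = time.time() - os.path.getmtime(path)
    return age <= max_age

# Demo: a file written just now is trivially fresh.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"latest extract")
fresh = is_fresh(f.name)
```

A gate like this turns silent staleness into a visible failure, so users know they are waiting instead of unknowingly reporting on old numbers.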

Here is a blog post with recommendations on how to reduce waiting time.


Over-processing

In the spirit of Rube Goldberg, data often goes through too many processing steps on the way to being useful. While these steps are intended to save time, increase consistency or standardize inputs and outputs, many companies go too far.

We have seen countless companies create structures that move data from one spreadsheet or database to another (conveyance) while adding no value. These extra steps drive huge inefficiencies by inflating storage requirements and documentation time, and by demanding an unnecessary amount of human intervention.

Here is a blog post with recommendations on how to reduce over-processing.


Correction

Speaking of over-processing: the more complex the data process, the more error-prone the results become. While this is true of any process, the complexity of data systems combined with tight project deadlines can create the ideal conditions for a mistake.

Even though these mistakes are both common and costly, quality checks are still too rare in data processes. Before engaging us, one of our clients was forced to retract an entire presentation built on a forecast that used an out-of-date set of assumptions, embarrassing the team and the company while costing them future work.
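Lightweight pre-flight checks could have caught that stale forecast before it reached a slide. A hypothetical sketch (the assumption fields, dates and thresholds are all invented for illustration): fail loudly when the assumption set is the wrong version or too old, instead of silently embedding stale numbers.

```python
import datetime

# A hypothetical forecast assumption set, with a version label and
# an as-of date recorded alongside the numbers themselves.
assumptions = {
    "version": "2014-Q2",
    "as_of": datetime.date(2014, 4, 1),
    "commodity_price_usd": 101.5,
}

def validate(assumptions, expected_version, max_age_days=120,
             today=datetime.date(2014, 5, 14)):
    """Return a list of problems; an empty list means safe to proceed."""
    errors = []
    if assumptions["version"] != expected_version:
        errors.append("wrong assumption version")
    if (today - assumptions["as_of"]).days > max_age_days:
        errors.append("assumptions older than %d days" % max_age_days)
    return errors

errors = validate(assumptions, expected_version="2014-Q2")
```

Running checks like these at the top of every model build turns a costly post-presentation retraction into a cheap pre-delivery error message.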


Opacity

Any data process that lacks documentation and transparency can end in disaster for an organization. The trigger for these catastrophes can be a change in personnel (the point person leaves), infrequent updates (the point person forgets how to do it) or simply a change in requirements (the point person needs to change his or her process).

Having seen countless companies pay the price for opacity, we believe that creating a culture that supports a transparent set of data process documentation is critical to maintaining the implicit value and going concern of your business unit or organization. This issue is also discussed in our Top Five Business Modeling Pitfalls blog post.

While no single framework can fully capture all the challenges associated with data efficiency, understanding these six factors will help organizations develop more productive team members and deliver higher quality results. Over the coming weeks, we will publish a post on each of the six factors, with detailed examples and suggestions on how to reduce their cost and impact.


  1. Really interesting article and the role I.T play could solve many of the issues you highlight. If you build a successful Data Exploration model and give the Analysts the right tool to have total freedom then many of the issues would disappear.

  2. Hi Andy - I think you hit the nail on the head. IT can solve it, but daily intervention must be rare to avoid creating yet another layer of process.
