May 14, 2014

Why are data analysts so inefficient?

Image by Falkor, Krypt3ia

In an attempt to produce better products, explore new markets and outpace the competition, companies spend huge amounts of time and money creating data processes that empower researchers, analysts and data scientists. While these processes take many different forms, most companies still aren't using data efficiently, and much of the time and money they spend on it is wasted.

Taiichi Ohno, who is widely considered the father of lean manufacturing, defined what he believed to be the seven primary sources of waste in manufacturing. From five of these factors (conveyance, inventory, waiting, over-processing and correction) as well as one of our own (opacity), our list of "The Six Biggest Data Time Wasters" was born. In this series, we'll examine these time wasters and offer solutions on how to eliminate them.

1. CONVEYANCE

Analysts and IT professionals waste time moving data back and forth between models, users and physical locations. Some examples of this include moving databases between users, linking spreadsheets, copying and pasting data from one location to another or syncing FTP sites to access data in multiple countries.

While many organizations try to address these needs by placing spreadsheets on mirrored shared drives or creating SQL databases, this often makes the data conveyance process both inflexible and unstable. One of our customers originally set up a series of five interconnected spreadsheet models that suffered a total process failure after one analyst shifted his output data over by a single column.
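
To guard against this kind of silent breakage, a consuming model can validate the layout of its inputs before using them. Here is a minimal sketch in Python (the pandas library, the file name and the expected column names are all illustrative assumptions, not details from the incident above):

```python
import pandas as pd

# Columns this model expects from the upstream spreadsheet export.
# If an analyst shifts the output over by a column, the names will no
# longer line up and we fail loudly instead of computing on garbage.
EXPECTED_COLUMNS = ["region", "month", "units_sold", "revenue"]

def load_upstream(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(
            f"Upstream file {path} is missing expected columns {missing}; "
            "did the layout change upstream?"
        )
    return df[EXPECTED_COLUMNS]

upstream = load_upstream("monthly_output.csv")  # hypothetical export file
```

A check like this turns a silent one-column shift into an immediate, explainable error instead of five corrupted downstream models.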

Here is a blog post with recommendations on how to reduce conveyance waste.

2. INVENTORY

Collecting and holding unused data in inventory costs money. While the direct cost of storing data continues to fall, the human investment in updating and maintaining unused data, as well as the opportunity cost of not using that data, both continue to rise. Storing unused or unusable data can also muddle the entire data ecosystem, making it harder to find the data you actually need.

The magnitude of inventory costs can vary, with examples ranging from the cost of preserving large legacy databases all the way down to updating a small data set with superfluous data points. And while groups of all sizes face this issue, the problem is much more pronounced and costly within large organizations. One large ($10 billion+) energy company we talked to was spending millions a year to collect and maintain unused data whose value had never been determined.
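
One cheap way to surface inventory waste is to periodically flag data files that nobody has touched in months. The Python sketch below is illustrative only; the shared-drive path and the 180-day threshold are assumptions to be tuned for your organization:

```python
import os
import time

STALE_AFTER_DAYS = 180      # assumed review threshold
DATA_ROOT = "/shared/data"  # hypothetical shared data directory

cutoff = time.time() - STALE_AFTER_DAYS * 86400

for dirpath, _, filenames in os.walk(DATA_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        # st_mtime is the last modification time; a file that has not
        # been updated in six months is a candidate for archival review.
        if os.stat(path).st_mtime < cutoff:
            print(f"Stale candidate: {path}")
```

A report like this won't decide what to delete, but it gives the team a concrete list to review instead of an ever-growing pile.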

Here is a blog post with recommendations on how to reduce inventory waste.

3. WAITING

"Waiting" can be classified into two core categories - human and technological. On the human side, many organizations that contain "data bureaucracies" can inadvertently create work process bottlenecks that waste enormous time and cause frustration for data users. Many of our customers have historically assigned analysts (or in some cases interns) to manage specific data sets only to find that when that person is busy, on vacation or has left the company the data is not updated.

When it comes to technology, many organizations have updating or ETL processes that refresh too infrequently. Because of the clunkiness of this type of setup, data users can be forced to wait for the data to update, which might not happen for several hours or even days.
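
Even when the refresh schedule itself can't be changed overnight, a process can at least detect that it is about to run on stale data instead of leaving users to guess. A minimal sketch, assuming the ETL job lands its extract as a file and that a four-hour freshness requirement applies (both are illustrative assumptions):

```python
import os
import time

MAX_AGE_HOURS = 4                       # assumed freshness requirement
EXTRACT_PATH = "warehouse_extract.csv"  # hypothetical ETL output file

age_hours = (time.time() - os.path.getmtime(EXTRACT_PATH)) / 3600
if age_hours > MAX_AGE_HOURS:
    raise RuntimeError(
        f"{EXTRACT_PATH} is {age_hours:.1f} hours old; "
        "refusing to build the report on stale data."
    )
```

Failing fast like this converts invisible waiting into a visible, fixable scheduling problem.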

Here is a blog post with recommendations on how to reduce waiting time.

4. OVER-PROCESSING

In the spirit of Rube Goldberg, data often go through too many processing steps on the way to being useful. While these steps are intended to save time, increase consistency or standardize inputs or outputs, many companies go too far.

We have seen countless companies create structures that move data from one spreadsheet or database to another (conveyance) while adding no value. These steps can drive huge inefficiencies by inflating storage requirements and documentation time and by requiring an unnecessary amount of human intervention.
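
The usual cure is to collapse the intermediate hops and pull from the system of record into the final model in one step. A sketch of the idea, assuming the source is a SQL database the analyst can reach directly (the database, table and column names are hypothetical):

```python
import sqlite3
import pandas as pd

# Instead of database -> export -> spreadsheet A -> spreadsheet B -> model,
# query the system of record directly and skip the intermediate copies.
conn = sqlite3.connect("sales.db")  # hypothetical source database
monthly = pd.read_sql_query(
    "SELECT region, month, SUM(revenue) AS revenue "
    "FROM orders GROUP BY region, month",
    conn,
)
conn.close()
```

Every hop removed is one less place for data to go stale, shift by a column or quietly diverge from the source.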

Here is a blog post with recommendations on how to reduce over-processing.

5. CORRECTION (ERRORS)

Speaking of over-processing, the more complex the data process, the more error-prone the results become. While this is true of any process, the complexity of data systems combined with tight project deadlines can create the ideal conditions for a mistake.

Even though these mistakes are both common and costly, quality checks are still too rare in data processes. Before engaging us, one of our clients was forced to retract an entire presentation built on a forecast with an out-of-date set of assumptions, embarrassing the team and the company while costing them future work.
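
Lightweight, automated sanity checks catch exactly this class of mistake. The sketch below assumes the forecast's assumptions are saved with an "as_of" date in a small JSON file, which is a hypothetical convention rather than our client's actual setup; it refuses to build on inputs older than one planning cycle:

```python
import json
from datetime import date, timedelta

MAX_ASSUMPTION_AGE = timedelta(days=90)  # assumed planning-cycle length

with open("assumptions.json") as f:      # hypothetical assumptions file
    assumptions = json.load(f)

as_of = date.fromisoformat(assumptions["as_of"])
if date.today() - as_of > MAX_ASSUMPTION_AGE:
    raise ValueError(
        f"Assumptions dated {as_of} are more than one planning cycle old; "
        "update them before presenting this forecast."
    )
```

Ten lines of checking are far cheaper than one retracted presentation.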

6. OPACITY

A data process that lacks documentation and transparency can end in disaster for any organization. The trigger for these catastrophes can be a change in personnel (the point person leaves), infrequent updates (the point person forgets how the process works) or simply a change in requirements (the point person needs to change his or her process).

Having seen countless companies pay the price for opacity, we believe that creating a culture of transparent data process documentation is critical to preserving the value and continuity of your business unit or organization. This issue is discussed in our Top Five Business Modeling Pitfalls blog post as well.
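
One concrete habit that supports this kind of transparency is writing provenance metadata next to every output, so the process can outlive its point person. A minimal sketch (the sidecar-file convention and field names are illustrative assumptions, not a standard):

```python
import getpass
import json
from datetime import datetime, timezone

def write_provenance(output_path: str, source: str, steps: list) -> None:
    """Record who produced an output, when, from what source and how."""
    metadata = {
        "output": output_path,
        "produced_by": getpass.getuser(),
        "produced_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "steps": steps,
    }
    with open(output_path + ".provenance.json", "w") as f:
        json.dump(metadata, f, indent=2)

write_provenance(
    "q2_forecast.csv",  # hypothetical output
    source="sales.db",
    steps=["query orders table", "aggregate by region", "apply FX rates"],
)
```

Even when the point person leaves, the sidecar file answers the first questions anyone will ask: where did this come from, who made it and how.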

While no single framework can fully capture all the challenges associated with data efficiency, understanding these six factors will help organizations develop more productive team members and higher-quality results. Over the coming weeks, we will publish a post on each of these six factors with detailed examples and suggestions on how to reduce its cost and impact.

Comments:

  1. Really interesting article, and the role I.T. plays could solve many of the issues you highlight. If you build a successful Data Exploration model and give the analysts the right tools to have total freedom, then many of the issues would disappear.

  2. Hi Andy - I think you hit the nail on the head. IT can solve it, but daily intervention must be rare to avoid creating yet another layer of process.
