Xplenty just wrapped a commissioned study of more than 200 BI pros and found that a third spend 50-90% of their time just cleaning raw data. This is one of the first reports to tie an actual number to the ETL process.
From my days as an analyst at Wells Fargo, I know how hard it was to maximize analysis and communication time and minimize time spent finding and cleaning data. This was especially true for me because I was using more unstructured data than structured data for things like competitive intelligence.
I see it being even more of a challenge now because the share of unstructured data in any business has exploded over the past few years. Mining valuable insights from unstructured data is time-consuming, at least until you have a process in place to extract and refresh the data using some kind of technology.
In addition, businesses continue to find new data points to bring into their data warehouses, dramatically increasing the amount of structured data.
What this means is that a lot of analysts are spending a lot more time looking through mountains of data to figure out exactly which data to use. It's not going to get easier.
Good data gathering methodologies and nimble BI tools can help cut down on some of the workload, but in the end we just keep making data faster than we can truly process it.
There is just no replacing the human factor of someone knowledgeable about the business who can interpret the data and decide what data to use and what not to use.
That makes life even more challenging, because once we determine what data we want to use, we often still have to take the raw data and clean it up so it is valid and fits nicely into our BI tools.
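To make that cleanup step concrete, here is a minimal sketch in Python of what "valid and fits nicely into our BI tools" can look like in practice. Everything in it is an assumption for illustration: the sample records, the field names (`competitor`, `rate`, `as_of`), the two date formats, and the rule of dropping rows that cannot be parsed. It is not taken from the Xplenty study or from any particular BI tool.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from a spreadsheet export.
RAW_ROWS = [
    {"competitor": "  Acme Bank ", "rate": "1.25%", "as_of": "2015-03-01"},
    {"competitor": "acme bank", "rate": "n/a", "as_of": "2015-03-01"},
    {"competitor": "First Example", "rate": "0.90", "as_of": "03/02/2015"},
]

# Assumed input date formats; real data usually has more variety.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Normalize one record, or return None if it cannot be made valid."""
    # Collapse stray whitespace and standardize capitalization.
    name = " ".join(row["competitor"].split()).title()
    try:
        # Accept "1.25%" or "1.25"; anything non-numeric is rejected.
        rate = float(row["rate"].strip().rstrip("%"))
    except ValueError:
        return None
    as_of = parse_date(row["as_of"])
    if not name or as_of is None:
        return None
    # Emit one consistent shape: text, number, ISO date string.
    return {"competitor": name, "rate": rate, "as_of": as_of.isoformat()}


def clean_rows(rows):
    """Clean every record and keep only the ones that passed validation."""
    cleaned = [clean_row(r) for r in rows]
    return [r for r in cleaned if r is not None]
```

Calling `clean_rows(RAW_ROWS)` keeps the first and third records in a consistent shape and drops the second, whose rate is not a number. In real work, someone who knows the business would still have to decide whether a row like that should be dropped, repaired, or sent back to its source.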
If you are having trouble figuring out what data to use in your business, and you find yourself spending far too much time cleaning it, perhaps DMAI can help. We have a Data Science team ready to assist your organization with just these types of challenges.