Twenty years ago data was mostly stored in databases. These databases housed all the data a business would need to do analytics. Transaction data, sales data, customer data, demographic data was all neatly collected, stored and analyzed in databases.
A surprising number of companies still store most of their data in databases. It works well for business that just need to look at historical data to conduct basic descriptive analytics.
About ten years ago the amount of data captured in a business and the growing diversity in date sources and data storage brought about the mainstream use of data warehouses in the business world.
Data warehouse are often a collection of databases interconnected so that data can be brought together into one place for reporting and analysis.
Whether you are working with a data base or a data warehouse, you should have a basic understanding of how data is stored. It should be in table format, with header columns and data rows.
A good way to quickly assess the analytics culture of a business is to look at how data is shared among management. Does it look table like? Or is it obvious that most of the time spent by the author was put into decorating? If you can’t easy sort something, then you are not dealing with a good data culture.
The best way to have a good data culture is to have well documented data structures. Any dB admin worth a grain of salt has the data hierarchy mapped out and has a knowledge base to help users know what data is in each field.
Like with finding data, being good at storing data starts with knowing the environment. Any good analyst should have a basic understanding of how to use SQL to pull a query for a data table. Even if you cant do hard core coding, know how data is generally stored in a structure is key.
Another important concept about data warehouses if you have to know how to join or blend data from different sources. When you have multiple data tables in a warehouse you often need to join the data on a common field. Data blending goes on step further as you are often trying to take data that doesn’t have a natural point on common that is easy to join on. Advanced data warehouses and data management tools can blend things easily, but its still important to understand the core concepts of how to join and blend data.
As I mentioned in earlier posts, there is now a new concept taking root that one up data warehouses. Data lakes are being used to address the fact that we have more unstructured data then we have structured data. Data bases and data warehouses were designed only to handle structured data the easily fits into a data able.
Now we have to collect data from images, videos, blogs, comments and other places that are not easily converted to a value. Data blending across both traditional structured data warehouses and new types of data is not easily done in most data warehouses so tools are being developed to bridge this gap.
The lake is no longer a place just to fish, but also to do all the other things a lake can be used for.
So, when it comes to understanding data warehouses, learn who built and/or maintains it and buy them a cup of coffee. Get your hands on the data dictionary, knowledge base, FAQ, metadata.. whatever you can to map out the data environment. If you do that then you can find use the big data stored in a data warehouse to find the right data at the right time.