If you don’t know where that information is coming from and whether you can trust it, then it’s useless.
Imagine your data as water.
The same idea applies to big data analytics. If you don’t know where the data is coming from, your data lake will quickly start to resemble a swamp instead of what it should resemble: a reservoir, something that guarantees access, quality, and provenance.
The role of the DMAI big data analyst is like that of the guy managing the dam at the mouth of a big river. Data analysts form the foundation of a data science project and are trusted with the responsibility of capturing, storing, and processing the relevant data. Data collection, data warehousing, data transformation, and data analysis – these are the typical tasks of a data analyst.
They are the professionals who work with tools and frameworks such as Hadoop or HBase in a distributed environment to ensure that all the raw data points are captured and processed correctly. The processed data is then handed over to the next group of people, the machine learning experts, who take it further.
In order to call your data a true “reservoir” or “lake,” your big data analyst needs to be able to provide the business-level guarantees that one comes to expect from a data warehouse.
If you are able to create this type of environment, so that the business has no problem using data analytics, then you are the ideal Big Data Analyst candidate. You are a pro with tools such as Hadoop, MapReduce, or HBase and have the analytical skills required to become a successful data analyst.
A data analyst should be flexible enough to learn new tools as business needs change and always be willing to pick up specialized techniques related to data analysis, just like the guy controlling the flow of water from a lake to the community that lives off it.
Once we have the guy who makes sure we have the data we need, when we need it, then the DMAI Data Science Team will be complete.