The Not So Elusive Unstructured Data – Part 2

Historically, when people talked about unstructured data, they meant data within the firewall: document management or collaboration information that is not structured, such as video, photos, documents and diagrams.

However, we are now at a tipping point: there is as much value in unstructured data in terms of what customers are thinking on the web and what businesses can derive from other organizations’ data.

Good analysts know how to identify, inventory and integrate unstructured data right alongside the structured business data they have always had.


Recently, BI and warehousing suppliers have been adding support for unstructured data management to their tool sets.

Many IT organizations have built their own platforms for converting unstructured data into structured records, for example, through knowledge management systems.

And new businesses are popping up to offer unstructured data collection, storage and analysis options that are integrated into the enterprise analytics solution.

Companies that get unstructured data will have a huge competitive advantage. DMAI can give you the training, consulting and analytics talent you need to stay ahead of the pack. #GrowMoreDMAI


The Ever Elusive Unstructured Data – Part 1

Per Wikipedia, unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.

This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents.
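To make the contrast with fielded data concrete, here is a minimal Python sketch that pulls dates and dollar amounts out of a free-text note and puts them into a structured record. The note, the two patterns and the names `ExtractedRecord` and `extract_record` are all made up for this illustration; real text is far messier, and a couple of regular expressions will not cover it.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for this illustration only:
# ISO-style dates (YYYY-MM-DD) and dollar amounts such as $40 or $1,250.00.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


@dataclass
class ExtractedRecord:
    """A structured view of the facts found in one piece of free text."""
    dates: list = field(default_factory=list)
    amounts: list = field(default_factory=list)


def extract_record(text: str) -> ExtractedRecord:
    """Collect every date and amount string that matches the patterns above."""
    return ExtractedRecord(
        dates=DATE_RE.findall(text),
        amounts=AMOUNT_RE.findall(text),
    )


# A made-up customer service note standing in for unstructured text.
note = (
    "Customer called on 2014-03-05 about a $1,250.00 charge; "
    "refund of $40 promised by 2014-03-12."
)
record = extract_record(note)
print(record.dates)    # the two dates mentioned in the note
print(record.amounts)  # the two dollar amounts mentioned in the note
```

Anything written in a format the patterns do not anticipate (for example "March 5th" or "forty dollars") is silently missed, which is exactly the kind of irregularity that makes unstructured data hard for traditional programs.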

As a result, the traditional model of business analytics no longer works.


Recent discussions about Big Data are showing that about 80-90% of data currently being captured by businesses is unstructured. Just two years ago it was 50% and five years ago about 20%. The boom in unstructured data storage is fundamentally changing business analytics as we know it.

Businesses across all industries are gathering and storing more and more data on a daily basis. But when it comes to assessing the benefits and challenges of big data, sometimes it is easy to overlook one key point: Most of the business information in use today does not reside in a standard relational database.

So how do we overcome these challenges?

DMAI has the answer!


Finding The Right Data To Help A Business Is The Key To Being A Great Analyst

Knowing where to go to find the data you need is one of the most important keys to being a successful analyst.

There are three basic areas where you can go to find data:

  1. Private Company Databases and sources
  2. Public Databases and sources
  3. The Internet

Each company treats its data a little differently, but you can expect them to store their data in databases that fall into one of the following three categories:

  1. Proprietary Databases. All of the data used for analysis is kept in databases that are built and maintained by an internal IT team. They may use heavily customized commercial software.
  2. Off the Shelf Databases. Most data is housed in a commercial database solution like Oracle, Teradata, MS Access, etc., where the IT team often works in partnership with the database vendor.
  3. External Databases. The company does not have its own IT team and receives its data from external resources. Usually analysis is conducted via a connection to the data through the vendor.

In addition to using internal data sources, you may also find yourself surfing the web to find data for your analysis.

A lot of the time, it takes a combination of internal business data and information from the web to give you an overall picture.
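As a rough illustration of combining the two kinds of sources, the Python sketch below joins a small set of internal sales figures with external population figures on a shared key. Every name and number here is hypothetical and made up for the example; it is not real company or government data.

```python
# Hypothetical internal data: units sold per state, as exported from a company database.
internal_sales = {"OH": 1200, "TX": 4300, "VT": 150}

# Hypothetical external data: population per state, as it might come from a public dataset.
# The figures are invented for illustration.
public_population = {"OH": 11_000_000, "TX": 26_000_000, "NY": 19_000_000}


def sales_per_100k(sales, population):
    """Join the two sources on their shared key and compute units sold per 100,000 people."""
    combined = {}
    for state, units in sales.items():
        people = population.get(state)
        if people is None:
            # No external figure for this key, so skip it rather than guess.
            continue
        combined[state] = round(units / people * 100_000, 2)
    return combined


rates = sales_per_100k(internal_sales, public_population)
print(rates)
```

Only the states present in both sources appear in the result. Deciding what to do with the unmatched ones (here, VT and NY) is part of the analyst's job.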

In my experience, there are three places I generally go to in search of publicly available data on the internet. I generally find what I need from one of the following:

  1. Wikipedia. As a general starting point for just about anything, you can begin with a Wikipedia search.
  2. Google Search. To pull together press releases, news articles, images, and other pieces of data that are not statistically driven, Google is your best bet.
  3. Government Databases. There are a vast number of datasets out there on just about every kind of public data in terms of demographics, government spending, monetary flows and many, many other types of data.

So when you look to provide a well-rounded and detailed analysis of any business problem, the first step is always knowing where to go to get your data.