Unstructured Data and Mining the Untapped Insight It Contains

We create a lot of data today.  IBM estimates our daily production of data is 2.5 exabytes.  That’s 2.5 billion gigabytes.  Every day.

That volume is still growing, and while the exact rate of increase is disputed, every estimate is staggering.  One estimate, from former Google CEO Eric Schmidt, is that we produce as much data every 2 days as we did from the beginning of time through 2003.  Others hold that we create as much data in a week today as we did in a year a decade ago.  Either way, it’s a lot and it is growing fast.

Given the volume, everyone is talking about big data and about using that data to make better decisions.  For example, data is used to target an audience more precisely so that marketing dollars can be spent more efficiently.  Data is also used to profile consumers so that the right marketing message is sent to drive higher conversions.

But as big as the numbers are, much of the data isn’t in a traditionally scalable or usable format.  Emails, social media posts, phone calls, online reviews, and blogs are all narrative in form and are not structured into measurable categories or quantifiable numbers.

A talk given by Amit Deshpande at Epsilon noted that only 20% of data is structured in measurable ways, while 80% is captured in unstructured formats through various forms of media.  Yet that 80% may be as valuable as predefined numbers, or more so.  Think of the surveys you administer, whether a conference’s speaker evaluation or a satisfaction survey.  A generic numerical score, such as a rating of 5, is often less insightful than what someone writes when they take the time to fill out an open comments section.

Improving technology is changing our ability to analyze that unstructured data.  One option is text mining: software that extracts measurable sentiment from text and converts it into assigned categories of data.  For example, the restaurant reviewer below gives 4 stars on Yelp, yet the review contains far more information than an overall 4-star rating.


In analyzing the review, text mining would break down the text into helpful segments, interpret what the reviewer said, and score some preset categories.  As a person, I might interpret the review as follows in a couple of minutes, but a text mining application could spit out similar results for 100 of these in less than 5 seconds:

  • Food: +5
  • Wait time: -1
  • Ambiance: +3
  • Drinks: -3
  • Likely return: +4

“Translating” the review into numerical ratings for preset categories makes reviews measurable, scalable, and comparable, so they can be used to benchmark future performance.  Other categories that are common targets in restaurant reviews and that can be programmed for ratings include service, meal (breakfast, lunch, dinner), events (happy hour, live band, special occasions), and cost/value.
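
To make the idea concrete, here is a minimal sketch in Python of scoring a review against preset categories.  It is a toy illustration and not how any particular text mining product works: the keyword lists, the sentiment word values, the score_review function, and the sample review are all made up for this example.  It splits a review into sentences, adds up the sentiment words in each sentence, and credits that total to every category the sentence mentions.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real tool would use far larger, curated or learned ones.
CATEGORY_KEYWORDS = {
    "food": {"food", "burger", "pasta", "dish", "meal"},
    "wait time": {"wait", "waited", "line", "seated"},
    "ambiance": {"ambiance", "atmosphere", "decor", "music"},
    "drinks": {"drink", "drinks", "cocktail", "cocktails", "wine", "beer"},
}

# Hypothetical sentiment values for a handful of words.
SENTIMENT_WORDS = {
    "amazing": 3, "delicious": 3, "great": 2, "good": 1, "cozy": 2,
    "slow": -1, "long": -1, "bland": -2, "watery": -2, "terrible": -3,
}


def score_review(text, low=-5, high=5):
    """Return a {category: score} dict, with each score clamped to [low, high]."""
    scores = defaultdict(int)
    # Break the review into sentences so sentiment stays near its subject.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        sentiment = sum(SENTIMENT_WORDS.get(word, 0) for word in words)
        # Credit the sentence's sentiment to every category it mentions.
        for category, keywords in CATEGORY_KEYWORDS.items():
            if keywords.intersection(words):
                scores[category] += sentiment
    return {category: max(low, min(high, s)) for category, s in scores.items()}


# Invented sample review, not the Yelp review discussed in this post.
sample = ("The pasta was amazing and the dish came out delicious. "
          "We waited a long time to be seated. "
          "Cozy atmosphere. The cocktails were watery and bland.")

print(score_review(sample))
# {'food': 5, 'wait time': -1, 'ambiance': 2, 'drinks': -4}
```

A simple word list like this cannot handle negation ("not good"), sarcasm, or context, which is where commercial text mining software earns its keep.  But the output has the same shape as the ratings above: one comparable number per category for each review.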

You can imagine applications of text mining in other areas, such as scanning social media posts for unsolicited reviews, analyzing recorded calls to a call center, and identifying recurring complaints in emails, just to name a few.  Today’s text mining can even detect sarcasm and assign value to tone of voice, which lets marketers know their audience in deeper and more meaningful ways.

This is just one way of maximizing the return on unstructured data in an age of big data, where insights and analysis make a significant difference in the performance of marketing efforts.  And the potential seems limitless.
