Map of top 50 UK PR Twitter personalities and their followers

Source: Top 50 UK PR Twitter accounts and their followers – Porter Novelli Global via Flickr.

With all of the hype in enterprise circles about big data, it can be easy to lose perspective on what it is, who is using it, and why you should care about it. It’s not that big data has only just emerged in the last couple of years (though you may be surprised to hear that 90% of the world’s data was created in just the last two years) – in fact, “big data” has been around for as long as statistics, mathematical modeling, and time-series analysis have been areas of applied mathematics. What’s changed is our collective ability to access big data: the underlying principles for representing data have changed little, but meeting its physical demands – the data warehousing and the infrastructure needed to support the storage, delivery, and management of massive quantities of data – has become much more affordable and accessible.

Where access to “big data” used to be restricted to those with expensive high-performance computing clusters, and most data was kept under lock and key, today a lot of useful real-time analysis can be performed on consumer-grade hardware (the Apple iPhone 5 has more computing power than NASA’s Mars Rover), and vast stores of open data sets are available to the general public, making such analysis possible on an unprecedented scale (check out the World Bank Open Data Catalogue and the public data sets from Amazon Web Services). And with all of that power comes great responsibility: how would you even make sense of massive quantities of data, and then apply them to real-world problems?
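
Getting at those open data sets is often just an HTTP call away. Here is a minimal sketch in Python (assuming the requests and pandas libraries; the World Bank v2 endpoint, indicator code, and query parameters shown are illustrative assumptions rather than a prescribed recipe) that pulls a single population series and tabulates it:

```python
# A minimal sketch of pulling one open data series over HTTP.
# The endpoint and the SP.POP.TOTL indicator code are assumptions for illustration.
import requests
import pandas as pd

URL = "https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL"
resp = requests.get(URL, params={"format": "json", "date": "2012", "per_page": 500})
resp.raise_for_status()

# The API responds with [paging metadata, records]; keep the records and flatten them.
records = resp.json()[1]
df = pd.DataFrame(
    [{"country": r["country"]["value"], "population": r["value"]} for r in records]
)
print(df.dropna().head(10))
```

Nothing about this requires special hardware or a data warehouse; a laptop and a network connection are enough to start exploring.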

The really smart guys at IBM have developed a framework for tackling big data, centered on four “Vs” (though some would argue there are as many as six):

  • Volume – though “big data” doesn’t need to be of any specific size, we can safely say that you won’t be able to load big data sets into Microsoft Excel.
  • Velocity – just how fast data is being received, as well as how quickly the data needs to be analyzed so it can be used to make meaningful decisions.
  • Variety – the range of data sources and formats that make up your datasets, including sensor data, plain text, rich documents, video, social analytics, etc.
  • Veracity – how reliable your datasets are, which is especially important because if you can’t trust the data in the first place, no amount of analysis will yield good results.

There is one “V” that hasn’t yet received a lot of attention: Visualization. Even with the exponential increases in computing power we see year over year, the amount of data we need to make sense of far outstrips our ability to process it (cognitively or otherwise), and there is a point at which even data science becomes more of an art in practice (wasn’t it once said that any discipline that has to insist it’s a science really isn’t?). Visualization already plays a crucial role in data science, helping data scientists (not actually mathemagical superhumans as foretold in myth and legend) make sense of the structure and underlying patterns that may be hidden within the data, even before any serious computation begins.

Here are three reasons why visualization may very well be the biggest “V” of them all:

1. Visualization will be key to making big data an integral part of decision making. While it might seem like the real value of big data lies in unleashing your Ph.D.-imbued data analysts on it, or pushing a button and letting a hierarchical clustering algorithm make the decision for you, the reality is that actionable insights usually depend on having the right data in the right place at the right time, in a form that can actually be acted upon. In scenarios where decisions have to be made in a snap, there simply isn’t time to do the analysis “offline” and have the decision made elsewhere. Visualization is key in those scenarios, as it is the only way to make large quantities of data accessible at a glance. As enterprises evolve their decision making to include ever-larger quantities of data, visualization will only become more important.

2. Visualization will be the only way to make big data accessible to a large audience. Data gains true value when it is used to influence and compel a large audience into action (just have a look at Hans Rosling). Storytelling through data is an emerging art form that shows that data is not just about scalar values and multivariate analysis, but about transforming what is inherently counterintuitive into a narrative that others can relate to. Without visualization, data is just an account of the facts, but with it, data has the ability to inspire and transform the way people see the world around them.

3. Visualization will be essential to analyzing big data so that it delivers its full value. Even in “traditional” data science, visual exploration is one of the very first things a data scientist does to understand what she is dealing with. What do the data look like? What do the data say? That is unlikely to change as data sets get bigger and more complex – there will always be a role for visualization to play, even as data science and big data analysis continue to evolve with new analytical methods and statistical techniques. Visualization will help data scientists understand which techniques might best uncover hidden insights or patterns, and will help them interpret the outcomes of applying those techniques, thereby guiding the analysis toward the desired outcome.
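
To make that first look concrete, here is a small sketch of such an exploratory pass in Python with pandas and matplotlib; the file name and its columns are hypothetical stand-ins for whatever data set is actually at hand:

```python
# A minimal exploratory-visualization sketch: summarize, then plot distributions
# and pairwise relationships before committing to any particular technique.
# "observations.csv" and its columns are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("observations.csv")
print(df.describe())                                   # quick numeric summary first

numeric = df.select_dtypes("number")
numeric.hist(bins=30, figsize=(10, 6))                 # per-column distributions
pd.plotting.scatter_matrix(numeric, figsize=(10, 10))  # pairwise structure
plt.show()
```

A few minutes with plots like these often decides which of the heavier analytical techniques are worth running at all.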

Where do you think visualization will go in the age of big data? NGRAIN CEO Gabe Batstone will be speaking about the role of 3D visualization in big data at the O’Reilly Strata Conference + Hadoop World in New York this coming October. It’s still very early days in the world of big data visualization (comparatively speaking), but the future looks exciting!