Data visualisation is a powerful tool that helps us understand vast datasets, communicate our analysis, and reveal the hidden story behind our data. It exploits the way we perceive information visually to improve our comprehension of it.
Data visualisation can be defined as “the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”
To make the aim of visualisation easier to digest, I use a supermarket analogy. Raw data are the items on the supermarket shelves, which - without a shopping list and a recipe - can be overwhelmingly random (have you ever walked into a supermarket and felt, well... lost?). Data visualisation is the Sunday roast. All of a sudden everything is clear, and the supermarket and all its moving parts fall into place. You can see how the cook selected, discarded, purchased, prepared, and presented the meal, much like a data analyst might select, discard, purchase, prepare, and present data.
Let's break down the moving parts of visualisation:
What are the specific advantages of visualising data? Let’s explore a few classic examples. In 1973, F. J. Anscombe published an article in The American Statistician containing his famous quartet of small datasets, which warns us of the perils of not visualising data.
All four scatter plots share the same line of best fit and the same correlation. Visually, however, the datasets tell very different stories.
Anscombe's Dataset:
Producing identical summary statistics:
Resulting in very different visual representation:
As you can see, visualising a dataset - even one whose summary statistics are identical to those of other datasets - can uncover the truth behind the data. Raw data rarely paints the full picture.
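The matching summary statistics are easy to check for yourself. Below is a minimal Python sketch that uses only the standard library. The data values are those of Anscombe's quartet as commonly reproduced, while the names `QUARTET`, `summarise`, and `SUMMARIES` are my own, chosen purely for illustration. It covers only the numerical side; plotting the four datasets is left to whichever charting tool you prefer.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,   # Pearson correlation coefficient
        "slope": slope,                  # line of best fit: y = intercept + slope * x
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows printed should be almost indistinguishable, differing only in the last decimal place or so, which is exactly why the numbers alone cannot tell the datasets apart.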
Another compelling example is Dr John Snow's map of London plotting the location of cholera cases in the city.
At the time, officials believed that diseases such as cholera and the Black Death were spread through miasma, or "bad air". Miasma theory held that some epidemics were spread by air emanating from rotting organic matter. Dr Snow's map identified a common water pump used by residents, which was later found to be contaminated, undermining the theory of miasma (although the theory lived on).
With only a few examples we can see how visualisation fosters better comprehension of raw data. Arthur Conan Doyle wrote, in the voice of Sherlock Holmes, that "It is a capital mistake to theorize before you have all the evidence. It biases the judgement." Perhaps he should have gone one step further and added a lack of visualisation to the capital mistakes.
What other data visualisation examples have surprised you in your career?