When considering data visualisation, seeing really is believing.
Visual trickery and maps that changed the world.
Data visualisation is a powerful tool that helps us understand vast datasets, communicate our analysis, and reveal the story hidden in our data. It exploits the way we perceive information visually to improve our understanding of it.
Data visualisation can be defined as “the representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”
To help digest the aim of visualisation, I use a supermarket analogy. Raw data is like the items on the supermarket shelves: without a shopping list and a recipe, they can seem overwhelmingly random (have you ever walked into a supermarket and felt, well... lost?). Data visualisation is the Sunday roast. All of a sudden everything is clear, and the supermarket and all its moving parts fall into place. You can see how the cook selected, discarded, purchased, prepared, and presented the meal, much as a data analyst selects, discards, acquires, prepares, and presents data.
Let's break down the moving parts of visualisation:
- A representation of raw data - There isn’t much we can discern from a raw, unprocessed spreadsheet full of stats and figures. However, if we represent the data in forms we are familiar with, such as geometric shapes, lines and colours, we can start to gain insight. Data visualisation puts data into a visual form that our brains are ready to process.
- The presentation of facts - Careful presentation is necessary for the story behind the data to come to light. Presenting a visualisation involves many choices, and each one must be made so that we don't deceive our readers, we act ethically, and we represent the facts as they appear in the data.
- Visual perception - Our brain is a complex and powerful pattern-recognition machine. We can exploit our visual processing capabilities to interpret data quickly and accurately. Good data visualisation harnesses our visual system to enhance understanding while avoiding its pitfalls.
- Amplify cognition - Data visualisation should always inform the reader and increase their knowledge.
What are the specific advantages of visualising data? Let’s explore a few classic examples. In 1973, F. J. Anscombe published an article in The American Statistician containing his famous quartet, which warns us of the perils of not visualising data.
All four datasets in the quartet share nearly identical summary statistics, including the same means, variances, correlation and line of best fit. Plotted as scatter plots, however, they tell very different stories.
[Figure: Anscombe's dataset, the identical summary statistics it produces, and the very different scatter plots that result.]
As you can see, visualising a dataset can uncover the truth behind it, even when its summary statistics are identical to those of other datasets. Summary numbers alone rarely paint the full picture.
Another compelling example is Dr John Snow's map of London, which plotted the locations of cholera cases in the city during the 1854 outbreak in Soho.
At the time, officials believed that diseases such as cholera and the Black Death were spread by miasma, or "bad air": according to miasma theory, some epidemics were carried through the air by vapours emanating from rotting organic matter. Dr Snow's map showed cases clustering around a single water pump shared by local residents. The pump was later found to be contaminated, which undermined the theory of miasma (although the theory lived on).
Even these few examples show how visualisation fosters a better comprehension of raw data. As Arthur Conan Doyle has Sherlock Holmes say in A Study in Scarlet, "It is a capital mistake to theorize before you have all the evidence. It biases the judgement." Perhaps Doyle should have gone one step further and counted failing to visualise your data among the capital mistakes.
What other data visualisation examples have surprised you in your career?