Dark Data: Why What You Don’t Know Matters
By David Hand
A much-needed counterpoint to the big-data hype of recent years – and a clarion call for us all to be constantly on the alert to unknown unknowns as well as known unknowns.
A cartoon aims to capture the important features of a face or behaviour, but there is no guarantee that it will. It can easily miss much that matters. Indeed, it can easily miss the most important things.
Big data is like a cartoon simplification. Although it’s meant to represent and describe the world, its abundance can mislead people into thinking they know everything. In Dark Data, the eminent statistician David Hand explores the implications of what we might be missing. He shows, through many real examples, just how serious things can get – how missing data can lead to death and disaster, failed economies and societies, and ruined lives.
Hand lays bare the ubiquity of dark data, what causes it and where it is likely to manifest itself. It can arise for many reasons, which themselves may not be obvious – asymmetric information in wars, time delays in financial trading, dropouts in clinical trials and deliberate selection to enhance apparent performance in hospitals, policing and schools. What is clear is that measuring and collecting more and more data are not guaranteed to lead to more relevant information or to better understanding.
But there’s also a more positive side to dark data. When approached from the right angle, it can lead to insights that cannot be obtained any other way. Counterintuitive though it might seem, deliberately obscuring some of the data can lead to improved predictions and better understanding – providing, of course, the right data are obscured in the right way.
The modern world of big data holds huge potential for improving the human condition as well as for misleading us. Dark Data shows how to achieve the first and avoid the second.