In the end, big data is really a backward science.
It’s a rear-view mirror tool. Think of a supercomputer with OCD tirelessly sorting through past events – even in real time as they unspool – one that seeks out recurring patterns in the world, makes sense of them, then sells them to the highest bidder.
Most of us remember a way of life guided by homilies, intuition, and superstition. Now we have apps like Yelp, Match.com, and Pandora that tap into the power of big data and help us discover our heart’s desire. Just for fun, check out the documentary attached to see how music discovery was done back in the days before Pandora came on the scene.
Big data is so big you need a machine to figure it out.
Data scientists now employ a branch of statistics called “machine learning” to analyze big data sets. Central to this is the algorithm – a computer program designed by humans, then set loose on big data to provide superhuman insights and automated responses. Algorithms combined with big data allow us to model the weather, execute stock trades, decode the genome, even sell more toothpaste.
To understand the math behind machine learning, see here and here. Or, to ponder its impact on our lives, watch the TED talk below:
Big data is a by-product of our digital lives.
Experts and pundits have begun to use the term “digital exhaust” to describe the data that we as a global community throw off each day. Some of this data comes from academic centers (think CERN or the Human Genome Project). But the lion’s share flows from the digital footprints left by individuals, businesses, and institutions throughout the day.
Here’s a rough breakdown provided by Deloitte:
- Web & social media – click-stream and web interaction data
- Machine-to-machine – data from sensors, meters, “internet of things”
- Big transaction data – customer/partner transaction data
- Biometric – data includes fingerprints, genetics, retinal scans
- Human-generated – unstructured data like email, social posts, video
And for a deeper dive, see below:
Big data is truly big, and growing exponentially.
Statistics circulating online put things into perspective. IBM says that 90% of the data in the world today was created in just the last two years. CSC projects a 4,300% increase in data production by 2020. IDC predicts that by 2020 annual data creation will reach 40 zettabytes – equivalent to 360 million years of Blu-ray quality video, or the data that would be created if every US citizen wrote 3 tweets per minute for the next 598,867 years.
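For the curious, the tweet comparison can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch: the tweet size (140 bytes, one byte per character), the US population (about 310 million), and the 365-day year are my own assumptions, not numbers published by IDC, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "3 tweets per minute" comparison.
# Every input below is an assumption for illustration, not an IDC figure.

ZETTABYTE = 10**21                 # bytes, using the decimal (SI) definition
BYTES_PER_TWEET = 140              # assumed: 140-character tweet, one byte per character
US_POPULATION = 310_000_000        # assumed: rough US population in the early 2010s
MINUTES_PER_YEAR = 365 * 24 * 60   # ignores leap years


def tweet_volume_bytes(years, tweets_per_minute=3):
    """Total bytes produced if every person tweets at the given rate for `years` years."""
    tweets_per_person = tweets_per_minute * MINUTES_PER_YEAR * years
    return tweets_per_person * US_POPULATION * BYTES_PER_TWEET


total = tweet_volume_bytes(598_867)
print(f"{total / ZETTABYTE:.1f} ZB")
```

Under these assumptions the total lands in the neighborhood of 40 zettabytes, so the comparison at least hangs together. Change the assumed tweet size or population and the answer shifts accordingly.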
For more information see here, here, or here. Or watch commentary below by John Dodge of the Enterprise CIO Forum:
Data can be surprisingly beautiful.
Thanks to new data visualization techniques, and the artists and designers pioneering them, we can transcend traditional numeric tables and now see data in a whole new way.
Consider the Literary Organism project by designer Stefanie Posavec. Through path analysis and color coding, she has transformed the narrative structure of Kerouac’s On the Road into a graphic that is both rich in factual content and delicately beautiful. Also, spend 20 minutes with Jer Thorp, NYT Artist in Residence, in the talk below, where he shares his passion for the emerging field of data visualization. Beauty is truth!