For decades we have been focusing on the architecture of data; now the time has come to turn our attention to the economics of data. . .
It is a tired cliché that Silicon Valley and the high tech revolution resemble the 1849 California Gold Rush. But, at the dawn of the Big Data era, that analogy may be more apt than ever. . . not to the Gold Rush itself, but to what came immediately thereafter.
As any student of California history knows, the Gold Rush began when James Marshall, foreman at Sutter’s Mill near Sacramento, saw an interesting color in the bottom of a creek and pulled out a gold nugget. . . and then proceeded to announce his discovery to everyone he met. Within a year, thousands of prospectors had arrived from the East and swarmed over every creek and river in the Sierra Nevada foothills.
Though it is hard to believe, many of the stories are true about those first prospectors literally dipping their pans in creek gravel and pulling out gold nuggets worth a fortune. But it is also true that this bonanza didn’t last for long. Within months, the prospectors had switched to ‘Long Toms’ and other devices to sift through large amounts of shoveled river gravel. And by the end of the decade, the age of the gold panner was over, replaced by mining companies and other corporate interests.
The most memorable of these new gold prospectors – you can still see the erosion and other ecological damage from their work more than a century later – were the ‘hydraulic’ miners: men who used water ‘monitor’ cannons to blast through hillsides and expose old gravel bars for mass processing and extraction. The original 49ers were doing what might be characterized as ‘opportunistic’ gold mining – that is, the likely locations were accessible and largely known, and the gold was near the surface and in abundance. By comparison, for the hydraulic miners, the gold was increasingly rare, difficult to locate, and buried deep beneath the surface – and ultimately it became an economic equation: how much gravel could you move, and how much gold were you likely to find, to justify the cost of equipment and operation?
The story of data – and we realize now that it is a ‘story’ – has followed a similar trajectory. The seventy-year history of the computer era to date can be seen as the Gold Rush phase of data. That is, the (comparatively) small amounts of data captured and stored – sales figures, payroll, scientific results, etc. – have been of high value, easy to capture, and increasingly accessible. And you can see the impact that this Data Gold Rush has had on every corner of modern life.
But with the rise of the Internet, low-cost sensors and microcontrollers, and vast server farms, those deposits of rich, high-grade data ‘ore’ are becoming increasingly difficult to find. There is still point-of-sale information of course, as well as other caches of rich data (census, scientific research results, medical data, etc.). . . but they have now been overwhelmed by a giant wave of new, much lower-grade data streaming in by the terabyte from tablets, smartphones and billions of autonomous sensors measuring everything from weather to human digestive tracts.
Over the years, a lot of companies have made huge fortunes selling the tools – the picks and pans – needed by their clients, the corporate ‘panners’, to extract the rich data gold from the comparatively small streams of the digital world. Companies like Vertica, SAP, Excellitics, Oracle, IBM, and Visa have perfected the art of diving into the wide but shallow pools of data and coming up with valuable returns for their customers.
Not surprisingly, this easy accessibility to useful data has led to the widespread belief, rarely verbalized, that all data is equal. And, until now, that assumption has been pretty much valid. But the world is changing fast around us: there are hundreds of millions of laptops and personal computers out there and soon there will be several billion smartphones. By the end of the decade, given the plummeting prices thanks to Moore’s Law, digital sensors will be almost free and embedded in the world in numbers that will make even those consumer product numbers look tiny. . . and all are going to be compiling data on the world and everything taking place in it. This means that, as busy as the digital world is now, by then we’ll likely be spewing out a year’s worth of today’s data in an hour.
That’s a lot of information. But here’s the thing: buried in those Himalayas of data will be some incredibly valuable information – including some stunning new revelations that we can’t yet imagine, and in fact, can’t yet even capture. Waiting out there could be some profound discoveries about human and animal behavior, epidemiology and a host of other sciences. By the same token, the capture and processing of all of this data may very well enable us for the first time in history to create true mass customization of products and services – i.e., customized clothing, shelter, entertainment, work and healthcare for every individual.
That’s the great attraction of Big Data, and that’s why it has become the hot new topic in the high tech world: its potential is almost unlimited.
All of that sounds just great. But there is also a problem. Now we are back to the Gold Rush analogy. Infinite riches are out there buried in the ground. But the easy stuff has already been collected. And the tools that have been successful to this point, the picks and screens we’ve been using, just won’t cut it anymore. Sure, you can still use Vertica to crunch all of your data, but you’ll spend millions and never see a return on your investment. We are now out of the gold-panning business; the world of Big Data involves dynamite, water cannons, steam shovels and mega-dump trucks.
In fact, some corners of the cyber-universe are already experiencing this change. For example, as part of its IPO process, Facebook disclosed that it moves 20 terabytes of data per day. If the company were to actually try to process all of this raw, undifferentiated data using traditional data processing tools, it would soon go bankrupt trying to find those elusive bits of high-value content.
All of this suggests that Big Data will not only be a revolution in data management and search, but that it will also likely bring to the fore a whole new generation of hot companies specializing in building and selling new tools for mining vast amounts of low-quality data for the rare nuggets. And beyond the tool-makers, there will also be important new companies that will behave like geologists, testing samples of this raw data to determine whether it contains copper or platinum – that is, the quality and density of its data – and that will in turn decide which mining tools need to be used.
By the same token, the new challenge facing customers who want to enjoy the benefits of Big Data (and soon enough, that will be everybody) will be determining the real value of their data – and using the right tools to process it. The company that pays for data at its true value will enjoy a crucial competitive edge.
The most successful of these new Big Data mining toolmakers and consultants will likely be those that first master and then advance the technology of data compression. Data compression has been around for decades, as anyone who has worked with computer memory well knows. But it hasn’t been a topic of great interest in the data analysis world. That’s because you can’t compress data below its useful information content. . . and since most of the data crunched so far has been information rich, there hasn’t been much application for this technique.
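To make that point about information content concrete, here is a rough sketch, not anyone’s actual product or method. It assumes that the ratio achieved by a general-purpose lossless compressor (Python’s standard zlib module) can serve as a crude stand-in for how much ‘ore’ a stream of data contains. The helper name compression_ratio and the sample sensor stream are invented for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size.

    A ratio near 0 means the data is highly redundant (low-grade 'ore');
    a ratio near 1 means there is little left to squeeze out.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, level)) / len(data)


# Hypothetical low-grade stream: a sensor repeating the same reading.
low_grade = b"temp=21.5C;status=OK\n" * 5000

# Stand-in for information-dense data: random bytes carry almost no
# redundancy, so a lossless compressor cannot shrink them.
high_grade = os.urandom(len(low_grade))

print(f"low-grade ratio:  {compression_ratio(low_grade):.4f}")
print(f"high-grade ratio: {compression_ratio(high_grade):.4f}")
```

The repetitive stream shrinks to a tiny fraction of its size, while the random bytes barely shrink at all. That is the limit described above: a compressor can strip out redundancy, but it cannot go below the information the data actually holds.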
By comparison, in dealing with the vast realms of low-content data, just about the only way to cost-effectively refine down to the good stuff will be through high-powered data compression. So keep an eye out for the companies that create most of the new intellectual property in this technology – they will likely be the new Oracles and IBMs of the Big Data era.
As for the old IBMs and Oracles: if they don’t adapt, they will be reduced to becoming niche, high-end data specialists. Who needs a gold pan, especially one made of real gold, when you have to find four nuggets of gold in an Everest of rock and dirt?
Originally posted on Forbes.com.