Data Here, Data There, Data Data Everywhere

As users, we are creating hundreds of terabytes of data from every possible digital source. If you’ve been following big data, you have probably heard of the three “Vs” it has become associated with:

Volume – big data volume is ‘big’: terabytes, possibly even petabytes, of information
Velocity – the value of big data lies in its real-time or near-real-time nature
Variety – big data extends beyond structured information to semi-structured and unstructured information, and includes all types: audio, video, log files, web traffic and good old-fashioned text

On the surface, everyone understands the opportunity in big data. There is more information than we can handle, the cost of storage has become relatively small, and the options for processing the information are plentiful. However, when you take a peek under the surface, there is much more to analyzing data than visualizing information in pretty dashboards. The challenge, and ironically the value, of big data lives in data management: storing, processing and aggregating this information to find meaningful patterns in it. This is why data stores such as Apache Hadoop are quickly becoming the content store of choice. Hadoop simplifies the storage and processing of big data with its MapReduce framework, which processes vast amounts of data in parallel on large clusters of computing nodes. The remaining challenge in unlocking the real value of that data is aggregation, which BI and analytics vendors like Birst are addressing.
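For readers who have not seen the MapReduce pattern up close, here is a minimal single-process sketch in Python. It is not Hadoop's API; the log format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made up for illustration. On a real cluster the map and reduce steps would run in parallel across many nodes, while here they run in one loop so the data flow is easy to follow.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (page, 1) pair for one log record.

    Assumed (hypothetical) log format: "<user> <page>".
    """
    _user, page = record.split()
    yield page, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence and return {page: hit_count}."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    logs = ["alice /home", "bob /pricing", "carol /home"]
    print(run_job(logs))
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the pattern a good fit for large clusters.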

Today we announced support for Apache Hadoop via a new data connector, which provides an ideal bridge between current BI investments and the Hadoop content store. While a new data connector by itself could prompt the reaction of ‘so what?’, the real value of the Birst data connector lies in its ability to pull big data out of Hadoop and into Birst’s automated multi-dimensional data mart. For the business user, this means having the power to create aggregates and subsets of views on larger Hadoop data sets, which would otherwise require complex ETL processes. The power of the Birst connector for Hadoop lies in the simplicity of bypassing scripting and load processes. Once the data is aggregated, users can search, browse, query, analyze and visualize big data. And for those who still don’t believe this is stellar, the Birst connector can batch-extract data from Apache Hadoop using Apache HiveQL or pull data in real time using Birst Live Access (also via Apache HiveQL). The information can then be integrated with other data sources, including SAP, Salesforce, and operational and financial systems.
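To make "creating aggregates" concrete, the sketch below rolls detailed rows up into a small summary table. It does not use the Birst connector or Hive; it uses Python's built-in `sqlite3` module as a stand-in, and the table name `web_events`, its columns and the function `build_aggregate` are hypothetical. The `GROUP BY` query is ordinary SQL of the kind HiveQL also accepts, so it gives a rough idea of what a batch extract might ask for.

```python
import sqlite3


def build_aggregate(rows):
    """Load detail rows into an in-memory table and return one summary row per region.

    rows: iterable of (region, product, revenue) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE web_events (region TEXT, product TEXT, revenue REAL)"
        )
        conn.executemany("INSERT INTO web_events VALUES (?, ?, ?)", rows)
        # Roll the detail rows up to one row per region.
        cursor = conn.execute(
            "SELECT region, COUNT(*) AS events, SUM(revenue) AS total_revenue "
            "FROM web_events GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    detail = [("EMEA", "A", 10.0), ("EMEA", "B", 5.0), ("APAC", "A", 7.5)]
    for region, events, total in build_aggregate(detail):
        print(region, events, total)
```

The summary is far smaller than the detail it came from, which is the point of aggregating before analysis: dashboards and ad hoc queries work against the compact result rather than the raw data set.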

As the industry continues to explore the possibilities of big data, there are three broad ways in which it delivers business value. First, the increasing amount of transactional data in digital form holds details on everything from human resources to sales and marketing programs to consumer interaction. These details can be used to drive more focused products and services. Second, the real-time or near-real-time character of big data improves responsiveness and supports faster decision making. Finally, giving business analysts powerful analytics on accurate information will become a driver of insight for new products, processes and strategy.

While the next sexy job in 10 years might be that of an Internet statistician, and we can all hope to have a few in-house, business users today need a simple yet powerful way to navigate a sea of information. The key is to look beyond slick front ends and into the data layer to find the real nugget of the big data world.