Earlier this week, I blogged about how enterprise business intelligence (BI) can unlock the power of Hadoop, making this open-source big data solution much more accessible to enterprise business users. With this post, I’ll get into some of the details of the necessary enterprise BI solutions architecture.
Three architectural pillars
Bringing the power of big data to a broader user base with business-style analytics entails three architectural tenets:
- A user-ready data store that delivers analysis at business speed
- A business user interface for interacting with data in business terms
- The ability to spin up new virtual data instances for departmental analysis
Creating an analytic data store on top of Hadoop
To make your big data environment business user-ready, you’ll first need to create an analytic version of the data that sits above your Hadoop data lake. This analytic data store is loaded from the lake with the data most commonly used, yet has the ability to send less-common queries back to Hadoop.
Building this data tier creates an abstraction layer that serves both exploratory analysis, for unplanned scenarios, and reporting on common use cases and hypotheses. As the information needs of the user community shift, this user data tier automatically detects the changes and adjusts itself accordingly. If an individual analyst veers away from pre-aggregated data, the data tier automatically routes the queries to the raw Hadoop engine, seamlessly to the user.
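The routing idea can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the table names and engine labels are invented for the example. A query whose tables are fully covered by the pre-aggregated store is served there; anything else falls back to the raw Hadoop engine.

```python
# Sketch of query routing between the analytic store and the Hadoop data lake.
# Table and engine names are hypothetical, for illustration only.

AGGREGATED_TABLES = {"sales_by_region", "revenue_by_month"}

def route_query(tables_needed):
    """Return which engine should handle a query touching `tables_needed`."""
    if set(tables_needed) <= AGGREGATED_TABLES:
        return "analytic_store"   # fast, pre-aggregated path
    return "hadoop"               # transparent fallback to the raw data lake

print(route_query(["sales_by_region"]))              # analytic_store
print(route_query(["sales_by_region", "web_logs"]))  # hadoop
```

In a real system the routing decision would also consider aggregation level and freshness, but the principle is the same: the user never chooses an engine.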
Providing a business user interface through a semantic layer
Non-technical users typically lack SQL skills, so they’re unable to query Hadoop directly (via Hive, Impala, or similar). To hide the complexities of big data, and expose its richness to business users in business terms, your second architectural move will be to put a semantic overlay on top of your data.
A semantic layer is a logical representation of data. For example, a semantic layer can contain “revenue.” This data point might have gone through several levels of calculation and transformation before making it to the semantic layer – all seamless to the business user, who simply searches for the word “revenue.”
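Conceptually, a semantic layer is a mapping from business terms to the physical expressions that compute them. The sketch below is an assumption-laden toy, not a real product's metadata model; the column and table names are hypothetical.

```python
# Toy semantic layer: business terms map to physical SQL expressions.
# "fact_orders" and the column names are invented for illustration.

SEMANTIC_LAYER = {
    "revenue": {
        "expression": "SUM(unit_price * quantity - discount)",
        "source_table": "fact_orders",
    },
}

def resolve(term):
    """Translate a business term into SQL the user never has to write."""
    meta = SEMANTIC_LAYER[term.lower()]
    return f"SELECT {meta['expression']} AS {term} FROM {meta['source_table']}"

print(resolve("revenue"))
# SELECT SUM(unit_price * quantity - discount) AS revenue FROM fact_orders
```

The user types “revenue”; the layer supplies the calculation, the source, and any governance rules attached to it.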
A semantic layer also enables you to see a federated view of data that has been sourced from multiple databases, applications, or places within your Hadoop data lake. For example, you may want to see your current online purchases by those customers who have bought from you in the past. In this case, historical purchase data is stored in an existing warehouse, current sales data is in a CRM application, and your website clickstream data is in Hadoop.
A semantic layer enables a business user to quickly look for “online purchases by recurring customers in the Northeast.” Without it, IT would likely have to get involved – composing three separate queries, then rationalizing and consolidating the results.
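That federated lookup amounts to intersecting three sources. The sketch below fakes the three systems as in-memory collections purely to show the shape of the consolidation; all data, field names, and the matching rule are invented for the example.

```python
# Federation sketch: warehouse (history), CRM (current sales), Hadoop (clickstream).
# All values and field names are illustrative stand-ins for three real systems.

warehouse_history = {"cust1", "cust3"}   # customers with past purchases (warehouse)
crm_sales = [
    {"customer": "cust1", "amount": 120, "region": "northeast"},
    {"customer": "cust2", "amount": 80,  "region": "northeast"},
]
clickstream = {"cust1", "cust2"}         # customers seen buying online (Hadoop)

def online_purchases_by_recurring_customers(region):
    """Consolidate the three sources into one business answer."""
    return [
        sale for sale in crm_sales
        if sale["region"] == region
        and sale["customer"] in warehouse_history   # recurring customer
        and sale["customer"] in clickstream         # purchased online
    ]

print(online_purchases_by_recurring_customers("northeast"))
```

A semantic layer does exactly this join-and-filter work behind the scenes, against the real warehouse, CRM, and Hadoop cluster.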
Creating virtual instances to empower decentralized teams
A final BI architecture consideration is allowing distributed teams to operate independently – extremely important in today’s departmentally driven enterprise environment – while letting IT retain control over the data (i.e., governance).
The enterprise BI solution’s architecture should enable different groups (finance, sales, marketing, customer support, etc.) to subscribe to virtual instances of your centralized BI instance. Because this is a virtual (logical) copy of a subset of your centralized data set and not a physical one, any changes to it do not impact the master copy. Instead, virtual instances enable end users to not only use the data as often as they wish but, even more importantly, blend it with their own local data sets, without polluting the centralized BI instance.
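The key property of a virtual instance – local changes layered over an untouched master – can be illustrated with Python’s `ChainMap`, which is only an analogy for the copy-on-write behavior described above, not how any BI product stores data.

```python
from collections import ChainMap

# Analogy for a virtual instance: a department layers its own data over the
# governed master set; writes land in the local layer, never in the master.

master = {"revenue": 1_000_000, "customers": 5_000}   # centralized, governed
local = {}                                            # finance team's own blends
finance_view = ChainMap(local, master)

finance_view["forecast"] = 1_200_000   # department-local addition
print(finance_view["revenue"])          # reads fall through to the master: 1000000
print(master)                           # master is unchanged: no "forecast" key
```

Every department gets its own local layer over the same master, so teams can blend in their own data sets as often as they wish without polluting the centralized instance.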
In this way, you, the technology owner, control the centralized data, and ensure its governance and semantic rules, while giving business users unencumbered access to corporate data stored in Hadoop – and the ability to blend it with their own data.
The diagram below illustrates the difference between a Hadoop environment accessed through traditional ETL mechanisms, and a self-service Hadoop environment enhanced with an enterprise BI front end.
The preceding is a blog post originally published in CIO on October 8, 2015.