Warning: This blog is more technical than our usual blog entries, but it serves to explain the value (and power) of Birst.
Ok, now that we understand that calculations in Birst occur at two different levels, what does Birst expression flavor #1 for physical queries and pre-aggregation expressions buy us?
When you type something into the custom attribute or measure panel, that’s what you are creating. Because the columns you are referring to are PHYSICAL columns in the raw tables NOT logical constructs that may exist in many physical tables. There is no notion of a dimension or a measure in this syntax. A raw table is simply a bunch of columns. And you can construct a definition of a measure or attribute with these raw columns.
But the range of things a database can do is limited. The database has no notion of joins, dimensions, hierarchies, etc. Birst’s logical layer and OLAP engine has all of that. So when we want to do things like positional calculations, ranks, lookups, and so forth, those calculations must occur in the logical layer of Birst POST-AGGREGATION. In other words, after the database does 99 percent of the heavy lifting summarizing your billion records into a 100 row result set, then logical expressions can manipulate those 100 rows to make even more complex calculations.
In the world of logical queries, you now have concepts like dimension and measure and position and hierarchy, as well as the Birst Query Language, which even provides programmatic structures to do advanced manipulation of these constructs. In this world, our engine does ALL of the calculations and we have 100 percent control over any manipulation of the data. The only downside is that it’s data post-aggregation—after the database has done its work.
Logical queries that return result sets to the middle tier are restricted in their size because moving data out of the DB in volume gets slow past a certain point. That’s why we use bulk loading techniques when we load data into a DB. In fact, we’ve set an upper limit to 100,000 records coming back from the DB. There’s no magic to that number by the way; it’s somewhat arbitrary.
So, let’s take a simple example of a custom measure [Sales]—defined as “quantity*price”. Let’s also consider another base metric “quantity” and time. If I create the logical query below to calculate average unit price:
Select [Time.Year],[Sales]/[Quantity] As ‘Average Unit Price’ from [All]
You can see both types of calculations: 1) one that is being executed in the DB, and 2) one that is executed in BIRST. That query gets transformed into physical SQL that looks something like:
Select DW_DM_TIME_DAY.Year$, SUM(FACT.quantity*FACT.price),SUM(quantity) from FACT inner join DW_DM_TIME_DAY on FACT.DayID=DW_DM_TIME_DAY.DayID
You can see that the calculation for “sales” is compiled directly into SQL. So the syntax that supports this is restricted only to things that can be sent to the DB.
The result set comes back and Birst takes each row and divides the “sales” value by the “quantity” value to get an average unit price by year. Our second syntax flavor for logical expressions at the logical layer can therefore support lots more functions because we can write our own functions in our language and execute them.
Ok, in our next blog post (Back on the Path) we will address where ETL scripts fit in.