This is the first of a series of posts, which I will use to describe the implementation of a SQL Server 2008 BI solution designed to extract and analyze data captured in a semi-structured data source I discussed in my previous post.
In the course of this exercise, we will:
- Define a top-down SQL Server Analysis Services (SSAS) Unified Dimensional Model (UDM), which will capture the dimensional structure of the source information.
- Generate a relational star schema based on the UDM.
- Develop a SQL Server 2008 Integration Services (SSIS) solution, which will extract the data from the semi-structured spreadsheet and use it to populate the UDM.
- Revisit the SSIS solution and augment it with maintenance procedures such as updating the UDM with new data feeds.
A typical data integration solution loads data into a preexisting data warehouse schema. For the purposes of these posts, however, we will assume that the data warehouse schema does not exist yet - this will make the solution more thorough (by forcing us to design the schema) and will give us a chance to explore a rarely discussed feature of Analysis Services - the “top-down” schema generator. In case you have a preexisting schema and you need to apply the techniques discussed here, SSIS comes packaged with a toolbox full of data flow transformations and destinations that will help you funnel your data into your particular schema.
Developing the UDM
Let’s once again quickly review the contents of the data source.
It contains employment numbers (in thousands) for two categories of employees: non-agricultural and civilian employees. The numbers are broken down by metropolitan statistical area (MSA) and time period. The report also contains unemployment rates broken by MSA and time period, but not by employment category.
Here’s a screenshot of the original report with highlighted fact areas:
Essentially, this report contains two separate measure groups which share some, but not all of the dimensions reflected in the report:
- The Employment measure group (the orange highlight above) contains numeric facts which refer to three dimensions: Employment Category and Time Period located in the header rows above the facts, and MSA located in the header columns to the left of the facts.
- The Unemployment Rate measure group (the green highlight above) which refer to two dimensions: Time Period above and MSA to the left of the facts.
You may wonder why we didn’t highlight PCT CHG (percent change) as part of the facts. The reason is that we do not consider “percent change” a raw fact - it is in fact a calculation derived from the raw facts, the employment rate numbers. The problem with the “percent change” calculation (and many other pre-calculated bits of analysis typically present in data reports) is that they are correct in the current report layout, however, once the data is loaded into a flexible dimensional model, these static pre-calculated data points become wrong as the user starts pivoting the data and viewing it from many different angles, aggregating at different levels, etc.
We will skip PCT CHG while extracting the data from the raw report, but later we will extend our UDM with a flexible Percent Change calculated measure, which will be correct in the context of any pivoted point of view.
Now, let’s build us a UDM.
We will start by creating a new blank Visual Studio 2008 solution which will contain all the projects involved in getting our BI system up and running - we will call this solution Employment BI (at the end of this series, I will post the final solution for download).
We will continue by adding a new SSAS project to the solution - we will call it SSAS:

Now, we will create a new cube from scratch. It will not be based on a pre-existing data source schema - we will rather use this cube as the blueprint for creating a new relational schema using the new SSAS “Schema Generation Wizard”. This is usually called a “top-down” approach to building a data warehouse. Even though I am not sure about the practical implications of this approach (database architects, developers and DBAs typically prefer to have full control over the design of their data warehouse schema), I believe it is a very useful tool for educational and demonstration purposes.
The way to create a new cube from scratch is to right-click on the “Cubes” folder in Solution Explorer, and select “New Cube…”. Skip the “Welcome” page and navigate to the “Select Creation Method” page. Select “Generate tables in the data source”. Keep the “Template” selection to “None” - we want to design the cube manually:

On the next page of the wizard, we will add the two measures: Employment and Unemployment Rate. Since they have different dimensionality (Unemployment Rate does not relate to the Employment Category dimension), we will create two separate measure groups - one measure group per measure, like this:

Employment is an additive fact - it presents the nominal number (in thousands) of people employed in the particular employment category, time frame and metropolitan area. When this measure is aggregated to a higher level across any of the dimensions, it is summed up. Therefore, we will use Sum as the aggregation rule.
On the other hand, Unemployment Rate is a percent value - it is non-additive. It makes sense to average the unemployment rate when rolling up totals and sub-totals. Unfortunately, Microsoft Analysis Services does not provide such an aggregation rule out-of-the box. However, such an aggregation could be accomplished by a combination of a calculated measure and two hidden physical measures - something I will demonstrate later. For now we will use aggregation “Sum” for Unemployment Rate and we will replace this with the correct solution later.
One more thing before we move on to the next page - the original facts in the report are floating point numbers. We will use Double precision number as the data type for both measures.
On the next page, we get to define the dimensions of our UDM:

Note that I unchecked the “Time” dimension from the upper list of template dimensions. This special time dimension is useful in those cases where the dimensional model contains semi-additive measures - i.e. measures which are additive across all dimensions but non-additive across the time dimension. An example of a semi-additive measure is the measure “amount” in a typical store inventory snapshot. It is additive across all dimensions such as store or product, but it is averaged across the time period.
In our Employment dimensional model we have only one special measure, Unemployment Rate, which is averaged across all dimensions, including time. This measure is fully non-additive. This is why, we do not need the special time dimension - we do have a Time Period dimension in the report, but for the purposes of our analysis it will suffice to create it as a regular dimension.
We move on to the next page, where we will define the dimension usage per measure group:

We set up the dimension usage to reflect the model observed in the raw data source - the Time Period and MSA dimensions are shared between both measure groups, but dimension Employee Category is referred to only by the Employment Facts measure group. Unemployment Facts has no direct relation to Employee Category.
On the last page of the wizard, we can review the main features of our cube and give it a name:

We will leave “Generate schema now” unchecked for now - we will make a few more tweaks to the UDM before we generate the schema.
We press “Finish” on this last page and, voila!, the cube and dimensions are created for us:

In my next post I will continue by decorating the dimensions with attributes, which will make for some interesting analysis. Then, with the help of the SSAS Schema Generation Wizard I will create the relational data warehouse schema. This will set the stage for the most interesting part of our mini-BI solution - the data extraction and loading procedure.
Stay tuned for more!
Find this blog exceptional. Hope to see it further when i have more time. If your looking for a job, visit our site.Thanks
http://conceptualismfootage.blogspot.com/2010/03/autographic-tsunami-footage-glossies.html
fun
video
funny
best
great.
http://serialisationsgranddaddy.blogspot.com/ best downloads free free downloads.
i was beginning to assume i would probably be the sole human being whom cared about this, at least currently i acknowledge i’m not loony
i will be sure to go and visit a number several other blogposts soon after i get my morning caffeine in me, it is actually hard to read without my coffee, I was until the wee hours of the morning last evening jamming zynga poker and after having a few brewskies i wound up burning off all my facebook poker chips take care 
Greetings from Michigan! Just saw your blog. Actually read your article, I’ll forward it along!
Have a excellent day!
Hi, at the moment I’m seeking with my iphone four new design ideas and so I’ve identified your web site. Your pattern is really easy but also compare it for a posts. I consider I would use it for my first draft. Thank you for this idea!