“Employment Analysis” BI Solution - Part I (Building the UDM)

This is the first of a series of posts, which I will use to describe the implementation of a SQL Server 2005 BI solution designed to extract and analyze data captured in a semi-structured data source I discussed in my previous post.

In the course of this exercise, we will:

  • Define a top-down SQL Server Analysis Services (SSAS) Unified Dimensional Model (UDM), which will capture the dimensional structure of the source information.
  • Generate a relational star schema based on the UDM.
  • Develop a SQL Server 2005 Integration Services (SSIS) solution, which will extract the data from the semi-structured spreadsheet and use it to populate the UDM.
  • Revisit the SSIS solution and augment it with maintenance procedures such as updating the UDM with new data feeds.

A typical data integration solution loads data into a pre-existing data warehouse schema. For the purposes of these posts, however, we will assume that the data warehouse schema does not exist yet - this will make the solution more thorough (by forcing us to design the schema) and will give us a chance to explore the new “top-down” schema generation feature of SSAS. In case you have a pre-existing schema and you need to apply the techniques discussed here, SSIS comes packaged with a toolbox full of data flow transformations and destinations that will help you funnel your data into your particular schema.

Developing the UDM

Let’s once again quickly review the contents of the data source.

It contains employment numbers (in thousands) for two categories of employees: non-agricultural and civilian employees. The numbers are broken down by metropolitan statistical area (MSA) and time period. The report also contains unemployment rates broken by MSA and time period, but not by employment category.

Here’s a screen-shot of the original report with highlighted fact areas:

Employment Report Measure Groups

Essentially, this report contains two separate measure groups which share some, but not all of the dimensions reflected in the report:

  • The Employment measure group (the orange highlight above) refers to three dimensions: Employment Category and Time Period located in the header rows above the facts, and MSA located in the header columns to the left of the facts.
  • The Unemployment Rate measure group (the green highlight above) refers to two dimensions: Time Period above and MSA to the left of the facts.

You may wonder why did we exclude PCT CHG (percent change) from the facts. The reason is that we do not consider this a fact - it is a calculation derived from the real raw facts. The problem with this calculation (and many other pre-calculated bits of analysis typically present in data reports) is that they are correct in the current report layout, however, once the data is loaded into a flexible dimensional model, these static pre-calculated data points become wrong as the user starts pivoting the data and viewing it from many different angles, aggregating at different levels, etc.
We will skip PCT CHG while extracting the data from the raw report, but later we will augment our UDM with a flexible Percent Change calculation, which will be correct in the context of any pivoted point of view.

Now, let’s build us a UDM.

We will start by creating a new blank Visual Studio 2005 solution, which will contain all the projects involved in getting our BI system up and running - we will call this solution Employment BI (at the end of this series, I will post the final solution for download).

We will continue by adding a new SSAS project to the solution - we will call it SSAS:
New SSAS Project

Now, we will create a new cube from scratch. It will not be based on a pre-existing data source schema - we will rather use this cube as the blueprint for creating a new relational schema using the new SSAS “Schema Generation Wizard”. This is usually called a “top-down” approach to building a data warehouse. Even though I am not sure about the practical implications of this approach (database architects, developers and DBAs typically prefer to have full control over the design of their data warehouse schema), I believe it is a very useful tool for educational and demonstration purposes.

The way to create a new cube from scratch is to right-click on the “Cubes” folder in Solution Explorer, and select “New Cube…”. Skip the “Welcome” page and navigate to the “Select Build Method” page. Select “Build the cube without using a data source” and uncheck “Use a cube template” - we want to define the entire structure of our new cube ourselves:

New Cube

On the next page of the wizard, we will add the two measures: Employment and Unemployment Rate. Since they have different dimensionality (Unemployment Rate does not relate to the Employment Category dimension), we will create two separate measure groups - one measure group per measure, like this:

Measure groups

Employment is an additive fact - it presents the nominal number (in thousands) of people employed in the particular employment category, time frame and metropolitan area. Therefore, we will use Sum as the aggregation rule.

On the other hand, Unemployment Rate is a percent value - it is non-additive. It makes sense to average the unemployment rate when rolling up totals and sub-totals. Unfortunately, Microsoft Analysis Services does not provide such an aggregation rule out-of-the box. However, such an aggregation could be accomplished by a combination of a calculated measure and two hidden physical measures - something I will demonstrate later. For now we will use aggregation “Sum” for Unemployment Rate and we will replace this with the correct solution later.

One more thing before we move on to the next page - the original facts in the report are floating point numbers. We will use Double precision number as the data type for both measures.

On the next page, we get to define the dimensions of our UDM:

SSAS Dimensions

Note that I unchecked the “Time” dimension from the upper list of template dimensions. This special time dimension is useful in those cases where the dimensional model contains semi-additive measures - i.e. measures which are additive across all dimensions but non-additive across the time dimension. An example of a semi-additive measure is the measure “amount” in a typical store inventory snapshot. It is additive across all dimensions like store and product, but it is averaged across the time period.
In our Employment dimensional model we have only one special measure, Unemployment Rate, which is averaged across all dimensions, including time. This measure is fully non-additive. This is why, we do not need the special time dimension - we do have a Time Period dimension in the report, but for the purposes of our analysis it will suffice to create is as a regular dimension.

We move on to the next page, where we will define the dimension usage per measure group:

Dimension Usage

We set up the dimension usage to reflect the model observed in the raw data source - the Time Period and MSA dimensions are shared between both measure groups, but dimension Employee Category is referred to only by the Employment Facts measure group. Unemployment Facts has no direct relation to Employee Category.

On the last page of the wizard, we can review the main features of our cube and give it a name:

Create Cube

We will leave “Generate schema now” unchecked for now - we will make a few more tweaks to the UDM before we generate the schema.

We press “Finish” on this last page and, voila!, the cube and dimensions are created for us:

Created Cube

In my next post I will continue by decorating the dimensions with attributes, which will make for some interesting analysis. Then, with the help of the SSAS Schema Generation Wizard I will create the relational data warehouse schema. This will set the stage for the most interesting part of our mini-BI solution - the data extraction and loading procedure.

Stay tuned for more!

10 Responses to ““Employment Analysis” BI Solution - Part I (Building the UDM)”


  1. 1 greg

    This type of article is hard to find. I am looking at analysis services as a possible solution for us & this was helpful thanks!

  2. 2 Vassil

    Hi Greg,

    Thanks. I realize that it’s been 4 months since I started this and as life goes on other things took priority, however, I plan to pick up from where I left off and finish this thread.

    I am glad this was helpful to you.

    Best regards,
    Vassil

  3. 3 Carlos Nascimento

    Hi Vassil,

    This is one the best BI tutorials that I’ve seen (simple, concise and complete). Very good job!
    Any chance that you will post the next step(s) any time soon?
    Please let me know, I really appreciate it.

    Carlos Nascimento
    New York, NY

  4. 4 Serenaic

    thanks much, man

  5. 5 Javid Khan

    Great Job on explaining AS. When can I see next continue artical on AS.
    Thanks again. Keep it up and get blessing.

    Javid

  6. 6 hentai listing danny

    listing phantom hentai hentai listing phantom

  7. 7 Economy shower stalls

    shower stalls Economy Economy shower stalls

  8. 8 Dryclerense

    job viagra joining

  9. 9 osobo

    Новый способ давления на кандидата на пост Главы г. Химки

    Новый способ “наказать” тех, кто посмел участвовать в выборной кампании не на стороне действующей власти изобрели правоохранительные органы г.о. Химки.
    Руководствуясь не нормой закона, а чьей-то “волей″ сотрудники милиции решили “проверить” все фирмы, внесшие денежные средства в избирательный фонд неудобных кандидатов.
    Начались “проверки” с телефонных звонков - где директор, сколько человек работает на фирме. После чего последовали “письма счастья” с просьбой предоставить всю бухгалтерскую документацию, учредительные документы фирмы, и даже, план экспликации БТИ.
    Такие запросы химкинским фирмам рассылает 1 отдел Оперативно-розыскной части № 9 Управления по налоговым преступлениям ГУВД Московской области за подписью начальника подполковника милиции Д.В. Языкова.
    И всё это в то время, когда Президент дал прямое указание правоохранительным органам о прекращении всех незаконных проверок малого и среднего бизнеса. С это целью внесены изменения в Федеральный закон “О милиции” - из статьи 11 этого закона исключены пункты 25 и 35, на основании которых ранее правоохранительные органы имели право проверять финансово-хозяйственную деятельность предприятий.
    Видно, об изменениях действующего законодательства местные правоохранительные органы не уведомлены. И не смотрят телепередачи с выступлениями Президента.
    Может быть, эта публикация подвигнет их к исполнению указаний Президента, а также к изучению и соблюдению действующего законодательства

  10. 10 etheryprope

    Hello forum members

    I just became a member of this forum

    Great work forum crew!

    Yesterday I read that there is a treatment for diabetes on www.healthcaredaily.org
    Can diabetes seriously be cured? The source looks like a reliable healthcare news website

    Could you someone tell me if this healthcare information is for real?

    Thanks

    etheryprope

Leave a Reply