In this post we will finish the Employment BI project by creating an SSIS project to populate the dimensional model with the data captured in the Excel spreadsheet.
First we will start by slightly modifying the relational schema generated by the automatic schema generator.
The measure fields in the fact table are generated as “Real” which is a 4-byte floating point number. We will need to switch this to “Float” - the 8-byte floating point number.
Go to SQL Server Management Studio, navigate to database “Employment BI”, select table “Employment_Facts”, right-click, select “Design” and change the type of field “Employment” to “float”. Apply the same change to field “Unemployment_Rate” in table “Unemployment_Facts”.
Save the changes and switch back to the Employment BI solution in Business Intelligence Development Studio.
Now we will create a new SSIS project which will load the data into the relational database.
Select File/New/Project, select “Integration Services Project”, name it “SSIS”, select “Add to solution” and click “OK”.
- Create a new data flow task in the control flow of the SSIS project.
- Assuming that you have installed DataDefractor for SSIS, select the DataDefractor source from the toolbox and drop it on the canvas:

-
Double-click on the DataDefractor source component to invoke its wizard driven interface and click “New” to create a new DataDefractor connection manager to connect to the Excel file. Select “DataDefractor.ExcelFile from the list of connection managers and click “Add…”:
- Select employment_report.xls and click Ok:
- Select “Table-1″, move it to the list on the right by clicking the right-arrow “Add-table” button and click “Next”:
- On this step we will cut off the top header of the spreadsheet. We are interested in the area starting with the first row which has text in column C. Select “Vertical Sub-Pages”, Set property “MatchSingleSubpage” to “True” and type in .+ as a regular expression for column C:

Click “Next” to continue to the next page. - On this step select row 5 as the first row to contain facts and click “Next”:
- Select columns C through H as the columns containing facts and add them as a “Flexible Data Range”:

Click “Next” - We will spend some more time on this page to define the dimensionality of our data source. We will be processing the data of the Employment Facts first. As far as this measure group is concerned, there is only one measure - Employment Rate. So, instead of adding a list of measures from the data source, we simply type “Employment Rate” as the name of the measures:
- Next we will handle the MSA dimension. Select the “New Dimension” item and rename it to MSA. Then select column “B” and move it to the “Sources” list by clicking the left-arrow:
- Now we will add MSA’s attributes “Region”, “Sub Region” and “State”. Start by adding a new attribute called “Region”. Double-click column A to select it as the source of items for the attributes, select the “Column:A” source in the “Sources” list and press the “Properties” button. In the source properties dialog box select “Custom regular expression” and type in ^(\S*)
This regex will match and extract only items which begin with a non-whitespace character.
Select “Fill blank cells with carryover values” and click OK:

- Add the “Sub Region” and “State Attributes” following the same steps, but type in the following regular expressions for those attributes:
For “Sub Region”: ^\s(\S.*)
For “State”: ^\s{2}(\S.*)
These regular expressions extract respectively values with a single whitespace before a non-space character and a two-white-spacer before a non-whitespace character. - Create a Time Period dimension, select Row 3 as the source of member names, click “Properties” and use the the following regex to match only time period items: ^(\w{3}-\d{4})

- Then create the “Year” and “Month” attributes by using the following regexes to extract the appropriate data:
For “Year” : ^\w{3}-(\d{4})
For “Month” : ^(\w{3})-\d{4} - We left the easy one for dessert - create an “Employment Category” dimension and select row 1 as the source of member names. No need to perform extractions:

We’re done with the dimensional model - move on by pressing “Next”. - Leave the default selections on page 6 and press Finish:

- Add four SQL Server Destination components to the data flow task and name them “Facts”, “MSA”, “Time Period” and “Employment Category”. Bind the DataDefractor outputs to their respective outputs then select each destination, map it to the appropriate relational table in the Employment BI database and map the columns from the DataDefractor output to the appropriate table fields. For example, the ID columns must be mapped to the PK_ and FK_ columns in the destination components:

Note the warning signs in the destination - they are caused by the fact that DataDefractor emits 1000 varchar data for the dimension names and attributes, while the schema generation wizard creates shorter fields. You can resolve this by modifying the size of the DataDefractor output columns in the “SSIS Advanced Editor”. - Run the SSIS project to populate the relational model, then deploy the SSAS cube and process it.
After this, you can browse the Employment part of the cube. Here’s a snapshot of the cube browser:
At this point we add a new SSIS data flow task and process the Unemployment data using another DataDefractor data flow task.
We end up with two measure groups with one measure per group.
There is only one thing left to do - make the Unemployment Rate measure aggregate by averaging the data points at lower levels.
The way we do this is by adding two new measures based on the same data column in the fact table - lets’ call them URC and URS. Then we make the URC measure aggregate using the “Count” aggregation rule and the URS measure use the “Sum” aggregation.
We hide the two measures, then create a calculated measure called “Unemployment Rate”. The MDX formula for this new measure is:
iif([Measures].[URC] = 0, Null, [Measures].[URS] / [Measures].[URC])
This pretty much wraps it up. We now have an analytical model containing employment and unemployment data created off of a seemingly unwieldy Excel source.











