Wednesday, April 15, 2009

Enumerating reporting services metadata with C# and SQL Server 2005

Who should read this article Report writers, database developers, data modelers, anyone familiar with SQL Server 2005 & 2008 Reporting Services (familiarity with Report Models, Report Designer and Report Builder) and lastly with some knowledge of programming with C# and XML.

Introduction

Why should report model metadata be stored (in a metadata repository)? This could be a question you are asking yourself or you already know the answer and are looking for additional silver bullets to help explain it to your manager or project sponsor.
One of the more compelling reasons for capturing metadata about entities and attributes (and to an extent – roles & relationships) in reporting services report models in any organization is ‘documentation’. May that be client documentation or in-house documentation to serve as a reference guide for business users, or as a technical reference for ETL developers, or for that new developer who just joined you organization and wanted to know the definition or description of a business term, or more importantly as a quick reference for report writers (who may not be familiar with the field names used in the report model). Whichever the use; the bottom line lies in the term ‘Metadata Management’ (not to be confused with Master Data Management – an entirely different topic). The objective of this article is to extract from the smdl file using C#, meaningful and necessary metadata and store it in a SQL Server 2005 database.

Report Authors
I separated the report writers based on method of developing reports:
1. Using Report Designer (from Visual Studio or BIDS) for canned\ known reports - Technical Report writers
Technical Report writers are usually report developers who are familiar with the data model and may\may not know much about business processes.
2. Using Report Builder (spawning from the browser) for ad-hoc reports - Ad-hoc report writers.
Ad-hoc report writers are usually business users (end users working on the application or management folks).

This blog is more inclined towards the ad-hoc report writer community.

If you are interested in knowing more about metadata and metadata driven architectures you will have to wait until my next article or just google ‘Metadata driven architecture’ and ‘Microsoft Project Real’. Microsoft’s Project Real has a good implementation of metadata (open source) for enumerating and storing metadata in the database for SSIS, SSAS and for database structures, but lacks a module for enumerating Reporting Services models.

Report Models
A Report model is a slice of information of the database structure (views and\or tables) that relates to a specific business scope. For example, in an automobile dealership data warehouse one of the report models would be a Sales model which links back to an inventory table and a date time dimension. Report models are created with a ‘data source view’ reference. Think of a ‘Data Source View’ as a visual representation of a database diagram with tables or views having keys and relationships. Report models are created with the end user perspective as they use report models as their source to build ad-hoc reports.

Note: Report models have a file extension of smdl. The content of these ‘smdl’ files is a set of XML nodes describing the model (They contain entities, attributes, rules, roles, relationships, etc).


Creating the metadata repository for Report Models
The report model repository can be created as a separate schema with a set of tables or be a part of your own custom metadata solution; or feel free to use the scripts and schema I provide in this article. Once your schema is created; two stored procedures need to be created for inserting into the Entity table and the Attribute table. You could create simple CRUD type procedures that do a one row insert at a time, but a more efficient solution would be to create OPENXML stored procedures to insert all the data in one call. In my example I used the latter. By using the OPENXML stored procedure that I reduced the number of database inserts calls from hundreds to one. I have provided the code for the OPENXML stored procedures at the end of the article.








Figure 1: Report Model schema.

Now that your schema and stored procedures (“AddAttribute” and “AddEntity”) are created (and tested), you can begin with the application code.
At this point - Make a copy of the Report Model file from which metadata needs to be extracted from.

Extracting data from the Report Model
(.smdl): There are various ways of retrieving this report model data from the smdl file; like using XSLT and XPATH in C# to get to the desired nodes; or store XML chunks of the model file as XML data type in SQL Server and retrieve it based on XQUERY. Any way can be adopted; i will demonstrate one 'simple' way of doing it. You could apply a similar extraction and storage technique for report files (rdl files).


For my solution i used a C# windows application solution in Visual Studio. You could even create a console app instead of windows. I created a windows app so that i could have the feature of browsing for .smdl files without having to type it in.
Basically, the extract from smdl (XML) files was done by using XMLTextReader class provided in the System.XML namespace. I parsed the entire XML file for tags (XMLNodeType), in a specific format hierarchically based on the XML node structure (Entity --> Attribute --> Attribute Properties).

Each line in the XML supports these tags to identify the item associated with it. Once a type (XMLNodeType) name "Attribute" is found, i look for additional properties of the nodes with tags "Name", "DataType", "Nullable", "Format", "SortDirection", "ColumnName", "Description" etc. and start building my XML string structure.
A few functionalities provided by the XMLNodeType (and how they were used):
1. XmlNodeType.Element: Used for element level parsing.

2. XmlNodeType.Text: Text content of a node to retrieve description of a column, etc.
3. XmlNodeType.EndElement: To find the exit point of an element (entity, attribute, or anything else)


An important thing to note here is that i am NOT retrieving values based on keys and inserting to the metadata store. I am actually building a well formatted valid XML string that will contain the key-value mapping structure so that it can be passed to the SaveAttribute and SaveEntity methods (OPENXML stored procedures).
Once this XML string is created it is passed to the methods SaveAttribute (executes stored proc AddAttribute) and SaveEntity (executes stored proc AddEntity). Both these methods execute with one call each to the database. The relationship between an attribute and its parent entity is also saved to the database table based on GUIDs. Following is a sample of the data from the metadata repository.






Figure 2: Sample data view.

Figure 3: Sample OPENXML stored procedure content (where @AttributeXML is the XML string created by the C# code):























Summary

The idea behind this article is to give the community another way of retrieving metadata from report models. This would be a good addition to the existing BI metadata repositories that companies currently support. One really has to understand and harness the power of metadata. There are people who do not support the idea behind metadata and its use. Think of a simple scenario (a real world project) of metadata use - A project for migrating around 1000 tables with millions of rows transactional data from AS400 i-series to SQL Server 2008 using SSIS, so, were you going to create source, staging and warehouse structures manually? I don't think so, Ideally you would retrieve all the metadata from the system catalog of AS400 and write dynamic SQL scripts to create all the structures.
Long story short - utilize metadata to the fullest extent, maintain a store of the metadata for business, technical and ETL process meta data and incorporate it during the earlier phases of your BI projects. Hope this was helpful. Do leave feedback.

Download code: Download Code here

Thanks,

Vishal Gamji

MCITP - Database Developer

admin@expertsqltraining.com

Saturday, March 14, 2009

SSIS Execute SQL Task failure

Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.


Every so often ETL developers working with the Execute SQL Task in SSIS encounter the error above. I have seen a few developers who try to 'quick-fix' this by changing type mappings without exactly knowing the differences between them and re-running the ETL task only to get to the next red light. The best example of this would be the type mismatch of Long and Numeric types when using the native OLEDB provider. I wouldn't be surprised if there are developers out there looking up the precision error on the types. I will not delve too deep into all the providers, but will provide a reference to a very good resource that Microsoft has provided, and which I think should be on every ETL developer's desk.


In this (first) blog I will point out the most common places in the Execute SQL Task where you should troubleshoot the error (above). Will also go over some SQL provider mumbo jumbo at a high level.

Troubleshooting steps (order need not be followed):


1. Verify that for the stored procedure\SQL statement used, the parameter counts and direction (input, output or return value) is set appropriately.

If using a OLEDB or ODBC provider check if your number of "?" (parameters), and if using ADO or ADO.NET provider, check the number of @ are equal to the number of parameters being mapped to on the parameter mapping screen.


2. Verify your data types.

Verify that the data types that exist\declared in the stored procedure or SQL are mapped to a compatible data type in the parameter mapping. For example: when using the OLEDB provider, map the integer (int) parameters to Long data type.


Providers: Which data provider should be used when choosing between (Managed) ADO.NET or (Native) OLEDB? - There are n number of websites and blogs (yes, this is also another one of them) which will tell you why you should prefer the use of OLEDB provider over ADO.NET, as ADO.NET is a managed provider which adds another layer code to connect to the data source, thus making ADO.NET slower than OLEDB. If you really want to find the exact runtime exection difference I would suggest running profiler with the two connection managers and then comparing them.


Helpful links for data type mapping and provider info:

- http://msdn.microsoft.com/en-us/library/aa198346(SQL.80).aspx

- http://msdn.microsoft.com/en-us/library/aa263420.aspx

- http://www.carlprothman.net/Default.aspx?tabid=97#10


3. Verify the ResultSet property:

Verify that the ResultSet property is set appropriately for the SQL command being executed. For example: if the SQL command\stored procedure returns a full result set, set it to the 'Object' data type, so that you can retrieve from the object based on the index position of the result (0 = first column, 1 = second column and so on) as members of the 'Object' type are ordinal. Also make sure that all columns in the sql result set have column names.


4. Do you trust your SQL Command\Stored procedure, there is a very good possibilty if it being incorrect:

The execute SQL task can do nothing about bad\erroneous SQL code given to it for execution but fail it. As a routine, test the SQL for all possible exceptions, giving more importance to situations where a value or a list of values is expected and nothing is returned.


For example: the sub query inside the stored procedure may be returning multiple values, NULL\invalid or no values at all. Watch out for the tricky no value situation.

Example: In this example, our objective is to retrieve object_id from sys.objects where the name meets a certain condition. The 'ResultSet' property of the execute sql task is being set to "Single Row" (as our objective is to retrieve a scalar value).


[Note: Each Case builds on the previous cases.]


Case 1: Simple Select to assign to a variable


SELECT object_id FROM sys.objects WHERE name = 'sysrowsets';


If the result set of the execute sql task is set to "Single Row", it will work only when the WHERE condition is satisfied. If it is not satisfied, it does not return NULL, instead returns nothing, i.e. an empty result set (see screenshot below) - which will cause the task to fail giving us the same error "ResultSet property not set correctly". [A developer must keep this exception in mind when testing the execute sql task]. This query will also raise an exception when multiple values are returned by the query and our task is to assign a single value to a variable from the output of our query.


Case 2: Handle multiple values.


DECLARE @Object_ID int;


SET @Object_ID = (SELECT TOP 1 object_id FROM sys.objects WHERE name = 'sysrowsets');


In this case, we handled the multiple values problem from Case1. Now, if the result set returns anything other than a NULL value or a Non-Empty value, we are close to living in a perfect world, but, as it turns out (someone told me) that we don't. If\When a NULL or an empty result is returned by the above query an exception will be thrown. Yes; it is the same exception we are discussing in this article.


I am more inclined on the 'When' it will happen than the 'if' possibility because at-least in the scenarios I have worked with, it was bound to happen, then again, your case may be different.


Case 3: Split the simple select into 'simpler' select.


DECLARE @Object_ID int;


SET @Object_ID = (SELECT TOP 1 object_id FROM sys.objects WHERE name = 'sysrowsets_Bogus');


//Comments-The _Bogus makes the query to return an empty result set.


SELECT ISNULL(@Object_ID, 0) AS 'Object_ID';


The first Select statement here takes care of multiple values; the second Select statement handles any NULL values and sets it to 0 and also solves the 'empty' result set problem when the @Object_ID is selected separately in a different query (as the 2nd query).


Summary

To summarize, double check your SQL queries\Stored procedures before you execute them in the ExecuteSQLTask. Don't jump to hasty conclusions about the error message description Integration Services provides you. For instance if you get any 'Arithabort error' [Ex: Update failed because the following SET options have incorrect settings: 'ARITHABORT'] on a stored procedure that performs an insert\update\delete - check the database compatibility level (sp_dbcmptlevel) which maybe a SQL Server 2000 database with setting 80 and needs to be upgraded to compatibility level 90 of SQL Server 2005, or verify if any columns are computed columns performing calculations which may be throwing arithabort errors. Check for any external factors that are affecting your ExecuteSQLTask variables, for instance package configurations or dynamic expressions. Performing these checks will surely save you some time in the long run and also make your packages robust.


Hope this article\blog was helpful. Do leave a feedback (as this is my first blog). My next blog is about enumerating Reporting Services metadata from models into a database repository. Hope to get it out soon.


Thanks,

Vishal Gamji

MCITP - Database Developer

admin@expertsqltraining.com