ETL with Microsoft SSIS: First steps and common issues
First impression when using Microsoft SQL Server Integration Services (SSIS), the developer’s first impression is the tool provides powerful features in addition is simple to use. Although this is true, there are some common errors that can make us waste our time. The purpose of this blog article is to cover some of these very common issues and save some time when starting to develop a solution with Microsoft SSIS.
As a quick overview, SSIS is a component of Microsoft SQL Server that we can use to perform data integration tasks. Microsoft provides a visual tool for the ETL Development based on Visual Studio called SQL Server Business Intelligence Development Studio.
Components:
Solution Explorer: Contains all the project files, mainly: Packages, Folders and Project configuration managers. The configuration managers are just configuration files for providing connectivity to sources (flat files, databases…). There are two types depending on the ownership, package and project connection managers.
Properties tab: Displays properties of the selected item.
Connection managers: Here we have visibility for all the connection managers used by the package (project and package connection managers).
When taking the first steps with the tool, there are some common errors that can make us waste a lot of time and that can become a challenge once we deploy the solution in a server. We can summarize these problems into the following points:
Configuration of 32/64 environment
Protection Levels
Network drives
Configuring for 32/64 environment:
The machine may not have 64 bit for Microsoft Excel, Oracle or ODBC data provider. In order to configure the project to be executed in 32-bit mode, we have to configure it through the project properties. If we go to the Solution Explorer view, click on properties->debugging, we can switch the 64-bit mode:
Next step after your development is done is probably to deploy the ETL in a server. The project is deployed in an Integration Services Catalog in an instance of SQL-Server. The SQL Agent will be then responsible for executing and scheduling the packages. All this process is detailed described in the following link:
http://technet.microsoft.com/en-us/library/hh231102.aspx
Once deployed in the server, we can also test whether this is going to be executed in 32 or 64 bit (in the weird case the server is using a 32-bitruntime). We can switch the 32 or 64 mode if we click in the job->properties->steps->step property-> 32-bit runtime.
Protection levels
An SSIS project contains confidential data for many purposes. Most probably contains connection manager files in order to be able to connect to different data sources. Within theses files, SSIS stores the credentials for the data source. These credentials can be configured to be encrypted, depending on the protection level. SSIS provides the following options for protection level:
DoNotUseSensitive: Does not store any info
EncryptAllWithPassword / Encrypt Sensitive with password
EncryptAllWithUserKey / EncryptSensitiveWithUserKey
Server Storage
More details about the protection levels can be found here: http://blogs.prodata.ie/post/SSIS-Package-Protection-Levels-and-their-Usage.aspx
Usually there are not many problems when developing in your machine. Many problems come up once the package is deployed and the SQL Agent tries to read out data from them; if you debug this process you will get a connection attempt error. Switching between these options can help us when troubleshooting.
However, most common problems come from one of those reasons:
Protection level is different at project level and package level.
Protection level is configured as encrypt with user key and SQL Agent user is different from the project owner.
We can change the package protection level by going to the control flow and in the Properties tab (down right corner in the image below).
Whereas the protection level at project level can be found by right click Project-> Common Properties-> Security -> Protection Level.
Following the same procedure, we can see the username who created the package/project. If encryption mode is enabled (by default it is) then we need to make sure that the project/packages ownership matches the username in the SQL-Server Agent Service. We can access this information by clicking Control PanelSystem and SecurityAdministrative ToolsServices and then properties of the SQL Agent service->Log On
Network drives
A common issue when loading data from flat files appears when these files are located in a shared drive. SQL Agent will not recognize network paths if they are not in UNC format (read more about the UNC in http://compnetworking.about.com/od/windowsnetworking/g/unc-name.htm ), whereas SSIS works fine with them. It may be confusing since when the developer deploys the package in the server everything seems to be fine. But at the time the job executes the SQL agent will report: Path not found error.
Since we can not specify the UNC path with the SSIS GUI, we need to write it within the connection manager XML file and manually introduce the UNC path in the DTS Connection String:
<?xml version="1.0"?>
<DTS:ConnectionManager xmlns:DTS="www.microsoft.com/SqlServer/Dts"
DTS:ObjectName="Web Portal SOURCE"
DTS:DTSID=""
DTS:CreationName="FILE">
<DTS:ObjectData>
<DTS:ConnectionManager
DTS:ConnectionString="dc01fpnv04public$My Number My IdentityExclusionsWeb-portal dataWeb portal weekly Submission Report new1.csv" />
</DTS:ObjectData>
</DTS:ConnectionManager>
We can access this file by right click in the connection manager -> View Code: