In this post I’m going to a basic lesson in SQL Server Integration Services, how to create a project, we will be focusing on the lay out of an SSIS project. The Control flow, Data Flow, we will be looking into tasks, source and destination adapters.
What is SSIS?
In simple terms SQL Server Integration Services allows you to transform data, take data from one place transform it and save it to another place. You can automate the packages you create by scheduling a job in SQL Server Agent. Have you ever tried to import an XML file into an SQL Server database? Do you need to copy data from one data base to another each night? If so SQL Server Integration Services is the tool for you.
The actual description of the program from Microsoft is “Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions. You use Integration Services to solve complex business problems by copying or downloading files, sending e-mail messages in response to events, updating data warehouses, cleaning and mining data, and managing SQL Server objects and data. The packages can work alone or in concert with other packages to address complex business needs. Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.” *
From this point on I will be referring to SQL Server Integration Services as SSIS.
Requirements
SSIS is only available in the “Standard” and “Enterprise” editions.
Creating a project
There are two ways of creating a SSIS project depending on the version of SQL server you are running. If you are running SQL Server 2008 or lower you should have a program called Business Intelligence Development Studio (BIDS). If you are running a newer version of SQL Server you can create the project directly in Microsoft Visual Studio.
Creating a project in BIDS
Open BIDS and create a new project. You should see the following window. Make sure that Business Intelligence projects is selected on the left and Integration services Project is selected on the right. Select a name for your project I will be using “TestProject1” in all of these examples.
Create a project with visual studio
Open visual studio click create new project. On the left select fold out the Business Intelligence section of the menu. Then select integration services. Then on the right select Integration services project. Give a name for your project and Press OK.
Blank Project
From this point on things are almost the same if you are using BIDS or Visual Studio. In this Lesson we aren’t actually going to do anything with the project I’m just going to try and explain what each part of this blank project is for. In order to understand the later lessons you will need a road map.
SSIS Toolbox
As you can see the SSIS Toolbox is slightly different in Visual Studio then it is in BIDS. In Visual studio they have been nice and split things up for you. You have Favorites (the tasks you use most often), Common, Containers (used for looping and grouping tasks together) and you have Other tasks (most of which are tasks only DBA’s would be interested in). BIDS has the same tasks there just not split up.
What is a task?
A task is something you want to do. Do you want to update a table that’s an Execute SQL Task, do you want to ftp a file up or down from a server that’s a FTP Task. You want to email a report that’s the Send Mail Task. The most powerful task you will use is the Data Flow Task, this one will let you get data from one place (source adapter) transform it and insert it into another destination (destination adapter).
Design window: Package.dtsx [Design]
The design window has 5 tabs. I’m only going to tell you about the first 3 we can look into the other two in a different lesson.
Control Flow
The control flow contains one or more tasks or containers that execute when the package runs. We can control the order that each task or containers is executed by linking them together. Let me show you how.
In your toolbox find the Execute SQL Task. Drag it onto the control flow canvas. It will have a red X on the right hand side, its telling you that its not set up right. That’s because you haven’t told it which database to connect to or what SQL to execute. We aren’t going to worry about that right now I just want to show you how to get tasks onto the canvas and show you the Data flow window.
In your toolbox find a Data Flow Task. Drag it onto your control flow canvas. There is no red X this time. That’s because a data flow task by it self doesn’t do anything so it doesn’t have any settings.
Select your Execute SQL task. Notice there is a green arrow under it. Select the green arrow and drag it over the Data Flow task. Your control flow canvas should now look something like the picture below.
See hole purpose of the control flow tab is to control the flow of your tasks. You can have several tasks and control which one runs first by using the arrow. If you didn’t connect them when you execute the package they would both start to run at the same time.
If you click on your Execute SQL task you will get another Green arrow, This is so that one task can cause several to run, or you can change it to error. Right click the Green line between your Execute SQL Task and your Data Flow Task.
Right now its Set to Success. This means that if the Execute SQL Task succeeds then the Data flow Task will run. If you set it to Failure the Data flow Task will only run if the Execute SQL task Fails. This is quite handy if you have a task you need to know when it fails you can set it up to notify you in case of failure. But if it doesn’t fail you can have a second one that goes on and Does what needs doing.
I added a Send Mail Task to my canvas, and linked it to Execute SQL Task but i changed it to fail. Now if Execute SQL task fails i will get an email. But if it Succeeds then it will go on to run the Data Flow Task. This is grate when you have Jobs that you want to run every night you don’t want to have to check yourself every morning to be sure that it ran.
Data Flow
The data flow consists of sources that extract data from one place and destinations used to load the data into another place, there are also transformers used to modify that data on its way though the data pipeline. It is best to have a separate data flow for each operation you will be doing. Let me show you how this works.
Double Click on your Data Flow Task on the Control Flow Canvas. You are now taken to the Data Flow Tab. You can always go back again to the control flow by selecting the Control flow tab.
Note: If you have more then one Data flow Task on your control flow selecting the Data flow tab from the Control Flow tab gets confusing. You wont always get the one you want so its best to just double click the Data Flow Task you want.
Data flow Toolbox
The data flow toolbox is different then the tool box you had on the control flow. Its also organised differently in BIDS vs Visual Studio. But the controls are the same, you have Sources , destinations, and transformers.
Sources are where your data is coming from, Destination is where your data is going, and transformers let you change the data once you have it.
Here is a very simple example. I’m taking data from a Flat File (Flat File Source) , Exporting column (Export Column), and writing it to a database (OLE DB Destination). Notice again how we have the arrows linking them to each other. The red X’s again are because there I haven’t actually set up the Source telling it where to get the file or the destination telling it what db to write to. I’m not going to show you how to make this work in this lesson its already getting to long.
Parameters
Parameters are basically variables that you can set up. You could set the package to find the current date store it in a parameter and use it though out your package.
Solution Explorer
What you see in your solution explorer depends upon if you are using BIDS or Visual stuido.
In bids you get Data Sources, Data Source Views SSIS packages and Miscellaneous. Data Sources and Data Source views just let you create a connection to a Database at the project level. You can also create it at the Package level but if you create it at the package level and you need to change it and have several packages in the project it can be a pain to go and change it in each of the packages so its best to set it at the project level.
SSIS packages are each of the packages you have in your project. If you right click on your package you can execute the package. You can have several packages in the same project that do different things.
In Visual studio you have Project.parms (your variables), Connection Managers (like Data Sources and Data Souce views in bids but allows you to use any type of connection manager not just Database), SSIS Packages the same as in BIDS
Conclusion
After reading this tutorial you should now understand how to create a SQL Server integration services project. You should also be aware of the difference between the control flow and the data flow tabs. In my next Tutorial we will be looking into how to read from a flat file and load the information contained inside that file into the database. Lesson 2: First Package