Friday, May 30, 2014

SSIS - Split data into multiple destination Files

Recently i was working on a requirement on exporting the data into excel file. As the data may exceed more than millions the data needs to be split into multiple excel files as row limitation is 1048576 i.e, if the source returns 3 million rows the data needs to be split into 3 destination files.

For this i have used the script component destination in the data flow task to achieve this


The source query for the data source has one single column "CSVCol"

SELECT 
convert(varchar(50),D1.DateKey)+ ',' + 
convert(varchar(50),D1.FullDateAlternateKey)+ ',' + 
convert(varchar(50),D1.DayNumberOfWeek)+ ',' + 
convert(varchar(50),D1.EnglishDayNameOfWeek)+ ',' + 
convert(varchar(50),D1.SpanishDayNameOfWeek)+ ',' + 
convert(varchar(50),D1.FrenchDayNameOfWeek)+ ',' + 
convert(varchar(50),D1.DayNumberOfMonth)+ ',' + 
convert(varchar(50),D1.DayNumberOfYear)+ ',' + 
convert(varchar(50),D1.WeekNumberOfYear)+ ',' + 
convert(varchar(50),D1.EnglishMonthName)+ ',' + 
convert(varchar(50),D1.SpanishMonthName)+ ',' + 
convert(varchar(50),D1.FrenchMonthName)+ ',' + 
convert(varchar(50),D1.MonthNumberOfYear)+ ',' + 
convert(varchar(50),D1.CalendarQuarter)+ ',' + 
convert(varchar(50),D1.CalendarYear)+ ',' + 
convert(varchar(50),D1.CalendarSemester)+ ',' + 
convert(varchar(50),D1.FiscalQuarter)+ ',' + 
convert(varchar(50),D1.FiscalYear)+ ',' + 
convert(varchar(50),D1.FiscalSemester)+ ','  AS CSVCol
FROM DimDate D1

Note : This logic can be implemented inside the script task itself to concatenate the columns into the comma separated values.


The following variables are used for the below purposes


Bucket Size - For how many records a new file needs to be created
Filename     - Name of the file. Multiple files will be created with _fileno Ex- Sorna_1
FilePath       - Path in which file will be created
Header Row - In each file header row needs to be added. For this demo it is hard coded. This also can be achieved dynamically using expressions or other mechanisms


Following namespaces needs to be added in the script component code

 using System;  
 using System.Data;  
 using Microsoft.SqlServer.Dts.Pipeline.Wrapper;  
 using Microsoft.SqlServer.Dts.Runtime.Wrapper;  
 using System.IO;  
 using Microsoft.CSharp; 

Below is the code written in script component

public class ScriptMain : UserComponent   
  {   
   int RowCount = 0;   
   int batchcount = 0;   
   int FileCount = 0;   
   int BucketSize = 0;   
   String Filename = "" ;   
   String RunFileName = "";   
   String Filepath = "";   
   String FileFullpath = "";   
   String Excelfilepath = "";   
   string HeaderRow = "";   
   StreamWriter writer;   
   Object mv;   
   public override void PreExecute()   
   {   
    base.PreExecute();   
    RowCount = 0;   
    batchcount = 0;   
    FileCount = 1;   
    Filepath = Variables.FilePath;   
    Filename = Variables.FileName;   
    BucketSize = Variables.BucketSize;   
    HeaderRow = Variables.HeaderRow;   
   }   
   public override void PostExecute()   
   {   
    base.PostExecute();   
    writer.Dispose();   
   }   
   public override void Input0_ProcessInputRow(Input0Buffer Row)
   {   
    if (RowCount ==0)   
    {   
     RunFileName = Filename + "_" + FileCount.ToString();   
     FileFullpath = Filepath + RunFileName + ".csv";   
     writer = new System.IO.StreamWriter(FileFullpath);   
     writer.WriteLine(HeaderRow);   
    }   
    writer.WriteLine(Row.CSVCol.ToString());   
    RowCount++;   
    batchcount++;   
    if (batchcount == BucketSize)   
    {   
     writer.Dispose();   
     FileCount++;   
     batchcount = 0;   
     RunFileName = Filename + "_" + FileCount.ToString();   
     FileFullpath = Filepath + RunFileName + ".csv";   
     writer = new System.IO.StreamWriter(FileFullpath);   
     writer.WriteLine(HeaderRow);   
    }   
   }   
  }   

The package is available in the below link

http://1drv.ms/1o44vnJ

This package will create the files as CSV. This also can be parameterized by putting this in a variable.

Hope this post is helpful for this scenario.

4 comments:

  1. Hi,
    Could you please explain "writer.WriteLine(Row.CSVCol.ToString());".
    From where we are using CSVCol

    Thanxs.

    ReplyDelete
    Replies
    1. Hi,

      CSVCol would be the alias of the column from the OLEDB source. Seems like i have missed to add that part. Will add that details. Thanks for bringing to my notice.

      Delete
    2. Hi,

      Edited the post with explanation of CSVCol.

      Thanks

      Sorna

      Delete
  2. Thanks Sorna,

    Nice explanation. Keep Posting

    Thanks

    ReplyDelete