An exercise in coding: loading a text file into SQL. There are a billion different ways to do it, and depending on your source data format you can parse the file a million ways too. But how fast can you get data from disk to SQL using .NET code (VB.NET or C#)? This post is going to show the differences in some of the ways you can load data from disk to SQL.
I am sure I could do more, but this is a good sampling. Let's assume a 1,000,000 row file, comma separated, with 3 columns: string, int, and string of variable length. Let's assume our destination is SQL Server 2005, with the table already created and no keys or anything on it.
We will call our table TestData. Our test app will be a VB.NET console application, running on the same box SQL 2005 is loaded on. Now, there are many ways to read files. A few are: reading them line by line, ReadToEnd(), and using the JET engine to read in a CSV, etc. From the testing I have been doing, all of these seem to work fairly fast (maybe a comparison of those is for another blog post), but for brevity's sake let's just say they are all comparable. Now, I chose 3 methods of inserting data.
1) StreamReader.ReadLine, Insert Line By Line
Sub Method1()
    Dim i As Long = 0
    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim dbCmd As SqlCommand = New SqlCommand()
    dbCmd.Connection = dbConn
    dbConn.Open()
    Do
        Dim fields() As String = line.Split(","c)
        ' Naive string concatenation, shown only for the row-by-row comparison
        dbCmd.CommandText = "INSERT INTO dbo.TestData (Column1,Column2,Column3) " & _
            "VALUES ('" & fields(0) & "'," & fields(1) & ",'" & fields(2) & "')"
        dbCmd.ExecuteNonQuery()
        i = i + 1
        line = sr.ReadLine()
    Loop While Not String.IsNullOrEmpty(line)
    dbConn.Close()
End Sub
2) StreamReader.ReadLine, Batch Insert With DataAdapter
Sub Method2()
    Dim i As Long = 0
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()
    Dim strArray As String() = line.Split(","c)
    Dim dt As DataTable = New DataTable()
    Dim row As DataRow
    ' One untyped column per field; the default names (Column1, Column2, Column3)
    ' happen to line up with the destination table
    For Each s As String In strArray
        dt.Columns.Add(New DataColumn())
    Next
    Do
        row = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)
        i = i + 1
        line = sr.ReadLine()
    Loop While Not String.IsNullOrEmpty(line)
    Dim dataAdapter As New SqlDataAdapter()
    ' TOP 0 fetches just the schema; the CommandBuilder derives the INSERT from it
    dataAdapter.SelectCommand = New SqlCommand("SELECT TOP 0 Column1,Column2,Column3 FROM dbo.TestData", dbConn)
    Dim cmdBuilder As SqlCommandBuilder = New SqlCommandBuilder(dataAdapter)
    dbConn.Open()
    dataAdapter.UpdateBatchSize = 1000
    dataAdapter.Update(dt)
    dbConn.Close()
End Sub
3) StreamReader.ReadLine, SqlBulkCopy
Sub Method3()
    Dim i As Long = 0
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
    Dim sr As StreamReader = New StreamReader(filename)
    Dim line As String = sr.ReadLine()
    Dim strArray As String() = line.Split(","c)
    Dim dt As DataTable = New DataTable()
    Dim row As DataRow
    For Each s As String In strArray
        dt.Columns.Add(New DataColumn())
    Next
    Do
        row = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)
        i = i + 1
        line = sr.ReadLine()
    Loop While Not String.IsNullOrEmpty(line)
    Dim bc As SqlBulkCopy = New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
    bc.DestinationTableName = "dbo.TestData"
    bc.BatchSize = dt.Rows.Count
    dbConn.Open()
    bc.WriteToServer(dt)
    bc.Close()
    dbConn.Close()
End Sub
The results of the 3 methods are surprising. The thing is, most people are going to use Method1, because it is just the first thing you think of doing, and maybe the easiest to code (everyone learns loops in school, etc). Now, nitpickers will say "use a stored proc", etc. That will save minimal time, and yes it is better practice, but for the sake of the example bear with it.
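For what it's worth, the parameterized version the nitpickers would ask for looks roughly like this (a sketch only, reusing the same assumed table and connection string from the methods above; it avoids the quoting mess but is still one round trip per row):

```vbnet
' Sketch: parameterized row-by-row insert against the same assumed dbo.TestData table.
' Still one ExecuteNonQuery per row, so it will not approach bulk-copy speed.
Dim dbCmd As New SqlCommand( _
    "INSERT INTO dbo.TestData (Column1,Column2,Column3) VALUES (@c1,@c2,@c3)", dbConn)
dbCmd.Parameters.Add("@c1", SqlDbType.VarChar, 255)
dbCmd.Parameters.Add("@c2", SqlDbType.Int)
dbCmd.Parameters.Add("@c3", SqlDbType.VarChar, 255)
dbConn.Open()
Do
    Dim fields() As String = line.Split(","c)
    dbCmd.Parameters("@c1").Value = fields(0)
    dbCmd.Parameters("@c2").Value = CInt(fields(1))
    dbCmd.Parameters("@c3").Value = fields(2)
    dbCmd.ExecuteNonQuery()
    line = sr.ReadLine()
Loop While Not String.IsNullOrEmpty(line)
dbConn.Close()
```

It's still one insert per row, though, so it stays in Method1's ballpark.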
Method2 is less intuitive and really tricky to get working (at least I had some issues with it), but once it works, it makes a little more sense than Method1.
Method3 is something few people ever hear of or use, but once they do, they never go back.
Side note: about 5 years ago I worked on a program that inserted huge files, and they were taking 10-20 minutes apiece. I was using VB6, and converted the line-by-line insert to use BCP from code and got it down to 2-3 minutes, which was good. So I knew about BCP and BULK INSERT; I just didn't know the equivalent was built into .NET. Now I do. Anyways, on to the results.
Method 1 – 14.69 minutes to insert 1 million records
Method 2 – 7.88 minutes to insert 1 million records
Method 3 – 0.28 minutes to insert 1 million records
So yeah. Wow. That is not a typo for Method 3: roughly 17 seconds. Now, give Method2 some credit, it cut Method1's time nearly in half, but Method3 just KILLS them both. Man, just awesome. When you run something like that and see that kind of performance, you can't help but smile.
A few caveats and gotchas:
Method2 – the BatchSize property I have set to 1000. If you set it to 0 it uses the max. I tried that and it locked my machine up. Yikes.
Method3 – the SqlBulkCopyOptions make a difference: TableLock speeds up the operation. The BatchSize here I have set to the number of rows. It might run differently with different batch sizes (I didn't really experiment with it), but adding the copy options and the BatchSize sped things up.
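One more gotcha worth noting: all three methods above build the entire file into a DataTable in memory first. For really big files you could write in chunks instead. A rough sketch (untested, same assumed table and connection string, arbitrary chunk size):

```vbnet
' Sketch: bulk copy in chunks of 50,000 rows instead of one giant DataTable
Dim dt As New DataTable()
For c As Integer = 1 To 3
    dt.Columns.Add(New DataColumn())
Next
Dim sr As New StreamReader(filename)
Dim dbConn As New SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString())
dbConn.Open()
Dim bc As New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = "dbo.TestData"
Dim line As String = sr.ReadLine()
While Not String.IsNullOrEmpty(line)
    dt.Rows.Clear()
    ' Fill up to 50,000 rows, then flush them to the server
    While Not String.IsNullOrEmpty(line) AndAlso dt.Rows.Count < 50000
        Dim row As DataRow = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)
        line = sr.ReadLine()
    End While
    bc.WriteToServer(dt)
End While
bc.Close()
dbConn.Close()
```

Memory stays flat no matter how big the file is, at the cost of a WriteToServer call per chunk.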
So, the fastest way I have found in .NET to load data from files to SQL, hands down: SqlBulkCopy. Since it took 17 seconds, the next step is to try different file reading methods and see what the times are. Like I said, maybe that will be my next post. 🙂
p.s. yes I am up at 2:00 AM posting this, insomnia anyone? In any event I got to watch the repeat of the Nevada Democratic Debate 🙂
38 replies on “.NET – Fastest Way to Load Text File To SQL – SqlBulkCopy”
Steve,
I do recommend you try the union approach. Not sure why it wouldn’t get used under the hood in 2), but I have certainly found it to be very quick, and unlike 3) it will work without particularly weird coding and against a variety of database systems.
I suspect also that your test is a bit flattering to bulk in this case since it has no keys, which is hardly useful in practice for a million rows.
Not sure what the UNION approach is, but yeah, even if I put a primary key on it, I think the bulk (#3) would still beat out the others, especially if the number of records were higher. There is just no way you are going to make sequential inserts faster.
You go:
insert t ()
select ()
union select ()
union select ()
union …
SQLServer bulk should be fastest but I suspect it has a setup overhead and will be a lot less optimised with indices and triggers involved.
The statement above can be used anywhere a simple insert can be used, so you can insert to several tables in a single TSQL batch – or from a trigger, stored proc etc.
And the performance can be rather impressive.
Bulk has its place – and its easier on SQLServer than Sybase – but also some limitations.
Darn, it's stripped some of the text with angle brackets. 😦
Should still be understandable.
Yeah, I don't know how you would really use that in a stored proc unless you knew the values in advance. If I get any motivation, I might open up the project I made to test this and add that, but I'd bet money it doesn't perform as well as SqlBulk.
I suspect it won't be as fast as bulk, but as you add indices and any triggers the gap will narrow, and you don't have any awkward setup tasks that don't use simple statements.
I don’t understand your comment about knowing values in advance? Same for everything – in this case you can at least prepare a statement that inserts 10 rows, for example.
Admittedly you're not going to _prepare_ an insert that does 1000 rows like this (though I seem to remember having executed statements like that), but it makes it easy to use one code path that builds insertion batches for master-detail, providing you can preallocate the master keys rather than expecting an insert trigger to do it for you.
The parse and plan time seems low enough that it's reasonable to send and execute without a prep.
You can send a complex batch like this:
begin tran
insert master (…) select (…) union select (…) union select (…)
insert detail (…) select (…) union select (…) union select (…) … etc
insert sub_detail (…) select (…) union select (…) union select (…) … etc
commit
as one RPC to the server. It's very flexible.
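To make that concrete, a batch like the above could be generated from code along these lines (a sketch only; dbo.TestData and its three columns are assumed, and the quoting is naive):

```vbnet
' Sketch: build INSERT ... SELECT ... UNION ALL SELECT ... for a handful of rows
' (needs Imports System.Text for StringBuilder)
Dim sb As New StringBuilder("INSERT INTO dbo.TestData (Column1,Column2,Column3)")
Dim first As Boolean = True
For Each r As String() In parsedRows   ' parsedRows: the split lines, e.g. ten at a time
    sb.Append(If(first, " SELECT ", " UNION ALL SELECT "))
    sb.Append("'" & r(0).Replace("'", "''") & "'," & r(1) & ",'" & r(2).Replace("'", "''") & "'")
    first = False
Next
' sb.ToString() now runs as a single ExecuteNonQuery, i.e. one RPC
```

Note the single-quote doubling; without it, one stray apostrophe in the data breaks the whole batch.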
Like I said, a million rows into a table with no index isn’t particularly representative – it can change the amount of logging performed for example – I do encourage you try the tests with a more representative schema.
Knowing the values in advance? Well, with a stored proc you are passing in values, or are you just saying pass in a string and execute it in the stored proc? Otherwise you need to parameterize your proc and pass in values; not sure how you could do that without knowing them in advance.
The test wasn't on master/detail/subdetail values, just a bulk log file of values. The only key I would possibly add would be a primary key, and in this case (in the test) it would probably just have been an auto-increment; I don't see how that would hinder the performance too much.
This test is just inserting a ton of values from a log file to a database. With master/detail things would have to be done differently, of course, you couldn’t really bulk insert without doing something different, but for what I was trying to represent, the tests themselves speak for what I was trying to convey.
Well, the data can be parameters, or calculated from them, or expanded from a blob param or whatever. No matter – we have to decide the data values at some point.
Granted that bulk insert of a log file is a 'standard' BCP scenario, but even then I'd normally expect to see an index in place before it's useful. Of course, creating the index after import is quite normal, partly because the bulk import is so much slower if you have indices in place.
If all you care about is bulk log record insert then so be it. I'm really not arguing that BCP or the SQLServer bulk facility isn't the fastest way to do this, but it's just not very generalised, and a multi-row insert that can be done anywhere you can do a normal insert statement is a whole lot more flexible. And it can be surprisingly fast.
Thank you very much for this post. Using the SqlBulkCopy method made my bulk inserts 100 times faster than using DataAdapter w/updatebatchsize property, even though my table has 2 indices.
I used Method3 to insert data into the database; it doesn't throw an exception, but the data doesn't get saved to the database! Please tell me why!
I would need more info to know why it isn't working for you; I have used it numerous times and it works for me.
Hi,
Thank you very much for your valuable code; it works correctly, but only for string data. The text file has datetime values, and they can't be converted from string to datetime. The following exception is thrown: "The given value of type String from the data source cannot be converted to type datetime of the specified target column."
How do I fix this problem?
Regards,
sabarish.
I have exactly the same problem as sabarish with the data type conversion from string to datetime; an exception is thrown. Has anyone found a solution?
Hi,
Thanks for the quick and easy intro to this powerful feature. Very helpful.
In the case of files that contain millions of rows (2 or 3 million) with 20 columns, it would be advisable to read a batch of rows from the file and write them to the db, right? That would avoid loading the entire file into the DataTable inside the loop.
Or is there another way to do this in .NET?
The other thing to note is that you have tested this on the same box as the SQL Server, which eliminates network delays/latencies et al. That number will be location specific, however.
Thanks buddy, good information you posted. I uploaded almost 1.5 million records within 13 secs using the third option.
I'm toying around with your examples but cannot get SqlConnection(ConfigurationManager.ConnectionStrings("MyDB").ToString()) to work. It doesn't recognize ConfigurationManager. Do I need an import to invoke this? I'm not familiar with it.
Thanks,
Paul S
Paul, you need to reference System.Configuration by adding a reference to that assembly.
FANTASTIC! That just saved me SO MUCH TIME. Thank you.
I am getting this error:
"The given value of type String from the data source cannot be converted to type decimal of the specified target column."
when I run this command:
bc.WriteToServer(dt);
How can I assign or define a value for this column? The data in this column may be null, and the type of the column is decimal, so it can't convert string to decimal.
Help me plz, I am stuck!
I also used this:
SqlBulkCopy bc = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls, null);
and tried SqlBulkCopyOptions.TableLock and SqlBulkCopyOptions.Default, but the error still comes: a type conversion error, can't convert string to decimal or string to date, because the field is empty in the text file. The error comes on this line:
bc.WriteToServer(dt);
help me 😦
Hi, I am facing a problem with this, as I am new to VB.NET.
Hi, I am getting the following error:
Error: The script files failed to load.
Please someone help me out….
I have the code modified something like this:
Sub Method3()
    Dim i As Long = 0
    Dim dbConn As SqlConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("TEST_DB").ToString())
    Dim sr As StreamReader = New StreamReader("D:MSPLNYERXX")
    Dim line As String = sr.ReadLine()
    Dim strArray As String() = line.Split("|"c)
    Dim dt As DataTable = New DataTable()
    Dim row As DataRow
    For Each s As String In strArray
        dt.Columns.Add(New DataColumn())
    Next
    Do
        row = dt.NewRow()
        row.ItemArray = line.Split(","c)
        dt.Rows.Add(row)
        i = i + 1
        line = sr.ReadLine()
    Loop While Not line = String.Empty
    Dim bc As SqlBulkCopy = New SqlBulkCopy(dbConn, SqlBulkCopyOptions.TableLock, Nothing)
    bc.DestinationTableName = "MSPLNYERX"
    bc.BatchSize = dt.Rows.Count
    dbConn.Open()
    bc.WriteToServer(dt)
    dbConn.Close()
    bc.Close()
End Sub
Can anyone tell me what is going wrong? I have used this in my Script Component task, but unluckily I have encountered the following error 😦
Error: The script files failed to load.
Can anyone help me achieve method 3? It would be good if one of you gave me detailed steps to achieve this, because I am new to .NET.
Please.
Thanks in advance,
Prasanna KJ
Had the same problem: "The given value of type String from the data source cannot be converted to type decimal of the specified target column."
For me, this was due to whole columns having the value NULL.
What I did was add the column mappings before starting, using:
For Each col As DataColumn In SourceDataTable.Columns
    bulkData.ColumnMappings.Add(col.ColumnName, col.ColumnName)
Next
For me, I was reading data from an input file and the column with dates in it was in a different date format than the regional settings on my computer. I changed the regional settings (short date setting) on my computer to match the date format in the import file.
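Another way around these string-to-datetime (and string-to-decimal) conversion errors is to give the DataTable typed columns and do the conversion yourself before adding each row, so SqlBulkCopy never has to guess. A sketch, with assumed column names and date format:

```vbnet
' Sketch: typed columns plus explicit parsing (needs Imports System.Globalization)
Dim dt As New DataTable()
dt.Columns.Add("Column1", GetType(String))
dt.Columns.Add("Amount", GetType(Decimal))      ' assumed decimal column
dt.Columns.Add("LoggedAt", GetType(DateTime))   ' assumed datetime column

Dim fields() As String = line.Split(","c)
Dim row As DataRow = dt.NewRow()
row(0) = fields(0)
' Empty fields become DBNull instead of an unconvertible empty string
row(1) = If(fields(1) = "", CObj(DBNull.Value), CObj(Decimal.Parse(fields(1))))
row(2) = If(fields(2) = "", CObj(DBNull.Value), _
    CObj(DateTime.ParseExact(fields(2), "yyyy-MM-dd", CultureInfo.InvariantCulture)))
dt.Rows.Add(row)
```

With typed columns the client sends real Decimal/DateTime values, and the date format no longer depends on the machine's regional settings.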
I am having problems with the database "corrupting" after Method 3 (have not tried the others yet).
After running Method3 all the rows are inserted OK: I can run a COUNT(*) query to return the number of rows, but when I try to connect again (rerunning the application) I get an error saying the MDF file is the wrong size or the LDF file is locked.
Any ideas?
Man, this is really awesome. I've been looking for a way to insert 900 million rows very quickly, and this is the way to do it. Thanks,
very helpful.
You can use SqlBulkCopy to stream data directly from a flat file using this:
http://code.msdn.microsoft.com/FlatFileDataReader
I have a SQL table whose first column, 'CODE', is an autoincrement. I am inserting fresh rows into this table from a DataGridView (bound to a DataTable) using SqlBulkCopy. Now how can I get back the 'CODE' values of the inserted rows to display in the DataGridView? I would be pleased to receive help and advice.
This doesn't speed anything up; you still need to check for dupes. Too bad there isn't a SQLBulkDupeCheck().
Really helpful post. I found issues in Method 3 due to data types, so I am creating the structure by building a DataTable populated from the destination table. I loaded 1 million rows in under 16 seconds, so I'm very happy with the speed now. I reckon if you loaded the whole text file into memory and worked from there it could be even quicker, but why bother.
SqlBulkCopy internally uses the same bulk-load mechanism as the Transact-SQL BULK INSERT command. You can use the MS SQL Server BCP utility to insert even faster!
Yep, see the side note in the post: I worked on a program about 5 years ago that inserted huge files, converted the line-by-line insert to use BCP from VB6 code, and got it from 10-20 minutes down to 2-3. So I knew about BCP and BULK INSERT; I just didn't know it was built into .NET.
[…] Inspired by: Steve Novoselac’s Post on SqlBulkCopy […]
Excellent post!!
thanks for the comparison data!!
I used the 2nd method, but only one row was inserted into the table, and I don't understand the role of the variable i = i + 1.