I have ran across this a couple times now, I actually had a blog post draft written up for this but never got around to finishing it, well this week I ran into the issue again, so here it is.
Importing data from Excel (using whatever – C#, VB.NET, SSIS, Access, etc) – you have data scattered throughout, and when you go to import it, some of the values are not getting read, they are just missing – empty, or not correct. Something goofy is happening and you cannot figure it out.
Excel tries to figure out the data type of a column, or if there is any data at all in a column by looking at the Top 8 Rows. It will ignore the header row. But it looks at these 8 rows and tries to be smart about it, and this can really screw you up if you are importing data.
Here is my example.
Year | Value1 | Value2
—————————–
1998 | | 20
1999 | |
2000 | | 15
2001 | | 20
2002 | |
2003 | | 15
2005 | | 20
2006 | |
2007 | 55 | 15
Now this might not work if you actually try it, but just bear with me for the examples sake. If you write a function in .NET to import this XLS sheet, the 55 in Value1 column will never get read. Why? Because the first 8 rows for Value1 are empty, and it thinks it is an empty column, just wacked. If you put a 0 in the 1998 row for Value1, then the 55 will get read. Since Value2 column has values in the first rows, it works just fine.
How can you get around this? Well, if you somehow read your XLS file without using the JET engine, then you are probably OK, also, you can tweak a registry setting to make excel look at more rows, say, 1000, which will usually get you by.
HKEY_LOCAL_MACHINESOFTWAREMicrosoftJet<version of jet>EnginesExcel
Where <version of jet> is usually 4.0
Change the TypeGuessRows value to something larger than 8 and then you should be good to go.
Just another Microsoft head scratchier…had me stumped for about an hour! 🙂