
This question contains a few moving pieces - all in the hope of improving bulk insert operations. 

 

Background: FME Desktop is on a dedicated workstation with 16 GB RAM and about 2.1 GB of baseline memory usage (i.e. roughly 14 GB of physical RAM is almost always available for FME to use).

I'm attempting to use a FeatureReader to read a 1.5 GB CSV (10 million rows, 15 columns) and write it to an MSSQL server. The writer is Microsoft SQL Server Non-Spatial.

The SQL Server writer has Bulk Insert set to Yes, and I've tried Features per Transaction values from 500 to 5000 rows.

The SQL Server has more than enough free space in its current *.mdf that it is not trying to grow the file, the table is set to truncate on insert, and there are no indexes, constraints, or keys at the time of insert.

The SQL Server has almost no load on it at all (extremely underutilised!), but in any case it is on a large machine (20 GB RAM, 8 cores), and the database itself has virtually no usage.

 

The insert is currently taking about 20 minutes - roughly 8,000 rows per second. From previous experience this should be achievable in under 2 minutes (i.e. upwards of 80,000 rows per second) in the same database.

 

Looking through the logs I find the settings below. 

CSV2 is reading 100,000 lines at a time (the _CHUNK_SIZE directive below) - I cannot seem to increase or decrease this.

FME API version of module 'CSV2' matches current internal version (3.8 20200115)
CSV reader: Opening dataset 'C:\Temp\WeatherDownloads\data.csv'
CSV reader: Directive '_TRIM_FEATURE_TYPE_NAME' has the value 'Yes'
CSV reader: Directive '_DELIMITER' has the value 'auto'
CSV reader: Directive '_FIELD_NAMES_LINE' has the value '1'
CSV reader: Directive '_DATA_START_LINE' has the value '2'
CSV reader: Directive '_SCAN_MAX_ROWS' has the value '10000'
CSV reader: Directive '_READ_SKIPPED_LINES' has the value 'No'
CSV reader: Directive '_ENCODING' has the value 'windows-1252'
CSV reader: Directive '_SKIP_DUPLICATE_DELIMITERS' has the value 'No'
CSV reader: Directive '_FIELD_QUALIFIER_CHARACTER' has the value '"'
CSV reader: Directive '_FIELD_QUALIFIER_ESCAPE_CHARACTER' has the value '"'
CSV reader: Directive '_TRIM_FIELD_NAME_QUALIFIERS' has the value 'Yes'
CSV reader: Directive '_TRIM_FIELD_VALUE_QUALIFIERS' has the value 'Yes'
CSV reader: Directive '_PRESERVE_SOURCE_ENCODING' has the value 'No'
CSV reader: Directive '_CASE_SENSITIVE_FIELD_NAMES' has the value 'Yes'
CSV reader: Directive '_ENFORCE_STRICT_SCHEMA' has the value 'No'
CSV reader: Directive '_SCAN_TYPES' has the value 'Yes'
CSV reader: Directive '_SCAN_ADDITIONAL_FIELDS' has the value 'No'
CSV reader: Directive '_CHUNK_SIZE' has the value '100000'
CSV reader: Directive '_TRIM_FIELD_NAME_WHITESPACE' has the value 'Yes'
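
For comparison, a chunked read of the same file outside FME might look like the sketch below. This is just an illustration (pandas is an assumption here, not anything FME uses internally) - the point is that a chunked reader hands over rows in fixed-size blocks, which is what the _CHUNK_SIZE directive above appears to control:

import pandas as pd

# Read the CSV in 100,000-row blocks, mirroring the _CHUNK_SIZE
# value seen in the CSV2 reader log above.
CSV_PATH = r"C:\Temp\WeatherDownloads\data.csv"

total = 0
for chunk in pd.read_csv(CSV_PATH, chunksize=100_000, encoding="windows-1252"):
    total += len(chunk)  # each 'chunk' is a DataFrame of up to 100,000 rows
    # ... hand the block to a writer here ...
print(f"Read {total} rows")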

 

Next, as FME writes to SQL Server it is obviously doing so in batches based upon the number of features per transaction. However, what is odd is that it appears to be managing memory between each batch.

 

Microsoft SQL Server Non-Spatial Writer: Truncating table 'WeatherGrids'
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 13 times ...
MSSQL_ADO writer: 33954 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 16 times ...
MSSQL_ADO writer: 72948 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 10 times ...
MSSQL_ADO writer: 100000 of 100000 features written
CSV reader: Read 200000 features
MSSQL_ADO writer: Splitting bulk mode features into features
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 14 times ...
MSSQL_ADO writer: 34000 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 16 times ...
MSSQL_ADO writer: 74000 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 10 times ...
MSSQL_ADO writer: 100000 of 100000 features written
CSV reader: Read 300000 features
MSSQL_ADO writer: Splitting bulk mode features into features
ResourceManager: Optimizing Memory Usage.  Please wait...

 

The process completes successfully, but is very slow. So far, 500-1000 features per transaction seems to yield the best speed - but even that is incredibly slow for a bulk insert.
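
For what it's worth, the batch-size question can also be measured outside FME. Below is a minimal sketch of how one might time different batch sizes directly with pyodbc (the connection string, test table, and column names are placeholders, and fast_executemany is pyodbc's fast insert path, not FME's writer):

import time
import pyodbc

# Placeholder connection details - adjust to suit.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")

rows = [(i, f"value-{i}") for i in range(100_000)]  # dummy payload

for batch_size in (500, 1000, 5000, 50_000):
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        cur.fast_executemany = True  # send parameters as bulk arrays
        cur.execute("TRUNCATE TABLE dbo.WeatherGrids_test")
        start = time.perf_counter()
        for i in range(0, len(rows), batch_size):
            cur.executemany(
                "INSERT INTO dbo.WeatherGrids_test (id, val) VALUES (?, ?)",
                rows[i:i + batch_size])
            conn.commit()  # one transaction per batch, like the writer setting
        print(f"batch {batch_size}: {time.perf_counter() - start:.1f}s")

If this is also slow against the same server, the bottleneck would likely be network or server side rather than FME itself.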

 

FME Session Duration: 39 minutes 18.0 seconds. (CPU: 1058.8s user, 354.6s system)
END - ProcessID: 6328, peak process memory usage: 111224 kB, current process memory usage: 87824 kB
Translation was SUCCESSFUL
Creating reader for format: CSV (Comma Separated Value)
Trying to find a DYNAMIC plugin for reader named `CSV2'

 

Any thoughts on how I might be able to get this thing performing a bit better?


I have no solution, but I think you should include which FME version you are using. Also, you mention "From previous experience this should be achievable in under 2 minutes in the same database." Did you manage to do this using FME? If so, what changed in the meantime? Did you upgrade FME? Did you change the workspace? Did you change from Classic Readers/Writers to FeatureReaders/FeatureWriters?


Thanks Niel,

 

FME Desktop 2020.2

Prior experience was using the BCP Utility (which has its own issues!).

 

The workstation is a pretty new machine on a gigabit network, and FME + Python are the only two things loaded aside from a standard Office install (MS Word, etc. - though it's not used).


BCP Utility uses a bulk copy command on the server I'd guess, which is blazingly fast. In this case it seems like the machine you're running FME on runs out of memory, which forces it to swap to disk, which in turn takes more and more time. If you can't increase the memory on the FME machine, that will continue to be a bottleneck (unless you can reduce the attributes you're reading, but I guess that's not really an option). So comparing FME to the bulk copy utility is a bit unfair to poor FME, which does normal inserts rather than a blazing-fast server-side copy from CSV.
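
That server-side path can also be driven as a single T-SQL statement. A minimal sketch with pyodbc, assuming the CSV sits on a share the SQL Server service account can reach (the UNC path, server, and database names here are hypothetical - BULK INSERT resolves the file path from the server's point of view, not the workstation's):

import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")

# The file path is resolved by SQL Server itself, so it must be
# visible to the server (e.g. a share exposed by the workstation).
sql = r"""
BULK INSERT dbo.WeatherGrids
FROM '\\workstation\Temp\WeatherDownloads\data.csv'
WITH (FIRSTROW = 2,           -- skip the header line
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      BATCHSIZE = 100000,
      TABLOCK);               -- allows minimal logging on an empty heap
"""

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    conn.execute(sql)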


Not sure if this is an option in your case, but you can probably run the BCP Utility command line from Workbench via the SystemCaller transformer.
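
To make that concrete, a hypothetical bcp invocation might look like the following (server, database, and table names are placeholders; it's wrapped in Python here only to keep the example self-contained - SystemCaller would be given the same command line):

import subprocess

# Placeholder names throughout; -T uses Windows authentication.
cmd = [
    "bcp", "mydb.dbo.WeatherGrids", "in",
    r"C:\Temp\WeatherDownloads\data.csv",
    "-S", "myserver",
    "-T",             # trusted (Windows) authentication
    "-c",             # character-mode data file
    "-t", ",",        # field terminator
    "-F", "2",        # first data row, i.e. skip the header line
    "-b", "100000",   # rows per batch/transaction
]
subprocess.run(cmd, check=True)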

