
This question contains a few moving pieces - all in the hope of improving bulk insert operations. 

 

Background: FME Desktop is on a dedicated workstation with 16 GB RAM and about 2.1 GB of baseline memory usage (i.e. roughly 14 GB of physical RAM is almost always available for FME to use).

I'm attempting to use a FeatureReader to read a 1.5 GB CSV (10 million rows, 15 columns) and write it to an MSSQL server. The writer is Microsoft SQL Server Non-Spatial.

The SQL Server writer has Bulk Insert set to Yes, and I've tried Features per Transaction values from 500 to 5000 rows.

The SQL Server has more than enough free space in its current *.mdf that it is not trying to grow the file, the table is set to truncate on insert, and there are no indexes, constraints, or keys at the time of insert.

The SQL Server has almost no load on it at all (extremely underutilised!), but in any case it is on a large machine (20 GB RAM, 8 cores), and the database itself has virtually no usage.

 

The insert is currently taking about 20 minutes - roughly 8,000 rows per second. From previous experience this should be achievable in under 2 minutes (i.e. upwards of 80,000 rows per second) in the same database.

 

Looking through the logs I find the settings below. 

CSV2 is reading 100,000 lines at a time (the _CHUNK_SIZE directive below) - I cannot seem to increase or decrease this.

FME API version of module 'CSV2' matches current internal version (3.8 20200115)
CSV reader: Opening dataset 'C:\Temp\WeatherDownloads\data.csv'
CSV reader: Directive '_TRIM_FEATURE_TYPE_NAME' has the value 'Yes'
CSV reader: Directive '_DELIMITER' has the value 'auto'
CSV reader: Directive '_FIELD_NAMES_LINE' has the value '1'
CSV reader: Directive '_DATA_START_LINE' has the value '2'
CSV reader: Directive '_SCAN_MAX_ROWS' has the value '10000'
CSV reader: Directive '_READ_SKIPPED_LINES' has the value 'No'
CSV reader: Directive '_ENCODING' has the value 'windows-1252'
CSV reader: Directive '_SKIP_DUPLICATE_DELIMITERS' has the value 'No'
CSV reader: Directive '_FIELD_QUALIFIER_CHARACTER' has the value '"'
CSV reader: Directive '_FIELD_QUALIFIER_ESCAPE_CHARACTER' has the value '"'
CSV reader: Directive '_TRIM_FIELD_NAME_QUALIFIERS' has the value 'Yes'
CSV reader: Directive '_TRIM_FIELD_VALUE_QUALIFIERS' has the value 'Yes'
CSV reader: Directive '_PRESERVE_SOURCE_ENCODING' has the value 'No'
CSV reader: Directive '_CASE_SENSITIVE_FIELD_NAMES' has the value 'Yes'
CSV reader: Directive '_ENFORCE_STRICT_SCHEMA' has the value 'No'
CSV reader: Directive '_SCAN_TYPES' has the value 'Yes'
CSV reader: Directive '_SCAN_ADDITIONAL_FIELDS' has the value 'No'
CSV reader: Directive '_CHUNK_SIZE' has the value '100000'
CSV reader: Directive '_TRIM_FIELD_NAME_WHITESPACE' has the value 'Yes'
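
For comparison, a chunked read of the same file outside FME might look like the sketch below. This is just an illustration (pandas is an assumption here, not anything FME uses internally) - the point is that a chunked reader hands over rows in fixed-size blocks, which is what the _CHUNK_SIZE directive above appears to control:

import pandas as pd

# Read the CSV in 100,000-row blocks, mirroring the _CHUNK_SIZE
# value seen in the CSV2 reader log above.
CSV_PATH = r"C:\Temp\WeatherDownloads\data.csv"

total = 0
for chunk in pd.read_csv(CSV_PATH, chunksize=100_000, encoding="windows-1252"):
    total += len(chunk)  # each 'chunk' is a DataFrame of up to 100,000 rows
    # ... hand the block to a writer here ...
print(f"Read {total} rows")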

 

Next, as FME writes to SQL Server it is obviously doing so in batches based upon the number of features per transaction. However, what is odd is that it appears to be managing memory between each batch.

 

Microsoft SQL Server Non-Spatial Writer: Truncating table 'WeatherGrids'
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 13 times ...
MSSQL_ADO writer: 33954 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 16 times ...
MSSQL_ADO writer: 72948 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 10 times ...
MSSQL_ADO writer: 100000 of 100000 features written
CSV reader: Read 200000 features
MSSQL_ADO writer: Splitting bulk mode features into features
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 14 times ...
MSSQL_ADO writer: 34000 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 16 times ...
MSSQL_ADO writer: 74000 of 100000 features written
ResourceManager: Optimizing Memory Usage.  Please wait...
... Last line repeated 10 times ...
MSSQL_ADO writer: 100000 of 100000 features written
CSV reader: Read 300000 features
MSSQL_ADO writer: Splitting bulk mode features into features
ResourceManager: Optimizing Memory Usage.  Please wait...

 

The process completes successfully, but is very slow. So far, 500-1000 features per transaction seems to yield the best speed - but even that is incredibly slow for a bulk insert.
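
For what it's worth, the batch-size question can also be measured outside FME. Below is a minimal sketch of how one might time different batch sizes directly with pyodbc (the connection string, test table, and column names are placeholders, and fast_executemany is pyodbc's fast insert path, not FME's writer):

import time
import pyodbc

# Placeholder connection details - adjust to suit.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")

rows = [(i, f"value-{i}") for i in range(100_000)]  # dummy payload

for batch_size in (500, 1000, 5000, 50_000):
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        cur.fast_executemany = True  # send parameters as bulk arrays
        cur.execute("TRUNCATE TABLE dbo.WeatherGrids_test")
        start = time.perf_counter()
        for i in range(0, len(rows), batch_size):
            cur.executemany(
                "INSERT INTO dbo.WeatherGrids_test (id, val) VALUES (?, ?)",
                rows[i:i + batch_size])
            conn.commit()  # one transaction per batch, like the writer setting
        print(f"batch {batch_size}: {time.perf_counter() - start:.1f}s")

If this is also slow against the same server, the bottleneck would likely be network or server side rather than FME itself.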

 

FME Session Duration: 39 minutes 18.0 seconds. (CPU: 1058.8s user, 354.6s system)
END - ProcessID: 6328, peak process memory usage: 111224 kB, current process memory usage: 87824 kB
Translation was SUCCESSFUL
Creating reader for format: CSV (Comma Separated Value)
Trying to find a DYNAMIC plugin for reader named `CSV2'

 

Any thoughts on how I might be able to get this thing performing a bit better?


I have no solution, but I think you should include which FME version you are using. Also, you mention "From previous experience this should be achievable in under 2 minutes in the same database." Did you manage to do this using FME? If so, what changed in the meantime? Did you upgrade FME? Did you change the workspace? Did you change from Classic Readers/Writers to FeatureReaders/FeatureWriters?


Thanks Niel,

 

FME Desktop 2020.2

Prior experience was using the BCP Utility (which has its own issues!).

 

The workstation is a pretty new machine on a gigabit network, and FME + Python are the only two things loaded aside from a standard Office install (MS Word, etc. - though it's not used).


BCP Utility uses a bulk copy command on the server I'd guess, which is blazingly fast. In this case it seems like the machine you're running FME on runs out of memory, which forces it to swap to disk, which in turn takes more and more time. If you can't increase the memory on the FME machine, that will continue to be a bottleneck (unless you can reduce the attributes you're reading, but I guess that's not really an option). So comparing FME to the bulk copy utility is a bit unfair to poor FME, which does normal inserts rather than a blazing-fast server-side copy from CSV.
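
That server-side path can also be driven as a single T-SQL statement. A minimal sketch with pyodbc, assuming the CSV sits on a share the SQL Server service account can reach (the UNC path, server, and database names here are hypothetical - BULK INSERT resolves the file path from the server's point of view, not the workstation's):

import pyodbc

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;")

# The file path is resolved by SQL Server itself, so it must be
# visible to the server (e.g. a share exposed by the workstation).
sql = r"""
BULK INSERT dbo.WeatherGrids
FROM '\\workstation\Temp\WeatherDownloads\data.csv'
WITH (FIRSTROW = 2,           -- skip the header line
      FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      BATCHSIZE = 100000,
      TABLOCK);               -- allows minimal logging on an empty heap
"""

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    conn.execute(sql)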


Not sure if this is an option in your case, but you can probably run the BCP Utility command line from Workbench via the SystemCaller transformer.
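
To make that concrete, a hypothetical bcp invocation might look like the following (server, database, and table names are placeholders; it's wrapped in Python here only to keep the example self-contained - SystemCaller would be given the same command line):

import subprocess

# Placeholder names throughout; -T uses Windows authentication.
cmd = [
    "bcp", "mydb.dbo.WeatherGrids", "in",
    r"C:\Temp\WeatherDownloads\data.csv",
    "-S", "myserver",
    "-T",             # trusted (Windows) authentication
    "-c",             # character-mode data file
    "-t", ",",        # field terminator
    "-F", "2",        # first data row, i.e. skip the header line
    "-b", "100000",   # rows per batch/transaction
]
subprocess.run(cmd, check=True)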

