Question

CSV: dynamic structure creation from CSVT information

7 years ago
April 30, 2018
9 replies
70 views

dollargis
Contributor
37 replies

I have to regularly import a bigger number (200+) of CSV-files into postgis, and obviously dynamic reading of those files is the way to go.

Before coming to FME I did this with the help of ogr2ogr Version 2.xx, which honors the presence of a CSVT-File that contains in one line the data types of the CSV-File, separated with commas. (see http://www.gdal.org/drv_csv.html)

What I am now trying to do is to dynamically create the table structure with FME means by reading

* the first line of the csv-file (), which gives me the column names, and

* the one and only line in the csvt-file ( - the suffix is .csvt, but the uploader complained), which gives me the data types in the same sequence as the table names are.

My first draft looks like this:

I guess that a ListCreator and a SchemaSetter should accomplish this task.

Alternatively, I could imagine to write the structure creator in python, but have no idea how the structure data has to be formatted.

Goal is to run this workspace in FME-Server and submit to it only the CSV-File to be imported, deriving the table name from its filename.

Any help would be greatly appreciated.

takashi
7711 replies
7 years ago
April 30, 2018

Hi @dollargis, I don't think the SchemaSetter is helpful here. If you know the mapping rule between every possible CSVT type name (real, string, etc.) to corresponding FME generic data type fme_real64, fme_buffer, fme_int32, etc.), I think you can create schema definition based on the CSVT file and configure dynamic writer feature type.

Can you provide a comprehensive table of the CSVT types and their meanings?

dollargis
Author
Contributor
37 replies
7 years ago
April 30, 2018

@takashi

thanks for offering support on this issue.

According to https://giswiki.hsr.ch/GeoCSV, csvt knows those data types:

Integer or "Integer".
Real or "Real".
String or "String".
Date (format "YYYY-MM-DD"), Time (format "HH:MM:SS+nn") and DateTime (format "YYYY-MM-DD HH:MM:SS+nn"), whereas nn is the timezone.
"WKT" (preferred over Point(X/Y)). All WKT geometry types are allowed: Point, LineString, Polygon, Multipoint, MultiLinestring, MultiPolygon, GeometryCollection, Arcs, ... (see OGC WKT).
"CoordX","CoordY" (preferred) or "Point(X)","Point(Y)". Two separate colums in either order and not necessary neighboring of type Integer or Float: one containing the easting coordinate, and another containing northing coordinate separated by a comma.
All values of that WKT column MAY contain the same geometry (sub)type.

Notes:

CSVT fields are separated by commas
Types can be in quotes ('"') or not, e.g. <<"Integer";"Real">>.
Types can have precision in parentheses, e.g. ('Real(20.2)')).
There's only one geometry column per .csvt, "WKT".
A WKT field is stored in one single String column.

takashi
7711 replies
7 years ago
April 30, 2018

This screenshot illustrates a BASIC way to create a schema definition (i.e. attribute{}.name, attribute{}.fme_data_type) from a CSV file and corresponding CSVT file. See here to learn more about schema definition (attribute{} list): Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Note: this workflow doesn't contain a way to treat geometry fields (WKT, CoordX, CoordY etc.). Probably additional process to treat geometries would be necessary.

takashi
7711 replies
7 years ago
April 30, 2018

takashi wrote:

Note: this workflow doesn't contain a way to treat geometry fields (WKT, CoordX, CoordY etc.). Probably additional process to treat geometries would be necessary.

If the source CSV table alway has a single WKT type field, for example, this workflow following to the FeatureMerger in the workflow above detects the geometry field and replace the value with a geometry. Also, the schema definition (attribute{} list) without the WKT field will be merged to the data features from the CSV dataset.

dollargis
Author
Contributor
37 replies
7 years ago
April 30, 2018

@takashi: Thanks heaps! I am almost there, just the FeatureMerger is not yet correct:

Here is the workspace: wkt-import2k-worker.fmw, maybe the error is already at an earlier stage.

takashi
7711 replies
7 years ago
April 30, 2018

dollargis wrote:

@takashi: Thanks heaps! I am almost there, just the FeatureMerger is not yet correct:

Here is the workspace: wkt-import2k-worker.fmw, maybe the error is already at an earlier stage.

The Join On parameter setting in the FeatureMerger is not correct. This FeatureMerger should merge data type name to attribute name read from the CSV (schema reader).Take a closer look at my first screenshot.

The workflow just creates a schema definition (attribute{} list), it doesn't read the source dataset (CSV table). To complete the workspace, you need to add a CSV reader reading the source CSV dataset and add a process to merge the schema definition ("attribute{}" list) to rhe data features, and then write them into a dynamic writer feature type.

This is an example which contains a process to replace WKT value to geometry: csvt-to-schema.fmwt (FME 2018.0.0.2). Assuming that the the CSV table always has a single WKT field and the value in the field should be replaced with a geometry. Please modify the workflow appropriately depending on the actual conditions and requirements.

dollargis
Author
Contributor
37 replies
7 years ago
May 1, 2018

@takashi Thanks again, this works with sqlite and also with postgis.

Now I am trying parametrize it to prepare running this workspace on FME Server, but I am failing again. In the end, the namespace and table as well as the csvt-filename should be created from the published parameter wktfile

I have added attribute creators, but I don't see the point where I can feed them into the process.

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

takashi
7711 replies
7 years ago
May 1, 2018

dollargis wrote:

@takashi Thanks again, this works with sqlite and also with postgis.

I have added attribute creators, but I don't see the point where I can feed them into the process.

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

To parameterize the source CSV dataset name, the destination PostGIS database schema name and table name, you will have to implement:

Change the CSV reader feature type to a single merged feature type (All) so that it could accept every CSV dataset with any feature type name (file name).
Convert the source dataset parameter for the Text File (CSVT) reader to a private parameter, and set $(SourceDataset_CSV2)t as its value. Assuming that the CSVT dataset name would always be <CSV dataset name>t.
Create two published parameters which would store the destination database schema name and table name. e.g. $(DB_SCHEMA), $(DB_TABLE).
Configure the dynamic PostGIS writer feature type appropriately using those published parameters. See this screenshot, and see also this Q&A;: Setting DB namespace dynamically from SourceDataset_GENERIC through RegexReplacement
If there could be cases where a schema passed through $(DB_SCHEMA) would not exist in the database, set a SQL statement to the SQL To Run Before Write parameter in the PostGIS writer, in order to create the schema if needed. e.g.

create schema if not exists $(DB_SCHEMA)

Just an example: csvt-to-schema-2.fmwt (FME 2018.0.0.2)

I don't know if this workspace example exactly matches actual conditions and requirements. Please modify appropriately it.

dollargis
Author
Contributor
37 replies
7 years ago
May 1, 2018

dollargis wrote:

@takashi Thanks again, this works with sqlite and also with postgis.

I have added attribute creators, but I don't see the point where I can feed them into the process.

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

@takashi

Thank you very much - I am more than grateful for your ongoing support!

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

CSV: dynamic structure creation from CSVT information

2 Attachments