Question

CSV: dynamic structure creation from CSVT information

  • 30 April 2018
  • 9 replies
  • 14 views

Badge

I have to regularly import a bigger number (200+) of CSV-files into postgis, and obviously dynamic reading of those files is the way to go.

Before coming to FME I did this with the help of ogr2ogr Version 2.xx, which honors the presence of a CSVT-File that contains in one line the data types of the CSV-File, separated with commas. (see http://www.gdal.org/drv_csv.html)

What I am now trying to do is to dynamically create the table structure with FME means by reading

* the first line of the csv-file (), which gives me the column names, and

* the one and only line in the csvt-file ( - the suffix is .csvt, but the uploader complained), which gives me the data types in the same sequence as the table names are.

My first draft looks like this:

I guess that a ListCreator and a SchemaSetter should accomplish this task.

Alternatively, I could imagine to write the structure creator in python, but have no idea how the structure data has to be formatted.

Goal is to run this workspace in FME-Server and submit to it only the CSV-File to be imported, deriving the table name from its filename.

Any help would be greatly appreciated.


9 replies

Userlevel 3
Badge +17

Hi @dollargis, I don't think the SchemaSetter is helpful here. If you know the mapping rule between every possible CSVT type name (real, string, etc.) to corresponding FME generic data type fme_real64, fme_buffer, fme_int32, etc.), I think you can create schema definition based on the CSVT file and configure dynamic writer feature type.

Can you provide a comprehensive table of the CSVT types and their meanings?

Badge
@takashi

thanks for offering support on this issue.

According to https://giswiki.hsr.ch/GeoCSV, csvt knows those data types:

  • Integer or "Integer".
  • Real or "Real".
  • String or "String".
  • Date (format "YYYY-MM-DD"), Time (format "HH:MM:SS+nn") and DateTime (format "YYYY-MM-DD HH:MM:SS+nn"), whereas nn is the timezone.
  • "WKT" (preferred over Point(X/Y)). All WKT geometry types are allowed: Point, LineString, Polygon, Multipoint, MultiLinestring, MultiPolygon, GeometryCollection, Arcs, ... (see OGC WKT).
  • "CoordX","CoordY" (preferred) or "Point(X)","Point(Y)". Two separate colums in either order and not necessary neighboring of type Integer or Float: one containing the easting coordinate, and another containing northing coordinate separated by a comma.
  • All values of that WKT column MAY contain the same geometry (sub)type.

Notes:

  • CSVT fields are separated by commas
  • Types can be in quotes ('"') or not, e.g. <<"Integer";"Real">>.
  • Types can have precision in parentheses, e.g. ('Real(20.2)')).
  • There's only one geometry column per .csvt, "WKT".
  • A WKT field is stored in one single String column.

Userlevel 3
Badge +17

This screenshot illustrates a BASIC way to create a schema definition (i.e. attribute{}.name, attribute{}.fme_data_type) from a CSV file and corresponding CSVT file. See here to learn more about schema definition (attribute{} list): Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Note: this workflow doesn't contain a way to treat geometry fields (WKT, CoordX, CoordY etc.). Probably additional process to treat geometries would be necessary.

Userlevel 3
Badge +17

This screenshot illustrates a BASIC way to create a schema definition (i.e. attribute{}.name, attribute{}.fme_data_type) from a CSV file and corresponding CSVT file. See here to learn more about schema definition (attribute{} list): Dynamic Workflows: Destination Schema is Derived from a Schema Feature

Note: this workflow doesn't contain a way to treat geometry fields (WKT, CoordX, CoordY etc.). Probably additional process to treat geometries would be necessary.

If the source CSV table alway has a single WKT type field, for example, this workflow following to the FeatureMerger in the workflow above detects the geometry field and replace the value with a geometry. Also, the schema definition (attribute{} list) without the WKT field will be merged to the data features from the CSV dataset.

 

 

 

 

Badge

@takashi: Thanks heaps! I am almost there, just the FeatureMerger is not yet correct:

Here is the workspace: wkt-import2k-worker.fmw, maybe the error is already at an earlier stage.

Userlevel 3
Badge +17

@takashi: Thanks heaps! I am almost there, just the FeatureMerger is not yet correct:

Here is the workspace: wkt-import2k-worker.fmw, maybe the error is already at an earlier stage.

The Join On parameter setting in the FeatureMerger is not correct. This FeatureMerger should merge data type name to attribute name read from the CSV (schema reader).Take a closer look at my first screenshot.

 

The workflow just creates a schema definition (attribute{} list), it doesn't read the source dataset (CSV table). To complete the workspace, you need to add a CSV reader reading the source CSV dataset and add a process to merge the schema definition ("attribute{}" list) to rhe data features, and then write them into a dynamic writer feature type.

 

This is an example which contains a process to replace WKT value to geometry: csvt-to-schema.fmwt (FME 2018.0.0.2). Assuming that the the CSV table always has a single WKT field and the value in the field should be replaced with a geometry. Please modify the workflow appropriately depending on the actual conditions and requirements.

 

Badge

@takashi Thanks again, this works with sqlite and also with postgis.

Now I am trying parametrize it to prepare running this workspace on FME Server, but I am failing again. In the end, the namespace and table as well as the csvt-filename should be created from the published parameter wktfile

I have added attribute creators, but I don't see the point where I can feed them into the process.

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

Userlevel 3
Badge +17

@takashi Thanks again, this works with sqlite and also with postgis.

Now I am trying parametrize it to prepare running this workspace on FME Server, but I am failing again. In the end, the namespace and table as well as the csvt-filename should be created from the published parameter wktfile

I have added attribute creators, but I don't see the point where I can feed them into the process.

0684Q00000ArKTxQAN.png

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

To parameterize the source CSV dataset name, the destination PostGIS database schema name and table name, you will have to implement:

 

  1. Change the CSV reader feature type to a single merged feature type (All) so that it could accept every CSV dataset with any feature type name (file name).
  2. Convert the source dataset parameter for the Text File (CSVT) reader to a private parameter, and set $(SourceDataset_CSV2)t as its value. Assuming that the CSVT dataset name would always be <CSV dataset name>t.
  3. Create two published parameters which would store the destination database schema name and table name. e.g. $(DB_SCHEMA), $(DB_TABLE).
  4. Configure the dynamic PostGIS writer feature type appropriately using those published parameters. See this screenshot, and see also this Q&A;: Setting DB namespace dynamically from SourceDataset_GENERIC through RegexReplacement

    0684Q00000ArMULQA3.png

  5. If there could be cases where a schema passed through $(DB_SCHEMA) would not exist in the database, set a SQL statement to the SQL To Run Before Write parameter in the PostGIS writer, in order to create the schema if needed. e.g.
create schema if not exists $(DB_SCHEMA) 
Just an example: csvt-to-schema-2.fmwt (FME 2018.0.0.2)

 

I don't know if this workspace example exactly matches actual conditions and requirements. Please modify appropriately it.
Badge

@takashi Thanks again, this works with sqlite and also with postgis.

Now I am trying parametrize it to prepare running this workspace on FME Server, but I am failing again. In the end, the namespace and table as well as the csvt-filename should be created from the published parameter wktfile

I have added attribute creators, but I don't see the point where I can feed them into the process.

Here is the workspace: wkt-import2k-worker-01.fmw

Admittedly I am still lacking understanding for some mechanisms here, but I will recieve training only after the deadline of the current project :(

@takashi

 

Thank you very much - I am more than grateful for your ongoing support!

 

Reply