
I'm still a newbie to FME, so please bear that in mind with the question below.

I'm trying to query open data from the Dutch https://public.ep-online.nl/api/v4/Mutatiebestand/DownloadInfo?fileType=csv (API key required, available from https://epbdwebservices.rvo.nl/) to get the data from a CSV file. The HTTPCaller works; the URL is obtained with a JSONFragmenter (attribute to expose: 'downloadUrl') plus a TestFilter (to get the actual URL).

 

However, I can't seem to use this URL in a FeatureReader or a further HTTPCaller to actually get the data itself from the CSV file. Copying the URL (it contains a token that is valid only once) into a browser does download the file as a zip, but how can this be done in FME?
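For context, the two-step flow being described is roughly the following (a minimal Python sketch; the "Authorization" header name is an assumption, so check the Swagger docs for the exact scheme):

```python
import requests

# Step 1: ask the API for a one-time download URL (API key required).
# NOTE: the "Authorization" header name is an assumption, not confirmed.
info = requests.get(
    "https://public.ep-online.nl/api/v4/Mutatiebestand/DownloadInfo",
    params={"fileType": "csv"},
    headers={"Authorization": "<your API key>"},
)
info.raise_for_status()

# Step 2: the JSON response carries 'downloadUrl' (the attribute exposed
# with the JSONFragmenter above), containing a one-time token.
download_url = info.json()["downloadUrl"]

# Step 3: fetch the file itself; the token is valid only once, so the
# URL cannot be reused after this request.
data = requests.get(download_url)
data.raise_for_status()

with open("file.zip", "wb") as f:
    f.write(data.content)
```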

I tried to look at https://public.ep-online.nl/swagger/index.html but couldn't make anything of it due to a lack of technical know-how.

 

Any ideas?

 

Setup as of now:

FeatureReader tried below (the _response_body solely contains the URL):

Also tried '@Value(_response_body)\*.csv' for the Dataset parameter, but also to no avail.

 

Another try with a second HTTPCaller:

Neither returns the zip or CSV file containing over 1,000,000 items.

What happens if you set the HTTPCaller’s “Save Response Body To” as a file, not as an attribute?


Thanks. That did the trick!

  • I changed the second HTTPCaller to:
    • 'Request URL' set to the attribute with the URL (here: _response_body)
    • 'Save Response Body To' set to 'File'
    • 'Output Filename' set to 'file.zip' (temp)
  • Then used a FeatureReader with 'Format' set to CSV and 'Dataset' set to: 'local path\file.zip\*.csv'

Works like a charm.
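For anyone wondering what that 'file.zip\*.csv' Dataset pattern is doing, here is a rough Python equivalent of reading every CSV inside the downloaded zip (the file names are placeholders):

```python
import csv
import io
import zipfile

# Equivalent of pointing the CSV FeatureReader at 'local path\file.zip\*.csv':
# open the zip and iterate over each CSV member inside it.
with zipfile.ZipFile("file.zip") as zf:
    for name in zf.namelist():
        if not name.lower().endswith(".csv"):
            continue
        with zf.open(name) as member:
            reader = csv.DictReader(io.TextIOWrapper(member, encoding="utf-8"))
            for row in reader:
                ...  # each row corresponds to one feature in FME
```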


Note: if you want the CSV file (or any HTTPCaller file) to be temporary and deleted after the Translation, then you can also use a TempPathnameCreator with it.

Any paths output by the TempPathnameCreator, including files, are deleted at the end of a Partial or Full Run, which can help to keep things tidy when using an interim file external to the workspace.
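In scripting terms, the lifecycle is loosely analogous to a temporary directory that cleans itself up (an analogy only, not FME's implementation):

```python
import tempfile
from pathlib import Path

# Like TempPathnameCreator: the path exists only for the duration of the
# run, and everything under it is removed automatically afterwards.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "file.zip"
    # ... download to zip_path and read it here ...
# On exit, the directory and the downloaded file are gone.
```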



I tried the TempPathnameCreator. The download is a zip file with one CSV, which has field names in row 3 and data from row 4.

It does return the number of features, but not the actual data.
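As an aside on that layout: reading a CSV whose field names are on line 3 means skipping the two preamble rows first, e.g. (a minimal sketch using the CSV file name from the log below; in the FME CSV reader the equivalent is pointing the field-names line at row 3):

```python
import csv

# Layout described above: rows 1-2 are preamble,
# field names on row 3, data from row 4 onward.
with open("v20240801_v4_csv.csv", newline="", encoding="utf-8") as f:
    next(f)  # skip row 1
    next(f)  # skip row 2
    reader = csv.DictReader(f)  # row 3 supplies the field names
    for row in reader:
        ...  # rows 4+ are the data
```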

First I got a WARN: 'Detected excess data and/or delimiters in addition to the defined schema in line 5 with value '    <meta charset="utf-8" /> ' read from file 'D:\Temp\FME\wbrun_1724306402074_21096\fmetmp_b\1724312665909_1736_d\r_2601.csv'.  Excess data will not be read.  This can occur if the file is not uniform in the number of fields per row or if fields have been intentionally removed from the source feature type.  In order to read the excess data, recreate the source feature type and set a larger value for the 'Maximum Rows to Scan' option to scan additional rows for fields.  If the file contains field names, ensure that the field names line represents all of the columns in data rows, or enable the 'Scan for Additional Fields' option.  This may also be caused by a misplaced field qualifier character.  In this case, ensure that field qualifiers in the row are correct, or consider changing the Field Qualifier Character value

 

I enlarged 'Maximum Rows to Scan' to 10,000,000 to be sure (as highlighted below). The WARN is then gone, but it still fails to show the schema (log below).

What am I missing?


Log:

2024-08-22 10:07:42| 245.0|  0.0|INFORM|CSV reader: Opening dataset 'D:\Temp\FME\wbrun_1724306402074_21096\fmetmp_d\TempFS_1724313854706_56920\dsR_1_0\tor350.zip_0\v20240801_v4_csv.csv'

...

2024-08-22 10:11:43| 698.5|  2.3|INFORM|Merged 1 schema features read from 1 datasets into 1 resulting feature types
...

2024-08-22 10:13:04| 726.9|  1.3|STATS |Destination Feature Type Routing Correlator (RoutingFactory): Tested 1 input feature(s), wrote 1 output feature(s): 0 matched merge filters, 0 were routed to output, 1 could not be routed.
2024-08-22 10:13:04| 726.9|  0.0|STATS |Final Output Nuker (TeeFactory): Cloned 1 input feature(s) into 0 output feature(s)
2024-08-22 10:13:04| 726.9|  0.0|WARN  |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9|  0.0|WARN  |      Features not written - verify feature type fanout/dynamic settings
2024-08-22 10:13:04| 726.9|  0.0|WARN  |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9|  0.0|WARN  |  (Check that fanout/dynamic feature type attribute exists and has a value)
2024-08-22 10:13:04| 726.9|  0.0|WARN  |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9|  0.0|WARN  |                                                                             1
2024-08-22 10:13:04| 726.9|  0.0|WARN  |==============================================================================
2024-08-22 10:13:04| 726.9|  0.0|WARN  |Total features not written                                                   1


Apologies, a key step I should probably outline for those doing this for the first time is to first download a sample file to, say, C:\Temp. Even just do it with your browser.

This is to give a sample to the FeatureReader to “teach” it which Field Names to attempt to Read and Expose to the Workspace. I suspect your current Workspace is reading the attributes, but the FeatureReader hasn’t been set to Expose the Attributes to the Workspace Schema. This can be fixed manually with an AttributeExposer, but there are easier ways below.

I’m going to use a sample CSV URL from here:

https://support.staffbase.com/hc/en-us/article_attachments/360009197031/username.csv

We can “teach” the FeatureReader, from a sample download of this CSV file, which CSV Field Names to Expose as named Attributes in the Workspace. As above, you can do this manually with an AttributeExposer after the Reader, but it is easier to set it up within the Reader where possible.

 

So, taking the FeatureReader, on the Generic Port Settings we’ll give it the manually downloaded sample CSV file to learn from, using the “Import” function to import the expected Schema from that sample.
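Conceptually, the Import step just reads the sample file’s header row to learn the field list, roughly like this (a sketch against a locally downloaded copy of the sample; csv.Sniffer is used since the delimiter may vary):

```python
import csv

# What "Import" effectively learns from the sample CSV: the field names,
# i.e. the Attributes to Expose, plus a guess at the dialect/delimiter.
with open("username.csv", newline="", encoding="utf-8") as f:
    sample = f.read(4096)
    f.seek(0)
    dialect = csv.Sniffer().sniff(sample)       # detect the delimiter
    field_names = next(csv.reader(f, dialect))  # the header row

print(field_names)  # these become the FeatureReader's exposed Attributes
```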


After this step, we’ve effectively told the FeatureReader which Attributes to expose to the Workspace.

 


Now, let’s put all the Transformer parts together to instead use dynamic CSV paths, in this case using the HTTPCaller _response_file_path dynamic attribute value, which itself is dynamically set by the TempPathnameCreator.

 

On the Generic Port, we get the following Exposed Attributes, since that is what we set up using the sample manually downloaded CSV file plus the Import Schema → Expose Attributes step. It takes these settings and similarly attempts to expose these Attributes, if they exist, in the dynamic temporary CSV file. (If they don’t exist, it doesn’t break the Workspace, but the output here will instead say <Missing>.)



I’ll also add to this: if you want the FeatureReader to learn what Data Type to expect for each Field, there is an extra step in setting up the Attribute Data Typing.

 

It is similar to the above, but first we are going to set the FeatureReader to read directly from the sample CSV file. This is a temporary override; once we are done, this will be set back to the HTTPCaller _response_file_path.

 

Now, under “Parameters”, we’ll set the Attribute Data Typing by initially setting it to “Automatic” and then setting it to “Manual”. After this, the FeatureReader will not only know which Attributes to Expose, but will also assume a Data Type for each of them. (This helps with later Writers, in that they will assume the equivalent Writer Field Data Type from the Reader Attribute Schema, saving the time of having to set them manually.)
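“Automatic” typing amounts to inferring a data type per field from sample values; in rough scripting terms (an illustration of the idea only, not FME’s actual inference rules):

```python
def infer_type(values):
    """Guess a field's data type from its sample values (illustration only)."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    return "string"

print(infer_type(["1", "2", "3"]))      # integer
print(infer_type(["1.5", "2", "3.0"]))  # number
print(infer_type(["a", "2"]))           # string
```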

 

Now set the Dataset back to the Dynamic File Path.


Now the FeatureReader has not only “learned” the Attributes to Expose, but the Data Type to assume for them as well.



Thanks for the explanation.

I read that I have to manually define the dataset by adding a custom file (test.csv) to teach the reader what to expect.

When using the workbench without the TempPathnameCreator, however, the schema did automatically read the attributes correctly from the CSV in the zip file. So when the source changes from the current version 4 (v20240801_v4_csv) to 5, I guess it will automatically reflect those changes too.

Or is this not the case?



Technically, you don’t “have to” use a sample CSV file to do the initial setup of the FeatureReader. You can go into “Attributes to Expose” and enter the CSV Fields/Attributes to expose manually, one by one. It is just easier to use a sample file to do this automatically using the “Import” button.

Once this is set, the FeatureReader will keep these settings between runs, no matter what actual CSV file is used.

The reason it automatically does this for you when you hard-code an actual existing CSV file as the FeatureReader Dataset is that this triggers the FeatureReader to open the existing CSV file and inspect the top rows, doing an initial setup of the Attributes to Read/Expose for you as a helper step. If instead you give the FeatureReader a dynamic path to a file that does not yet exist, the FeatureReader cannot do this initial file opening to set up which Attributes to Read/Expose. So using a sample file allows the author to do a manual FeatureReader setup and store those settings for reuse with any CSV files that get created during the workspace run from the HTTPCaller download. Once set, though, it doesn’t matter what new versions of the CSV file the HTTPCaller downloads: so long as the new CSV files have the same fields/Attributes, the FeatureReader will read them assuming the same Attributes to Read/Expose as when they were last set.
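In pseudo-scripting terms, the difference is roughly this (a conceptual sketch, not FME internals; the attribute names are hypothetical):

```python
import csv
from pathlib import Path

# Settings stored in the FeatureReader at authoring time
# (via Import from a sample file, or typed in one by one).
STORED_ATTRIBUTES = ["field_a", "field_b"]  # hypothetical names

def attributes_to_expose(dataset: str) -> list[str]:
    path = Path(dataset)
    if path.exists():
        # Hard-coded, existing file: the reader can open it and
        # inspect the header row to set itself up automatically.
        with path.open(newline="", encoding="utf-8") as f:
            return next(csv.reader(f))
    # Dynamic path that doesn't exist yet at authoring time:
    # nothing to inspect, so fall back to the stored settings.
    return STORED_ATTRIBUTES
```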


