However, I can't seem to use this URL in a FeatureReader or a further HTTPCaller to actually get the data itself from the CSV file. Copying the URL (it contains a token that is valid only once) into a browser does download the file as a zip, but how can this be done in FME?
FeatureReader attempt below (the _response_body contains only the URL):
Also tried '@Value(_response_body)\*.csv' for the Dataset parameter, but likewise to no avail.
Another try with a second HTTPCaller:
Neither returns the zip or CSV file containing over 1,000,000 items.
What happens if you set the HTTPCaller’s “Save Response Body To” as a file, not as an attribute?
Thanks. That did the trick!
I changed the second HTTPCaller to:
'Request URL' set to the attribute holding the URL (here: _response_body).
'Save Response Body' set to 'File'.
'Output Filename' set to 'file.zip' (in a temp location).
Then used a FeatureReader with 'Format' set to CSV and 'Dataset' set to 'local path\file.zip\*.csv'.
Works like a charm.
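For reference, the same two steps look roughly like this outside FME; a minimal Python sketch (the URL is a placeholder for the one-time download link, and the third-party requests library is assumed to be available):

import csv
import io
import zipfile

import requests  # third-party; inside FME the HTTPCaller performs this step

# Placeholder URL standing in for the one-time-token download link
# that the first HTTPCaller returns in _response_body.
url = "https://example.com/download?token=..."

# Step 1: the equivalent of the second HTTPCaller with
# 'Save Response Body' = File and 'Output Filename' = file.zip.
with requests.get(url, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open("file.zip", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)

# Step 2: the equivalent of the FeatureReader reading 'file.zip\*.csv':
# iterate over every CSV member inside the zip and read its rows.
with zipfile.ZipFile("file.zip") as zf:
    for name in zf.namelist():
        if name.lower().endswith(".csv"):
            with zf.open(name) as member:
                for row in csv.reader(io.TextIOWrapper(member, encoding="utf-8")):
                    pass  # each row would become one feature in FME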
Note: if you want the CSV file (or any HTTPCaller file) to be temporary and deleted after the Translation, you can also use a TempPathnameCreator with it.
Any paths output by TempPathNameCreator, including Files, are deleted at the end of a Partial or Full run, which can help to keep things tidy when using an interim file external to the workspace.
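For the idea behind that cleanup behaviour, a rough Python analogy (not FME internals):

import tempfile
from pathlib import Path

# Rough analogy: TempPathnameCreator hands out a path in a managed temp
# area, and FME removes it when the run ends, much as TemporaryDirectory
# removes its contents on exiting the with-block.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "file.zip"
    # ... download to zip_path and read the CSV from it here ...
# zip_path (and its parent directory) no longer exist here.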
I tried the TempPathnameCreator. The download is a zip file with one CSV, which has field names in row 3 and data from row 4.
It does return the number of features, but not the actual data.
First I got a WARN: 'Detected excess data and/or delimiters in addition to the defined schema in line 5 with value ' <meta charset="utf-8" /> ' read from file 'D:\Temp\FME\wbrun_1724306402074_21096\fmetmp_b\1724312665909_1736_d\r_2601.csv'. Excess data will not be read. This can occur if the file is not uniform in the number of fields per row or if fields have been intentionally removed from the source feature type. In order to read the excess data, recreate the source feature type and set a larger value for the 'Maximum Rows to Scan' option to scan additional rows for fields. If the file contains field names, ensure that the field names line represents all of the columns in data rows, or enable the 'Scan for Additional Fields' option. This may also be caused by a misplaced field qualifier character. In this case, ensure that field qualifiers in the row are correct, or consider changing the Field Qualifier Character value'
I enlarged 'Maximum Rows to Scan' to 10,000,000 to be sure. The WARN is then gone, but it still fails to show the schema (log below).
2024-08-22 10:11:43| 698.5| 2.3|INFORM|Merged 1 schema features read from 1 datasets into 1 resulting feature types ...
2024-08-22 10:13:04| 726.9| 1.3|STATS |Destination Feature Type Routing Correlator (RoutingFactory): Tested 1 input feature(s), wrote 1 output feature(s): 0 matched merge filters, 0 were routed to output, 1 could not be routed.
2024-08-22 10:13:04| 726.9| 0.0|STATS |Final Output Nuker (TeeFactory): Cloned 1 input feature(s) into 0 output feature(s)
2024-08-22 10:13:04| 726.9| 0.0|WARN |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9| 0.0|WARN | Features not written - verify feature type fanout/dynamic settings
2024-08-22 10:13:04| 726.9| 0.0|WARN |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9| 0.0|WARN | (Check that fanout/dynamic feature type attribute exists and has a value)
2024-08-22 10:13:04| 726.9| 0.0|WARN |=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
2024-08-22 10:13:04| 726.9| 0.0|WARN | 1
2024-08-22 10:13:04| 726.9| 0.0|WARN |==============================================================================
2024-08-22 10:13:04| 726.9| 0.0|WARN |Total features not written 1
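(Aside: a quick way to see what actually landed in the temporary file is to peek at its first lines; a minimal sketch, with the path copied from the WARN above:)

# Peek at the first few lines of the file the CSV reader choked on.
# Path copied from the WARN; adjust to whatever the log reports.
path = r"D:\Temp\FME\wbrun_1724306402074_21096\fmetmp_b\1724312665909_1736_d\r_2601.csv"

with open(path, "r", encoding="utf-8", errors="replace") as f:
    for i, line in enumerate(f, start=1):
        print(i, line.rstrip())
        if i >= 10:  # HTML tags here would mean the download is not pure CSV
            break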
What am I missing?
Apologies, a key step I should probably outline for those doing this for the first time is to first download a sample file to, say, C:\Temp. Even just do it with your browser.
This is to give the FeatureReader a sample to "teach" it which Field Names to attempt to Read and Expose to the Workspace. I suspect your current Workspace is reading the attributes, but the FeatureReader hasn't been set to Expose the Attributes to the Workspace Schema. This can be fixed manually with an AttributeExposer, but there are easier ways below.
We can "teach" the FeatureReader, from a sample download of this CSV file, which CSV Field Names to Expose as named Attributes in the Workspace. As above, you can do this manually with an AttributeExposer after the Reader, but it is easier to set it up within the Reader where possible.
So, taking the FeatureReader, on the Generic Port Settings we'll give it a manually downloaded sample CSV file to learn from, using the "Import" function to import the expected Schema from the sample file.
After this step, we’ve effectively told the FeatureReader which Attributes to expose to the Workspace.
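(Conceptually, the Import step boils down to something like this sketch; the sample path is a placeholder, and the field-names-on-row-3 layout is taken from the file described earlier in the thread:)

import csv

# Conceptual equivalent of the "Import" schema step: open the sample
# file and read the field-names row to learn which Attributes to expose.
with open(r"C:\Temp\test.csv", newline="", encoding="utf-8") as f:
    rows = csv.reader(f)
    next(rows)                # preamble row 1
    next(rows)                # preamble row 2
    field_names = next(rows)  # row 3: the field names

print(field_names)  # the Attributes the FeatureReader will expose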
Now, let’s put all the Transformer parts together to instead use dynamic CSV paths, in this case using the HTTPCaller _response_file_path dynamic attribute value, which itself is dynamically set by the TempPathnameCreator.
On the Generic Port, we get the following Exposed Attributes, since that is what we set up using the manually downloaded sample CSV file plus the Import Schema -> Expose Attributes step. It takes these settings and similarly attempts to expose these Attributes, if they exist, in the dynamic temporary CSV file. (If they don't exist, it doesn't break the Workspace; the output here will instead say <Missing>.)
I'll also add to this: if you want the FeatureReader to learn which Data Type to expect for each Field, there is an extra step in setting up the Attribute Data Typing.
It is similar to the above, but first we are going to set the FeatureReader to read directly from the sample CSV file. This is a temporary override; once we are done, this will be set back to the HTTPCaller _response_file_path.
Now, in "Parameters", we'll set the Attribute Data Typing by initially setting it to "Automatic" and then switching it to "Manual". After this, the FeatureReader will not only know which Attributes to Expose, but will also assume a Data Type for each of them. (This helps with later Writers: they will assume the Writer Field Data Type equivalent to the Reader Attribute Schema, saving the time of setting them manually.)
Now set the Dataset back to the Dynamic File Path.
Now the FeatureReader has not only “learned” the Attributes to Expose, but the Data Type to assume for them as well.
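(For intuition, automatic typing followed by switching to Manual amounts to inferring a type per field from sample values and then freezing it; a loose sketch, not FME's actual typing rules:)

def infer_type(values):
    """Return the narrowest type name that fits every sample value."""
    for caster, type_name in ((int, "integer"), (float, "number")):
        try:
            for v in values:
                caster(v)
            return type_name
        except ValueError:
            continue  # try the next, wider type
    return "string"

# Example: a frozen schema like the one the FeatureReader stores.
samples = {"id": ["1", "2"], "price": ["9.5", "3.2"], "name": ["a", "b"]}
schema = {field: infer_type(vals) for field, vals in samples.items()}
print(schema)  # {'id': 'integer', 'price': 'number', 'name': 'string'}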
Thanks for the explanation.
I read that I have to manually define the dataset by adding a custom file (test.csv) to teach the reader what to expect.
When using the workbench without the TempPathnameCreator, however, the schema automatically picked up the attributes correctly from the CSV in the zip file. So when the source changes from the current version 4 (v20240801_v4_csv) to 5, I guess it will automatically reflect those changes too.
Or is this not the case?
Technically you don't "have to" use a sample CSV file to do the initial setup of the FeatureReader. You can go into "Attributes to Expose" and enter the CSV Fields/Attributes to expose manually, one by one. It is just easier to use a sample file to do this automatically via the "Import" button.
Once this is set, then the FeatureReader will keep these settings between runs, no matter what actual CSV file is used.
The reason it does this automatically when you hard-code an existing CSV file as the FeatureReader Dataset is that this triggers the FeatureReader to open the file and inspect the top rows, as a helper step, to do an initial setup of the Attributes to Read/Expose for you. If you instead give the FeatureReader a dynamic path to a file that does not yet exist, the FeatureReader cannot do this initial file opening to set up which Attributes to Read/Expose.
So, using a sample file allows the author to do a manual FeatureReader setup and store these settings, to be reused for any CSV files created during the workspace run from the HTTPCaller download. Once set, though, it doesn't matter which new versions of the CSV file the HTTPCaller downloads: as long as the new CSV files have the same fields/Attributes, the FeatureReader will read them assuming the same Attributes to Read/Expose as when they were last set.
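(That behaviour can be pictured with this loose sketch; the stored schema and the function are illustrative, not FME internals:)

import csv
import os

# Placeholder for the Attributes-to-Expose settings saved in the
# workspace when the schema was last imported from a sample file.
STORED_SCHEMA = ["id", "name", "value"]

def attributes_to_expose(dataset_path):
    if os.path.exists(dataset_path):
        # Hard-coded, existing file: the reader can open it right away
        # and derive the attribute list for you as a helper step.
        with open(dataset_path, newline="", encoding="utf-8") as f:
            return next(csv.reader(f))
    # Dynamic path that doesn't exist at authoring time: fall back to
    # the settings stored when the schema was last set up.
    return STORED_SCHEMA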