Solved

how to download & process a csv from the web, but which url changes daily based on the date


Badge

Hi,

I am trying to automate a process whereby I download a CSV from an opendata site (a list of schools from our UK Department for Education) and process it to create a shapefile for the area I'm interested only. I can download the data manually then run my workflow which creates the geometry and clips to my area, etc... but the automated download via the url is the bit I'm struggling with.

The source data is uploaded daily and posted as a link which url is dynamic to reflect today's date. This is today's link: http://www.education.gov.uk/edubase/edubasealldata20170502.csv

I wanted to create the workflow and run the it every week or month as scheduled task but it needs to know the date it's running.

I've tried to use :

1. CSV reader with the url in the path -it works fine when all as above. Then I amended it in the text editor to replace the date with @Timestamp("^Y^m^d") * . It reads the schema and have all fields / columns "green" but fails on the execution = It says it can't read the url.

LOG = "CSV Reader: Failed to read the specified file(s) in the folder 'http://www.education.gov.uk/edubase/edubasealldata@Timestamp(^Y^m^d).csv'"

* I also tried multiple combination (without ", with @value first...etc) - with no luck

2. HTTPCaller - with the same url and combination - it reads all the schema / fields but keep it red and fail in execution.

3. HTTPCaller using the URL as a source from XLS file (dynamic and as text) ... no luck

4. FeatureReader with the same URL fields from XLS source file as above - no luck.

I've successfully managed to use timestamp for feature writers but this is beyond me.

Can anyone help me?

Many many thanks.

icon

Best answer by david_r 2 May 2017, 16:23

View original

14 replies

Userlevel 1
Badge +10

This flow with the featurereader works for me

Badge +22

I was able to dynamically obtain the correct file by generating an attribute http://www.education.gov.uk/edubase/edubasealldata@DateTimeFormat(@DateTimeNow(),%Y%m%d).csv

and then using a FeatureReader CSV format with the dataset pointing to that attribute.

Userlevel 4

This worked nicely for me:

Userlevel 1
Badge +10

This flow with the featurereader works for me

Your stated method for generating the filename works for me also, however

 

 

 

Badge

This flow with the featurereader works for me

Hi egomm, please can you let me know what process / transformer you have before the DateFormatter transformer? Thanks

 

 

Userlevel 1
Badge +10
Hi egomm, please can you let me know what process / transformer you have before the DateFormatter transformer? Thanks

 

 

It was a creator with an attribute named today with attribute value today

 

But a timestamper without a date formatter works exactly the same

 

 

Badge

This worked nicely for me:

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 

Badge
It was a creator with an attribute named today with attribute value today

 

But a timestamper without a date formatter works exactly the same

 

 

I get the right number of features through to the inspector but no data / schema when previewed or save in a feature writer though.

 

Are you able to preview the data in Inspector?

 

 

Badge +22

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 

If you look at the individual features, you should see all the attributes.

 

 

When using the generic port on the featureReader, you need to expose the attributes explicitly to access them in the workspace, or see them in the table view of the inspector.

 

 

Userlevel 4

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 

If you click on one of the lines in <no schema> you will see the attributes in the Feature Information window on the right. You will have to manually expose the attributes in the FeatureReader for them to be available (visible) in the workspace.

 

 

Userlevel 1
Badge +10
I get the right number of features through to the inspector but no data / schema when previewed or save in a feature writer though.

 

Are you able to preview the data in Inspector?

 

 

Well I was too lazy to type in all the attributes to expose so just exposed the URN, you won't see the data in the inspector without exposing the attributes
Badge

Thanks all for your help. Now I exposed the attributes - it works. Lovely! :-)

Badge +16

Thanks all for your help. Now I exposed the attributes - it works. Lovely! :-)

@opowell, fyi

 

If the data model (schema) of the input csv is static than using the option One Per Feature Type in the Output Ports section of the FeatureReader transformer is a better option since then the attributes will be automatically exposed and no manual exposing is necessary.

 

Userlevel 2
Badge +17

Alternatively, you can also define a scripted parameter to construct the URL at runtime. The URL string constructed by a scripted parameter can be linked to the dataset parameter of CSV reader or FeatureReader.

# Python Scripted Parameter Example
# Returns a URL string containing timestamp (YYYYmmdd).
from datetime import datetime
t = datetime.strftime(datetime.today(), '%Y%m%d')
return 'http://www.education.gov.uk/edubase/edubasealldata%s.csv' % t

Reply