Skip to main content
Solved

how to download & process a csv from the web, but which url changes daily based on the date


opowell
Forum|alt.badge.img

Hi,

I am trying to automate a process whereby I download a CSV from an opendata site (a list of schools from our UK Department for Education) and process it to create a shapefile for the area I'm interested only. I can download the data manually then run my workflow which creates the geometry and clips to my area, etc... but the automated download via the url is the bit I'm struggling with.

The source data is uploaded daily and posted as a link which url is dynamic to reflect today's date. This is today's link: http://www.education.gov.uk/edubase/edubasealldata20170502.csv

I wanted to create the workflow and run the it every week or month as scheduled task but it needs to know the date it's running.

I've tried to use :

1. CSV reader with the url in the path -it works fine when all as above. Then I amended it in the text editor to replace the date with @Timestamp("^Y^m^d") * . It reads the schema and have all fields / columns "green" but fails on the execution = It says it can't read the url.

LOG = "CSV Reader: Failed to read the specified file(s) in the folder 'http://www.education.gov.uk/edubase/edubasealldata@Timestamp(^Y^m^d).csv'"

* I also tried multiple combination (without ", with @value first...etc) - with no luck

2. HTTPCaller - with the same url and combination - it reads all the schema / fields but keep it red and fail in execution.

3. HTTPCaller using the URL as a source from XLS file (dynamic and as text) ... no luck

4. FeatureReader with the same URL fields from XLS source file as above - no luck.

I've successfully managed to use timestamp for feature writers but this is beyond me.

Can anyone help me?

Many many thanks.

Best answer by david_r

This worked nicely for me:

View original
Did this help you find an answer to your question?

14 replies

ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • May 2, 2017

This flow with the featurereader works for me


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • May 2, 2017

I was able to dynamically obtain the correct file by generating an attribute http://www.education.gov.uk/edubase/edubasealldata@DateTimeFormat(@DateTimeNow(),%Y%m%d).csv

and then using a FeatureReader CSV format with the dataset pointing to that attribute.


david_r
Evangelist
  • Best Answer
  • May 2, 2017

This worked nicely for me:


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • May 2, 2017
ebygomm wrote:

This flow with the featurereader works for me

Your stated method for generating the filename works for me also, however

 

 

 


opowell
Forum|alt.badge.img
  • Author
  • May 2, 2017
ebygomm wrote:

This flow with the featurereader works for me

Hi egomm, please can you let me know what process / transformer you have before the DateFormatter transformer? Thanks

 

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • May 2, 2017
opowell wrote:
Hi egomm, please can you let me know what process / transformer you have before the DateFormatter transformer? Thanks

 

 

It was a creator with an attribute named today with attribute value today

 

But a timestamper without a date formatter works exactly the same

 

 


opowell
Forum|alt.badge.img
  • Author
  • May 2, 2017
david_r wrote:

This worked nicely for me:

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 


opowell
Forum|alt.badge.img
  • Author
  • May 2, 2017
ebygomm wrote:
It was a creator with an attribute named today with attribute value today

 

But a timestamper without a date formatter works exactly the same

 

 

I get the right number of features through to the inspector but no data / schema when previewed or save in a feature writer though.

 

Are you able to preview the data in Inspector?

 

 


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • May 2, 2017
opowell wrote:

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 

If you look at the individual features, you should see all the attributes.

 

 

When using the generic port on the featureReader, you need to expose the attributes explicitly to access them in the workspace, or see them in the table view of the inspector.

 

 


david_r
Evangelist
  • May 2, 2017
opowell wrote:

Hi david_r. I replicated your workflow and although I have the right number of features going across, there is no data / fields / column. The schema is empty.

 

 

If you click on one of the lines in <no schema> you will see the attributes in the Feature Information window on the right. You will have to manually expose the attributes in the FeatureReader for them to be available (visible) in the workspace.

 

 


ebygomm
Influencer
Forum|alt.badge.img+32
  • Influencer
  • May 2, 2017
opowell wrote:
I get the right number of features through to the inspector but no data / schema when previewed or save in a feature writer though.

 

Are you able to preview the data in Inspector?

 

 

Well I was too lazy to type in all the attributes to expose so just exposed the URN, you won't see the data in the inspector without exposing the attributes

opowell
Forum|alt.badge.img
  • Author
  • May 3, 2017

Thanks all for your help. Now I exposed the attributes - it works. Lovely! :-)


itay
Supporter
Forum|alt.badge.img+17
  • Supporter
  • May 3, 2017
opowell wrote:

Thanks all for your help. Now I exposed the attributes - it works. Lovely! :-)

@opowell, fyi

 

If the data model (schema) of the input csv is static than using the option One Per Feature Type in the Output Ports section of the FeatureReader transformer is a better option since then the attributes will be automatically exposed and no manual exposing is necessary.

 


takashi
Influencer
  • May 3, 2017

Alternatively, you can also define a scripted parameter to construct the URL at runtime. The URL string constructed by a scripted parameter can be linked to the dataset parameter of CSV reader or FeatureReader.

# Python Scripted Parameter Example
# Returns a URL string containing timestamp (YYYYmmdd).
from datetime import datetime
t = datetime.strftime(datetime.today(), '%Y%m%d')
return 'http://www.education.gov.uk/edubase/edubasealldata%s.csv' % t

Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings