
How to extract data from a website using HTTPCaller (SRU)? The response is XML



Hi all,

FME beginner here.

I'm trying to process data from a Dutch government website.

You can request data by putting search keys in the URL (SRU, I believe?); the response is XML.

e.g.

https://zoek.officielebekendmakingen.nl/sru/Search?version=1.2&operation=searchRetrieve&x-connection=oep&startRecord=1&maximumRecords=10&query=title=%rotonde%

I'm trying to send multiple search strings and process the results in FME.

What I have now:

An Excel file with search strings, connected to an HTTPCaller.

HTTPCaller setup: the Request URL contains "@Value(Search string)", which refers to the Search string column of the input Excel file.
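For reference, the substitution the HTTPCaller makes for each Excel row can be sketched in Python (outside FME). This is a minimal sketch assuming Python 3; the endpoint and fixed parameters are copied from the example URL, while `build_sru_url` and the sample search strings are hypothetical. It percent-encodes the query value so that search strings containing spaces or special characters still produce a valid URL, assuming the server decodes parameters in the usual way.

```python
from urllib.parse import urlencode

# Endpoint and fixed parameters copied from the example URL in the question.
BASE_URL = "https://zoek.officielebekendmakingen.nl/sru/Search"


def build_sru_url(search_string, start_record=1, maximum_records=10):
    """Build one SRU searchRetrieve URL for a single search string.

    Hypothetical helper: it mirrors what "@Value(Search string)" does inside
    the HTTPCaller's Request URL, but also percent-encodes the query value.
    """
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "x-connection": "oep",
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        # The query is title=<search string>, as in the example URL.
        "query": f"title={search_string}",
    }
    # urlencode escapes '=' as %3D, spaces as '+', '%' as %25, etc.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search strings standing in for the Excel column.
    for search_string in ["rotonde", "rode weg"]:
        print(build_sru_url(search_string))
```

For `rotonde` the last parameter comes out as `query=title%3Drotonde`; a `%` typed into the search string (as in `%rotonde%`) is sent as `%25`.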

Edit: when I inspect the _response_body attribute, it seems I do get some XML data.

My next question: how do I process this data? Which transformers should I use next?

Many thanks,

Ed

16 replies

danilo_fme
Evangelist
  • June 27, 2017

Hi @edhere,

I tried to open this URL in my browser, but it doesn't work.


danilo_fme
Evangelist
  • June 27, 2017
Hi @edhere, what kind of data would you like to download?

Danilo


  • June 27, 2017

For me, it works with both the browser and the HTTPCaller (the _response_body attribute contains the returned XML).


mark2atsafe
Safer
When you say "no luck", what happens? Is there a crash? An error message? Or just a feature is output with no data? Can you post a screenshot of the transformer parameters, so that we can see what settings you are using? Thanks!



danilo_fme
Evangelist
  • June 27, 2017
I tried it again on my machine and it works. :)

  • Author
  • June 28, 2017

Hi all,

Thanks for your responses. I have updated the start post with more info.

Hope this makes sense.

Thanks,

Ed

takashi
Influencer
  • June 28, 2017

Hi @edhere,

> What transformers should I use next?

Generally you can use the XMLFragmenter and/or the XMLFlattener to extract some values contained by an XML document as feature attributes. In some cases, the XMLXQueryExploder or the XMLXQueryExtractor could also be helpful. The concrete solution depends on how you need to interpret the XML document.
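To make fragmenting and flattening concrete, here is a rough Python analogue (not FME code): split the response at each `record` element, then turn each record's leaf elements into flat key/value pairs. It is a sketch under assumptions: Python 3.8+ (for the `{*}` namespace wildcard), a hypothetical, simplified response in `SAMPLE`, and a dotted naming scheme (`recordData.title`) chosen for illustration, which is not necessarily how FME names flattened attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified SRU response; the real one will contain more.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <records>
    <record>
      <recordData>
        <dcterms:title>Rotonde Noord</dcterms:title>
        <url>https://example.org/a</url>
      </recordData>
    </record>
    <record>
      <recordData>
        <dcterms:title>Rotonde Zuid</dcterms:title>
        <url>https://example.org/b</url>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Turn nested child elements into {'parent.child': text} pairs."""
    attrs = {}
    for child in element:
        name = prefix + local_name(child.tag)
        if len(child):  # has children: recurse with a longer prefix
            attrs.update(flatten(child, name + "."))
        elif child.text and child.text.strip():
            attrs[name] = child.text.strip()  # repeated names: last one wins
    return attrs


def fragment_and_flatten(xml_text, fragment_tag="record"):
    """One flat dict per <fragment_tag> element, in any namespace."""
    root = ET.fromstring(xml_text)
    return [flatten(frag) for frag in root.findall(f".//{{*}}{fragment_tag}")]


if __name__ == "__main__":
    for feature in fragment_and_flatten(SAMPLE):
        print(feature)
```

Passing `"recordData"` as the fragment tag instead gives keys without the `recordData.` prefix, which mirrors choosing a different fragmentation element.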


mygis
Supporter
  • June 28, 2017

Hello @edhere, could you tell us which data you are looking for in the XML? Could you be specific? If it is a single value, you could extract it with a regular expression; if it is more complex, it is better to treat the response as an XML document and use the XML-handling transformers, i.e. the ones @takashi mentioned.
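A minimal Python sketch of the regular-expression route for a single value (an illustration outside FME, not how any transformer works internally). `first_tag_value` is a hypothetical helper; it matches the literal tag name including any prefix, does not decode entities such as `&amp;`, and does not cope with nested elements of the same name, which is why the XML transformers are the safer choice once the structure gets more complex.

```python
import re


def first_tag_value(xml_text, tag):
    """Return the text of the first <tag>...</tag>, or None if absent.

    Hypothetical helper for simple cases only: it matches the literal tag
    name (including any prefix such as 'dcterms:'), does not decode
    entities, and does not handle nested elements of the same name.
    """
    name = re.escape(tag)
    # Allow attributes on the opening tag, e.g. <url type="x">.
    pattern = rf"<{name}(?:\s[^>]*)?>(.*?)</{name}>"
    match = re.search(pattern, xml_text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    body = "<r><dcterms:title>Rotonde Noord</dcterms:title><url>https://example.org/a</url></r>"
    print(first_tag_value(body, "dcterms:title"))
    print(first_tag_value(body, "url"))
```

To collect every occurrence rather than the first, the same pattern works with `re.findall`.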


  • Author
  • June 28, 2017
Hi, gisinnovationsb

I've checked the XML contained in the _response_body attribute. Let's start with:

<dcterms:title>*randomtext*</dcterms:title>
<url>*randomurl*</url>

How would I extract the values of title and url?

Many thanks,

Ed
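For the two elements asked about, a namespace-aware parse outside FME might look like the sketch below. Assumptions: Python 3.8+; the `dcterms` prefix is bound to the standard Dublin Core terms namespace `http://purl.org/dc/terms/` (worth checking against the actual `_response_body`); `url` is matched in any namespace via `{*}`; the sample XML and `titles_and_urls` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the response binds 'dcterms' to the standard Dublin Core terms URI.
NS = {"dcterms": "http://purl.org/dc/terms/"}

# Hypothetical, simplified response body.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <records>
    <record><recordData>
      <dcterms:title>Rotonde Noord</dcterms:title>
      <url>https://example.org/a</url>
    </recordData></record>
    <record><recordData>
      <dcterms:title>Rotonde Zuid</dcterms:title>
    </recordData></record>
  </records>
</searchRetrieveResponse>"""


def titles_and_urls(xml_text):
    """Return one (title, url) pair per <record>; None where an element is missing."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.findall(".//{*}record"):
        title = record.findtext(".//dcterms:title", namespaces=NS)
        url = record.findtext(".//{*}url")  # '{*}' = any (or no) namespace
        pairs.append((title, url))
    return pairs


if __name__ == "__main__":
    for title, url in titles_and_urls(SAMPLE):
        print(title, url)
```

Working per record, rather than collecting all titles and all urls separately, keeps each title paired with its own url even when a record lacks one of them.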


takashi
Influencer
  • June 28, 2017

If you need to extract the values of the descendant elements (e.g. <title>, <url>) of each <record> element, the XMLFragmenter, set to fragment on the <record> element, might help you.

Just be aware that the transformer would also extract other, unexposed attributes besides title and url. You can use the FME Data Inspector (Feature Information window) to check all the attributes that the resulting features contain.


mygis
Supporter
  • June 28, 2017

Hi @edhere,

Is this correct?


mygis
Supporter
  • June 28, 2017

The idea is to read the URL with an XML reader instead of the HTTPCaller. I am using FME 2016. The workspace is attached.

When you click the Parameters button, you can filter for any node in the XML file that you want to access.


mygis
Supporter
  • June 28, 2017
xml2none.fmw


mygis
Supporter
  • June 28, 2017

Answered above



@edhere Ed - the approach you take really does depend on what data you want to extract. But the general steps are:

  • use the approach you already have to read your query from Excel.
  • determine the XML node at which you want to split your records; it looks like it would be either:
    • searchRetrieveResponse/records/record or
    • searchRetrieveResponse/records/record/recordData

Tip: if you don't know the XML very well, add the XML reader and use the Elements to Match tree view to browse the XML and find the appropriate tag.

Copy the Selected Items. Once you have the selected item, cancel everything (i.e. don't actually add the XML reader to the workspace).
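If you'd rather inspect the structure outside FME, a rough equivalent of browsing that tree view is to list every distinct element path in a saved `_response_body`. A minimal Python sketch; `element_paths` is a hypothetical helper, and it reports local names only, dropping namespace prefixes.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def element_paths(xml_text):
    """Return each distinct element path (e.g. 'a/b/c') in document order."""
    paths = {}  # dict keeps first-insertion order and ignores repeats

    def walk(element, parent_path):
        name = local_name(element.tag)
        path = f"{parent_path}/{name}" if parent_path else name
        paths[path] = None
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return list(paths)


if __name__ == "__main__":
    sample = ("<searchRetrieveResponse xmlns='http://www.loc.gov/zing/srw/'>"
              "<records><record><recordData/></record>"
              "<record><recordData/></record></records>"
              "</searchRetrieveResponse>")
    for path in element_paths(sample):
        print(path)
```

Paths printed this way can be compared directly with candidates such as searchRetrieveResponse/records/record.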

  • use either the HTTPCaller (with an XMLFragmenter) or the FeatureReader; I'd suggest the FeatureReader
    • FeatureReader:
      • add the XML reader, Dataset: <attribute with URL>,
      • Parameters: Elements to Match: <selected items>, i.e. searchRetrieveResponse/records/record,
      • Flatten Options: Enable Flattening
  • HTTPCaller & XMLFragmenter will be more or less the same.

Example Workspace attached: xmlreader.fmw

There's a pretty good XML tutorial on the KnowledgeCentre that covers many of these topics.

