Question

How to extract data from website using HTTPCaller? (SRU) -> return is XML


Badge

Hi all,

FME beginner here.

 

I'm trying to process data from a Dutch government website.

 

One can request data using search keys in the URL (SRU I believe?) - the response is XML.

e.g.

https://zoek.officielebekendmakingen.nl/sru/Search?version=1.2&operation=searchRetrieve&x-connection=oep&startRecord=1&maximumRecords=10&query=title=%rotonde%

I'm trying to put in multiple search strings and process the results / output in FME.

What I have now:

Excel file with search strings connected to HTTPCaller.

HTTPCaller setup:

Request URL has "@Value(Search string)" referring to input Excel file
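
For readers who want to see the same idea outside FME, here is a minimal Python sketch of building one request URL per search string. The parameter names and values come from the example URL above; the `build_sru_url` helper and the list of search strings are hypothetical stand-ins for the Excel column. Note that `urlencode` percent-encodes the `%` and `=` characters in the query, which is an assumption about what the server accepts, not something confirmed in this thread.

```python
from urllib.parse import urlencode

BASE_URL = "https://zoek.officielebekendmakingen.nl/sru/Search"

def build_sru_url(search_term, start_record=1, maximum_records=10):
    """Return a searchRetrieve URL for one title search term (hypothetical helper)."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "x-connection": "oep",
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        # Same query shape as the example URL: title=%term%
        "query": f"title=%{search_term}%",
    }
    # urlencode percent-encodes reserved characters such as "%" and "="
    return f"{BASE_URL}?{urlencode(params)}"

# Hypothetical search strings standing in for the Excel column
search_strings = ["rotonde", "fietspad"]
urls = [build_sru_url(term) for term in search_strings]
for url in urls:
    print(url)
```

In FME the equivalent is the `@Value(Search string)` reference in the HTTPCaller Request URL; the sketch only shows how the URL is assembled.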

Output / errors:

*Edit

When inspecting the _response_body it seems I do have some XML data.

 

My next question, how to process this data? What transformers should I use next?

Many thanks,

Ed


16 replies

Userlevel 4
Badge +30

Hi @edhere,

I tried opening this URL in my browser, but it doesn't work.

Userlevel 4
Badge +30
Hi @edhere, what kind of data would you like to download?

 

Danilo

 

Badge

For me, it's working with both the browser and the HTTPCaller (the _response_body attribute contains the returned XML).

Userlevel 4
Badge +25
When you say "no luck", what happens? Is there a crash? An error message? Or just a feature is output with no data? Can you post a screenshot of the transformer parameters, so that we can see what settings you are using? Thanks!

 

Userlevel 4
Badge +30

I tried it again on my machine and now it works. :)
Badge

Hi all,

Thanks for your responses. I have updated the start post with more info.

 

Hope this makes sense.

Thanks,

 

Ed
Userlevel 2
Badge +17

Hi @edhere,

> What transformers should I use next?

Generally you can use the XMLFragmenter and/or the XMLFlattener to extract some values contained by an XML document as feature attributes. In some cases, the XMLXQueryExploder or the XMLXQueryExtractor could also be helpful. The concrete solution depends on how you need to interpret the XML document.

Badge +2

Hello @edhere, could you let us know specifically which data you are looking for in the XML? If it is a single value, you could extract it with a regular expression; if it is more complex, it is better to treat the response as an XML document and use the XML handling transformers, i.e. the transformers mentioned by @takashi.
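
As a rough illustration of the single-value case, the sketch below pulls the first `<url>` value out of a response with a regular expression. The `response_body` string is a hypothetical fragment standing in for the `_response_body` attribute, not real output from the service. This only suits quick one-off extraction; anything with repeated or nested elements is better handled by an XML parser.

```python
import re

# Hypothetical fragment standing in for _response_body
response_body = """
<record>
  <dcterms:title>Example title</dcterms:title>
  <url>https://example.org/doc-1.html</url>
</record>
"""

# Lazy match so only the text of the first <url> element is captured
match = re.search(r"<url>\s*(.*?)\s*</url>", response_body, re.DOTALL)
first_url = match.group(1) if match else None
print(first_url)
```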

Badge
Hi, gisinnovationsb

 

 

I've checked the XML contained in the _response_body, let's start with:

 

<dcterms:title>*randomtext*</dcterms:title>
 <url>*randomurl*</url> 
How would I extract the data in title and url?

 

 

Many thanks,

 

Ed

 

Userlevel 2
Badge +17

If you need to extract the values of the descendant elements (e.g. <title>, <url>) of the <record> element for each record, the XMLFragmenter with this setting might help you.

 

Just be aware that the transformer also extracts unexposed attributes besides title and url. You can use the Feature Information window in FME Data Inspector to check all the attributes that the resulting feature contains.
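
To make the per-record extraction concrete, here is a small Python sketch of the same idea: split the document on `<record>` and pick out the title and url values below each one. The `SAMPLE` document is a hypothetical, heavily simplified response, and `local_name` and `extract_records` are illustrative helpers. Elements are matched on their local name so the sketch does not depend on the exact namespaces of the real response. It is not a description of how the XMLFragmenter works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified response; the real document is larger
SAMPLE = """<searchRetrieveResponse xmlns:dcterms="http://purl.org/dc/terms/">
  <records>
    <record>
      <recordData>
        <dcterms:title>Title A</dcterms:title>
        <url>https://example.org/a.html</url>
      </recordData>
    </record>
    <record>
      <recordData>
        <dcterms:title>Title B</dcterms:title>
        <url>https://example.org/b.html</url>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def extract_records(xml_text, wanted=("title", "url")):
    """Return one dict per <record>, holding the first value of each wanted element."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter():
        if local_name(element.tag) != "record":
            continue
        row = {}
        for child in element.iter():
            name = local_name(child.tag)
            if name in wanted and name not in row:
                row[name] = (child.text or "").strip()
        rows.append(row)
    return rows

rows = extract_records(SAMPLE)
for row in rows:
    print(row["title"], row["url"])
```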

 

Badge +2

hi @edhere,

 

 

Is this correct?

 

 

The idea is to read the URL with an XML reader rather than with the HTTPCaller. I am using FME 2016. The workspace is attached.

 

 

 

When you click the Parameters button, you can select whichever nodes of the XML file you want to access.

 

 

 

 

 

xml2none.fmw

 

 

Badge +2
The question about extracting the title and url values is answered above.

 

 

Badge +2

@edhere Ed - the approach you take really does depend on what data you want to extract. But the general steps are:

  • use the approach you already have to read your query from Excel.
  • determine the XML node on which you want to split your records - it looks like it would be either:
    • searchRetrieveResponse/records/record or
    • searchRetrieveResponse/records/record/recordData

Tip: if you don't know the XML very well, add the XML reader and use the tree view of the XML Elements to Match parameter to browse the XML and find the appropriate tag. Copy the Selected Items; once you have them, cancel everything (i.e. don't actually add the XML reader to the workspace).

  • use either HTTPCaller (with XMLFragmenter) OR use the FeatureReader - I think I'd suggest FeatureReader
    • FeatureReader:
      • add the XML reader, Dataset: <attribute with URL>,
      • Parameters: Elements to Match: <selected items>, i.e. searchRetrieveResponse/records/record,
      • Flatten Options: Enable Flattening
  • HTTPCaller & XMLFragmenter will be more or less the same.
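
The split-and-flatten steps above can be sketched in Python as a loose analogy. The `SAMPLE` document is hypothetical and leaves out namespaces for brevity, and the dotted attribute names produced by the illustrative `flatten` helper are an assumption for this sketch, not necessarily the names FME generates when flattening is enabled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response without namespaces
SAMPLE = """<searchRetrieveResponse>
  <records>
    <record>
      <recordData>
        <title>Title A</title>
        <url>https://example.org/a.html</url>
      </recordData>
      <recordPosition>1</recordPosition>
    </record>
  </records>
</searchRetrieveResponse>"""

def flatten(element, prefix=""):
    """Collect leaf text values under dotted names built from the tag path."""
    attributes = {}
    for child in element:
        name = f"{prefix}{child.tag}"
        if len(child):  # element has children of its own: recurse
            attributes.update(flatten(child, name + "."))
        else:
            attributes[name] = (child.text or "").strip()
    return attributes

root = ET.fromstring(SAMPLE)
# root is searchRetrieveResponse, so the match path is written relative to it
features = [flatten(record) for record in root.findall("records/record")]
for feature in features:
    print(feature)
```

Each dict plays the role of one output feature: one per matched `record` element, with the nested values flattened into attributes.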

Example Workspace attached: xmlreader.fmw

There's a pretty good XML Tutorial on the KnowledgeCentre that covers many of these topics.
