Hi all,

FME beginner here. I'm trying to process data from a Dutch government website. You can request data by putting search keys in the URL (SRU, I believe?), and the response is XML, e.g. https://zoek.officielebekendmakingen.nl/sru/Search?version=1.2&operation=searchRetrieve&x-connection=oep&startRecord=1&maximumRecords=10&query=title=%rotonde%

I'm trying to feed in multiple search strings and process the results/output in FME.

What I have now:
- An Excel file with the search strings, connected to an HTTPCaller.
- HTTPCaller setup: the Request URL contains "@Value(Search string)", referring to the input Excel file.

Output / errors:

*Edit: When inspecting the _response_body, it seems I do have some XML data. My next question: how do I process this data? Which transformers should I use next?

Many thanks,
Ed
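For reference, the per-row request the HTTPCaller makes can be sketched in plain Python. This is a minimal sketch, assuming the fixed parameters from the example URL above; the function name `build_sru_url` and the sample search strings are hypothetical. It also shows one detail worth checking in the HTTPCaller setup: the search string is URL-encoded before being substituted, which matters if a value contains spaces. The `title=...` query form simply follows the example URL; whether the server expects wildcards is not covered here.

```python
from urllib.parse import urlencode

# Endpoint and fixed parameters copied from the example URL in the question.
SRU_BASE = "https://zoek.officielebekendmakingen.nl/sru/Search"


def build_sru_url(search_string, start_record=1, maximum_records=10):
    """Build one SRU 1.2 searchRetrieve URL for a single search string.

    Mirrors substituting @Value(Search string) into the HTTPCaller's
    Request URL, except that urlencode() percent-encodes the value.
    """
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "x-connection": "oep",
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        # Query in the title=... form used by the example URL (assumption).
        "query": f"title={search_string}",
    }
    return f"{SRU_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search strings standing in for the Excel rows.
    for s in ["rotonde", "rode loper"]:
        print(build_sru_url(s))
```

With the default `quote_plus` encoding, `=` inside the query value becomes `%3D` and spaces become `+`; the server decodes both.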

How to extract data from website using HTTPCaller? (SRU) -> return is XML

danilo_fme
Evangelist
27 June 2017

Hi @edhere,

I tried this URL in my browser, but it seems to be wrong.

danilo_fme
Evangelist
27 June 2017

Hi @edhere, what kind of data would you like to download?

Danilo

For me, it's working with both the browser and the HTTPCaller (the _response_body attribute contains the returned XML).

When you say "no luck", what happens? Is there a crash? An error message? Or just a feature is output with no data? Can you post a screenshot of the transformer parameters, so that we can see what settings you are using? Thanks!

danilo_fme
Evangelist
27 June 2017

Hi @edhere,

I tried this URL in my browser earlier and it seemed wrong, but I have now tried it on my machine and it works. :)

edhere
Author
28 June 2017

Hi all,

Thanks for your responses. I have updated the original post with more info.

Hope this makes sense.

Thanks,

Ed

takashi
Contributor
28 June 2017

Hi @edhere,

> What transformers should I use next?

Generally you can use the XMLFragmenter and/or the XMLFlattener to extract values contained in an XML document as feature attributes. In some cases, the XMLXQueryExploder or the XMLXQueryExtractor could also be helpful. The concrete solution depends on how you need to interpret the XML document.
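To make "extract values as feature attributes" concrete, here is a rough plain-Python illustration of the flattening idea: nested elements become attributes named after their path. It is not FME's implementation. The function names and sample XML are hypothetical, XML attributes and mixed content are ignored, and the `{0}`/`{1}` suffix for repeated elements only loosely imitates FME list attribute naming.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, prefix=""):
    """Return {dotted.path: text} for every leaf element below elem."""
    children = list(elem)
    if not children:
        # Leaf: its stripped text becomes the attribute value.
        return {prefix or local_name(elem.tag): (elem.text or "").strip()}

    # Count siblings per name so repeated elements can be indexed.
    totals = {}
    for child in children:
        name = local_name(child.tag)
        totals[name] = totals.get(name, 0) + 1

    result, seen = {}, {}
    for child in children:
        name = local_name(child.tag)
        if totals[name] > 1:
            index = seen.get(name, 0)
            seen[name] = index + 1
            name = f"{name}{{{index}}}"  # e.g. record{0}
        path = f"{prefix}.{name}" if prefix else name
        result.update(flatten(child, path))
    return result


# Hypothetical, simplified response fragment.
SAMPLE = """<records xmlns:dcterms="http://purl.org/dc/terms/">
  <record><dcterms:title>Rotonde A</dcterms:title><url>https://example.org/a</url></record>
  <record><dcterms:title>Rotonde B</dcterms:title><url>https://example.org/b</url></record>
</records>"""

if __name__ == "__main__":
    for key, value in flatten(ET.fromstring(SAMPLE)).items():
        print(key, "=", value)
```

Flattening the whole document yields one feature with indexed attributes; fragmenting first (one feature per record) is usually easier to work with downstream.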

Hello @edhere, could you let us know which data you are looking for in the XML? Could you be specific? If it is a single value, you could extract it using a regular expression; if it is more complex, it is better to treat it as an XML document and use the XML-handling transformers. Those are the transformers cited by @takashi.
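For the single-value case, a regular-expression sketch in Python might look like this. It assumes the element is written literally as `dcterms:title` (any other prefix would not match), returns only the first match, does not handle self-closing tags, and does not decode XML entities such as `&amp;`. Those limits are why the XML transformers are the better choice for anything more complex. The names `TITLE_RE` and `first_title` are hypothetical.

```python
import re

# First <dcterms:title ...>...</dcterms:title> pair; attributes on the start
# tag are tolerated, and the non-greedy group stops at the first closing tag.
TITLE_RE = re.compile(r"<dcterms:title\b[^>]*>(.*?)</dcterms:title>", re.DOTALL)


def first_title(response_body):
    """Return the text of the first dcterms:title element, or None."""
    match = TITLE_RE.search(response_body)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    body = "<record><dcterms:title>Rotonde A</dcterms:title></record>"
    print(first_title(body))  # Rotonde A
```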

edhere
Author
28 June 2017

Hi, gisinnovationsb

I've checked the XML contained in the _response_body. Let's start with:

<dcterms:title>*randomtext*</dcterms:title>

<url>*randomurl*</url>

How would I extract the data in title and url?

Many thanks,

Ed

takashi
Contributor
28 June 2017

Hi @edhere,

If you need to extract the values of the descendant elements (e.g. <title>, <url>) of each <record> element, the XMLFragmenter, set to fragment at the <record> element with flattening enabled, might help you.

Just be aware that the transformer will also extract attributes other than title and url, which are not exposed. You can use the FME Data Inspector (Feature Information window) to check all the attributes the resulting features contain.
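A plain-Python equivalent of "one feature per <record>, with title and url as attributes" could look like the sketch below, e.g. as a starting point for a PythonCaller. Assumptions: records are the elements whose local name is `record`, the first descendant named `title` or `url` (in any namespace) holds the wanted value, and the sample response is hypothetical and simplified; the real response contains more elements. `extract_records` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def extract_records(response_body, record_tag="record", fields=("title", "url")):
    """Return one dict per record element holding the first value of each field."""
    root = ET.fromstring(response_body)
    rows = []
    for record in root.iter():
        if local_name(record.tag) != record_tag:
            continue
        row = {}
        # iter() walks the record and all its descendants, so the nesting
        # depth inside recordData does not matter.
        for elem in record.iter():
            name = local_name(elem.tag)
            if name in fields and name not in row:
                row[name] = (elem.text or "").strip()
        rows.append(row)
    return rows


# Hypothetical, simplified SRU response.
SAMPLE = """<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <sru:records>
    <sru:record><sru:recordData>
      <dcterms:title>Rotonde A</dcterms:title><url>https://example.org/a</url>
    </sru:recordData></sru:record>
    <sru:record><sru:recordData>
      <dcterms:title>Rotonde B</dcterms:title><url>https://example.org/b</url>
    </sru:recordData></sru:record>
  </sru:records>
</sru:searchRetrieveResponse>"""

if __name__ == "__main__":
    for row in extract_records(SAMPLE):
        print(row)
```

Matching on local names keeps the sketch independent of namespace prefixes; the trade-off is that an unrelated `title` element in another namespace would also match.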

hi @edhere,

Is this correct?

The idea is to read the URL with an XML reader instead of the HTTPCaller. I am using FME 2016. The workspace is attached.

When you click the Parameters button, you can filter for any node in the XML file that you want to access.

xml2none.fmw

@edhere, I answered this in my post above.

@edhere Ed, the approach you take really depends on what data you want to extract, but the general steps are:

1. Use the approach you already have to read your queries from Excel.
2. Determine the XML node at which you want to split your records. It looks like it would be either:
   - searchRetrieveResponse/records/record, or
   - searchRetrieveResponse/records/record/recordData

   Tip: if you don't know the XML very well, add an XML reader and use its "Elements to Match" tree view to browse the XML and find the appropriate tag. Copy and paste the Selected Items. Once you have them, cancel everything (i.e. don't actually add the XML reader to the workspace).
3. Use either the HTTPCaller (with an XMLFragmenter) or the FeatureReader; I think I'd suggest the FeatureReader:
   - FeatureReader:
     - add the XML reader, Dataset: <attribute with URL>
     - Parameters: Elements to Match: <selected items>, i.e. searchRetrieveResponse/records/record
     - Flatten Options: Enable Flattening

   The HTTPCaller & XMLFragmenter route will be more or less the same.
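The node-picking step and the split can also be sketched in plain Python, for example to check candidate paths before configuring Elements to Match. This is only an illustration under assumptions: the hypothetical `element_paths` roughly mimics browsing the tree view by counting every distinct path (local names only, namespaces ignored), the hypothetical `fragments_at` returns the elements at one such path, and the sample XML is a simplified, made-up SRU response. A path that occurs once per record, such as searchRetrieveResponse/records/record, is a natural split point.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop the '{namespace-uri}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def element_paths(xml_text):
    """Count each distinct slash-separated element path in the document."""
    counts = Counter()

    def walk(elem, parent_path):
        name = local_name(elem.tag)
        path = f"{parent_path}/{name}" if parent_path else name
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def fragments_at(xml_text, split_path):
    """Return the elements at split_path, e.g. searchRetrieveResponse/records/record."""
    parts = split_path.split("/")
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != parts[0]:
        return []
    current = [root]
    for part in parts[1:]:
        # Step one level down, keeping only children with the wanted name.
        current = [child for elem in current for child in elem
                   if local_name(child.tag) == part]
    return current


# Hypothetical, simplified SRU response.
SAMPLE = """<sru:searchRetrieveResponse xmlns:sru="http://www.loc.gov/zing/srw/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <sru:numberOfRecords>2</sru:numberOfRecords>
  <sru:records>
    <sru:record><sru:recordData>
      <dcterms:title>Rotonde A</dcterms:title><url>https://example.org/a</url>
    </sru:recordData></sru:record>
    <sru:record><sru:recordData>
      <dcterms:title>Rotonde B</dcterms:title><url>https://example.org/b</url>
    </sru:recordData></sru:record>
  </sru:records>
</sru:searchRetrieveResponse>"""

if __name__ == "__main__":
    for path, n in element_paths(SAMPLE).items():
        print(n, path)
    print(len(fragments_at(SAMPLE, "searchRetrieveResponse/records/record")))
```

Splitting at record or at recordData gives the same number of fragments here; the difference is only whether the record-level wrapper elements end up in each fragment.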

Example Workspace attached: xmlreader.fmw

There's a pretty good XML tutorial on the KnowledgeCentre that covers many of these topics.
