Skip to main content

I have tried to meka a flat file out of an xml file but I can't get the parent data onto every row of the child data. How can it be done?

here is snippet of the xml file:

<?xml version="1.0" encoding="UTF-8"?>
<event Type="Relay">
<eventname>Lappee-Jukola 2016, Jukolan Viesti</eventname>
<class>
<classname>JU</classname>
<team>
<teamid>9</teamid>
<teamname>Koovee</teamname>
<teamnro>1</teamnro>
<result>8:03:46</result>
<tsecs>29026</tsecs>
<placement>1</placement>
<leg>
<legnro>1</legnro>
<nm>Topi Anjala</nm>

,....

so I'd need to get all this data on the same row for each result:

eventname, teamid, teamname, teamnro, result, tsecs, placement, legnro, nm

Jukola,9,Koovee,1,8:03:46,29026,1,1,Topi Anjala

You can use the XML reader, just remember to specify the "event" tag as the root level and then activate flattening. Example:

The resulting feature will look like this:


This does not work for me! I only get the event! All others are lost (only included as xml Fragment)


This does not work for me! I only get the event! All others are lost (only included as xml Fragment)

Are you sure you ticked the "Enable Flattening" box? That's the one that should pull out all the attributes.

Can you post your XML file here so we can have a closer look? What version of FME are you using?


This does not work for me! I only get the event! All others are lost (only included as xml Fragment)

I have FME 2016 and it does not seem to do that. It only creates list elements.


I have FME 2016 and it does not seem to do that. It only creates list elements.

My example (above) does not contain list elements because the sample dataset did not contain more than one child element per parent. If your data has several teams (child) per event (parent), then you will get lists.

You can use a ListExploder or a ListIndexer to expand your lists.


Are you sure you ticked the "Enable Flattening" box? That's the one that should pull out all the attributes.

Can you post your XML file here so we can have a closer look? What version of FME are you using?

Unfortunately it is too big to upload even zipped, but here it can be downloaded: http://results.jukola.com/tulokset/results_j2016_ju.xml


My example (above) does not contain list elements because the sample dataset did not contain more than one child element per parent. If your data has several teams (child) per event (parent), then you will get lists.

You can use a ListExploder or a ListIndexer to expand your lists.

Ok, I see. I tried the Listexploder but it was very slow and would "never" have finished


The whole xml file can be found here. I would appreciate it if you can try it out!

http://results.jukola.com/tulokset/results_j2016_ju.xml


Ok, I see. I tried the Listexploder but it was very slow and would "never" have finished

Make sure to remove as many unnecessary attributes as possible before the ListExploder. But yeah, it's known for being slow for huge datasets as it eats up memory real fast.

If the log contains a lot of messages about optimizing memory, you could maybe also consider trying the 64-bit version of FME.


Hi @jan

so I'd need to get all this data on the same row for each result

That's the part I'm wondering about. Do you mean the "result" tag? Maybe a record per team? I mean I'm looking at the file and there is only a single event tag in there, so using that as the Element to Match isn't going to do much. You'll just get the entire XML file stored as a list element, which - as you found - is not particularly helpful.

In fact, as an aside, I can't even open this in IE without it crashing, so it must be a fairly complex file.

Anyway, it depends what level of detail you want. If you use "team" as the Element to Match then you get this structure (1716 records):

placementresultteamidteamnameteamnrotsecs18:03:469Koovee129026

If you want the individual leg times for each team, that's where you need to use the ListExploder. Then you get a record per leg for each team (12012 records). For me that runs in about 40 seconds.

In short, change the Element to Match to "team" and then use the ListExploder on the leg list.

Hope that helps.

Mark


I think David's suggestion (read/flatten 'event' element with the XML reader and retrieve required features with the ListExploder) is a way to go. The performance can be improved by removing unnecessary attributes before exploding, as David mentioned.

Alternatively, you can read <event>, <team> and <leg> elements separately without flattening unnecessary sub-elements, and then merge them to form required records. The advantage of this method is that unnecessary sub-elements (i.e. <control>) will not be flattened. The disadvantage is that <team>, <leg> elements will be fragmented wastefully. However, I expect the performance would be better on the whole.

In this example, I assume that an XML document contains a single event data, so I've exposed 'fme_dataset' in the reader feature types and used it as join key for merging.


I think David's suggestion (read/flatten 'event' element with the XML reader and retrieve required features with the ListExploder) is a way to go. The performance can be improved by removing unnecessary attributes before exploding, as David mentioned.

Alternatively, you can read <event>, <team> and <leg> elements separately without flattening unnecessary sub-elements, and then merge them to form required records. The advantage of this method is that unnecessary sub-elements (i.e. <control>) will not be flattened. The disadvantage is that <team>, <leg> elements will be fragmented wastefully. However, I expect the performance would be better on the whole.

In this example, I assume that an XML document contains a single event data, so I've exposed 'fme_dataset' in the reader feature types and used it as join key for merging.

Looking good! thanks


I also have to parse an XML document. The suggestions below work when I am reading from a file, but I am reading this from a WFS page where I have to login. I am doing this with the HTTPCaller and then passing the _response_body to an XMLFlattener. I am using the same path to the tags that I use when reading from a file and have enabled flattening, but when reading the _response_body it only reads the first FeatureType tag. How can I get it to read all the tags? Here is a sample of the XML:

<FeatureTypeList><Operations><Query/></Operations><FeatureType><Name>layer1</Name><Title>Health care</Title><SRS>EPSG:4326</SRS></FeatureType><FeatureType><Name>layer1</Name><Title>Facilities</Title><SRS>EPSG:4326</SRS></FeatureType></FeatureTypeList>

 


Reply