Question

Best way to flatten a huge complex xml file and extract all data

October 9, 2019

draganasubotic1

Hello,

I am dealing with XML files that contain more than 300 attributes nested deeply within lists, and the records represent point data. The final goal is to extract and flatten all of the data and then convert it into a gdb. So far I have been flattening the XML with the XML reader and then using AttributeExposer to expose as many lists and list elements as possible. However, this is quite time-consuming, since every element contains lists that need to be exposed, and it does not give a single solution that works for all of my XML files. Moreover, the extent differs from file to file. For example, I choose which elements to expose based on a reference XML file:

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID.type

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID.primary

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID.type

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID.primary

However, another XML file could have more than two elements in this list, and with this definition the extra elements would not be exposed.

Is there a way to extract/expose all list attributes from a flattened XML file?

Thank you in advance.

bruceharold
Contributor
October 9, 2019

Hi, when tackling moving-target XML I fall back to PythonCaller and xml.etree.ElementTree.
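A minimal sketch of that approach, assuming Python 3.6+ and only the standard-library xml.etree.ElementTree (nothing here is FME-specific). `flatten` walks the tree and builds dotted names in the same style as the AttributeExposer names in the question, adding {0}, {1}, ... only when a tag repeats among its siblings, so a Referencelist with two or twenty Reference elements is handled without predefining any names. The function names and the `always_list` parameter are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{urn:x}Reference' -> 'Reference'."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, path, out, always_list=frozenset()):
    """Flatten elem into out as {dotted.name: value}.

    Repeated sibling tags (and any tag in always_list) get {0}, {1}, ...
    indices, so lists of any length are captured without predefined names.
    """
    text = (elem.text or "").strip()
    if text:
        out[path] = text
    # XML attributes become '<element path>.<attribute name>'
    for attr, value in elem.attrib.items():
        out[f"{path}.{local_name(attr)}"] = value
    counts = Counter(local_name(child.tag) for child in elem)
    seen = Counter()
    for child in elem:
        tag = local_name(child.tag)
        if counts[tag] > 1 or tag in always_list:
            name = f"{tag}{{{seen[tag]}}}"  # e.g. Reference{2}
        else:
            name = tag
        seen[tag] += 1
        flatten(child, f"{path}.{name}", out, always_list)
    return out


def flatten_document(root, always_list=frozenset()):
    """Flatten a whole tree, starting the names at the root's local name."""
    return flatten(root, local_name(root.tag), {}, always_list)


SAMPLE = """<Content><Base><Relationshiplist><Relationship><Referencelist>
<Reference><ReferenceID type="a" primary="true">R1</ReferenceID></Reference>
<Reference><ReferenceID type="b" primary="false">R2</ReferenceID></Reference>
<Reference><ReferenceID type="c" primary="false">R3</ReferenceID></Reference>
</Referencelist></Relationship></Relationshiplist></Base></Content>"""

flat = flatten_document(ET.fromstring(SAMPLE))
for name, value in flat.items():
    print(name, "=", value)
```

With three Reference elements the names run up to Reference{2}.ReferenceID, .type and .primary, with no change to the code. One design caveat: a list with a single entry gets no index, so Reference.ReferenceID in one file could be Reference{0}.ReferenceID in another; passing always_list={"Reference"} forces the indexed form so names stay consistent across files.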

duvenagep
January 13, 2020

bruceharold wrote:

Hi, when tackling moving-target XML I fall back to PythonCaller and xml.etree.ElementTree.

Hi Bruce,

I am currently working on a project where I need to expose all elements of an .xml file and then write them to a database. This forms a small part of a larger process. Below is a sample of my XML:

Currently, my strategy is as follows:

XMLFragmenter (split each featureMember type, in this case Utiliteitsnet, into separate fragments)
AttributeFilter (route the fragments by featureMember type)
XMLFlattener (flatten the nested XML)
AttributeExposer (expose all the attributes that I need for the rest of the process)
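The four steps above can be approximated in plain Python with ElementTree.iterparse, which streams the file instead of building the whole tree in memory first. This is a sketch that assumes each feature sits directly inside a featureMember element (as in GML) and matches tags by local name so namespace prefixes don't matter; the `urn:example` namespace, `OtherType` and the `name` child in the sample are placeholders, not the real IMKL schema, and `iter_features`/`child_text` are hypothetical helper names. Whether this beats the XMLFragmenter route on a given file is something to time rather than assume.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so namespace prefixes don't matter."""
    return tag.rsplit("}", 1)[-1]


def iter_features(source, wanted_types=None, member_tag="featureMember"):
    """Stream (feature_type, element) pairs from a large GML-style file.

    Roughly XMLFragmenter + AttributeFilter: each child of a featureMember
    is one fragment, and its local tag name is the feature type.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != member_tag:
            continue
        for feature in elem:
            ftype = local_name(feature.tag)
            if wanted_types is None or ftype in wanted_types:
                yield ftype, feature
        # Free the subtree once the caller has processed it; an empty
        # featureMember shell stays attached to its parent.
        elem.clear()


def child_text(elem, name):
    """Text of the first direct child whose local name is `name`."""
    for child in elem:
        if local_name(child.tag) == name:
            return (child.text or "").strip()
    return None


SAMPLE = b"""<FeatureCollection xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:x="urn:example">
  <gml:featureMember><x:Utiliteitsnet><x:name>net A</x:name></x:Utiliteitsnet></gml:featureMember>
  <gml:featureMember><x:OtherType><x:name>other</x:name></x:OtherType></gml:featureMember>
  <gml:featureMember><x:Utiliteitsnet><x:name>net B</x:name></x:Utiliteitsnet></gml:featureMember>
</FeatureCollection>"""

# Keep only Utiliteitsnet features; each element is used before it is cleared.
names = [
    child_text(feature, "name")
    for _ftype, feature in iter_features(io.BytesIO(SAMPLE), {"Utiliteitsnet"})
]
print(names)
```

Because the elements are cleared as soon as the generator resumes, the caller has to extract what it needs from each feature before asking for the next one; storing the elements themselves for later would leave you with empty shells.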

See sample workspace below:

This works well enough and I get the desired output. However, as soon as an .xml file gets larger than 50 MB, this XML step takes a long time and becomes the bottleneck in my process, and most of my .xml files are much larger.

Question:

I want to pursue the PythonCaller (xml.etree.ElementTree) option: parse the whole file into its constituent elements and then use that output in the rest of the process. Do you think I will gain a significant performance increase?

Secondly, do you have any examples of how to implement this within FME? I have a small sample .xml file and a section of Python code that outputs the desired value, but I am struggling to get that output out of the PythonCaller transformer.
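For getting values out of a PythonCaller, the usual pattern is a class whose input() method builds new features and hands each one to self.pyoutput(). The sketch below assumes the PythonCaller's class parameter is set to FeatureProcessor, that the incoming feature carries the XML file path in an attribute called xml_path, and that each record is a featureMember element; xml_path and the RECORD_TAG/PATH_ATTR constants are hypothetical and should be adjusted. It uses only getAttribute, setAttribute and clone on the incoming feature, so it does not import fmeobjects itself (the default PythonCaller template does import fme and fmeobjects). Names containing {i} are set as ordinary attributes, which FME should treat as list attributes, but they will likely still need to be exposed on the PythonCaller before downstream transformers can see them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag):
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def _join(path, name):
    return f"{path}.{name}" if path else name


def _flatten(elem, path, out):
    """Dotted names; repeated siblings get {0}, {1}, ... (e.g. Reference{1}.ReferenceID)."""
    text = (elem.text or "").strip()
    if text and path:
        out[path] = text
    for attr, value in elem.attrib.items():
        out[_join(path, _local(attr))] = value
    counts = Counter(_local(c.tag) for c in elem)
    seen = Counter()
    for child in elem:
        tag = _local(child.tag)
        name = f"{tag}{{{seen[tag]}}}" if counts[tag] > 1 else tag
        seen[tag] += 1
        _flatten(child, _join(path, name), out)
    return out


class FeatureProcessor(object):
    """PythonCaller class: emits one FME feature per record element."""

    RECORD_TAG = "featureMember"  # hypothetical: local name of one record
    PATH_ATTR = "xml_path"        # hypothetical: attribute holding the XML path

    def __init__(self):
        pass

    def input(self, feature):
        path = feature.getAttribute(self.PATH_ATTR)
        for _event, elem in ET.iterparse(path, events=("end",)):
            if _local(elem.tag) != self.RECORD_TAG:
                continue
            out = feature.clone()  # keeps the incoming attributes
            for name, value in _flatten(elem, "", {}).items():
                out.setAttribute(name, value)
            self.pyoutput(out)  # pyoutput is supplied by the PythonCaller
            elem.clear()        # release the parsed record

    def close(self):
        pass
```

Emitting one feature per record, rather than one huge feature per file, keeps each feature small and lets the rest of the workspace (for example the gdb writer) process records as they stream out.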

Thanks in advance
