Question

Best way to flatten a huge, complex XML file and extract all data

  • 9 October 2019
  • 2 replies
  • 92 views

Hello,

I am dealing with XML files that contain more than 300 attributes deeply nested within lists; the records represent point data. The final goal is to extract and flatten all the data and then convert it into a gdb. So far I have been flattening the XML through the XML reader and then using an AttributeExposer to expose as many lists and list elements as possible. However, this is quite time-consuming, since every element contains lists that need to be exposed, and it does not give a single solution that works for all the XMLs I have. Moreover, the extent differs from one XML to the next. For example, I chose to expose certain elements based on a reference XML file:

Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID
Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID.type
Content.Base.Relationshiplist.Relationship.Referencelist.Reference{0}.ReferenceID.primary
Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID
Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID.type
Content.Base.Relationshiplist.Relationship.Referencelist.Reference{1}.ReferenceID.primary

However, another XML could have more than just two elements in this list, and with this definition those would not be included.

Is there a way to extract/expose all list attributes from a flattened XML file?

Thank you in advance.

2 replies

Hi, when tackling moving-target XML I fall back to PythonCaller and xml.etree.ElementTree.
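
The reply gives no code, so the following is only a minimal sketch of what an xml.etree.ElementTree approach could look like, not the poster's actual script. It walks the tree recursively and builds names in the same style as the list above (`Reference{0}.ReferenceID.type`), adding an index whenever a tag repeats under the same parent, so a third or fourth `Reference` is picked up without being declared in advance. The function name `flatten` and the sample values are made up for illustration; namespaced files would also need the `{uri}` prefix stripped from each tag.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(elem, prefix=""):
    """Return a flat {attribute_name: value} dict for elem and all descendants.

    Hypothetical helper, not part of FME or ElementTree.
    """
    path = prefix or elem.tag
    flat = {}

    # Element text becomes the value stored under the element's own path.
    text = (elem.text or "").strip()
    if text:
        flat[path] = text

    # XML attributes are appended as path.attributeName.
    for name, value in elem.attrib.items():
        flat[f"{path}.{name}"] = value

    # Siblings that share a tag get a {index} suffix, like an FME list.
    totals = Counter(child.tag for child in elem)
    seen = Counter()
    for child in elem:
        child_path = f"{path}.{child.tag}"
        if totals[child.tag] > 1:
            child_path += f"{{{seen[child.tag]}}}"
            seen[child.tag] += 1
        flat.update(flatten(child, child_path))
    return flat


# Invented sample that mirrors the structure in the question, with three references.
SAMPLE = """<Content><Base><Relationshiplist><Relationship><Referencelist>
<Reference><ReferenceID type="a" primary="true">1</ReferenceID></Reference>
<Reference><ReferenceID type="b" primary="false">2</ReferenceID></Reference>
<Reference><ReferenceID type="c" primary="false">3</ReferenceID></Reference>
</Referencelist></Relationship></Relationshiplist></Base></Content>"""

flat = flatten(ET.fromstring(SAMPLE))
for key in sorted(flat):
    print(key, "=", flat[key])
```

One caveat with indexing only repeated tags: a file with a single `Reference` yields `Reference.ReferenceID` rather than `Reference{0}.ReferenceID`. If names must be identical across files, indexing every child unconditionally may be the safer choice.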

Hi Bruce,

I am currently working on a project where I need to expose all elements of an .xml file and then write them to a database. This forms a small part of a larger process. Below is a sample of my XML:

Currently, my strategy is as follows:

  1. XMLFragmenter (fragment all the different featuremember types, in this case Utiliteitsnet, into separate fragments)
  2. AttributeFilter (filter all fragments into the different types of featuremembers)
  3. XMLFlattener (flatten the nested XML)
  4. AttributeExposer (expose all the attributes that I need for the rest of the process)

See sample workspace below:

This works well enough and I get the desired output. However, as soon as my .xml file gets larger than 50 MB, this "XML process" takes a long time and becomes the bottleneck in my process. Most of my .xml files are much larger.

Question:

I want to pursue the PythonCaller (xml.etree.ElementTree) option and parse the whole file into its constituent elements, then use that output in the rest of the process. Do you think I will gain a significant performance increase?

Secondly, do you have any examples of how to implement this within FME? I have a small sample .xml file with a section of Python code that outputs the desired value, but I am struggling to get the output from the PythonCaller transformer.

Thanks in advance
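
For files well above 50 MB, one option worth testing is `xml.etree.ElementTree.iterparse`, which processes one record at a time instead of loading the whole tree. The sketch below is an assumption about how that could look, not a measured answer to the performance question. `iter_records`, `flatten`, `local_name`, the `thema` element and the sample data are all hypothetical; only `featureMember` and `Utiliteitsnet` come from the post. Every child is indexed here, so attribute names stay the same whether a list has one entry or many.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return tag.rsplit("}", 1)[-1]


def flatten(elem, path, out):
    """Write elem and its descendants into out as flat name/value pairs."""
    text = (elem.text or "").strip()
    if text:
        out[path] = text
    for name, value in elem.attrib.items():
        out[f"{path}.{local_name(name)}"] = value
    counts = {}
    for child in elem:
        name = local_name(child.tag)
        index = counts.get(name, 0)
        counts[name] = index + 1
        # Always index, so names do not depend on how many siblings exist.
        flatten(child, f"{path}.{name}{{{index}}}", out)
    return out


def iter_records(source, record_tag):
    """Yield one flat dict per record element while streaming through source."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == record_tag:
            yield flatten(elem, record_tag, {})
            # Drop the record's children so memory does not grow with file size.
            elem.clear()


# Invented sample; a real file path could be passed instead of BytesIO.
SAMPLE = b"""<root xmlns:x="urn:example">
<x:featureMember><x:Utiliteitsnet id="n1"><x:thema>water</x:thema></x:Utiliteitsnet></x:featureMember>
<x:featureMember><x:Utiliteitsnet id="n2"><x:thema>gas</x:thema><x:thema>riool</x:thema></x:Utiliteitsnet></x:featureMember>
</root>"""

records = list(iter_records(io.BytesIO(SAMPLE), "featureMember"))
for record in records:
    print(record)
```

Inside a PythonCaller, each dict would presumably be copied onto a new feature with `setAttribute` and emitted with `pyoutput`; that wiring is not shown here and should be checked against the PythonCaller template of your FME version. Attributes created this way are not known to the workspace in advance, so they would likely still need to be exposed downstream.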
