
I have some XML data in the structure

<a>
 <b>
  <e>data1</e>
  <e>data2</e>
 </b>
 <c>
  <e>data3</e> 
  <e>data4</e>
 </c> 
 <d>
  <e>data5</e>
 </d>
</a>

I would like one feature for every 'e' element, which is simple enough with Feature Paths: set the Elements to Match parameter to e on the XML reader.

 

 

What I haven't been able to figure out is how to indicate which parent each feature belongs to (data1 and data2 belong to b, data3 and data4 to c, data5 to d).

 

 

Adding Ancestor Elements (Parent) in the XML Flatten Options doesn't help because there are no attributes associated with b, c, or d.

 

 

Even if I set the Elements to Match to a/b/e a/c/e a/d/e explicitly rather than just e, the xml_matched_element is still just e.

 

I can't guarantee the order of the parent nodes, so using the xml_id is not an option either.

 

I don't think it's possible to extract the name of the parent node (b, c, d) with a feature paths configuration that extracts the child elements (e) as features.

A possible workaround: if you configure feature paths to extract the parent nodes as features, their names are stored in "fme_feature_type". You can then fragment and flatten the child elements (e) with the XMLFragmenter, preserving "fme_feature_type".
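To illustrate the idea outside FME, here is a minimal Python sketch of the same two-step logic: visit each parent node first, remember its name, then emit one record per child e. It uses only the standard library; the function name `children_with_parent` and the record keys are hypothetical and are not part of FME or the XMLFragmenter.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<a>
 <b>
  <e>data1</e>
  <e>data2</e>
 </b>
 <c>
  <e>data3</e>
  <e>data4</e>
 </c>
 <d>
  <e>data5</e>
 </d>
</a>"""


def children_with_parent(xml_text, child_tag="e"):
    """Return one record per child element, tagged with its parent's name."""
    root = ET.fromstring(xml_text)
    records = []
    # Step 1: treat each direct child of the root (b, c, d) as a "parent feature".
    for parent in root:
        # Step 2: split the parent into its child elements, carrying the
        # parent's name along (the role "fme_feature_type" plays in FME).
        for child in parent.findall(child_tag):
            records.append({"parent": parent.tag, "value": child.text})
    return records


records = children_with_parent(XML)
for record in records:
    print(record["parent"], record["value"])
```

Because the parent name is read from the node itself, the result does not depend on the order in which b, c, and d appear.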


That is the route I ended up taking. I was hoping for something better, as that requires manually exposing the flattened attributes of e, of which there are over 100.


If you are looking for a way to expose the flattened attributes easily, configure the XML reader to expose the required attributes when you add it to the workspace (i.e. set the Elements to Match parameter to "e" and enable flattening). You can change those parameters in the Navigator later.


Hi @jdh,

If you match on the top-level <a> element in the XML reader and enable flattening, each record becomes an attribute, with the path to the data stored in the attribute name. You can then use an AttributeExploder to split each record into its own feature, with the path in _attr_name and the record data in _attr_value.

I am attaching a simple workspace to illustrate.

readxmlpaths.fmw
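For readers without the workspace, the following Python sketch mimics the flatten-then-explode idea: it turns the document into (path, value) pairs and then recovers the parent name from each path. The path format (`a.b{0}.e{1}`) and the helper names `flatten` and `parent_of` are assumptions made for illustration; the attribute names FME actually produces may differ.

```python
import xml.etree.ElementTree as ET

XML = """<a>
 <b>
  <e>data1</e>
  <e>data2</e>
 </b>
 <c>
  <e>data3</e>
  <e>data4</e>
 </c>
 <d>
  <e>data5</e>
 </d>
</a>"""


def flatten(elem, path):
    """Yield (path, text) for every leaf element below elem."""
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
        return
    seen = {}  # per-tag counter so repeated siblings get distinct indices
    for child in children:
        index = seen.get(child.tag, 0)
        seen[child.tag] = index + 1
        yield from flatten(child, f"{path}.{child.tag}{{{index}}}")


def parent_of(path):
    """Extract the parent element name, e.g. 'a.b{0}.e{1}' -> 'b'."""
    return path.split(".")[-2].split("{")[0]


root = ET.fromstring(XML)
pairs = list(flatten(root, root.tag))  # analogous to exploded name/value pairs
rows = [(parent_of(path), value) for path, value in pairs]

for path, value in pairs:
    print(path, value)
```

Each pair corresponds to one exploded feature, so a document with many unrelated subnodes under a would produce many pairs that need filtering.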


It's an interesting concept, but it won't work well in my particular case, as the <e> nodes have several variables each and the <a> nodes have dozens of other subnodes that are irrelevant.

I would have to filter out hundreds of unnecessary features after the AttributeExploder (or use an AttributeRemover before it) and then manipulate the _attr_name attribute to get something I can recombine them on. That would be blocking, as the AttributeExploder does not seem to output features in any logical order.

 

 

Takashi's suggestion seems like the better route.
