Question

Compare XML Datasets

9 years ago
26 May 2014
11 replies
28 views

entombedtrader
4 replies

Hi,

I have an interesting analytical question that I am trying to solve in FME. I have two (or more) XML files which I need to compare for differing values in the same attribute, and report on them. I have created a CSV file with a vertical data model, structured as follows:

PrimaryID, Datafile, AttributeName, AttributeValue

What I need to accomplish is having FME report

Where PrimaryID = PrimaryID and Datafile <> Datafile and AttributeName = AttributeName and AttributeValue <> AttributeValue

Something like this

PrimaryID, Datafile, AttributeName, AttributeValue

1001, FileA, Engine, 4Cylinder

1001, FileA, Colour, Blue

1001, FileB, Engine, 6 Cylinder

1001, FileB, Colour, Blue

1002, FileA, Engine, 4Cylinder

1002, FileA, Colour, Blue

1002, FileB, Engine, 4Cylinder

1002, FileB, Colour, Blue

The result would be that FME would report

PrimaryID, Datafile, AttributeName, AttributeValue

1001, FileA, Engine, 4Cylinder

1001, FileB, Engine, 6 Cylinder

These values differ even though they relate to the same car (ID 1001): one file reports the car as having a 4 cylinder engine, and the second file reports the same car as having a 6 cylinder engine.

At the moment, I can have multiple XML files to read, some of which may be missing attributes, so I would need to detect those as well.
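
For reference, the comparison rule can be sketched outside FME. The Python below is only a minimal sketch under assumptions, not an FME workflow: it assumes the CSV rows are already loaded as tuples, the function name `compare_rows` is made up for illustration, and the sample rows are the ones above with one hypothetical gap (FileB has no Colour row for 1002) to show the missing-attribute case. It groups rows on (PrimaryID, AttributeName), reports groups whose values differ between files, and separately lists files that lack an attribute present elsewhere.

```python
from collections import defaultdict

def compare_rows(rows):
    """rows: iterable of (primary_id, datafile, attr_name, attr_value)."""
    groups = defaultdict(dict)  # (primary_id, attr_name) -> {datafile: value}
    files = set()
    for primary_id, datafile, name, value in rows:
        groups[(primary_id, name)][datafile] = value
        files.add(datafile)

    differing, missing = [], []
    for (primary_id, name), by_file in sorted(groups.items()):
        # More than one distinct value means at least two files disagree.
        if len(set(by_file.values())) > 1:
            for datafile in sorted(by_file):
                differing.append((primary_id, datafile, name, by_file[datafile]))
        # Any file without a row for this key is missing the attribute.
        for datafile in sorted(files - set(by_file)):
            missing.append((primary_id, datafile, name))
    return differing, missing

rows = [
    ("1001", "FileA", "Engine", "4Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),
    ("1001", "FileB", "Engine", "6 Cylinder"),
    ("1001", "FileB", "Colour", "Blue"),
    ("1002", "FileA", "Engine", "4Cylinder"),
    ("1002", "FileA", "Colour", "Blue"),
    ("1002", "FileB", "Engine", "4Cylinder"),
    # Hypothetical gap: FileB has no Colour row for 1002.
]
differing, missing = compare_rows(rows)
for row in differing:
    print(", ".join(row))
for row in missing:
    print("missing:", ", ".join(row))
```

The comparison is purely textual, so "6Cylinder" and "6 Cylinder" would count as different values; normalising whitespace first may be needed.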

My process is to read them in, then use an AttributeExploder to expose all of the XML tags. From there I use a Matcher to match on the PrimaryID values, and so on, to constrain the list to what is not matched. It is at this point that the process begins to fail.

Any thoughts would be greatly appreciated.

Thanks,

Kieren

11 replies


Hi,

While FME is pretty good with XML, personally, I'd look into more specialised tools for this scenario. Here's a discussion about various alternatives (http://blogs.msdn.com/b/dmahugh/archive/2008/06/18/open-xml-diff-tools.aspx).

David

takashi
Contributor
7538 replies
9 years ago
26 May 2014

Hi,

I think PrimaryID + AttributeName can be considered a composite primary key. That is, the key is unique within a dataset.

If my understanding is correct, the Matcher transformer can detect mismatched values among features having the same key (ID and attribute).

-----

Match Geometry: NONE

Attribute Matching Strategy: Match Selected Attributes

Selected Attributes: PrimaryID AttributeName

Attributes That Must Differ: AttributeValue

-----

However, it might not be enough if there are 3 or more datasets. In a case such as the following example, the Matcher will not output either FileB or FileC, because they have the same attribute value (6 Cylinder), although FileA will be output.

-----

1001, FileA, Engine, 4 Cylinder

1001, FileB, Engine, 6 Cylinder

1001, FileC, Engine, 6 Cylinder

-----

If you need to get both FileB and FileC in such a case, the FeatureMerger can be used additionally.

-----

All the original features --> Requestor

Matched features from the Matcher --> Supplier

Join On: PrimaryID = PrimaryID and AttributeName = AttributeName

-----
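
The idea of joining all original features back onto the keys where a mismatch was found can be illustrated with a short Python sketch. This is an assumption-laden illustration, not a reproduction of the Matcher or FeatureMerger output ports: the function name `rows_for_mismatched_keys` is hypothetical, and the Colour rows are added only to show a key that is not flagged.

```python
def rows_for_mismatched_keys(rows):
    """rows: list of (primary_id, datafile, attr_name, attr_value) tuples."""
    # Step 1 (stand-in for the Matcher): find keys with more than one distinct value.
    values_by_key = {}
    for primary_id, _datafile, name, value in rows:
        values_by_key.setdefault((primary_id, name), set()).add(value)
    mismatched = {key for key, values in values_by_key.items() if len(values) > 1}
    # Step 2 (stand-in for the FeatureMerger join): keep every original row
    # whose (PrimaryID, AttributeName) key was flagged.
    return [row for row in rows if (row[0], row[2]) in mismatched]

rows = [
    ("1001", "FileA", "Engine", "4 Cylinder"),
    ("1001", "FileB", "Engine", "6 Cylinder"),
    ("1001", "FileC", "Engine", "6 Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),  # hypothetical rows, all in agreement
    ("1001", "FileB", "Colour", "Blue"),
    ("1001", "FileC", "Colour", "Blue"),
]
flagged = rows_for_mismatched_keys(rows)
for row in flagged:
    print(", ".join(row))
```

Because the join is on the key rather than on the value, FileB and FileC are both returned along with FileA.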

Hope this helps,

Takashi

Hi,

You can use a ListBuilder grouped on PrimaryID and AttributeName, then a ListElementCounter. Select element count > 1, then use two ListDuplicateRemovers in sequence, one for DataFile and one for AttributeValue (order does not matter). Then test for the existence of a second record (like _list{1}.AttributeName exists), which should not exist, and therefore yields your result. Explode it.

Zoom in picture to see settings.

Btw, it is indeed as Takashi says: PrimaryID and AttributeName is used as a Key.

Matcher can be used in this way too.

You use the key, then at the Matched output port you add a further sequence of Matchers.

This time you need to use "_matched_id" and "AttributeValue" as the key, followed by (order does not matter) "_matched_id" and "DataFile". For the latter two you need to use the Not_Matched output port.

Actually that's even better, you just have 3 Matchers in a row!

takashi
Contributor
7538 replies
9 years ago
26 May 2014

Inspired by Gio's first post. How about this workflow?

According to Kieren's boolean rule, for this set:

1001, FileA, Engine, 4 Cylinder

1001, FileB, Engine, 6 Cylinder

1001, FileC, Engine, 6 Cylinder

only the first row should pass.

I have managed to make a workbench that does it correctly:

and the custom transformer that is the core of it:

I tested it with all possible combos.

This is a flexible solution; you can add more boolean variables to it without much change needed.

It looks simple, but it took me a while to find this solution.

I discarded 3 or more techniques.

Greets

My other suggestions only worked for the initial example data.

My last solution has no such limit.

Apologies for the late reply. This is an unbelievably awesome community. I had been working with an InlineQuerier transformer (actually chaining them) and trying to fight through the SQL.

But all of your solutions are fantastic. Truly fantastic.

As my FME skills improve, I certainly hope that I can contribute back (if anyone needs ArcGIS and ArcGIS Mobile help, do let me know, but it's the wrong forum for that)
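
For anyone taking the SQL route, the rule from the question maps onto a self-join. The sketch below runs it with Python's built-in `sqlite3` module rather than inside FME; it assumes the InlineQuerier accepts SQLite-style SQL, and the table name `attrs` is made up for illustration. The rows are the sample data from the question.

```python
import sqlite3

rows = [
    ("1001", "FileA", "Engine", "4Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),
    ("1001", "FileB", "Engine", "6 Cylinder"),
    ("1001", "FileB", "Colour", "Blue"),
    ("1002", "FileA", "Engine", "4Cylinder"),
    ("1002", "FileA", "Colour", "Blue"),
    ("1002", "FileB", "Engine", "4Cylinder"),
    ("1002", "FileB", "Colour", "Blue"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attrs ("
    "PrimaryID TEXT, Datafile TEXT, AttributeName TEXT, AttributeValue TEXT)"
)
conn.executemany("INSERT INTO attrs VALUES (?, ?, ?, ?)", rows)

# Self-join: same ID and attribute, different file, different value.
query = """
SELECT DISTINCT a.PrimaryID, a.Datafile, a.AttributeName, a.AttributeValue
FROM attrs AS a
JOIN attrs AS b
  ON a.PrimaryID = b.PrimaryID
 AND a.AttributeName = b.AttributeName
 AND a.Datafile <> b.Datafile
 AND a.AttributeValue <> b.AttributeValue
ORDER BY a.PrimaryID, a.AttributeName, a.Datafile
"""
result = conn.execute(query).fetchall()
for row in result:
    print(", ".join(row))
conn.close()
```

This join only finds differing values; attributes that are missing from one file would need a separate query (for example a LEFT JOIN that keeps the rows with no partner).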

Gio,

I cannot quite seem to get the top StringConcatenator set up with the query. Is it possible to share your workbench? I really like the use of the ListHistogrammer. This was a completely new transformer to me (I had noticed it, but had never thought about using it).

Gio,

Another question if you don't mind. In the custom transformer you make mention of an output attribute called Koppeling. Where does this get set?

Thanks,

Kieren

Hi Entombed,

The attribute Koppeling is set in both of the concatenators separately.

To make this bench, you must make one custom transformer. When it's finished, you copy and paste it (the input and output attributes then get "locked", and it becomes hard or impossible to change the transformer; it can only be altered by reducing its instances to just one).

It can be tricky to get this bit set up.

I will share the bench.

(btw. Koppeling is Dutch for Coupler, Link, connector. It's the attribute to merge on)
