Question

Compare XML Datasets

11 years ago
May 26, 2014

entombedtrader
4 replies

Hi,

I have an interesting analytical question that I am trying to solve in FME. I have two (or more) XML files that I need to compare for differing values of the same attribute and report on them. I have created a CSV file with a vertical data model that has this structure:

PrimaryID, Datafile, AttributeName, AttributeValue

What I need is for FME to report the rows

where PrimaryID = PrimaryID and Datafile <> Datafile and AttributeName = AttributeName and AttributeValue <> AttributeValue

Something like this

PrimaryID, Datafile, AttributeName, AttributeValue

1001, FileA, Engine, 4Cylinder

1001, FileA, Colour, Blue

1001, FileB, Engine, 6Cylinder

1001, FileB, Colour, Blue

1002, FileA, Engine, 4Cylinder

1002, FileA, Colour, Blue

1002, FileB, Engine, 4Cylinder

1002, FileB, Colour, Blue

The result would be that FME would report

PrimaryID, Datafile, AttributeName, AttributeValue

1001, FileA, Engine, 4Cylinder

1001, FileB, Engine, 6Cylinder

These values differ even though they relate to the same car (ID 1001): one file reports a 4-cylinder engine and the second file reports a 6-cylinder engine for the same car.

At the moment, I can have multiple XML files to read, some of which may be missing attributes, so I would need to detect those as well.
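Outside FME, the requirement described above can be sketched in a few lines of Python. This is only an illustration of the logic, not an FME workflow: the function name `compare` is hypothetical, the rows are the sample data from this post, and the last sample row has been left out on purpose to show how a missing attribute would be detected.

```python
from collections import defaultdict

# Rows in the vertical model: (PrimaryID, Datafile, AttributeName, AttributeValue).
rows = [
    ("1001", "FileA", "Engine", "4Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),
    ("1001", "FileB", "Engine", "6Cylinder"),
    ("1001", "FileB", "Colour", "Blue"),
    ("1002", "FileA", "Engine", "4Cylinder"),
    ("1002", "FileA", "Colour", "Blue"),
    ("1002", "FileB", "Engine", "4Cylinder"),
    # 1002 / FileB / Colour is deliberately omitted (assumption for this demo).
]


def compare(rows):
    """Return (mismatches, missing) for rows in the vertical model."""
    all_files = sorted({r[1] for r in rows})
    # (PrimaryID, AttributeName) -> {Datafile: AttributeValue}
    by_key = defaultdict(dict)
    for pid, datafile, name, value in rows:
        by_key[(pid, name)][datafile] = value

    mismatches, missing = [], []
    for (pid, name), per_file in sorted(by_key.items()):
        # More than one distinct value for the same key means the files disagree.
        if len(set(per_file.values())) > 1:
            mismatches.extend((pid, f, name, v) for f, v in sorted(per_file.items()))
        # Any file that never supplied this key is reported as missing it.
        missing.extend((pid, f, name) for f in all_files if f not in per_file)
    return mismatches, missing


mismatches, missing = compare(rows)
for row in mismatches:
    print("differs:", row)
for row in missing:
    print("missing:", row)
```

Grouping on PrimaryID plus AttributeName keeps the check independent of the number of files, which matters once more than two XML files are read.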

My process is to read them in and then use an AttributeExploder to expose all of the XML tags. From there I use a Matcher to match on the PrimaryID values, and so on, to constrain the list to what is not matched. It is at this point that the process begins to fail.

Any thoughts would be greatly appreciated.

Thanks,

Kieren

david_r
8354 replies
11 years ago
May 26, 2014

Hi,

While FME is pretty good with XML, personally, I'd look into more specialised tools for this scenario. Here's a discussion about various alternatives (http://blogs.msdn.com/b/dmahugh/archive/2008/06/18/open-xml-diff-tools.aspx).

David

takashi
7703 replies
11 years ago
May 26, 2014

Hi,

I think PrimaryID + AttributeName can be considered a composite primary key. That is, the key is unique within a dataset.

If my understanding is correct, the Matcher transformer can detect value mismatches among features that have the same key (ID and attribute name).

-----

Match Geometry: NONE

Attribute Matching Strategy: Match Selected Attributes

Selected Attributes: PrimaryID AttributeName

Attributes That Must Differ: AttributeValue

-----

However, it might not be enough if there are 3 or more datasets. In a case such as the following example, the Matcher will not output either FileB or FileC, because they have the same attribute value (6 Cylinder), although FileA will be output.

-----

1001, FileA, Engine, 4 Cylinder

1001, FileB, Engine, 6 Cylinder

1001, FileC, Engine, 6 Cylinder

-----

If you need to get both FileB and FileC in such a case, the FeatureMerger can be used additionally.

-----

All the original features --> Requestor

Matched features from the Matcher --> Supplier

Join On: PrimaryID = PrimaryID and AttributeName = AttributeName

-----
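The idea of joining all original features back onto the mismatching keys can be sketched in plain Python. This does not emulate the Matcher or the FeatureMerger; it only illustrates the join logic, and the function name `rows_for_conflicting_keys` and the extra Colour rows are assumptions added for the demonstration.

```python
# Rows: (PrimaryID, DataFile, AttributeName, AttributeValue).
rows = [
    ("1001", "FileA", "Engine", "4 Cylinder"),
    ("1001", "FileB", "Engine", "6 Cylinder"),
    ("1001", "FileC", "Engine", "6 Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),
    ("1001", "FileB", "Colour", "Blue"),
    ("1001", "FileC", "Colour", "Blue"),
]


def rows_for_conflicting_keys(rows):
    """Return every original row whose (PrimaryID, AttributeName) key has conflicting values."""
    # Step 1: collect the distinct values seen for each key.
    values_by_key = {}
    for pid, _datafile, name, value in rows:
        values_by_key.setdefault((pid, name), set()).add(value)
    conflicting = {key for key, values in values_by_key.items() if len(values) > 1}

    # Step 2: join all original rows onto the conflicting keys,
    # so FileB and FileC are kept even though they agree with each other.
    return [r for r in rows if (r[0], r[2]) in conflicting]


result = rows_for_conflicting_keys(rows)
for row in result:
    print(row)
```

Here all three Engine rows are returned and the Colour rows are dropped, because only the Engine key has more than one distinct value.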

Hope this helps,

Takashi

gio
Contributor
2252 replies
11 years ago
May 26, 2014

Hi,

You can use a ListBuilder grouped on PrimaryID and AttributeName, then a ListElementCounter. Select the features with an element count > 1 and then use two ListDuplicateRemovers in sequence, one for DataFile and one for AttributeValue (the order does not matter). Then test for the existence of a second record (e.g. _list{1}.AttributeName exists), which should not be there if the values agree, so the features that pass the test are your result. Explode the list.

Zoom in on the picture to see the settings.

gio
Contributor
2252 replies
11 years ago
May 26, 2014

By the way, it is indeed as Takashi says: PrimaryID and AttributeName are used as the key.

The Matcher can be used in this way too.

You use the key, then at the Matched output port you add a further sequence of Matchers.

This time you need to use "_matched_id" and "AttributeValue" as the key, followed by (the order does not matter) "_matched_id" and "DataFile". For the latter two you need to use the Not_Matched output port.

Actually that's even better: you just have three Matchers in a row!

takashi
7703 replies
11 years ago
May 26, 2014

Inspired by Gio's first post. How about this workflow?

gio
Contributor
2252 replies
11 years ago
May 27, 2014

According to Kieren's boolean rule, for this set:

1001, FileA, Engine, 4 Cylinder

1001, FileB, Engine, 6 Cylinder

1001, FileC, Engine, 6 Cylinder

only the first row should pass.

I have managed to make a workbench that does it correctly:

and the custom transformer that is the core of it:

I tested it with all possible combinations.

This is a flexible solution; you can add more boolean variables to it without much change needed.

It looks simple, but it took me a while to find this solution.

I discarded 3 or more techniques.
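The workbench screenshots are not reproduced here, so the exact method is not visible. One possible reading of "only the first row should pass" is to count how often each value occurs per key, much like a histogram, and flag the rows that hold a less common value. The Python below is a sketch of that reading only; the function name `odd_ones_out` and the tie handling are assumptions, not part of the original workbench.

```python
from collections import Counter, defaultdict

# Rows: (PrimaryID, DataFile, AttributeName, AttributeValue).
rows = [
    ("1001", "FileA", "Engine", "4 Cylinder"),
    ("1001", "FileB", "Engine", "6 Cylinder"),
    ("1001", "FileC", "Engine", "6 Cylinder"),
]


def odd_ones_out(rows):
    """Flag rows whose value is less common than the most frequent value for their key."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[0], row[2])].append(row)

    flagged = []
    for group in groups.values():
        counts = Counter(r[3] for r in group)  # histogram of values for this key
        if len(counts) < 2:
            continue  # every file agrees, nothing to report
        top = max(counts.values())
        if all(c == top for c in counts.values()):
            # Assumption: with no majority (e.g. two files that disagree), flag every row.
            flagged.extend(group)
        else:
            flagged.extend(r for r in group if counts[r[3]] < top)
    return flagged


flagged = odd_ones_out(rows)
print(flagged)
```

With two files that disagree, both rows are flagged, which matches the original two-file example in the question.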

Greets

gio
Contributor
2252 replies
11 years ago
May 27, 2014

My other suggestions only worked for the initial example data.

My last solution has no such limit.

entombedtrader
Author
4 replies
11 years ago
May 28, 2014

Apologies for the late reply. This is an unbelievably awesome community. I had been working with an InlineQuerier transformer (actually chaining several of them) and trying to fight through the SQL.
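For reference, the rule from the original question can be written as a single self-join. The sketch below runs it through Python's sqlite3 module so it is self-contained; the table name `attrs` is hypothetical, a SQLite-compatible SQL dialect is assumed, and this is not the query from any of the workbenches in this thread.

```python
import sqlite3

# Rows in the vertical model: (PrimaryID, DataFile, AttributeName, AttributeValue).
rows = [
    ("1001", "FileA", "Engine", "4Cylinder"),
    ("1001", "FileA", "Colour", "Blue"),
    ("1001", "FileB", "Engine", "6Cylinder"),
    ("1001", "FileB", "Colour", "Blue"),
    ("1002", "FileA", "Engine", "4Cylinder"),
    ("1002", "FileA", "Colour", "Blue"),
    ("1002", "FileB", "Engine", "4Cylinder"),
    ("1002", "FileB", "Colour", "Blue"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attrs ("
    "PrimaryID TEXT, DataFile TEXT, AttributeName TEXT, AttributeValue TEXT)"
)
conn.executemany("INSERT INTO attrs VALUES (?, ?, ?, ?)", rows)

# Self-join: same ID and attribute name, different file, different value.
# DISTINCT avoids repeating a row when it conflicts with several other files.
query = """
SELECT DISTINCT a.PrimaryID, a.DataFile, a.AttributeName, a.AttributeValue
FROM attrs AS a
JOIN attrs AS b
  ON a.PrimaryID = b.PrimaryID
 AND a.AttributeName = b.AttributeName
 AND a.DataFile <> b.DataFile
 AND a.AttributeValue <> b.AttributeValue
ORDER BY a.PrimaryID, a.AttributeName, a.DataFile
"""
differences = conn.execute(query).fetchall()
conn.close()

for row in differences:
    print(row)
```

Note that a self-join like this only finds conflicting values; an attribute that is absent from one file produces no row at all and would need a separate check.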

But all of your solutions are fantastic. Truly fantastic.

As my FME skills improve, I certainly hope that I can contribute back (if anyone needs ArcGIS and ArcGIS Mobile help, do let me know, but it's the wrong forum for that)

entombedtrader
Author
4 replies
11 years ago
May 29, 2014

Gio,

I cannot quite seem to get the top StringConcatenator set up with the query. Would it be possible to share your workbench? I really like the use of the ListHistogrammer. This was a completely new transformer to me (I had noticed it, but had never thought about using it).

entombedtrader
Author
4 replies
11 years ago
May 29, 2014

Gio,

Another question if you don't mind. In the custom transformer you make mention of an output attribute called Koppeling. Where does this get set?

Thanks,

Kieren

gio
Contributor
2252 replies
11 years ago
June 2, 2014

Hi Entombed,

The attribute Koppeling is set in both concatenators separately.

To make this bench, you must make one custom transformer. When it is finished, you copy and paste it (the input and output attributes get "locked". It then becomes hard or impossible to change the transformer; it can only be altered or adapted by reducing its instances to just one).

It can be tricky to get this bit set up.

I will share the bench.

(By the way, Koppeling is Dutch for coupler, link or connector. It is the attribute to merge on.)
