Question

Transformer(s) to test multiple attributes in multiple input files

  • 25 January 2018
  • 5 replies
  • 25 views

Badge

I have two input files, and I want to check whether the address in one input file (made up of a street attribute, number attribute, postal code attribute, etc.) matches the address in the other input file (made up of more or less the same columns). The values are strings/text, so I don't need an exact match.

I've already tried the FeatureMerger, but it seems to accept only exact matches. I could be mistaken, of course.


5 replies

Badge +7

How fuzzy do you need it to be? Is it just a question of case (i.e. HIGH STREET = High Street)? Or does it need to be more fuzzy than that (e.g. High Street = HIGH ST)?

A quick Google has come up with the FuzzyStringComparer in the FME Hub.

https://hub.safe.com/transformers/fuzzystringcomparer

If you find limitations with it (e.g. it only works on one dataset), this might be useful:

https://knowledge.safe.com/questions/3776/fuzzy-string-matching-from-two-datasets.html

Another way of merging 2 datasets into one without losing the knowledge of which features belong to which dataset is to expose fme_basename on the Reader(s). You can then connect both datasets to the same input port on a transformer (or to a Junction). Any time you need to split the data back out (e.g. for the Requestor and Supplier inputs of FeatureMerger), you just use a Tester or AttributeFilter on fme_basename.
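As a rough illustration of the tag-then-split idea outside of FME, here is a minimal Python sketch. It is only an analogy: the records are plain dictionaries, the helper names tag_source and split_by_source and the sample values are made up for this example, and the "fme_basename" key simply stands in for the attribute the Reader would expose.

```python
def tag_source(records, basename):
    """Return copies of the records with the source file name attached."""
    return [{**rec, "fme_basename": basename} for rec in records]


def split_by_source(records, basename):
    """Split a combined stream back into (records from basename, all others)."""
    matching = [r for r in records if r["fme_basename"] == basename]
    others = [r for r in records if r["fme_basename"] != basename]
    return matching, others


# Hypothetical sample rows, not taken from the thread.
excel_rows = [{"street": "High Street"}]
wfs_rows = [{"street": "HIGH ST"}, {"street": "Mill Road"}]

# One combined stream, comparable to wiring both Readers into one port.
combined = tag_source(excel_rows, "addresses_excel") + tag_source(wfs_rows, "addresses_wfs")

# Later on, split it again, comparable to a Tester on fme_basename.
requestors, suppliers = split_by_source(combined, "addresses_excel")
```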

Badge +3

You could try a fuzzy string comparison.

It's available in Python, Tcl, etc.

Someone put it in a custom transformer, so you can download the transformer (just type "fuzzy" on your canvas).

You will have to do a Cartesian set comparison (by doing an unconditional FeatureMerger, 1=1) or iterate over one set by the elements of the other (using a custom transformer).

If you don't have huge sets, I'd go for the unconditional merger. Remember to take care of attribute name conflicts when using the merger.
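For anyone who wants to see what the Cartesian comparison amounts to, here is a small sketch in plain Python. It uses difflib.SequenceMatcher from the standard library as the similarity measure, which is an assumption: the custom transformer on the Hub may well use a different algorithm. The function names, the 0.8 threshold and the sample addresses are also just made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def normalise(address):
    """Lower-case and collapse whitespace so case alone doesn't lower the score."""
    return " ".join(address.lower().split())


def similarity(a, b):
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def fuzzy_pairs(left, right, threshold=0.8):
    """Compare every left value with every right value (Cartesian product)
    and keep the pairs whose similarity reaches the threshold."""
    matches = []
    for a, b in product(left, right):
        score = similarity(a, b)
        if score >= threshold:
            matches.append((a, b, round(score, 2)))
    return matches


# Hypothetical sample data, not taken from the thread.
file_a = ["12 High Street", "3 Church Lane"]
file_b = ["12 HIGH ST", "7 Mill Road"]

matches = fuzzy_pairs(file_a, file_b)
```

The number of comparisons is len(left) * len(right), which is why this only makes sense for sets that are not huge.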

Badge


@gio @tim_wood

It doesn't work, or maybe I'm doing something wrong: the check is apparently extremely fuzzy and nothing matches (the ratio value is very low), when I know I should get hits. However, a regular FeatureMerger may work, because when I merge on street name I do get matches. When I add the postal code attributes to the join clause, though, I get very few matches.

 

I think this is due to differing data types of the attributes in the datasets I want to merge (?)

One dataset is an Excel file, and I was able to easily set the type of the postal code upon loading.

However, the other dataset is an FFS (see https://knowledge.safe.com/questions/57248/wfs-data-not-coming-through.html?childToView=57489#comment-57489 on how I had to create it). I can't change the attribute type of the postal code attribute in this dataset. It's set to "buffer". How can I change this? Or is this even the reason why the FeatureMerger can't match when I add postal code to the join?

 

 

Badge +3

Hi

@tim_wood

If you add postal code, the join is on street name and postal code together.

That means that in your case the postal code and street name combinations don't fully match. A street name may belong to two or more postal zones, or the data is wrong.

I usually join on a concatenation of postal code, house number and house letter, if available.
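To make the concatenated-key idea concrete, here is a minimal sketch in plain Python rather than FME. The field names (postal_code, house_number, house_letter), the helper names and the sample rows are assumptions for the example. Every part is converted to text before it is compared, so a numeric house number in one dataset still matches a text one in the other.

```python
def join_key(record):
    """Build one key from postal code, house number and house letter.

    Each part is converted to text, stripped of all whitespace and
    upper-cased; a missing part becomes an empty string.
    """
    parts = (
        record.get("postal_code"),
        record.get("house_number"),
        record.get("house_letter"),
    )
    return "|".join(
        "" if p is None else "".join(str(p).split()).upper() for p in parts
    )


def merge_on_key(requestors, suppliers):
    """Exact join on the concatenated key; returns (merged, unmerged)."""
    lookup = {join_key(s): s for s in suppliers}
    merged, unmerged = [], []
    for r in requestors:
        s = lookup.get(join_key(r))
        if s is None:
            unmerged.append(r)
        else:
            # On a name conflict the requestor's value is kept.
            merged.append({**s, **r})
    return merged, unmerged


# Hypothetical sample rows, not taken from the thread.
requestors = [
    {"postal_code": "1234 ab", "house_number": 12, "house_letter": None},
    {"postal_code": "9999 ZZ", "house_number": 1, "house_letter": "A"},
]
suppliers = [{"postal_code": "1234AB", "house_number": "12", "owner": "x"}]

merged, unmerged = merge_on_key(requestors, suppliers)
```

If several suppliers share the same key, this sketch keeps only the last one; handling duplicates would need a list per key.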

Different string encodings can cause merging to fail. Changing the encoding can help.

To know what the problem is, we would need some sample data.

Maybe you can provide some?

Badge +7


https://docs.safe.com/fme/html/FME_Desktop_Documentation/FME_ReadersWriters/ffs/Reader_Directives.htm

"Buffers store unbounded length character or byte strings." I'm not completely sure what that means, but it sounds like it's text. You could try using an AttributeCreator to copy the value into a new attribute, then delete the old attribute.

I sometimes have to use an AttributeTrimmer to remove whitespace before/after the values.

For UK postcodes, the value may be written with or without a space in the middle, e.g. "AA1 2BC" or "AA12BC". There are various ways to add the space if required (e.g. using a regular expression), or you could remove the space with a StringReplacer.
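As a sketch of both options in plain Python (not the AttributeTrimmer or StringReplacer themselves), the two helpers below either remove the whitespace or insert the space with a regular expression. The function names are made up for the example, and the pattern assumes the UK convention that the last three characters are a digit followed by two letters.

```python
import re


def strip_postcode(value):
    """Remove all whitespace (leading, trailing and inner) and upper-case."""
    return re.sub(r"\s+", "", str(value)).upper()


def spaced_postcode(value):
    """Insert a single space before the final digit-letter-letter group.

    Values that don't end in that pattern are returned compacted
    but otherwise unchanged.
    """
    compact = strip_postcode(value)
    return re.sub(r"^(.+?)(\d[A-Z]{2})$", r"\1 \2", compact)


# Both spellings end up identical, whichever form is chosen as the target.
same_compact = strip_postcode(" aa1 2bc ") == strip_postcode("AA12BC")
same_spaced = spaced_postcode("AA12BC") == spaced_postcode("AA1 2BC")
```

Whichever form you pick, apply it to the postcode attribute in both datasets before the join, so the keys are compared in the same shape.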

 
