Solved

Python script to compare two datasets that involve character strings

  • February 13, 2018
  • 4 replies
  • 57 views

fmenco
Contributor

I have two datasets, 1 and 2, containing addresses. Each address consists of all 4 fields.

I'm trying to compare the two datasets to find matches between them. I have already used a StringReplacer to remove all the spaces in the postal codes and to replace all the special characters in the street names.

In dataset 2, the addition2 field contains a lot of random information, as you can see. I need help writing a Python script (to be used in the PythonCaller transformer or any other transformer) that's able to:

- detect when there's more than one character in the addition2 field (I expect that field to contain only one character) and filter that record out;

- from the filtered records, take the first (or maybe the first three?) characters of the addition2 field and use that to recheck the address in dataset 2 against the address in dataset 1, to see if there's a hit;

- if there's no hit, have FME ignore the information in the addition2 field, treat it as an empty field, and then compare again with the address in dataset 1.

I'm not a programmer, but I do have the ambition to learn. However, I've only just noticed the mismatch in the addition2 field, and there's no time to get my Python learning groove on right now...

Suggestions for a workflow will do, too.
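
For reference, the three-step matching described in the question could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: the field names postalcode, streetname, housenumber and addition are hypothetical (the real ones are only visible in the attached image), and the helpers make_key and match_record are made up for this sketch, not part of FME.

```python
# Hypothetical field names; the post only shows the real ones in an image.
BASE_FIELDS = ("postalcode", "streetname", "housenumber")


def make_key(record, addition):
    """Build a comparable tuple from the base fields plus an addition value."""
    base = tuple(str(record.get(f, "")).strip().upper() for f in BASE_FIELDS)
    return base + (addition.strip().upper(),)


def match_record(record, dataset1_keys):
    """Try progressively looser versions of addition2 against dataset 1.

    Returns a label for the attempt that matched, or None if nothing did.
    """
    addition = (record.get("addition2") or "").strip()

    # Step 1: a clean addition (at most one character) is compared as-is.
    if len(addition) <= 1:
        return "exact" if make_key(record, addition) in dataset1_keys else None

    # Step 2: noisy addition, retry with only its first character.
    if make_key(record, addition[:1]) in dataset1_keys:
        return "first_char"

    # Step 3: still no hit, treat the addition as empty and compare again.
    if make_key(record, "") in dataset1_keys:
        return "ignored_addition"

    return None


# Made-up sample data, purely for illustration.
dataset1 = [
    {"postalcode": "1234AB", "streetname": "Hoofdstraat", "housenumber": "10", "addition": "A"},
    {"postalcode": "1234AB", "streetname": "Hoofdstraat", "housenumber": "12", "addition": ""},
]
dataset1_keys = {make_key(r, r.get("addition") or "") for r in dataset1}

noisy = {"postalcode": "1234AB", "streetname": "Hoofdstraat", "housenumber": "10", "addition2": "A bis"}
print(match_record(noisy, dataset1_keys))
```

Returning a label rather than a plain yes/no keeps track of which fallback produced the hit, which may help when reviewing the looser matches afterwards.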


robertr
Contributor
  • February 13, 2018

Can we see an example of the data? I'm not sure the file(s) actually got attached.


fmenco
Contributor
  • Author
  • Contributor
  • February 13, 2018
Oh, sorry, something went wrong. An image is added now.


carsonlam
Safer
  • Safer
  • Best Answer
  • February 13, 2018

Just a quick answer for filtering based on addition2. For your scenario, it's better to use a Tester instead of a PythonCaller. Using a Tester makes the workflow clearer, especially since you want to work with the filtered features.

With the Tester, use a Test Clause of

@Length(@Value(addition2)) > 1

To get the first character, use an AttributeCreator with a value of

@Substring(@Value(addition2), 0, 1)
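
If the same check ever needs to live in a script, the two expressions above translate to very little Python. The following is a minimal sketch: split_addition is a hypothetical helper name, and it works on a plain value rather than an FME feature. Inside a PythonCaller the value would presumably be read with feature.getAttribute('addition2') and the result written back with feature.setAttribute(...).

```python
def split_addition(value):
    """Mimic the Tester + AttributeCreator pair for one addition2 value.

    Returns (needs_review, first_char):
      needs_review is True when the value is longer than one character,
      first_char is the first character, or "" for an empty/missing value.
    """
    text = "" if value is None else str(value)
    return len(text) > 1, text[:1]


print(split_addition("A"))      # (False, 'A')
print(split_addition("A bis"))  # (True, 'A')
print(split_addition(None))     # (False, '')
```

For the "first three characters" variant mentioned in the question, the slice would be text[:3] instead of text[:1].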


fmenco
Contributor
  • Author
  • Contributor
  • February 16, 2018
@carsonlam

Thanks, I changed the workbench to include a bunch of TestFilters and AttributeCreators, and that seemed to work.

