Solved

Python script to compare two datasets that involve character strings

  • February 13, 2018
  • 4 replies
  • 105 views

fmenco
Contributor

I have two datasets, 1 and 2, containing addresses. Each address consists of four fields.

I'm trying to compare the two datasets to find matches between them. I have already used a StringReplacer to remove all the spaces in the postal codes and to replace all the special characters in the street names.

In dataset 2, the addition2 field contains a lot of random information, as you can see. I need help writing a Python script (to be used in the PythonCaller transformer, or any other transformer) that is able to:

- detect when there is more than one character in the addition2 field (I expect that field to contain only one character) and filter that record out;

- from the filtered records, take the first (or maybe the first three?) characters of the addition2 field and use them to recheck the address in dataset 2 against the address in dataset 1, to see if there's a hit;

- if there's no hit, have FME ignore the information in the addition2 field, treat it as an empty field, and then compare again with the address in dataset 1.

I'm not a programmer, but do have the ambition to learn. However, I'm just noticing the mismatch in the addition2 field, and there's no time to get my Python learning groove on right now...

Suggestions for a workflow will do, too.

4 replies

robertr
Contributor
  • Contributor
  • 60 replies
  • February 13, 2018

Can we see an example of the data? I'm not sure you attached the file(s).


fmenco
Contributor
  • Author
  • Contributor
  • 86 replies
  • February 13, 2018

Oh, sorry, something went wrong. An image has been added now.

 


carsonlam
Safer
  • Safer
  • 62 replies
  • Best Answer
  • February 13, 2018

Just a quick answer for filtering based on addition2. For your scenario, it's better to use Tester instead of PythonCaller. Using a Tester makes the workflow clearer, especially since you want to work with the filtered features.

With Tester, use a Test Clause of:

@Length(@Value(addition2)) > 1

To get the first character, use AttributeCreator with a value of:

@Substring(@Value(addition2), 0, 1)
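For readers who still want to see the same two checks in Python, here is a minimal sketch in plain Python (no FME API calls). The function names are made up for illustration, and treating a missing value as an empty string is an assumption, not something stated in the thread.

```python
def is_suspect(addition2):
    """Mirror the Tester clause @Length(@Value(addition2)) > 1.

    A missing value (None) is treated as an empty string (assumption).
    """
    return len(addition2 or "") > 1


def first_character(addition2):
    """Mirror @Substring(@Value(addition2), 0, 1): start at index 0, take 1 character."""
    return (addition2 or "")[:1]


if __name__ == "__main__":
    for value in ["A", "bis 2", "", None]:
        print(repr(value), is_suspect(value), repr(first_character(value)))
```

Inside a PythonCaller the same expressions would be applied to the attribute value read from each feature; the Tester route above avoids that extra code.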


fmenco
Contributor
  • Author
  • Contributor
  • 86 replies
  • February 16, 2018

@carsonlam Thanks, I changed the workbench to include a bunch of TestFilters and AttributeCreators, and that seemed to work.
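As a rough illustration of the three-pass comparison described in the question (exact match, then first character of addition2, then addition2 ignored), here is a sketch in plain Python. It is not the workspace that was actually built. The field names postalcode, street and housenumber, the helper names, and the sample records are all invented for the example; only addition2 comes from the thread.

```python
def address_key(record, addition):
    """Build a comparable key from the (assumed) address fields plus an addition."""
    return (record["postalcode"], record["street"], record["housenumber"], addition)


def match_dataset2(record, dataset1_keys):
    """Return which pass matched: 'exact', 'first_char', 'no_addition', or None."""
    addition = (record.get("addition2") or "").strip()

    # A clean addition (zero or one character) is compared as-is.
    if len(addition) <= 1:
        return "exact" if address_key(record, addition) in dataset1_keys else None

    # Pass 2: retry with only the first character of addition2.
    if address_key(record, addition[:1]) in dataset1_keys:
        return "first_char"

    # Pass 3: ignore addition2 entirely and compare again.
    if address_key(record, "") in dataset1_keys:
        return "no_addition"

    return None


if __name__ == "__main__":
    # Invented sample data: keys of dataset 1, already cleaned.
    dataset1_keys = {
        ("1234AB", "Mainstreet", "10", "A"),
        ("1234AB", "Mainstreet", "12", ""),
    }
    dataset2 = [
        {"postalcode": "1234AB", "street": "Mainstreet", "housenumber": "10", "addition2": "A2 rear"},
        {"postalcode": "1234AB", "street": "Mainstreet", "housenumber": "12", "addition2": "see note"},
        {"postalcode": "1234AB", "street": "Mainstreet", "housenumber": "99", "addition2": "B"},
    ]
    for rec in dataset2:
        print(rec["housenumber"], rec["addition2"], "->", match_dataset2(rec, dataset1_keys))
```

Returning the name of the pass that matched, rather than a plain true/false, makes it easy to review how many hits depended on trimming or dropping addition2.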