I may have had an aneurysm but I can't figure out how you can evaluate if one list string value is contained in another list of strings.Some context: For each list string value in List A, I want to cycle through all string values in List B and evaluate whether any string variable within List B is contained within each List A item. Basically, I'm trying to create a nested loop cycling through each of List A once and for each List A item, cycle through List B (a nested loop). If there is a match [e.g., Line B value(34) matches Line A(15)], then I want to pull that string out of List A(15) and plop it into an attribute and evaluate the next Line A value. Why can't I figure this out in FME? Please be kind.Also is there any way to plug in an attribute value into the list element index, to aid in looping? I'd like to define this: list{@value(count)} to cycle through the list values, but it only allows user entry of the index number.Thanks, PetePS. Also, I'd like to do this without resorting to Python, if at all possible.

Loop through list using element index

+22

jdh
Contributor
1959 replies
8 years ago
16 March 2016

Python is probably the most efficient way to do this, but in pure fme I would try:

create a loop iteration attribute (_i)

listElementCounter on list A{} (to determine the number of times to loop

test to see if _i is less than _element count (exit criteria for loop)

listIndexer (list index set to _i)

listSearcher (List Attribute: ListB{}, Search For: List A (demoted attribute)}

do what ever processing you want on the Found port

increment the iteration attribute and loop (from both found and not found streams)

Userlevel 2

+12

erik_jan
Contributor
2177 replies
8 years ago
16 March 2016

First you need to count the elements in ListA (ListElementCounter).

Then create a custom transformer (for looping) with:

Then starting at 0 do a search (ListSearcher) of ListA.Value(0) in ListB.

If found record the matching values.

Test if index is the element count.

If not add 1 to the index and loop back to start.

If the element count is reached exit the custom transformer.

I know this will need fine tuning but this concept could work.

Userlevel 2

+12

erik_jan
Contributor
2177 replies
8 years ago
16 March 2016

First you need to count the elements in ListA (ListElementCounter).

Then create a custom transformer (for looping) with:

Then starting at 0 do a search (ListSearcher) of ListA.Value(0) in ListB.

If found record the matching values.

Test if index is the element count.

If not add 1 to the index and loop back to start.

If the element count is reached exit the custom transformer.

I know this will need fine tuning but this concept could work.

And while I was typing @jdh recorded a similar answer.

+22

jdh
Contributor
1959 replies
8 years ago
16 March 2016

And while I was typing @jdh recorded a similar answer.

happens to me all the time ;)

Userlevel 4

+25

Yes, as others have said below, the ListIndexer transformer inside a custom transformer loop is what you need here. Relatively easy to do.

Userlevel 2

+17

takashi
Contributor
7538 replies
8 years ago
17 March 2016

Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.

Userlevel 4

+25

Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.

Userlevel 2

+17

takashi
Contributor
7538 replies
8 years ago
17 March 2016

Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.

Hi @Mark2AtSafe, I prefer to use the Cloner-Loop technique than Loop with custom transformer. Just my preference, there is no specific reason. I learned the effective Cloner use from @gio in this thread: Road Chainage

G

+3

You could also append (vertical) the lists of course and then check for duplicates.

Alas the listduplicateremover does not output info about the duplicates, so you would simply test the value of a listlementcounter.

Or, depening on the size of the lists, you could sort them each on the searchattribute and then stuff them in a string and use a regexp to find the duplicates.

This for instance removes duplicates from a sorted string

(^|-)([^-]*)(-\\2)+','\\1\\2')

where in this case a "-" is used as seperator in the string.

Another fun way to do it is, extend the shortest list to match the longest.

Create a rownumbervalue based on the search value. ( like appending its ascci codes (or unicode etc. or rownumber of total objects) or something like that)

If you do it right all matching objects will be having same rownumber.

..

greets

P

pvaziri
Author
67 replies
8 years ago
22 March 2016

Thanks for your help regarding creating the loop employing lists and ListSearcher. Is there anyway to get multiple matches from the ListSearcher (only search types are First...)? For example, I have many rows with the same item value. Instead of returning first match, how can I obtain a row for each match's list index?

My "key" list A is about 80K and my input list B is around 1.6M. A has unique city_state_zip combinations. The B list is a single column with full addresses, many with the same city_state_zip combination. So for each A item, I need to obtain none, one, or many matches to the single-attribute string in the B list.

How is this best accomplished?

Note: I've created a query that has a subquery in PostgreSQL and it is mercilessly slow (~20 min to process 10K records and that would make it ~24 days to process all 1.6M A rows).

If I need to venture into python, could someone direct me to a stub or something where I can learn and get going rapidly?

Thanks,

Pete

Userlevel 2

+17

takashi
Contributor
7538 replies
8 years ago
22 March 2016

Hi @pvaziri, I think that SQL query is the most efficient way if you have saved the data into a PostgreSQL database already. I don't believe PostgreSQL querying would be such slow, possibly there could be a room to improve SQL statement. Can you post your SQL statement that took 20 minutes to process 10K records?

P

pvaziri
Author
67 replies
8 years ago
22 March 2016

Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.

Right now I have my comparison key list in a single attribute text file, e.g.,

DAYTON OH 45459

DAYTON 45450

CINCINNATI OH 49999

CINCINNATI 49999

etc. to about 80K rows

I a lot of full addresses that I'm trying to match against (some have state abbreviations and some do not, thanks to admin inconsistencies).

For the output, I've created an additional variable: matchValue.

Once their is a match, I want to populate matchValue with the search criteria value, e.g.,:

fullAddress matchValue

479 MEADOWVIEW DR DAYTON OH 45459 DAYTON OH 45459

480 DURBAN CT DAYTON OH 45459 DAYTON OH 45459

660 UNIVERSITY DR CINCINNATI 49999 CINCINNATI 49999

Notice if you have a full full address list, lets say for an entire state, there will be repeated matched values. When I look at the ListSearcher settings, I only see search types with 'First' settings instead of 'Any'.

I need to completely scour the fullAddress list and identify any hits against each string in the comparison key list or feature type.

Any recommendations for how to do this expediently within or outside of FME?

Thanks,

Pete

P

pvaziri
Author
67 replies
8 years ago
22 March 2016

Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.

I'll check this out. Thanks!

Userlevel 2

+17

takashi
Contributor
7538 replies
8 years ago
22 March 2016

Right now I have my comparison key list in a single attribute text file, e.g.,

DAYTON OH 45459

DAYTON 45450

CINCINNATI OH 49999

CINCINNATI 49999

etc. to about 80K rows

I a lot of full addresses that I'm trying to match against (some have state abbreviations and some do not, thanks to admin inconsistencies).

For the output, I've created an additional variable: matchValue.

Once their is a match, I want to populate matchValue with the search criteria value, e.g.,:

fullAddress matchValue

479 MEADOWVIEW DR DAYTON OH 45459 DAYTON OH 45459

480 DURBAN CT DAYTON OH 45459 DAYTON OH 45459

660 UNIVERSITY DR CINCINNATI 49999 CINCINNATI 49999

Notice if you have a full full address list, lets say for an entire state, there will be repeated matched values. When I look at the ListSearcher settings, I only see search types with 'First' settings instead of 'Any'.

I need to completely scour the fullAddress list and identify any hits against each string in the comparison key list or feature type.

Any recommendations for how to do this expediently within or outside of FME?

Thanks,

Pete

@pvaziri, do you have two text files - one contains abbreviated addresses, another contains full addresses?

P

pvaziri
Author
67 replies
8 years ago
22 March 2016

@pvaziri, do you have two text files - one contains abbreviated addresses, another contains full addresses?

Here you go. This contains 10K sample full addresses and the full listing of possible city-state-zip combinations in the US...textcomparesamples.zip

Userlevel 2

+17

takashi
Contributor
7538 replies
8 years ago
23 March 2016

Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.

@pvaziri, this workspace took just a few seconds to extract matched full address records from your sample data.

Regex: [^\\s]+$ (Japanese Windows displayes a backslash as ? symbol. troublesome...)

Since the LIKE operation is relatively inefficient, I extracted the last word (zipcode?) from every address in both tables with the StringSearchers, and then compared them before the LIKE operation in the SQL statement. It's a tip to improve the query performance drastically.

If you create an external database (e.g. "FullAddress" table with "address" and "zipcode" fields, "CityStateZip" table with "citystate" and "zipcode" fields), the SQLCreator with a similar SQL statement could be more efficient.

Note:The InlineQuerier (i.e. SQLite) seems to perform the LIKE operation in case-insensitive, but there could be a database system that performs the operation in case-sensitive.

P

pvaziri
Author
67 replies
8 years ago
23 March 2016

@pvaziri, this workspace took just a few seconds to extract matched full address records from your sample data.

Regex: [^\\s]+$ (Japanese Windows displayes a backslash as ? symbol. troublesome...)

Since the LIKE operation is relatively inefficient, I extracted the last word (zipcode?) from every address in both tables with the StringSearchers, and then compared them before the LIKE operation in the SQL statement. It's a tip to improve the query performance drastically.

If you create an external database (e.g. "FullAddress" table with "address" and "zipcode" fields, "CityStateZip" table with "citystate" and "zipcode" fields), the SQLCreator with a similar SQL statement could be more efficient.

Note:The InlineQuerier (i.e. SQLite) seems to perform the LIKE operation in case-insensitive, but there could be a database system that performs the operation in case-sensitive.

Thanks @takashi. Looks sweet! I'll try it tout de suite!

Loop through list using element index

17 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded