Skip to main content

I may have had an aneurysm but I can't figure out how you can evaluate if one list string value is contained in another list of strings.

Some context: For each list string value in List A, I want to cycle through all string values in List B and evaluate whether any string variable within List B is contained within each List A item. Basically, I'm trying to create a nested loop cycling through each of List A once and for each List A item, cycle through List B (a nested loop).

If there is a match [e.g., Line B value(34) matches Line A(15)], then I want to pull that string out of List A(15) and plop it into an attribute and evaluate the next Line A value. Why can't I figure this out in FME? Please be kind.

Also is there any way to plug in an attribute value into the list element index, to aid in looping? I'd like to define this: list{@value(count)} to cycle through the list values, but it only allows user entry of the index number.

Thanks, Pete

PS. Also, I'd like to do this without resorting to Python, if at all possible.

Python is probably the most efficient way to do this, but in pure fme I would try:

 

create a loop iteration attribute (_i)

listElementCounter on list A{} (to determine the number of times to loop

test to see if _i is less than _element count (exit criteria for loop)

listIndexer (list index set to _i)

 

listSearcher (List Attribute: ListB{}, Search For: List A (demoted attribute)}

 

do what ever processing you want on the Found port

increment the iteration attribute and loop (from both found and not found streams)


First you need to count the elements in ListA (ListElementCounter).

Then create a custom transformer (for looping) with:

Then starting at 0 do a search (ListSearcher) of ListA.Value(0) in ListB.

If found record the matching values.

Test if index is the element count.

If not add 1 to the index and loop back to start.

If the element count is reached exit the custom transformer.

I know this will need fine tuning but this concept could work.


First you need to count the elements in ListA (ListElementCounter).

Then create a custom transformer (for looping) with:

Then starting at 0 do a search (ListSearcher) of ListA.Value(0) in ListB.

If found record the matching values.

Test if index is the element count.

If not add 1 to the index and loop back to start.

If the element count is reached exit the custom transformer.

I know this will need fine tuning but this concept could work.

And while I was typing @jdh recorded a similar answer.


And while I was typing @jdh recorded a similar answer.

happens to me all the time ;)


Yes, as others have said below, the ListIndexer transformer inside a custom transformer loop is what you need here. Relatively easy to do.


Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.


Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.


Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.

Hi @Mark2AtSafe, I prefer to use the Cloner-Loop technique than Loop with custom transformer. Just my preference, there is no specific reason. I learned the effective Cloner use from @gio in this thread: Road Chainage


You could also append (vertical) the lists of course and then check for duplicates.

Alas the listduplicateremover does not output info about the duplicates, so you would simply test the value of a listlementcounter.

Or, depening on the size of the lists, you could sort them each on the searchattribute and then stuff them in a string and use a regexp to find the duplicates.

This for instance removes duplicates from a sorted string

(^|-)([^-]*)(-\\2)+','\\1\\2')

where in this case a "-" is used as seperator in the string.

Another fun way to do it is, extend the shortest list to match the longest.

Create a rownumbervalue based on the search value. ( like appending its ascci codes (or unicode etc. or rownumber of total objects) or something like that)

If you do it right all matching objects will be having same rownumber.

..

greets


Thanks for your help regarding creating the loop employing lists and ListSearcher. Is there anyway to get multiple matches from the ListSearcher (only search types are First...)? For example, I have many rows with the same item value. Instead of returning first match, how can I obtain a row for each match's list index?

My "key" list A is about 80K and my input list B is around 1.6M. A has unique city_state_zip combinations. The B list is a single column with full addresses, many with the same city_state_zip combination. So for each A item, I need to obtain none, one, or many matches to the single-attribute string in the B list.

How is this best accomplished?

Note: I've created a query that has a subquery in PostgreSQL and it is mercilessly slow (~20 min to process 10K records and that would make it ~24 days to process all 1.6M A rows).

If I need to venture into python, could someone direct me to a stub or something where I can learn and get going rapidly?

Thanks,

Pete


Hi @pvaziri, I think that SQL query is the most efficient way if you have saved the data into a PostgreSQL database already. I don't believe PostgreSQL querying would be such slow, possibly there could be a room to improve SQL statement. Can you post your SQL statement that took 20 minutes to process 10K records?


Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.

Right now I have my comparison key list in a single attribute text file, e.g.,

DAYTON OH 45459

 

DAYTON 45450

 

CINCINNATI OH 49999

 

CINCINNATI 49999

 

etc. to about 80K rows

I a lot of full addresses that I'm trying to match against (some have state abbreviations and some do not, thanks to admin inconsistencies).

For the output, I've created an additional variable: matchValue.

Once their is a match, I want to populate matchValue with the search criteria value, e.g.,:

 

fullAddress matchValue

 

479 MEADOWVIEW DR DAYTON OH 45459 DAYTON OH 45459

 

480 DURBAN CT DAYTON OH 45459 DAYTON OH 45459

 

660 UNIVERSITY DR CINCINNATI 49999 CINCINNATI 49999

Notice if you have a full full address list, lets say for an entire state, there will be repeated matched values. When I look at the ListSearcher settings, I only see search types with 'First' settings instead of 'Any'.

I need to completely scour the fullAddress list and identify any hits against each string in the comparison key list or feature type.

Any recommendations for how to do this expediently within or outside of FME?

Thanks,

Pete


Takashi got me thinking and this blog article might help - http://blog.safe.com/2014/10/fmeevangelist129/

It's about creating loops with lists, without using custom transformer loops. Instead you use a Cloner to create a separate copy of the feature for list A each of which contains list B. It would probably work here - but I still think a standard loop is easier.

I'll check this out. Thanks!


Right now I have my comparison key list in a single attribute text file, e.g.,

DAYTON OH 45459

 

DAYTON 45450

 

CINCINNATI OH 49999

 

CINCINNATI 49999

 

etc. to about 80K rows

I a lot of full addresses that I'm trying to match against (some have state abbreviations and some do not, thanks to admin inconsistencies).

For the output, I've created an additional variable: matchValue.

Once their is a match, I want to populate matchValue with the search criteria value, e.g.,:

 

fullAddress matchValue

 

479 MEADOWVIEW DR DAYTON OH 45459 DAYTON OH 45459

 

480 DURBAN CT DAYTON OH 45459 DAYTON OH 45459

 

660 UNIVERSITY DR CINCINNATI 49999 CINCINNATI 49999

Notice if you have a full full address list, lets say for an entire state, there will be repeated matched values. When I look at the ListSearcher settings, I only see search types with 'First' settings instead of 'Any'.

I need to completely scour the fullAddress list and identify any hits against each string in the comparison key list or feature type.

Any recommendations for how to do this expediently within or outside of FME?

Thanks,

Pete

@pvaziri, do you have two text files - one contains abbreviated addresses, another contains full addresses?


@pvaziri, do you have two text files - one contains abbreviated addresses, another contains full addresses?

Here you go. This contains 10K sample full addresses and the full listing of possible city-state-zip combinations in the US...textcomparesamples.zip


Hi @pvaziri, if the List A and B are held by the same feature, explode the feature on the List A with the ListExploder. Then, use the ListSearcher to find an element in the List B, which matches A (an element extracted from the List A). Finally merge the original feature to the matched feature to restore the List A, if necessary. Note that this method cannot collect multiple List B elements which match an identical element in the List A.

@pvaziri, this workspace took just a few seconds to extract matched full address records from your sample data.

Regex: [^\\s]+$ (Japanese Windows displayes a backslash as ? symbol. troublesome...)

Since the LIKE operation is relatively inefficient, I extracted the last word (zipcode?) from every address in both tables with the StringSearchers, and then compared them before the LIKE operation in the SQL statement. It's a tip to improve the query performance drastically.

If you create an external database (e.g. "FullAddress" table with "address" and "zipcode" fields, "CityStateZip" table with "citystate" and "zipcode" fields), the SQLCreator with a similar SQL statement could be more efficient.

Note:The InlineQuerier (i.e. SQLite) seems to perform the LIKE operation in case-insensitive, but there could be a database system that performs the operation in case-sensitive.


@pvaziri, this workspace took just a few seconds to extract matched full address records from your sample data.

Regex: [^\\s]+$ (Japanese Windows displayes a backslash as ? symbol. troublesome...)

Since the LIKE operation is relatively inefficient, I extracted the last word (zipcode?) from every address in both tables with the StringSearchers, and then compared them before the LIKE operation in the SQL statement. It's a tip to improve the query performance drastically.

If you create an external database (e.g. "FullAddress" table with "address" and "zipcode" fields, "CityStateZip" table with "citystate" and "zipcode" fields), the SQLCreator with a similar SQL statement could be more efficient.

Note:The InlineQuerier (i.e. SQLite) seems to perform the LIKE operation in case-insensitive, but there could be a database system that performs the operation in case-sensitive.

Thanks @takashi. Looks sweet! I'll try it tout de suite!


Reply