Skip to main content

Hello, I have a File Geodatabase Feature Class that has a field with people's names in the format "First Last". I have another spreadsheet with the people's Last names in one field, and their Active Directory usernames in another field. I need to get the names matched up with their Active Directory usernames as well as possible. What would be the best transformer to use to join these to tables using something similar to a "Like" or "Contains" query to get an output feature class with the usernames?

The FeatureMerger doesn't allow you to use LIKE queries as far as I can tell.

For example, say the Feature Class "Fred Johnson" in the name field, and the other table just has Johnson, I would need a Transformer and Query that joins those two and outputs a new feature class with the username.

 

You could try the custom transformer FuzzyStringCompareFrom2Datasets

If you just want to join with a like query then the InLineQuerier will allow you to do that.


You could try the custom transformer FuzzyStringCompareFrom2Datasets

If you just want to join with a like query then the InLineQuerier will allow you to do that.

Thanks for the reply, I was actually trying the InlineQuerier, but maybe I have something off I am getting no results, and it stalls for like 30 mins at one point?


I'm not sure the SQLite Query Optimiser used in InlineQuerier would perform well with a LIKE being used like this in a JOIN ... ON condition. The way it works is it would possibly test the WHERE condition twice, once to establish the Cartesian product of the JOIN which evaluates to TRUE (there is extra processing done when using "ON" in a JOIN expression), and then an additional test of the WHERE clause to filter this result, even though it is the same WHERE clause.

Also, don't use "+" to concatenate. In most SQL syntaxes it is a double-pipe || to concatenate strings.   This will be the most likely reason that no results are returned, "+" is a literal maths/arithmetic operator in SQLite, and on non-numeric strings it will return basically in your current statement the equivalent of "... LIKE NULL", which will return no results as no string can be LIKE NULL.

Additionally, to significantly decrease the run time of InlineQuerier when dealing with lots of records, then it is highly recommended to limit the data passed to InlineQuerier to just:

  • An Identifier in Table A;
  • An Identifier in Table B; and
  • The 2 comparison data fields.

Do this with an AttributeManager in front of the InlineQuerier to remove all unneeded Attributes before passing to InlineQuerier.

Similarly, limit the Attributes being returned in the SQL statement. When you are dealing with a comparison result of potentially up to 3,000 x 1 million = 3 billion records in this case, this will make a substantial performance difference in the Transformer as well as reducing the SQLite engine's run-time.

Try instead like this:

SELECT A.IdentifierField, B.IdentifierField
FROM CustomerData_22020 A, Sheet1 B
WHERE A.account_owner_salesrep LIKE '%'||B.LastName||'%'

You could try the custom transformer FuzzyStringCompareFrom2Datasets

If you just want to join with a like query then the InLineQuerier will allow you to do that.

@morijlumos @cdalessandro has updated his FuzzyStringCompareFrom2Datasets trasnformer to be python 3 compliant.


Reply