Question

Question of the Week: Lists and the InlineQuerier

4 years ago
April 9, 2020
5 replies
90 views

+43

mark2atsafe
Safer
2501 replies

Hello FME'ers,

Because Friday is a holiday this week, you get the question-of-the-week post a little early. Hurrah for long weekends that start on the Friday. So much better than ending on a Monday, I think.

Question of the Week

This question from @rmm7 came up yesterday and relates to testing and filtering data:

Q) I'm trying to match values between two attributes on the same feature.

I have an electric transformer with a phase value, and it's connected electric cable, also with a phase value. The phase values can be any combination of A,B, or C. For example, the electric transformer might have a value of AC, and the cable it's joined to has a value of ABC.

I'm trying to find a way to test if the phase values of the transformer are contained within the phase values of the cable.

A) This question got a number of answers, and I think the user has a way forward. But in working out my particular answer, I came up with an idea and thought I'd share it here, to see if it would be useful.

If you don't want to go through the whole thought process, then just skip to the Summary section, where I outline what that idea is.

Otherwise, let's start by looking at what exactly we're trying to do here...

Question Requirements

So there is an electric transformer and a cable. Each has a list of phase values. We need to know if the transformer values are stored within the cable values.

So, for example:

Cable Phases: A,B,C
Transformer Phases: A, C
Result: Passed

But:

Cable Phases: A,C
Transformer Phases: B, C
Result: Failed

In short, we need to match lists of values, and you can't really do that by default in the Tester.

Approaches

There are two ways to approach a problem like this.

Firstly, we can go for a generic approach, where we try to solve the problem of list matching, without restriction. So the solution would work for any list, not just the examples here.

Secondly, we can go for a focused approach, where we try to solve the problem using the limits specified by the user. In this case, "the phase values can [only] be any combination of A,B, or C", plus from the examples shown, we can assume the values are always in alphabetical order.

This makes the task much easier because there are only a set number of matches.

By my calculations there can only be 7 different values (A/B/C/A,B/A,C/B,C/A,B,C). As @erik_jan points out, you can use the Tester with a CONTAINS operator, to eliminate most of those matches. Only A,C is difficult because it's the only possibility not contained within the full A,B,C string.

So something like this:

...should be fine. Great reasoning and a nice answer there Erik.

So why is my question title banging on about Lists and the InlineQuerier?

A Generic Approach using Lists

Personally, I started out looking for a generic approach to the problem. I often do that because you can end up with a solution that works for many scenarios - not just this specific question - and then you can put it on the hub and other users can find it

Plus, the generic approach often turns out to be the same as the focused approach, so it doesn't matter anyway.

Here the core problem is this:

Problem: It's not easy to test multiple values in one string, against multiple values in another string.

OK, if we can't test the values in that structure, can we restructure them to something easier? If you've got comma-separated values (like A,B,C) the simplest transformation is to put them into an FME List attribute.

So I add two AttributeSplitters:

And now I have list attributes, phaseList1{} and phaseList2{}, say like this:

phaseList1{0} = A
phaseList1{1} = B
phaseList1{2} = C
phaseList2{0} = A
phaseList2{1} = C

Problem: It's not much more easy to test multiple values in one list, against multiple values in another list.

What I'd probably do here is create a custom transformer and check all the values in phaseList2{}, looking for their equivalent in phaseList1{}. I'd need a custom transformer loop, with a ListSearcher transformer to do the actual testing.

But custom transformers and loops seems a bit complicated for this. Is there a better way? What else could we restructure the data to?

A Generic Approach using Regex

My next idea was to use Regex and this is the solution I posted to the question. @ebygomm came up with a Regex solution too.

My version was:

@FindRegEx(@Value(PrimaryPhase),(?=.*@Value(xList{0}))(?=.*@Value(xList{1}))(?=.*@Value(xList{2}))) != -1

Her's was:

@FindRegEx(@Value(PrimaryPhase),@ReplaceString(@ReplaceRegEx(@Value(XFMRPhase),([A-Z]),(?=.*\1)),",",""))

As @redgeographics says, creating Regex on the fly must be the ultimate FME hack. To be fair I think both solutions are quite horrific! Mine was mostly a cut-and-paste job from StackOverflow and is not fully generic. Plus, I'd hate to be an FME user and have to come up with one of these.

So what else do we have?

A Generic Approach using SQL

Could we use SQL? Why not? It has the advantage of a single line of "code" and it would be fairly readable too, not some Regex gibberish.

But how do I get the attributes into a database and what do I use to query them? I'd need to have several tables all joined together with keys, and how do I set that up?

I use a single transformer: the InlineQuerier:

This is the million dollar idea I wanted to share!

If I just route the data into an InlineQuerier as a single connection, I'm no better off. A List is a List, and the transformer won't automatically process that list into a more usable table.

But if I explode the list and send it as a separate connection to the InlineQuerier?

Now I have a scenario where I can build a single SQL query to pick features where all entries in List B join with an entry in List A.

Problem: My SQL skills have gone rusty over the years because I've rarely needed to use them.

That's a problem with me, not the technique. I think the technique is sound, it's just that I can't figure out how to structure the SQL. So far I have:

select distinct "ID" from "List B" where (select count(*) from "List B" where "xList" in (select "pList" from "List A")) = (select count(*) from "Main Data" as m join "List B" as lb on m.ID = lb.ID)

Yikes! It works, but only for one feature, and it looks pretty sad. Maybe someone with better SQL skills can come up with a better command?

But still, using SQL has opened up a lot of possibilities here.

Summary

Overall, I think this is an interesting question and all, and it does show us the different approaches we can take towards solving a particular need.

But what I really wanted to pass on was the idea of exploding a list and sending it into an InlineQuerier as a separate input.

Here I have two lists, so I split them off, keeping the original stream in case I need it. This way I get multiple database tables, automatically, and can run a query that combines all elements of them.

It even lets me query between previous/subsequent features, without using the Adjacent Feature processing.

When I think of all the transformations I've tried to do on a list, it seems to me that they could be much easier using SQL. It's half way between putting multiple transformers together and writing a Python script to do the job. As most new FME users know more SQL than FME transformers, I think it could help.

I don't know if this is a truly unique idea, or whether you've been doing this already. Maybe you think it's a dumb idea? Maybe you think that we should put an option into the InlineQuerier to explode lists like this? Either way, it's OK. This just popped into my head and so I wanted to pass it on in case it was of use.

Let me know below what you think.

+31

ebygomm
Influencer
3241 replies
4 years ago
April 9, 2020

I don't know if this is a truly unique idea, or whether you've been doing this already. Maybe you think it's a dumb idea?

I think for this particular use case, python is easier

If you really wanted to do it though, i think something like this would work

Select Distinct m.id, m.PrimaryPhase, m.XFMRPhase, jj.match from master m left join
(SELECT
    bb.id, "fail" as match
FROM
    listb bb
WHERE
    NOT EXISTS (SELECT *
        FROM
           lista aa
        WHERE
           aa."primary" = bb.xfmr
           and 
           aa.id = bb.id)) jj

on m.id = jj.ID

Output

+26

bwn
Evangelist
564 replies
4 years ago
April 10, 2020

That is indeed some tortuous SQL @mark2atsafe !

For both performance and simplicity reasons it is better to just do it as plain old JOINs (INNER and LEFT OUTER). Try to avoid things like IN() or EXISTS() as the SQLite Query Optimiser gets a little funky with those. The best performance cases often work with just using JOIN statements as this is most likely to trigger the Auto-Indexer if the SQL parser analyses the statement and determines it can build temporary in-memory Field indices at the time of SQL execution to speed up the query.

SELECT M.*, Lookup.Match
FROM "Main Data" M LEFT OUTER JOIN
(SELECT A.ID , 1 AS Match
FROM "List A" A INNER JOIN "List B" B ON (A.ID=B.ID AND A.pList=B.xList)
GROUP BY A.ID) AS Lookup
ON (M.ID=Lookup.ID)

Syntactically equivalent......but not performance equivalent, is a variation on the SQL by using DISTINCT instead of GROUP BY in the above. For whatever reason SQLite is usually faster when using GROUP BYs rather than DISTINCTs even thought the documentation infers they should execute similarly

SELECT M.*, Lookup.Match
FROM "Main Data" M LEFT OUTER JOIN
(SELECT DISTINCT A.ID , 1 AS Match
FROM "List A" A INNER JOIN "List B" B ON (A.ID=B.ID AND A.pList=B.xList)
) AS Lookup
ON (M.ID=Lookup.ID)

...now another Tip, you don't need InlineQuerier to do the above! You can replicate the SQL by a FeatureJoiner (For the Subquery Inner Join), a DuplicateFilter on its output to return a Unique ID value for the X-query results (to mimic the GROUP BY), and a Final Left Join Mode FeatureJoiner with the Main Table entering the Left Side and the DuplicateFilter entering the Right Side

In fact, when using FME and being so used to doing it in SQL, I often think, "How would I do that in SQL and then mimic in FME?" Usually this is just some combinations of FeatureJoiner, DuplicateFilter and/or Sorter.

It could actually be done completely in InlineQuerier without the need for any lists at all by parsing out the values "A", "B", C" onto separate rows using 2 x WITH RECURSIVEs used in conjunction with the SubStr() functionto generate 2 inline Views that created multiple rows within the SELECT SQL operation itself that recurse through each leading character in say "ABC" or "AC" until all characters are parsed out onto separate rows within the SQL statement. The way this works is it would recurse out each individual value of "A", "B", "C" for each Main Table entry that ends up with Inline Views equivalent to the output of your ListExploders,.....but the SQL gets a lot more complex, would suffer a performance hit and to be honest, I similarly mix and match pre-processing Transformers like List Creators and Exploders with InlineQuerier SQL to find the "optimum" amount of complexity in the number of Transformers needed versus how complex the SQL statement I feel like writing out!

You are correct thought that InlineQuerier and the SQLite SQL within it is extremely versatile and fast, and I'll often scrub out parts of my workspace where I needed to use a complex series of Transformers to test complex permutations of data relationships and replace them all with a Single InlineQuerier. As above, it is also good at Looping/Recursion for those that are familiar with SQL CTEs.

+26

bwn
Evangelist
564 replies
4 years ago
April 11, 2020

bwn wrote:

That is indeed some tortuous SQL @mark2atsafe !

SELECT M.*, Lookup.Match
FROM "Main Data" M LEFT OUTER JOIN
(SELECT A.ID , 1 AS Match
FROM "List A" A INNER JOIN "List B" B ON (A.ID=B.ID AND A.pList=B.xList)
GROUP BY A.ID) AS Lookup
ON (M.ID=Lookup.ID)

SELECT M.*, Lookup.Match
FROM "Main Data" M LEFT OUTER JOIN
(SELECT DISTINCT A.ID , 1 AS Match
FROM "List A" A INNER JOIN "List B" B ON (A.ID=B.ID AND A.pList=B.xList)
) AS Lookup
ON (M.ID=Lookup.ID)

And for completeness, here is how you can replace the Lists and ListExploders with just pure SQL to derive the same Feature List Table that exits from the ListExploder Transformers:

WITH RECURSIVE Vw_ListA (ID, pList, pListResidual) AS
 (SELECT ID, substr(PrimaryPhase,1,1) AS pList, substr(PrimaryPhase,2) AS pListResidual  FROM "Main Data" WHERE PrimaryPhase IS NOT NULL AND PrimaryPhase<>''
 UNION ALL SELECT ID, substr(pListResidual,1,1) AS pList, substr(pListResidual,2) AS pListResidual FROM Vw_ListA WHERE pListResidual <>'')
SELECT ID,pList FROM Vw_ListA

...Gets List A, and:

WITH RECURSIVE Vw_ListB (ID, xList, xListResidual) AS
 (SELECT ID, substr(XFMRPhase ,1,1) AS pList, substr(XFMRPhase ,2) AS xListResidual  FROM "Main Data" WHERE XFMRPhase IS NOT NULL AND XFMRPhase<>''
 UNION ALL SELECT ID, substr(xListResidual,1,1) AS xList, substr(xListResidual,2) AS xListResidual FROM Vw_ListB WHERE xListResidual <>'')
SELECT ID,xList FROM Vw_ListB

...Gets List B

These can be substituted in the SQL above by replacing where the named Tables "List A" and "List B" have been called to instead look like this:

SELECT M.*, Lookup.Match FROM "Main Data" M LEFT OUTER JOIN (
SELECT A.ID AS ID, 1 AS Match FROM (
WITH RECURSIVE Vw_ListA (ID, pList, pListResidual) AS
 (SELECT ID, substr(PrimaryPhase,1,1) AS pList, substr(PrimaryPhase,2) AS pListResidual  FROM "Main Data" WHERE PrimaryPhase IS NOT NULL AND PrimaryPhase<>''
 UNION ALL SELECT ID, substr(pListResidual,1,1) AS pList, substr(pListResidual,2) AS pListResidual FROM Vw_ListA WHERE pListResidual <>'')
SELECT ID,pList FROM Vw_ListA
) AS A INNER JOIN (
WITH RECURSIVE Vw_ListB (ID, xList, xListResidual) AS
 (SELECT ID, substr(XFMRPhase ,1,1) AS pList, substr(XFMRPhase ,2) AS xListResidual  FROM "Main Data" WHERE XFMRPhase IS NOT NULL AND XFMRPhase<>''
 UNION ALL SELECT ID, substr(xListResidual,1,1) AS xList, substr(xListResidual,2) AS xListResidual FROM Vw_ListB WHERE xListResidual <>'')
SELECT ID,xList FROM Vw_ListB) AS B ON (A.ID=B.ID AND A.pList = B.xList)
GROUP BY A.ID
) AS Lookup ON (M.ID = Lookup.ID)

Which is elegant but complex for someone to type and read! Which is why I like your solution of using the Lists/ListExploders more to create separate List Tables from the Main Table.......it probably doesn't perform as fast, but it is a faster way to design and test a Workflow!

+22

takashi
Contributor
7552 replies
4 years ago
April 11, 2020

Here is another approach using regex.

You can build a regex that matches any character belonging to the Primary phases list and a subsequent comma if exists. That is, if the Primary phases is "A,C", this expression creates [AC],?

[@ReplaceString(@Value(PrimaryPhase),",","")],?

Assuming that the regex is stored in an attribute called "_regex", you can then create a comma-separated list containing XFMER phases NOT matched the regex with this expression.

@Trim(@ReplaceRegEx(@Value(XFMRPhase),@Value(_regex),""),",")

Finally determine if every XFMR phase belongs to Primary phase by testing whether the resulting list has a value.

If an actual phase name is a word (not a single character), this expression is applicable to create an appropriate regex.

\b(@ReplaceString(@Value(PrimaryPhase),",",|))\b,?

+43

mark2atsafe
Author
Safer
2501 replies
4 years ago
April 14, 2020

bwn wrote:

And for completeness, here is how you can replace the Lists and ListExploders with just pure SQL to derive the same Feature List Table that exits from the ListExploder Transformers:

WITH RECURSIVE Vw_ListA (ID, pList, pListResidual) AS
 (SELECT ID, substr(PrimaryPhase,1,1) AS pList, substr(PrimaryPhase,2) AS pListResidual  FROM "Main Data" WHERE PrimaryPhase IS NOT NULL AND PrimaryPhase<>''
 UNION ALL SELECT ID, substr(pListResidual,1,1) AS pList, substr(pListResidual,2) AS pListResidual FROM Vw_ListA WHERE pListResidual <>'')
SELECT ID,pList FROM Vw_ListA

...Gets List A, and:

WITH RECURSIVE Vw_ListB (ID, xList, xListResidual) AS
 (SELECT ID, substr(XFMRPhase ,1,1) AS pList, substr(XFMRPhase ,2) AS xListResidual  FROM "Main Data" WHERE XFMRPhase IS NOT NULL AND XFMRPhase<>''
 UNION ALL SELECT ID, substr(xListResidual,1,1) AS xList, substr(xListResidual,2) AS xListResidual FROM Vw_ListB WHERE xListResidual <>'')
SELECT ID,xList FROM Vw_ListB

...Gets List B

These can be substituted in the SQL above by replacing where the named Tables "List A" and "List B" have been called to instead look like this:

SELECT M.*, Lookup.Match FROM "Main Data" M LEFT OUTER JOIN (
SELECT A.ID AS ID, 1 AS Match FROM (
WITH RECURSIVE Vw_ListA (ID, pList, pListResidual) AS
 (SELECT ID, substr(PrimaryPhase,1,1) AS pList, substr(PrimaryPhase,2) AS pListResidual  FROM "Main Data" WHERE PrimaryPhase IS NOT NULL AND PrimaryPhase<>''
 UNION ALL SELECT ID, substr(pListResidual,1,1) AS pList, substr(pListResidual,2) AS pListResidual FROM Vw_ListA WHERE pListResidual <>'')
SELECT ID,pList FROM Vw_ListA
) AS A INNER JOIN (
WITH RECURSIVE Vw_ListB (ID, xList, xListResidual) AS
 (SELECT ID, substr(XFMRPhase ,1,1) AS pList, substr(XFMRPhase ,2) AS xListResidual  FROM "Main Data" WHERE XFMRPhase IS NOT NULL AND XFMRPhase<>''
 UNION ALL SELECT ID, substr(xListResidual,1,1) AS xList, substr(xListResidual,2) AS xListResidual FROM Vw_ListB WHERE xListResidual <>'')
SELECT ID,xList FROM Vw_ListB) AS B ON (A.ID=B.ID AND A.pList = B.xList)
GROUP BY A.ID
) AS Lookup ON (M.ID = Lookup.ID)

Wow! That is indeed very elegant. Yes, I do tend to torture SQL a bit; the result of not using it for so many years. I'm not sure I even knew about the Recursive capability.

I did wonder if the FeatureJoiner could do the necessary work - which is how I'd often do a match - but I didn't think it could handle it. Goes to show I was wrong.

This is excellent work. I applaud your SQL skills!

Reply

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos

Question of the Week: Lists and the InlineQuerier

5 replies

Reply

Helpful Members This Week

Recently Solved Questions

black out text in pdf

FME Feature Writer - Default Values Not Applied

Having a problem with data type 'double' when fme is writing shapefiles

How to display GroupID with SpatialSorter using FME Workbench 2024

Create group ID based on conditions?

Community Stats

Latest FME

Cookie policy

Cookie settings

Reply

Related Topics

Helpful Members This Week

Recently Solved Questions

Popular Tags

Community Stats

Latest FME

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings