Skip to main content

I am trying to do a regexp match between 2 datasets and get a merge of the datasets as if it were a join query. Here's the query that I have run on postgres and would like to find help in FME to do it.

WITH g as (select name from gname where feature_code = 'ADM2'),p as (select "NAME" FROM "NAMES" WHERE "PLACE_ID" IN (SELECT "PLACE_ID" FROM
"PLACES" WHERE "TYPE" = 'X')) select g.name,p."NAME" from g join p on exists (select * from regexp_matches(g.name,p."NAME"))

 

 

Just want to add that there is no relation between the 2 datasets or no common field to join

 

 


Hi @raghunaren, since you have the query already, you can execute it with the SQLCreator or SQLExecutor transformer. I think it's the quickest way.


Hi @raghunaren, since you have the query already, you can execute it with the SQLCreator or SQLExecutor transformer. I think it's the quickest way.

Thanks. The 2 datasets I mentioned earlier are actually 2 features from 'notmerged' and 'unreferenced' output ports of a featuremerger transformer I use in a step before. Not sure if SQLCreator or SQLExecutor can be used in this scenario.

 

 

I did try the inlinequerier but it seems i can't use the query as it is. Inlinequerier throws syntax errors while trying to execute this query. The reason I stuck to this query was that the query plan in this case was such that it executes faster than something like this.

 

 

select g.name, p."NAME" FROM gname g join "NAMES" p on exists(select * from regexp_matches(g.name,p."NAME") WHERE g.feat_code = 'ADM2' AND p."ID" IN (SELECT "ID" FROM "PLACES" WHERE "TYPE" = 'X'))

 

 

Sorry I should been more precise and elaborate in my question in the first place.

 

 


@raghunaren, I assumed that the two tables ("gname" and "NAMES") are belonging to the same database schema, i.e. the query you posted can be executed on PostgreSQL. If so, the SQLCreator or SQLExecutor could execute the query directly and outputs the resulting records.

If the tables are read from two different database schemas, the InlineQuerier may be a good choice. The InlineQuerier creates a temporary SQLite database internally and then executes a query on the database. However, SQLite doesn't support regex pattern matching by default unfortunately.

 

 

If your requirement is to just determine if g.name contains p.NAME, you can use the like operator. This is a simplified example to do that.

 

select g.name, p.NAME
from gname as g inner join NAMES as p
on g.name like '%'||p.NAME||'%'
Here, % indicates 0 or more any characters. The like operator in this statement determines if g.name contains p.NAME. Just be aware that the determination will be performed in case-insensitive.

 

 

Otherwise, a possible way is to perform CROSS JOIN with the InlineQuerier and then filter the resulting records by the StringSearcher with the regular expression (i.e. the value of NAME). This approach may take a long time because the cross join generates N x N records, but the StringSearcher definitely performs regex pattern matching for every combination of g.name and p.NAME
select g.name, p.NAME
from gname as g cross join NAMES as p

0684Q00000ArK0GQAV.png


@raghunaren, I assumed that the two tables ("gname" and "NAMES") are belonging to the same database schema, i.e. the query you posted can be executed on PostgreSQL. If so, the SQLCreator or SQLExecutor could execute the query directly and outputs the resulting records.

If the tables are read from two different database schemas, the InlineQuerier may be a good choice. The InlineQuerier creates a temporary SQLite database internally and then executes a query on the database. However, SQLite doesn't support regex pattern matching by default unfortunately.

 

 

If your requirement is to just determine if g.name contains p.NAME, you can use the like operator. This is a simplified example to do that.

 

select g.name, p.NAME
from gname as g inner join NAMES as p
on g.name like '%'||p.NAME||'%'
Here, % indicates 0 or more any characters. The like operator in this statement determines if g.name contains p.NAME. Just be aware that the determination will be performed in case-insensitive.

 

 

Otherwise, a possible way is to perform CROSS JOIN with the InlineQuerier and then filter the resulting records by the StringSearcher with the regular expression (i.e. the value of NAME). This approach may take a long time because the cross join generates N x N records, but the StringSearcher definitely performs regex pattern matching for every combination of g.name and p.NAME
select g.name, p.NAME
from gname as g cross join NAMES as p

0684Q00000ArK0GQAV.png

Addition. In order to reduce the overhead to create the temporary SQLite database in the InlineQuerier, it might be better that you filter the records when reading them from the source tables, with applying the WHERE clauses on each reader.

 


@raghunaren

if they are from not merged and unreferenced ports.

Then just merge them unconditionaly and then use regexp.

For instance, as you are just looking for names from a table in another table, just use a stringsearcher set to egexp (if needed).

If you need word matches use wordboundaries, in fme word boundaries are \\m ...\\M.


@raghunaren

if they are from not merged and unreferenced ports.

Then just merge them unconditionaly and then use regexp.

For instance, as you are just looking for names from a table in another table, just use a stringsearcher set to egexp (if needed).

If you need word matches use wordboundaries, in fme word boundaries are \\m ...\\M.

Thanks @gio . Will definitely try that.

 


Addition. In order to reduce the overhead to create the temporary SQLite database in the InlineQuerier, it might be better that you filter the records when reading them from the source tables, with applying the WHERE clauses on each reader.

 

Thanks @takashi . That helped to what I had set out to do. Also, I would be exploring to see if I can extend the pattern matching a bit using fme. It was silly of me to have not thought about using ||p.Name|| like I would do in a stored proc. postgresql.

 


Reply