Solved

Can SQLExecutor be used to get around having to read in all the data from a database table into FME and then running a feature joiner.


Badge

Can SQLExecutor be used to get around having to read in all the data from a database table into FME and then running a feature joiner?

Consider the example below we need to read in the entire table of 2Million plus records to extract data we are interested in.

CurrentPatternWhat I think should be possible but I'm not sure how to implement it is to create a temp table and use that as a subselect or joined table to get the data using a SQL query

DesiredOutcome

icon

Best answer by elpone 2 September 2021, 11:45

View original

14 replies

Badge +2

Hi @elpone​ ,

Sounds like there are two things you could be doing here.

 

First, since the first transformer is a tester, you can likely use a WHERE clause on the (LFS and SILC) reader feature type's to limit the data being read in from the database.

 

Second, use the DatabaseJoiner. The DatabaseJoiner transformer is similar to the FeatureJoiner or FeatureMerger, but instead of merging two streams of features and an in-memory join, it merges one stream of features with data read from an external database. The DatabaseJoiner allows features to be joined without having to read the entire dataset into a workspace. FME can just query the database and select the individual records it needs. This can improve performance greatly. More details in linked DatabaseJoiner article above.

 

Also, be sure to check out the let the database do the work articles as it will provide you with more tips on how to improve performance when working with DB formats!

 

Hope that helps!

Badge

Hi @elpone​ ,

Sounds like there are two things you could be doing here. 

 

First, since the first transformer is a tester, you can likely use a WHERE clause on the (LFS and SILC) reader feature type's to limit the data being read in from the database. 

 

Second, use the DatabaseJoiner. The DatabaseJoiner transformer is similar to the FeatureJoiner or FeatureMerger, but instead of merging two streams of features and an in-memory join, it merges one stream of features with data read from an external database. The DatabaseJoiner allows features to be joined without having to read the entire dataset into a workspace. FME can just query the database and select the individual records it needs. This can improve performance greatly. More details in linked DatabaseJoiner article above.

 

Also, be sure to check out the let the database do the work articles as it will provide you with more tips on how to improve performance when working with DB formats!

 

Hope that helps!

Hi Chris,

Thanks for the suggestions,  they do not quite cover the use case I have.

 

LFS and SILC are not in the database these are excel sheets with a Postcode for each record which here in Ireland correspond to a point location of a delivery point which I then want to get location for.  What I want to do is use the post codes I have from what ever source (excel, csv type data) and use this to query the database.  I had a go at the database joiner but it  very slow as it is in effect making a single query for each feature.

In effect its running this query 1000's of times

Select * 
FROM dbo.[AddressTable] as GD
WHERE GD.[EIRCODE] = @Value(Eircode)

vs selection query which should be much quicker

Select * 
 FROM dbo.[AddressTable] as GD
 WHERE GD.[EIRCODE] IN
  (
  t.Eircode
  FROM #TempTable as t
  );

There is no way to segment or thin the big table that needs to be read.

Regards

Userlevel 4

First, you really want to avoid calling the SQLExecutor many thousands of times, it's going to take a very long time.

Secondly, seeing the size of the database table (2+million features), the best performance may actually be had by first writing the two Excel sheets to the same database, indexing the keys and the using e.g. the SQLCreator to join everything together in the database before the data is read back into FME.

Badge

Why not concatanate excel features with comma separator and then use one feature in SQLExecutor?Skärmbild 2021-07-09 094857Skärmbild 2021-07-09 094924Skärmbild 2021-07-09 094831you might need some modification to aggregated attribute if it's of text type. (this should work for int)

Userlevel 4

Why not concatanate excel features with comma separator and then use one feature in SQLExecutor?Skärmbild 2021-07-09 094857Skärmbild 2021-07-09 094924Skärmbild 2021-07-09 094831you might need some modification to aggregated attribute if it's of text type. (this should work for int)

I'm not sure how well that'd work with more than 270'000 Excel features...

Badge

First, you really want to avoid calling the SQLExecutor many thousands of times, it's going to take a very long time.

Secondly, seeing the size of the database table (2+million features), the best performance may actually be had by first writing the two Excel sheets to the same database, indexing the keys and the using e.g. the SQLCreator to join everything together in the database before the data is read back into FME.

Hi @david_r​ ,

I was thinking of this as a possibility, but in a production environment I don't think I want to make the database writable to random users and there also needs to be a clean up whereby the data put into the database then needs to be deleted. I'm too paranoid to think nothing can go wrong with a create-insert-delete process (🤔😔). I also will not know the scale of the number of records as that will be variable and will not be in a predicable order of magnitude for the input pipeline, it may be; 50, 500, 5000 or 50000.

 

A user can create a temp table even if they do not have Create Table permissions on the database which is the use case I have in mind.

Badge

I'm not sure how well that'd work with more than 270'000 Excel features...

it took about 12 min to concatanate 10digit numberSkärmbild 2021-07-09 102549Skärmbild 2021-07-09 102631if SQL query is not buying it try group features and use

Where [attribute] IN (123,123,222,....)
or [attribute] IN (456,878,888,....)

 

 

Badge

Why not concatanate excel features with comma separator and then use one feature in SQLExecutor?Skärmbild 2021-07-09 094857Skärmbild 2021-07-09 094924Skärmbild 2021-07-09 094831you might need some modification to aggregated attribute if it's of text type. (this should work for int)

Hi @edgarrunnman​ ,

Thanks that looks an interesting proposition.   In the case of SQL server there may be a limit on the number of elements that can go into the In Clause. (https://dba.stackexchange.com/questions/228695/is-there-any-limit-for-in-results-in-sql-server) of 32,000.  This post also points to the solution which strikes me as a possible and elegant way to do it which is to create a temp table from our input and then use this as  part of  the SQLExecutor

 

CREATE TABLE #MyTempTable
(myeircode varchar(8) not null,
zones integer not null);
 
--This is the bit that I'm not sure how to implent in SQLExecutor
INSERT INTO #MyTempTable
(myeircode, zones)
--suspect there is a limit to the number on this  can be called (1000?)
VALUES
('p31xxxxx',1),
('p31yyyyy',1),
(p31zzzz',2);
 
--Now use the temp table
Select * 
 FROM dbo.[AddressTable] as GD
 WHERE GD.[EIRCODE] IN
  (
  t.myeircode
  FROM #MyTempTable as t
  );

 

Badge

Hi @edgarrunnman​ ,

Thanks that looks an interesting proposition.   In the case of SQL server there may be a limit on the number of elements that can go into the In Clause. (https://dba.stackexchange.com/questions/228695/is-there-any-limit-for-in-results-in-sql-server) of 32,000.  This post also points to the solution which strikes me as a possible and elegant way to do it which is to create a temp table from our input and then use this as  part of  the SQLExecutor

 

CREATE TABLE #MyTempTable
(myeircode varchar(8) not null,
zones integer not null);
 
--This is the bit that I'm not sure how to implent in SQLExecutor
INSERT INTO #MyTempTable
(myeircode, zones)
--suspect there is a limit to the number on this  can be called (1000?)
VALUES
('p31xxxxx',1),
('p31yyyyy',1),
(p31zzzz',2);
 
--Now use the temp table
Select * 
 FROM dbo.[AddressTable] as GD
 WHERE GD.[EIRCODE] IN
  (
  t.myeircode
  FROM #MyTempTable as t
  );

 

call "Grouper" transfomer with sample Rate 1000(?), make an "insert_string" attribute and concatanate with comma separator grouping by "_group_index"Skärmbild 2021-07-09 111103and then 

INSERT INTO #MyTempTable
(myeircode, zones)
VALUES @Value(insert_string) ;

 and make index for myericode

CREATE TABLE #MyTempTable
(myeircode varchar(8) not null, zones integer not null)
INDEX index_for_myeirocode (myeircode)

 

Userlevel 4

Hi @david_r​ ,

I was thinking of this as a possibility, but in a production environment I don't think I want to make the database writable to random users and there also needs to be a clean up whereby the data put into the database then needs to be deleted. I'm too paranoid to think nothing can go wrong with a create-insert-delete process (🤔😔). I also will not know the scale of the number of records as that will be variable and will not be in a predicable order of magnitude for the input pipeline, it may be; 50, 500, 5000 or 50000.

 

A user can create a temp table even if they do not have Create Table permissions on the database which is the use case I have in mind.

Agreed, I'd look into using a temp-temple for this. Perhaps consider using a global temp table so that it persists between different connections in your workspace.

Badge

Hi @edgarrunnman​ ,

Thanks that looks an interesting proposition.   In the case of SQL server there may be a limit on the number of elements that can go into the In Clause. (https://dba.stackexchange.com/questions/228695/is-there-any-limit-for-in-results-in-sql-server) of 32,000.  This post also points to the solution which strikes me as a possible and elegant way to do it which is to create a temp table from our input and then use this as  part of  the SQLExecutor

 

CREATE TABLE #MyTempTable
(myeircode varchar(8) not null,
zones integer not null);
 
--This is the bit that I'm not sure how to implent in SQLExecutor
INSERT INTO #MyTempTable
(myeircode, zones)
--suspect there is a limit to the number on this  can be called (1000?)
VALUES
('p31xxxxx',1),
('p31yyyyy',1),
(p31zzzz',2);
 
--Now use the temp table
Select * 
 FROM dbo.[AddressTable] as GD
 WHERE GD.[EIRCODE] IN
  (
  t.myeircode
  FROM #MyTempTable as t
  );

 

Thanks @edgarrunnman​ & @david_r​ , I'll have a go using this pattern over the weekend as well as using a global temp as well.  The grouper is the thing I think I was missing as it has the potential to compress 1000's of calls into just a few.

Badge

This was got to work in the end. Overall the run time reduced from 1'30 down to 2.5 seconds.  Thanks to @edgarrunnman​  and @david_r​  for their suggestions.  I'll answer my own question to close this off.  The use case was implemented on MS SQL server but should apply to other RDBMS as well.

 

The steps are firstly to create a temp table via a SQLExecutor with the SQL Statement:

FME_SQL_DELIMITER ;
create table $(targetTempTable)
(variableName varchar(7));

In this case I have used a private parameter for a temp table name which in SQL server starts with #.  This removes the need for the end user to have to worry about the name and ensures a temp table is created.   Temporary tables are dropped when the session that creates the table has closed. (Details)

Once the table is created you use your input to populate your temp table with the values you want to use as a filter on the data base joiner with an exhaustive prefetch query:

select*
from SomeDataBaseTable
WHERE eircode IN
(SELECT eircode from $(targetTempTable))

 

DatabaseJoiner_redactedThe entire workflow is below

WorkFlow 

 

Userlevel 4

This was got to work in the end. Overall the run time reduced from 1'30 down to 2.5 seconds.  Thanks to @edgarrunnman​  and @david_r​  for their suggestions.  I'll answer my own question to close this off.  The use case was implemented on MS SQL server but should apply to other RDBMS as well.

 

The steps are firstly to create a temp table via a SQLExecutor with the SQL Statement:

FME_SQL_DELIMITER ;
create table $(targetTempTable)
(variableName varchar(7));

In this case I have used a private parameter for a temp table name which in SQL server starts with #.  This removes the need for the end user to have to worry about the name and ensures a temp table is created.   Temporary tables are dropped when the session that creates the table has closed. (Details)

Once the table is created you use your input to populate your temp table with the values you want to use as a filter on the data base joiner with an exhaustive prefetch query:

select*
from SomeDataBaseTable
WHERE eircode IN
(SELECT eircode from $(targetTempTable))

 

DatabaseJoiner_redactedThe entire workflow is below

WorkFlow 

 

Looks like you found a great solution. Thanks for sharing!

Badge

Why not concatanate excel features with comma separator and then use one feature in SQLExecutor?Skärmbild 2021-07-09 094857Skärmbild 2021-07-09 094924Skärmbild 2021-07-09 094831you might need some modification to aggregated attribute if it's of text type. (this should work for int)

@edgarrunnman​ I'll now be able to look at this now that I have the fundamental pattern working as this will point to an efficient way to populate the temp table in the case where we may have 1000's of records.

Reply