Question

Check if record exists in table

  • 31 January 2017
  • 17 replies
  • 108 views

Badge +9

I have new data that I need to load into an existing table, but I first need to check whether an ID number exists in the original table. If it does exist in the original table, then I need to move that record to a new table and insert the new data into the original table.

I hope that makes sense, but what is the best method to check whether an ID exists and move the record?


17 replies

Userlevel 2
Badge +17

Hi @ingalla, what is the format of the table? The best approach may differ depending on the format. For example, if the table belongs to a relational database, using SQL statements would be one possible way.

Badge +7

Hi @ingalla

You can achieve this by linking the input data to the existing database table with a Joiner. The output of the Joiner can be sent through a TestFilter that checks whether the _matched_records attribute contains a value greater than 0. If it does, the ID is already present in the database; otherwise it is not.

In the Joiner, make sure the cardinality is set to 'Match First (1:0..1+)'.
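For readers who think in SQL, a minimal sketch of what this join-based check amounts to; new_data and existing_table are made-up names, not from the original post:

-- A LEFT JOIN mirrors the Joiner: matched_records is 1 when the ID
-- already exists in the database, 0 when it does not.
SELECT n.id,
       CASE WHEN e.id IS NULL THEN 0 ELSE 1 END AS matched_records
FROM new_data n
LEFT JOIN existing_table e ON e.id = n.id;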

Badge +1

You might also want to consider using the FeatureMerger. It does the job and is easily one of the most user-friendly/transparent FME transformers out there. Do take Takashi's comment into account though. Using SQL expressions directly on a database is usually the fastest option when you are working with database formats.

Badge +16

Hi @ingalla,

Hopefully you mean a database when you say table; in that case you can use a number of methods to detect the existence of an ID in your target table. For example, the FeatureMerger can be used, or you can use a WHERE IN clause on your target table to get the records whose IDs also occur in your new data. Another option is to use the UpdateDetector transformer.

After the matching, there are two actions you need to perform: an update on the target table for records that have the same ID, which you can accomplish via the writer of that table or a transformer (DatabaseUpdater), and an insert of the existing old record into a new table (a standard writer action).
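For illustration only, a rough SQL sketch of those two actions, assuming hypothetical names target_table, new_table, and some_column, with the per-feature ID supplied as @Value(ID):

-- 1) Move the existing old record to the new table (names are placeholders).
INSERT INTO new_table
  SELECT * FROM target_table WHERE id = @Value(ID);

-- 2) Update the original record with the new data (some_column is hypothetical).
UPDATE target_table
  SET some_column = @Value(some_column)
  WHERE id = @Value(ID);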

I would recommend using two writers, and making sure that the writer inserting into the new table comes first in the Navigator window, so that the insert into the new table takes place before the update.

There are probably other ways to do this too, as is usually the case with FME.


Hope this helps.

Userlevel 4

If your database supports SQL, my preference would be an SQLExecutor with something like:

select count(1) as record_count
from my_table
where my_id = @Value(ID)

This assumes the SQLExecutor receives a feature containing the attribute ID that you want to look for.

You then expose the attribute name "record_count" (or "RECORD_COUNT", depending on the database) and check if it's 1 or 0, e.g. using a Tester.

If the database field "my_id" is properly indexed this should be lightning fast.
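If the field is not yet indexed, something along these lines would create one (a sketch; the index name is made up and exact syntax varies per database):

-- Hypothetical index on the lookup column; adjust for your DBMS.
CREATE INDEX idx_my_table_my_id ON my_table (my_id);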

I recommend avoiding functionality that moves the records from the database into FME just to count them (i.e. FeatureMerger, Counter, etc.).

Userlevel 4

Hi @ingalla

You can achieve this by linking the input data to the existing database table with a Joiner. The output of the Joiner can be sent through a TestFilter that checks whether the _matched_records attribute contains a value greater than 0. If it does, the ID is already present in the database; otherwise it is not.

In the Joiner, make sure the cardinality is set to 'Match First (1:0..1+)'.

This is a great solution. Just be careful if you're publishing to FME Server: several versions of FME before 2016.1.2 have a serious bug in the Joiner when used with some database formats such as MS SQL and Oracle.
Userlevel 4

You might also want to consider using the FeatureMerger. It does the job and is easily one of the most user-friendly/transparent FME transformers out there. Do take Takashi's comment into account though. Using SQL expressions directly on a database is usually the fastest option when you are working with database formats.

Personally I'd avoid the FeatureMerger at all costs if the table in question contains more than just a few records; as you say, it's simply not very efficient. But I do agree about the ease of use.
Badge +7

If your database supports SQL, my preference would be an SQLExecutor with something like:

select count(1) as record_count
from my_table
where my_id = @Value(ID)

This assumes the SQLExecutor receives a feature containing the attribute ID that you want to look for.

You then expose the attribute name "record_count" (or "RECORD_COUNT", depending on the database) and check if it's 1 or 0, e.g. using a Tester.

If the database field "my_id" is properly indexed this should be lightning fast.

I recommend avoiding functionality that moves the records from the database into FME just to count them (i.e. FeatureMerger, Counter, etc.).

I am wondering why you would use an SQLExecutor instead of the Joiner? If you have to test more than 10 000 features, executing a query for every one of them will take quite a lot of time. What is your opinion on this?

Badge +1
Personally I'd avoid the FeatureMerger at all costs if the table in question contains more than just a few records; as you say, it's simply not very efficient. But I do agree about the ease of use.
I agree. The FeatureMerger is definitely one of the slowest options for this use case. I just wanted to point out that it can be used (although it's not the best option). Its readability might in this case be an advantage for new or inexperienced FME users compared to the SQLExecutor or Joiner (even if it is only used for testing purposes on a smaller dataset).

Userlevel 4
I am wondering why you would use an SQLExecutor instead of the Joiner? If you have to test more than 10 000 features, executing a query for every one of them will take quite a lot of time. What is your opinion on this?

That's an excellent question. First, having battled the Joiner bug on FME Server for several days I'm now very hesitant to use it ;-)

But seriously, I think they both have their role. If you find yourself looking up more or less the same ID many times, the Joiner will be a lot more efficient because of the built-in caching. The prefetch query is also a fantastic feature in scenarios where you constantly look up a known and small subset from a large table.
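For context, the prefetch query is just an ordinary SELECT whose results the Joiner caches up front. A hedged example with made-up names:

-- Cache only the small, known subset that the lookups will ever hit.
-- Table, column and filter value are hypothetical placeholders.
SELECT my_id FROM my_table WHERE region = 'north';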

In the case of this question I had the impression that the ID changes for every feature (I might be wrong), and so I'm not sure it makes a difference whether you use one or the other.

But in the end I'm a database guy, so I prefer SQL because then I know exactly what's going on :-)
Badge +7
That's an excellent question. First, having battled the Joiner bug on FME Server for several days I'm now very hesitant to use it ;-)

But seriously, I think they both have their role. If you find yourself looking up more or less the same ID many times, the Joiner will be a lot more efficient because of the built-in caching. The prefetch query is also a fantastic feature in scenarios where you constantly look up a known and small subset from a large table.

In the case of this question I had the impression that the ID changes for every feature (I might be wrong), and so I'm not sure it makes a difference whether you use one or the other.

But in the end I'm a database guy, so I prefer SQL because then I know exactly what's going on :-)
That is especially the case with FME: everybody has their own preferred way of doing things. As you have probably noticed already, I am a Python guy :)

Now for your suggestion: using an Aggregator to combine the IDs, and in doing so limiting the number of queries to be executed, can improve performance a lot. But of course, you know that already. Just mentioning it for other readers ;)
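To make the batching idea concrete, a sketch assuming the Aggregator has concatenated the IDs into a comma-separated attribute called _ids (a made-up name):

-- One existence check per batch instead of one query per feature.
SELECT id FROM my_table WHERE id IN (@Value(_ids));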

Userlevel 4
That is especially the case with FME: everybody has their own preferred way of doing things. As you have probably noticed already, I am a Python guy :)

Now for your suggestion: using an Aggregator to combine the IDs, and in doing so limiting the number of queries to be executed, can improve performance a lot. But of course, you know that already. Just mentioning it for other readers ;)

Yes, lots of ways, as always. Also a Sampler combined with a VariableRetriever/VariableSetter to avoid messing with the geometries, etc. :-)
Userlevel 2
Badge +17

Wow, many excellent suggestions have been posted while I was offline. My first idea was to create a comma-separated ID list and then execute these SQL statements to extract the existing records and remove them from the original table. I think this way would be relatively efficient.

select * from original_table where id in (<comma separated ID list>);
delete from original_table where id in (<comma separated ID list>);

See also the attached workspaces (FME 2016.1.3).
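For completeness, the extract-and-remove step could also be wrapped in a transaction so the table is never left half-moved. A hedged variant that writes the selected records straight into an archive table instead of routing them through FME (archive_table is a made-up name; transaction syntax varies per database):

BEGIN;
INSERT INTO archive_table
  SELECT * FROM original_table WHERE id IN (<comma separated ID list>);
DELETE FROM original_table WHERE id IN (<comma separated ID list>);
COMMIT;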

Userlevel 4

Wow, many excellent suggestions have been posted while I was offline. My first idea was to create a comma-separated ID list and then execute these SQL statements to extract the existing records and remove them from the original table. I think this way would be relatively efficient.

select * from original_table where id in (<comma separated ID list>);
delete from original_table where id in (<comma separated ID list>);

See also the attached workspaces (FME 2016.1.3).

I agree, this is a good solution for a "reasonable" amount of IDs, but if the list gets really long it might become a problem. What's reasonable will of course depend on the database implementation; e.g. for MS SQL Server it is the so-called maximum batch size that limits the SQL statement length.

Badge +7
I agree, this is a good solution for a "reasonable" amount of IDs, but if the list gets really long it might become a problem. What's reasonable will of course depend on the database implementation; e.g. for MS SQL Server it is the so-called maximum batch size that limits the SQL statement length.

Yes indeed, I tend to limit the number of IDs concatenated to 100 000. Otherwise an error might be thrown, depending on how the IDs are formatted (serial / UUID).

Userlevel 2
Badge +17
I agree, this is a good solution for a "reasonable" amount of IDs, but if the list gets really long it might become a problem. What's reasonable will of course depend on the database implementation; e.g. for MS SQL Server it is the so-called maximum batch size that limits the SQL statement length.

Yes, the concern is the limitation on the size of SQL statements, so I'm using the grouping technique in the workspace example that I provided in this discussion.

Synchronous execution of features

Badge

You may want to experiment with the FeatureMerger, where your existing table is the supplier and the new/current dataset is the requestor; all new records will then come out via the unmerged port. I find that sometimes it's easier to use the FeatureMerger than the Joiner.

Hope it helps.

Sasha
