Archived

Database-Powered Join Transformer in FME - my poc is superfast

Related products:FME Form

10 months ago
September 7, 2024
6 replies
127 views

ronnie.utter
Contributor
37 replies

Note:
The suggested idea Is created as a poc and are used with combinations of fme transformers and it works fast. To summarize
1. Create temp table
2. Bulk insert into temp table
3. SQL queries between temp table and other tables within the server.
4. remove temp table

Problem Statement:

When FME handles large datasets and performs joins with features from FME flow with a databaes table, particularly with databases containing many attributes, the process can become time-consuming. This is due to the row-by-row attribute comparison FME typically employs, and the potential communication overhead when interacting with the database.

Current State:

FME handles joins within its own environment. While functional, this approach can lead to performance bottlenecks when dealing with substantial data volumes and complex join conditions, especially when interacting with external databases. This one by one comparision applies for both the existing databas joiner and sql executor ( SQL executor is fast if all the data is within the sql server, not if you want to join data from the fme flow with sql, then its one by one)

Proposed Feature:

I propose the introduction of a new transformer, proposed name: the " Large Dataset Join ", which would seamlessly integrate with various database systems. This transformer would leverage the database's native capabilities for handling joins, leading to significant performance.

Here's how it would work:

Bulk Insert into Temporary Table: The transformer would efficiently transfer large datasets from FME to a temporary table within the target database using bulk insert operations. This feature already exist in FME and can be re-used.
Database-Side Join: The join operation itself would be executed directly within the database, utilizing the temporary table and the target table. Users would have the flexibility to specify join conditions using either SQL queries or a visual interface.
Retrieve Joined Data: Only the necessary joined data would be retrieved back into FME, minimizing data transfer overhead.
Automatic Temporary Table Management: The transformer would handle the creation and cleanup of the temporary table, streamlining the process for the user.

Key Benefits:

Significant performance improvement for large dataset joins, particularly with databases containing many attributes.
Reduced processing burden on FME, freeing up resources for other tasks.
Streamlined workflow for leveraging the database's optimized join capabilities.

+50

redgeographics
Celebrity
3643 replies
10 months ago
September 9, 2024

Interesting idea, but conceptually it sounds a lot like what the InlineQuerier does. Did you run a benchmark against that?

+11

ronnie.utter
Author
Contributor
37 replies
10 months ago
September 9, 2024

@redgeographics thanks for your answer.

Well conceptually yes. Inlinequirer does not read any databases though .But it exists due to speed of course. So my suggestion/idea is a hybrid between inlinequirer and sql executor I would say.

+11

ronnie.utter
Author
Contributor
37 replies
10 months ago
September 9, 2024

For us we boosted the time really much. We are using alot of data. I would say from 10 minutes to 2-3 minutes. It was a major improvement

+11

nordpil
Supporter
42 replies
10 months ago
September 13, 2024

@ronnie.utter if you have a routine that works for you, why does it need to be implemented as a transformer?

How I would do it: get the keys from features in the workspace, and assemble them in batches of 1000, concatenated. The query the database to get the joinees using SqlExecutor (in batches, instead of one by one). The use a FeatureJoiner to join joiner/joinee.

An argument to do the joining in the workspace instead of your proposed method: catch errors, mismatched joined etc

+11

ronnie.utter
Author
Contributor
37 replies
10 months ago
September 13, 2024

@nordpil

Thank you for your suggestion! I think your approach with batching keys and using SqlExecutor and FeatureJoiner is a solid method for improving performance without needing new transformers. It offers flexibility for handling errors and mismatches directly within FME, which is valuable for users needing that level of control.

However, while I have a working solution myself, creating a custom transformer would make it easier for others to adopt this method without needing to manually set up batch processing, SQL queries, and joins. A dedicated transformer would streamline the process, making it more user-friendly and accessible to a broader audience, especially those who may not be as familiar with the technical aspects with queries.

In short, both approaches have their merits—yours offers flexibility, while mine prioritizes performance and ease of use. Perhaps a balance between the two could be ideal!

+15

LizAtSafe
Safer
1523 replies
3 months ago
April 5, 2025

New→Archived

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Database-Powered Join Transformer in FME - my poc is superfast