Note:
The suggested idea Is created as a poc and are used with combinations of fme transformers and it works fast. To summarize
1. Create temp table
2. Bulk insert into temp table
3. SQL queries between temp table and other tables within the server.
4. remove temp table
Problem Statement:
When FME handles large datasets and performs joins with features from FME flow with a databaes table, particularly with databases containing many attributes, the process can become time-consuming. This is due to the row-by-row attribute comparison FME typically employs, and the potential communication overhead when interacting with the database.
Current State:
FME handles joins within its own environment. While functional, this approach can lead to performance bottlenecks when dealing with substantial data volumes and complex join conditions, especially when interacting with external databases. This one by one comparision applies for both the existing databas joiner and sql executor ( SQL executor is fast if all the data is within the sql server, not if you want to join data from the fme flow with sql, then its one by one)
Proposed Feature:
I propose the introduction of a new transformer, proposed name: the " Large Dataset Join ", which would seamlessly integrate with various database systems. This transformer would leverage the database's native capabilities for handling joins, leading to significant performance.
Here's how it would work:
- Bulk Insert into Temporary Table: The transformer would efficiently transfer large datasets from FME to a temporary table within the target database using bulk insert operations. This feature already exist in FME and can be re-used.
- Database-Side Join: The join operation itself would be executed directly within the database, utilizing the temporary table and the target table. Users would have the flexibility to specify join conditions using either SQL queries or a visual interface.
- Retrieve Joined Data: Only the necessary joined data would be retrieved back into FME, minimizing data transfer overhead.
- Automatic Temporary Table Management: The transformer would handle the creation and cleanup of the temporary table, streamlining the process for the user.
Key Benefits:
- Significant performance improvement for large dataset joins, particularly with databases containing many attributes.
- Reduced processing burden on FME, freeing up resources for other tasks.
- Streamlined workflow for leveraging the database's optimized join capabilities.