Question

Trouble with the FME to Databricks Writer


marcofuchs
Contributor

We're currently integrating FME with Databricks. In a vanilla Databricks environment, everything works as expected—both reading from and writing to Unity Catalog tables using the Databricks Reader and Writer in FME.

However, in our organizational Databricks environment, we're encountering an issue:

  • Reading works fine using the same credentials (tested with both OAuth and PAT).

  • Writing, however, gets stuck in an infinite polling loop in FME, with no error message returned.

The same credentials and target table work fine for writing when using a Python script in the same environment, so it seems to be specific to how FME writes data.

Given that writing works in the vanilla setup but not in our organizational environment, I suspect the issue is related to permissions or Unity Catalog access controls that are more tightly restricted in the corporate setup.

My questions:

  1. What exact permissions are required in Databricks for FME's Databricks Writer to write to Unity Catalog tables?

  2. What method does FME use under the hood to perform write operations to Databricks/Unity Catalog?

Environment details:

  • FME version: 2024.2.2

  • Platform: Azure Databricks

  • Authentication: Both OAuth and PAT tested

  • Target: Unity Catalog tables

Thanks for any guidance or insight you can provide!

Best regards,
Marco

4 replies

j.botterill
Influencer
  • June 4, 2025

Hi Marco, have you examined the Getting-Started-with-Databricks article? It has a troubleshooting guide and more info on How-to-Use-the-Databricks-Writer.


marcofuchs
Contributor
  • Author
  • June 4, 2025

Hi @j.botterill
Thanks for the guides!

How to Use the Databricks Writer says the following:
 

Staging Upload Credentials
When writing features to a table in Databricks, the writer requires a staging location. For the Databricks writer, there is the option to use a Unity Catalog Volume, Amazon S3 or Azure Data Lake Storage Gen2 (ADLS).

If you decide to use Amazon S3 or Azure Data Lake Storage Gen2, your Databricks workspace MUST have access to the cloud storage location you use. 

AWS: see the instructions on how to connect to an Amazon S3 bucket from Databricks.
Microsoft Azure: see the instructions on how to connect to Azure Data Lake Storage Gen2 from Databricks.

 

I'm using a Unity Catalog Volume, but the documentation doesn't specify which Unity Catalog access permissions FME requires.
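
For reference, my working assumption is that the FME connection's principal needs at least the following Unity Catalog grants on the staging volume; the catalog, schema, volume and principal names below are placeholders, not our real setup:

-- placeholder names; exact requirements for the FME Writer not confirmed
GRANT USE CATALOG ON CATALOG my_catalog TO `fme_user`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `fme_user`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME my_catalog.my_schema.fme_staging TO `fme_user`;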

The troubleshooting guide (Troubleshooting the Databricks Reader & Writer) has an interesting link: https://support.safe.com/hc/en-us/articles/Set%20up%20and%20manage%20Unity%20Catalog. Unfortunately, the link cannot be accessed.

 

Any ideas?


j.botterill
Influencer
  • June 5, 2025

I've reported the broken link to Safe Software. Does this help: https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started

Grants can apparently be applied via the UI or with SQL statements, which I'm sure you could build into FME via a SQLExecutor or SQLCreator, along the lines of the sketch below.
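
For example, something like this (schema, table and principal names are just placeholders, and the exact privileges the FME Writer needs may differ):

-- assumes USE CATALOG and USE SCHEMA are already granted on the catalog and schema
GRANT CREATE TABLE ON SCHEMA my_catalog.my_schema TO `fme_user`;
GRANT SELECT, MODIFY ON TABLE my_catalog.my_schema.my_target_table TO `fme_user`;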

 


marcofuchs
Contributor
  • Author
  • June 5, 2025

I think I found the reason why it didn't work. With 1,394 tables and 33,000 columns, the catalog was too large for the FME Databricks Writer: before the actual writing, the Databricks Writer reads all columns and times out on this query:

DEBUG: http.client send: b'{"language": "sql", "clusterId": "", "contextId": "", "command": "SELECT\\n ST.schema_name, ST.table_name, ST.table_type, ST.is_insertable_into, ST.data_source_format,\\n C.column_name, C.is_nullable, C.full_data_type, C.ordinal_position, C.is_updatable\\nFROM\\n (SELECT S.schema_name, T.* FROM\\n (`dev_sandbox`.information_schema.schemata as S)\\n LEFT JOIN\\n (SELECT table_schema, table_name, table_type, is_insertable_into, data_source_format FROM `dev_sandbox`.information_schema.tables) AS T\\n ON S.schema_name = T.table_schema WHERE S.schema_name IN (\\"sandbox\\")\\n ) AS ST\\nLEFT JOIN\\n (SELECT table_schema, table_name, column_name, is_nullable, full_data_type, ordinal_position, is_updatable FROM `dev_sandbox`.information_schema.columns) AS C\\nON ST.schema_name = C.table_schema AND ST.table_name = C.table_name AND ST.data_source_format = \'DELTA\'\\nORDER BY schema_name, table_name, o
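
Decoded from the escaped payload above, the metadata query FME sends looks like this (the last line is cut off in the log):

SELECT
 ST.schema_name, ST.table_name, ST.table_type, ST.is_insertable_into, ST.data_source_format,
 C.column_name, C.is_nullable, C.full_data_type, C.ordinal_position, C.is_updatable
FROM
 (SELECT S.schema_name, T.* FROM
 (`dev_sandbox`.information_schema.schemata as S)
 LEFT JOIN
 (SELECT table_schema, table_name, table_type, is_insertable_into, data_source_format FROM `dev_sandbox`.information_schema.tables) AS T
 ON S.schema_name = T.table_schema WHERE S.schema_name IN ("sandbox")
 ) AS ST
LEFT JOIN
 (SELECT table_schema, table_name, column_name, is_nullable, full_data_type, ordinal_position, is_updatable FROM `dev_sandbox`.information_schema.columns) AS C
ON ST.schema_name = C.table_schema AND ST.table_name = C.table_name AND ST.data_source_format = 'DELTA'
ORDER BY schema_name, table_name, o  -- log truncated here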

 

It works with a smaller catalog (10 tables). So the question now is: what are the possibilities for making it work with the "large" catalog?

