Question

Trouble with the FME to Databricks Writer


marcofuchs
Contributor

We're currently integrating FME with Databricks. In a vanilla Databricks environment, everything works as expected—both reading from and writing to Unity Catalog tables using the Databricks Reader and Writer in FME.

However, in our organizational Databricks environment, we're encountering an issue:

  • Reading works fine using the same credentials (tested with both OAuth and PAT).

  • Writing, however, gets stuck in an infinite polling loop in FME, with no error message returned.

The same credentials and target table work fine for writing when using a Python script in the same environment, so it seems to be specific to how FME writes data.

Given that writing works in the vanilla setup but not in our organizational environment, I suspect the issue is related to permissions or Unity Catalog access controls that are more tightly restricted in the corporate setup.

My questions:

  1. What exact permissions are required in Databricks for FME's Databricks Writer to write to Unity Catalog tables?

  2. What method does FME use under the hood to perform write operations to Databricks/Unity Catalog?

Environment details:

  • FME version: 2024.2.2

  • Platform: Azure Databricks

  • Authentication: Both OAuth and PAT tested

  • Target: Unity Catalog tables

Thanks for any guidance or insight you can provide!

Best regards,
Marco

4 replies

j.botterill
Influencer
  • June 4, 2025

Hi Marco, have you examined the Getting-Started-with-Databricks article? It has a troubleshooting guide and more info on How-to-Use-the-Databricks-Writer.


marcofuchs
Contributor
  • Author
  • June 4, 2025

Hi @j.botterill
Thanks for the guides!

How to Use the Databricks Writer says the following:
 

Staging Upload Credentials
When writing features to a table in Databricks, the writer requires a staging location. For the Databricks writer, there is the option to use a Unity Catalog Volume, Amazon S3 or Azure Data Lake Storage Gen2 (ADLS).

If you decide to use Amazon S3 or Azure Data Lake Storage Gen2, your Databricks workspace MUST have access to the cloud storage location you use. 

AWS: see the instructions on how to connect to an Amazon S3 bucket from Databricks.
Microsoft Azure: see the instructions on how to connect to Azure Data Lake Storage Gen2 from Databricks.

 

I'm using a Unity Catalog Volume, but the documentation doesn't specify which Unity Catalog access permissions FME requires.
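
For reference, my working assumption is that the FME connection's principal needs at least the following Unity Catalog grants on the staging volume; the catalog, schema, volume and principal names below are placeholders, not our real setup:

-- placeholder names; exact requirements for the FME Writer not confirmed
GRANT USE CATALOG ON CATALOG my_catalog TO `fme_user`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `fme_user`;
GRANT READ VOLUME, WRITE VOLUME ON VOLUME my_catalog.my_schema.fme_staging TO `fme_user`;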

The troubleshooting guide (Troubleshooting the Databricks Reader & Writer) has an interesting link: https://support.safe.com/hc/en-us/articles/Set%20up%20and%20manage%20Unity%20Catalog. Unfortunately, the link cannot be accessed.

 

Any ideas?


j.botterill
Influencer
  • June 5, 2025

I've reported the broken link to Safe Software. Does this help: https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started

Grants can apparently be applied via the UI or with SQL statements, which I'm sure you could build into FME via a SQLExecutor or SQLCreator, along the lines of the sketch below.
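
For example, something like this (schema, table and principal names are just placeholders, and the exact privileges the FME Writer needs may differ):

-- assumes USE CATALOG and USE SCHEMA are already granted on the catalog and schema
GRANT CREATE TABLE ON SCHEMA my_catalog.my_schema TO `fme_user`;
GRANT SELECT, MODIFY ON TABLE my_catalog.my_schema.my_target_table TO `fme_user`;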

 


marcofuchs
Contributor
  • Author
  • June 5, 2025

I think I found the reason why it didn't work. With 1,394 tables and 33,000 columns, the catalog was too large for the FME Databricks Writer: before the actual writing, the Databricks Writer reads all columns and times out on this query:

DEBUG: http.client send: b'{"language": "sql", "clusterId": "", "contextId": "", "command": "SELECT\\n ST.schema_name, ST.table_name, ST.table_type, ST.is_insertable_into, ST.data_source_format,\\n C.column_name, C.is_nullable, C.full_data_type, C.ordinal_position, C.is_updatable\\nFROM\\n (SELECT S.schema_name, T.* FROM\\n (`dev_sandbox`.information_schema.schemata as S)\\n LEFT JOIN\\n (SELECT table_schema, table_name, table_type, is_insertable_into, data_source_format FROM `dev_sandbox`.information_schema.tables) AS T\\n ON S.schema_name = T.table_schema WHERE S.schema_name IN (\\"sandbox\\")\\n ) AS ST\\nLEFT JOIN\\n (SELECT table_schema, table_name, column_name, is_nullable, full_data_type, ordinal_position, is_updatable FROM `dev_sandbox`.information_schema.columns) AS C\\nON ST.schema_name = C.table_schema AND ST.table_name = C.table_name AND ST.data_source_format = \'DELTA\'\\nORDER BY schema_name, table_name, o
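
Decoded from the escaped payload above, the metadata query FME sends looks like this (the last line is cut off in the log):

SELECT
 ST.schema_name, ST.table_name, ST.table_type, ST.is_insertable_into, ST.data_source_format,
 C.column_name, C.is_nullable, C.full_data_type, C.ordinal_position, C.is_updatable
FROM
 (SELECT S.schema_name, T.* FROM
 (`dev_sandbox`.information_schema.schemata as S)
 LEFT JOIN
 (SELECT table_schema, table_name, table_type, is_insertable_into, data_source_format FROM `dev_sandbox`.information_schema.tables) AS T
 ON S.schema_name = T.table_schema WHERE S.schema_name IN ("sandbox")
 ) AS ST
LEFT JOIN
 (SELECT table_schema, table_name, column_name, is_nullable, full_data_type, ordinal_position, is_updatable FROM `dev_sandbox`.information_schema.columns) AS C
ON ST.schema_name = C.table_schema AND ST.table_name = C.table_name AND ST.data_source_format = 'DELTA'
ORDER BY schema_name, table_name, o  -- log truncated here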

 

It works with a smaller catalog (10 tables). So the question now is: what are the possibilities for making it work with the "large" catalog?

