
Hey Team!

 

I'm scratching my head over a problem that I'm sure one of the gurus on here can help with.

 

I have some very large spatial files that I need to store in a Snowflake environment.

Snowflake has a few big issues when it comes to spatial data. It has very strict acceptance criteria for anything submitted as geometry, so we have decided to store all of the spatial information as a binary field to extract later (using the GeometryExtractor).

However, when writing this binary attribute to Snowflake there is a size limit of 16 MB... and when some of your features have over a million vertices, this is not really going to work.

We are looking into storing the FME binary across a number of attributes (_geom1, _geom2, etc.) to stitch back together when needed, but can't get any string-based functions to work because the value is now in binary format. I have tried SubstringExtractors and attribute compressors without much luck.

Any Idea?

Here's some Python code to split up an FME binary (e.g. from the GeometryExtractor) into chunks of a pre-defined size (1,600 bytes in this small example; for Snowflake's 16 MB limit you probably want to set it to 16777216, or one less):

def chunked(size, source):
    # Yield successive slices of the binary, each at most `size` bytes.
    for i in range(0, len(source), size):
        yield source[i:i + size]

def FMEBufferSplitter(feature):
    chunk_size = 1600  # change as needed
    geom = feature.getAttribute('_geometry')
    parts = chunked(chunk_size, geom)
    # Write the chunks to a list attribute: _geometry_parts{0}, _geometry_parts{1}, ...
    feature.setAttribute('_geometry_parts{}', list(parts))

 

And conversely, here's how to join them back together:

def FMEBufferJoiner(feature):
    # Read the list attribute back; default to an empty list if it's missing.
    parts = feature.getAttribute('_geometry_parts{}') or []
    joined = b''.join(parts)
    feature.setAttribute('_geometry', joined)
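Outside of FME, the split/join round trip is easy to sanity-check with plain Python: `chunked` followed by `b''.join` should reproduce the original bytes exactly, with every chunk at or under the size limit. (The byte string below is just a stand-in for a real geometry blob.)

```python
def chunked(size, source):
    # Yield successive slices of the binary, each at most `size` bytes.
    for i in range(0, len(source), size):
        yield source[i:i + size]

# Simulate a large binary geometry blob (25,600 bytes).
blob = bytes(range(256)) * 100

parts = list(chunked(1600, blob))
assert all(len(p) <= 1600 for p in parts)  # every chunk fits the limit
assert b''.join(parts) == blob             # lossless round trip
```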

I've also attached a demo workspace (2022.2) with sample data from the FME training dataset.

If possible, I would rather consider writing the geometries to something like PostGIS and then using a GUID in Snowflake as a foreign key to the table holding the actual geometry.
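For that alternative, a minimal sketch of the PostGIS side might look like this. The table and column names are assumptions, not from the post, and in production you would use a parameterized query through a driver such as psycopg2 rather than string formatting:

```python
import uuid

def postgis_insert_for(wkb_hex):
    # Build a GUID and a matching INSERT for a hypothetical PostGIS table
    # 'geometries' (table name, columns, and SRID are assumptions).
    guid = str(uuid.uuid4())
    sql = (
        "INSERT INTO geometries (id, geom) "
        f"VALUES ('{guid}', ST_GeomFromWKB(decode('{wkb_hex}', 'hex'), 4326))"
    )
    return guid, sql

guid, sql = postgis_insert_for("0101000000000000000000f03f000000000000f03f")
# The same GUID is then written to the Snowflake row as a plain VARCHAR key,
# so Snowflake never has to hold the oversized geometry itself.
```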



This worked perfectly thanks!

