Solved

Extract unique values from attribute table using PythonCaller


Userlevel 3
Badge +26

Hello - I have a file geodatabase where most feature classes contains an attribute 'text_desc'. I would like to extract all unique values, and concatenate those into a single comma-separated attribute to generate a report. Due to slow gdb read times in FME, I would like to do this in PythonCaller for efficiency.

 

Is this possible?

icon

Best answer by ebygomm 8 April 2021, 18:03

View original

11 replies

Userlevel 4

If this is an Enterprise Geodatabase (SDE), why not let the database do the work for you? It's going to so much faster than anything you could do with FME. Example SQL that you can use in e.g. a SQLExecutor:

SELEC DISTINCT text_desc
FROM my_feature_class

You will then only get the unique values back, and you can then use e.g. an Aggregator to concatenate the values.

Userlevel 3
Badge +26

If this is an Enterprise Geodatabase (SDE), why not let the database do the work for you? It's going to so much faster than anything you could do with FME. Example SQL that you can use in e.g. a SQLExecutor:

SELEC DISTINCT text_desc
FROM my_feature_class

You will then only get the unique values back, and you can then use e.g. an Aggregator to concatenate the values.

Sorry, this is a File Geodatabase.

Userlevel 1
Badge +10

Can you use arcpy?

Userlevel 3
Badge +26

Can you use arcpy?

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

Userlevel 1
Badge +10

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

A couple of examples, need a parameter holding the gdb location and the feature triggering the pythoncaller needs to have attributes holding the featureclass name and the fieldname

import fme
import fmeobjects
import arcpy
 
def getUnique(feature):
    #get parameter with location of fileGDB
    arcpy.env.workspace = FME_MacroValues['SourceDataset_GEODATABASE_FILE']
    #get featureclassname
    fc = feature.getAttribute('featureclass')
    #get field name
    fn = feature.getAttribute('fieldname')
    row=[x[0] for x in arcpy.da.TableToNumPyArray(fc,fn)]
    uniquelist = set(row)
    #create new attribute with concatenated list joined with comma
    feature.setAttribute("unique_values",",".join(uniquelist))

or

import fme
import fmeobjects
import arcpy
 
def getUnique(feature):
    
    arcpy.env.workspace = FME_MacroValues['SourceDataset_GEODATABASE_FILE']
    fc = feature.getAttribute('featureclass')
    fn = feature.getAttribute('fieldname')
    with arcpy.da.SearchCursor(fc,fn) as SCur:
        valueList = []
        for row in SCur:
            if not row[0] in valueList:
                valueList.append(row[0])
    feature.setAttribute("unique_values",",".join(valueList))

 

 

Userlevel 3
Badge +26

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

Thanks @ebygomm, the second works well with one feature class being input. If I wanted to query multiple feature class, such as 'Rivers' and 'Lakes', how could I go about that?

Userlevel 1
Badge +10

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

Is it the same fieldname for each feature class? Do you want a single attribute containing all values in both featureclasses or something else?

Userlevel 3
Badge +26

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

Yes, it is the same fieldname "text_desc" on each feature class. I would need a single attribute output containing unique values of "text_desc" from all the feature classes. So, if a feature from Rivers had text_desc=1, and another Rivers had text_desc=2, and a Lakes feature had text_desc=1, the result would be "unique_values=1,2"

Userlevel 1
Badge +10

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

import fme
import fmeobjects
import arcpy
 
def getUnique(feature):
    
    arcpy.env.workspace = FME_MacroValues['SourceDataset_GEODATABASE_FILE']
    fc = feature.getAttribute('featureclass{}')
    fn = feature.getAttribute('fieldname')
    valueList = []
    for f in fc:
        with arcpy.da.SearchCursor(f,fn) as SCur:            
            for row in SCur:
                if not row[0] in valueList:
                    valueList.append(row[0])
    feature.setAttribute("unique_values",",".join(valueList))

You could do this if you have the featureclasses stored in a list. There's probably more efficient ways

 

Userlevel 3
Badge +26

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

Thanks @ebygomm​ ! It worked great.

Userlevel 4

I can, but with my limited experience, so far I have been unsuccessful in getting it work.

This is a great solution, but it can be somewhat slow if there are a lot of unique values. Here's a slightly different take that has a more linear performance, since it uses the Python built-in set function rather than doing repeated list lookups:

import fme
import fmeobjects
import arcpy
 
def getUnique(feature):
   arcpy.env.workspace = FME_MacroValues['SourceDataset_GEODATABASE_FILE']
   fc = feature.getAttribute('featureclass{}')
   fn = feature.getAttribute('fieldname')
   valueList = []
   for f in fc:
       with arcpy.da.SearchCursor(f, fn) as cursor:
           valueList.extend(list({row[0] for row in cursor}))
   feature.setAttribute("unique_values",",".join(set(valueList)))

Tested with ~1.5 million records containing ~130k unique values (medium length strings):

  • Using list lookup: 15 minutes
  • Using sets: 17 seconds

If the number of unique values is very small, the two solutions are practically as fast.

Reply