Skip to main content
Question

Export Unique Values For Each Field From Multiple Datasets


mikeinhamilton
Contributor
Forum|alt.badge.img+5

I'm trying to determine all of the unique values for each field for a dataset. I'd like to create a dynamic workspace because I need to do this for 100+ datasets. Each dataset has a different number of fields with different field names.

I'd either like to export a csv for each dataset with the field name and all the unique values listed.

The data will be coming from feature classes in an Oracle database. Any suggestions?

5 replies

erik_jan
Contributor
Forum|alt.badge.img+17
  • Contributor
  • March 3, 2017

Not sure if this can be done in FME alone.

But I would crate a value table first in Orcale using a PL/SQL script like this:

create table values_table(table varchar2(30), column varchar2(30), value varchar2(2000));

declare l_c cursor is select table_name, column_name from user_tab_columns; l_s_statement varchar2(2000); l_value varchar2(2000); begin for l_r in l_c loop for l_s_value in 'select distinct l_r.column_name from l_r.table_name' loop Insert into values_table (l_r.table_name, l_r.column_name, l_s_value ); end loop; end loop; end;Hope this helps solving your challenge.

itay
Supporter
Forum|alt.badge.img+16
  • Supporter
  • March 3, 2017

Hi @mikey,

I think this can be done with creating a list of the values for each field and using the list histogrammer to find all the unique values per field.

Hope this helps.


david_r
Evangelist
  • March 7, 2017

You can use a dynamic reader (expose fme_feature_type) followed by an AttributeExploser. You can then create content statistics using a group by on the fme_feature_type and _attr_name, e.g. using a StatisticsCalculator.

That said, I think a solution like the one suggested by @erik_jan will be faster by an order of magnitude. The reason is that you won't have to extract the data from the database first, and also that using the AttributeExploder on large datasets tend to consume a lot of memory.


nielsgerrits
VIP

Or read the records from all_tab_cols with the SQLExecutor in FME and dynamically create "select distinct" statements for all table / column combinations. This way you can easily exclude columns containing primary keys etc as it is useless and time consuming to get the unique values of those.

This example took 1 minute to run.

00-workbench.fmw

Generally speaking it is best practice to let the database do the work if possible. Instead of loading all the data in FME and process it locally.

Select distinct doesn't work on BLOB's / CLOB's tho. You might need query all not null values from the db to analyse those.


jdh
Contributor
Forum|alt.badge.img+28
  • Contributor
  • March 7, 2017

The most efficient way I found for generic data (ie no taking advantage of built in database functionality) was to use python, to get the attribute names (feature.getAllAttributeNames()) add them to a master list (not all features necessarily have all attributes) and then loop through the masterList and for each attribute loop through the features and built the unique value list. I outputted one feature per attribute, with a list of unique values as well the number of features that had an "empty" value for that attribute.


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings