Solved

Generate list of all unique characters contained in (huge free text) field.

June 26, 2017
5 replies
169 views

rob14
27 replies

Hi,

Can anyone advise on the most efficient way to explode a huge free text field (or fields) into its individual characters and retain a single instance of each? I am essentially trying to complete a pre-flight check to find out whether there are any ‘odd’ or ‘unexpected’ characters in an ever-expanding data set over which I have no control.

I have created the process below, which completes the task; however, it is very inefficient, and as the number of records increases it will become too slow.

1. Derive the string length of the free text field.

2. Clone the feature by the number derived in step 1 (the clone number is created in the process).

3. Extract a substring using the clone number to obtain the character at that position.

4. Use a duplicate remover to create my list.

5. Expose the character code.

6. Output the list.
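The six steps above can be sketched in plain Python outside FME, assuming the records are simply a list of strings (both function names are hypothetical). The first function mirrors the workflow one step at a time, producing one item per character position just as the cloning step produces one feature per character; the second gets the same unique characters in a single pass with a set.

```python
def unique_chars_stepwise(records):
    """Mirror the workflow: one item per character position, then de-duplicate."""
    exploded = []
    for text in records:
        length = len(text)                     # step 1: string length
        for position in range(length):         # step 2: one "clone" per position
            exploded.append(text[position])    # step 3: character at that position
    unique = list(dict.fromkeys(exploded))     # step 4: remove duplicates, keep order
    return [(c, ord(c)) for c in unique]       # steps 5-6: expose code, output list


def unique_chars_set(records):
    """Same characters in one pass: a set keeps a single instance of each."""
    chars = set()
    for text in records:
        chars.update(text)
    return sorted((c, ord(c)) for c in chars)


records = ["abc", "bca\t", "\u00e9"]
print(unique_chars_stepwise(records))
print(unique_chars_set(records))
```

Both functions return the same (character, code) pairs; only the order differs, since the stepwise version keeps first-seen order and the set version is sorted.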

Thanks in advance,

Rob


ebygomm
Influencer
3313 replies
June 26, 2017

You could create your list initially by using a StringSearcher with the regular expression . (a single dot, matching one character at a time) and setting a list name for all matches, then using a ListDuplicateRemover to get a list of unique characters.

I have no idea how that would compare performance-wise.
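The StringSearcher idea can be approximated in plain Python with the `re` module. This is only a stand-in, not FME's regex engine, and the function name is hypothetical. One caveat worth checking in either tool: in Python, `.` does not match a newline unless dot-all mode (`re.DOTALL`) is enabled, so line breaks, which are often exactly the "unexpected" characters a pre-flight check should catch, would be silently skipped. Whether FME's StringSearcher behaves the same way is an assumption to verify.

```python
import re

def unique_chars_regex(text, include_newlines=True):
    """Find every character with the pattern '.', then drop duplicates
    while keeping first-seen order (like a list duplicate remover)."""
    flags = re.DOTALL if include_newlines else 0
    matches = re.findall(r".", text, flags)   # one match per character
    return list(dict.fromkeys(matches))       # order-preserving de-duplication

sample = "a-b\na"
print(unique_chars_regex(sample))
print(unique_chars_regex(sample, include_newlines=False))
```

With dot-all mode the newline shows up in the result; without it, the newline is missing from the list.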

takashi
7717 replies
June 26, 2017

Hi @rob14, I think a Python script could be more efficient. Assuming that an attribute called "_text" stores a text string, a PythonCaller with this script creates a list containing the unique characters.

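# PythonCaller Script Example
def processFeature(feature):
    # A missing or null _text attribute is treated as an empty string
    text = feature.getAttribute('_text') or ''
    # set() keeps a single instance of each character;
    # the result is written to the list attribute _char{}
    feature.setAttribute('_char{}', list(set(text)))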
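To try the per-feature logic outside FME, a tiny stand-in for `fmeobjects.FMEFeature` is enough. `StubFeature` is hypothetical and only mimics `getAttribute`/`setAttribute` (returning `None` for a missing attribute); it is not part of the FME API. The function body follows takashi's script with one defensive change: a missing `_text` is treated as an empty string.

```python
class StubFeature(object):
    """Hypothetical stand-in for fmeobjects.FMEFeature, for testing only."""
    def __init__(self, **attrs):
        self.attrs = dict(attrs)

    def getAttribute(self, name):
        return self.attrs.get(name)   # None when the attribute is missing

    def setAttribute(self, name, value):
        self.attrs[name] = value


def processFeature(feature):
    # Unique characters of _text, written to the list attribute _char{}
    text = feature.getAttribute('_text') or ''
    feature.setAttribute('_char{}', list(set(text)))


# A non-breaking space (U+00A0) hidden in the text shows up as its own entry
f = StubFeature(_text="hello\u00a0world")
processFeature(f)
print(sorted(f.getAttribute('_char{}')))
```

This kind of check is handy for the pre-flight goal, since characters such as the non-breaking space look like ordinary spaces when printed.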

rob14
Author
27 replies
June 26, 2017

takashi wrote:

# PythonCaller Script Example
def processFeature(feature):
    s = set(feature.getAttribute('_text'))
    feature.setAttribute('_char{}', list(s))

Hi @takashi,

Thanks very much, I'm nearly there, but I have two questions:

1. The script worked and I can see the unique characters. However, how do I expose and explode the list "_char"? When I tried to use a ListExploder, the list was not visible. Do I need additional configuration in the PythonCaller transformer?

2. If I also needed to do this globally, to find the unique characters across all records (instead of, or as well as, per record), is there a quick way to do that too, rather than PythonCaller -> ListExploder -> duplicate remover?

I am interested in being able to do both.

Thanks,

Rob

takashi
7717 replies
Best Answer
June 26, 2017

1. You can expose the list name "_char{}" with the Attributes to Expose parameter in the PythonCaller parameters dialog.

2. This script collects the characters from all the input features into one set, then, after the last feature has arrived, outputs a single feature carrying the list.

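# PythonCaller Script Example 2
import fmeobjects

class FeatureProcessor(object):
    def __init__(self):
        # Unique characters collected across all input features
        self.chars = set()

    def input(self, feature):
        # Add this feature's characters; a missing _text counts as empty
        self.chars.update(feature.getAttribute('_text') or '')

    def close(self):
        # Called once after the last input feature: output one feature
        # carrying every unique character in the list attribute _char{}
        feature = fmeobjects.FMEFeature()
        feature.setAttribute('_char{}', list(self.chars))
        self.pyoutput(feature)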

In addition, if you ultimately need one feature per character, you can modify the close method like this instead of using a ListExploder afterward.

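    def close(self):
        # Output one feature per unique character; sorting makes the
        # _element_index order repeatable between runs
        for i, c in enumerate(sorted(self.chars)):
            feature = fmeobjects.FMEFeature()
            feature.setAttribute('_char', c)
            feature.setAttribute('_element_index', i)
            self.pyoutput(feature)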
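Since Rob wanted both views, here is a plain-Python sketch (outside FME) that computes each record's unique characters and the union across all records in one pass, then reports each global character with its code point and Unicode general category. Treating everything except letters, digits, common punctuation and the ordinary space as "unexpected" is an assumption; adjust the hypothetical `EXPECTED_CATEGORIES` set to suit your data.

```python
import unicodedata

# Assumption: letters, digits, common punctuation and the plain space are "expected".
EXPECTED_CATEGORIES = {"Lu", "Ll", "Nd", "Po", "Pd", "Ps", "Pe"}

def profile_characters(records):
    """Return per-record unique characters and a report for the global set."""
    per_record = []
    global_chars = set()
    for text in records:
        chars = set(text or "")          # unique characters in this record
        per_record.append(chars)
        global_chars |= chars            # union across all records
    report = []
    for c in sorted(global_chars):
        category = unicodedata.category(c)
        unexpected = c != " " and category not in EXPECTED_CATEGORIES
        report.append((c, "U+%04X" % ord(c), category, unexpected))
    return per_record, report

per_record, report = profile_characters(["Caf\u00e9 1", "tab\there", None])
for row in report:
    if row[3]:
        print(row)   # only the flagged characters
```

Here only the tab character (category `Cc`) is flagged; the accented letter counts as a lowercase letter and passes.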

rob14
Author
27 replies
June 26, 2017

Hi @takashi

You are a star.

Thanks very much, that was lightning quick to run!

Regards,

Rob
