Question

Match sets of keywords in freeform text

5 years ago
June 19, 2019
8 replies
11 views

+28

jdh
Contributor
1981 replies

Hi all,

I have one dataset containing a set of features with attributes like ID, Name, Date, Location.

While none of the individual attributes are unique, the combination of all of them is. (Record)

I have another dataset of features with one attribute containing freeform multiline text. (Text)

Each Text feature contains ALL of the values of ONE feature of the Record dataset, but not in any order, and generally not an exact match on a line.

I need to identify which Text feature corresponds to which Record feature. Each Record should match zero or one Text feature.

I am assuming that Python and regex are the way to go, but I'm not sure of the most efficient way to process the data.

Record Features

ID  Name  Date         Location
24  AAA   23 MAY 2019  X
32  AAA   07 JUN 2019  Y
24  BBB   07 JUN 2019  Z

A sample text feature could contain something like:

SEE 24
2926m
7000'
Search shelter X
32
500 2000 2500
800 32 200
AAA/ABC
07 JUN 2019
Y

The correct record in this case is 32-AAA.

+28

jdh
Author
Contributor
1981 replies
5 years ago
June 19, 2019

I could guarantee that the Record features arrive before the text features.

Maybe something in a PythonCaller where:

Add the record features to a dictionary.

For each text feature, loop through the record dictionary and regex search for all attributes of that record.

If there is a full match, pop that record from the dictionary, export the text feature and break the inner loop.
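
The steps above could be sketched in plain Python roughly as follows. This is only a sketch under assumptions, not FME code: the names `build_patterns` and `match_texts` are hypothetical, the records are assumed to be unique tuples of strings, and whole-token matching via lookarounds is one possible choice. Inside a PythonCaller the records and texts would come from features rather than lists.

```python
import re

def build_patterns(record):
    """Compile one whole-token regex per attribute value of a record."""
    # (?<!\w) and (?!\w) reject matches embedded in a longer token,
    # so an ID of "32" does not match inside "320".
    return [re.compile(r"(?<!\w)" + re.escape(v) + r"(?!\w)") for v in record]

def match_texts(records, texts):
    """Return ({text_index: record}, [records left without a text])."""
    pending = {rec: build_patterns(rec) for rec in records}
    matches = {}
    for i, text in enumerate(texts):
        # First pending record whose values are ALL found in this text.
        hit = next(
            (rec for rec, pats in pending.items()
             if all(p.search(text) for p in pats)),
            None,
        )
        if hit is not None:
            matches[i] = hit
            del pending[hit]  # a record matches at most one text
    return matches, list(pending)

records = [
    ("24", "AAA", "23 MAY 2019", "X"),
    ("32", "AAA", "07 JUN 2019", "Y"),
    ("24", "BBB", "07 JUN 2019", "Z"),
]
sample = ("SEE 24\n2926m\n7000'\nSearch shelter X\n32\n"
          "500 2000 2500\n800 32 200\nAAA/ABC\n07 JUN 2019\nY")

matches, unmatched = match_texts(records, [sample])
print(matches)    # the sample text pairs with the 32 / AAA record
print(unmatched)  # the two records that had no matching text
```

Removing matched records keeps the inner loop shrinking as texts are processed, and whatever remains in the dictionary at the end is the set of records with no text.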

+18

erik_jan
Contributor
2181 replies
5 years ago
June 19, 2019

If the datasets are not too big, you could use an unconditional FeatureMerger (join on 1=1) to create a Cartesian join.

Then follow by a Tester to test on:

Text contains ID And Text contains Name ......

That would solve it using 2 transformers.
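
For intuition only, the join-then-test idea corresponds to something like the plain-Python sketch below. It is not FME code; `cartesian_matches` is a hypothetical name, and plain substring tests stand in for the Tester's "contains" clauses.

```python
from itertools import product

def cartesian_matches(records, texts):
    """Pair every record with every text; keep pairs where the text contains all values."""
    return [
        (rec, text)
        for rec, text in product(records, texts)  # unconditional (1=1) join
        if all(value in text for value in rec)    # Text contains ID AND Text contains Name ...
    ]

records = [
    ("24", "AAA", "23 MAY 2019", "X"),
    ("32", "AAA", "07 JUN 2019", "Y"),
    ("24", "BBB", "07 JUN 2019", "Z"),
]
texts = ["SEE 24\nSearch shelter X\n32\nAAA/ABC\n07 JUN 2019\nY"]

pairs = cartesian_matches(records, texts)
print([rec for rec, _ in pairs])
```

The number of pairs tested is `len(records) * len(texts)`, which is why this approach depends on the datasets staying small.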

+34

ebygomm
Influencer
3275 replies
5 years ago
June 19, 2019

What format is your record dataset in? If it is stored as a csv/text or similar I'd be tempted to read the csv directly to a list of tuples and iterate over them for matches, e.g.

import csv

import fme
import fmeobjects


class FeatureProcessor(object):
    def __init__(self):
        # Load every CSV row as a tuple of values, one tuple per record.
        # Note: a header row, if present, is loaded as a record too.
        self.inputfilename = fme.macroValues['SourceDataset_CSV2']
        with open(self.inputfilename) as f:
            self.data = [tuple(line) for line in csv.reader(f)]

    def input(self, feature):
        # Treat a missing text attribute as empty text.
        text = feature.getAttribute('text') or ''
        for y in self.data:
            # Count how many of this record's values occur in the text.
            value = 0
            for x in y:
                if x in text:
                    value += 1
            # All four values (ID, Name, Date, Location) were found.
            if value == 4:
                feature.setAttribute('value', value)
                feature.setAttribute('record', ','.join(y))
                feature.setAttribute('ID', y[0])
                feature.setAttribute('Name', y[1])
                feature.setAttribute('Date', y[2])
                feature.setAttribute('Location', y[3])
                self.pyoutput(feature)
                break

    def close(self):
        pass
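
One thing to watch with `x in text` is that it is a substring test, so a short value can be found inside a longer token (for example 200 inside 2000). A possible stricter check is sketched below; `contains_token` is a hypothetical helper, not part of FME, and whether whole-token matching suits the real data is an assumption.

```python
import re

def contains_token(value, text):
    """True if value occurs in text without being embedded in a longer alphanumeric token."""
    pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
    return re.search(pattern, text) is not None

text = "500 2000 2500\nAAA/ABC"

substring_hit = "200" in text              # substring test: finds 200 inside 2000
token_hit = contains_token("200", text)    # whole-token test: 200 is not a token here
name_hit = contains_token("AAA", text)     # AAA is delimited by a newline and "/"
print(substring_hit, token_hit, name_hit)
```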

+28

jdh
Author
Contributor
1981 replies
5 years ago
June 19, 2019

erik_jan wrote:

If the datasets are not too big, you could use an unconditional FeatureMerger (join on 1=1) to create a Cartesian join.

Then follow by a Tester to test on:

Text contains ID And Text contains Name ......

That would solve it using 2 transformers.

I would say that it averages to about 2000 records, and there are 8 attributes to check.

+18

erik_jan
Contributor
2181 replies
5 years ago
June 19, 2019

jdh wrote:

I would say that it averages to about 2000 records, and there are 8 attributes to check.

With 2000 records, I would give this a try, assuming you use FME 2019, which is a lot faster and better with memory.

+18

erik_jan
Contributor
2181 replies
5 years ago
June 19, 2019

erik_jan wrote:

If the datasets are not too big, you could use an unconditional FeatureMerger (join on 1=1) to create a Cartesian join.

Then follow by a Tester to test on:

Text contains ID And Text contains Name ......

That would solve it using 2 transformers.

And in FME 2019, use the FeatureJoiner instead of the FeatureMerger.

+28

jdh
Author
Contributor
1981 replies
5 years ago
June 19, 2019

erik_jan wrote:

And in FME 2019, use the FeatureJoiner instead of the FeatureMerger.

A FeatureJoiner with 2000 features on an unconditional merge would output 4 million features, all of which would have to be tested. Also, how do you differentiate a record that has no matching text from a record that just doesn't match that particular text?
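
To put numbers on this: 2000 records against 2000 texts is 2000 × 2000 = 4,000,000 pairs. One possible way to separate "no matching text at all" from "did not match this particular text" is to decide only after every pair has been tested, by remembering which records matched at least once. The sketch below is plain Python with hypothetical names (`split_records`), not an FME workflow.

```python
def split_records(records, texts):
    """Return (matched, unmatched) records after testing every record/text pair."""
    matched = set()
    for text in texts:
        for rec in records:
            if all(value in text for value in rec):
                matched.add(rec)
    # Only records that matched NO text at all end up here.
    unmatched = [rec for rec in records if rec not in matched]
    return matched, unmatched

pair_count = 2000 * 2000  # pairs produced by an unconditional join

records = [("32", "AAA", "07 JUN 2019", "Y"), ("24", "BBB", "07 JUN 2019", "Z")]
texts = ["32\nAAA/ABC\n07 JUN 2019\nY", "unrelated text"]
matched, unmatched = split_records(records, texts)
print(pair_count, matched, unmatched)
```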

+28

jdh
Author
Contributor
1981 replies
5 years ago
June 19, 2019

ebygomm wrote:

What format is your record dataset in? If it is stored as a csv/text or similar I'd be tempted to read the csv directly to a list of tuples and iterate over them for matches.

I like your idea of just tracking how many values passed.
