This is kind of a weird question, I know, but I will ask it anyway.

I have a very, very large DXF text file that contains some duplicate lines of text I need to erase.

Here is a piece of the file as an example:

Text_Northing=13333
1000
Name=Eastern Isles
1000
1000
1000
1000
Feature_Serial_Number=10156
1000
1000
Date_Last_Amended=19930101
1000

What I would like to do is remove the duplicate lines containing "1000" so that the file looks like this:

Text_Northing=13333
1000
Name=Eastern Isles
1000
Feature_Serial_Number=10156
1000
Date_Last_Amended=19930101
1000

The question is: can this be done in FME? I have looked at the StringSearcher but cannot seem to select more than one line at a time using regular expressions.

Any assistance would be greatly appreciated.

The text file is too big to open in a normal text editor.

Thanks for any help

Hi,

This procedure might help you.

(1) Read the source file line by line with a Text File reader. Expose the format attribute called "text_line_number", which stores the line number (a 1-based sequential number).

(2) Filter the text line features by line number to separate the 1st line from the others.

(3) Send the 1st line to a VariableSetter to assign its text to a variable (this stores the prior text for the next line).

(4) Send the other lines to a VariableRetriever to fetch the variable value (the prior line's text). Send each line whose text is not equal to the prior text to the VariableSetter to update the variable, and discard the lines with duplicate text.

(5) Write the text into a new file with a Text File writer.
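
For anyone who wants to check the logic outside FME, the steps above can be sketched in plain Python. This is only an illustration under assumptions: the function name `drop_repeated_lines` is made up for this example, and it works on any pair of file-like objects rather than on FME features. The `prior` variable plays the role of the variable handled by the VariableSetter and VariableRetriever.

```python
import io


def drop_repeated_lines(source, target):
    """Copy lines from source to target, skipping any line equal to the line before it."""
    prior = None  # text of the previous line; nothing has been read yet
    for line in source:
        text = line.rstrip("\r\n")
        if text != prior:
            # Not a repeat of the previous line: keep it.
            target.write(text + "\n")
        prior = text


# Small in-memory demonstration using part of the sample from the question.
sample = io.StringIO("Name=Eastern Isles\n1000\n1000\n1000\nFeature_Serial_Number=10156\n1000\n")
result = io.StringIO()
drop_repeated_lines(sample, result)
print(result.getvalue())
```

For a real file you would pass two handles obtained from `open()` instead of the `StringIO` objects, so the file is streamed one line at a time and never loaded whole.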
Takashi
Alternatively, the AttributeCreator can be used to get the prior line text. You can then select the text that is not equal to the prior line with a Tester.

 


Or read the text file with a Text File reader, use a StringSearcher to look for "1000", and put a VariableSetter on the found port.

Calculate the difference in line number between consecutive matches (you must expose the line number on the reader).

Any match with a difference of 1 you ditch; the rest you pass.

Reassemble the rows and reorder (sort) the records by line number.
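
The line-number idea can also be sketched in plain Python, as a rough illustration rather than an FME workspace. The function name `drop_adjacent_matches` and its `marker` parameter are hypothetical names chosen for this sketch. Unlike a comparison of every line with its predecessor, it only ever removes lines equal to the marker, so other repeated lines are left alone.

```python
def drop_adjacent_matches(lines, marker="1000"):
    """Return the lines, dropping a marker line when the line just before it was also a marker."""
    kept = []
    last_match = None  # line number of the most recent marker line seen
    for number, text in enumerate(lines, start=1):
        if text.strip() == marker:
            # A difference of 1 means this marker directly follows another marker.
            adjacent = last_match is not None and number - last_match == 1
            last_match = number
            if adjacent:
                continue  # discard the repeated marker
        kept.append(text)
    return kept


sample = [
    "Text_Northing=13333", "1000",
    "Name=Eastern Isles", "1000", "1000", "1000", "1000",
    "Feature_Serial_Number=10156", "1000", "1000",
    "Date_Last_Amended=19930101", "1000",
]
for text in drop_adjacent_matches(sample):
    print(text)
```

Because the sketch walks the lines in order, no final sort is needed here; in the workspace the sort is what puts the matched and unmatched features back in sequence.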
Yup, there are several ways.

The PythonCaller with this script may also be effective.

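-----
# Python Script Example (for the PythonCaller)
class FeatureProcessor(object):
    def __init__(self):
        # Text of the previous line; None until the first feature arrives.
        self.prior = None

    def input(self, feature):
        # 'text_line_data' holds the line text from the Text File reader.
        text = str(feature.getAttribute('text_line_data'))
        # Output the feature only when its text differs from the prior line.
        if text != self.prior:
            self.pyoutput(feature)
            self.prior = text
-----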
Just wanted to say thanks to you for the great assistance
