Question

duplicate text lines


This is kind of a weird question I know BUT I will ask it anyway.

 

I have a very large DXF text file with some duplicate lines of text that I need to erase.

 

Here is a piece of the file as an example

 

 

Text_Northing=13333
1000
Name=Eastern Isles
1000
1000
1000
1000
Feature_Serial_Number=10156
1000
1000
Date_Last_Amended=19930101
1000

 

 

What I would like to do is remove the duplicate "1000" lines so that the file looks like this:

 

 

Text_Northing=13333
1000
Name=Eastern Isles
1000
Feature_Serial_Number=10156
1000
Date_Last_Amended=19930101
1000

 

 

The question is: can this be done in FME? I have looked at the StringSearcher but cannot seem to select more than one line at a time using regular expressions.

 

 

Any assistance would be greatly appreciated.

 

 

The text file is too big to run through a normal text editor.

 

 

Thanks for any help

5 replies

Hi,

 

 

This procedure might help you.

 

(1) Read the source file line by line with a Text File reader. Expose a format attribute called "text_line_number", which stores the line number (1-based sequential number).

 

(2) Filter the text line features by the line number to separate the 1st line from others.

 

(3) Send the 1st line to a VariableSetter to assign the text to a variable (store the prior text for the next line).

 

(4) Send other lines to a VariableRetriever to fetch the variable value (prior line text); send the text line which is not equal to the prior text to the VariableSetter to update the variable (discard duplicate text).

 

(5) Write the text into a new file with a Text File writer.
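
For anyone who wants to try the same "compare with the prior line" idea outside FME, here is a minimal sketch in plain Python. It is not the workspace described above, just the same logic under a few assumptions: the function name `drop_repeated_lines` and the file paths are made up for illustration, and the file is opened in binary mode so that no particular encoding is assumed. It reads one line at a time, so the size of the file should not matter much.

```python
def drop_repeated_lines(src_path, dst_path):
    """Copy src_path to dst_path, skipping any line equal to the line before it.

    Returns the number of lines written. (Hypothetical helper, not part of FME.)
    """
    prior = None  # text of the last line kept; None until the first line is read
    kept = 0
    # Binary mode: no encoding is assumed and original line endings are preserved.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # iterating a file object reads one line at a time
            text = line.rstrip(b"\r\n")  # compare without the line terminator
            if text != prior:
                dst.write(line)
                kept += 1
                prior = text  # only kept lines update the stored value
    return kept
```

Note that this drops every consecutive repeat, not only the "1000" lines, which matches steps (3) and (4) but may be broader than what you want if other lines legitimately repeat.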

 

 

Takashi
Alternatively, the AttributeCreator can be used to get the prior line text. You can then select the text that is not equal to the prior line with a Tester.

 

Or read the text file with a Text File reader, use a StringSearcher to look for "1000", and put a VariableSetter on the found port to store the line number of the last match.

Calculate the difference in line number between each match and the stored one (you must expose the line number on the reader).

Any line with a difference of 1 you ditch; the rest you pass.

Reassemble the rows and reorder (sort) the records by line number.
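
The line-number-difference idea can be sketched in plain Python roughly like this. It is only an illustration of the logic, not an FME workspace: the name `filter_by_line_gap` is made up, and I am assuming a "1000" line means a line whose whole text is `1000` after trimming whitespace, which may be stricter than what the StringSearcher would match.

```python
def filter_by_line_gap(lines, marker="1000"):
    """Yield (line_number, text) pairs, dropping a marker line that sits
    directly after another marker line (line-number difference of 1).

    Hypothetical helper; 'marker' is assumed to match the whole line.
    """
    last_marker_line = None  # plays the role of the stored FME variable
    for number, text in enumerate(lines, start=1):  # 1-based line numbers
        if text.strip() == marker:
            is_repeat = (
                last_marker_line is not None
                and number - last_marker_line == 1
            )
            last_marker_line = number  # remember every match, kept or not
            if is_repeat:
                continue  # difference of 1: ditch it
        yield number, text
```

Because the lines are processed as a single ordered stream here, the output is already in line-number order and no final sort is needed. In FME the matched and unmatched features travel through separate ports, which is why the sort is part of the procedure above.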

 

 

 
Yup, there are several ways.

 

The PythonCaller with this script may also be effective.

 

-----

 

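# Python Script Example: drop any line identical to the line before it
class FeatureProcessor(object):
    def __init__(self):
        # Text of the previous line; None until the first feature arrives
        self.prior = None

    def input(self, feature):
        # 'text_line_data' holds the line text from the Text File reader
        text = str(feature.getAttribute('text_line_data'))
        if text != self.prior:
            # Output only lines that differ from the previous one
            self.pyoutput(feature)
            self.prior = text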

-----
Just wanted to say thanks to you for the great assistance
