Question

Removing Duplicates in a Fixed-Width Text File

  • 30 November 2018

Hi,

 

I have a fixed-width text file and would like to remove rows where a certain portion of the row is duplicated. For example, see the extract below:

 

NE500086033 15.10.201831.12.2099Masterson

NE500085977 08.10.201831.12.2099Gilmore

NE500085699 24.09.201831.12.2021Doherty

NE500085699 24.09.201831.12.2099Banks

NE500085312 10.09.201831.12.2099Moyo

 

If the highlighted portions (the reference number at the start of the row, e.g. NE500085699 in rows 3 and 4) are the same, I would like to remove both records from the file, then output the file in exactly the same format with the duplicates removed. I am using the CAT reader to read the file, but I'm not 100% sure which writer is best to use. Any advice is much appreciated.
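To pin down the requirement, here is a minimal Python sketch (the 11-character reference at the start of each row is an assumption taken from the sample data; adjust the slice to the real column width). Every row whose key appears more than once is removed, not just the later copies:

```python
from collections import Counter

def drop_duplicated_keys(lines, start=0, end=11):
    """Remove ALL rows whose fixed-width key portion occurs more than once."""
    counts = Counter(line[start:end] for line in lines)
    return [line for line in lines if counts[line[start:end]] == 1]

rows = [
    "NE500086033 15.10.201831.12.2099Masterson",
    "NE500085977 08.10.201831.12.2099Gilmore",
    "NE500085699 24.09.201831.12.2021Doherty",
    "NE500085699 24.09.201831.12.2099Banks",
    "NE500085312 10.09.201831.12.2099Moyo",
]
for row in drop_duplicated_keys(rows):
    print(row)  # both NE500085699 rows are dropped
```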

 

Thanks,

Charlie


2 replies


If the key is always at the same position, you could use an AttributeSplitter to split that part off into a new list attribute, then a DuplicateRemover with that list attribute as the key.

A SubstringExtractor instead of the AttributeSplitter gives the same result.
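A rough Python equivalent of this approach (the 12-character key width is an assumption from the sample). Note that, like a DuplicateRemover, it keeps the first feature per key rather than removing both copies of a duplicated key:

```python
def dedupe_keep_first(lines, key_len=12):
    """Keep the first line seen for each fixed-width key (DuplicateRemover-style)."""
    seen = set()
    kept = []
    for line in lines:
        key = line[:key_len]  # the portion split off as the key attribute
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

If both copies must go, counting keys first and keeping only keys that occur once achieves that instead.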


A possible way is:

  1. Read the text file line by line with the Text File reader.
  2. Extract the portion to be compared and save it as a new attribute with a transformer such as the SubstringExtractor.
  3. Send the features to the Matcher (Match Geometry: None, Attribute Matching Strategy: Match Selected Attributes, Selected Attributes: <attribute storing the portion to be compared>).
  4. Write the text line data from the features output via the NotMatched port to a destination text file with the Text File writer.

If you need to keep the original order of text lines, expose the "text_line_number" attribute in the reader and sort the features by line number with a Sorter before writing out.
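The steps above can be sketched in Python, with an in-memory list of lines standing in for the reader and writer, and a hypothetical 12-character key standing in for the extracted attribute:

```python
from collections import Counter

def matcher_not_matched(lines, key_len=12):
    """Emulate the Matcher's NotMatched port: keep only features whose key
    attribute matches no other feature, preserving the original line order."""
    # Step 2: extract the comparison portion of each line
    keys = [line[:key_len] for line in lines]
    # Step 3: features sharing a key would exit the Matched port;
    # features with a unique key exit NotMatched
    counts = Counter(keys)
    # Tag each kept line with its line number (text_line_number)
    numbered = [(n, line) for n, (line, key) in enumerate(zip(lines, keys))
                if counts[key] == 1]
    # Sorter step: restore the original order before writing (already in
    # order here, but explicit in case upstream processing reordered features)
    numbered.sort(key=lambda pair: pair[0])
    return [line for _, line in numbered]
```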
