Solved

Unique port of DuplicateFilter returning duplicates

  • February 27, 2024
  • 8 replies
  • 108 views

tim_bkr
Participant

Hi,

I’ve got a strange issue.

I am using a FeatureMerger to match values from a lookup table.

Then I use a DuplicateFilter to check that all the terms from the lookup table are used. Strangely, with certain non-ASCII characters, duplicates come out of the Unique port!

 

Another strange thing is that these duplicates in the Unique port don’t appear when I check for duplicates directly on the lookup table.

So what happens in the FeatureMerger that makes the DuplicateFilter malfunction?

 

Here’s what the Excel lookup table looks like.

 

When I replace the non-ASCII character, the problem doesn’t appear.

What should I do?

 


8 replies

nielsgerrits
VIP

Hard to say without data, but it looks like there is a newline in every other attribute, judging by the white space above the text in the cells in the Excel screenshot?

These can be removed using a StringReplacer with regex and \n:
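For readers who want to try the same cleanup outside FME, here is a minimal Python sketch of the idea. It is not the StringReplacer's implementation; the helper name `strip_newlines` is hypothetical, and the pattern also removes carriage returns, which is an assumption beyond the `\n` mentioned above.

```python
import re

def strip_newlines(value: str) -> str:
    """Remove every carriage return and line feed from value."""
    # [\r\n]+ matches runs of CR and/or LF characters anywhere in the string.
    return re.sub(r"[\r\n]+", "", value)

# A value with a trailing newline, as suspected in the Excel cells.
cleaned = strip_newlines("A (Substance d'origine)\n")
print(repr(cleaned))
```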

 


tim_bkr
Participant
  • Author
  • Participant
  • February 27, 2024


Hi @nielsgerrits ,

Thanks.

No, the only newlines or carriage returns are where there are two lines of text in an Excel cell.

 


nielsgerrits
VIP

It is hard to say without data. Can you share a .ffs with the incorrect output from the DuplicateFilter Unique output port?


tim_bkr
Participant
  • Author
  • Participant
  • February 28, 2024

Hey @nielsgerrits ,

Sure. Here it is. I’ve provided two examples.

I had to zip it as .ffs isn’t accepted.

Cheers,

Timothée


nielsgerrits
VIP

Hey @tim_bkr,

In 20240228_duplicate_in_unique_CATEG_INV.ffs, when I double click row 4, CATEG_INV, it has the value 

A (Substance d'origine)
B (Structure d'origine)

which is different from the value

A (Substance d'origine)

on row 3.

So these are different, right? Or did I get your question wrong?


tim_bkr
Participant
  • Author
  • Participant
  • February 28, 2024

You need to sort on either column.

The duplicates are those with the text “C (Caractère spécifique d'origine)”
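One way to check whether two values that look identical really are identical is to list their code points. The Python sketch below illustrates this with a word from the value above. It assumes, purely for illustration, that one copy stores "è" as a single code point and the other as "e" plus a combining accent; the thread does not confirm that this is what the data contains, and the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(value):
    """Return one 'U+XXXX NAME' entry per code point in value."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in value]

composed = "Caract\u00e8re"      # è stored as a single code point
decomposed = "Caracte\u0300re"   # e followed by a combining grave accent

# Both render as "Caractère", but a plain string comparison sees two values.
print(composed == decomposed)
for entry in describe(decomposed):
    print(entry)
```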

 


nielsgerrits
VIP

Sorry, I misunderstood. I now see the duplicate value “C (Caractère spécifique d'origine)” in the file “20240228_duplicate_in_unique_CATEG_INV.ffs”.

When I use a DuplicateFilter in 2021.2.6 and 2023.2.2, it works fine. Which version of FME do you use?


tim_bkr
Participant
  • Author
  • Participant
  • Best Answer
  • February 28, 2024

Hey!

2021.2.0

I solved the problem using an AttributeEncoder.

I find this behavior weird.
As I understand it, this means that:

  1. Two different encodings can come from the same input dataset
  2. These two encodings remain different within an FME data stream (or whatever you call the data that flows through an FME connection)
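The general effect can be sketched in Python: two strings that display identically but are stored differently count as distinct until they are brought to one common form. This is only an illustration under an assumption, namely that the difference is one of Unicode normalization (composed versus decomposed accents); the thread does not confirm the exact cause, and this is not how the AttributeEncoder or DuplicateFilter is implemented. The function name `unique_normalized` is hypothetical.

```python
import unicodedata

def unique_normalized(values):
    """Keep the first occurrence of each value after NFC normalization."""
    seen = set()
    unique = []
    for value in values:
        # NFC composes "e" + combining accent into a single accented code point.
        key = unicodedata.normalize("NFC", value)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

rows = [
    "C (Caract\u00e8re sp\u00e9cifique d'origine)",    # composed accents
    "C (Caracte\u0300re spe\u0301cifique d'origine)",  # decomposed accents
]

print(len(set(rows)))                # distinct values before normalization
print(len(unique_normalized(rows)))  # distinct values after normalization
```

Bringing every value to a single representation before the duplicate check is the same idea as placing the AttributeEncoder ahead of the DuplicateFilter.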

 

