Solved

Removing duplicate files from folders if file exists in subfolder

  • February 12, 2021
  • 2 replies
  • 56 views

joe.fme
Contributor

Hi FME Community,

I've recently been given quite a big task of migrating, renaming and sorting documents. I have nailed the moving and renaming part by using a mix of FME and a SystemCaller running Robocopy (as FME would output the PDF files as folders, for some reason).

So, the moving is all good, but is there a way I can use FME to do the following? (I am quite constrained by my working environment, for example the stock Python and the FME version.)

"If a file exists in folder A, but also exists in subfolder X, delete the file in folder A"

I've tried with a DuplicateFilter, which gets a few, but misses others (attached screenshot).

Been going mad with this one.

2 replies

chrisatsafe
Contributor
  • Best Answer
  • February 12, 2021

Hi @joe.fme,

What are you sorting by? I tested this by sorting by path_directory in descending alphabetical order, which always sends the duplicate files from the main folder to the Duplicate output port and the file from the subfolder to the Unique output port.

Or could subfolder X be in a completely different path?

[Screenshot: 2021-02-12_14-51-11]
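For anyone reading along without the screenshot, here is a rough Python sketch of why the descending sort helps. It is not the workspace itself, only an emulation under two assumptions: that the DuplicateFilter keys on the file name, and that paths use forward slashes. The function name `split_unique_and_duplicates` and the sample paths are made up for illustration. A subfolder path such as `A/X` sorts after its parent `A`, so a descending sort puts the subfolder copy first, and the first feature seen for each key is the one treated as unique.

```python
from pathlib import PurePosixPath


def split_unique_and_duplicates(paths):
    """Emulate a descending sort on the directory followed by a
    first-seen-wins duplicate filter keyed on the file name.

    Hypothetical helper for illustration; not an FME API.
    """
    # Sort by parent directory, descending, so "A/X" comes before "A".
    ordered = sorted(
        paths,
        key=lambda p: str(PurePosixPath(p).parent),
        reverse=True,
    )

    seen_names = set()
    unique, duplicates = [], []
    for path in ordered:
        name = PurePosixPath(path).name
        if name in seen_names:
            # A file with this name was already seen in a directory that
            # sorts later alphabetically (for example a subfolder).
            duplicates.append(path)
        else:
            seen_names.add(name)
            unique.append(path)
    return unique, duplicates


sample = [
    "A/report.pdf",
    "A/X/report.pdf",
    "A/only_in_a.pdf",
    "A/X/only_in_x.pdf",
]
unique, duplicates = split_unique_and_duplicates(sample)
print(duplicates)  # ['A/report.pdf']
```

One caveat with this approach: a file name that appears in two different subfolders would also be flagged, so it is worth checking the duplicates list before acting on it.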


joe.fme
Contributor
  • Author
  • February 15, 2021

Good morning Chris, I've been waiting all weekend to log back on and try this, so thank you for replying!

I was sorting by path_filename, hence the need for a Tester to differentiate between file and folder. I see yours sorts by path_directory and also has the path filter set to *.*, which is something I didn't have!

As for what to do with the end results, instead of deleting them I opted to move them into a duplicates folder, which may say more about how I was feeling about this work, as I wasn't sure it would do the right thing. So I've added a SystemCaller to your model that moves them to a folder, just in case it's incorrect. Looking at the numbers, though, my version found 26 duplicates and yours found 205.
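In case it helps someone in a similar spot, the "move instead of delete" idea can also be sketched in plain Python (standard library only, assuming Python 3). This is not what the SystemCaller runs; the function name `quarantine_root_duplicates` and both folder arguments are hypothetical, and the sketch assumes the quarantine folder sits outside folder A so it is not itself scanned as a subfolder.

```python
import shutil
from pathlib import Path


def quarantine_root_duplicates(folder_a, quarantine_dir):
    """Move files that sit directly in folder_a into quarantine_dir when a
    file with the same name exists anywhere below a subfolder of folder_a.

    Hypothetical helper; assumes quarantine_dir is outside folder_a.
    """
    folder_a = Path(folder_a)
    quarantine_dir = Path(quarantine_dir)

    # Collect the names of all files found under any subfolder of folder_a.
    names_in_subfolders = {
        p.name
        for sub in folder_a.iterdir() if sub.is_dir()
        for p in sub.rglob("*") if p.is_file()
    }

    moved = []
    for entry in sorted(folder_a.iterdir()):
        if entry.is_file() and entry.name in names_in_subfolders:
            quarantine_dir.mkdir(parents=True, exist_ok=True)
            target = quarantine_dir / entry.name
            # Move rather than delete, so a wrong match can be restored.
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

Matching here is by file name only, the same as the workspace, so two different documents that happen to share a name would still be treated as duplicates. Moving them aside first keeps that mistake recoverable.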


Thank you for your help! It means a lot, as this has been driving me up the wall for some time.

