If you keep the daily CSVs in a separate folder, you can add a Python shutdown script to delete all files in that folder when the workspace finishes. Alternatively, you can use a SystemCaller to do the same thing via the command line.
That way you can be sure no daily file is processed more than once.
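The shutdown-script idea can be sketched in plain Python. The function and parameter names here are illustrative, not part of FME; inside an actual FME shutdown script you would typically take the folder path from a published parameter (e.g. via fme.macroValues):

```python
import glob
import os

def clear_daily_csvs(folder):
    """Delete every .csv file in `folder` and return the paths removed.

    In an FME workspace this logic could live in a Python shutdown
    script, with `folder` supplied by a published parameter.
    """
    removed = []
    for path in glob.glob(os.path.join(folder, "*.csv")):
        os.remove(path)
        removed.append(path)
    return removed
```

Non-CSV files in the folder are left untouched, so a log or readme alongside the daily files would survive the cleanup.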
Thank you for your quick response!
Unfortunately, the data belongs to a different department, so I won't be able to delete anything or amend the file structure.
I was wondering if there was a way to check the names of the files that have already been read and saved to the merged file (if I add fme_basename) and get it to only read the ones that haven't been. I could read the output file and retrieve the basenames that exist (getting unique values through something like a DuplicateFilter). However, I don't know how to tell a reader to read only certain files (the ones whose basename doesn't appear in the DuplicateFilter output) - is there a way to pass a list of files to the reader?
Thank you for your help
Well, that makes things a bit more complicated. You can keep a log of your own of the files you've already processed, use a File/Directory Path reader to get a list of files first, compare that list to your log, and then only process the ones you haven't done yet. Instead of a CSV reader in your workspace you'll have to use a FeatureReader, so you can still do the whole thing in a single workspace.
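The log-comparison step can be sketched in Python. The log layout (one processed filename per line) and the function name are assumptions for illustration, not FME specifics:

```python
import os

def unprocessed_files(folder, log_path):
    """Return the CSV paths in `folder` that are not yet listed in the log.

    The log is assumed to hold one already-processed filename per line;
    after processing a file, append its name to the log so it is skipped
    on the next run.
    """
    done = set()
    if os.path.exists(log_path):
        with open(log_path) as f:
            done = {line.strip() for line in f}
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".csv") and name not in done
    )
```

In a workspace, the same comparison would be a File/Directory Path reader joined against the log, with only the unmatched filenames routed into the FeatureReader.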
Create a SQL table, bring it into FME, bring in the CSV, and run them both through a ChangeDetector. From there you will have:
updated - records that have changed
insert - new records
deleted - deleted records
unchanged - existing records that are unchanged
After that, add 2 SQL writers:
1st writer: a delete operation to handle the updated records
2nd writer: an insert operation for both the updated and inserted records
Order matters in the Navigator, so put the delete writer highest by right-clicking it and choosing 'Move Up' until it is above the insert.
The deleted and unchanged records shouldn't be needed.
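The delete-then-insert pattern those two writers implement can be illustrated with sqlite3 as a stand-in database. The table name, columns, and sample records are made up for the example; the point is only the ordering of the operations:

```python
import sqlite3

# Stand-in target table with one record that will be updated (id 1)
# and one that stays unchanged (id 2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO records VALUES (1, 'old'), (2, 'keep')")

# Mimic the ChangeDetector output ports.
updated = [(1, "new")]     # records whose value changed
inserted = [(3, "added")]  # brand-new records

# 1st writer equivalent: delete the old versions of updated records first.
conn.executemany("DELETE FROM records WHERE id = ?", [(r[0],) for r in updated])
# 2nd writer equivalent: insert both the updated and the new records.
conn.executemany("INSERT INTO records VALUES (?, ?)", updated + inserted)

rows = sorted(conn.execute("SELECT * FROM records"))
# rows -> [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Running the insert before the delete would remove the freshly written rows, which is why the delete writer has to sit above the insert writer in the Navigator.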
Amazing - this is exactly what I'm looking for, thank you. However, I'm new to the FeatureReader and am having an issue. It's reading the features correctly but it's not actually bringing in any data - when I view what is on the Generic port, it shows the correct number of records, but it says No Schema. Any idea what I am doing wrong?
Thanks
I forgot, the FeatureReader can be a bit tricky to set up if you're feeding it filenames from an attribute. Basically, you'll need to specify an output port and name it what the feature type name would be if you were using a regular reader, so in this case that's CSV.
Then if you click OK you'll get a popup window asking about generating output ports. Select one of the original CSV files there.
Then it should route all output through the CSV port and it'll keep the schema.
Brilliant 😊 . I had played around with a lot of the settings in the FeatureReader, but not this one. It's now working perfectly and doing exactly what I needed it to. Thank you so much for your help.