1. Use Generate Workspace and SHP -> FFS. Change the settings on the FFS-writer to include indexes
Then you can try reading the FFS - sorting and Aggregating
The other thing is to make sure there aren't unneeded attributes.
You could also import them all into a PostGIS database and only read in what is actually a duplicate, with something like this:
https://stackoverflow.com/questions/28156795/how-to-find-duplicate-records-in-postgresql - you would also want the result to be sorted by the ID. You would need to make sure that the id field has an index to keep the query fast.
Not sure if that would be faster than what @sigtill has suggested, but in general there is work here that a database could handle well, if you have access to one.
Of course you will still need some way to get back the results that are unique.
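The duplicate query from the linked Stack Overflow answer can be sketched with the stdlib `sqlite3` module as a stand-in for PostGIS/PostgreSQL (the table and column names `features`, `id`, and `road_name` are invented for illustration):

```python
# Sketch of finding duplicate records with GROUP BY / HAVING, using
# sqlite3 as a stand-in for a PostGIS database. Table and column names
# are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER, road_name TEXT)")
# An index on the grouped column keeps the query fast on large tables.
conn.execute("CREATE INDEX idx_features_road ON features (road_name)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(1, "Main St"), (2, "High St"), (3, "Main St"), (4, "Park Ave")],
)

# Group on the candidate-duplicate column and keep groups with more
# than one row, sorted by the lowest id in each group.
dupes = conn.execute(
    """
    SELECT road_name, COUNT(*) AS n, MIN(id) AS first_id
    FROM features
    GROUP BY road_name
    HAVING COUNT(*) > 1
    ORDER BY first_id
    """
).fetchall()
print(dupes)  # [('Main St', 2, 1)]
```

The same `GROUP BY ... HAVING COUNT(*) > 1` pattern runs unchanged in PostgreSQL.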
1. Use Generate Workspace and SHP -> FFS. Change the settings on the FFS-writer to include indexes
Then you can try reading the FFS - sorting and Aggregating
Is the sorter more efficient than the aggregator then?
I will give it a try
Is the sorter more efficient than the aggregator then?
I will give it a try
The Sorter works in bulk mode, so it should be faster. Which version of FME are you using? If you have access to 2020, Shapefile reading is much faster too.
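The win from sorting first can be sketched in plain Python with `itertools.groupby` (the feature tuples are invented sample data): once the input is ordered on the group key, aggregation becomes a single streaming pass that emits each group as soon as the key changes, instead of holding every group in memory until the end.

```python
# Sketch of sort-then-aggregate: sorting up front lets a single
# streaming pass aggregate each group in turn. Sample data is invented.
from itertools import groupby
from operator import itemgetter

features = [("A", 3), ("B", 1), ("A", 2), ("B", 4), ("A", 5)]

# Step 1: sort on the group key (what the Sorter does).
features.sort(key=itemgetter(0))

# Step 2: one pass over the sorted stream aggregates each group.
totals = [
    (key, sum(value for _, value in rows))
    for key, rows in groupby(features, key=itemgetter(0))
]
print(totals)  # [('A', 10), ('B', 5)]
```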
I am on 2020. Running some time trials on the different combinations of SHP / FFS and Sorter / no Sorter.
Sorter is definitely faster.
I am on 2020. Running some time trials on the different combinations of SHP / FFS and Sorter / no Sorter.
Sorter is definitely faster.
Nice - performance is always a fun thing to play with, and tuning it is always a learning experience. When someone complains that something is too slow, I see it as a fun challenge and an opportunity to learn something new.
It always takes time, but you learn so much!
Good luck!
1. Use Generate Workspace and SHP -> FFS. Change the settings on the FFS-writer to include indexes
Then you can try reading the FFS - sorting and Aggregating
I'm normally a lurker, but I had to say this answer has SAVED ME.
I'm aggregating road names for a hefty dataset of 34mil points, and I'd been running into translation failures hours in when trying to aggregate normally.
I will be sharing this tip with my team.
My process (not sure if I took extra steps or skipped any, but it worked):
- Load the original dataset and use an AttributeKeeper to keep only what I need
- FeatureWriter to FFS format (this produced a ton of extra files, so I wish I'd done it in a separate folder)
- Bring the new FFS file back in and use the Sorter
- Use the Aggregator (I found that even with sorting, "aggregating when group changed" created duplicates, so I left Group By Mode as Process At End (Blocking))
- I wrote the aggregated table out so I don't have to redo this in other tasks.
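The group-by-change caveat in the steps above has a general explanation that can be sketched in plain Python with `itertools.groupby` (the street names are invented sample data): change-based grouping only works when the input is sorted on exactly the same key, because a key that reappears later in the stream starts a brand-new group, which shows up as a duplicate aggregate.

```python
# Sketch of why change-based grouping needs input sorted on the same
# key: a reappearing key starts a new group. Sample data is invented.
from itertools import groupby

names = ["Main St", "Main St", "High St", "Main St"]

# On unsorted input, 'Main St' produces two separate groups.
unsorted_groups = [key for key, _ in groupby(names)]
print(unsorted_groups)  # ['Main St', 'High St', 'Main St']

# On sorted input, each key produces exactly one group.
sorted_groups = [key for key, _ in groupby(sorted(names))]
print(sorted_groups)  # ['High St', 'Main St']
```

A blocking (process-at-end) aggregation avoids the problem entirely, at the cost of holding all groups until the input is exhausted.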
Went from a multi-hour process to less than half an hour, including setup.
Stellar help. THANK YOU!