Skip to main content

I have a dataset that has multiple records, each with a different types. Each of those types could have any number of records from 1 record to 5000+ records.

 

How can I write these records to individual files based on a limit of 50 records per file with no more than 10 types?

 

For example:

 

File 1 would consist of:

Type 1 = 10 Records,

Type 2 = 37 Records

Type 3 = 13 Records

 

File 2 would consist of:

Type 3 = 50 Records

 

File 3 would consist of:

Type 3 = 5 records

Type 4 = 45 records

 

and so on...

 

and so on...

 

I've tried groupers, and I have tried building lists, I've also tried the modulocounter, but I can't seem to find a way to build this criteria with any of those. Any help is much appreciated! Thank you!

 

I've built a sample set of data that could be used to do testing.

 

My final output needs to be no more than 50 records per file with no more than 10 Circuit ID's in each file. Please help! 😁

Do the files need to be exactly 50 records? What if a Circuit_ID has only 1 record? Is it ok to have a file with a single record?

My approach is this: StatisticsCalculator: GroupBy Circuit_ID and Circuit_ID - Total Count -> Complete output port to a ModuloCounter: Counter Name Circuit_ID attribute, Count Maximum @ceil(@Value(Circuit_ID.total_count)/50,0).

Fanout by Circuit_ID and _modulo_count. You get a single Circuit_ID per file. Workspace attached.

If you need exactly 50 records per file, I will be following this thread with a lot of interest 😁


Do the files need to be exactly 50 records? What if a Circuit_ID has only 1 record? Is it ok to have a file with a single record?

My approach is this: StatisticsCalculator: GroupBy Circuit_ID and Circuit_ID - Total Count -> Complete output port to a ModuloCounter: Counter Name Circuit_ID attribute, Count Maximum @ceil(@Value(Circuit_ID.total_count)/50,0).

Fanout by Circuit_ID and _modulo_count. You get a single Circuit_ID per file. Workspace attached.

If you need exactly 50 records per file, I will be following this thread with a lot of interest 😁

Thanks for the update and suggestion. Currently what I am doing is running the workspace into a ListBuilder and group by Circuit_ID. Then I do a ListElementCounter before doing a ListExploder. Then I use an AttributeCreator and and create the attribute 'File Name' which = @Value(Circuit_ID)_@Value(Count). I set my Feature Type Name to the File Name attribute which writes my files with the Circuit ID then adds a count of the number of records in the file.

 

However, these files are going to be used for identification of data in a very complex system that due to memory restrictions on that environment and for quality management purposes for connectivity and structure, the most records that can be done at any given time is 50 records to ensure stability. So I need to break these circuits down by increments of 50 records per file. I wish there was a way to build a list where the max number of records in each list is 50 records. Then I could simply use a counter to establish a group identity for writing. There may be a way to do that and I just don't know the settings. Possibly even a python script using a python caller. I just don't have the knowledge currently to establish it and thats the biggest help I am looking for.

 

But thank you again for your input. I like the approach you took.


To create a Group ID to fanout on, where each group will contain 50 records, simply use a counter then calculate the group by dividing by 50 and rounding down

@Evaluate(@floor(@Value(_count)/50))

image


To create a Group ID to fanout on, where each group will contain 50 records, simply use a counter then calculate the group by dividing by 50 and rounding down

@Evaluate(@floor(@Value(_count)/50))

image

@ebygomm​ Thanks for that.  I am sure that works but unfortunately my version doesn't support the floor function apparently lol.  And I have no control over version upgrading and such as that is the responsibility of my company's IT department and it is highly unlikely that we will get an upgrade any time soon.  We went from Version 2013 (In 2015) to Version 2017 (In 2019) to Version 2020 (In 2022).  So I highly doubt we will be fortunate anytime soon.  But thank you for that!

 

imageimage


@ebygomm​ Thanks for that. I am sure that works but unfortunately my version doesn't support the floor function apparently lol. And I have no control over version upgrading and such as that is the responsibility of my company's IT department and it is highly unlikely that we will get an upgrade any time soon. We went from Version 2013 (In 2015) to Version 2017 (In 2019) to Version 2020 (In 2022). So I highly doubt we will be fortunate anytime soon. But thank you for that!

 

imageimage

The log shown the Counter transformer gave the error.

You are supposed to use @floor in an AttributeCreator or AttributeManager.

My oldest edition of FME is 2021.1, not 2020.0, I believe @floor was around at that time. If not, as a workaround, use AttributeManager/Creator to evaluate _count/50 followed by AttributeRounder set to 0 decimal places and to Round Down.

floor_attribute_rounder


@ebygomm​ Thanks for that. I am sure that works but unfortunately my version doesn't support the floor function apparently lol. And I have no control over version upgrading and such as that is the responsibility of my company's IT department and it is highly unlikely that we will get an upgrade any time soon. We went from Version 2013 (In 2015) to Version 2017 (In 2019) to Version 2020 (In 2022). So I highly doubt we will be fortunate anytime soon. But thank you for that!

 

imageimage

@caracadrian​ Thanks for that! Yeah I followed what was shown in your screenshot and I think I can use that to expand on it to possibly get exactly what I am hoping for with the limitations per file with a limit of 10 circuits. I will have to play around with it and that's how I will be able to commit that function to memory. Thanks again to you and to @ebygomm​ for all your help with this! You da best! Take care!


Reply