
I have a huge dataset and I want to create a separate writer output for each type. Some types have only 1 sample, some have 100, and I need to remove all empty sample columns from every output. There are over 100 types, so doing this manually is out of the question.


I have tried using the NullAttributeMapper to set all empty values to missing, but how do I drop those attributes before the writer?
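Outside FME, mapping empty values to "missing" amounts to removing those keys from each record, so later steps no longer see them as attributes. The sketch below is a hypothetical plain-Python analogue of that idea, not the NullAttributeMapper's actual behaviour or API; treating `None`, empty strings and whitespace-only strings as "empty" is an assumption about the data.

```python
from typing import Any, Dict


def map_empty_to_missing(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` without attributes whose value is empty.

    Hypothetical analogue of mapping empty values to "missing":
    None, "" and whitespace-only strings are dropped (an assumption).
    Other falsy values such as 0 are kept.
    """
    return {
        key: value
        for key, value in record.items()
        if value is not None and not (isinstance(value, str) and value.strip() == "")
    }


if __name__ == "__main__":
    row = {"type": "A", "sample_1": "4.2", "sample_2": "", "sample_3": None}
    print(map_empty_to_missing(row))  # {'type': 'A', 'sample_1': '4.2'}
```

Note that this only cleans individual records; deciding which columns to keep for a whole type still needs a scan over all records of that type.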

Yes, you can do this with the new-ish SchemaScanner transformer. I picked this as my question-of-the-week and made a video of it that you can find here: https://www.youtube.com/watch?v=QB3LXE9ycp0&t=562s

Basically, you scan the data to build a schema, making sure that missing attributes are left out of the new schema design. That schema is then fed into a dynamic writer, which uses it as the basis for writing out only the attributes that actually contain data.
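As a rough illustration of the scanning step, here is a plain-Python sketch (not FME, and not how the SchemaScanner is implemented) that builds one column list per type, keeping only columns with at least one non-empty value in that type. The `type` key name and the definition of "empty" are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def is_empty(value: Any) -> bool:
    """Treat None and blank strings as empty (an assumption about the data)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def scan_schemas(
    records: Iterable[Dict[str, Any]], type_key: str = "type"
) -> Dict[str, List[str]]:
    """Build one column list per type, keeping only columns that hold a value.

    Columns keep the order in which they are first seen, and a column is
    included for a type only if at least one record of that type has a
    non-empty value in it.
    """
    # A dict is used as an insertion-ordered set of column names.
    schemas: Dict[str, Dict[str, None]] = defaultdict(dict)
    for record in records:
        columns = schemas[str(record[type_key])]
        for key, value in record.items():
            if not is_empty(value):
                columns.setdefault(key, None)
    return {group: list(columns) for group, columns in schemas.items()}


if __name__ == "__main__":
    records = [
        {"type": "A", "s1": "1", "s2": ""},
        {"type": "A", "s1": "", "s2": ""},
        {"type": "B", "s1": "", "s2": "7"},
    ]
    print(scan_schemas(records))  # {'A': ['type', 's1'], 'B': ['type', 's2']}
```

The scan has to see every record of a type before the column list is final, which is why the schema is built first and only then handed to the writer.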

The video explains better though!


Hello Mark!

And thank you, sir, for your excellent answer; it really solved my problem.

I have one question, though: can this be done with the Excel writer too?

I tried the same fanout options, but it did not work. CSV is fine, but XLSX would be preferred if possible.


I am posting my solution here for future reference, for myself and others who need it:

[Screenshot: SchemaScanner settings]

CSV Writer:

[Screenshots: CSV Writer settings]
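For anyone wanting the same result without FME, the CSV writer fan-out discussed in this thread can be approximated in plain Python: one CSV file per type, each with only the columns from that type's schema. This is a hypothetical sketch, not FME's writer; the `schemas` mapping (type to column list) is assumed to come from a scan like the one the SchemaScanner performs, and type values are assumed to be safe file names.

```python
import csv
import os
import tempfile
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def write_fanout_csv(
    records: Iterable[Dict[str, Any]],
    schemas: Dict[str, List[str]],
    out_dir: str,
    type_key: str = "type",
) -> List[str]:
    """Write one CSV per type, using only the columns listed in `schemas`.

    Attributes outside a type's schema are ignored (extrasaction="ignore"),
    and schema columns a record lacks are written as empty cells.
    Returns the paths written, in the order types were first seen.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Group first so each file is opened once (holds all records in memory).
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        grouped[str(record[type_key])].append(record)

    paths = []
    for group, rows in grouped.items():
        path = os.path.join(out_dir, f"{group}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=schemas[group], restval="", extrasaction="ignore"
            )
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    rows = [
        {"type": "A", "s1": "1", "s2": ""},
        {"type": "B", "s1": "", "s2": "7"},
    ]
    schemas = {"A": ["type", "s1"], "B": ["type", "s2"]}
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_fanout_csv(rows, schemas, tmp):
            with open(path, newline="", encoding="utf-8") as handle:
                # A.csv ['type,s1', 'A,1'] then B.csv ['type,s2', 'B,7']
                print(os.path.basename(path), handle.read().splitlines())
```

Grouping everything in memory keeps the number of open files at one, which matters with more than 100 types; for very large inputs, sorting by type and streaming would be the usual alternative.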

