Question

Converting to required format

  • 16 September 2019
  • 3 replies
  • 1 view

Badge

Hi All,

I have my data in the form given below. The data from 1507 to 2451 is a header and the three rows highlighted in green are one set of data and another set is appearing below it with differing values.

Now i want to have the data in the format:

where A to J are my columns names from the header (A-J may change in number like sometimes A-F or A-I or A-J as well). Value is coming from first row of the set of data and text from the third row of the set of data.

Sometimes the data may be missing also. e.g the second row may be missing and only two rows are there. so for that value we have to write nothing in the main table (shown above)

I have attached the excel file for reference

Kindly help.


3 replies

Badge +22

Is the original data in excel, or is it a text file?

 

The header with the column names, in the xlsx file, some of them end with a period ( A.) whereas others end with a letter (H) do both occur in the actual data, or is this a typo?
Badge

Hi @asabhinavgreats!

 

I hope the attached screenshot explains the steps I tried. This fixes most of your problems. I'm sure someone will now give you three lines of python code to do the same and more, but anyway... :)

This does not address the problem with row 2 sometimes missing and also you will get empty columns at the end when you have less than the maximum number of values. That's something to keep working on.

Userlevel 4
Badge +25

That's a fairly (actually very) tough proposition. I think the answer from @danullen is about as good as you can get. The only thing I'd mention is that the fields A,B,C,D, etc are hard-coded in his example. If you wished to go about this dynamically (i.e. you don't know in advance) then you could use this method to generate the output schema. Instead of having _list{0}, etc you would have attribute{0}.name, attribute{0}.fme_data_type, etc.

I'd also suggest that the CAT reader (Column-Aligned Text) would be useful for reading the first few header lines; so another solution would be to basically read those lines, read the actual data with a textline reader, and basically reformat it all to create a new dataset that is simpler to carry out the actual task (ie convert it to an intermediate "format"/structure, to make the real work easier). Just a thought.

Reply