Skip to main content
Question

Converting to required format

  • September 16, 2019
  • 3 replies
  • 14 views

Forum|alt.badge.img

Hi All,

I have my data in the form given below. The data from 1507 to 2451 is a header and the three rows highlighted in green are one set of data and another set is appearing below it with differing values.

Now i want to have the data in the format:

where A to J are my columns names from the header (A-J may change in number like sometimes A-F or A-I or A-J as well). Value is coming from first row of the set of data and text from the third row of the set of data.

Sometimes the data may be missing also. e.g the second row may be missing and only two rows are there. so for that value we have to write nothing in the main table (shown above)

I have attached the excel file for reference

Kindly help.

This post is closed to further activity.
It may be an old question, an answered question, an implemented idea, or a notification-only post.
Please check post dates before relying on any information in a question or answer.
For follow-up or related questions, please post a new question or idea.
If there is a genuine update to be made, please contact us and request that the post is reopened.

3 replies

jdh
Contributor
Forum|alt.badge.img+37
  • Contributor
  • 2002 replies
  • September 16, 2019

Is the original data in excel, or is it a text file?

 

The header with the column names, in the xlsx file, some of them end with a period ( A.) whereas others end with a letter (H) do both occur in the actual data, or is this a typo?

Forum|alt.badge.img
  • 104 replies
  • September 17, 2019

Hi @asabhinavgreats!

 

I hope the attached screenshot explains the steps I tried. This fixes most of your problems. I'm sure someone will now give you three lines of python code to do the same and more, but anyway... :)

This does not address the problem with row 2 sometimes missing and also you will get empty columns at the end when you have less than the maximum number of values. That's something to keep working on.


mark2atsafe
Safer
Forum|alt.badge.img+56
  • Safer
  • 2554 replies
  • September 17, 2019

That's a fairly (actually very) tough proposition. I think the answer from @danullen is about as good as you can get. The only thing I'd mention is that the fields A,B,C,D, etc are hard-coded in his example. If you wished to go about this dynamically (i.e. you don't know in advance) then you could use this method to generate the output schema. Instead of having _list{0}, etc you would have attribute{0}.name, attribute{0}.fme_data_type, etc.

I'd also suggest that the CAT reader (Column-Aligned Text) would be useful for reading the first few header lines; so another solution would be to basically read those lines, read the actual data with a textline reader, and basically reformat it all to create a new dataset that is simpler to carry out the actual task (ie convert it to an intermediate "format"/structure, to make the real work easier). Just a thought.