Solved

How to extract out every nth row and put into a column

  • 17 November 2016
  • 10 replies
  • 53 views

Badge

Hi, I have a xml that unfortunately isn't very structured for me to extract attributes easily. Every <en> line contains attributes that I want to put into specific columns in a structured table.

 

For example:

 

<en>3121 </en>

 

<en> Colorado City </en>

 

<en> 50.852010 </en>

 

<en> -89.672167</en>

 

<en> 5755 0555 </en>

 

<en> Map 593 N10 Arizona Country Directory </en>

 

<en> Office </en>

 

<en>3020 </en>

 

<en> Toronto </en>

 

<en> 50.578010 </en>

 

<en>-88.887679 </en>

 

<en> 5875 1234 </en>

 

<en> Map 105 K6 Ontario Country Directory </en>

 

<en> Agency </en>

 

and so on...

 

 

I have 2588 rows of this and there is a pattern. Is there a way for me to extract out every nth row be put together into one column? In this example, every 7th row should be in one column so I would have a field for post code, name, latitude, longitude and so on? Thank you.
icon

Best answer by takashi 17 November 2016, 06:45

View original

10 replies

Userlevel 2
Badge +17

Hi @vr_aileen, the XML reader (Elements to Match: en) can read features from the XML document for every <en> element sequentially, so you can add sequential number attribute to the features (Counter) and calculate column index and row index for each "en" feature (AttributeCreator). You can then rename the "en" to unique column name based on the column index (BulkAttributeRenamer), and aggregate the features for each row (Aggregator). See also this screenshot.

Result:

Hope this helps.

Userlevel 2
Badge +17

Hi @vr_aileen, the XML reader (Elements to Match: en) can read features from the XML document for every <en> element sequentially, so you can add sequential number attribute to the features (Counter) and calculate column index and row index for each "en" feature (AttributeCreator). You can then rename the "en" to unique column name based on the column index (BulkAttributeRenamer), and aggregate the features for each row (Aggregator). See also this screenshot.

Result:

Hope this helps.

finally rename each "en*" to appropriate attribute name if necessary.

 

Badge +16

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

Badge
finally rename each "en*" to appropriate attribute name if necessary.

 

Hi @takashi,

 

 

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

 

 

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

 

 

Thank you,

 

 

Aileen

 

 

offices.txttextline2none.fmw

 

offices.txt

 

Userlevel 2
Badge +17
Hi @takashi,

 

 

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

 

 

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

 

 

Thank you,

 

 

Aileen

 

 

offices.txttextline2none.fmw

 

offices.txt

 

If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

 

Badge +16

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

 

However if you are reading the input as text (as delivered) and you want to make use of the XML reader functionality, I would suggest inserting it into a new attribute to create a 'real'xml out of it so that you can fragment it with the XMLFragmentter.

 

 

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

 

Badge
If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

 

@takashi Thank you, I have got this working! Apologies, I do have another question. I have another xml that is similar to this example except that the first 13 rows are not relevant to what I want to extract out. Do you have any recommendations on how I could disregard the first 13 rows and then from here extract out every nth row to columns?

 

Badge
Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

 

However if you are reading the input as text (as delivered) and you want to make use of the XML reader functionality, I would suggest inserting it into a new attribute to create a 'real'xml out of it so that you can fragment it with the XMLFragmentter.

 

 

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

 

Thank you @itay! I used your suggestion in creating to xml and then following takashi's workbench and it worked.
Userlevel 2
Badge +17
Hi @takashi,

 

 

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

 

 

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

 

 

Thank you,

 

 

Aileen

 

 

offices.txttextline2none.fmw

 

offices.txt

 

Hi @vr_aileen, three possible ways.

 

A. Insert a Sampler with these parameters immediately after the Text File reader, and only use the NotSampled features.

 

  • Sampling Rate: 13
  • Sampling Type: First N Features
  • Randomize Sample: No
B. Expose a format attribute called 'text_line_number' in the Text File reader feature type properties, and insert a Tester immediately after the reader to filter out features whose line number is less than 14.

 

C. Just set 13 to the "Number of Lines to Skip" parameter of the Text File reader.
Badge
Thank you @takashi for your help and your further replies to my additional questions (I am unable to reply to the last reply for unknown reasons)! The FME workbench worked nicely!

 

Reply