Hi, I have a xml that unfortunately isn't very structured for me to extract attributes easily. Every <en> line contains attributes that I want to put into specific columns in a structured table. For example: <en>3121 </en> <en> Colorado City </en> <en> 50.852010 </en> <en> -89.672167</en> <en> 5755 0555 </en> <en> Map 593 N10 Arizona Country Directory </en> <en> Office </en> <en>3020 </en> <en> Toronto </en> <en> 50.578010 </en> <en>-88.887679 </en> <en> 5875 1234 </en> <en> Map 105 K6 Ontario Country Directory </en> <en> Agency </en> and so on... I have 2588 rows of this and there is a pattern. Is there a way for me to extract out every nth row be put together into one column? In this example, every 7th row should be in one column so I would have a field for post code, name, latitude, longitude and so on? Thank you.

How to extract out every nth row and put into a column

Userlevel 2

+17

takashi
Contributor
7538 replies
7 years ago
17 November 2016
Best Answer

Hi @vr_aileen, the XML reader (Elements to Match: en) can read features from the XML document for every <en> element sequentially, so you can add sequential number attribute to the features (Counter) and calculate column index and row index for each "en" feature (AttributeCreator). You can then rename the "en" to unique column name based on the column index (BulkAttributeRenamer), and aggregate the features for each row (Aggregator). See also this screenshot.

Result:

Hope this helps.

Userlevel 2

+17

takashi
Contributor
7538 replies
7 years ago
17 November 2016

Hi @vr_aileen, the XML reader (Elements to Match: en) can read features from the XML document for every <en> element sequentially, so you can add sequential number attribute to the features (Counter) and calculate column index and row index for each "en" feature (AttributeCreator). You can then rename the "en" to unique column name based on the column index (BulkAttributeRenamer), and aggregate the features for each row (Aggregator). See also this screenshot.

Result:

Hope this helps.

finally rename each "en*" to appropriate attribute name if necessary.

+16

itay
Contributor
1425 replies
7 years ago
17 November 2016

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

V

finally rename each "en*" to appropriate attribute name if necessary.

Hi @takashi,

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

Userlevel 2

+17

takashi
Contributor
7538 replies
7 years ago
17 November 2016

Hi @takashi,

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

+16

itay
Contributor
1425 replies
7 years ago
17 November 2016

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

However if you are reading the input as text (as delivered) and you want to make use of the XML reader functionality, I would suggest inserting it into a new attribute to create a 'real'xml out of it so that you can fragment it with the XMLFragmentter.

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

V

If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

@takashi Thank you, I have got this working! Apologies, I do have another question. I have another xml that is similar to this example except that the first 13 rows are not relevant to what I want to extract out. Do you have any recommendations on how I could disregard the first 13 rows and then from here extract out every nth row to columns?

V

Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

However if you are reading the input as text (as delivered) and you want to make use of the XML reader functionality, I would suggest inserting it into a new attribute to create a 'real'xml out of it so that you can fragment it with the XMLFragmentter.

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

Thank you @itay! I used your suggestion in creating to xml and then following takashi's workbench and it worked.

Userlevel 2

+17

takashi
Contributor
7538 replies
7 years ago
18 November 2016

Hi @takashi,

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

Hi @vr_aileen, three possible ways.

A. Insert a Sampler with these parameters immediately after the Text File reader, and only use the NotSampled features.

Sampling Rate: 13
Sampling Type: First N Features
Randomize Sample: No

B. Expose a format attribute called 'text_line_number' in the Text File reader feature type properties, and insert a Tester immediately after the reader to filter out features whose line number is less than 14.

C. Just set 13 to the "Number of Lines to Skip" parameter of the Text File reader.

V

Thank you @takashi for your help and your further replies to my additional questions (I am unable to reply to the last reply for unknown reasons)! The FME workbench worked nicely!

How to extract out every nth row and put into a column

10 replies

Reply

Community Stats

Reply

Community Stats

Sign up

An FME Account is required to contribute

Login to the community

An FME Account is required to contribute

Scanning file for viruses.

This file cannot be downloaded