Solved

How to extract out every nth row and put into a column

8 years ago
November 17, 2016
10 replies
297 views

vral
4 replies

Hi, I have a xml that unfortunately isn't very structured for me to extract attributes easily. Every <en> line contains attributes that I want to put into specific columns in a structured table.

For example:

<en> Colorado City </en>

<en> Map 593 N10 Arizona Country Directory </en>

<en> Office </en>

<en> Toronto </en>

<en> Map 105 K6 Ontario Country Directory </en>

<en> Agency </en>

and so on...

I have 2588 rows of this and there is a pattern. Is there a way for me to extract out every nth row be put together into one column? In this example, every 7th row should be in one column so I would have a field for post code, name, latitude, longitude and so on? Thank you.

Best answer by takashi

Hi @vr_aileen, the XML reader (Elements to Match: en) can read features from the XML document for every <en> element sequentially, so you can add sequential number attribute to the features (Counter) and calculate column index and row index for each "en" feature (AttributeCreator). You can then rename the "en" to unique column name based on the column index (BulkAttributeRenamer), and aggregate the features for each row (Aggregator). See also this screenshot.

Result:

Hope this helps.

View original

Did this help you find an answer to your question?

takashi
7728 replies
Best Answer
8 years ago
November 17, 2016

Result:

Hope this helps.

takashi
7728 replies
8 years ago
November 17, 2016

takashi wrote:

Result:

Hope this helps.

finally rename each "en*" to appropriate attribute name if necessary.

+17

itay
Supporter
1441 replies
8 years ago
November 17, 2016

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

vral
Author
4 replies
8 years ago
November 17, 2016

takashi wrote:

finally rename each "en*" to appropriate attribute name if necessary.

Hi @takashi,

Thank you for your reply. I tried to replicate your FME workbench that you created but unfortunately I couldn't get the result you got. To clarify, the XML I have could not use the XML reader. I had to use the Text file as reader and then used TestFilter to get to having rows of '<en>' as the starting point in my example.

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

takashi
7728 replies
8 years ago
November 17, 2016

vral wrote:

Hi @takashi,

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

+17

itay
Supporter
1441 replies
8 years ago
November 17, 2016

itay wrote:

in addition,if you use the flattening options on the xml reader unique id will be created for you to use in mapping to your destination attributes.

Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

However if you are reading the input as text (as delivered) and you want to make use of the XML reader functionality, I would suggest inserting it into a new attribute to create a 'real'xml out of it so that you can fragment it with the XMLFragmentter.

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

vral
Author
4 replies
8 years ago
November 18, 2016

takashi wrote:

If you read the source data with the Text File reader, you will have to use "text_line_data" instead of "en" in my example.

@takashi Thank you, I have got this working! Apologies, I do have another question. I have another xml that is similar to this example except that the first 13 rows are not relevant to what I want to extract out. Do you have any recommendations on how I could disregard the first 13 rows and then from here extract out every nth row to columns?

vral
Author
4 replies
8 years ago
November 18, 2016

itay wrote:

Actually due to the unreconstructed nature of the input this approach will not work for all the elements, I suggest using a counter as @takashi suggested.

how-to-extract-out-every-nth-row-and-put-into-a-co.fmw

Thank you @itay! I used your suggestion in creating to xml and then following takashi's workbench and it worked.

takashi
7728 replies
8 years ago
November 18, 2016

vral wrote:

Hi @takashi,

Are you able to point out what I have done wrong with the workbench I created modelling what you created?

Thank you,

Aileen

offices.txt textline2none.fmw

offices.txt

Hi @vr_aileen, three possible ways.

A. Insert a Sampler with these parameters immediately after the Text File reader, and only use the NotSampled features.

Sampling Rate: 13
Sampling Type: First N Features
Randomize Sample: No

B. Expose a format attribute called 'text_line_number' in the Text File reader feature type properties, and insert a Tester immediately after the reader to filter out features whose line number is less than 14.

C. Just set 13 to the "Number of Lines to Skip" parameter of the Text File reader.

vral
Author
4 replies
8 years ago
November 18, 2016

Thank you @takashi for your help and your further replies to my additional questions (I am unable to reply to the last reply for unknown reasons)! The FME workbench worked nicely!

Reply

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

How to extract out every nth row and put into a column