Question

Unstructured data, selecting by row number

  • 26 July 2024
  • 4 replies
  • 43 views

Hi,

I’m attempting to create attributes from an unstructured html table.

Looks like this…

 

The table was originally a Word .doc that I saved as an .html file so I could read it in FME.

 

For example, I’d like to create an attribute (AttributeCreator) that selects a value by row number, e.g. “Survey Authority” = Col1 row4

or

“End Date” = Col2 row8

Is there a way I can do this?

thanks

 

 

4 replies

Badge +24

I guess the data has the attribute names in the odd rows and the attribute values in the even rows?

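You could do something with the adjacent feature attributes option in the AttributeCreator: on each row, use the cell text as the attribute name and the same cell on the next feature as its value.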

 

This creates an attribute named Survey Title with the value Survey123/10, and an attribute named Locality with the value Sydney, Aus.

After this, remove all the odd-numbered features, then aggregate the remaining 4 features into one feature.
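
For anyone who wants to check the logic outside FME, here is a minimal Python sketch of the same idea. It assumes the table has already been read into a list of rows (each row a list of cell strings) and that names really do sit in the odd rows with values in the row directly below. The function name `pair_rows` and the "Survey Authority" / "End Date" sample values are made up for illustration; only the first two rows come from the example above.

```python
from typing import Dict, List


def pair_rows(rows: List[List[str]]) -> Dict[str, str]:
    """Pair each name row (1st, 3rd, ...) with the value row below it.

    Assumes names are in odd rows and values in even rows (1-based).
    """
    record: Dict[str, str] = {}
    # rows[0::2] are the name rows, rows[1::2] the value rows.
    for names, values in zip(rows[0::2], rows[1::2]):
        for name, value in zip(names, values):
            name = name.strip()
            if name:  # skip empty header cells
                record[name] = value.strip()
    return record


# Sample data: rows 3 and 4 are hypothetical placeholders.
rows = [
    ["Survey Title", "Locality"],
    ["Survey123/10", "Sydney, Aus"],
    ["Survey Authority", "End Date"],
    ["Example Authority", "2024-07-26"],
]
record = pair_rows(rows)
```

The result is a single dictionary, which corresponds to the one aggregated feature at the end of the FME workflow.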

 

Badge +3

Thank you, @jkr_wrk 

Most helpful, I’ll attempt to solve it as you’ve mentioned above.

 

Userlevel 4
Badge +12

Is it a doc or a docx? If it is a docx, you can unzip it and use the XML inside directly, which might be a bit easier. If your zip utility refuses to open a docx, just rename the extension to .zip.
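
As a rough illustration of what that looks like, here is a minimal Python sketch that opens a docx as a zip and pulls the table cells out of `word/document.xml`. It uses only the standard library. The function name `read_docx_tables` is made up for this example, and the sketch assumes simple tables without merged cells or content controls; a real document may need more handling.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(source):
    """Return every table as a list of rows, each row a list of cell strings.

    `source` is a path to a .docx file or a file-like object.
    Assumption: plain tables; text of a nested table is also folded
    into the text of the cell that contains it.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):  # direct child rows only
            row = []
            for tc in tr.findall(W + "tc"):  # direct child cells only
                # A cell's text is split over one or more <w:t> runs.
                row.append("".join(t.text or "" for t in tc.iter(W + "t")))
            rows.append(row)
        tables.append(rows)
    return tables


# Usage (the path is hypothetical):
# tables = read_docx_tables("survey.docx")
# first_table = tables[0]
```

With the rows in a plain list, a cell can be picked by position, e.g. `first_table[3][0]` for Col1 row4 (Python indexes from zero).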

Badge +24

@s.jager Good idea to point out that a docx is actually XML in a zip. But it might be pretty complex to find the right objects in the XML and the relations between them.

But it could make everything a lot easier when working with this file on a regular basis.
