Solved

Problem reading an HTML table

4 years ago
July 27, 2020
4 replies
84 views

franco
2 replies

Hello FME-Experts!

I have no experience reading HTML code, and I am having trouble reading data from a simple HTML file.

If I use the HTML Table reader, I get only the first record (which contains the attribute names of the data) and not the 4 records that I need.

I think the workflow should be simple, but I tried several options without success.

I have attached an example HTML file.

Thank you very much for the help!

redgeographics
Celebrity
3642 replies
Best Answer
4 years ago
July 27, 2020

Okay, I finally figured it out. There was a similar question (same sample data actually) 2 weeks ago but I can't seem to find that one anymore. I do remember looking into it and being frustrated by it.

So the problem lies with the first row of your table:

<tr>
         <td><b>Nome_comune</b></td>
         <td><b>Numero_comune</b></td>
         <td><b>Numero_sezione</b></td>
         <td><b>Particella</b></td>
         <td><b>Superficie_m&#178</b></td>
         <td><b>Tipo</font></b></td>
         <td><b>Descrizione</b></td>
         <td><b>eGRID</b></td>
       </tr>
<tr>

There is a stray </font> tag on the Tipo line: a closing tag with no matching opening tag. That caused the HTML Table reader to give up after reading that first row.
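For anyone who wants to find this kind of stray closing tag before pointing a reader at the file, here is a minimal Python sketch using only the standard library's html.parser. It is not part of FME and not how the HTML Table reader works internally; the names StrayCloseTagFinder and find_stray_close_tags are made up for this example. It assumes every non-void element is closed explicitly, so documents that rely on optional end tags (an omitted </td> or </p>, for instance) would produce false positives.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class StrayCloseTagFinder(HTMLParser):
    """Records closing tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.stray = []      # (tag name, line number) of unmatched closing tags

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes nothing.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            # Closing tag with no matching open element on top of the stack.
            self.stray.append((tag, self.getpos()[0]))


def find_stray_close_tags(html_text):
    """Return a list of (tag, line) pairs for unmatched closing tags."""
    finder = StrayCloseTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.stray
```

Run against the header row above, this should flag the font tag on the Tipo line and nothing else.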

So when I manually removed that tag, it read 5 rows of data, the first one being the table header. However, that header row was turned into a feature and was not used to populate the attribute names.

Changing the <td></td> tags in that first row to <th></th> tags (table data to table header) fixed that.

As for the solution... I don't think I have one to be honest. Both issues require pre-processing of the data, which isn't always feasible with HTML tables. I think your best bet is to submit an idea for 2 feature requests for the HTML Table reader:

An optional parameter to ignore stray tags.
An optional parameter to assume the first row of a table contains the column names.
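Until something like that exists, the pre-processing could be done outside the reader. The following is a rough Python sketch under a few assumptions: the file is small enough to hold in memory, the header is the first <tr> in the document, and a closing tag counts as stray only when no opening tag of that name appears anywhere in the file. The function names are invented for this example and regular expressions are a blunt tool for HTML, so treat it as a starting point and not a general-purpose cleaner.

```python
import re


def remove_unopened_closing_tags(html_text, tag):
    """Strip </tag> occurrences, but only if <tag ...> never appears at all."""
    if re.search(r"<%s\b" % re.escape(tag), html_text, flags=re.IGNORECASE):
        # An opening tag exists somewhere, so leave the document untouched.
        return html_text
    return re.sub(r"</%s\s*>" % re.escape(tag), "", html_text, flags=re.IGNORECASE)


def promote_first_row_to_header(html_text):
    """Rewrite <td>/</td> as <th>/</th> inside the first <tr>...</tr> only."""
    match = re.search(r"<tr\b.*?</tr>", html_text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return html_text
    row = re.sub(r"<(/?)td\b", r"<\1th", match.group(0), flags=re.IGNORECASE)
    return html_text[:match.start()] + row + html_text[match.end():]


def preprocess(html_text):
    """Apply both fixes: drop the stray </font>, then mark the header row."""
    cleaned = remove_unopened_closing_tags(html_text, "font")
    return promote_first_row_to_header(cleaned)
```

The cleaned text would then be written to a temporary file and read with the HTML Table reader as usual.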

Hope this helps. Not the answer you were looking for, I'm afraid, but at least now we know what the problem is.

franco
Author
2 replies
4 years ago
August 13, 2020

Thank you very much for your answer and help! I contacted the data producer and he removed the tag. So I could read the data, skip the first line, and add the attribute names manually. I added an idea for the feature request about reading HTML tables.
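As an illustration of the "skip the first line and name the attributes" step, here is a small Python sketch that does the equivalent outside FME: it collects the table rows with the standard library's html.parser and uses the first row as the attribute names. The class and function names are hypothetical, and it assumes the stray tag has already been removed and that every row has the same number of cells as the header.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_as_records(html_text):
    """Treat the first row as attribute names and return the rest as dicts."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    if not parser.rows:
        return []
    header, *data = parser.rows
    return [dict(zip(header, row)) for row in data]
```

Each returned dict corresponds to one data record, keyed by the names from the header row (Nome_comune, Particella, and so on).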

takashi
7703 replies
4 years ago
August 16, 2020

For your information, in some cases you can use the HTMLToXHTMLConverter to clean up an HTML document that contains invalid syntax. In your case, the </font> closing tag without a corresponding opening tag would be removed if you applied the transformer. You can then parse the resulting XHTML document with the HTML Table reader.

mark2atsafe
Safer
2517 replies
4 years ago
August 17, 2020

franco wrote:

Thank you very much for your answer and help! I contacted the data producer and he removed the tag. So I could read the data, skip the first line, and add the attribute names manually. I added an idea for the feature request about reading HTML tables.

I also created a bug report for our developers to look into this. It would be great if we could avoid failing in this scenario. FYI, the reference number is FMEENGINE-66731.
