Question

Convert HTML table to CSV or TXT file

  • 20 August 2018
  • 6 replies
  • 20 views

I have hundreds of HTML tables, one per year/month selection. I have figured out how to generate the HTML files. The problem I am having is converting the HTML to a usable tabular format. Example URL: http://bbnet.gein.noa.gr/alerts_manual/2006/01/manual_alerts_01_2006.html

Any ideas?

Thanks Frank

 


6 replies

Badge +7

Hi @londonhb185, thanks for your question!

 

You can try using the HTMLExtractor to pull data out of your HTML table. However, it works best when the HTML has distinct elements that CSS selectors can target. Your HTML mostly uses inline styles and many instances of the same elements (<td>, <tr>), which might make it hard to select individual rows or columns to put into features and attributes in FME.

 

Can I ask what format the source data was in before you generated the HTML tables? Perhaps FME could ingest it more easily before it is converted to HTML, if that's possible.

 

Hope this helps,

 

Nathan

@NathanAtSafe - Thanks for your response. The HTML is the source data. I'm trying to compile the pages by month/year:

 

http://bbnet.gein.noa.gr/alerts_manual/2006/01/manual_alerts_01_2006.html - Jan 2006

 

http://bbnet.gein.noa.gr/alerts_manual/2007/01/manual_alerts_01_2007.html - Jan 2007

 

etc.

 

 

My ultimate goal is to request each HTML page and convert it to a format that I can consume in ESRI. I've tried many of the transformers, but I am unable to see anything that resembles the table you see when you open the HTML in a browser.
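For reference, the month/year URLs above can be enumerated with a few lines of Python. This is only a sketch: the URL template is inferred from the two example links in this post, and the function name `monthly_urls` is made up for illustration. It has not been checked that every month in a given range actually exists on the server.

```python
# Assumed pattern, inferred from the two example URLs in this thread.
URL_TEMPLATE = (
    "http://bbnet.gein.noa.gr/alerts_manual/"
    "{year:04d}/{month:02d}/manual_alerts_{month:02d}_{year:04d}.html"
)


def monthly_urls(start_year, start_month, end_year, end_month):
    """Yield (year, month, url) for every month in the inclusive range."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month, URL_TEMPLATE.format(year=year, month=month)
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1


if __name__ == "__main__":
    for year, month, url in monthly_urls(2006, 1, 2006, 3):
        print(year, month, url)
```

Keeping the year and month alongside each URL makes it easy to carry them through as attributes once the tables are parsed.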

 

Thanks

 

Frank

 

 

 

Badge +7
Hi @londonhb185

 

My colleague @jlutherthomas pointed out that we also have an HTML Table reader format, although it does not seem to work on your HTML document either. After playing around a bit more with the HTMLExtractor, I think it should be able to do the job, except that it struggles to handle a document as large as yours. For example, if I try to extract all the <tr> elements, I get a Python error. This is a known limitation of the HTMLExtractor and we are investigating a fix for it. I'll be happy to keep you posted on any updates.

 

Best,

 

Nathan

 

 

Badge +7
https://safesoftware.atlassian.net/browse/FMEENGINE-56052

 

 

Badge +22

If all you want to do is write a CSV file, then the HTML Table reader will get you there.

 

 

If you want to work with the attributes in FME, things get more complicated.

 

 

noabroadband.fmw
Userlevel 2
Badge +17

Neither the HTML Table reader nor the HTMLExtractor seems able to parse this type of HTML document as expected, but I found that the HTMLToXHTMLConverter can be used to convert the HTML to an XHTML document. Since an XHTML document is a valid XML document, you can then parse it with XML transformers. For example: parse-html-as-xhtml.fmwt (FME 2018.1.0.0)
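Outside FME, the same idea (walk the rows and cells of the markup and emit one record per <tr>) can be sketched in plain Python with only the standard library. This is not the attached workspace, whose contents are not shown in this thread. It is a minimal illustration that assumes the page is a simple, non-nested table; the class and function names (`TableRowParser`, `html_table_to_csv`) are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is None:
                self._row = []  # tolerate cells outside a <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # Collapse whitespace runs so each cell is a single line of text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()
        self._close_row()


def html_table_to_csv(html_text):
    """Return the table cells found in html_text as CSV text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The parser ignores inline styles and does not rely on CSS selectors, so repeated <td> and <tr> elements are not a problem. Nested tables would be flattened into the outer rows, which may or may not suit the real pages.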
