
Hello FME users,

I'm new to HTML and need some help extracting tables from websites. The HTMLTable Reader has worked well when I only need to read a single hyperlink. I'm now trying to use the HTTPCaller to extract tables for 300+ records, each of which has a hyperlink attached.

I only want to extract the 'WELL HISTORY' table for each record.

An example hyperlink is below:

http://sonlite.dnr.state.la.us/sundown/cart_prod/cart_con_wellinfo2?p_WSN=199138

A copy of the HTML code is attached.

Any and all help is greatly appreciated.

Thanks

Hi @ngstoke, if you have an external table containing the 300+ hyperlinks (URLs), read the table and send each feature that contains a URL as its attribute (e.g. called "_url") to the FeatureReader to read HTML tables from each URL.

In my testing, the HTML Table reader read 15 different tables from the example URL, and the WELL HISTORY table was the 6th among them. The reader cannot identify a table by name from the HTML contents, but if WELL HISTORY is always the 6th table, you can filter the records by feature type name. The HTML Table reader assigns "Table6" to the 6th table as its feature type name, so you can use a FeatureTypeFilter to keep only those records.


Alternatively, you can restrict which feature type is read by using the Feature Types to Read parameter in the FeatureReader. If you do so, the FeatureTypeFilter is not necessary. This assumes that an attribute called "_tableName" stores "Table6" and is used as the value of that parameter.
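If you want to double-check which position the WELL HISTORY table occupies on a page before relying on "Table6", a rough sketch outside FME in plain Python might help. It uses only the standard library and picks a table by its 1-based position, mirroring the "Table6" naming. The names TableCollector and read_table are made up for this illustration, it assumes well-formed markup with no nested tables, and the way it counts tables may not match the HTML Table reader exactly.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text.

    Assumption: tables are not nested and cells/rows are properly closed.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # tables in document order
        self._table = None  # table currently being read
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
            self.tables.append(self._table)
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row = None
        elif tag == "table":
            self._table = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def read_table(html, position):
    """Return the table at a 1-based position, e.g. 6 for "Table6"."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables[position - 1]
```

Calling read_table(page_html, 6) on the saved HTML of one page and looking at the first row should show whether the 6th table really holds the WELL HISTORY columns. If the position differs between pages, filtering on "Table6" alone would not be reliable.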

 

 

 


Hi @takashi, thank you for the help. My FeatureReader and FeatureTypeFilter outputs are completely empty. Features are showing as matched on Table6, but there is no data in them, just blank lines.

 

 

I've attached a workspace template. I think I'm just missing something simple.

 

 

Thanks again for your help.

 

table6-html.fmwt

 


Table6 is read as expected by the FeatureReader in your workspace. The attribute (field) names are just not exposed in the Workbench and Data Inspector GUI. You can see them in the Feature Information window of the Data Inspector when you select a row in the Table View.

If you need to expose the attributes in the Workbench GUI, you will have to manually enter the required attribute names (SERIAL, WELL NAME, etc.) in the Attributes to Expose parameter of the FeatureReader. Alternatively, you can use the AttributeExposer transformer to expose the required attribute names. Just be aware that attribute names in FME are case sensitive.
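To illustrate the case-sensitivity point, here is a small plain-Python sketch (not FME API) that compares the names you plan to type into Attributes to Expose against the names you see in the Feature Information window. The function name check_names and both input lists are hypothetical. Only SERIAL and WELL NAME come from this thread.

```python
def check_names(requested, available):
    """Sort requested attribute names into exact matches,
    case-only mismatches, and names that do not exist at all."""
    available_set = set(available)
    by_lower = {name.lower(): name for name in available}

    exact, case_only, missing = [], {}, []
    for name in requested:
        if name in available_set:
            exact.append(name)
        elif name.lower() in by_lower:
            # Same letters, different case: would not match in a
            # case-sensitive comparison.
            case_only[name] = by_lower[name.lower()]
        else:
            missing.append(name)
    return exact, case_only, missing
```

For example, check_names(["SERIAL", "Well Name"], ["SERIAL", "WELL NAME"]) reports "Well Name" as a case-only mismatch for "WELL NAME", which is the kind of typo that would leave an exposed attribute empty.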


@takashi. Thank you for the help on this.

