Question

Extract data from webpage

8 years ago
March 25, 2017
6 replies
180 views

+10

ingalla
Contributor
33 replies

Can anyone tell me the best method to scrape the following webpage?

https://historicengland.org.uk/listing/the-list/list-entry/1435084

I am trying initially to extract the following data:-

Name
List Entry Number

I have tried using the HTTPCaller linked to an HTMLExtractor, but I cannot get anything extracted.

Thanks in advance

+16

itay
Supporter
1439 replies
8 years ago
March 25, 2017

Hi @ingalla, how are you configuring the HTMLExtractor? Can you post a screenshot of the settings?

mark_f
325 replies
8 years ago
March 25, 2017

I tried with the CSS Selector .attributeList p and can't get a handle on:

<div class="attributeList">
	<p><span>Name:</span> Basingstoke War Memorial</p>
	<p><span>List entry Number:</span> 1435084</p>
</div>

However, if I hack the HTML file and change the class value to all lower case "attributelist" it finds it with the selector .attributeList p

Maybe I'm missing something simple or there is an issue with case sensitivity.
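
For comparison outside FME, here is a minimal Python sketch that matches the class name with exact case, using only the standard library's html.parser. The class ClassParagraphs and the helper select are hypothetical names made up for this illustration, and the sample HTML is the fragment quoted above. It says nothing about how the HTMLExtractor works internally; it only shows what a case-sensitive match on this markup looks like.

```python
from html.parser import HTMLParser

# Sample fragment copied from the page source quoted above.
html = """<div class="attributeList">
	<p><span>Name:</span> Basingstoke War Memorial</p>
	<p><span>List entry Number:</span> 1435084</p>
</div>"""


class ClassParagraphs(HTMLParser):
    """Collect the text of each <p> inside a <div> whose class matches exactly."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.div_depth = 0   # > 0 while inside the matching <div>
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names but keeps
        # attribute values (such as the class name) in their original case.
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or self.class_name in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def select(document, class_name):
    """Hypothetical helper: text of the <p> elements under div.<class_name>."""
    parser = ClassParagraphs(class_name)
    parser.feed(document)
    parser.close()
    return parser.paragraphs


exact = select(html, "attributeList")      # same case as the page
lowered = select(html, "attributelist")    # lower-cased class name
print(exact)
print(lowered)
```

With an exact-case comparison, only the first call finds the two paragraphs. A tool that lower-cases the selector before comparing would behave like the second call, which would be consistent with the symptom described above.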

+19

takashi
Contributor
7538 replies
8 years ago
March 26, 2017

@mark_1spatial, I also found that the HTMLExtractor doesn't work as expected if you set the class name ".attributeList" as the CSS Selector, in FME 2017.0.0.1 build 17271. I don't think you are missing something, and I am afraid there could be a bug here.

@ingalla, in the interim (and in FME 2016 or earlier), you can use the StringSearcher to extract <div class="attributeList"> elements from the entire HTML document, and then extract your desired strings which are stored in the <p> elements under the <div>.

Regular Expression Example

<div class="attributeList">.+?</div>
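
As a rough illustration of the two-step idea (first isolate the <div>, then pull the strings out of its <p> elements), here is a standalone Python sketch using the re module. It runs outside FME and is not a StringSearcher configuration. The second pattern, the function name extract_attributes, and the sample document are assumptions made for this example. In Python, re.DOTALL is needed so that "." can match across the line breaks inside the <div>.

```python
import re

# Sample document built around the fragment quoted earlier in this thread.
html = """<html><body>
<div class="attributeList">
	<p><span>Name:</span> Basingstoke War Memorial</p>
	<p><span>List entry Number:</span> 1435084</p>
</div>
</body></html>"""

# Step 1: the pattern from this post. re.DOTALL lets "." match newlines,
# so the non-greedy ".+?" can span the multi-line <div>.
div_pattern = re.compile(r'<div class="attributeList">.+?</div>', re.DOTALL)

# Step 2 (assumed pattern): each <p> holds a <span> label followed by the value.
pair_pattern = re.compile(r'<p><span>(.+?):</span>\s*(.+?)</p>')


def extract_attributes(document):
    """Return {label: value} from the first attributeList <div>, or {} if absent."""
    match = div_pattern.search(document)
    if match is None:
        return {}
    return dict(pair_pattern.findall(match.group(0)))


attributes = extract_attributes(html)
print(attributes["Name"])
print(attributes["List entry Number"])
```

Regular expressions are fragile against markup changes (extra attributes on the <div>, nested <div> elements, different whitespace), so this is best treated as an interim workaround, as suggested above.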

mark_f
325 replies
8 years ago
March 26, 2017

takashi wrote:

Regular Expression Example

<div class="attributeList">.+?</div>

Yes, that is the same build number as mine. You posted here about case-sensitive tags:

https://knowledge.safe.com/questions/34058/how-to-parse-html-file.html

+12

mygis
Contributor
297 replies
7 years ago
March 31, 2017

Hi @ingalla, I was able to do it with this method. I put the workspace together quickly and I am sure there are other methods, but this works. I hope it works for you too.

Good luck.

Lyes.

extractvaluesfromhtml.fmw

stefanh
Contributor
38 replies
7 years ago
October 19, 2017

mygis wrote:

Hi @ingalla, I was able to do it with this method. I put the workspace together quickly and I am sure there are other methods, but this works. I hope it works for you too.

Good luck.

Lyes.

extractvaluesfromhtml.fmw

Thanks for the workspace example. This is very useful in my case.
