Question

Extract data from a web page

September 3, 2019
8 replies
298 views

mika
8 replies

Hi, I would like to extract data from a password-protected web page. I already have the password. The URL is https://www.portail-nextgen-telecom.tdf.fr. I need to read the page, find the entry "Document Contractuel", and, if it has a value, get that value. I tried the HTMLExtractor, but the response is a script. What I need is a yes/no answer: is the Document Contractuel there or not? Can anyone help me?


oscard
Influencer
344 replies
September 3, 2019

If the response is a script or an HTML document with <script> tags, the HTMLExtractor won't work as expected (at least in my experience).

Without being able to take a look at the response, I can't help that much, but... have you tried the StringSearcher?
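For reference, the yes/no check can also be sketched outside FME. This is a minimal Python sketch, not the StringSearcher itself: it runs a regular expression over the raw response text, so it does not matter whether the response is HTML or a script. The function name and the sample string are assumptions for illustration; the real response was not posted.

```python
import re

# Label taken from the question; whitespace between the words may vary,
# so the pattern allows any run of whitespace and ignores case.
LABEL = re.compile(r"document\s+contractuel", re.IGNORECASE)


def has_contract_document(response_text: str) -> bool:
    """Return True when the label occurs anywhere in the raw response."""
    return LABEL.search(response_text) is not None


if __name__ == "__main__":
    # Hypothetical response body, only to show the call.
    sample = "<script>var rows = [{type: 'Document Contractuel'}];</script>"
    print("yes" if has_contract_document(sample) else "no")
```

A plain text search like this only says whether the label is present. It says nothing about which file belongs to it.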

mika
Author
8 replies
September 3, 2019

Yes, I already did, but the response is still a script. After many tries, I succeeded in extracting all the HTML from the website. Take a look. Now I want to extract the file name for Document Contractuel from these HTML files.

Thanks for your help.


oscard
Influencer
344 replies
September 3, 2019

I guess you could now use the "Directory and File Pathnames" reader to read the folder where you stored the HTML files.

After that, use the HTMLExtractor and set the HTML input to a file. The path of that file is in the "path_windows" attribute.
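The same two steps (list the files in a folder, then look inside each one) can be sketched in plain Python. This is only an illustration of the idea, not what the FME reader does internally. The function name, the `*.html` pattern, and the UTF-8 encoding are assumptions.

```python
from pathlib import Path


def html_files_containing(folder, needle="Document Contractuel"):
    """Map each .html file name in `folder` to True/False:
    True when the file text contains `needle` (case-insensitive)."""
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = needle.lower() in text.lower()
    return results
```

Calling `html_files_containing("some_folder")` on the folder with the saved pages gives one yes/no answer per file.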

mika
Author
8 replies
September 3, 2019

What should I write for the HTMLExtractor parameters?


oscard
Influencer
344 replies
September 3, 2019

First you will have to open the HTML with a text editor like Notepad++ and locate where "Document Contractuel" appears. Look at which tags it is written between, and use that information to build the query in FME.

It can get pretty complex depending on the HTML. I have found the Help for that transformer pretty useful, so take a look at it before doing anything. There are examples that could inspire you :)

mika
Author
8 replies
September 3, 2019

I opened the HTML. We find:

The file that I want to extract is DC_POUR_VALIDATION...docx. The problem is that the file sits behind an attribute, TYPE. So what should I do to extract only this document?


oscard
Influencer
344 replies
September 3, 2019

You are opening the HTML with a browser. You need to open it with a text editor like Notepad++ to look at the HTML code and check how the document name is written: its tags, the divs, any class... anything that lets you build a query for the HTMLExtractor.
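To show what "build a query from the tags" means in practice, here is a minimal Python sketch using only the standard library. The real markup was never posted, so the table layout below (a type cell followed by a file cell) and the file names are hypothetical; `RowCollector` and `files_of_type` are names made up for this example. Adapt the tags to whatever the text editor shows.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by bad markup


def files_of_type(html, wanted_type="Document Contractuel"):
    """Return the second cell of every row whose first cell is `wanted_type`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [r[1] for r in parser.rows if len(r) >= 2 and r[0] == wanted_type]


# Hypothetical markup: first cell is the TYPE, second cell holds the file.
SAMPLE = """
<table>
  <tr><td>Document Contractuel</td><td><a href="#">DC_EXAMPLE.docx</a></td></tr>
  <tr><td>Plan</td><td><a href="#">plan_example.pdf</a></td></tr>
</table>
"""

print(files_of_type(SAMPLE))
```

An empty list means no row of that type was found, which is the "no" answer asked for in the original question.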

mika
Author
8 replies
September 3, 2019

Oh, OK. I will fix that.

Thank you for your help.
