Question

Hi, as a follow-up to my previous question (https://community.safe.com/s/feed/0D54Q00009M3lLzSAJ), the website I get the information from has added a CAPTCHA.

15 March 2022


nurbek

Now I receive errors. As a workaround, I tried saving each web page as an HTML file. How can I bulk read the information inside these files?

2 replies


Hello @nurbek, I'm not sure it's possible to scrape a website protected by a CAPTCHA (at least not easily), but you may be able to find resources online for that. However, if your HTML files are saved locally, you can try using a FeatureReader to read them in either as a text file or as an HTML table. If you read them in as text files, you can then use an HTMLExtractor to grab structured data from the web pages. Hope this helps, Kailin.
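If you would rather do the bulk parsing in Python (for example as a standalone script before bringing the results into FME), the sketch below is one possible alternative to the FeatureReader/HTMLExtractor route. It reads every saved `.html` file in a folder and collects the text of each table row using only the standard-library `html.parser` module. It assumes the data you need sits in ordinary `<table>` markup and that the pages were saved with a `.html` extension; the class and function names (`TableTextParser`, `extract_rows`, `read_folder`) are hypothetical, and this is not an FME-provided API.

```python
import glob
import os
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell, grouped into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows; each row is a list of cell strings
        self._row = None  # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _end_cell(self):
        # Join the cell's text fragments and collapse runs of whitespace.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows that contained no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at end of document


def extract_rows(html_text):
    """Return the table rows found in one HTML document."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def read_folder(folder):
    """Return {file name: table rows} for every *.html file in folder (assumed layout)."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.html"))):
        with open(path, encoding="utf-8", errors="replace") as f:
            results[os.path.basename(path)] = extract_rows(f.read())
    return results
```

For example, `read_folder(r"C:\saved_pages")` (a hypothetical path) returns one list of rows per saved page, which you could then write out to CSV and read into FME. If the information you need is not in tables, the same `HTMLParser` subclassing approach can be pointed at other tags instead.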


hkingsbury
15 March 2022

There are a number of Python packages that claim to be able to solve CAPTCHAs (search for 'python captcha solve'), for example https://pypi.org/project/captcha-solver/

I would suggest that this is going to be fraught with issues, though. The whole idea behind systems like CAPTCHA is to stop automated scraping and to ensure that only humans are accessing sites.
