Question

Hi, following up on my earlier question (https://community.safe.com/s/feed/0D54Q00009M3lLzSAJ): the website I get my info from has now added a CAPTCHA.



Now I receive errors. As a workaround, I tried saving each web page as an HTML file. How can I bulk read the info inside of


2 replies


Hello @nurbek, I'm not sure it's possible to scrape a website protected by CAPTCHA, or at least it wouldn't be easy, though you may be able to find resources for that online. However, if your HTML files are saved locally, you can try using a FeatureReader to read them in, either as text files or as HTML tables. If you read them in as text files, you can use an HTMLExtractor afterwards to grab structured data from the pages. Hope this helps, Kailin.
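For anyone who prefers scripting (for example in a PythonCaller, or outside FME entirely), the same bulk read can be roughly sketched in plain Python. This is only a minimal sketch under assumptions not stated in the thread: the saved pages all sit in one folder with a `.html` extension, and the data of interest is in HTML tables. The names `TableTextParser` and `read_html_folder` are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TableTextParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def _close_cell(self):
        # Store the open cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def read_html_folder(folder):
    """Return {file name: list of table rows} for every .html file in folder.

    Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        parser = TableTextParser()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        results[path.name] = parser.rows
    return results
```

Calling `read_html_folder("saved_pages")` (a hypothetical folder name) would give one entry per file, which can then be written out or turned into features. If the data is not in tables, the parser class would need to target different tags.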


There are a number of Python packages that claim to be able to solve this (google 'python captcha solve'), for example https://pypi.org/project/captcha-solver/

I would suggest that this is going to be fraught with issues. The whole idea behind systems like CAPTCHA is to stop automated scraping and to ensure that only humans are accessing sites.
