Question

Hi, this follows on from my earlier question (https://community.safe.com/s/feed/0D54Q00009M3lLzSAJ). The website I get the information from has now added a CAPTCHA.

  • March 15, 2022
  • 2 replies
  • 15 views

nurbek
Contributor

Now I receive errors. As a workaround, I tried saving each web page as an HTML file. How can I bulk read the information inside of

2 replies

kailinatsafe
Safer
  • 720 replies
  • March 15, 2022

Hello @nurbek, I'm not sure it's possible to web scrape a website protected by a CAPTCHA, or at least it wouldn't be easy, but you may be able to find resources online for that. However, if your HTML files are saved locally, you can try using a FeatureReader to read them in either as text files or as HTML tables. If you read them in as text files, you can then use an HTMLExtractor to pull structured data out of the web pages. Hope this helps, Kailin.
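If you would rather do the bulk read outside FME's transformers, the idea of "read each saved page as text, then extract the table data" can be sketched in plain Python with only the standard library. This is a minimal sketch, not an FME feature: it assumes the pages were saved as `*.html` files in a single folder (the folder name `saved_pages` is hypothetical), that they are UTF-8 encoded, and that the data you want sits in ordinary `<table>` markup. The names `TableExtractor`, `extract_tables` and `bulk_read` are illustrative only, and nested tables are not handled.

```python
from html.parser import HTMLParser
from pathlib import Path


class TableExtractor(HTMLParser):
    """Collects the text of <td>/<th> cells, grouped into one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # row currently being built, or None outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _flush_cell(self):
        # Close the open cell (also covers an omitted </td>), collapsing whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self):
        # Close the open row (also covers an omitted </tr>), skipping empty rows.
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._flush_row()  # keep a trailing row whose closing tags were omitted


def extract_tables(html_text):
    """Return every table row found in one HTML document."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


def bulk_read(folder):
    """Return {file name: rows} for every *.html file in the folder."""
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_tables(text)
    return results


if __name__ == "__main__":
    # "saved_pages" is a hypothetical folder holding the manually saved pages.
    for name, rows in bulk_read("saved_pages").items():
        print(name, rows)
```

Each file's rows come back as lists of strings, which could then be written to CSV or, with some adaptation, might be used inside a PythonCaller. If the pages are not UTF-8, the `encoding` argument would need changing.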


hkingsbury
Celebrity
  • 1620 replies
  • March 15, 2022

There are a number of Python packages that claim to be able to solve this (search for 'python captcha solve'), for example https://pypi.org/project/captcha-solver/

I would suggest that this is going to be fraught with issues. The whole idea behind systems like CAPTCHA is to stop automated scraping and to ensure that only humans are accessing the site.