Question

HTMLExtractor - Doesn't give webpage time to load


vsalazar
Observer

My HTMLExtractor does not give the page time to load before it returns a null value for the DIV I'm trying to scrape. It works well on pages that load quickly, but this particular page can take up to 10 seconds to load.

 

How do I work around this?  Thank you!

2 replies

debbiatsafe
Safer
  • August 13, 2024

Hello @vsalazar 

Can you try using the HTTPCaller before the HTMLExtractor to retrieve the page? The HTTPCaller has various options such as multipart response handling and concurrent requests that may help.


hkingsbury
Celebrity
  • August 13, 2024

It may also be that the content is loaded via JavaScript after the initial HTML arrives. JavaScript runs client side, in the browser, so a plain server request via HTMLExtractor/HTTPCaller only returns the initial HTML and never runs the JS. In that case the DIV stays empty or missing no matter how long you wait.
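One way to confirm this is to save the raw response body and check whether the DIV is actually in it. Below is a minimal sketch in Python using only the standard library. The class and function names (`DivFinder`, `inspect_div`) and the `results` id are made up for illustration, so swap in the id of the DIV you are scraping.

```python
from html.parser import HTMLParser


class DivFinder(HTMLParser):
    """Records whether a <div> with a given id exists and what text it holds."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.found = False
        self.depth = 0   # nesting depth inside the target div (0 = outside it)
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside the target: track nested divs so we know where it ends.
            if tag == "div":
                self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == self.div_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def inspect_div(html, div_id):
    """Return (found, text) for the div with the given id in raw HTML."""
    finder = DivFinder(div_id)
    finder.feed(html)
    finder.close()
    return finder.found, "".join(finder.text).strip()


# Hypothetical response body: the div is present but empty, waiting for JS to fill it.
raw = '<html><body><div id="results"></div><script src="app.js"></script></body></html>'
print(inspect_div(raw, "results"))
```

If the DIV comes back missing or empty in the raw HTML but is populated in your browser, the data is almost certainly being filled in by JavaScript, and a longer wait will not help.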

The plus side of JS, though, is that the data is then very likely being pulled from an API, which is a much more elegant and structured source than the rendered webpage.

If you open dev tools in your browser and refresh the page, hopefully you'll see some calls that contain the data you want. (There may be hundreds of calls, but once you know what you're looking for it will be easy to find.)

You can then just use the HTTPCaller to call the API directly. This is of course assuming there are no security measures in place, in which case it gets a bit harder.
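Before wiring it into the HTTPCaller, it can help to try the endpoint on its own. Here is a rough sketch in Python with the standard library, assuming the endpoint returns JSON and needs no authentication. The URL and the `items` key are placeholders, not anything from the actual page, so replace them with what you see in dev tools.

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body; the timeout allows for slow responses."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pick_records(payload, key="items"):
    """Return the list of records, whether the payload is a bare list or wraps one."""
    if isinstance(payload, dict):
        return payload.get(key, [])
    return payload


if __name__ == "__main__":
    # Hypothetical endpoint copied from the browser's network tab.
    data = fetch_json("https://example.com/api/data")
    for record in pick_records(data):
        print(record)
```

If this returns the data you were trying to scrape, the same URL and headers should work in the HTTPCaller, and the response can be parsed as JSON instead of HTML.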

