Question

HTMLExtractor - Doesn't give webpage time to load

11 months ago
August 9, 2024
2 replies
37 views

vsalazar
Observer

My HTMLExtractor does not give the page time to load before it returns a null value for the DIV I'm trying to scrape. It works well on pages that load quickly, but this particular page can take up to 10 seconds to load.

How do I work around this? Thank you!

debbiatsafe
Safer
648 replies
11 months ago
August 13, 2024

Hello @vsalazar

Can you try using the HTTPCaller before the HTMLExtractor to retrieve the page? The HTTPCaller has various options, such as connection and transfer timeout lengths, that may help with a slow response.
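
To illustrate the idea outside of FME, here is a minimal Python sketch that fetches a page with a generous timeout before any parsing happens. It is not how the HTTPCaller is implemented; the function name `fetch_html` and the 30-second default are assumptions chosen for the example.

```python
import urllib.request


def fetch_html(url, timeout_seconds=30.0):
    """Fetch a page as text, waiting up to timeout_seconds on blocking network operations.

    Hypothetical helper for illustration only. If the server does not
    respond in time, urlopen raises an exception instead of returning
    an empty result.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (the URL is a placeholder, not a real endpoint):
# html = fetch_html("https://example.com/slow-page", timeout_seconds=30.0)
```

Note that a longer timeout only helps when the server itself is slow to respond. It does not help if the content is filled in later by scripts running in the browser.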

hkingsbury
Celebrity
1534 replies
11 months ago
August 13, 2024

It may also be that the content is loaded by JavaScript after the initial HTML arrives. JavaScript runs client-side in the browser, so a plain server request via the HTMLExtractor/HTTPCaller returns only the initial HTML and will not execute the JS.
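
One way to check this is to look at the raw HTML the server returns and see whether the target element is there at all. The sketch below does that with Python's standard `html.parser`; the helper name `has_element_with_id` and the example ids are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if dict(attrs).get("id") == self.element_id:
            self.found = True


def has_element_with_id(html, element_id):
    """Return True if the raw HTML contains an element with the given id."""
    finder = _IdFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found


# Usage: raw_html is whatever the server returned, before any JS has run.
# if not has_element_with_id(raw_html, "my-target-div"):
#     print("Element is missing from the raw HTML; it is probably added by JavaScript.")
```

If the element is missing from the raw response but visible in the browser, waiting longer will not help, because the server never sends it in the first place.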

The upside of JS, though, is that the data is then very likely being pulled from an API, which is a much more elegant and structured source than scraping the webpage.

If you open the dev tools in your browser and refresh the page, you'll hopefully see some calls in the Network tab that contain the data you want. (There may be hundreds of calls, but once you know what you're looking for it will be easy to find.)

You can then use the HTTPCaller to call the API directly. This of course assumes there are no security measures in place; if there are, it gets a bit harder.
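
As a rough sketch of what calling the API directly looks like, here is a Python equivalent, assuming the endpoint returns JSON and needs no authentication. The function name `fetch_json` and the URL in the usage comment are hypothetical; the real request URL is whatever you find in the dev tools.

```python
import json
import urllib.request


def fetch_json(url, timeout_seconds=30.0):
    """Request a JSON endpoint and return the decoded Python object.

    Hypothetical helper: assumes an unauthenticated endpoint that
    returns UTF-8 encoded JSON.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholder URL; substitute the call you found in the dev tools):
# data = fetch_json("https://example.com/api/items")
```

The structured response can then be read by key rather than by locating a DIV in the page markup.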
