I’m trying to scrape a particular URL using the requests-html module, which uses the Chromium browser. However, Chromium fails to load what seems to be the JavaScript portion of the page and triggers a timeout error. I thought html.render() would execute any JavaScript on the page. I tested another Python script using Selenium (with the Google Chrome webdriver) and it ran perfectly.
So I tried launching Chromium.exe manually (from %localappdata%\pyppeteer\pyppeteer\local-chromium) and browsing the URL directly, and the JavaScript portion takes forever to load there too. My Chromium version is 71.0.3542.0 (Developer Build) (64-bit); it was installed the first time I ran the Python script below. What should I do to make Chromium support JavaScript, preferably from within the Python script?
from requests_html import HTMLSession

webURL = "https://mphonline.com/collections/architecture-landscaping"
session = HTMLSession(pyppeteer_kwargs={'handleSIGINT': False,
                                        'handleSIGTERM': False,
                                        'handleSIGHUP': False,
                                        'headless': False,
                                        }
                      )
root = session.get(webURL)
root.html.render(timeout=30, keep_page=True)  # default timeout is 8 sec
titlesxpath = "//div[contains(@class, 'boost-sd__product-title')]"
titles = root.html.xpath(titlesxpath)  # find title elements
for title in titles:
    print(title.text)
session.close()
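
To narrow down whether the render simply needs more time or never completes at all, the render call in the script above could be wrapped in a small timing helper. This is a diagnostic sketch only, not a confirmed fix. `render_with_backoff` is a hypothetical helper name and the timeout values are arbitrary assumptions. It relies only on the `timeout` and `sleep` keyword arguments of `html.render()`, and it catches a broad `Exception` because the exact exception type raised on a render timeout may differ between versions.

```python
import time


def render_with_backoff(render, timeouts=(30, 60, 120), sleep=2):
    """Call render(timeout=..., sleep=...) with growing timeouts.

    `render` is any callable accepting `timeout` and `sleep` keyword
    arguments, for example `root.html.render` from requests-html.
    Returns a list of (timeout, elapsed_seconds, error_or_None) tuples,
    one per attempt, stopping after the first successful render.
    """
    attempts = []
    for timeout in timeouts:
        start = time.monotonic()
        try:
            render(timeout=timeout, sleep=sleep)
        except Exception as exc:  # broad on purpose: diagnostic only
            attempts.append((timeout, time.monotonic() - start, exc))
            continue
        attempts.append((timeout, time.monotonic() - start, None))
        break  # stop at the first successful render
    return attempts


# Hypothetical usage with the session from the script above:
# for timeout, elapsed, error in render_with_backoff(root.html.render):
#     print(timeout, round(elapsed, 1), error)
```

If every attempt fails after roughly its full timeout, the page is probably never finishing its load in this Chromium build. If a later attempt succeeds, a longer `timeout` or a non-zero `sleep` may be all that is needed.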