
web scraping – Python web scraping: Chromium cannot load JavaScript, but Chrome can


I’m trying to scrape a particular URL with the requests-html module, which uses the Chromium browser. However, Chromium fails to load what seems to be the JavaScript portion of the page and triggers a timeout error. I expected html.render() to render any JavaScript on the page. I tested another Python script that uses Selenium (with the Google Chrome webdriver), and it ran perfectly.

So I launched Chromium.exe manually (from %localappdata%\pyppeteer\pyppeteer\local-chromium) and browsed to the URL directly; the JavaScript portion takes forever to load. My Chromium version is 71.0.3542.0 (Developer Build) (64-bit), and it was installed the first time I ran the Python script below. What should I do to make Chromium support JavaScript, preferably from within the Python script?

from requests_html import HTMLSession

webURL = "https://mphonline.com/collections/architecture-landscaping"

# Launch options for the pyppeteer-managed Chromium
session = HTMLSession(
    pyppeteer_kwargs={
        'handleSIGINT': False,
        'handleSIGTERM': False,
        'handleSIGHUP': False,
        'headless': False,  # show the browser window
    }
)

root = session.get(webURL)
root.html.render(timeout=30, keep_page=True)  # default timeout is 8 sec

titlesxpath = "//div[contains(@class, 'boost-sd__product-title')]"
titles = root.html.xpath(titlesxpath)  # find the title elements
for title in titles:
    print(title.text)

session.close()
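
Since the Selenium test with Google Chrome renders the page, one thing that may be worth trying is pointing pyppeteer at an installed Chrome instead of the bundled Chromium 71 build. The following is only a minimal sketch, not a confirmed fix. It assumes that this HTMLSession forwards pyppeteer_kwargs to pyppeteer's launch(), as the script above already relies on, and it uses pyppeteer's executablePath launch option. The candidate paths and the helper names find_chrome, build_pyppeteer_kwargs, and make_session are hypothetical and need adjusting for the local machine.

```python
import os
import shutil

# Hypothetical candidate locations; adjust for the local machine.
CANDIDATE_PATHS = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]
CANDIDATE_NAMES = ["chrome", "google-chrome", "chromium"]


def find_chrome(paths=CANDIDATE_PATHS, names=CANDIDATE_NAMES):
    """Return the first Chrome executable found, or None if there is none."""
    for path in paths:
        if os.path.isfile(path):
            return path
    for name in names:
        found = shutil.which(name)  # search the PATH
        if found:
            return found
    return None


def build_pyppeteer_kwargs(executable=None):
    """Build launch options; set executablePath only if a browser was found."""
    kwargs = {
        "handleSIGINT": False,
        "handleSIGTERM": False,
        "handleSIGHUP": False,
        "headless": False,
    }
    if executable:
        kwargs["executablePath"] = executable
    return kwargs


def make_session():
    # Imported lazily so the helpers above work without requests-html installed.
    from requests_html import HTMLSession

    # Assumption: pyppeteer_kwargs is forwarded to pyppeteer's launch().
    return HTMLSession(pyppeteer_kwargs=build_pyppeteer_kwargs(find_chrome()))
```

If no installed browser is found, build_pyppeteer_kwargs() leaves executablePath out, so the session falls back to the same bundled Chromium as before.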


