![image](https://github.com/Asugawara/requests-html-playwright/actions/workflows/run_test.yml/badge.svg)
![PyPI](https://img.shields.io/pypi/v/requests-html-playwright?color=green)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/requests-html-playwright)
![GitHub](https://img.shields.io/github/license/Asugawara/requests-html-playwright)
# Requests-HTML (with microsoft/playwright-python): HTML Parsing for Humans™
This library intends to make parsing HTML (e.g. scraping the web) as
simple and intuitive as possible.
When using this library you automatically get:
- **Full JavaScript support**! (Using Chromium, WebKit, or Firefox, thanks to Playwright)
- *CSS Selectors* (a.k.a. jQuery-style, thanks to PyQuery).
- *XPath Selectors*, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing
abilities.
- **Async Support**
# Installation
```bash
$ pip install requests-html-playwright
✨🍰✨
```
Only **Python 3.8 and above** is supported.
# Tutorial & Usage
Make a GET request to 'python.org', using Requests:
```python
>>> from requests_html_playwright import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://python.org/')
```
Try async and get some sites at the same time:
```python
>>> from requests_html_playwright import AsyncHTMLSession
>>> asession = AsyncHTMLSession()
>>> async def get_pythonorg():
...     r = await asession.get('https://python.org/')
...     return r
...
>>> async def get_reddit():
...     r = await asession.get('https://reddit.com/')
...     return r
...
>>> async def get_google():
...     r = await asession.get('https://google.com/')
...     return r
...
>>> results = asession.run(get_pythonorg, get_reddit, get_google)
>>> results # check the requests all returned a 200 (success) code
[<Response [200]>, <Response [200]>, <Response [200]>]
>>> # Each item in the results list is a response object and can be interacted with as such
>>> for result in results:
...     print(result.html.url)
...
https://www.python.org/
https://www.google.com/
https://www.reddit.com/
```
Note that the results list is ordered by when each response came back,
not by the order in which the coroutines were passed to the `run`
method. That is why the URLs printed above appear in a different order
from the arguments.
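Because the order is not guaranteed, it is usually safer to identify each result by its URL than by its position. The following is a minimal standard-library sketch of why completion order can differ from submission order. The `fake_get` coroutine and its delays are hypothetical stand-ins for real requests, and `asyncio.as_completed` is used purely for illustration; it is not a description of how `run` is implemented.

```python
import asyncio
from typing import List, Tuple


async def fake_get(url: str, delay: float) -> Tuple[str, float]:
    # Hypothetical stand-in for a network request: wait, then "respond".
    await asyncio.sleep(delay)
    return url, delay


async def main() -> List[Tuple[str, float]]:
    coros = [
        fake_get("https://python.org/", 0.2),
        fake_get("https://reddit.com/", 0.01),
        fake_get("https://google.com/", 0.1),
    ]
    finished = []
    # as_completed yields awaitables in the order in which they finish.
    for next_done in asyncio.as_completed(coros):
        finished.append(await next_done)
    return finished


finished = asyncio.run(main())
print([url for url, _ in finished])

# Index by URL so that later code does not depend on list order.
by_url = dict(finished)
print(by_url["https://python.org/"])
```

With real responses, the same idea would amount to keying each item on `result.html.url`, the attribute printed in the example above.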
Grab a list of all links on the page, as-is (anchors excluded):
```python
>>> r.html.links
{'//docs.python.org/3/tutorial/', '/about/apps/', 'https://github.com/python/pythondotorg/issues', '/accounts/login/', '/dev/peps/', '/about/legal/', '//docs.python.org/3/tutorial/introduction.html#lists', '/download/alternatives', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', '/download/other/', '/downloads/windows/', 'https://mail.python.org/mailman/listinfo/python-dev', '/doc/av', 'https://devguide.python.org/', '/about/success/#engineering', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', '/success-stories/industrial-light-magic-runs-python/', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', '/', 'http://pyfound.blogspot.com/', '/events/python-events/past/', '/downloads/release/python-2714/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://status.python.org/', '/community/workshops/', '/community/lists/', 'http://buildbot.net/', '/community/awards', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', '/psf/donations/', 'http://wiki.python.org/moin/Languages', '/dev/', '/events/python-user-group/', 'https://wiki.qt.io/PySide', '/community/sigs/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'http://planetpython.org/', '/events/python-events', '/about/help/', '/events/python-user-group/past/', '/about/success/', '/psf-landing/', '/about/apps', '/about/', 'http://www.wxpython.org/', '/events/python-user-group/665/', 'https://www.python.org/psf/codeofconduct/', '/dev/peps/peps.rss', '/downloads/source/', '/psf/sponsorship/sponsors/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 
'https://bugs.python.org/', '/community/merchandise/', 'http://tornadoweb.org', '/events/python-user-group/650/', 'http://flask.pocoo.org/', '/downloads/release/python-364/', '/events/python-user-group/660/', '/events/python-user-group/638/', '/psf/', '/doc/', 'http://blog.python.org', '/events/python-events/604/', '/about/success/#government', 'http://python.org/dev/peps/', 'https://docs.python.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/users/membership/', '/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', '/downloads/', '/jobs/', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', '/privacy/', 'https://pypi.python.org/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'http://www.scipy.org', '/community/forums/', '/about/success/#scientific', '/about/success/#software-development', '/shell/', '/accounts/signup/', 'http://www.facebook.com/pythonlang?fref=ts', '/community/', 'https://kivy.org/', '/about/quotes/', 'http://www.web2py.com/', '/community/logos/', '/community/diversity/', '/events/calendars/', 'https://wiki.python.org/moin/BeginnersGuide', '/success-stories/', '/doc/essays/', '/dev/core-mentorship/', 'http://ipython.org', '/events/', '//docs.python.org/3/tutorial/controlflow.html', '/about/success/#education', '/blogs/', '/community/irc/', 'http:/python.blogspot.com/', '//jobs.python.org', 'http://www.pylonsproject.org/', 'http://www.djangoproject.com/', '/downloads/mac-osx/', '/about/success/#business', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://docs.python.org/faq/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions'}
```
Grab a list of all links on the page, in absolute form (anchors
excluded):
```python
>>> r.html.absolute_links
{'https://github.com/python/pythondotorg/issues', 'https://docs.python.org/3/tutorial/', 'https://www.python.org/about/success/', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', 'https://www.python.org/dev/peps/', 'https://mail.python.org/mailman/listinfo/python-dev', 'https://www.python.org/doc/', 'https://www.python.org/', 'https://www.python.org/about/', 'https://www.python.org/events/python-events/past/', 'https://devguide.python.org/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', 'https://docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 'http://pyfound.blogspot.com/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://www.python.org/events/python-events', 'https://status.python.org/', 'https://www.python.org/about/apps', 'https://www.python.org/downloads/release/python-2714/', 'https://www.python.org/psf/donations/', 'http://buildbot.net/', 'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', 'http://wiki.python.org/moin/Languages', 'https://docs.python.org/faq/', 'https://jobs.python.org', 'https://www.python.org/about/success/#software-development', 'https://www.python.org/about/success/#education', 'https://www.python.org/community/logos/', 'https://www.python.org/doc/av', 'https://wiki.qt.io/PySide', 'https://www.python.org/events/python-user-group/660/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'https://www.python.org/dev/peps/peps.rss', 'http://planetpython.org/', 'https://www.python.org/events/python-user-group/past/', 'https://docs.python.org/3/tutorial/controlflow.html#defining-functions', 
'https://www.python.org/community/diversity/', 'https://docs.python.org/3/tutorial/controlflow.html', 'https://www.python.org/community/awards', 'https://www.python.org/events/python-user-group/638/', 'https://www.python.org/about/legal/', 'https://www.python.org/dev/', 'https://www.python.org/download/alternatives', 'https://www.python.org/downloads/', 'https://www.python.org/community/lists/', 'http://www.wxpython.org/', 'https://www.python.org/about/success/#government', 'https://www.python.org/psf/', 'https://www.python.org/psf/codeofconduct/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://www.python.org/downloads/source/', 'https://bugs.python.org/', 'https://www.python.org/downloads/mac-osx/', 'https://www.python.org/about/help/', 'http://tornadoweb.org', 'http://flask.pocoo.org/', 'https://www.python.org/users/membership/', 'http://blog.python.org', 'https://www.python.org/privacy/', 'https://www.python.org/about/gettingstarted/', 'http://python.org/dev/peps/', 'https://www.python.org/about/apps/', 'https://docs.python.org', 'https://www.python.org/success-stories/', 'https://www.python.org/community/forums/', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/community/merchandise/', 'https://www.python.org/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', 'https://pypi.python.org/', 'https://www.python.org/events/python-user-group/650/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'https://www.python.org/about/quotes/', 'https://www.python.org/downloads/windows/', 'https://www.python.org/events/calendars/', 'http://www.scipy.org', 'https://www.python.org/community/workshops/', 'https://www.python.org/blogs/', 
'https://www.python.org/accounts/signup/', 'https://www.python.org/events/', 'https://kivy.org/', 'http://www.facebook.com/pythonlang?fref=ts', 'http://www.web2py.com/', 'https://www.python.org/psf/sponsorship/sponsors/', 'https://www.python.org/community/', 'https://www.python.org/download/other/', 'https://www.python.org/psf-landing/', 'https://www.python.org/events/python-user-group/665/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org/accounts/login/', 'https://www.python.org/downloads/release/python-364/', 'https://www.python.org/dev/core-mentorship/', 'https://www.python.org/about/success/#business', 'https://www.python.org/community/sigs/', 'https://www.python.org/events/python-user-group/', 'http://ipython.org', 'https://www.python.org/shell/', 'https://www.python.org/community/irc/', 'https://www.python.org/about/success/#engineering', 'http://www.pylonsproject.org/', 'http:/python.blogspot.com/', 'https://www.python.org/about/success/#scientific', 'https://www.python.org/doc/essays/', 'http://www.djangoproject.com/', 'https://www.python.org/success-stories/industrial-light-magic-runs-python/', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://www.python.org/jobs/', 'https://www.python.org/events/python-events/604/'}
```
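To make "absolute form" concrete, here is a small sketch that uses only the standard library's `urllib.parse.urljoin` to resolve a few of the raw links against the page URL. It illustrates the general URL-resolution rule and is an assumption about the concept, not a claim about how `absolute_links` is implemented internally; `base_url` and `raw_links` are names chosen for the example.

```python
from urllib.parse import urljoin

# The final URL of the page after redirects, as seen in the output above.
base_url = "https://www.python.org/"

raw_links = [
    "/about/apps/",                   # root-relative path
    "//docs.python.org/3/tutorial/",  # scheme-relative URL
    "https://status.python.org/",     # already absolute
]

# Resolve each link against the base URL.
resolved = [urljoin(base_url, link) for link in raw_links]

for raw, absolute in zip(raw_links, resolved):
    print(raw, "->", absolute)
```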
Select an element with a CSS Selector:
```python
>>> about = r.html.find('#about', first=True)
```
Grab an element's text contents:
```python
>>> print(about.text)
About
Applications
Quotes
Getting Started
Help
Python Brochure
```
Introspect an Element's attributes:
```python
>>> about.attrs
{'id': 'about', 'class': ('tier-1', 'element-1'), 'aria-haspopup': 'true'}
```
Render out an Element's HTML:
```python
>>> about.html
'<li aria-haspopup="true" class="tier-1 element-1 " id="about">\n<a class="" href="/about/" title="">About</a>\n<ul aria-hidden="true" class="subnav menu" role="menu">\n<li class="tier-2 element-1" role="treeitem"><a href="/about/apps/" title="">Applications</a></li>\n<li class="tier-2 element-2" role="treeitem"><a href="/about/quotes/" title="">Quotes</a></li>\n<li class="tier-2 element-3" role="treeitem"><a href="/about/gettingstarted/" title="">Getting Started</a></li>\n<li class="tier-2 element-4" role="treeitem"><a href="/about/help/" title="">Help</a></li>\n<li class="tier-2 element-5" role="treeitem"><a href="http://brochure.getpython.info/" title="">Python Brochure</a></li>\n</ul>\n</li>'
```
Select Elements within Elements:
```python
>>> about.find('a')
[<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]
```
Search for links within an element:
```python
>>> about.absolute_links
{'http://brochure.getpython.info/', 'https://www.python.org/about/gettingstarted/', 'https://www.python.org/about/', 'https://www.python.org/about/quotes/', 'https://www.python.org/about/help/', 'https://www.python.org/about/apps/'}
```
Search for text on the page:
```python
>>> r.html.search('Python is a {} language')[0]
'programming'
```
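The `{}` in the template acts as a placeholder that captures whatever text sits between the fixed parts. As a rough standard-library analogy (an assumption for illustration, not a description of how `search` is implemented), the same capture can be written as a regular expression with one non-greedy group. The `page_text` string is made up for the example.

```python
import re

# Made-up page text standing in for the real document.
page_text = "Python is a programming language that lets you work quickly."

# "(.+?)" plays the role of "{}" in the search template.
match = re.search(r"Python is a (.+?) language", page_text)
captured = match.group(1) if match else None
print(captured)
```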
More complex CSS Selector example (copied from Chrome dev tools):
```python
>>> r = session.get('https://python.org/')
>>> sel = '#about > ul > li.tier-2.element-1 > a'
>>> print(r.html.find(sel, first=True).text)
Applications
```
XPath is also supported:
```python
>>> r.html.xpath('//*[@id="about"]/ul/li[1]/a')
[<Element 'a' href='/about/apps/' title=''>]
```
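To see what that XPath expression selects, the sketch below runs an equivalent path over a tiny, well-formed snippet with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The snippet is a hand-written reduction of the `#about` menu shown earlier, and ElementTree is used here only as a stand-in; it is not the parser this library uses.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified version of the "#about" navigation markup.
snippet = """
<ul>
  <li id="about">
    <a href="/about/">About</a>
    <ul>
      <li><a href="/about/apps/">Applications</a></li>
      <li><a href="/about/quotes/">Quotes</a></li>
    </ul>
  </li>
</ul>
"""

root = ET.fromstring(snippet)

# Any element with id="about", then its <ul> child, that list's first <li>, then its <a>.
matches = root.findall(".//*[@id='about']/ul/li[1]/a")
hrefs = [a.get("href") for a in matches]
print(hrefs)
```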
# JavaScript Support
Let's grab some text that's rendered by JavaScript. Until 2020, the
Python 2.7 countdown clock (<https://pythonclock.org>) will serve as a
good test page:
```python
>>> r = session.get('https://pythonclock.org')
```
Let's try to see the dynamically rendered content (the countdown clock).
As a quick first pass, we'll search between the last text we see
before it ('Python 2.7 will retire in...') and the first text we see
after it ('Enable Guido Mode').
```python
>>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
'</h1>\n </div>\n <div class="python-27-clock"></div>\n <div class="center">\n <div class="guido-button-block">\n <button class="js-guido-mode guido-button">'
```
Notice the clock is missing. The `render()` method takes the response
and renders the dynamic content just like a web browser would.
```python
>>> r.html.render()
>>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]
'</h1>\n </div>\n <div class="python-27-clock is-countdown"><span class="countdown-row countdown-show6"><span class="countdown-section"><span class="countdown-amount">1</span><span class="countdown-period">Year</span></span><span class="countdown-section"><span class="countdown-amount">2</span><span class="countdown-period">Months</span></span><span class="countdown-section"><span class="countdown-amount">28</span><span class="countdown-period">Days</span></span><span class="countdown-section"><span class="countdown-amount">16</span><span class="countdown-period">Hours</span></span><span class="countdown-section"><span class="countdown-amount">52</span><span class="countdown-period">Minutes</span></span><span class="countdown-section"><span class="countdown-amount">46</span><span class="countdown-period">Seconds</span></span></span></div>\n <div class="center">\n <div class="guido-button-block">\n <button class="js-guido-mode guido-button">'
```
Let's clean it up a bit. This step is optional; it just makes the
returned HTML easier to read, so we can see what to target to extract
the information we need.
```python
>>> from pprint import pprint
>>> pprint(r.html.search('Python 2.7 will retire in...{}Enable')[0])
('</h1>\n'
' </div>\n'
' <div class="python-27-clock is-countdown"><span class="countdown-row '
'countdown-show6"><span class="countdown-section"><span '
'class="countdown-amount">1</span><span '
'class="countdown-period">Year</span></span><span '
'class="countdown-section"><span class="countdown-amount">2</span><span '
'class="countdown-period">Months</span></span><span '
'class="countdown-section"><span class="countdown-amount">28</span><span '
'class="countdown-period">Days</span></span><span '
'class="countdown-section"><span class="countdown-amount">16</span><span '
'class="countdown-period">Hours</span></span><span '
'class="countdown-section"><span class="countdown-amount">52</span><span '
'class="countdown-period">Minutes</span></span><span '
'class="countdown-section"><span class="countdown-amount">46</span><span '
'class="countdown-period">Seconds</span></span></span></div>\n'
' <div class="center">\n'
' <div class="guido-button-block">\n'
' <button class="js-guido-mode guido-button">')
```
The rendered HTML has all the same methods and attributes as above.
Let's extract just the data we want from the clock into something that
is easy to use elsewhere and to introspect, such as a dictionary.
```python
>>> periods = [element.text for element in r.html.find('.countdown-period')]
>>> amounts = [element.text for element in r.html.find('.countdown-amount')]
>>> countdown_data = dict(zip(periods, amounts))
>>> countdown_data
{'Year': '1', 'Months': '2', 'Days': '5', 'Hours': '23', 'Minutes': '34', 'Seconds': '37'}
```
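The values in `countdown_data` are strings, since they come from element text. If numbers are more convenient, a follow-up step along these lines could convert them. This is a sketch that starts from the sample output above; the names `countdown` and `sub_month` are chosen for the example, and only the day-and-smaller units are folded into a `timedelta`, because years and months have no fixed length.

```python
from datetime import timedelta

# Sample values copied from the output shown above.
countdown_data = {'Year': '1', 'Months': '2', 'Days': '5',
                  'Hours': '23', 'Minutes': '34', 'Seconds': '37'}

# Convert every amount from text to an integer.
countdown = {period: int(amount) for period, amount in countdown_data.items()}

# Combine the fixed-length units into a single duration.
sub_month = timedelta(
    days=countdown["Days"],
    hours=countdown["Hours"],
    minutes=countdown["Minutes"],
    seconds=countdown["Seconds"],
)

print(countdown["Year"], countdown["Months"])
print(sub_month.total_seconds())
```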
You can also do this asynchronously:
```python
>>> async def get_pyclock():
...     r = await asession.get('https://pythonclock.org/')
...     await r.html.arender()
...     return r
...
>>> results = asession.run(get_pyclock, get_pyclock, get_pyclock)
```
The rest of the code works the same way as the synchronous version,
except that `results` is a list containing multiple response objects.
The same basic steps shown above can be applied to each one to extract
the data you want.
# Using without Requests
You can also use this library without Requests:
```python
>>> from requests_html_playwright import HTML
>>> doc = """<a href='https://httpbin.org'>"""
>>> html = HTML(html=doc)
>>> html.links
{'https://httpbin.org'}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/Asugawara/requests-html-playwright",
"name": "requests-html-playwright",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.8",
"maintainer_email": null,
"keywords": "requests-html, playwright",
"author": "Asugawara",
"author_email": "asgasw@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b0/fd/4a6a4bb4bb2dd563a0b490220656443e80b4657f34e1d4e62074f51765b7/requests_html_playwright-0.12.3.tar.gz",
"platform": null,
"description": "![image](https://github.com/Asugawara/requests-html-playwright/actions/workflows/run_test.yml/badge.svg)\n![PyPI](https://img.shields.io/pypi/v/requests-html-playwright?color=green)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/requests-html-playwright)\n![GitHub](https://img.shields.io/github/license/Asugawara/requests-html-playwright)\n\n# Requests-HTML(with microsoft/playwright-python): HTML Parsing for Humans\u2122\n\nThis library intends to make parsing HTML (e.g. scraping the web) as\nsimple and intuitive as possible.\n\nWhen using this library you automatically get:\n\n- **Full JavaScript support**! (Using Chromium(Webkit, Firefox), thanks to playwright)\n- *CSS Selectors* (a.k.a jQuery-style, thanks to PyQuery).\n- *XPath Selectors*, for the faint of heart.\n- Mocked user-agent (like a real web browser).\n- Automatic following of redirects.\n- Connection--pooling and cookie persistence.\n- The Requests experience you know and love, with magical parsing\n abilities.\n- **Async Support**\n\n# Installation\n\n```bash\n$ pip install requests-html-playwright\n\u2728\ud83c\udf70\u2728\n```\n\nOnly **Python 3.8 and above** is supported.\n# Tutorial & Usage\n\nMake a GET request to \\'python.org\\', using Requests:\n\n```python\n>>> from requests_html_playwright import HTMLSession\n>>> session = HTMLSession()\n>>> r = session.get('https://python.org/')\n```\n\nTry async and get some sites at the same time:\n\n```python\n>>> from requests_html_playwright import AsyncHTMLSession\n>>> asession = AsyncHTMLSession()\n>>> async def get_pythonorg():\n... r = await asession.get('https://python.org/')\n... return r\n...\n>>> async def get_reddit():\n... r = await asession.get('https://reddit.com/')\n... return r\n...\n>>> async def get_google():\n... r = await asession.get('https://google.com/')\n... 
return r\n...\n>>> results = asession.run(get_pythonorg, get_reddit, get_google)\n>>> results # check the requests all returned a 200 (success) code\n[<Response [200]>, <Response [200]>, <Response [200]>]\n>>> # Each item in the results list is a response object and can be interacted with as such\n>>> for result in results:\n... print(result.html.url)\n...\nhttps://www.python.org/\nhttps://www.google.com/\nhttps://www.reddit.com/\n```\n\nNote that the order of the objects in the results list represents the\norder they were returned in, not the order that the coroutines are\npassed to the `run` method, which is shown in the example by the order\nbeing different.\n\nGrab a list of all links on the page, as--is (anchors excluded):\n\n```python\n>>> r.html.links\n{'//docs.python.org/3/tutorial/', '/about/apps/', 'https://github.com/python/pythondotorg/issues', '/accounts/login/', '/dev/peps/', '/about/legal/', '//docs.python.org/3/tutorial/introduction.html#lists', '/download/alternatives', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', '/download/other/', '/downloads/windows/', 'https://mail.python.org/mailman/listinfo/python-dev', '/doc/av', 'https://devguide.python.org/', '/about/success/#engineering', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', '/about/gettingstarted/', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', '/success-stories/industrial-light-magic-runs-python/', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', '/', 'http://pyfound.blogspot.com/', '/events/python-events/past/', '/downloads/release/python-2714/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://status.python.org/', '/community/workshops/', '/community/lists/', 'http://buildbot.net/', '/community/awards', 'http://twitter.com/ThePSF', 
'https://docs.python.org/3/license.html', '/psf/donations/', 'http://wiki.python.org/moin/Languages', '/dev/', '/events/python-user-group/', 'https://wiki.qt.io/PySide', '/community/sigs/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'http://planetpython.org/', '/events/python-events', '/about/help/', '/events/python-user-group/past/', '/about/success/', '/psf-landing/', '/about/apps', '/about/', 'http://www.wxpython.org/', '/events/python-user-group/665/', 'https://www.python.org/psf/codeofconduct/', '/dev/peps/peps.rss', '/downloads/source/', '/psf/sponsorship/sponsors/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://bugs.python.org/', '/community/merchandise/', 'http://tornadoweb.org', '/events/python-user-group/650/', 'http://flask.pocoo.org/', '/downloads/release/python-364/', '/events/python-user-group/660/', '/events/python-user-group/638/', '/psf/', '/doc/', 'http://blog.python.org', '/events/python-events/604/', '/about/success/#government', 'http://python.org/dev/peps/', 'https://docs.python.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', '/users/membership/', '/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', '/downloads/', '/jobs/', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', '/privacy/', 'https://pypi.python.org/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'http://www.scipy.org', '/community/forums/', '/about/success/#scientific', '/about/success/#software-development', '/shell/', '/accounts/signup/', 'http://www.facebook.com/pythonlang?fref=ts', '/community/', 'https://kivy.org/', '/about/quotes/', 'http://www.web2py.com/', '/community/logos/', '/community/diversity/', '/events/calendars/', 
'https://wiki.python.org/moin/BeginnersGuide', '/success-stories/', '/doc/essays/', '/dev/core-mentorship/', 'http://ipython.org', '/events/', '//docs.python.org/3/tutorial/controlflow.html', '/about/success/#education', '/blogs/', '/community/irc/', 'http:/python.blogspot.com/', '//jobs.python.org', 'http://www.pylonsproject.org/', 'http://www.djangoproject.com/', '/downloads/mac-osx/', '/about/success/#business', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 'http://wiki.python.org/moin/TkInter', 'https://docs.python.org/faq/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions'}\n```\n\nGrab a list of all links on the page, in absolute form (anchors\nexcluded):\n\n```python\n>>> r.html.absolute_links\n{'https://github.com/python/pythondotorg/issues', 'https://docs.python.org/3/tutorial/', 'https://www.python.org/about/success/', 'http://feedproxy.google.com/~r/PythonInsider/~3/kihd2DW98YY/python-370a4-is-available-for-testing.html', 'https://www.python.org/dev/peps/', 'https://mail.python.org/mailman/listinfo/python-dev', 'https://www.python.org/doc/', 'https://www.python.org/', 'https://www.python.org/about/', 'https://www.python.org/events/python-events/past/', 'https://devguide.python.org/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', 'https://www.openstack.org', 'http://feedproxy.google.com/~r/PythonInsider/~3/AMoBel8b8Mc/python-3.html', 'https://docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 'http://pyfound.blogspot.com/', 'https://wiki.python.org/moin/PythonBooks', 'http://plus.google.com/+Python', 'https://wiki.python.org/moin/', 'https://www.python.org/events/python-events', 'https://status.python.org/', 'https://www.python.org/about/apps', 'https://www.python.org/downloads/release/python-2714/', 'https://www.python.org/psf/donations/', 'http://buildbot.net/', 
'http://twitter.com/ThePSF', 'https://docs.python.org/3/license.html', 'http://wiki.python.org/moin/Languages', 'https://docs.python.org/faq/', 'https://jobs.python.org', 'https://www.python.org/about/success/#software-development', 'https://www.python.org/about/success/#education', 'https://www.python.org/community/logos/', 'https://www.python.org/doc/av', 'https://wiki.qt.io/PySide', 'https://www.python.org/events/python-user-group/660/', 'https://wiki.gnome.org/Projects/PyGObject', 'http://www.ansible.com', 'http://www.saltstack.com', 'https://www.python.org/dev/peps/peps.rss', 'http://planetpython.org/', 'https://www.python.org/events/python-user-group/past/', 'https://docs.python.org/3/tutorial/controlflow.html#defining-functions', 'https://www.python.org/community/diversity/', 'https://docs.python.org/3/tutorial/controlflow.html', 'https://www.python.org/community/awards', 'https://www.python.org/events/python-user-group/638/', 'https://www.python.org/about/legal/', 'https://www.python.org/dev/', 'https://www.python.org/download/alternatives', 'https://www.python.org/downloads/', 'https://www.python.org/community/lists/', 'http://www.wxpython.org/', 'https://www.python.org/about/success/#government', 'https://www.python.org/psf/', 'https://www.python.org/psf/codeofconduct/', 'http://bottlepy.org', 'http://roundup.sourceforge.net/', 'http://pandas.pydata.org/', 'http://brochure.getpython.info/', 'https://www.python.org/downloads/source/', 'https://bugs.python.org/', 'https://www.python.org/downloads/mac-osx/', 'https://www.python.org/about/help/', 'http://tornadoweb.org', 'http://flask.pocoo.org/', 'https://www.python.org/users/membership/', 'http://blog.python.org', 'https://www.python.org/privacy/', 'https://www.python.org/about/gettingstarted/', 'http://python.org/dev/peps/', 'https://www.python.org/about/apps/', 'https://docs.python.org', 'https://www.python.org/success-stories/', 'https://www.python.org/community/forums/', 
'http://feedproxy.google.com/~r/PythonInsider/~3/zVC80sq9s00/python-364-is-now-available.html', 'https://www.python.org/community/merchandise/', 'https://www.python.org/about/success/#arts', 'https://wiki.python.org/moin/Python2orPython3', 'http://trac.edgewall.org/', 'http://feedproxy.google.com/~r/PythonInsider/~3/wh73_1A-N7Q/python-355rc1-and-python-348rc1-are-now.html', 'https://pypi.python.org/', 'https://www.python.org/events/python-user-group/650/', 'http://www.riverbankcomputing.co.uk/software/pyqt/intro', 'https://www.python.org/about/quotes/', 'https://www.python.org/downloads/windows/', 'https://www.python.org/events/calendars/', 'http://www.scipy.org', 'https://www.python.org/community/workshops/', 'https://www.python.org/blogs/', 'https://www.python.org/accounts/signup/', 'https://www.python.org/events/', 'https://kivy.org/', 'http://www.facebook.com/pythonlang?fref=ts', 'http://www.web2py.com/', 'https://www.python.org/psf/sponsorship/sponsors/', 'https://www.python.org/community/', 'https://www.python.org/download/other/', 'https://www.python.org/psf-landing/', 'https://www.python.org/events/python-user-group/665/', 'https://wiki.python.org/moin/BeginnersGuide', 'https://www.python.org/accounts/login/', 'https://www.python.org/downloads/release/python-364/', 'https://www.python.org/dev/core-mentorship/', 'https://www.python.org/about/success/#business', 'https://www.python.org/community/sigs/', 'https://www.python.org/events/python-user-group/', 'http://ipython.org', 'https://www.python.org/shell/', 'https://www.python.org/community/irc/', 'https://www.python.org/about/success/#engineering', 'http://www.pylonsproject.org/', 'http:/python.blogspot.com/', 'https://www.python.org/about/success/#scientific', 'https://www.python.org/doc/essays/', 'http://www.djangoproject.com/', 'https://www.python.org/success-stories/industrial-light-magic-runs-python/', 'http://feedproxy.google.com/~r/PythonInsider/~3/x_c9D0S-4C4/python-370b1-is-now-available-for.html', 
'http://wiki.python.org/moin/TkInter', 'https://www.python.org/jobs/', 'https://www.python.org/events/python-events/604/'}\n```\n\nSelect an element with a CSS Selector:\n\n```python\n>>> about = r.html.find('#about', first=True)\n```\n\nGrab an element\\'s text contents:\n\n```python\n>>> print(about.text)\nAbout\nApplications\nQuotes\nGetting Started\nHelp\nPython Brochure\n```\n\nIntrospect an Element\\'s attributes:\n\n```python\n>>> about.attrs\n{'id': 'about', 'class': ('tier-1', 'element-1'), 'aria-haspopup': 'true'}\n```\n\nRender out an Element\\'s HTML:\n\n```python\n>>> about.html\n'<li aria-haspopup=\"true\" class=\"tier-1 element-1 \" id=\"about\">\\n<a class=\"\" href=\"/about/\" title=\"\">About</a>\\n<ul aria-hidden=\"true\" class=\"subnav menu\" role=\"menu\">\\n<li class=\"tier-2 element-1\" role=\"treeitem\"><a href=\"/about/apps/\" title=\"\">Applications</a></li>\\n<li class=\"tier-2 element-2\" role=\"treeitem\"><a href=\"/about/quotes/\" title=\"\">Quotes</a></li>\\n<li class=\"tier-2 element-3\" role=\"treeitem\"><a href=\"/about/gettingstarted/\" title=\"\">Getting Started</a></li>\\n<li class=\"tier-2 element-4\" role=\"treeitem\"><a href=\"/about/help/\" title=\"\">Help</a></li>\\n<li class=\"tier-2 element-5\" role=\"treeitem\"><a href=\"http://brochure.getpython.info/\" title=\"\">Python Brochure</a></li>\\n</ul>\\n</li>'\n```\n\nSelect Elements within Elements:\n\n```python\n>>> about.find('a')\n[<Element 'a' href='/about/' title='' class=''>, <Element 'a' href='/about/apps/' title=''>, <Element 'a' href='/about/quotes/' title=''>, <Element 'a' href='/about/gettingstarted/' title=''>, <Element 'a' href='/about/help/' title=''>, <Element 'a' href='http://brochure.getpython.info/' title=''>]\n```\n\nSearch for links within an element:\n\n```python\n>>> about.absolute_links\n{'http://brochure.getpython.info/', 'https://www.python.org/about/gettingstarted/', 'https://www.python.org/about/', 'https://www.python.org/about/quotes/', 
'https://www.python.org/about/help/', 'https://www.python.org/about/apps/'}\n```\n\nSearch for text on the page:\n\n```python\n>>> r.html.search('Python is a {} language')[0]\nprogramming\n```\n\nMore complex CSS Selector example (copied from Chrome dev tools):\n\n```python\n>>> r = session.get('https://python.org/')\n>>> sel = '#about > ul > li.tier-2.element-1 > a'\n>>> print(r.html.find(sel, first=True).text)\nApplications\n```\n\nXPath is also supported:\n\n```python\n>>> r.html.xpath('//*[@id=\"about\"]/ul/li[1]/a')\n[<Element 'a' href='/about/apps/' title=''>]\n```\n\n# JavaScript Support\n\nLet\\'s grab some text that\\'s rendered by JavaScript. Until 2020, the\nPython 2.7 countdown clock (<https://pythonclock.org>) will serve as a\ngood test page:\n\n```python\n>>> r = session.get('https://pythonclock.org')\n```\n\nLet\\'s try and see the dynamically rendered code (The countdown clock).\nTo do that quickly at first, we\\'ll search between the last text we see\nbefore it (\\'Python 2.7 will retire in\\...\\') and the first text we see\nafter it (\\'Enable Guido Mode\\').\n\n```python\n>>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]\n'</h1>\\n </div>\\n <div class=\"python-27-clock\"></div>\\n <div class=\"center\">\\n <div class=\"guido-button-block\">\\n <button class=\"js-guido-mode guido-button\">'\n```\n\nNotice the clock is missing. 
The `render()` method takes the response\nand renders the dynamic content just like a web browser would.\n\n```python\n>>> r.html.render()\n>>> r.html.search('Python 2.7 will retire in...{}Enable Guido Mode')[0]\n'</h1>\\n </div>\\n <div class=\"python-27-clock is-countdown\"><span class=\"countdown-row countdown-show6\"><span class=\"countdown-section\"><span class=\"countdown-amount\">1</span><span class=\"countdown-period\">Year</span></span><span class=\"countdown-section\"><span class=\"countdown-amount\">2</span><span class=\"countdown-period\">Months</span></span><span class=\"countdown-section\"><span class=\"countdown-amount\">28</span><span class=\"countdown-period\">Days</span></span><span class=\"countdown-section\"><span class=\"countdown-amount\">16</span><span class=\"countdown-period\">Hours</span></span><span class=\"countdown-section\"><span class=\"countdown-amount\">52</span><span class=\"countdown-period\">Minutes</span></span><span class=\"countdown-section\"><span class=\"countdown-amount\">46</span><span class=\"countdown-period\">Seconds</span></span></span></div>\\n <div class=\"center\">\\n <div class=\"guido-button-block\">\\n <button class=\"js-guido-mode guido-button\">'\n```\n\nLet\\'s clean it up a bit. 
This step is optional; it just makes the returned HTML\neasier to read, so we can see what we need to target\nto extract the information we need.\n\n```python\n>>> from pprint import pprint\n>>> pprint(r.html.search('Python 2.7 will retire in...{}Enable')[0])\n('</h1>\\n'\n' </div>\\n'\n' <div class=\"python-27-clock is-countdown\"><span class=\"countdown-row '\n'countdown-show6\"><span class=\"countdown-section\"><span '\n'class=\"countdown-amount\">1</span><span '\n'class=\"countdown-period\">Year</span></span><span '\n'class=\"countdown-section\"><span class=\"countdown-amount\">2</span><span '\n'class=\"countdown-period\">Months</span></span><span '\n'class=\"countdown-section\"><span class=\"countdown-amount\">28</span><span '\n'class=\"countdown-period\">Days</span></span><span '\n'class=\"countdown-section\"><span class=\"countdown-amount\">16</span><span '\n'class=\"countdown-period\">Hours</span></span><span '\n'class=\"countdown-section\"><span class=\"countdown-amount\">52</span><span '\n'class=\"countdown-period\">Minutes</span></span><span '\n'class=\"countdown-section\"><span class=\"countdown-amount\">46</span><span '\n'class=\"countdown-period\">Seconds</span></span></span></div>\\n'\n' <div class=\"center\">\\n'\n' <div class=\"guido-button-block\">\\n'\n' <button class=\"js-guido-mode guido-button\">')\n```\n\nThe rendered HTML has all the same methods and attributes as above.\nLet\\'s extract just the data we want from the clock into\nsomething easy to use elsewhere and to introspect, like a dictionary.\n\n```python\n>>> periods = [element.text for element in r.html.find('.countdown-period')]\n>>> amounts = [element.text for element in r.html.find('.countdown-amount')]\n>>> countdown_data = dict(zip(periods, amounts))\n>>> countdown_data\n{'Year': '1', 'Months': '2', 'Days': '5', 'Hours': '23', 'Minutes': '34', 'Seconds': '37'}\n```\n\nYou can also do this asynchronously:\n\n```python\n>>> async def get_pyclock():\n... 
r = await asession.get('https://pythonclock.org/')\n... await r.html.arender()\n... return r\n...\n>>> results = asession.run(get_pyclock, get_pyclock, get_pyclock)\n```\n\nThe rest of the code works the same way as the synchronous version,\nexcept that `results` is a list of response objects. Apply the same\nsteps as above to each response to extract the data you want.\n\n# Using without Requests\n\nYou can also use this library without Requests:\n\n```python\n>>> from requests_html_playwright import HTML\n>>> doc = \"\"\"<a href='https://httpbin.org'>\"\"\"\n>>> html = HTML(html=doc)\n>>> html.links\n{'https://httpbin.org'}\n```\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Requests-HTML(with microsoft/playwright-python): HTML Parsing for Humans\u2122",
"version": "0.12.3",
"project_urls": {
"Homepage": "https://github.com/Asugawara/requests-html-playwright",
"Repository": "https://github.com/Asugawara/requests-html-playwright"
},
"split_keywords": [
"requests-html",
" playwright"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d4f63fed890cbc043e41622a58207711af822f452bc3d192f8dc5142f2edb10e",
"md5": "d9cd9c49e7df888e021f91ba6478931d",
"sha256": "7470c741db0f9a8284aa80b03e22813f074a4896298260aec5b288527d279954"
},
"downloads": -1,
"filename": "requests_html_playwright-0.12.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d9cd9c49e7df888e021f91ba6478931d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8",
"size": 14416,
"upload_time": "2024-06-15T06:21:33",
"upload_time_iso_8601": "2024-06-15T06:21:33.835059Z",
"url": "https://files.pythonhosted.org/packages/d4/f6/3fed890cbc043e41622a58207711af822f452bc3d192f8dc5142f2edb10e/requests_html_playwright-0.12.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b0fd4a6a4bb4bb2dd563a0b490220656443e80b4657f34e1d4e62074f51765b7",
"md5": "9c963294110da9fb60ddd00943d81a30",
"sha256": "9f17c6670cbdeaaaca66fed236df2099531d7c25ca185ab81481617962e998db"
},
"downloads": -1,
"filename": "requests_html_playwright-0.12.3.tar.gz",
"has_sig": false,
"md5_digest": "9c963294110da9fb60ddd00943d81a30",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8",
"size": 18165,
"upload_time": "2024-06-15T06:21:35",
"upload_time_iso_8601": "2024-06-15T06:21:35.664673Z",
"url": "https://files.pythonhosted.org/packages/b0/fd/4a6a4bb4bb2dd563a0b490220656443e80b4657f34e1d4e62074f51765b7/requests_html_playwright-0.12.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-15 06:21:35",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Asugawara",
"github_project": "requests-html-playwright",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "requests-html-playwright"
}