Name | pixelripper |
Version | 0.0.1 |
home_page | |
Summary | Package and CLI for downloading media from a webpage. |
upload_time | 2023-03-22 21:30:41 |
maintainer | |
docs_url | None |
author | |
requires_python | >=3.10 |
license | |
keywords | webscraping |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# Pixelripper
Package and CLI for downloading media from a webpage. <br>
Install with:<br>
<pre>
pip install pixelripper
</pre>
Pixelripper contains a class called PixelRipper and a subclass called PixelRipperSelenium.<br>
PixelRipper uses the requests library to fetch webpages, while PixelRipperSelenium uses a Selenium-based engine to do the same.<br>
The Selenium engine is slower and requires more resources, but is useful for webpages
that don't render their media content without a JavaScript engine.<br>
PixelRipperSelenium can use either Firefox or Chrome.<br>
Note: You must have the appropriate webdriver for your machine and browser
version installed in order to use PixelRipperSelenium.<br>
pixelripper can be used programmatically or from the command line.<br>
<br>
### Programmatic usage:
<pre>
from pixelripper import PixelRipper
from pathlib import Path
ripper = PixelRipper()
# Scrape the page for image, video, and audio urls.
ripper.rip("https://somewebsite.com")
# Any content urls found will now be accessible as members of ripper.
print(ripper.image_urls)
print(ripper.video_urls)
print(ripper.audio_urls)
# All the urls found on a page can be accessed through the ripper.scraper member.
all_urls = ripper.scraper.get_links("all")
# The urls can also be filtered according to a list of extensions
# with the filter_by_extensions function.
# The following will return only .jpg and .mp3 file urls.
urls = ripper.filter_by_extensions([".jpg", ".mp3"])
# The content can then be downloaded.
ripper.download_files(urls, Path.cwd()/"somewebsite")
# Alternatively, everything in ripper.image_urls, ripper.video_urls, and ripper.audio_urls
# can be downloaded with just a call to ripper.download_all()
ripper.download_all(Path.cwd()/"somewebsite")
# Separate subfolders named "images", "videos", and "audio"
# will be created inside the "somewebsite" folder when using this function.
</pre>
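To make the extension-filtering step more concrete, here is a minimal standalone sketch of the idea. It is not pixelripper's implementation: the helper name `filter_urls_by_extensions` is hypothetical, and the sketch assumes the extension is read from the URL path (ignoring any query string) and compared case-insensitively.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filter_urls_by_extensions(urls: list[str], extensions: list[str]) -> list[str]:
    """Return the urls whose path ends in one of the given extensions.

    Hypothetical helper for illustration only; not part of pixelripper.
    """
    wanted = {ext.lower() for ext in extensions}
    return [
        url
        for url in urls
        # Only the path component is inspected, so "?download=1" is ignored.
        if PurePosixPath(urlparse(url).path).suffix.lower() in wanted
    ]


sample_urls = [
    "https://somewebsite.com/media/cat.jpg",
    "https://somewebsite.com/media/song.MP3?download=1",
    "https://somewebsite.com/media/clip.mp4",
]
# Keeps the .jpg and .MP3 urls and drops the .mp4 one.
print(filter_urls_by_extensions(sample_urls, [".jpg", ".mp3"]))
```

Whether the real `filter_by_extensions` treats case or query strings the same way is not stated above, so check the package source before relying on that behavior.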
### Command line usage:
<pre>
>pixelripper -h
usage: pixelripper [-h] [-s] [-nh] [-b BROWSER] [-o OUTPUT_PATH] [-eh [EXTRA_HEADERS ...]] url

positional arguments:
  url                   The url to scrape for media.

options:
  -h, --help            show this help message and exit
  -s, --selenium        Use selenium to get page content instead of requests.
  -nh, --no_headless    Don't use headless mode when using -s/--selenium.
  -b BROWSER, --browser BROWSER
                        The browser to use when using -s/--selenium. Can be 'firefox' or 'chrome'. You must have the appropriate webdriver installed for your machine and browser version in order to use the selenium engine.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Output directory to save results to. If not specified, a folder with the name of the webpage will be created in the current working directory.
  -eh [EXTRA_HEADERS ...], --extra_headers [EXTRA_HEADERS ...]
                        Extra headers to use when requesting files as key, value pairs. Keys and values should be colon separated and pairs should be space separated. e.g. -eh Referer:website.com/page Host:website.com
</pre>
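The `-eh/--extra_headers` format described above (colon-separated keys and values, space-separated pairs) could be turned into a headers dictionary roughly as follows. This is a sketch under assumptions, not the CLI's actual parsing code: the function name `parse_extra_headers` is hypothetical, and splitting on the first colon only is a design choice made here so that values containing colons (such as full URLs) survive intact.

```python
def parse_extra_headers(pairs: list[str]) -> dict[str, str]:
    """Convert ["Key:Value", ...] into {"Key": "Value", ...}.

    Hypothetical helper for illustration only; not part of pixelripper.
    """
    headers: dict[str, str] = {}
    for pair in pairs:
        # partition splits on the first colon only, so a value such as
        # "https://website.com/page" keeps its own colon.
        key, separator, value = pair.partition(":")
        if not separator or not key:
            raise ValueError(f"Expected 'Key:Value', got {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


# The shell has already split the pairs on spaces by the time argparse sees them.
print(parse_extra_headers(["Referer:website.com/page", "Host:website.com"]))
```

A dictionary of this shape is what the requests library accepts through its `headers` argument.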
Raw data
{
"_id": null,
"home_page": "",
"name": "pixelripper",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "",
"keywords": "webscraping",
"author": "",
"author_email": "Matt Manes <mattmanes@pm.me>",
"download_url": "https://files.pythonhosted.org/packages/db/32/c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0/pixelripper-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Package and CLI for downloading media from a webpage.",
"version": "0.0.1",
"split_keywords": [
"webscraping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9f203441aaaa8dcee8feab509f4b4c486f8adb39c36b41b00c8ef479bc671250",
"md5": "ac5af15e3cb6694152c2b0ed0114006e",
"sha256": "aa7dd6b59fce552b162fc9147c1cb4d72b2897ef620f639d89330f4df8e7e010"
},
"downloads": -1,
"filename": "pixelripper-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ac5af15e3cb6694152c2b0ed0114006e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 7782,
"upload_time": "2023-03-22T21:30:38",
"upload_time_iso_8601": "2023-03-22T21:30:38.386956Z",
"url": "https://files.pythonhosted.org/packages/9f/20/3441aaaa8dcee8feab509f4b4c486f8adb39c36b41b00c8ef479bc671250/pixelripper-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "db32c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0",
"md5": "0b13cb0ff105214ab2bbd5513827b207",
"sha256": "d2cce23ca51db8dcdd9c8acb7c25f7a08d128194bf1b3b296f4dd7de56dde687"
},
"downloads": -1,
"filename": "pixelripper-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "0b13cb0ff105214ab2bbd5513827b207",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 57356,
"upload_time": "2023-03-22T21:30:41",
"upload_time_iso_8601": "2023-03-22T21:30:41.095460Z",
"url": "https://files.pythonhosted.org/packages/db/32/c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0/pixelripper-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-22 21:30:41",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "pixelripper"
}
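The `sha256` values in the `digests` entries above can be used to check a downloaded wheel or sdist. The following is a minimal sketch using only the standard library; the function name `sha256_matches` and the chunk size are choices made for this illustration, and the demo hashes a temporary file rather than a real pixelripper download.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 65536) -> bool:
    """Return True if the file's SHA-256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Demo on a temporary file; for a real check, pass the downloaded file's path
# and the "sha256" string from the matching entry in "urls".
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"pixelripper")
    expected = hashlib.sha256(b"pixelripper").hexdigest()
    print(sha256_matches(sample, expected))  # True
```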