pixelripper


Name: pixelripper
Version: 0.0.1
Summary: Package and CLI for downloading media from a webpage.
upload_time: 2023-03-22 21:30:41
docs_url: None
requires_python: >=3.10
keywords: webscraping
requirements: No requirements were recorded.
# Pixelripper
Package and CLI for downloading media from a webpage. <br>
Install with:<br>
<pre>
pip install pixelripper
</pre>

Pixelripper contains a class called PixelRipper and a subclass called PixelRipperSelenium.<br>
PixelRipper uses the requests library to fetch webpages, while PixelRipperSelenium uses a Selenium-based engine to do the same.<br>
The Selenium engine is slower and requires more resources, but is useful for webpages
that don't render their media content without a JavaScript engine.<br>
It can use either Firefox or Chrome.<br>
Note: You must have the appropriate webdriver for your machine and browser
version installed in order to use PixelRipperSelenium.<br>
Pixelripper can be used programmatically or from the command line.<br>
<br>
### Programmatic usage:
<pre>
from pixelripper import PixelRipper
from pathlib import Path
ripper = PixelRipper()
# Scrape the page for image, video, and audio urls.
ripper.rip("https://somewebsite.com")
# Any content urls found will now be accessible as members of ripper.
print(ripper.image_urls)
print(ripper.video_urls)
print(ripper.audio_urls)
# All the urls found on a page can be accessed through the ripper.scraper member.
all_urls = ripper.scraper.get_links("all")
# The urls can also be filtered according to a list of extensions 
# with the filter_by_extensions function.
# The following will return only .jpg and .mp3 file urls.
urls = ripper.filter_by_extensions([".jpg", ".mp3"])
# The content can then be downloaded.
ripper.download_files(urls, Path.cwd()/"somewebsite")
# Alternatively, everything in ripper.image_urls, ripper.video_urls, and ripper.audio_urls
# can be downloaded with just a call to ripper.download_all()
ripper.download_all(Path.cwd()/"somewebsite")
# Separate subfolders named "images", "videos", and "audio"
# will be created inside the "somewebsite" folder when using this function.

</pre>
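To make the idea of extension filtering concrete, here is a minimal standalone sketch. It is not pixelripper's implementation: the helper name `filter_urls_by_extension` and the sample urls are hypothetical, and it assumes that matching should be case-insensitive and should ignore query strings.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filter_urls_by_extension(urls: list[str], extensions: list[str]) -> list[str]:
    """Return the urls whose path ends in one of the given extensions.

    Hypothetical helper for illustration only; not part of pixelripper.
    """
    wanted = {ext.lower() for ext in extensions}
    return [
        url
        for url in urls
        # Compare only the path component so a query string can't hide the suffix.
        if PurePosixPath(urlparse(url).path).suffix.lower() in wanted
    ]


# Made-up urls standing in for whatever a scrape might return.
found = [
    "https://somewebsite.com/images/cat.JPG",
    "https://somewebsite.com/audio/song.mp3?download=1",
    "https://somewebsite.com/video/clip.mp4",
]
print(filter_urls_by_extension(found, [".jpg", ".mp3"]))
```

Whether `ripper.filter_by_extensions` handles upper-case suffixes or query strings the same way is not stated above, so it may be worth checking its output against a list like this one.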
### Command line usage:
<pre>
>pixelripper -h
usage: pixelripper [-h] [-s] [-nh] [-b BROWSER] [-o OUTPUT_PATH] [-eh [EXTRA_HEADERS ...]] url

positional arguments:
  url                   The url to scrape for media.

options:
  -h, --help            show this help message and exit
  -s, --selenium        Use selenium to get page content instead of requests.
  -nh, --no_headless    Don't use headless mode when using -s/--selenium.
  -b BROWSER, --browser BROWSER
                        The browser to use when using -s/--selenium. Can be 'firefox' or 'chrome'. You must have the appropriate webdriver installed for your machine and browser version in order to use the selenium engine.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Output directory to save results to. If not specified, a folder with the name of the webpage will be created in the current working directory.
  -eh [EXTRA_HEADERS ...], --extra_headers [EXTRA_HEADERS ...]
                        Extra headers to use when requesting files as key, value pairs. Keys and values should be colon separated and pairs should be space separated. e.g. -eh Referer:website.com/page Host:website.com
</pre>
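The `-eh` format described in the help text (colon-separated keys and values, space-separated pairs) can be illustrated with a short sketch. This is an assumption about how such arguments could be turned into a header dictionary, not the CLI's actual parsing code; `parse_extra_headers` is a hypothetical name.

```python
def parse_extra_headers(pairs: list[str]) -> dict[str, str]:
    """Turn ["Key:value", ...] into {"Key": "value", ...}.

    Hypothetical helper for illustration only; not part of pixelripper.
    """
    headers: dict[str, str] = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons themselves.
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"Expected 'key:value', got {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


# The shell already splits space-separated pairs into separate arguments.
headers = parse_extra_headers(["Referer:website.com/page", "Host:website.com"])
print(headers)
```

A dictionary of this shape is what the requests library accepts through its `headers` argument, e.g. `requests.get(url, headers=headers)`.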
            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "pixelripper",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "webscraping",
    "author": "",
    "author_email": "Matt Manes <mattmanes@pm.me>",
    "download_url": "https://files.pythonhosted.org/packages/db/32/c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0/pixelripper-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Package and CLI for downloading media from a webpage.",
    "version": "0.0.1",
    "split_keywords": [
        "webscraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f203441aaaa8dcee8feab509f4b4c486f8adb39c36b41b00c8ef479bc671250",
                "md5": "ac5af15e3cb6694152c2b0ed0114006e",
                "sha256": "aa7dd6b59fce552b162fc9147c1cb4d72b2897ef620f639d89330f4df8e7e010"
            },
            "downloads": -1,
            "filename": "pixelripper-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ac5af15e3cb6694152c2b0ed0114006e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 7782,
            "upload_time": "2023-03-22T21:30:38",
            "upload_time_iso_8601": "2023-03-22T21:30:38.386956Z",
            "url": "https://files.pythonhosted.org/packages/9f/20/3441aaaa8dcee8feab509f4b4c486f8adb39c36b41b00c8ef479bc671250/pixelripper-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "db32c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0",
                "md5": "0b13cb0ff105214ab2bbd5513827b207",
                "sha256": "d2cce23ca51db8dcdd9c8acb7c25f7a08d128194bf1b3b296f4dd7de56dde687"
            },
            "downloads": -1,
            "filename": "pixelripper-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "0b13cb0ff105214ab2bbd5513827b207",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 57356,
            "upload_time": "2023-03-22T21:30:41",
            "upload_time_iso_8601": "2023-03-22T21:30:41.095460Z",
            "url": "https://files.pythonhosted.org/packages/db/32/c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0/pixelripper-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-22 21:30:41",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "pixelripper"
}
        