urltotext


| Field | Value |
| --- | --- |
| Name | urltotext |
| Version | 0.3.0 |
| Home page | https://github.com/ChinmayShrivastava/url2text |
| Summary | A light weight library that takes in a url and extracts any readable text in it. |
| Upload time | 2024-03-20 04:24:00 |
| Maintainer | None |
| Docs URL | None |
| Author | Chinmay Shrivastava |
| Requires Python | None |
| License | GPLv3 |
| Requirements | attrs, beautifulsoup4, bs4, certifi, charset-normalizer, h11, idna, langdetect, outcome, PySocks, requests, selenium, six, sniffio, sortedcontainers, soupsieve, trio, trio-websocket, typing_extensions, urllib3, wsproto |
| Travis CI | No |
| Coveralls test coverage | No |

# urltotext

A lightweight library that takes a URL and extracts the readable text from the page.

Accepting any and all PRs!

## Installation

```bash
pip install urltotext
```

## Pre-requisites

1. `urltotext` uses `selenium`, and the only driver currently supported is Chrome. Make sure that chromedriver is installed and properly configured before use. This [guide](https://www.swtestacademy.com/install-chrome-driver-on-mac/) covers installation on macOS.
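
A quick way to check this prerequisite is to see whether a `chromedriver` executable can be found on the `PATH`. The sketch below uses only the Python standard library; the helper name `find_chromedriver` is hypothetical and is not part of `urltotext`, and a driver configured some other way (for example via an explicit path) would not be detected by it.

```python
import shutil
from typing import Optional


def find_chromedriver(executable: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver executable if it is on PATH, else None.

    Hypothetical helper for illustration; not part of urltotext.
    """
    return shutil.which(executable)


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH; install it before using urltotext")
else:
    print(f"chromedriver found at {path}")
```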

## Usage

1. Import and initialize ContentFinder

```python
from urltotext import ContentFinder
cf = ContentFinder()
```

2. Scrape a URL

```python
# scrape a URL (cf is the ContentFinder instance created in step 1)
cf.scrape_url(url="your_url_here")

# print the article
cf.print_article(url="your_url_here")

# every URL passed in is stored on the class instance;
# use the flush_data method to free memory
cf.flush_data()
```
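
Because scraped pages stay on the instance until `flush_data` is called, it can help to wrap a batch of URLs so that the flush always runs. The following is a minimal sketch that relies only on the three methods shown above (`scrape_url`, `print_article`, `flush_data`); the helper `scrape_many` is hypothetical, and catching a broad `Exception` is an assumption, since the exceptions the library may raise are not documented here.

```python
from typing import Iterable, List


def scrape_many(finder, urls: Iterable[str]) -> List[str]:
    """Scrape and print each URL, then flush stored data.

    `finder` is any object exposing scrape_url, print_article and
    flush_data as used in the README (e.g. a ContentFinder instance).
    Returns the URLs for which scraping or printing raised an exception.
    Hypothetical helper for illustration; not part of urltotext.
    """
    failed: List[str] = []
    try:
        for url in urls:
            try:
                finder.scrape_url(url=url)
                finder.print_article(url=url)
            except Exception:
                # Assumption: the library's error types are unknown,
                # so any failure is recorded and the loop continues.
                failed.append(url)
    finally:
        # Free the stored pages even if something unexpected happened.
        finder.flush_data()
    return failed
```

With the instance from step 1, this could be called as `scrape_many(cf, ["your_url_here"])`.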

Enjoy!

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ChinmayShrivastava/url2text",
    "name": "urltotext",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Chinmay Shrivastava",
    "author_email": "cshrivastava99@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a3/04/923ca3bbd26f493555bafe7ee6227d86c962ef4f8ef04c60de02f5cc029f/urltotext-0.3.0.tar.gz",
    "platform": null,
    "description": "# urltotext\n A light weight library that takes in a url and extracts any readable text in it.\n\n Accepting any and all PRs!\n\n## Installation\n\n```bash\npip install urltotext\n```\n\n## Pre-requisites\n\n1. `urltotext` uses `selenium` with the driver scope currently limited to `chrome` only. Please ensure that chromedriver is properly configured. Use this [link](https://www.swtestacademy.com/install-chrome-driver-on-mac/) for installation instructions.\n\n## Usage\n\n1. Import and initialize ContentFinder\n\n```python\nfrom urltotext import ContentFinder\ncf = ContentFinder()\n```\n\n2. Scrape a url\n\n```python\n# scrape a url\ncs.scrape_url(url=\"your_url_here\")\n\n# print the article\ncs.print_article(url=\"your_url_here\")\n\n# all urls passed will be stored in the class instance.\n# use the flush_data method to free memory\ncs.flush_data()\n```\n\nEnjoy!\n",
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "A light weight library that takes in a url and extracts any readable text in it.",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/ChinmayShrivastava/url2text"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9eb5a9e7a8540124e27c1a8206a2eeecc63933c355c40b34dea536c4a6ae5a1d",
                "md5": "470dcc38a66a23b3514f49c13b9b9952",
                "sha256": "598b47b8e71a4ac07618aec5af09f6916340b174dfb57c5d26247c45fbe9765c"
            },
            "downloads": -1,
            "filename": "urltotext-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "470dcc38a66a23b3514f49c13b9b9952",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 15827,
            "upload_time": "2024-03-20T04:23:58",
            "upload_time_iso_8601": "2024-03-20T04:23:58.232626Z",
            "url": "https://files.pythonhosted.org/packages/9e/b5/a9e7a8540124e27c1a8206a2eeecc63933c355c40b34dea536c4a6ae5a1d/urltotext-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a304923ca3bbd26f493555bafe7ee6227d86c962ef4f8ef04c60de02f5cc029f",
                "md5": "85f3b6eb42a498230abba92fc0fa6ada",
                "sha256": "86a9204af6c38c734a4eb0ee34882477d13f9635eceae6dfc716aff68a638b7c"
            },
            "downloads": -1,
            "filename": "urltotext-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "85f3b6eb42a498230abba92fc0fa6ada",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15628,
            "upload_time": "2024-03-20T04:24:00",
            "upload_time_iso_8601": "2024-03-20T04:24:00.142962Z",
            "url": "https://files.pythonhosted.org/packages/a3/04/923ca3bbd26f493555bafe7ee6227d86c962ef4f8ef04c60de02f5cc029f/urltotext-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-20 04:24:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ChinmayShrivastava",
    "github_project": "url2text",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "23.2.0"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    "==",
                    "4.12.3"
                ]
            ]
        },
        {
            "name": "bs4",
            "specs": [
                [
                    "==",
                    "0.0.2"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2024.2.2"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.3.2"
                ]
            ]
        },
        {
            "name": "h11",
            "specs": [
                [
                    "==",
                    "0.14.0"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.6"
                ]
            ]
        },
        {
            "name": "langdetect",
            "specs": [
                [
                    "==",
                    "1.0.9"
                ]
            ]
        },
        {
            "name": "outcome",
            "specs": [
                [
                    "==",
                    "1.3.0.post0"
                ]
            ]
        },
        {
            "name": "PySocks",
            "specs": [
                [
                    "==",
                    "1.7.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "selenium",
            "specs": [
                [
                    "==",
                    "4.18.1"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.16.0"
                ]
            ]
        },
        {
            "name": "sniffio",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "sortedcontainers",
            "specs": [
                [
                    "==",
                    "2.4.0"
                ]
            ]
        },
        {
            "name": "soupsieve",
            "specs": [
                [
                    "==",
                    "2.5"
                ]
            ]
        },
        {
            "name": "trio",
            "specs": [
                [
                    "==",
                    "0.25.0"
                ]
            ]
        },
        {
            "name": "trio-websocket",
            "specs": [
                [
                    "==",
                    "0.11.1"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.10.0"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.2.1"
                ]
            ]
        },
        {
            "name": "wsproto",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        }
    ],
    "lcname": "urltotext"
}
        