atai-web-tool


- Name: atai-web-tool
- Version: 0.0.6
- Home page: https://github.com/AtomGradient/atai-web-tool
- Summary: Extract the main content from a webpage using Playwright, readability-lxml, and BeautifulSoup.
- Upload time: 2025-02-26 05:35:56
- Author: AtomGradient
- Requires Python: >=3.6
# atai-web-tool

`atai-web-tool` is a command-line utility that extracts the main content from a webpage. It uses [zendriver](https://pypi.org/project/zendriver/) to fetch the page, [readability-lxml](https://pypi.org/project/readability-lxml/) to isolate the primary content, and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) to reduce it to a clean, text-only version.

## Features

- **Headless Browsing:** Fetch webpages using zendriver.
- **Content Extraction:** Extract main content with readability-lxml.
- **Clean Output:** Remove unwanted HTML tags using BeautifulSoup.
- **Easy CLI:** Run from the terminal with a single command.

## Installation

You can install `atai-web-tool` via pip:

```bash
pip install atai-web-tool
```

To install from source, clone the repository and install it with pip:

```bash
git clone https://github.com/AtomGradient/atai-web-tool.git
cd atai-web-tool
pip install .
```

## Usage

Extract the main content from a webpage by running:

```bash
atai-web-tool https://example.com
```

This command opens the specified URL in a zendriver-controlled browser, extracts the primary content, and prints it to the terminal as plain text.
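The fetch step could look roughly like the sketch below. It is an assumption about how a zendriver-based fetch might be written, not the tool's actual code; `fetch_rendered_html` and the fixed `settle_seconds` delay are hypothetical. It follows zendriver's `start()` / `browser.get()` / `browser.stop()` pattern and reads the rendered DOM with the tab's `get_content()` method.

```python
import asyncio


async def fetch_rendered_html(url: str, settle_seconds: float = 2.0) -> str:
    """Load url in a headless browser and return the rendered HTML.

    Hypothetical sketch; the name and the settle delay are assumptions.
    """
    # Imported lazily so this module can be loaded without zendriver installed.
    import zendriver as zd

    browser = await zd.start(headless=True)
    try:
        tab = await browser.get(url)
        # Crude fixed wait (an assumption) so client-side scripts can render.
        await asyncio.sleep(settle_seconds)
        # Serialize the current DOM, including JavaScript-generated content.
        return await tab.get_content()
    finally:
        # Shut the browser down even if navigation fails.
        await browser.stop()
```

Calling `html = asyncio.run(fetch_rendered_html("https://example.com"))` would return the page's rendered markup, ready for the readability-lxml and BeautifulSoup steps. Driving a real browser instead of a plain HTTP client means content built by JavaScript is present in the HTML.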

## Requirements

- Python 3.6 or higher
- [zendriver](https://pypi.org/project/zendriver/)
- [readability-lxml](https://pypi.org/project/readability-lxml/)
- [BeautifulSoup4](https://pypi.org/project/beautifulsoup4/)
- [lxml[html_clean]](https://pypi.org/project/lxml/)
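To confirm that the packages listed above are present in the active environment, a small standard-library check like the following may help. It is a hypothetical helper, not part of atai-web-tool; it assumes `lxml_html_clean` is the distribution pulled in by the `lxml[html_clean]` extra, and `importlib.metadata` requires Python 3.8 or newer.

```python
import sys
from importlib import metadata
from typing import Dict, Optional

# Distribution names for the requirements listed above (assumed mapping;
# lxml_html_clean is what the lxml[html_clean] extra installs).
REQUIRED_DISTRIBUTIONS = (
    "zendriver",
    "readability-lxml",
    "beautifulsoup4",
    "lxml",
    "lxml_html_clean",
)


def check_requirements() -> Dict[str, Optional[str]]:
    """Map each required distribution to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in REQUIRED_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, version in check_requirements().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```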

## Development

For local development, install the required dependencies using:

```bash
pip install -r requirements.txt
```

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

            
