# atai-web-tool
`atai-web-tool` is a command-line utility that extracts the main content from a webpage. It leverages [zendriver](https://pypi.org/project/zendriver/), [readability-lxml](https://pypi.org/project/readability-lxml/), and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) to fetch pages, extract primary content, and display a clean, text-only version.
## Features
- **Headless Browsing:** Fetch webpages using zendriver.
- **Content Extraction:** Extract main content with readability-lxml.
- **Clean Output:** Remove unwanted HTML tags using BeautifulSoup.
- **Easy CLI:** Run from the terminal with a single command.
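The three features above map onto three stages: fetch, extract, clean. The sketch below shows one way to chain zendriver, readability-lxml, and BeautifulSoup in Python. It is not taken from the tool's source. The function names (`fetch_html`, `extract_main_html`, `html_to_text`, `extract_main_text`) are hypothetical. The sketch also assumes zendriver's `start()`, `Browser.get()`, `Tab.get_content()`, and `Browser.stop()` coroutines, plus a locally installed Chrome-compatible browser.

```python
import asyncio


async def fetch_html(url: str) -> str:
    """Stage 1: load the page in a headless browser and return the rendered HTML."""
    # Imported here (an assumption of this sketch) so the other stages work without a browser.
    import zendriver as zd

    browser = await zd.start(headless=True)
    try:
        page = await browser.get(url)  # opens the URL in a tab
        return await page.get_content()  # HTML of the page as currently rendered
    finally:
        await browser.stop()  # always shut the browser down, even on errors


def extract_main_html(html: str) -> str:
    """Stage 2: keep only the main article body, returned as HTML."""
    from readability import Document  # module provided by readability-lxml

    return Document(html).summary()


def html_to_text(html: str) -> str:
    """Stage 3: strip the remaining tags and return plain text."""
    from bs4 import BeautifulSoup  # module provided by beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    # Join text fragments with newlines, dropping whitespace-only fragments.
    return soup.get_text(separator="\n", strip=True)


async def extract_main_text(url: str) -> str:
    """Run the three stages in order for a single URL."""
    html = await fetch_html(url)
    return html_to_text(extract_main_html(html))


# Usage (needs a Chrome-compatible browser installed):
# print(asyncio.run(extract_main_text("https://example.com")))
```

Importing each library inside its own stage keeps the stages independently usable. For example, `html_to_text` only needs beautifulsoup4. The sketch uses Python's built-in `html.parser` for simplicity. Passing `"lxml"` instead is an option, since lxml is already a dependency.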
## Installation
You can install `atai-web-tool` via pip:
```bash
pip install atai-web-tool
```
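If you prefer to install from source, clone the [repository](https://github.com/AtomGradient/atai-web-tool) and install it from the project root:
```bash
git clone https://github.com/AtomGradient/atai-web-tool.git
cd atai-web-tool
pip install .
```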
## Usage
Extract the main content from a webpage by running:
```bash
atai-web-tool https://example.com
```
This command will open the specified URL, extract the primary content, and print it to the terminal.
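Because the tool prints its result, a Python script can also drive it. Here is a minimal sketch. It assumes that the extracted text goes to standard output and that a failed run exits with a non-zero status. `extract_with_cli` and its `command` and `timeout` parameters are hypothetical. The 120-second default is an arbitrary allowance for browser start-up, and `capture_output` requires Python 3.7 or later.

```python
import shutil
import subprocess


def extract_with_cli(url: str, command: str = "atai-web-tool", timeout: float = 120.0) -> str:
    """Run the CLI on one URL and return what it printed to stdout.

    Assumes (not confirmed by the docs) that the extracted text is written
    to stdout and that errors produce a non-zero exit status.
    """
    if shutil.which(command) is None:
        raise FileNotFoundError(f"{command!r} is not on PATH; install it with pip first")
    result = subprocess.run(
        [command, url],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,  # decode output to str
        timeout=timeout,  # raise TimeoutExpired if the page never finishes
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `text = extract_with_cli("https://example.com")` captures the extracted content as a string. Because of `check=True`, a failed run surfaces as a `subprocess.CalledProcessError` rather than as empty output.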
## Requirements
- Python 3.6 or higher
- [zendriver](https://pypi.org/project/zendriver/)
- [readability-lxml](https://pypi.org/project/readability-lxml/)
- [BeautifulSoup4](https://pypi.org/project/beautifulsoup4/)
- [lxml[html_clean]](https://pypi.org/project/lxml/)
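Several of these packages are imported under a different name from the one passed to pip, which can make a missing dependency hard to spot. The check below assumes each package's usual module name: `readability` for readability-lxml, `bs4` for beautifulsoup4, and `lxml_html_clean` for the `html_clean` extra. `REQUIRED_MODULES` and `missing_requirements` are hypothetical names and are not part of the tool.

```python
import importlib.util
from typing import List

# PyPI requirement -> module name it is usually imported as (assumed mapping).
REQUIRED_MODULES = {
    "zendriver": "zendriver",
    "readability-lxml": "readability",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "lxml[html_clean]": "lxml_html_clean",
}


def missing_requirements() -> List[str]:
    """Return the requirements whose modules cannot be located.

    find_spec only looks the module up; it does not import it.
    """
    return [
        requirement
        for requirement, module in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
```

Running `print(missing_requirements() or "all requirements found")` lists anything that still needs to be installed.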
## Development
For local development, install the required dependencies using:
```bash
pip install -r requirements.txt
```
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your improvements. For major changes, open an issue first to discuss what you would like to change.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.