# PageWeaver
![Python](https://img.shields.io/badge/python-v3.9+-blue.svg)
![PyPI](https://img.shields.io/pypi/v/pageweaver.svg)
![License](https://img.shields.io/github/license/KTS-o7/pageweaver.svg)
![ViewCount](https://views.whatilearened.today/views/github/KTS-o7/pageweaver.svg)

This project is a CLI tool that crawls web novels from FreeWebNovel and generates a PDF document containing the chapters. It uses Python libraries such as `requests`, `BeautifulSoup`, and `pylatex` to fetch, process, and compile the novel content into a well-formatted PDF.
## Features
- Fetches novel chapters from FreeWebNovel.
- Processes and cleans the text to remove non-UTF-8 characters.
- Generates a PDF document with a title page, table of contents, and chapters.
- Supports multi-threaded crawling for faster processing.
- Option to allow non-English characters in the novel title and author name.
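The multi-threaded crawling listed above can be pictured roughly as follows. This is a minimal sketch, not PageWeaver's actual code: it assumes a hypothetical `fetch_chapter(number)` callable that downloads one chapter's text, and it uses the standard library's `concurrent.futures.ThreadPoolExecutor` with a worker count playing the role of `--num-workers`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def crawl_chapters(
    start: int,
    end: int,
    fetch_chapter: Callable[[int], str],
    num_workers: int = 10,
) -> List[str]:
    """Fetch chapters start..end on a pool of worker threads.

    Assumption: the range is inclusive, matching the README's "chapters 1 to 2"
    example. `fetch_chapter` is a hypothetical stand-in for the real crawler.
    """
    chapter_numbers = range(start, end + 1)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map runs the fetches concurrently but yields results in input
        # order, so the returned list is already sorted by chapter number.
        return list(pool.map(fetch_chapter, chapter_numbers))
```

Threads suit this workload because fetching chapters is I/O-bound: workers spend most of their time waiting on the network, so Python's GIL is not the bottleneck. A call such as `crawl_chapters(1, 15, fetch_chapter, num_workers=5)` would correspond to the first example command under "Example Usage".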
## Requirements
- Python 3.9+
- `requests`
- `beautifulsoup4`
- `pylatex`
- `argparse` (included in the Python standard library)
## Installation
### Via pip
```bash
pip install pageweaver
```
### Via source
```bash
git clone https://github.com/KTS-o7/pageweaver.git
cd pageweaver
pip install -r requirements.txt
pip install .
```
## Usage
```bash
pageweaver <novel_url> <start_chapter> <end_chapter> [--output_dir <output_dir>] [--num-workers <num_workers>] [--allow-non-english]
```
### Arguments
- `novel_url`: The FreeWebNovel URL of the novel to crawl.
- `start_chapter`: The starting chapter number.
- `end_chapter`: The ending chapter number.
- `--output_dir`: (Optional) The destination directory for the generated PDF. Defaults to the current working directory.
- `--num-workers`: (Optional) The number of workers to use for crawling. Defaults to 10.
- `--allow-non-english`: (Optional) Allow non-English characters in the novel title and author name.
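For reference, the documented interface maps naturally onto Python's standard `argparse` module. The parser below is a hypothetical reconstruction from the argument list above, not PageWeaver's source; the `build_parser` name and the `type=int` conversions are assumptions.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented CLI; not PageWeaver's own code.
    parser = argparse.ArgumentParser(
        prog="pageweaver",
        description="Crawl a FreeWebNovel novel and generate a PDF.",
    )
    parser.add_argument("novel_url", help="FreeWebNovel URL of the novel to crawl")
    parser.add_argument("start_chapter", type=int, help="Starting chapter number")
    parser.add_argument("end_chapter", type=int, help="Ending chapter number")
    parser.add_argument(
        "--output_dir",
        default=os.getcwd(),  # the README's stated default
        help="Destination directory for the PDF (default: current directory)",
    )
    parser.add_argument(
        "--num-workers",
        type=int,
        default=10,
        help="Number of workers used for crawling (default: 10)",
    )
    parser.add_argument(
        "--allow-non-english",
        action="store_true",
        help="Allow non-English characters in the title and author name",
    )
    return parser
```

Note that `argparse` turns dashes into underscores for attribute names, so `--num-workers` and `--allow-non-english` would be read as `args.num_workers` and `args.allow_non_english`.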
### Example Usage
```bash
pageweaver https://freewebnovel.com/global-fog-survival.html 1 15 --num-workers 5
pageweaver https://freewebnovel.com/global-fog-survival.html 1 30 --output_dir /path/to/output --allow-non-english
```
## How It Works
- **WebCrawler**: Fetches the HTML content of the novel chapters and extracts the text.
- **TextProcessor**: Cleans the text by removing non-UTF-8 characters and escaping LaTeX special characters.
- **DocumentGenerator**: Uses `pylatex` to create a PDF document with the novel content.
- **NovelCrawlerService**: Manages the crawling process, coordinates the fetching and processing of chapters, and generates the final PDF.
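To make the TextProcessor step concrete, here is a minimal sketch of the two operations it is described as performing. It is an illustration under assumptions, not the project's implementation: "non-UTF-8 characters" is interpreted here as characters that cannot be encoded as UTF-8 (such as lone surrogates), and the escape table covers LaTeX's ten standard special characters. The `clean_text` name is hypothetical.

```python
# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_LATEX_TABLE = str.maketrans(_LATEX_ESCAPES)


def clean_text(text: str) -> str:
    """Drop characters that cannot be encoded as UTF-8, then escape LaTeX specials."""
    # Lone surrogates (e.g. from badly decoded pages) cannot be encoded as
    # UTF-8; errors="ignore" silently drops them.
    utf8_safe = text.encode("utf-8", errors="ignore").decode("utf-8")
    # str.translate visits each character exactly once, so the braces added
    # for a backslash are not themselves escaped a second time.
    return utf8_safe.translate(_LATEX_TABLE)
```

Escaping in a single pass matters: chaining `str.replace` calls would re-escape the `{}` introduced by `\textbackslash{}`. In a pylatex-based generator, `pylatex.utils.escape_latex` may be the more practical choice.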
### Example
To crawl the novel "Global Fog Survival" from chapters 1 to 2 and generate a PDF, run:
```bash
pageweaver https://freewebnovel.com/global-fog-survival.html 1 2 --num-workers 10
```
This will create a PDF document in the current working directory with the title and author extracted from the novel's metadata.
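The README does not document exactly how the extracted title and author are post-processed or how the output file is named. Purely as a hypothetical illustration, the sketch below shows one way `--allow-non-english` could gate non-ASCII characters and how a PDF path could be built inside `--output_dir`. The function names, the accent-stripping rule, and the `<title>.pdf` naming are all assumptions.

```python
import os
import unicodedata
from typing import Optional


def prepare_metadata(text: str, allow_non_english: bool = False) -> str:
    """Hypothetical handling of a title or author name.

    Assumption: without --allow-non-english, accents are stripped and any
    remaining non-ASCII characters are dropped.
    """
    if allow_non_english:
        return text.strip()
    # NFKD splits accented letters into base letter + combining mark,
    # so "é" becomes "e" once the non-ASCII mark is dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii").strip()


def output_path(title: str, output_dir: Optional[str] = None) -> str:
    """Hypothetical: place '<title>.pdf' in output_dir (default: current directory)."""
    directory = output_dir if output_dir is not None else os.getcwd()
    # Replace path separators so a title containing "/" cannot create subdirectories.
    safe_title = title.replace("/", "_").replace("\\", "_") or "novel"
    return os.path.join(directory, f"{safe_title}.pdf")
```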
## License
This project is licensed under the MIT License.
## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
## Contact
For any questions or support, please open an issue on the GitHub repository.
## Disclaimer
This tool is not intended to promote piracy. It should be used for educational or personal reading purposes only. Please respect the copyrights of the original authors and publishers.
## Authors
- [Krishnatejaswi S](https://github.com/KTS-o7/)
- [Sridhar D Kedlaya](https://github.com/DeathStroke19891)
## Star Graph
![Star History Chart](https://api.star-history.com/svg?repos=KTS-o7/pageweaver&type=Date)