pageweaver


| Field | Value |
| --- | --- |
| Name | pageweaver |
| Version | 1.0.0 |
| home_page | https://github.com/KTS-o7/pageweaver |
| Summary | A web crawler to fetch web novel chapters and generate a PDF. |
| upload_time | 2024-09-28 15:28:12 |
| maintainer | None |
| docs_url | None |
| author | Krishnatejaswi S, Sridhar D Kedlaya |
| requires_python | >=3.9 |
| license | None |
| keywords | web novel, crawler, pdf generation, web scraping |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# PageWeaver

![Python](https://img.shields.io/badge/python-v3.9+-blue.svg)
![PyPI](https://img.shields.io/pypi/v/pageweaver.svg)
![License](https://img.shields.io/github/license/KTS-o7/pageweaver.svg)
![ViewCount](https://views.whatilearened.today/views/github/KTS-o7/pageweaver.svg)

This project is a CLI tool designed to crawl web novels from FreeWebNovel and generate a PDF document containing the chapters. The tool uses Python libraries such as `requests`, `BeautifulSoup`, and `pylatex` to fetch, process, and compile the novel content into a well-formatted PDF.

## Features

- Fetches novel chapters from FreeWebNovel.
- Processes and cleans the text to remove non-UTF8 characters.
- Generates a PDF document with a title page, table of contents, and chapters.
- Supports multi-threaded crawling for faster processing.
- Option to allow non-English characters in the novel title and author name.
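
The multi-threaded crawling listed above can be pictured with a thread pool. The following is a minimal sketch under stated assumptions, not PageWeaver's actual code: `fetch_chapters` and `fake_fetch` are hypothetical names, and a real crawler would issue an HTTP request where `fake_fetch` returns a placeholder string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_chapters(
    chapter_numbers: Iterable[int],
    fetch_one: Callable[[int], str],
    num_workers: int = 10,
) -> Dict[int, str]:
    """Fetch chapters concurrently and return them keyed by chapter number."""
    numbers = list(chapter_numbers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in input order, so each text lines up with
        # its chapter number even if the downloads finish out of order.
        texts = list(pool.map(fetch_one, numbers))
    return dict(zip(numbers, texts))


def fake_fetch(number: int) -> str:
    # Hypothetical stand-in for a real HTTP request, for illustration only.
    return f"Text of chapter {number}"


chapters = fetch_chapters(range(1, 4), fake_fetch, num_workers=2)
```

Passing the fetch function in as a parameter keeps the concurrency logic separate from the network code, which makes the sketch easy to run without touching a real site.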

## Requirements

- Python 3.9+
- `requests`
- `beautifulsoup4`
- `pylatex`
- `argparse` (part of the Python standard library; no separate install needed)

## Installation

### Via pip

```bash
pip install pageweaver
```

### Via source

```bash
git clone https://github.com/KTS-o7/pageweaver.git
cd pageweaver
pip install -r requirements.txt
pip install .
```

## Usage

```bash
pageweaver <novel_url> <start_chapter_number> <end_chapter_number> [--output_dir <output_dir>] [--num-workers <num_workers>] [--allow-non-english]
```

### Arguments

- `novel_url`: The FreeWebNovel URL of the novel to crawl.
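- `start_chapter`: The starting chapter number (`<start_chapter_number>` in the usage line above).
- `end_chapter`: The ending chapter number (`<end_chapter_number>` in the usage line above).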
- `--output_dir`: (Optional) The destination directory for the generated PDF. Defaults to the current working directory.
- `--num-workers`: (Optional) The number of workers to use for crawling. Defaults to 10.
- `--allow-non-english`: (Optional) Allow non-English characters in the novel title and author name.
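
As an illustration of how the arguments above map onto a parser, here is a minimal `argparse` sketch. It is an assumption about how such a CLI could be declared, not the parser shipped with PageWeaver: the function name `build_parser`, the `int` types, and the help strings are all hypothetical, while the argument names and defaults follow the list above.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line interface."""
    parser = argparse.ArgumentParser(
        prog="pageweaver",
        description="Crawl web novel chapters and generate a PDF.",
    )
    parser.add_argument("novel_url", help="FreeWebNovel URL of the novel")
    parser.add_argument("start_chapter", type=int, help="starting chapter number")
    parser.add_argument("end_chapter", type=int, help="ending chapter number")
    parser.add_argument(
        "--output_dir",
        default=os.getcwd(),  # documented default: current working directory
        help="destination directory for the generated PDF",
    )
    parser.add_argument(
        "--num-workers",
        type=int,
        default=10,  # documented default
        help="number of workers to use for crawling",
    )
    parser.add_argument(
        "--allow-non-english",
        action="store_true",
        help="allow non-English characters in the title and author name",
    )
    return parser


# Parse the first command from the Example Usage section.
args = build_parser().parse_args(
    ["https://freewebnovel.com/global-fog-survival.html", "1", "15", "--num-workers", "5"]
)
```

Note that `argparse` turns the dashes in `--num-workers` and `--allow-non-english` into underscores, so the parsed values are read as `args.num_workers` and `args.allow_non_english`.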

### Example Usage

```bash
pageweaver https://freewebnovel.com/global-fog-survival.html 1 15 --num-workers 5
pageweaver https://freewebnovel.com/global-fog-survival.html 1 30 --output_dir /path/to/output --allow-non-english
```

## How It Works

- **WebCrawler**: Fetches the HTML content of the novel chapters and extracts the text.
- **TextProcessor**: Cleans the text by removing non-UTF8 characters and escaping LaTeX special characters.
- **DocumentGenerator**: Uses pylatex to create a PDF document with the novel content.
- **NovelCrawlerService**: Manages the crawling process, coordinates the fetching and processing of chapters, and generates the final PDF.
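
The cleaning step described for **TextProcessor** can be sketched as follows. This is a hedged illustration rather than the project's implementation: the function names (`drop_unencodable`, `escape_latex`, `clean_text`) and the exact set of escaped characters are assumptions. It drops characters that cannot be encoded as UTF-8 (such as lone surrogates) and escapes the characters LaTeX treats specially.

```python
# Single-pass translation table, so the braces inside a replacement such as
# \textbackslash{} are not escaped a second time.
_LATEX_ESCAPES = str.maketrans({
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
})


def drop_unencodable(text: str) -> str:
    """Remove characters that cannot be encoded as UTF-8."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def escape_latex(text: str) -> str:
    """Escape characters that have a special meaning in LaTeX."""
    return text.translate(_LATEX_ESCAPES)


def clean_text(text: str) -> str:
    """Apply both cleaning steps in order."""
    return escape_latex(drop_unencodable(text))


# "\udc80" is a lone surrogate, which UTF-8 cannot encode.
sample = "50% of $5 & more_\udc80"
cleaned = clean_text(sample)
```

Using `str.translate` instead of chained `str.replace` calls avoids double escaping, because each input character is visited exactly once.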

### Example

To crawl the novel "Global Fog Survival" from chapters 1 to 2 and generate a PDF, run:

```bash
pageweaver https://freewebnovel.com/global-fog-survival.html 1 2 --num-workers 10
```

This will create a PDF document in the current working directory with the title and author extracted from the novel's metadata.

## License

This project is licensed under the MIT License.

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.

## Contact

For any questions or support, please open an issue on the GitHub repository.

## Disclaimer

This tool is not intended to promote piracy. It should be used for educational or personal reading purposes only. Please respect the copyrights of the original authors and publishers.

## Authors

- [Krishnatejaswi S](https://github.com/KTS-o7/)
- [Sridhar D Kedlaya](https://github.com/DeathStroke19891)

## Star Graph

![Star History Chart](https://api.star-history.com/svg?repos=KTS-o7/pageweaver&type=Date)

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/KTS-o7/pageweaver",
    "name": "pageweaver",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "web novel, crawler, PDF generation, web scraping",
    "author": "Krishnatejaswi S, Sridhar D Kedlaya",
    "author_email": "shentharkrishnatejaswi@gmail.com, sridhardked@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/74/6c/ed2024305ecf2a1d2477136ac9dbdb571448ef31a6445e62fa0511da0ef6/pageweaver-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A web crawler to fetch web novel chapters and generate a PDF.",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/KTS-o7/pageweaver"
    },
    "split_keywords": [
        "web novel",
        " crawler",
        " pdf generation",
        " web scraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "66922843b700fcddccd4c3f72efa114b90dda8c5bc52a5bd02671c38562e2789",
                "md5": "1cb1c4dab767cd2fd208062b0e676f61",
                "sha256": "3f8e8c5a3cbfeabc6050d917a41b04eff08a84ce142c377aff2de63480f54859"
            },
            "downloads": -1,
            "filename": "pageweaver-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1cb1c4dab767cd2fd208062b0e676f61",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 8811,
            "upload_time": "2024-09-28T15:28:11",
            "upload_time_iso_8601": "2024-09-28T15:28:11.404564Z",
            "url": "https://files.pythonhosted.org/packages/66/92/2843b700fcddccd4c3f72efa114b90dda8c5bc52a5bd02671c38562e2789/pageweaver-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "746ced2024305ecf2a1d2477136ac9dbdb571448ef31a6445e62fa0511da0ef6",
                "md5": "056cb6b347acdceee5238baf37fa986d",
                "sha256": "9f8b26350193ad6ba5231765e06d4f5082a5d766c7f01ed6025b3f81ccfbb5ca"
            },
            "downloads": -1,
            "filename": "pageweaver-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "056cb6b347acdceee5238baf37fa986d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 7991,
            "upload_time": "2024-09-28T15:28:12",
            "upload_time_iso_8601": "2024-09-28T15:28:12.625147Z",
            "url": "https://files.pythonhosted.org/packages/74/6c/ed2024305ecf2a1d2477136ac9dbdb571448ef31a6445e62fa0511da0ef6/pageweaver-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-28 15:28:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "KTS-o7",
    "github_project": "pageweaver",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "pageweaver"
}
        