lit-extractor


- **Name**: lit-extractor
- **Version**: 0.2.6
- **Home page**: https://github.com/username/my_project
- **Summary**: A mini script that reads a list of URLs and extracts all the URLs present on each webpage
- **Upload time**: 2024-11-09 16:06:39
- **Maintainer**: None
- **Docs URL**: None
- **Author**: Munish chandra jha
- **Requires Python**: >=3.6
- **License**: MIT
- **Keywords**: extractor, lit-extractor
- **Requirements**: No requirements were recorded.

# Literotica URL Extractor CLI Tool

This tool extracts story URLs from fanfiction sites and downloads the stories using `FanFicFare`. It reads page URLs from a file, saves the extracted story URLs to a new file, and can download every story in that list. The tool is designed for fanfiction enthusiasts and researchers who want a streamlined way to manage and download content from various fanfiction sources.

## Features

- **Extract URLs**: Reads a file with URLs, processes each to list available stories on that page, and saves all URLs to a specified output file.
- **Download Stories**: Allows downloading stories listed in a file with extracted URLs.
- **Progress Tracking**: Uses `tqdm` to display download progress.

## Requirements

- Python 3.6 or later
- [Click](https://pypi.org/project/click/) - For creating the CLI
- [tqdm](https://pypi.org/project/tqdm/) - For progress bars
- [FanFicFare](https://pypi.org/project/FanFicFare/) - The core library for extracting and downloading fanfiction
- [Pygments](https://pypi.org/project/Pygments/) - Syntax highlighting (optional)

To install dependencies, run:
```bash
pip install -r requirements.txt
```

## Installation

Clone the repository and navigate to the folder:

```bash
git clone https://github.com/username/literotica-url-extractor.git
cd literotica-url-extractor
```

Install the tool:

```bash
pip install .
```

## Usage

The tool provides two primary commands: `url` and `download`.

### 1. Extract URLs from File (`url`)

Extracts all URLs from a text file, processes them using FanFicFare, and saves the extracted list to a specified output file. This command can also download the stories from the URLs if the `--d` flag is used.

Usage:

```bash
literotica url <path_to_file> [OPTIONS]
```

Arguments:

- `<path_to_file>`: Path to the text file containing a list of URLs.

Options:

- `--o`: Output file name to save the extracted URLs. If omitted, defaults to `extracted_list.txt`.
- `--d`: If set to `True`, downloads all the stories from the extracted URLs.

Example:

```bash
literotica url urls.txt --o extracted_urls.txt --d True
```

In this example:

- `urls.txt` is the input file containing URLs to be processed.
- The extracted URLs are saved in `extracted_urls.txt`.
- The `--d True` option initiates the download of all listed stories.

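
The per-page listing step of the `url` command could be sketched roughly as follows. This is a minimal sketch, not the tool's actual code. It assumes the `fanficfare` command-line program is installed and on `PATH`, and it relies on FanFicFare's `--list` option, which prints the story URLs found on a page. The names `run_fanficfare_list` and `list_story_urls`, the injectable `runner` parameter, and the rule of keeping only output lines that start with `http` are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List


def run_fanficfare_list(page_url: str) -> str:
    """Run `fanficfare --list=<page_url>` and return its standard output."""
    result = subprocess.run(
        ["fanficfare", "--list={}".format(page_url)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def list_story_urls(
    page_url: str,
    runner: Callable[[str], str] = run_fanficfare_list,
) -> List[str]:
    """Return the story URLs reported for one page, one per output line.

    `runner` is injectable so the parsing can be exercised without
    FanFicFare installed (hypothetical design, not the tool's API).
    """
    output = runner(page_url)
    # Assumption: anything that does not start with "http" is a notice
    # or warning rather than a story URL, so it is dropped.
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("http")
    ]
```

Passing a stand-in `runner` that returns canned text makes it possible to check the parsing without any network access.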
### 2. Download Stories from File (`download`)

Downloads all stories listed in the provided file.

Usage:

```bash
literotica download <file>
```

Arguments:

- `<file>`: Path to the file containing URLs of the stories to download.

Example:

```bash
literotica download extracted_urls.txt
```

In this example:

- `extracted_urls.txt` is the file containing URLs to download.

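
A download loop of this kind might look like the sketch below. It is illustrative only: it assumes the `fanficfare` command-line program is available and that running `fanficfare <url>` downloads one story. The tool itself reports progress with `tqdm`; to stay within the standard library, this sketch prints a plain `[n/total]` counter instead. The names `download_story` and `download_all` are hypothetical.

```python
import subprocess
import sys
from typing import Callable, Iterable, List


def download_story(url: str) -> bool:
    """Download one story with the `fanficfare` CLI; return True on success."""
    completed = subprocess.run(["fanficfare", url])
    return completed.returncode == 0


def download_all(
    urls: Iterable[str],
    downloader: Callable[[str], bool] = download_story,
) -> List[str]:
    """Download every URL in order and return the URLs that failed."""
    url_list = list(urls)
    failed = []
    for index, url in enumerate(url_list, start=1):
        # Plain progress counter on stderr (stand-in for a tqdm bar).
        print("[{}/{}] {}".format(index, len(url_list), url), file=sys.stderr)
        if not downloader(url):
            failed.append(url)
    return failed
```

Returning the failed URLs rather than stopping at the first error lets a long batch finish and makes it easy to retry only what is missing.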
## Example Workflow

### Extract URLs and Save to a File

```bash
literotica url urls.txt --o extracted_urls.txt
```

This command extracts URLs from `urls.txt` and saves them in `extracted_urls.txt`.

### Download Stories from Extracted URLs

```bash
literotica download extracted_urls.txt
```

Downloads all stories from the URLs listed in `extracted_urls.txt`.

## Code Structure

- `extract_urls_from_file`: Reads and cleans URL list from a file.
- `fff_url_extractor`: Calls FanFicFare to list all stories for a given URL.
- `download_url_from_file`: Initiates story download from URLs in a file with progress tracking using tqdm.
- `prettify_url`: Formats URLs for better readability.
- `save_to_file`: Saves processed URLs to a file, appending to an existing file or creating a new one.

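
As a rough illustration of the two file helpers, here is a minimal sketch using only the standard library. The function names match the list above, but the bodies are assumptions: the README does not say what "cleans" means, so this version strips whitespace, skips blank lines and lines starting with `#`, and drops duplicates while preserving order.

```python
from pathlib import Path
from typing import Iterable, List


def extract_urls_from_file(path: str) -> List[str]:
    """Return the cleaned, de-duplicated URLs listed in `path`."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Assumed cleaning rules: ignore blank lines and '#' comments.
        if not url or url.startswith("#"):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def save_to_file(urls: Iterable[str], path: str) -> None:
    """Append URLs to `path`, one per line, creating the file if needed."""
    # Mode "a" appends to an existing file or creates a new one.
    with open(path, "a", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
```

Because `save_to_file` appends, calling it twice with the same output path accumulates URLs rather than overwriting earlier results.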
## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request.

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/username/my_project",
    "name": "lit-extractor",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "extractor, lit-extractor",
    "author": "Munish chandra jha",
    "author_email": "mcj130101@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/45/e7/844f3e619b00c5cd438c25d0afe3eb6dfdda4b9226dbd66b79ce99d4deeb/lit_extractor-0.2.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A mini script to read a list of url and extract all the url's present in the webpage",
    "version": "0.2.6",
    "project_urls": {
        "Homepage": "https://github.com/username/my_project"
    },
    "split_keywords": [
        "extractor",
        " lit-extractor"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8ceaaf042bdf5bb759ec803335d187aec51dfcf53e134e0219d3f2a7ccbdc9af",
                "md5": "3a44b8e656f1bc24af662eb7c649c16f",
                "sha256": "8e4a150b0c2e3b4d973f82f4c0d74c7e5b1ea7163308b5a1f5314e466f17e188"
            },
            "downloads": -1,
            "filename": "lit_extractor-0.2.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3a44b8e656f1bc24af662eb7c649c16f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 5713,
            "upload_time": "2024-11-09T16:06:38",
            "upload_time_iso_8601": "2024-11-09T16:06:38.106055Z",
            "url": "https://files.pythonhosted.org/packages/8c/ea/af042bdf5bb759ec803335d187aec51dfcf53e134e0219d3f2a7ccbdc9af/lit_extractor-0.2.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "45e7844f3e619b00c5cd438c25d0afe3eb6dfdda4b9226dbd66b79ce99d4deeb",
                "md5": "612733ce7eaf1195cfa339e479ee40cb",
                "sha256": "810aac790f69294c470fb6ea28be39d675133975a2cf7220c97e2d60ad0bed94"
            },
            "downloads": -1,
            "filename": "lit_extractor-0.2.6.tar.gz",
            "has_sig": false,
            "md5_digest": "612733ce7eaf1195cfa339e479ee40cb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 5220,
            "upload_time": "2024-11-09T16:06:39",
            "upload_time_iso_8601": "2024-11-09T16:06:39.145119Z",
            "url": "https://files.pythonhosted.org/packages/45/e7/844f3e619b00c5cd438c25d0afe3eb6dfdda4b9226dbd66b79ce99d4deeb/lit_extractor-0.2.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-09 16:06:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "username",
    "github_project": "my_project",
    "github_not_found": true,
    "lcname": "lit-extractor"
}
        