# xsget

| Field | Value |
| --- | --- |
| Name | xsget |
| Version | 0.1.13 |
| Upload time | 2024-03-31 15:37:17 |
| Requires Python | >=3.8 |

Console tools to download online novels and convert them to text files.

## Installation

Stable version from PyPI using `pipx`:

```console
pipx install xsget playwright
playwright install
```

Stable version from PyPI using `pip`:

```console
python3 -m pip install xsget playwright
playwright install
```

Upgrade to latest stable version:

```console
python3 -m pip install xsget --upgrade
```

Latest development version from GitHub:

```console
python3 -m pip install -e "git+https://github.com/kianmeng/xsget.git#egg=xsget"
playwright install
```

## xsget

```console
xsget -h
```

```console
usage: xsget [-l CSS_PATH] [-p URL_PARAM] [-g [FILENAME] | -c [FILENAME]] [-r]
             [-t] [-b] [-bs SESSION] [-bd DELAY] [-q] [--env] [-d] [-h] [-V]
             URL

xsget is a console app that crawls and downloads online novels.

  website: https://github.com/kianmeng/xsget
  changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
  issues: https://github.com/kianmeng/xsget/issues

positional arguments:
  URL   set url of the index page to crawl

optional arguments:
  -l CSS_PATH, --link-css-path CSS_PATH
        set css path of the link to a chapter (default: 'a')
  -p URL_PARAM, -url-param-as-filename URL_PARAM
        use url param key as filename (default: '')
  -g [FILENAME], --generate-config-file [FILENAME]
        generate config file from options (default: 'xsget.toml')
  -c [FILENAME], --config-file [FILENAME]
        load config from file (default: 'xsget.toml')
  -r, --refresh
        refresh the index page
  -t, --test
        show extracted urls without crawling
  -b, --browser
        crawl by actual browser (default: 'False')
  -bs SESSION, --browser-session SESSION
        set the number of browser sessions (default: 2)
  -bd DELAY, --browser-delay DELAY
        set the number of seconds to wait for a page to load in the browser (default: 0)
  -q, --quiet
        suppress all logging
  --env
        print environment information for bug reporting
  -d, --debug
        show debugging log and stacktrace
  -h, --help
        show this help message and exit
  -V, --version
        show program's version number and exit

examples:
  xsget http://localhost
  xsget http://localhost/page[1-100].html
  xsget -g -l "a" -p "id" http://localhost
```
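The `page[1-100].html` example and the `-p` option suggest two small ideas: expanding a numeric range in a URL, and naming a saved file after a query-string value. The following is a minimal sketch of those ideas, not xsget's actual implementation. The function names, the inclusive range, and the `.html` suffix are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical pattern for a "[start-end]" numeric range inside a URL.
RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_url_range(url):
    """Expand the first "[start-end]" range in a URL into a list of URLs.

    The range is treated as inclusive (an assumption). A URL without a
    range is returned unchanged as a single-item list.
    """
    match = RANGE_RE.search(url)
    if match is None:
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    return [
        url[: match.start()] + str(number) + url[match.end():]
        for number in range(start, end + 1)
    ]


def filename_from_url_param(url, key, suffix=".html"):
    """Build a filename from the value of a query-string key.

    Returns None when the key is absent from the URL.
    """
    values = parse_qs(urlparse(url).query).get(key)
    if not values:
        return None
    return values[0] + suffix


if __name__ == "__main__":
    print(expand_url_range("http://localhost/page[1-3].html"))
    print(filename_from_url_param("http://localhost/read?id=42", "id"))
```

With `-t`, xsget itself shows the extracted URLs without crawling, which is the reliable way to check what a given pattern produces.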

## xstxt

```console
xstxt -h
```

```console
usage: xstxt [-pt CSS_PATH] [-pb CSS_PATH] [-la LANGUAGE] [-ps SEPARATOR]
             [-rh REGEX REGEX] [-rt REGEX REGEX] [-bt TITLE] [-ba AUTHOR]
             [-ic INDENT_CHARS] [-fw] [-oi] [-ow] [-i GLOB_PATTERN]
             [-e GLOB_PATTERN] [-l TOTAL_FILES] [-w WIDTH] [-o FILENAME]
             [-od OUTPUT_DIR] [-y] [-p] [-g [FILENAME] | -c [FILENAME]] [-m]
             [-q] [--env] [-d] [-h] [-V]

xstxt is a console app that extracts content from HTML files into a text file.

  website: https://github.com/kianmeng/xsget
  changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
  issues: https://github.com/kianmeng/xsget/issues

optional arguments:
  -pt CSS_PATH, --title-css-path CSS_PATH
        set css path of chapter title (default: 'title')
  -pb CSS_PATH, --body-css-path CSS_PATH
        set css path of chapter body (default: 'body')
  -la LANGUAGE, --language LANGUAGE
        language of the ebook (default: 'zh')
  -ps SEPARATOR, --paragraph-separator SEPARATOR
        set paragraph separator (default: '\n\n')
  -rh REGEX REGEX, --html-replace REGEX REGEX
        set regex to replace word or phrase in html file
  -rt REGEX REGEX, --txt-replace REGEX REGEX
        set regex to replace word or phrase in txt file
  -bt TITLE, --book-title TITLE
        set title of the novel (default: '不详')
  -ba AUTHOR, --book-author AUTHOR
        set author of the novel (default: '不详')
  -ic INDENT_CHARS, --indent-chars INDENT_CHARS
        set indent characters for a paragraph (default: '')
  -fw, --fullwidth
        convert ASCII characters from halfwidth to fullwidth (default: 'False')
  -oi, --output-individual-file
        convert each html file into own txt file
  -ow, --overwrite
        overwrite output file
  -i GLOB_PATTERN, --input GLOB_PATTERN
        set glob pattern of html files to process (default: '['./*.html']')
  -e GLOB_PATTERN, --exclude GLOB_PATTERN
        set glob pattern of html files to exclude (default: '[]')
  -l TOTAL_FILES, --limit TOTAL_FILES
        set number of html files to process (default: '3')
  -w WIDTH, --width WIDTH
        set the line width for wrapping (default: 0, 0 to disable)
  -o FILENAME, --output FILENAME
        set output txt file name (default: 'book.txt')
  -od OUTPUT_DIR, --output-dir OUTPUT_DIR
        set output directory (default: 'output')
  -y, --yes
        yes to prompt
  -p, --purge
        remove extracted files specified by --output-dir option (default: 'False')
  -g [FILENAME], --generate-config-file [FILENAME]
        generate config file from options (default: 'xstxt.toml')
  -c [FILENAME], --config-file [FILENAME]
        load config from file (default: 'xstxt.toml')
  -m, --monitor
        monitor config file changes and re-run when needed
  -q, --quiet
        suppress all logging
  --env
        print environment information for bug reporting
  -d, --debug
        show debugging log and stacktrace
  -h, --help
        show this help message and exit
  -V, --version
        show program's version number and exit

examples:
  xstxt -g
  xstxt --input *.html
  xstxt --output-individual-file --input *.html
  xstxt --config --monitor
```
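Two of the options above describe text transformations that are easy to picture in code: `-fw` (halfwidth to fullwidth) and `-rt` (regex replacement given as a pattern and replacement pair). The following is a rough sketch of both, assuming the standard Unicode fullwidth block mapping. It is not taken from xstxt's source, and the function names and the handling of the space character are assumptions.

```python
import re

# Offset between ASCII "!" (U+0021) and fullwidth "！" (U+FF01).
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text):
    """Map printable ASCII characters to their fullwidth forms.

    The ASCII space is mapped to the ideographic space U+3000; whether
    xstxt does the same is an assumption.
    """
    table = {code: code + FULLWIDTH_OFFSET for code in range(0x21, 0x7F)}
    table[0x20] = 0x3000
    return text.translate(table)


def apply_replacements(text, pairs):
    """Apply (pattern, replacement) regex pairs to the text in order."""
    for pattern, replacement in pairs:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(to_fullwidth("Ab1!"))
    print(apply_replacements("foo   bar", [(r"\s+", " ")]))
```

Because the pairs are applied in order, a later pattern sees the result of the earlier ones, so the order of the pairs matters when patterns overlap.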

## Copyright and License

Copyright (C) 2021,2022,2023,2024 Kian-Meng Ang

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along
with this program. If not, see <https://www.gnu.org/licenses/>.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "xsget",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "\"Kian-Meng, Ang\" <kianmeng@cpan.org>",
    "download_url": "https://files.pythonhosted.org/packages/38/d0/18b43ef600d1c04ba2be43b6736f90ecad3f0bf979e2a1505c3086feb0d5/xsget-0.1.13.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Console tools to download online novel and convert to text file.",
    "version": "0.1.13",
    "project_urls": {
        "Changelog": "https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md",
        "Issues": "https://github.com/kianmeng/xsget/issues",
        "Source": "https://github.com/kianmeng/xsget"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "33595b7a877dbe82bad3b60617f071e7f50ed37eebda2f0ac23b6bb1013d2934",
                "md5": "05c7cd357431db3663cfc13642595e3d",
                "sha256": "c92336992866446e5819b65ce852668cf4ead62c86a027a279a13ab19b50617c"
            },
            "downloads": -1,
            "filename": "xsget-0.1.13-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "05c7cd357431db3663cfc13642595e3d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 32083,
            "upload_time": "2024-03-31T15:37:14",
            "upload_time_iso_8601": "2024-03-31T15:37:14.697287Z",
            "url": "https://files.pythonhosted.org/packages/33/59/5b7a877dbe82bad3b60617f071e7f50ed37eebda2f0ac23b6bb1013d2934/xsget-0.1.13-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "38d018b43ef600d1c04ba2be43b6736f90ecad3f0bf979e2a1505c3086feb0d5",
                "md5": "85e1a04861c4bb810214a10b6b52c6e9",
                "sha256": "a84f5fb7a3a99f1d744f566a87ccae38cb959f3b80be0d0914af4d7da9779f06"
            },
            "downloads": -1,
            "filename": "xsget-0.1.13.tar.gz",
            "has_sig": false,
            "md5_digest": "85e1a04861c4bb810214a10b6b52c6e9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 183104,
            "upload_time": "2024-03-31T15:37:17",
            "upload_time_iso_8601": "2024-03-31T15:37:17.828635Z",
            "url": "https://files.pythonhosted.org/packages/38/d0/18b43ef600d1c04ba2be43b6736f90ecad3f0bf979e2a1505c3086feb0d5/xsget-0.1.13.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-31 15:37:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "kianmeng",
    "github_project": "xsget",
    "github_not_found": true,
    "lcname": "xsget"
}
        