Name | xsget |
Version | 0.1.27 |
home_page | None |
Summary | Console tools to download online novel and convert to text file. |
upload_time | 2025-02-16 14:31:13 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | None |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# xsget
Console tools to download online novels and convert them to text files.
## Installation
Stable version from PyPI using `pipx`:
```console
pipx install xsget playwright
playwright install
```
Stable version from PyPI using `pip`:
```console
python3 -m pip install xsget playwright
playwright install
```
Upgrade to the latest stable version:
```console
python3 -m pip install xsget --upgrade
```
Latest development version from GitHub:
```console
python3 -m pip install -e "git+https://github.com/kianmeng/xsget.git#egg=xsget"
playwright install
```
## xsget
```console
xsget -h
```
<!--help-xsget !-->
```console
usage: xsget [-l CSS_PATH] [-p URL_PARAM] [-g [FILENAME] | -c [FILENAME]] [-r]
             [-t] [-b] [-bs SESSION] [-bd DELAY] [-od OUTPUT_DIR] [-q] [-e]
             [-d] [-h] [-V]
             URL

xsget is a console app that crawl and download online novel.

website: https://github.com/kianmeng/xsget
changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
issues: https://github.com/kianmeng/xsget/issues

positional arguments:
  URL                   set url of the index page to crawl

options:
  -l, --link-css-path CSS_PATH
                        set css path of the link to a chapter (default: 'a')
  -p, -url-param-as-filename URL_PARAM
                        use url param key as filename (default: '')
  -g, --generate-config-file [FILENAME]
                        generate config file from options (default: 'xsget.toml')
  -c, --config-file [FILENAME]
                        load config from file (default: 'xsget.toml')
  -r, --refresh
                        refresh the index page
  -t, --test
                        show extracted urls without crawling
  -b, --browser
                        crawl by actual browser (default: 'False')
  -bs, --browser-session SESSION
                        set the number of browser session (default: 2)
  -bd, --browser-delay DELAY
                        set the second to wait for page to load in browser (default: 0)
  -od, --output-dir OUTPUT_DIR
                        set default output folder (default: 'output')
  -q, --quiet
                        suppress all logging
  -e, --env
                        print environment information for bug reporting
  -d, --debug
                        show debugging log and stacktrace
  -h, --help
                        show this help message and exit
  -V, --version
                        show program's version number and exit

examples:
  xsget http://localhost
  xsget http://localhost/page[1-100].html
  xsget -g -l "a" -p "id" http://localhost
```
<!--help-xsget !-->
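The `xsget http://localhost/page[1-100].html` example suggests that a bracketed numeric range in the URL stands for one URL per page. The Python sketch below illustrates that idea only; `expand_url_range` is a hypothetical helper and the inclusive bounds are an assumption, not a description of how xsget implements it.

```python
import re

# Hypothetical helper for illustration; not part of xsget's API.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_url_range(url: str) -> list[str]:
    """Expand the first `[start-end]` in `url` into one URL per number."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        # No range present: the URL stands for itself.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[: match.start()], url[match.end():]
    # Assumption: both bounds are inclusive.
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


print(expand_url_range("http://localhost/page[1-3].html"))
```

To preview which URLs xsget itself would crawl, the `-t, --test` option listed above shows the extracted URLs without crawling.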
## xstxt
```console
xstxt -h
```
<!--help-xstxt !-->
```console
usage: xstxt [-pt CSS_PATH] [-pb CSS_PATH] [-la LANGUAGE] [-ps SEPARATOR]
             [-rh REGEX REGEX] [-rt REGEX REGEX] [-bt TITLE] [-ba AUTHOR]
             [-ic INDENT_CHARS] [-fw] [-oi] [-ow] [-i GLOB_PATTERN]
             [-e GLOB_PATTERN] [-l TOTAL_FILES] [-w WIDTH] [-o FILENAME]
             [-od OUTPUT_DIR] [-y] [-p] [-g [FILENAME] | -c [FILENAME]] [-m]
             [-q] [--env] [-d] [-h] [-V]

xstxt is a console app that extract content from HTML to text file.

website: https://github.com/kianmeng/xsget
changelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md
issues: https://github.com/kianmeng/xsget/issues

options:
  -pt, --title-css-path CSS_PATH
                        set css path of chapter title (default: 'title')
  -pb, --body-css-path CSS_PATH
                        set css path of chapter body (default: 'body')
  -la, --language LANGUAGE
                        language of the ebook (default: 'zh')
  -ps, --paragraph-separator SEPARATOR
                        set paragraph separator (default: '\n\n')
  -rh, --html-replace REGEX REGEX
                        set regex to replace word or pharase in html file
  -rt, --txt-replace REGEX REGEX
                        set regex to replace word or pharase in txt file
  -bt, --book-title TITLE
                        set title of the novel (default: '不详')
  -ba, --book-author AUTHOR
                        set author of the novel (default: '不详')
  -ic, --indent-chars INDENT_CHARS
                        set indent characters for a paragraph (default: '')
  -fw, --fullwidth
                        convert ASCII character to from halfwidth to fullwidth (default: 'False')
  -oi, --output-individual-file
                        convert each html file into own txt file
  -ow, --overwrite
                        overwrite output file
  -i, --input GLOB_PATTERN
                        set glob pattern of html files to process (default: '['./*.html']')
  -e, --exclude GLOB_PATTERN
                        set glob pattern of html files to exclude (default: '[]')
  -l, --limit TOTAL_FILES
                        set number of html files to process (default: '3')
  -w, --width WIDTH
                        set the line width for wrapping (default: 0, 0 to disable)
  -o, --output FILENAME
                        set output txt file name (default: 'book.txt')
  -od, --output-dir OUTPUT_DIR
                        set output directory (default: 'output')
  -y, --yes
                        yes to prompt
  -p, --purge
                        remove extracted files specified by --output-folder option (default: 'False')
  -g, --generate-config-file [FILENAME]
                        generate config file from options (default: 'xstxt.toml')
  -c, --config-file [FILENAME]
                        load config from file (default: 'xstxt.toml')
  -m, --monitor
                        monitor config file changes and re-run when needed
  -q, --quiet
                        suppress all logging
  --env
                        print environment information for bug reporting
  -d, --debug
                        show debugging log and stacktrace
  -h, --help
                        show this help message and exit
  -V, --version
                        show program's version number and exit

examples:
  xsget -g
  xstxt --input *.html
  xstxt --output-individual-file --input *.html
  xstxt --config --monitor
```
<!--help-xstxt !-->
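The `-fw, --fullwidth` option converts halfwidth ASCII characters to fullwidth. As a rough illustration of what such a conversion involves, the Python sketch below shifts printable ASCII into the Unicode fullwidth forms. `to_fullwidth` is a hypothetical helper, and mapping the space to the ideographic space is an assumption; xstxt's own behavior may differ.

```python
# Hypothetical helper for illustration; not xstxt's actual code.
FULLWIDTH_OFFSET = 0xFEE0  # distance from "!" (U+0021) to its fullwidth form (U+FF01)


def to_fullwidth(text: str) -> str:
    """Map printable ASCII characters to their fullwidth Unicode counterparts."""
    converted = []
    for char in text:
        code = ord(char)
        if char == " ":
            # Assumption: a plain space becomes the ideographic space.
            converted.append("\u3000")
        elif 0x21 <= code <= 0x7E:
            converted.append(chr(code + FULLWIDTH_OFFSET))
        else:
            # Leave non-ASCII text (for example Chinese) untouched.
            converted.append(char)
    return "".join(converted)


print(to_fullwidth("Chapter 1"))
```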
## Copyright and License
Copyright (C) 2021,2022,2023,2024,2025 Kian-Meng Ang
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Affero General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along
with this program. If not, see <https://www.gnu.org/licenses/>.
Raw data
{
"_id": null,
"home_page": null,
"name": "xsget",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "\"Kian-Meng, Ang\" <kianmeng@cpan.org>",
"download_url": "https://files.pythonhosted.org/packages/96/9b/f857d2c89f57e9ccbfdfc18887e54bea48856f68decce564273f4ef7b0b8/xsget-0.1.27.tar.gz",
"platform": null,
"description": "# xsget\n\nConsole tools to download online novel and convert to text file.\n\n## Installation\n\nStable version From PyPI using `pipx`:\n\n```console\npipx install xsget playwright\nplaywright install\n```\n\nStable version From PyPI using `pip`:\n\n```console\npython3 -m pip install xsget playwright\nplaywright install\n```\n\nUpgrade to latest stable version:\n\n```console\npython3 -m pip install xsget --upgrade\n```\n\nLatest development version from GitHub:\n\n```console\npython3 -m pip install -e git+https://github.com/kianmeng/xsget.git\nplaywright install\n```\n\n## xsget\n\n```console\nxsget -h\n```\n\n<!--help-xsget !-->\n\n```console\nusage: xsget [-l CSS_PATH] [-p URL_PARAM] [-g [FILENAME] | -c [FILENAME]] [-r]\n [-t] [-b] [-bs SESSION] [-bd DELAY] [-od OUTPUT_DIR] [-q] [-e]\n [-d] [-h] [-V]\n URL\n\nxsget is a console app that crawl and download online novel.\n\nwebsite: https://github.com/kianmeng/xsget\nchangelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md\nissues: https://github.com/kianmeng/xsget/issues\n\npositional arguments:\n URL set url of the index page to crawl\n\noptions:\n -l, --link-css-path CSS_PATH\n set css path of the link to a chapter (default: 'a')\n -p, -url-param-as-filename URL_PARAM\n use url param key as filename (default: '')\n -g, --generate-config-file [FILENAME]\n generate config file from options (default: 'xsget.toml')\n -c, --config-file [FILENAME]\n load config from file (default: 'xsget.toml')\n -r, --refresh\n refresh the index page\n -t, --test\n show extracted urls without crawling\n -b, --browser\n crawl by actual browser (default: 'False')\n -bs, --browser-session SESSION\n set the number of browser session (default: 2)\n -bd, --browser-delay DELAY\n set the second to wait for page to load in browser (default: 0)\n -od, --output-dir OUTPUT_DIR\n set default output folder (default: 'output')\n -q, --quiet\n suppress all logging\n -e, --env\n print environment information for bug 
reporting\n -d, --debug\n show debugging log and stacktrace\n -h, --help\n show this help message and exit\n -V, --version\n show program's version number and exit\n\nexamples:\n xsget http://localhost\n xsget http://localhost/page[1-100].html\n xsget -g -l \"a\" -p \"id\" http://localhost\n```\n\n<!--help-xsget !-->\n\n## xstxt\n\n```console\nxstxt -h\n```\n\n<!--help-xstxt !-->\n\n```console\nusage: xstxt [-pt CSS_PATH] [-pb CSS_PATH] [-la LANGUAGE] [-ps SEPARATOR]\n [-rh REGEX REGEX] [-rt REGEX REGEX] [-bt TITLE] [-ba AUTHOR]\n [-ic INDENT_CHARS] [-fw] [-oi] [-ow] [-i GLOB_PATTERN]\n [-e GLOB_PATTERN] [-l TOTAL_FILES] [-w WIDTH] [-o FILENAME]\n [-od OUTPUT_DIR] [-y] [-p] [-g [FILENAME] | -c [FILENAME]] [-m]\n [-q] [--env] [-d] [-h] [-V]\n\nxstxt is a console app that extract content from HTML to text file.\n\nwebsite: https://github.com/kianmeng/xsget\nchangelog: https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md\nissues: https://github.com/kianmeng/xsget/issues\n\noptions:\n -pt, --title-css-path CSS_PATH\n set css path of chapter title (default: 'title')\n -pb, --body-css-path CSS_PATH\n set css path of chapter body (default: 'body')\n -la, --language LANGUAGE\n language of the ebook (default: 'zh')\n -ps, --paragraph-separator SEPARATOR\n set paragraph separator (default: '\\n\\n')\n -rh, --html-replace REGEX REGEX\n set regex to replace word or pharase in html file\n -rt, --txt-replace REGEX REGEX\n set regex to replace word or pharase in txt file\n -bt, --book-title TITLE\n set title of the novel (default: '\u4e0d\u8be6')\n -ba, --book-author AUTHOR\n set author of the novel (default: '\u4e0d\u8be6')\n -ic, --indent-chars INDENT_CHARS\n set indent characters for a paragraph (default: '')\n -fw, --fullwidth\n convert ASCII character to from halfwidth to fullwidth (default: 'False')\n -oi, --output-individual-file\n convert each html file into own txt file\n -ow, --overwrite\n overwrite output file\n -i, --input GLOB_PATTERN\n set glob pattern of html 
files to process (default: '['./*.html']')\n -e, --exclude GLOB_PATTERN\n set glob pattern of html files to exclude (default: '[]')\n -l, --limit TOTAL_FILES\n set number of html files to process (default: '3')\n -w, --width WIDTH\n set the line width for wrapping (default: 0, 0 to disable)\n -o, --output FILENAME\n set output txt file name (default: 'book.txt')\n -od, --output-dir OUTPUT_DIR\n set output directory (default: 'output')\n -y, --yes\n yes to prompt\n -p, --purge\n remove extracted files specified by --output-folder option (default: 'False')\n -g, --generate-config-file [FILENAME]\n generate config file from options (default: 'xstxt.toml')\n -c, --config-file [FILENAME]\n load config from file (default: 'xstxt.toml')\n -m, --monitor\n monitor config file changes and re-run when needed\n -q, --quiet\n suppress all logging\n --env\n print environment information for bug reporting\n -d, --debug\n show debugging log and stacktrace\n -h, --help\n show this help message and exit\n -V, --version\n show program's version number and exit\n\nexamples:\n xsget -g\n xstxt --input *.html\n xstxt --output-individual-file --input *.html\n xstxt --config --monitor\n```\n\n<!--help-xstxt !-->\n\n## Copyright and License\n\nCopyright (C) 2021,2022,2023,2024,2025 Kian-Meng Ang\n\nThis program is free software: you can redistribute it and/or modify it under\nthe terms of the GNU Affero General Public License as published by the Free\nSoftware Foundation, either version 3 of the License, or (at your option) any\nlater version.\n\nThis program is distributed in the hope that it will be useful, but WITHOUT ANY\nWARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A\nPARTICULAR PURPOSE. See the GNU Affero General Public License for more details.\n\nYou should have received a copy of the GNU Affero General Public License along\nwith this program. If not, see <https://www.gnu.org/licenses/>.\n",
"bugtrack_url": null,
"license": null,
"summary": "Console tools to download online novel and convert to text file.",
"version": "0.1.27",
"project_urls": {
"Changelog": "https://github.com/kianmeng/xsget/blob/master/CHANGELOG.md",
"Issues": "https://github.com/kianmeng/xsget/issues",
"Source": "https://github.com/kianmeng/xsget"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "8cc04cd8655a856e6915fb957ec91c2530a4bda28963569db896a1b4bee61f84",
"md5": "e58863941ba421dd0aed2fdc4296ac2b",
"sha256": "0c37905902060e504fd569c4aa7344956c384072ff6db0bfaba0577d9d8f5b6b"
},
"downloads": -1,
"filename": "xsget-0.1.27-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e58863941ba421dd0aed2fdc4296ac2b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 33192,
"upload_time": "2025-02-16T14:31:08",
"upload_time_iso_8601": "2025-02-16T14:31:08.907871Z",
"url": "https://files.pythonhosted.org/packages/8c/c0/4cd8655a856e6915fb957ec91c2530a4bda28963569db896a1b4bee61f84/xsget-0.1.27-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "969bf857d2c89f57e9ccbfdfc18887e54bea48856f68decce564273f4ef7b0b8",
"md5": "6f93c1aea110aa7eeafc24a2da205512",
"sha256": "e924c6037afd309fe9c8b9bf8aae0a90dd0559476c16540eaa6b48a71774ae39"
},
"downloads": -1,
"filename": "xsget-0.1.27.tar.gz",
"has_sig": false,
"md5_digest": "6f93c1aea110aa7eeafc24a2da205512",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 201492,
"upload_time": "2025-02-16T14:31:13",
"upload_time_iso_8601": "2025-02-16T14:31:13.138044Z",
"url": "https://files.pythonhosted.org/packages/96/9b/f857d2c89f57e9ccbfdfc18887e54bea48856f68decce564273f4ef7b0b8/xsget-0.1.27.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-16 14:31:13",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "kianmeng",
"github_project": "xsget",
"github_not_found": true,
"lcname": "xsget"
}
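The `digests` entries in the raw data can be used to check a downloaded archive. The Python sketch below compares a local file against the `sha256` value recorded for `xsget-0.1.27.tar.gz`; the function names are hypothetical, and the file location is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA-256 digest of xsget-0.1.27.tar.gz, copied from the raw data above.
EXPECTED_SHA256 = "e924c6037afd309fe9c8b9bf8aae0a90dd0559476c16540eaa6b48a71774ae39"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Check whether the file's digest equals the recorded one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected(Path("xsget-0.1.27.tar.gz"))` returns `True` only when the archive in the current directory matches the recorded digest.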