Sotoki
======
`Sotoki` (*Stack Overflow to Kiwix*) is an
[openZIM](https://github.com/openzim) scraper to create offline
versions of [Stack Exchange](https://stackexchange.com) websites such
as [Stack Overflow](https://stackoverflow.com/).
It is based on Stack Exchange's Data Dumps hosted by [The Internet
Archive](https://archive.org/download/stackexchange/).
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/sotoki/badge)](https://www.codefactor.io/repository/github/openzim/sotoki)
[![Docker](https://ghcr-badge.deta.dev/openzim/sotoki/latest_tag?label=docker)](https://ghcr.io/openzim/sotoki)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/sotoki.svg)](https://pypi.org/project/sotoki/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sotoki.svg)](https://pypi.org/project/sotoki)
## Usage
`Sotoki` works off a `domain` that you must provide: the domain name of
the Stack Exchange website you want to scrape. Run `sotoki --list-all`
to list the available domains.
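
The same listing can be fetched from a Python script by shelling out to the CLI. This is a minimal sketch: only the `--list-all` flag comes from the text above, the helper name `list_domains` is hypothetical, and the output is returned unparsed because its format is not described here.

```python
import shutil
import subprocess


def list_domains(executable="sotoki"):
    """Return the raw output of `<executable> --list-all`.

    Returns None when the executable is not on PATH. `list_domains` is a
    hypothetical helper, not part of sotoki itself.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # The output format is not documented here, so it is returned as-is.
    result = subprocess.run(
        [path, "--list-all"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = list_domains()
    print(output if output is not None else "sotoki is not installed")
```

Pass the path of the in-virtualenv binary (for example `./env/bin/sotoki`) as `executable` if it is not on your `PATH`.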
### Docker
```bash
docker run -v my_dir:/output ghcr.io/openzim/sotoki sotoki --help
```
### Installation
`sotoki` is Python 3 software. If you are not using the
[Docker](https://ghcr.io/openzim/sotoki/) image, you are advised to install it in a
virtual environment to avoid installing its dependencies system-wide.
```sh
python3 -m venv ./env # creates a virtual python environment in ./env folder
./env/bin/pip install -U pip # upgrade pip (package manager). recommended
./env/bin/pip install -U sotoki # install/upgrade sotoki inside virtualenv
# direct access to in-virtualenv sotoki binary, without shell-attachment
./env/bin/sotoki --help
# alias or link it for convenience
sudo ln -s "$(pwd)/env/bin/sotoki" /usr/local/bin/
# alternatively, attach virtualenv to shell
source env/bin/activate
sotoki --help
deactivate # unloads virtualenv from shell
```
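
To confirm that a script is really running inside the virtual environment rather than the system interpreter, a small check like the following may help. It is a sketch that assumes the environment was created with `python3 -m venv` as above; the function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active" if in_virtualenv() else "system interpreter")
```

Running it with `./env/bin/python` and then with the system `python3` should show the difference.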
## Developers
Anybody is welcome to improve Sotoki.
To run Sotoki off the git repository, you'll need to download a few
external dependencies that we pack in Python releases. Just run
`python src/sotoki/dependencies.py`.
See `requirements.txt` for the list of Python dependencies.
## Users
You don't have to build your own ZIM files of Stack Exchange
websites: updated ZIM files are built regularly for all
of them. Download them from
https://library.kiwix.org/?category=stack_exchange.
## Raw data
{
"_id": null,
"home_page": "https://github.com/openzim/sotoki",
"name": "sotoki",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "kiwix zim offline stackechange stackoverflow",
"author": "Kiwix",
"author_email": "contact+dev@kiwix.org",
"download_url": "https://files.pythonhosted.org/packages/7b/ae/bc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04/sotoki-2.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPLv3+",
"summary": "Turn StackExchange dumps into ZIM files for offline usage",
"version": "2.1.2",
"project_urls": {
"Homepage": "https://github.com/openzim/sotoki"
},
"split_keywords": [
"kiwix",
"zim",
"offline",
"stackechange",
"stackoverflow"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fb7eebedbdbe60b598678765c10e7c95676be346b2e277fa10e906d86a07798e",
"md5": "5c0883ba7f7a59b652a974abf25415cc",
"sha256": "3acb19f5d2919673003147485b2dfe86ff82f90a0aff9a2c42bfea1d601fc7d0"
},
"downloads": -1,
"filename": "sotoki-2.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5c0883ba7f7a59b652a974abf25415cc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 1685861,
"upload_time": "2024-05-13T09:26:45",
"upload_time_iso_8601": "2024-05-13T09:26:45.650635Z",
"url": "https://files.pythonhosted.org/packages/fb/7e/ebedbdbe60b598678765c10e7c95676be346b2e277fa10e906d86a07798e/sotoki-2.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7baebc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04",
"md5": "8d3d109a306f18f5c1f9457798f08aea",
"sha256": "db9be1040b8455045e01fe4b27d9e517266f8a58f5ad4c9edb2e16856ff6c763"
},
"downloads": -1,
"filename": "sotoki-2.1.2.tar.gz",
"has_sig": false,
"md5_digest": "8d3d109a306f18f5c1f9457798f08aea",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 1659045,
"upload_time": "2024-05-13T09:26:47",
"upload_time_iso_8601": "2024-05-13T09:26:47.224508Z",
"url": "https://files.pythonhosted.org/packages/7b/ae/bc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04/sotoki-2.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-13 09:26:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "openzim",
"github_project": "sotoki",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "kiwixstorage",
"specs": [
[
"<",
"1.0"
],
[
">=",
"0.8.1"
]
]
},
{
"name": "pif",
"specs": [
[
">=",
"0.8.2"
],
[
"<",
"0.9"
]
]
},
{
"name": "zimscraperlib",
"specs": [
[
">=",
"3.3.0"
],
[
"<",
"4.0"
]
]
},
{
"name": "xml_to_dict",
"specs": [
[
">=",
"0.1.6"
],
[
"<",
"0.2"
]
]
},
{
"name": "cli-formatter",
"specs": [
[
">=",
"1.2.0"
],
[
"<",
"1.3"
]
]
},
{
"name": "py7zr",
"specs": [
[
"<",
"0.21"
],
[
">=",
"0.20.4"
]
]
},
{
"name": "python-slugify",
"specs": [
[
">=",
"8.0.1"
],
[
"<",
"9.0.0"
]
]
},
{
"name": "jinja2",
"specs": [
[
"<",
"3.2"
],
[
">=",
"3.1.0"
]
]
},
{
"name": "redis",
"specs": [
[
"<",
"5.0"
],
[
"!=",
"4.5.2"
],
[
">=",
"4.5.1"
]
]
},
{
"name": "beautifulsoup4",
"specs": [
[
">=",
"4.9.3"
],
[
"<",
"5.0"
]
]
},
{
"name": "lxml",
"specs": [
[
"<",
"4.10"
],
[
">=",
"4.9.1"
]
]
},
{
"name": "jinja2-pluralize",
"specs": [
[
">=",
"0.3.0"
],
[
"<",
"0.4"
]
]
},
{
"name": "tld",
"specs": [
[
">=",
"0.13"
],
[
"<",
"0.14"
]
]
},
{
"name": "mistune",
"specs": [
[
">=",
"2.0.5"
],
[
"<",
"3.0.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"<",
"2.9"
],
[
">=",
"2.8.2"
]
]
},
{
"name": "psutil",
"specs": [
[
">=",
"5.9.4"
],
[
"<",
"6.0"
]
]
},
{
"name": "python-snappy",
"specs": [
[
">=",
"0.6.0"
],
[
"<",
"1.0"
]
]
},
{
"name": "bidict",
"specs": [
[
">=",
"0.22.1"
],
[
"<",
"0.23"
]
]
},
{
"name": "cchardet",
"specs": [
[
"<",
"2.2"
],
[
">=",
"2.1.7"
]
]
}
],
"lcname": "sotoki"
}
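
Each entry in the `requirements` list pairs a package name with `[operator, version]` specs. As a rough illustration of how such an entry maps onto a pip-style requirement string, the sketch below joins the pairs with commas. The helper name `to_specifier` is hypothetical and the order of the constraints simply follows the metadata.

```python
def to_specifier(requirement):
    """Join a {"name", "specs"} entry into a pip-style requirement string."""
    # Each spec is an [operator, version] pair, e.g. ["<", "5.0"].
    constraints = ",".join(op + version for op, version in requirement["specs"])
    return requirement["name"] + constraints


# Entry copied from the "requirements" list above.
redis_requirement = {
    "name": "redis",
    "specs": [["<", "5.0"], ["!=", "4.5.2"], [">=", "4.5.1"]],
}

print(to_specifier(redis_requirement))  # redis<5.0,!=4.5.2,>=4.5.1
```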