sotoki

- Name: sotoki
- Version: 2.1.2
- Home page: https://github.com/openzim/sotoki
- Summary: Turn StackExchange dumps into ZIM files for offline usage
- Upload time: 2024-05-13 09:26:47
- Author: Kiwix
- Requires Python: >=3.6
- License: GPLv3+
- Keywords: kiwix, zim, offline, stackechange, stackoverflow
- Requirements: kiwixstorage, pif, zimscraperlib, xml_to_dict, cli-formatter, py7zr, python-slugify, jinja2, redis, beautifulsoup4, lxml, jinja2-pluralize, tld, mistune, python-dateutil, psutil, python-snappy, bidict, cchardet

Sotoki
======

`Sotoki` (*Stack Overflow to Kiwix*) is an
[openZIM](https://github.com/openzim) scraper to create offline
versions of [Stack Exchange](https://stackexchange.com) websites such
as [Stack Overflow](https://stackoverflow.com/).

It is based on Stack Exchange's Data Dumps hosted by [The Internet
Archive](https://archive.org/download/stackexchange/).

[![CodeFactor](https://www.codefactor.io/repository/github/openzim/sotoki/badge)](https://www.codefactor.io/repository/github/openzim/sotoki)
[![Docker](https://ghcr-badge.deta.dev/openzim/sotoki/latest_tag?label=docker)](https://ghcr.io/openzim/sotoki)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/sotoki.svg)](https://pypi.org/project/sotoki/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sotoki.svg)](https://pypi.org/project/sotoki)

## Usage

`Sotoki` works off a `domain` that you must provide: the domain name
of the Stack Exchange website you want to scrape. Run
`sotoki --list-all` to get a list of the available domains.
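
The `domain` is a bare host name rather than a full URL. As a minimal sketch (the helper `normalize_domain` is hypothetical and not part of sotoki), the following reduces a pasted site address to the domain name before it is handed to the scraper:

```python
from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare host name to a lowercase domain name.

    Hypothetical helper for illustration only; sotoki does not ship it.
    """
    value = value.strip()
    # urlparse only recognises the host part when a "//" separator is present,
    # so add one for bare host names such as "stackoverflow.com".
    parsed = urlparse(value if "://" in value else f"//{value}")
    host = parsed.hostname  # already lowercased, without port or path
    if not host:
        raise ValueError(f"cannot extract a domain from {value!r}")
    return host


if __name__ == "__main__":
    print(normalize_domain("https://Sports.StackExchange.com/questions/1"))
```

This only tidies the input. It does not check that the result is an actual Stack Exchange site; `sotoki --list-all` remains the way to see which domains exist.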

### Docker

```bash
docker run -v my_dir:/output ghcr.io/openzim/sotoki sotoki --help
```

### Installation

`sotoki` is a Python 3 program. If you are not using the
[Docker](https://ghcr.io/openzim/sotoki/) image, you are advised to install it in a
virtual environment so that its dependencies are not installed
system-wide.

```sh
python3 -m venv ./env  # creates a virtual python environment in ./env folder
./env/bin/pip install -U pip  # upgrade pip (package manager). recommended
./env/bin/pip install -U sotoki  # install/upgrade sotoki inside virtualenv

# direct access to in-virtualenv sotoki binary, without shell-attachment
./env/bin/sotoki --help
# alias or link it for convenience
sudo ln -s "$(pwd)/env/bin/sotoki" /usr/local/bin/  # quoted in case the path contains spaces

# alternatively, attach virtualenv to shell
source env/bin/activate
sotoki --help
deactivate  # unloads virtualenv from shell
```
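
The direct-access approach above (calling the in-virtualenv binary without activating the environment) can also be used from a Python script. A minimal sketch, assuming the standard `venv` layout (`bin/` on POSIX, `Scripts/` on Windows); `venv_executable` and `sotoki_help_command` are illustrative names, not part of sotoki:

```python
import os
from pathlib import Path
from typing import List, Optional


def venv_executable(env_dir: str, name: str, windows: Optional[bool] = None) -> Path:
    """Return the path of an executable installed inside a virtual environment."""
    if windows is None:
        windows = os.name == "nt"
    env = Path(env_dir)
    if windows:
        # Windows virtual environments place executables under Scripts\
        return env / "Scripts" / f"{name}.exe"
    return env / "bin" / name


def sotoki_help_command(env_dir: str = "./env") -> List[str]:
    """Build the argument list equivalent to `./env/bin/sotoki --help`."""
    return [str(venv_executable(env_dir, "sotoki")), "--help"]


# Usage (requires sotoki to be installed in ./env):
#   import subprocess
#   subprocess.run(sotoki_help_command(), check=True)
```

Passing the command as a list avoids shell quoting issues when the environment path contains spaces.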

## Developers

Anyone is welcome to improve Sotoki.

To run Sotoki off the git repository, you'll need to download a few
external dependencies that we pack in Python releases. Just run
`python src/sotoki/dependencies.py`.

See `requirements.txt` for the list of Python dependencies.

## Users

You don't have to build your own ZIM files of the Stack Exchange
websites. Updated ZIM files are built on a regular basis for all
of them. Look at https://library.kiwix.org/?category=stack_exchange
to download them.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/openzim/sotoki",
    "name": "sotoki",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "kiwix zim offline stackechange stackoverflow",
    "author": "Kiwix",
    "author_email": "contact+dev@kiwix.org",
    "download_url": "https://files.pythonhosted.org/packages/7b/ae/bc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04/sotoki-2.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPLv3+",
    "summary": "Turn StackExchange dumps into ZIM files for offline usage",
    "version": "2.1.2",
    "project_urls": {
        "Homepage": "https://github.com/openzim/sotoki"
    },
    "split_keywords": [
        "kiwix",
        "zim",
        "offline",
        "stackechange",
        "stackoverflow"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fb7eebedbdbe60b598678765c10e7c95676be346b2e277fa10e906d86a07798e",
                "md5": "5c0883ba7f7a59b652a974abf25415cc",
                "sha256": "3acb19f5d2919673003147485b2dfe86ff82f90a0aff9a2c42bfea1d601fc7d0"
            },
            "downloads": -1,
            "filename": "sotoki-2.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5c0883ba7f7a59b652a974abf25415cc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 1685861,
            "upload_time": "2024-05-13T09:26:45",
            "upload_time_iso_8601": "2024-05-13T09:26:45.650635Z",
            "url": "https://files.pythonhosted.org/packages/fb/7e/ebedbdbe60b598678765c10e7c95676be346b2e277fa10e906d86a07798e/sotoki-2.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7baebc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04",
                "md5": "8d3d109a306f18f5c1f9457798f08aea",
                "sha256": "db9be1040b8455045e01fe4b27d9e517266f8a58f5ad4c9edb2e16856ff6c763"
            },
            "downloads": -1,
            "filename": "sotoki-2.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8d3d109a306f18f5c1f9457798f08aea",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 1659045,
            "upload_time": "2024-05-13T09:26:47",
            "upload_time_iso_8601": "2024-05-13T09:26:47.224508Z",
            "url": "https://files.pythonhosted.org/packages/7b/ae/bc9ffb6ab894b29a4a77e5d45ec9d0771a26323874bc712c4d86a0e79a04/sotoki-2.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-13 09:26:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "openzim",
    "github_project": "sotoki",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "kiwixstorage",
            "specs": [
                [
                    "<",
                    "1.0"
                ],
                [
                    ">=",
                    "0.8.1"
                ]
            ]
        },
        {
            "name": "pif",
            "specs": [
                [
                    ">=",
                    "0.8.2"
                ],
                [
                    "<",
                    "0.9"
                ]
            ]
        },
        {
            "name": "zimscraperlib",
            "specs": [
                [
                    ">=",
                    "3.3.0"
                ],
                [
                    "<",
                    "4.0"
                ]
            ]
        },
        {
            "name": "xml_to_dict",
            "specs": [
                [
                    ">=",
                    "0.1.6"
                ],
                [
                    "<",
                    "0.2"
                ]
            ]
        },
        {
            "name": "cli-formatter",
            "specs": [
                [
                    ">=",
                    "1.2.0"
                ],
                [
                    "<",
                    "1.3"
                ]
            ]
        },
        {
            "name": "py7zr",
            "specs": [
                [
                    "<",
                    "0.21"
                ],
                [
                    ">=",
                    "0.20.4"
                ]
            ]
        },
        {
            "name": "python-slugify",
            "specs": [
                [
                    ">=",
                    "8.0.1"
                ],
                [
                    "<",
                    "9.0.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    "<",
                    "3.2"
                ],
                [
                    ">=",
                    "3.1.0"
                ]
            ]
        },
        {
            "name": "redis",
            "specs": [
                [
                    "<",
                    "5.0"
                ],
                [
                    "!=",
                    "4.5.2"
                ],
                [
                    ">=",
                    "4.5.1"
                ]
            ]
        },
        {
            "name": "beautifulsoup4",
            "specs": [
                [
                    ">=",
                    "4.9.3"
                ],
                [
                    "<",
                    "5.0"
                ]
            ]
        },
        {
            "name": "lxml",
            "specs": [
                [
                    "<",
                    "4.10"
                ],
                [
                    ">=",
                    "4.9.1"
                ]
            ]
        },
        {
            "name": "jinja2-pluralize",
            "specs": [
                [
                    ">=",
                    "0.3.0"
                ],
                [
                    "<",
                    "0.4"
                ]
            ]
        },
        {
            "name": "tld",
            "specs": [
                [
                    ">=",
                    "0.13"
                ],
                [
                    "<",
                    "0.14"
                ]
            ]
        },
        {
            "name": "mistune",
            "specs": [
                [
                    ">=",
                    "2.0.5"
                ],
                [
                    "<",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "<",
                    "2.9"
                ],
                [
                    ">=",
                    "2.8.2"
                ]
            ]
        },
        {
            "name": "psutil",
            "specs": [
                [
                    ">=",
                    "5.9.4"
                ],
                [
                    "<",
                    "6.0"
                ]
            ]
        },
        {
            "name": "python-snappy",
            "specs": [
                [
                    ">=",
                    "0.6.0"
                ],
                [
                    "<",
                    "1.0"
                ]
            ]
        },
        {
            "name": "bidict",
            "specs": [
                [
                    ">=",
                    "0.22.1"
                ],
                [
                    "<",
                    "0.23"
                ]
            ]
        },
        {
            "name": "cchardet",
            "specs": [
                [
                    "<",
                    "2.2"
                ],
                [
                    ">=",
                    "2.1.7"
                ]
            ]
        }
    ],
    "lcname": "sotoki"
}
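
Each entry in the `requirements` array above stores a package name and a list of `[operator, version]` pairs. A minimal sketch, assuming exactly that layout, of turning an entry back into a pip-style requirement line; `requirement_line` is a hypothetical helper and the sample entry is copied from the data above:

```python
from typing import Any, Dict


def requirement_line(entry: Dict[str, Any]) -> str:
    """Join a name and its [operator, version] pairs into one requirement string."""
    specs = ",".join(f"{op}{version}" for op, version in entry["specs"])
    return f"{entry['name']}{specs}"


# Sample entry taken from the "requirements" array in the raw data
redis_entry = {
    "name": "redis",
    "specs": [["<", "5.0"], ["!=", "4.5.2"], [">=", "4.5.1"]],
}

if __name__ == "__main__":
    print(requirement_line(redis_entry))  # redis<5.0,!=4.5.2,>=4.5.1
```

The order of the constraints is preserved as stored; pip treats comma-separated specifiers as a conjunction, so the order does not change their meaning.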
        