# Devdocs scraper
This scraper downloads [devdocs.io](https://devdocs.io/) documentation databases and puts them in ZIM files,
a clean and user-friendly format for storing content for offline usage.
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/devdocs/badge)](https://www.codefactor.io/repository/github/openzim/devdocs)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![codecov](https://codecov.io/gh/openzim/devdocs/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/devdocs)
[![PyPI version shields.io](https://img.shields.io/pypi/v/devdocs2zim.svg)](https://pypi.org/project/devdocs2zim/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/devdocs2zim.svg)](https://pypi.org/project/devdocs2zim)
[![Docker](https://ghcr-badge.egpl.dev/openzim/devdocs/latest_tag?label=docker)](https://ghcr.io/openzim/devdocs)
## Installation
There are three main ways to install and use `devdocs2zim`, listed from most to least recommended:
<details>
<summary>Install using a pre-built container</summary>
1. Download the image using `docker`:
```sh
docker pull ghcr.io/openzim/devdocs
```
</details>
<details>
<summary>Build your own container</summary>
1. Clone the repository locally:
```sh
git clone https://github.com/openzim/devdocs.git && cd devdocs
```
1. Build the image:
```sh
docker build -t ghcr.io/openzim/devdocs .
```
</details>
<details>
<summary>Run the software locally using Hatch</summary>
1. Clone the repository locally:
```sh
git clone https://github.com/openzim/devdocs.git && cd devdocs
```
1. Install [Hatch](https://hatch.pypa.io/):
```sh
pip3 install hatch
```
1. Start a Hatch shell to install the software and its dependencies in an isolated virtual environment:
```sh
hatch shell
```
1. Run the `devdocs2zim` command:
```sh
devdocs2zim --help
```
</details>
## Usage
> [!WARNING]
> This project is still a work in progress and isn't ready for use yet; the commands below are examples only.
```sh
# Usage
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]
# Fetch all documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all
# Fetch all documents except Ansible
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"
# Fetch Vue related documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4
# Fetch the docs for the two most recent versions of each resource
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --first=2
```
**One of the following flags is required:**
* `--all`: Fetch all DevDocs resources, and produce one ZIM per resource.
* `--slug SLUG`: Fetch the provided DevDocs resource. Slugs are the first path entry in the DevDocs URL.
For example, the slug for `https://devdocs.io/gcc~12/` is `gcc~12`. Use `--slug` several times to fetch multiple resources.
* `--first N`: Fetch the first N items per slug, as listed in the DevDocs UI. For example, `--first 2` fetches the two most recent versions of each resource.
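
As a rough illustration of the slug rule, the sketch below pulls the first path entry out of a DevDocs URL using POSIX parameter expansion. The variable names are made up for this example, and it assumes the URL starts with `https://devdocs.io/`.

```sh
# Example URL taken from the flag description above.
url='https://devdocs.io/gcc~12/'

# Drop the scheme and host prefix, leaving "gcc~12/".
path=${url#https://devdocs.io/}

# Keep everything before the first "/", i.e. the first path entry.
slug=${path%%/*}

echo "$slug"
```

The resulting value is what `--slug` expects.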
**Optional Flags:**
* `--skip-slug-regex REGEX`: Skip slugs matching the given regular expression.
* `--output OUTPUT_FOLDER`: Output folder for ZIMs. Default: /output
* `--creator CREATOR`: Name of content creator. Default: 'DevDocs'
* `--publisher PUBLISHER`: Custom publisher name. Default: 'openZIM'
* `--name-format FORMAT`: Custom name format for individual ZIMs.
Default: 'devdocs_{slug_without_version}_{version}'
* `--title-format FORMAT`: Custom title format for individual ZIMs.
Value will be truncated to 30 chars. Default: '{full_name} Documentation'
* `--description-format FORMAT`: Custom description format for individual ZIMs.
Value will be truncated to 80 chars. Default: '{full_name} Documentation'
* `--long-description-format FORMAT`: Custom long description format for individual ZIMs.
Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
* `--tag TAG`: Add a tag to the ZIM. Use `--tag` several times to add multiple tags.
Formatting placeholders are supported. Default: ['devdocs', '{slug_without_version}']
* `--logo-format FORMAT`: URL/path for the ZIM logo in PNG, JPG, or SVG format.
Formatting placeholders are supported. If unset, a DevDocs logo will be used.
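
To preview which slugs a `--skip-slug-regex` pattern would exclude, a sketch like the one below can help. It filters a hand-written slug list with `grep -E`; the list is hypothetical (built from slugs mentioned in this README), and `grep`'s regular expression dialect may differ in details from the one `devdocs2zim` uses, so treat the result as an approximation.

```sh
# Hypothetical slug list, one slug per line.
slugs='ansible
vue~3
vue_router~4
gcc~12'

# -E enables extended regular expressions; -v keeps only non-matching lines.
kept=$(printf '%s\n' "$slugs" | grep -Ev '^ansible.*')

printf '%s\n' "$kept"
```

Only the slugs that survive the filter are printed, which mirrors the intent of combining `--all` with `--skip-slug-regex`.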
**Formatting Placeholders**
The following formatting placeholders are supported:
* `{name}`: Human readable name of the resource e.g. `Python`.
* `{full_name}`: Name with optional version for the resource e.g. `Python 3.12`.
* `{slug}`: DevDocs slug for the resource e.g. `python~3.12`.
* `{clean_slug}`: Slug with every character other than alphanumerics and periods replaced with `-` e.g. `python-3.12`.
* `{slug_without_version}`: DevDocs slug for the resource without the version e.g. `python`.
* `{version}`: Shortened version displayed in DevDocs, if any e.g. `3.12`.
* `{release}`: Specific release of the software the documentation is for, if any e.g. `3.12.1`.
* `{attribution}`: License and attribution information about the resource.
* `{home_link}`: Link to the project's home page, if any e.g. `https://python.org`.
* `{code_link}`: Link to the project's source, if any e.g. `https://github.com/python/cpython`.
* `{period}`: The current date in `YYYY-MM` format e.g. `2024-02`.
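
To make the placeholders concrete, the sketch below expands two of the default formats by hand for the `python~3.12` example used above. It only emulates the substitution with `sed`; this is not how `devdocs2zim` implements formatting, and the variable names are illustrative.

```sh
# Placeholder values taken from the examples in the list above.
slug_without_version='python'
version='3.12'
full_name='Python 3.12'

# Default --name-format and --title-format values from the flag list.
name_format='devdocs_{slug_without_version}_{version}'
title_format='{full_name} Documentation'

# Substitute each placeholder with its value.
zim_name=$(printf '%s\n' "$name_format" |
  sed -e "s/{slug_without_version}/$slug_without_version/" -e "s/{version}/$version/")
zim_title=$(printf '%s\n' "$title_format" | sed -e "s/{full_name}/$full_name/")

echo "$zim_name"   # devdocs_python_3.12
echo "$zim_title"  # Python 3.12 Documentation
```

The expanded title is 25 characters long, so it stays under the 30-character limit mentioned for `--title-format`.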
## Developing
Use the commands below to set up the project once:
```sh
# Install hatch if it isn't installed already.
❯ pip install hatch
# Local install (in default env) / re-sync packages
❯ hatch run pip list
# Set-up pre-commit
❯ pre-commit install
```
The following commands can be used to build and test the scraper:
```sh
# Show scripts
❯ hatch env show
# linting, testing, coverage, checking
❯ hatch run lint:all
❯ hatch run lint:fixall
# run tests on all matrixed envs
❯ hatch run test:run
# run tests in a single matrixed env
❯ hatch env run -e test -i py=3.12 coverage
# run static type checks
❯ hatch env run check:all
# building packages
❯ hatch build
```
### Contributing
This project adheres to openZIM's [Contribution Guidelines](https://github.com/openzim/overview/wiki/Contributing).
This project has implemented openZIM's [Python bootstrap, conventions and policies](https://github.com/openzim/_python-bootstrap/docs/Policy.md) **v1.0.3**.