md-dead-link-check


- Name: md-dead-link-check
- Version: 1.1.0
- Summary: This is a lightweight and fast tool to help you keep your Markdown files free of broken links.
- Upload time: 2025-01-26 21:24:48
- Requires Python: >=3.9
- License: MIT
- Keywords: broken link, broken link checker, dead link, dead link checker, documentation maintenance, link checker, link health, markdown
- Repository: https://github.com/AlexanderDokuchaev/md-dead-link-check
- Issues: https://github.com/AlexanderDokuchaev/md-dead-link-check/issues

# Markdown Dead Link Checker

[![GitHub Action](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/github_action.yml/badge.svg?branch=main)](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/github_action.yml?query=branch%3Amain)
[![Ubuntu](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/ubuntu.yml/badge.svg?branch=main)](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/ubuntu.yml?query=branch%3Amain)
[![Windows](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/win.yml/badge.svg?branch=main)](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/win.yml?query=branch%3Amain)
[![MacOS](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/mac.yml/badge.svg?branch=main)](https://github.com/AlexanderDokuchaev/md-dead-link-check/actions/workflows/mac.yml?query=branch%3Amain)

This handy tool helps you maintain the integrity of your Markdown files by identifying broken links.
It scans your files and detects:

- Missing webpages: Links that no longer exist on the internet.
- Incorrect file links: Links that point to a file or directory that does not exist in your project.
- Non-existent fragments (anchors): Links to specific sections that don't exist, e.g. `README.md#no-fragment`.

Example output for [fail.md](tests/test_md_files/fail.md):

```bash
File: tests/test_md_files/fail.md:3 • Link: https://github.com/AlexanderDokuchaev/FAILED • Error: 404: Not Found
File: tests/test_md_files/fail.md:4 • Link: https://not_exist_github.githubcom/ • Error: 500: Internal Server Error
File: tests/test_md_files/fail.md:8 • Link: /test/fail.md1 • Error: Path not found
File: tests/test_md_files/fail.md:9 • Link: fail.md1 • Error: Path not found
File: tests/test_md_files/fail.md:13 • Link: /tests/test_md_files/fail.md#fail • Error: Fragment not found
File: tests/test_md_files/fail.md:15 • Link: not_exist_dir • Error: Path not found
❌ Found 6 dead links 🙀
```

> [!NOTE]
> By default, only the response codes **404 (Not Found)**, **410 (Gone)**, and **500 (Internal Server Error)**,
> as well as links to paths or fragments that don't exist, are reported as "dead links". Other error codes typically indicate
> temporary issues with the host server or links that do not support the `HEAD` request type.

## How to Use It

### Option 1: GitHub Actions

Add a GitHub Actions workflow config to `.github/workflows/`:

```yaml
jobs:
  md-dead-link-check:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: AlexanderDokuchaev/md-dead-link-check@v1.1.0
```

### Option 2: Pre-Commit

Add the following to your `.pre-commit-config.yaml` to integrate with the [pre-commit](https://pre-commit.com/) tool:

```yaml
  - repo: https://github.com/AlexanderDokuchaev/md-dead-link-check
    rev: "v1.1.0"
    hooks:
      - id: md-dead-link-check
```

> [!NOTE]
> When run as a GitHub Action on a `pull_request` event, the tool checks external links only in files that have been modified.
> To scan all links, consider adding a separate workflow that runs periodically on target branches.
> This keeps pull request merges from being blocked by broken links unrelated to the files
> modified in the pull request. The two workflows below show one such setup.

```yaml
# .github/workflows/nightly.yaml
name: nightly
on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'
jobs:
  md-dead-link-check:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: AlexanderDokuchaev/md-dead-link-check@v1.1.0
```

```yaml
# .github/workflows/pull_request.yaml
name: pull_request
on:
  pull_request:
    types:
      - opened
      - reopened
      - synchronize
jobs:
  md-dead-link-check:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: AlexanderDokuchaev/md-dead-link-check@v1.1.0
```

### Option 3: Install from pip

For direct use, install with pip and run:

```bash
pip install md-dead-link-check
md-dead-link-check
```

## Performance

This tool uses asynchronous requests and avoids downloading full web pages,
which lets it process thousands of links in several seconds.
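
As a rough illustration of the idea (many `HEAD` requests in flight at once, each reading only the status and headers), here is a minimal standard-library sketch. It is not the tool's implementation: the tool uses `aiohttp`, whereas this stand-in runs blocking `urllib` calls in worker threads. The names `head_status` and `check_all` are hypothetical.

```python
import asyncio
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Send a HEAD request; return the status code, or None on a network error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code
    except OSError:
        # DNS failures, refused connections and timeouts all end up here.
        return None


async def check_all(
    urls: Iterable[str],
    probe: Callable[[str], Optional[int]] = head_status,
) -> Dict[str, Optional[int]]:
    """Probe every unique URL concurrently and map each URL to its status."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    statuses = await asyncio.gather(
        *(asyncio.to_thread(probe, url) for url in unique)
    )
    return dict(zip(unique, statuses))
```

Calling `asyncio.run(check_all(["https://pre-commit.com/"]))` returns a dictionary from URL to status code. Because no response body is requested, the cost per link is mostly network latency, which is why running the probes concurrently pays off.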

## Proxy

This tool uses your system's existing HTTP and HTTPS proxy configuration.
It does so by trusting the environment variables that define proxy settings,
which is enabled by the `aiohttp.ClientSession(trust_env=True)` option.
For further technical details, refer to the
[aiohttp documentation](https://docs.aiohttp.org/en/v3.9.3/client_advanced.html#proxy-support).

> [!WARNING]
> **Without proxy configuration in the environment, link failures may not be reported.**
> If your environment lacks proxy configuration (variables like `http_proxy` and `https_proxy`),
> link retrieval attempts may time out without indicating a failure.
> To help diagnose this issue, use the `--warn` argument to log all processed links.
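
To see which proxy settings a Python process picks up from the environment, a quick standard-library check can help. This is only a diagnostic sketch: it uses `urllib.request.getproxies()` rather than `aiohttp`, so details of how `aiohttp` resolves proxies may differ, and the function name `describe_proxies` is hypothetical.

```python
import os
import urllib.request
from typing import Dict, Optional


def describe_proxies() -> Dict[str, Optional[str]]:
    """Return the proxy URL seen for each scheme, or None when none is set."""
    proxies = urllib.request.getproxies()
    return {scheme: proxies.get(scheme) for scheme in ("http", "https")}


if __name__ == "__main__":
    for scheme, proxy in describe_proxies().items():
        print(f"{scheme}: {proxy or 'no proxy configured'}")
```

If both entries come back empty on a network that requires a proxy, set `http_proxy` and `https_proxy` before running the link check.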

## Configuration

This tool reads its configuration from your project's `pyproject.toml` file.
To use a different file, pass the `--config` option.

- timeout: Specifies the maximum time (in seconds) to wait for web link responses. Default: `5` seconds.
- catch_response_codes: List of HTTP response codes to consider as failures.
If empty, all codes greater than 400 will be marked as failures. Default: `[404, 410, 500]`.
- exclude_links: List of links to exclude from checks. Default: `[]`.
- exclude_files: List of files to exclude from checks. Default: `[]`.
- force_get_requests_for_links: List of links for which the tool will use `GET` requests during checks. Default: `[]`.
- check_web_links: Toggle web link checks on or off. Default: `true`.
- validate_ssl: Toggles whether to validate SSL certificates when checking web links. Default: `true`.
- throttle_groups: Number of domain groups to divide requests across for throttling. Default: `100`.
- throttle_delay: Time to wait between requests, scaled by domain load and group size. Default: `20` seconds.
- throttle_max_delay: Maximum allowable delay (in seconds) for throttling a single domain. Default: `100` seconds.

> [!TIP]
> Use wildcard patterns ([fnmatch](https://docs.python.org/3/library/fnmatch.html) syntax) for
> `exclude_links`, `exclude_files` and `force_get_requests_for_links` parameters.

```toml
[tool.md_dead_link_check]
timeout = 5
exclude_links = ["https://github.com/", "https://github.com/*"]
exclude_files = ["tests/test_md_files/fail.md", "tests/*"]
check_web_links = true
catch_response_codes = [404, 410, 500]
force_get_requests_for_links = []
validate_ssl = true
throttle_groups = 100
throttle_delay = 20
throttle_max_delay = 100
```

## Rate Limiting and Request Throttling

Websites often limit how many requests you can make within a certain period.
If these limits are exceeded, the server typically returns a `429 Too Many Requests` status code.

### Failure Handling

By default, the 429 status code is treated as a warning.
To report it as a failure instead, add it to `catch_response_codes`:

```toml
catch_response_codes = [404, 410, 429, 500]
```
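
The effect of `catch_response_codes` can be pictured with a small classifier. This is a sketch based only on the descriptions in this README (listed codes are failures, an empty list makes every code greater than 400 a failure, other error codes are warnings); the function name `classify` and the labels it returns are hypothetical, and the tool's internal logic may differ.

```python
from typing import Sequence


def classify(status: int, catch_response_codes: Sequence[int]) -> str:
    """Label a response as 'ok', 'warning' or 'error' for a given config."""
    if status < 400:
        return "ok"
    if catch_response_codes:
        # Only the listed codes count as dead links; the rest are warnings.
        return "error" if status in catch_response_codes else "warning"
    # Empty list: the README says all codes greater than 400 are failures.
    return "error" if status > 400 else "warning"
```

With the default list, `classify(429, [404, 410, 500])` yields a warning, while the configuration shown above turns the same response into an error.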

### Throttling Mechanism

To prevent your requests from overwhelming a website and potentially getting you blocked, this tool implements
a throttling mechanism. This mechanism limits the number of requests that can be made in a given period.

You can control the following parameters to fine-tune request throttling:

```toml
throttle_groups = 40  # default: 100
throttle_delay = 30  # default: 20
throttle_max_delay = 240  # default: 100
```

### Filter Links to Check

By filtering out non-critical links and files, you can stay within rate limits while throttling requests.

#### Exclude Links by Pattern

Exclude specific URLs that match patterns:

```toml
exclude_links = ["https://github.com/AlexanderDokuchaev/md-dead-link-check/pull/*"]
```

#### Exclude Specific Files

Prevent specific files (e.g., changelogs) from being checked:

```toml
exclude_files = ["CHANGELOG.md"]
```
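
Both exclusion settings accept `fnmatch` wildcards, and it can be useful to try a pattern before committing it. The helper below is a minimal sketch using Python's standard `fnmatch` module; `is_excluded` is a hypothetical name, and whether the tool applies patterns in exactly this way is an assumption.

```python
from fnmatch import fnmatch
from typing import Iterable


def is_excluded(value: str, patterns: Iterable[str]) -> bool:
    """Return True when a link or file path matches any exclusion pattern."""
    return any(fnmatch(value, pattern) for pattern in patterns)


if __name__ == "__main__":
    file_patterns = ["CHANGELOG.md", "tests/*"]
    for path in ("CHANGELOG.md", "tests/test_md_files/fail.md", "README.md"):
        print(path, "excluded" if is_excluded(path, file_patterns) else "checked")
```

In `fnmatch` syntax `*` also matches `/`, so `tests/*` covers nested paths. The characters `?` and `[...]` are wildcards too, so a literal `?` in a URL pattern matches any single character.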

#### Exclude Parts of Files Using Comments

Ignore a section of a file by wrapping it in the special comments `<!-- md-dead-link-check: off -->` and `<!-- md-dead-link-check: on -->`.

```md
...

<!-- md-dead-link-check: off -->

All links will be ignored in this part of the file.

<!-- md-dead-link-check: on -->

...
```
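
The on/off comments behave like a simple switch while the file is read line by line. The sketch below shows that idea under stated assumptions: it treats a line consisting only of the marker comment as the switch and is not the tool's actual parser. The function name `checked_lines` is hypothetical.

```python
from typing import List, Tuple

OFF = "<!-- md-dead-link-check: off -->"
ON = "<!-- md-dead-link-check: on -->"


def checked_lines(markdown: str) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs for lines outside off/on sections."""
    enabled = True
    kept = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        marker = line.strip()
        if marker == OFF:
            enabled = False  # stop collecting until the matching "on"
        elif marker == ON:
            enabled = True
        elif enabled:
            kept.append((number, line))
    return kept
```

Keeping the original line numbers matters because reported errors point at `file:line`, as in the example output at the top of this README.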

            
