broken-links

- Name: broken-links
- Version: 0.2.0
- Home page: https://github.com/merlos/broken-links
- Summary: A tool to scrape a website and check for the broken links.
- Upload time: 2024-08-11 14:17:23
- Author: merlos
- Requires Python: >=3.6

# Broken links Checker GitHub Action && command line tool

This tool scrapes all pages within a specified URL and checks if the destination links exist. It reports the original page, the text of the anchor, the destination URL, and whether the link is working or not. If any link does not work, the tool exits with an error code. It also provides a summary of the analysis.

It can be run as a GitHub Action or as a command line tool.
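
To make the workflow concrete, here is a minimal sketch of the first half of the process: collecting the anchors of one page and classifying each destination as internal or external. It is not the tool's actual implementation. `AnchorCollector` and `classify_links` are hypothetical names, and the sketch works on an HTML string so that it runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects (anchor text, href) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def classify_links(page_url, html):
    """Return one dict per link: page, anchor text, absolute URL, scope."""
    collector = AnchorCollector()
    collector.feed(html)
    base_host = urlparse(page_url).netloc
    results = []
    for text, href in collector.links:
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        internal = urlparse(absolute).netloc == base_host
        results.append({"page": page_url, "text": text,
                        "url": absolute, "internal": internal})
    return results


html = '<a href="/docs/">Docs</a> <a href="https://other.org/x">Other</a>'
for link in classify_links("http://example.com/index.html", html):
    print(link)
```

A full checker would then request each collected URL, record whether it responds, and follow the internal links to find further pages.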

## GitHub Action

As a GitHub Action, the tool checks links automatically as part of a workflow in your repository.

### Inputs
- `url` (optional): The base URL to start scraping from. Default is `http://localhost:4444/`.
- `only-errors` (optional): If set to true, only display errors. Default is `false`.
- `ignore-file` (optional): Path to the ignore file. Default is `./check-ignore`. If the parameter is set and the file does not exist, the action exits with an error. See the _Ignore File Format_ section below for more information.

### Ignore File Format

The ignore file should contain one URL pattern per line. Patterns can include the wildcard `*` to match multiple URLs. Here are some examples:

- `http://example.com/ignore-this-page` - Ignores this specific URL.
- `http://example.com/ignore/*` - Ignores all URLs that start with `http://example.com/ignore/`.
- `*/ignore-this-path/*` - Ignores all URLs that contain `/ignore-this-path/`.
- `https://*.domain.com*` - Ignores all subdomains of `domain.com` such as `https://sub.domain.com` or `https://sub2.domain.com/page`, etc.
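
The matching rules above can be sketched as follows. This is an illustration rather than the tool's own code: the function names are hypothetical, and it assumes that `*` is the only special character and that a pattern must match the whole URL.

```python
import re


def pattern_to_regex(pattern):
    """Translate an ignore pattern into a regex; only `*` is special."""
    parts = (".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts))


def load_patterns(text):
    """Compile one pattern per non-empty line of the ignore file's text."""
    return [pattern_to_regex(line.strip())
            for line in text.splitlines() if line.strip()]


def is_ignored(url, patterns):
    """True when the whole URL matches at least one pattern."""
    return any(p.fullmatch(url) for p in patterns)


ignore_text = """\
http://example.com/ignore-this-page
http://example.com/ignore/*
*/ignore-this-path/*
https://*.domain.com*
"""
patterns = load_patterns(ignore_text)
print(is_ignored("http://example.com/ignore/any/page", patterns))
print(is_ignored("https://sub2.domain.com/page", patterns))
print(is_ignored("http://example.com/keep-this-page", patterns))
```

Escaping every character except `*` keeps characters such as `?` and `.` literal, which matters because both are common in URLs.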


### Outputs

This action does not produce any outputs. However, at the end of the analysis, it prints a summary of the results with:

- Number of pages analyzed
- Number of links analyzed
- Total number of links working
- Total number of links not working
- Number of external links working
- Number of external links not working
- Number of internal links working
- Number of internal links not working
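
As an illustration of how these eight figures relate to each other, the sketch below derives them from a list of per-link results. `LinkResult` and `summarize` are hypothetical names, not the tool's API, and the exit code of `1` is an assumption; the source only states that the tool exits with an error code when a link fails.

```python
from dataclasses import dataclass


@dataclass
class LinkResult:
    page: str       # page on which the link was found
    url: str        # destination URL
    ok: bool        # whether the destination responded successfully
    internal: bool  # whether the destination is on the scraped site


def summarize(results):
    """Count links by outcome and scope, mirroring the list above."""
    def count(ok, internal=None):
        return sum(1 for r in results
                   if r.ok == ok and (internal is None or r.internal == internal))

    return {
        # Assumption: a page counts as analyzed if at least one link was found on it.
        "pages_analyzed": len({r.page for r in results}),
        "links_analyzed": len(results),
        "links_working": count(True),
        "links_not_working": count(False),
        "external_working": count(True, internal=False),
        "external_not_working": count(False, internal=False),
        "internal_working": count(True, internal=True),
        "internal_not_working": count(False, internal=True),
    }


results = [
    LinkResult("http://example.com/", "http://example.com/a", True, True),
    LinkResult("http://example.com/", "https://other.org/", False, False),
    LinkResult("http://example.com/a", "http://example.com/missing", False, True),
]
summary = summarize(results)
for name, value in summary.items():
    print(f"{name}: {value}")

# Assumed convention: non-zero exit status when any link is broken.
exit_code = 1 if summary["links_not_working"] else 0
print("exit code:", exit_code)
```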

### Examples of Usage

#### Basic Usage (external URL)

```yaml
name: Broken-links Checker

on: [push]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Run Link Checker
        uses: merlos/broken-links@0.2.2
        with:
          url: 'http://example.com'
          only-errors: 'true'
```

#### Check links with MkDocs

```yaml
name: MkDocs Preview and Link Check

on:
  push:
    branches:
      - main

jobs:
  preview_and_check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install mkdocs mkdocs-material

      - name: Run MkDocs server
        run: mkdocs serve -a 0.0.0.0:4444 &
        continue-on-error: true

      - name: Wait for server to start
        run: sleep 10

      - name: Run Link Checker
        uses: merlos/broken-links@0.2.2
        with:
          url: 'http://localhost:4444'
          only-errors: 'true'
          ignore-file: './check-ignore'
```
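
The fixed `sleep 10` in the workflow above can be too short on a slow runner and longer than needed on a fast one. One possible alternative, sketched here, is to poll the port until the preview server accepts connections. This helper is not part of the action; `wait_for_port` is a hypothetical name, and the timeout and interval values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connection refused); retry shortly.
            time.sleep(interval)
    return False
```

Saved as a script that calls `wait_for_port("localhost", 4444)` and exits with a non-zero status on `False`, it could replace the `sleep` step. A successful TCP connection only shows that the port is open, not that every page has finished building.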

#### Check links with Quarto

```yaml
name: Quarto Preview and Link Check

on:
  push:
    branches:
      - main

jobs:
  preview_and_check:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Run Quarto preview server
        run: quarto preview --port 4444 &
        continue-on-error: true

      - name: Wait for server to start
        run: sleep 10

      - name: Run Link Checker
        uses: merlos/broken-links@0.2.2
        with:
          url: 'http://localhost:4444'
          only-errors: 'true'
          ignore-file: './check-ignore'
```


## Command-Line Utility

### Installation

1. Clone the repository:

   ```sh
   git clone https://github.com/merlos/broken-links.git
   cd broken-links
   ```

2. Install the package:

   ```sh
   pip install .
   ```

3. Use the `broken-links` command to run the script:

   ```sh
   broken-links http://example.com --only-error --ignore-file ./check-ignore
   ```

Command-line arguments:

- `url` (optional): The base URL to start scraping from. Default is `http://localhost:4444/`.
- `--only-error` or `-o` (optional): If set, only display errors. Default is `false`.
- `--ignore-file` or `-i` (optional): Path to the ignore file. Default is `./check-ignore`. If the parameter is NOT set and the file does not exist, it checks all the links. If the parameter is set and the file does not exist, the tool exits with an error. 
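
The argument rules above, including the two different behaviors for a missing ignore file, can be expressed with `argparse` roughly as follows. This is a sketch under the documented defaults, not the package's actual parser; `build_parser` and `resolve_ignore_file` are hypothetical names.

```python
import argparse
import os
import sys

DEFAULT_URL = "http://localhost:4444/"
DEFAULT_IGNORE_FILE = "./check-ignore"


def build_parser():
    parser = argparse.ArgumentParser(prog="broken-links")
    parser.add_argument("url", nargs="?", default=DEFAULT_URL,
                        help="base URL to start scraping from")
    parser.add_argument("-o", "--only-error", action="store_true",
                        help="only display errors")
    # default=None distinguishes "not set" from "set to the default path".
    parser.add_argument("-i", "--ignore-file", default=None,
                        help="path to the ignore file")
    return parser


def resolve_ignore_file(path):
    """Return the ignore file to load, or None to check every link."""
    if path is None:
        # Not set: use the default file only if it exists.
        if os.path.isfile(DEFAULT_IGNORE_FILE):
            return DEFAULT_IGNORE_FILE
        return None
    if not os.path.isfile(path):
        # Explicitly set but missing: documented as an error.
        sys.exit(f"Ignore file not found: {path}")
    return path


args = build_parser().parse_args(["http://example.com", "--only-error"])
print(args.url, args.only_error, resolve_ignore_file(args.ignore_file))
```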


## Development

Clone the repository:
```sh
git clone https://github.com/merlos/broken-links
cd broken-links
```

Set up a virtual environment:
```sh
python -m venv venv
source venv/bin/activate
```
Install the package in editable mode (`-e`):
```sh
pip install -e .
```
Start coding!

### Build the Docker image

```sh
docker build -t broken-links .
```

Run the tool from the image:

```sh
docker run --rm broken-links http://example.com --only-error --ignore-file ./check-ignore
```

### Tests
To run the tests, use the following command:

```sh
python -m unittest discover tests
```
## Contributing
Fork the repository and send a pull request. Please add or update the unit tests.

## License
This project is licensed under the terms of the [GNU General Public License v3.0](LICENSE) by merlos.

            
