**********
Link Medic
**********

.. image:: https://img.shields.io/pypi/v/linkmedic
   :name: PyPI
   :target: https://pypi.org/project/linkmedic/

.. image:: https://img.shields.io/badge/License-BSD_3--Clause-blue.svg
   :name: License: 3-Clause BSD
   :target: https://opensource.org/licenses/BSD-3-Clause

.. image:: https://img.shields.io/badge/python-%3E=3.7-blue
   :name: Minimum required python version: 3.7

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :name: Coding style: Black
   :target: https://github.com/psf/black

A Python script for checking links in static webpages (``.htm``, ``.html``), OpenDocument files (``.odt``, ``.odp``, ``.ods``), and single OpenDocument XML files (``.fodt``, ``.fodp``, ``.fods``).

``linkmedic`` starts a test webserver and crawls all the pages, starting from the entry page. All the links in the resource tags (``<a>``, ``<img>``, ``<script>``, ``<link>``, ``<iframe>``, ``<event-listener>``) are checked, and the dead links are reported. If a link is present in multiple pages, only its first occurrence is tested. By default, links to external websites are ignored. If there is a ``.linkignore`` file in the website's root, the links listed in that file are ignored during the tests (one link per line; see the example below). After all the links are checked, ``linkmedic`` exits with a non-zero exit code if any dead links were discovered.

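
The link collection and de-duplication described above can be sketched with Python's standard ``html.parser`` module. This is a minimal illustration, not ``linkmedic``'s implementation: the tag-to-attribute mapping and the names ``LinkCollector`` and ``first_occurrences`` are hypothetical, and ``<event-listener>`` is left out.

.. code-block:: python

    from html.parser import HTMLParser
    from typing import Dict, Iterable, List, Optional, Tuple

    # Hypothetical mapping from resource tag to the attribute holding its URL.
    URL_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "iframe": "src"}


    class LinkCollector(HTMLParser):
        """Collect the URLs found in the resource tags of one HTML page."""

        def __init__(self) -> None:
            super().__init__()
            self.links: List[str] = []

        def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
            wanted = URL_ATTRS.get(tag)
            for name, value in attrs:
                if name == wanted and value:
                    self.links.append(value)


    def first_occurrences(pages: Iterable[Tuple[str, str]]) -> Dict[str, str]:
        """Map each link to the first page, in crawl order, that contains it."""
        seen: Dict[str, str] = {}
        for page_name, html in pages:
            collector = LinkCollector()
            collector.feed(html)
            collector.close()
            for link in collector.links:
                # setdefault keeps only the first page, so each link is tested once.
                seen.setdefault(link, page_name)
        return seen


    pages = [
        ("index.html", '<a href="about.html">About</a><img src="logo.png">'),
        ("about.html", '<a href="index.html">Home</a><a href="about.html">Self</a>'),
    ]
    print(first_occurrences(pages))

Here ``about.html`` and ``logo.png`` are recorded under ``index.html``, and ``index.html`` under ``about.html``; the second occurrence of ``about.html`` is not tested again.
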
Quick start
###########

Install
*******
You can install ``linkmedic`` using your favorite Python package manager. For example, using ``pip`` you can download it from `PyPI <https://pypi.org/project/linkmedic/>`__:

.. code-block:: shell

    pip install linkmedic --user

Run
***
Start a test webserver serving the files in ``/var/www``, then crawl the pages and test all the links, starting from ``/var/www/index.html``:

.. code-block:: shell

    linkmedic --root=/var/www

Usage & Options
###############

Mirror package repository
*************************

You can also install ``linkmedic`` from the MPCDF GitLab package repository:

.. code-block:: shell

    pip install linkmedic --user --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/5763/packages/pypi/simple

Container
*********
You can use one of the following container images, which have the required libraries (and the `linkmedkit <https://gitlab.mpcdf.mpg.de/tabriz/linkmedkit>`_ tools) already installed:

.. code-block:: shell

    quay.io/meisam/linkmedic:latest

.. code-block:: shell

    gitlab-registry.mpcdf.mpg.de/tabriz/linkmedic:latest

When using a container image, your website pages must be accessible from inside the container. Depending on your container engine, you may need to mount the directory containing your files into the container. For example, using `podman <https://podman.io>`_:

.. code-block:: shell

    podman run -v /www/public:/test quay.io/meisam/linkmedic:latest linkmedic --root=/test

Here, the ``-v /www/public:/test`` flag mounts ``/www/public`` inside the container at the ``/test`` path.

.. _ci-cd:

CI/CD
*****
You can also use the container image in your CI/CD pipelines. For example, for GitLab CI in ``.gitlab-ci.yml``:

.. code-block:: yaml

    test_internal_links:
      image: quay.io/meisam/linkmedic:latest
      script:
        - linkmedic --root=/var/www/ --entry=index.html --warn-http --with-badge
      after_script:
        - gitlab_badge_sticker.sh

Or, for Woodpecker CI, in ``.woodpecker.yml``:

.. code-block:: yaml

    test_internal_links:
      image: quay.io/meisam/linkmedic:latest
      commands:
        - linkmedic --root=/var/www/ --entry=index.html --warn-http

If you want to check the external links of your website in CI, you must avoid running the checks many times within a short period, e.g. on every commit to the development branches. Otherwise, the IP addresses of your CI runners may get banned by external web servers. For example, in GitLab CI you can limit the external link checks to the default branch of your git repository:

.. code-block:: yaml

    test_external_links:
      image: quay.io/meisam/linkmedic:latest
      rules:
        - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      script:
        - linkmedic --root=/var/www/ --ignore-local --with-badge
      after_script:
        - gitlab_badge_sticker.sh
      allow_failure: true

CLI reference
*************

* Display help, including all the command-line options and their default values.

.. code-block:: shell

    linkmedic -h

* Start the webserver with the current directory as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

    linkmedic

* Start the webserver with ``./tests/public1/`` as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

    linkmedic --root=./tests/public1/

* Start the webserver with ``./tests/public1/`` as the root path of the server. Starting from ``index2.html``, crawl the pages and test all the links. The entry point must be relative to the server root (in the example below, ``index2.html`` should be accessible at ``./tests/public1/index2.html``).

.. code-block:: shell

    linkmedic --root=./tests/public1/ --entry=index2.html

* If a missing page such as ``/directory/page`` is encountered, do not redirect to ``/directory/page.html``.

.. code-block:: shell

    linkmedic --no-redirect

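
The redirect that ``--no-redirect`` turns off can be pictured as a fallback lookup on the server's file system. A minimal sketch, assuming a plain directory tree under the server root; ``resolve_local`` is a hypothetical helper, not part of ``linkmedic``, and it skips directory indexes and path-traversal checks.

.. code-block:: python

    from pathlib import Path
    from typing import Optional


    def resolve_local(root: Path, url_path: str, redirect: bool = True) -> Optional[Path]:
        """Return the file under `root` that serves `url_path`, or None if missing."""
        relative = url_path.strip("/")
        if not relative:
            return None  # directory indexes are out of scope for this sketch
        candidate = root / relative
        if candidate.is_file():
            return candidate
        # The fallback disabled by --no-redirect: /directory/page -> /directory/page.html
        if redirect:
            html_candidate = candidate.with_name(candidate.name + ".html")
            if html_candidate.is_file():
                return html_candidate
        return None
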
* Check links to external websites.

  [**IMPORTANT**: Avoid running the link checker on external links many times within a short period, e.g. on every commit to the develop branch. Otherwise, the IP address of your machine (or CI runners) may get banned by the CDN or the DoS mitigation solution of the external webservers. See the `CI/CD section <ci-cd_>`_ for a solution.]

.. code-block:: shell

    linkmedic --check-external

* Check only the links to external websites and ignore the local dead links.

.. code-block:: shell

    linkmedic --ignore-local

* Do not treat external links that return HTTP status codes 403 and 503 as dead links.

.. code-block:: shell

    linkmedic --ignore-status 403 503

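
How ``--ignore-status`` changes the verdict can be sketched as a small classification function. The rule that any 4xx or 5xx status marks a dead link is an assumption for illustration (this README does not specify ``linkmedic``'s exact rule), and ``is_dead`` is a hypothetical name. Codes such as 403 and 503 can be returned by bot-protection services to automated clients, which is why ignoring them may reduce false positives.

.. code-block:: python

    from typing import Collection


    def is_dead(status: int, ignored: Collection[int] = ()) -> bool:
        """Decide whether an HTTP status code marks a dead link."""
        if status in ignored:
            return False  # e.g. --ignore-status 403 503
        return status >= 400  # assumed rule: client and server errors are dead


    ignored = {403, 503}
    print([code for code in (200, 403, 404, 503, 500) if is_dead(code, ignored)])
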
* Check links in an OpenDocument file (``.odt``, ``.odp``, ``.ods``) or a single OpenDocument XML file (``.fodt``, ``.fodp``, ``.fods``).

.. code-block:: shell

    linkmedic --entry=./presentation.odp

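
For context, packaged OpenDocument files are ZIP archives whose main content lives in ``content.xml``, while the flat variants are a single XML file; in both, hyperlinks are stored in ``xlink:href`` attributes. A minimal sketch of collecting those values with the Python standard library follows; ``odf_links`` is a hypothetical helper, not ``linkmedic``'s code. It also returns references to objects embedded in the archive (such as images), which a real checker would need to filter out.

.. code-block:: python

    import xml.etree.ElementTree as ET
    import zipfile
    from pathlib import Path
    from typing import List

    # Clark notation for the xlink:href attribute used by ODF hyperlinks.
    XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


    def odf_links(path: Path) -> List[str]:
        """Collect every xlink:href value from an OpenDocument file."""
        if zipfile.is_zipfile(path):
            # Packaged documents (.odt, .odp, .ods) keep their body in content.xml.
            with zipfile.ZipFile(path) as archive:
                root = ET.fromstring(archive.read("content.xml"))
        else:
            # Flat documents (.fodt, .fodp, .fods) are a single XML file.
            root = ET.parse(path).getroot()
        return [el.attrib[XLINK_HREF] for el in root.iter() if XLINK_HREF in el.attrib]
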
* Show warnings for HTTP links.

.. code-block:: shell

    linkmedic --warn-http

* Treat any links to ``mydomain.com`` as internal links and resolve them locally.

.. code-block:: shell

    linkmedic --domain=mydomain.com

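
A minimal sketch of the idea behind ``--domain``: a link whose host matches the given domain is mapped to a path that can be resolved on the local test server. ``to_local_path`` is a hypothetical helper built on ``urllib.parse.urlsplit``; it matches the host exactly, so subdomains such as ``www.mydomain.com`` are not treated as internal here.

.. code-block:: python

    from typing import Optional
    from urllib.parse import urlsplit


    def to_local_path(url: str, domain: str) -> Optional[str]:
        """Return the server-relative path of a link to `domain`, else None."""
        parts = urlsplit(url)
        # .hostname is lowercased by urllib and is None for relative links.
        if parts.hostname != domain.lower():
            return None
        return parts.path or "/"


    print(to_local_path("https://mydomain.com/docs/a.html", "mydomain.com"))
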
* Start the webserver on port 3000. If the webserver cannot be started on the requested port, the initializer automatically tries the next ports.

.. code-block:: shell

    linkmedic --port=3000

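
The fallback to the next ports can be sketched with the standard ``socket`` module: try to bind each port in turn and keep the first one that succeeds. This illustrates the behavior described above and is not ``linkmedic``'s initializer; ``bind_with_fallback`` and its ``attempts`` limit are hypothetical.

.. code-block:: python

    import socket


    def bind_with_fallback(host: str, port: int, attempts: int = 10) -> socket.socket:
        """Bind a TCP socket to `port`, moving on to the next ports if it is taken."""
        stop = min(port + attempts, 65536)  # never go past the highest TCP port
        for candidate in range(port, stop):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                sock.bind((host, candidate))
            except OSError:
                sock.close()  # in use or not permitted: try the next port
                continue
            return sock  # the caller is responsible for listen() and close()
        raise OSError(f"no free port between {port} and {stop - 1}")

For example, ``bind_with_fallback("127.0.0.1", 3000)`` returns a socket bound to port 3000 if it is free, otherwise to the first free port up to 3009.
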
* Generate a badge information file. Depending on the type of diagnosis, this file is named ``badge.dead_internal_links.json``, ``badge.dead_external_links.json``, or ``badge.dead_links.json``. If the ``--warn-http`` flag is used, a badge file for the number of discovered HTTP links is also written to ``badge.http_links.json``. These files can be used to generate badges (see the `linkmedkit`_ scripts) or served as a `shields.io endpoint <https://shields.io/endpoint>`_ response.

.. code-block:: shell

    linkmedic --with-badge

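
A shields.io endpoint expects a small JSON document with ``schemaVersion`` (always ``1``), ``label``, and ``message`` fields, plus optional fields such as ``color``. The sketch below writes such a document; the field values, the colors, and the ``write_badge`` helper are assumptions for illustration, and the files written by ``linkmedic --with-badge`` may differ.

.. code-block:: python

    import json
    from pathlib import Path


    def write_badge(path: Path, label: str, count: int) -> dict:
        """Write a shields.io endpoint payload reporting `count` problems."""
        payload = {
            "schemaVersion": 1,  # required by the shields.io endpoint format
            "label": label,
            "message": str(count),
            "color": "green" if count == 0 else "red",  # assumed color scheme
        }
        path.write_text(json.dumps(payload), encoding="utf-8")
        return payload
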
* Check the links but always exit with code 0.

.. code-block:: shell

    linkmedic --exit-zero

* Log the output at a different level of verbosity. If more than one of these flags is given, the most restrictive one takes effect.

  - ``--verbose`` : log debug information
  - ``--quiet`` : only log errors
  - ``--silent`` : completely silence the output logs

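
The precedence rule above maps naturally onto Python's ``logging`` levels. A minimal sketch, assuming ``INFO`` as the default level; ``effective_level`` is a hypothetical helper, not ``linkmedic``'s code.

.. code-block:: python

    import logging


    def effective_level(verbose: bool = False, quiet: bool = False, silent: bool = False) -> int:
        """Pick the log level; the most restrictive flag wins."""
        if silent:
            return logging.CRITICAL + 1  # above every standard level: nothing is logged
        if quiet:
            return logging.ERROR
        if verbose:
            return logging.DEBUG
        return logging.INFO  # assumed default level

With this helper, ``logging.basicConfig(level=effective_level(verbose=True, quiet=True))`` logs only errors, because ``--quiet`` is more restrictive than ``--verbose``.
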
Example .linkignore
*******************

.. code-block:: text

    invalidfile.tar.gz
    will_add/later.html
    https://not.accessible.com

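
A minimal sketch of reading such a file with ``pathlib``, assuming that surrounding whitespace and blank lines are not significant (this README only states one link per line); ``load_linkignore`` is a hypothetical helper, not ``linkmedic``'s parser. A link found in the returned set would be skipped during the tests.

.. code-block:: python

    from pathlib import Path
    from typing import Set


    def load_linkignore(root: Path) -> Set[str]:
        """Read `.linkignore` from the website root: one link per line."""
        ignore_file = root / ".linkignore"
        if not ignore_file.is_file():
            return set()  # no file means nothing is ignored
        lines = ignore_file.read_text(encoding="utf-8").splitlines()
        # Assumption: surrounding whitespace and blank lines are not significant.
        return {line.strip() for line in lines if line.strip()}
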
Development
###########
This project uses `PDM <https://pdm.fming.dev/latest/>`_ for packaging and dependency management, `vermin <https://pypi.org/project/vermin/>`_ and `bandit <https://pypi.org/project/bandit/>`_ for validation, `black <https://pypi.org/project/black/>`_ and `isort <https://pypi.org/project/isort/>`_ for styling, and `jsonschema <https://pypi.org/project/jsonschema/>`_ and `jq <https://jqlang.github.io/jq/>`_ for testing. See the `developers guide <DEVELOPERS.rst>`_ for more details.

History
#######
The original idea for this project came from Dr. Klaus Reuter (MPCDF). Fruitful discussions with Dr. Sebastian Kehl (MPCDF) facilitated this project’s packaging and release.

The accompanying tools for ``linkmedic`` were moved to a separate repository (`linkmedkit`_) in version 0.7.

License
#######
Copyright 2021-2023 M. Farzalipour Tabriz, Max Planck Computing and Data Facility (MPCDF)

All rights reserved.

This software may be modified and distributed under the terms of the 3-Clause BSD License. See the LICENSE file for details.