:Name: linkmedic
:Version: 0.8.1
:Summary: Website links checker
:Upload time: 2025-02-08 21:31:44
:Author: M. Farzalipour Tabriz
:Requires Python: >=3.9
:License: BSD-3-Clause
:Keywords: html, odf, xml
:Repository: https://gitlab.mpcdf.mpg.de/tbz/linkmedic.git

**********
Link Medic
**********

.. image:: https://img.shields.io/pypi/v/linkmedic
   :name: PyPI
   :target: https://pypi.org/project/linkmedic/

.. image:: https://img.shields.io/badge/License-BSD_3--Clause-blue.svg
   :name: License: 3-Clause BSD
   :target: https://opensource.org/licenses/BSD-3-Clause

.. image:: https://img.shields.io/badge/Python-%3E=3.9-blue
   :name: Minimum supported Python version: 3.9

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :name: Coding style: Black
   :target: https://github.com/psf/black

A Python script for checking links and resources used in local static webpages (``.htm``, ``.html``), OpenDocument files (``.odt``, ``.odp``, ``.ods``), and single OpenDocument XML files (``.fodt``, ``.fodp``, ``.fods``).

``linkmedic`` starts a test web server, requests an entry page from it, and crawls all the local pages. All the links in the ``<a>``, ``<img>``, ``<script>``, ``<link>``, ``<iframe>``, and ``<event-listener>`` tags are checked, and the dead links are reported. If a link is present in multiple pages, it is tested only once. By default, links to external websites are ignored. If there is a ``.linkignore`` file in the website's root, the links listed in that file are ignored during the tests (one link per line; see below for examples). After checking all the links, ``linkmedic`` exits with a non-zero code if any dead links were discovered.
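
The collection step described above can be illustrated with a minimal standard-library sketch. It is not ``linkmedic``'s implementation: the tag-to-attribute mapping and the names ``LinkCollector`` and ``collect_links`` are assumptions made for illustration, and ``<event-listener>`` is left out because its link attribute is not documented here.

.. code-block:: python

  from html.parser import HTMLParser

  # Assumed tag-to-attribute mapping for this sketch; linkmedic's own
  # mapping (including <event-listener>) may differ.
  LINK_ATTRIBUTES = {
      "a": "href",
      "img": "src",
      "script": "src",
      "link": "href",
      "iframe": "src",
  }


  class LinkCollector(HTMLParser):
      """Collect the link targets found in the supported tags of one page."""

      def __init__(self):
          super().__init__()
          self.links = set()

      def handle_starttag(self, tag, attrs):
          wanted = LINK_ATTRIBUTES.get(tag)
          if wanted is None:
              return
          for name, value in attrs:
              if name == wanted and value:
                  self.links.add(value)


  def collect_links(pages):
      """Return the sorted, de-duplicated links found in several HTML strings."""
      links = set()
      for html in pages:
          collector = LinkCollector()
          collector.feed(html)
          collector.close()
          # A set union keeps a link that appears on several pages only once.
          links |= collector.links
      return sorted(links)

Storing the links in a set mirrors the "tested only once" behavior: a link shared by several pages yields a single entry to check.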

For testing links in dynamic HTML content (e.g., using JavaScript template engines) or other document formats, you must first convert your files (using a third-party tool) to static HTML and then run ``linkmedic``.

Quick start
###########

Install prerequisites
*********************
Depending on your operating system, you may have multiple options for installing the prerequisites:

* `Python <https://www.python.org/downloads/>`__: ``linkmedic`` is only tested on `officially supported Python versions <https://devguide.python.org/versions/>`__.
* A ``Python`` package installer: For example, `pip <https://pip.pypa.io/en/stable/installation/>`__

Install linkmedic
*****************
You can install ``linkmedic`` using your favorite Python package installer. For example, using ``pip``, you can download it from `PyPI <https://pypi.org/project/linkmedic/>`__:

.. code-block:: shell

  pip install linkmedic


Run
***
To start a test web server that serves the files in ``/var/www``, then crawl the pages and test all the links starting from ``/var/www/index.html``, run:

.. code-block:: shell

  linkmedic --root=/var/www
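
If you want to call ``linkmedic`` from a Python script, for example as part of a build step, a small wrapper along the following lines may help. This is only a sketch: the helper names ``build_command`` and ``site_is_healthy`` are hypothetical, it assumes ``linkmedic`` is available on your ``PATH``, and it uses only the ``--root`` and ``--entry`` options described in this document.

.. code-block:: python

  import subprocess


  def build_command(root, entry=None):
      """Build a linkmedic command line from the documented options."""
      command = ["linkmedic", f"--root={root}"]
      if entry is not None:
          command.append(f"--entry={entry}")
      return command


  def site_is_healthy(root, entry=None):
      """Run linkmedic and report whether it exited with code 0."""
      # check=False: a non-zero exit code is an expected outcome here,
      # so it is inspected instead of being raised as an exception.
      result = subprocess.run(build_command(root, entry), check=False)
      return result.returncode == 0

The sketch treats every non-zero exit code as a failure. Dead links produce a non-zero code, but other errors may do so as well, so inspect the log output before drawing conclusions.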


Usage & Options
###############

Mirror package repository
*************************

You can also install ``linkmedic`` from the MPCDF GitLab package repository:

.. code-block:: shell

  pip install linkmedic --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/5763/packages/pypi/simple


Container
*********
You can use one of the container images, which have the required libraries and `linkmedkit <https://gitlab.mpcdf.mpg.de/tbz/linkmedkit>`_ already installed:

.. code-block:: shell

  quay.io/meisam/linkmedic:latest

.. code-block:: shell

  gitlab-registry.mpcdf.mpg.de/tbz/linkmedic:latest

You can access a specific version of ``linkmedic`` using container tags, e.g., ``linkmedic:v0.7.4`` instead of ``linkmedic:latest``. See all available container tags `here <https://quay.io/repository/meisam/linkmedic?tab=tags>`_.

When using a container image, ``linkmedic``'s test web server needs to have access to the files for your website pages from inside the container. Depending on your container engine, you may need to mount the path to your files inside the container. For example, using `podman <https://podman.io>`_:

.. code-block:: shell

  podman run --volume /www/public:/test quay.io/meisam/linkmedic:latest linkmedic --root=/test

Here, the ``--volume /www/public:/test`` flag mounts the directory ``/www/public`` inside the container at the path ``/test``.

.. _ci-cd:

CI/CD
*****
You can also use the container image in your CI/CD pipelines. For example, for GitLab CI, in the ``.gitlab-ci.yml`` file:

.. code-block:: yaml

  test_internal_links:
    image: quay.io/meisam/linkmedic:latest
    script:
      - linkmedic --root=/var/www/ --entry=index.html --warn-http --with-badge
    after_script:
      - gitlab_badge_sticker.sh


or for Woodpecker CI in the ``.woodpecker.yml`` file:

.. code-block:: yaml

  test_internal_links:
    image: quay.io/meisam/linkmedic:latest
    commands:
      - linkmedic --root=/var/www/ --entry=index.html --warn-http

If you want to check the external links of your website in your CI pipeline, you must avoid running multiple tests in a short period of time, e.g., on each commit to the development branches. Otherwise, the IP address of your CI runners may get banned by external web servers. For example, in GitLab CI, you can limit the external link checks to only the default branch of your Git repository:

.. code-block:: yaml

  test_external_links:
    image: quay.io/meisam/linkmedic:latest
    rules:
      - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    script:
      - linkmedic --root=/var/www/ --ignore-local --with-badge
    after_script:
      - gitlab_badge_sticker.sh
    allow_failure: true

Please note that the ``gitlab_badge_sticker.sh`` script used in these examples requires an API access token ``CI_API_TOKEN`` with maintainer permission to modify the GitLab repository badges. See the `linkmedkit documentation <https://gitlab.mpcdf.mpg.de/tbz/linkmedkit>`_ for more details.

CLI reference
*************

* Display help: This will show all the command-line options and their default values.

.. code-block:: shell

  linkmedic -h

* Start the web server with the current directory as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

  linkmedic

* Start the web server with ``./tests/public1/`` as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

  linkmedic --root=./tests/public1/

* Start the web server with ``./tests/public1/`` as the root path of the server. Starting from ``index2.html``, crawl the pages and test all the links. The entry point should be relative to the server root. (In the example, ``index2.html`` should be accessible at ``./tests/public1/index2.html``.)

.. code-block:: shell

  linkmedic --root=./tests/public1/ --entry=index2.html

* Configure the test web server not to redirect missing local pages (e.g., from ``/directory/page`` to ``/directory/page.html``).

.. code-block:: shell

  linkmedic --no-local-redirect

* Check links to external websites.
  
  [**IMPORTANT**: You must avoid running the link checker on external links multiple times in a short period, e.g., on each commit to the development branch. Otherwise, the IP address of your machine (or CI runners) may get banned by the CDN or the DoS mitigation solution of the external web servers. See the `CI/CD section <ci-cd_>`__ for a solution.]

.. code-block:: shell

  linkmedic --check-external

* Do not follow external link redirections. Depending on the configuration of the external web servers, this option can leave some dead links undetected: instead of returning a 404 page directly, a web server may ask the client to load another page.

.. code-block:: shell

  linkmedic --no-external-redirects

* Ignore local dead links and activate external link checking.

.. code-block:: shell

  linkmedic --ignore-local

* Do not consider external links that return HTTP status codes 403 and 503 as dead links.

.. code-block:: shell

  linkmedic --ignore-status 403 503

* Check links in an OpenDocument file (e.g., ``.odt``, ``.odp``, ``.ods``), or a single OpenDocument XML file (e.g., ``.fodt``, ``.fodp``, ``.fods``).

.. code-block:: shell

  linkmedic --entry=./presentation.odp

* Show a warning for HTTP links.

.. code-block:: shell

  linkmedic --warn-http

* If any link to ``mydomain.com`` is encountered, treat it as an internal link and resolve it locally.

.. code-block:: shell

  linkmedic --domain=mydomain.com

* Start the web server on port 3000. If the web server cannot be started on the requested port, the initializer will automatically try the next available ports.

.. code-block:: shell

  linkmedic --port=3000

* Generate a badge information file. Depending on the type of diagnosis, this file will be named ``badge.dead_internal_links.json``, ``badge.dead_external_links.json``, or ``badge.dead_links.json``. If the ``--warn-http`` flag is used, a badge file for the number of discovered HTTP links will also be written to the ``badge.http_links.json`` file. These files can be used to generate badges (see `linkmedkit`_ scripts) or to serve as a response for the `shields.io endpoint <https://shields.io/endpoint>`_.

.. code-block:: shell

  linkmedic --with-badge

* Check the links but always exit with code 0.

.. code-block:: shell

  linkmedic --exit-zero

* Log the output at a different level of verbosity. If more than one of these flags is defined, the most restrictive one will be in effect.

  -  ``--verbose`` : log debug information
  -  ``--quiet`` : log only errors
  -  ``--silent`` : completely silence the output logs

* Dump the crawler links list to the ``linkmedic.links`` file. If the ``--domain`` flag has not been set, local links will be referenced from the website root as ``/your/path/page.html``.

.. code-block:: shell

  linkmedic --dump-links
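
The badge files written by ``--with-badge`` can also be inspected from a script. The following sketch assumes that the files follow the shields.io endpoint schema, whose required keys are ``schemaVersion``, ``label``, and ``message``; the exact content that ``linkmedic`` writes is not specified in this document, and ``read_badge`` is a hypothetical helper name.

.. code-block:: python

  import json
  from pathlib import Path

  # Required keys of the shields.io endpoint schema. That linkmedic's badge
  # files contain these keys is an assumption of this sketch.
  REQUIRED_KEYS = ("schemaVersion", "label", "message")


  def read_badge(path):
      """Load a badge file and check that the required keys are present."""
      badge = json.loads(Path(path).read_text(encoding="utf-8"))
      missing = [key for key in REQUIRED_KEYS if key not in badge]
      if missing:
          raise ValueError(f"{path}: missing keys {missing}")
      return badge

For example, ``read_badge("badge.dead_links.json")["message"]`` would return the text shown on the right-hand side of the badge, provided the file follows the assumed schema.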

Example .linkignore
*******************

.. code-block:: text

  invalidfile.tar.gz
  will_add/later.html
  https://not.accessible.com
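
A file in this one-link-per-line format is straightforward to read from your own tooling, e.g., to review which links are currently excluded. The sketch below is not ``linkmedic``'s parser: ``read_ignore_list`` is a hypothetical helper, and skipping blank lines is an assumption. How ``linkmedic`` itself treats blank lines, comments, or partial matches is not documented here.

.. code-block:: python

  from pathlib import Path


  def read_ignore_list(root):
      """Return the set of links listed in <root>/.linkignore, if it exists."""
      ignore_file = Path(root) / ".linkignore"
      if not ignore_file.is_file():
          return set()
      lines = ignore_file.read_text(encoding="utf-8").splitlines()
      # One link per line; surrounding whitespace and empty lines are dropped.
      return {line.strip() for line in lines if line.strip()}
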


Development
###########
This project uses `PDM <https://pdm.fming.dev/latest/>`_ for packaging and dependency management, `vermin <https://pypi.org/project/vermin/>`_ and `bandit <https://pypi.org/project/bandit/>`_ for validation, `black <https://pypi.org/project/black/>`_ and `isort <https://pypi.org/project/isort/>`_ for code styling, and `jsonschema <https://pypi.org/project/jsonschema/>`_ and `jq <https://jqlang.github.io/jq/>`_ for testing. See the `developers guide <DEVELOPERS.rst>`_ for more details.

History
#######
The original idea for this project came from Dr. Klaus Reuter (MPCDF). Fruitful discussions with Dr. Sebastian Kehl (MPCDF) facilitated the packaging and release of this project.

Accompanying tools for ``linkmedic`` have been moved to a separate repository (`linkmedkit`_) starting with version 0.7.

License
#######
* Copyright 2021-2023 M. Farzalipour Tabriz, Max Planck Computing and Data Facility (MPCDF)
* Copyright 2023-2025 M. Farzalipour Tabriz, Max Planck Institute for Physics (MPP)

All rights reserved.

This software may be modified and distributed under the terms of the 3-Clause BSD License. See the LICENSE file for details.

            
