:Name: linkmedic
:Version: 0.7.4
:Summary: Website links checker
:Upload time: 2023-09-06 20:23:52
:Author: M. Farzalipour Tabriz
:Requires Python: >=3.7.2
:License: BSD-3-Clause
:Keywords: html xml

**********
Link Medic
**********

.. image:: https://img.shields.io/pypi/v/linkmedic
   :name: PyPI
   :target: https://pypi.org/project/linkmedic/

.. image:: https://img.shields.io/badge/License-BSD_3--Clause-blue.svg
   :name: License: 3-Clause BSD
   :target: https://opensource.org/licenses/BSD-3-Clause

.. image:: https://img.shields.io/badge/python-%3E=3.7-blue
   :name: Minimum required python version: 3.7

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :name: Coding style: Black
   :target: https://github.com/psf/black

A Python script for checking links in static webpages (``.htm``, ``.html``), OpenDocument files (``.odt``, ``.odp``, ``.ods``), and single OpenDocument XML files (``.fodt``, ``.fodp``, ``.fods``).

``linkmedic`` starts a test webserver and crawls all the pages, starting from the entry page. All links in the resource tags (``<a>``, ``<img>``, ``<script>``, ``<link>``, ``<iframe>``, ``<event-listener>``) are checked, and dead links are reported. If a link appears on multiple pages, only its first occurrence is tested. By default, links to external websites are ignored. If there is a ``.linkignore`` file in the website's root, the links listed in that file are ignored during the tests (one link per line; see below for an example). If any dead links are discovered after all the links have been checked, ``linkmedic`` exits with an error code.
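
The link collection step described above can be pictured with a short sketch. This is not ``linkmedic``'s actual implementation: the ``LinkCollector`` class, the ``collect_links`` helper, and the tag-to-attribute mapping are assumptions made for illustration, using only the Python standard library.

.. code-block:: python

  from html.parser import HTMLParser

  # Attribute that usually carries the link for each tag (assumed for this sketch).
  LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "iframe": "src"}


  class LinkCollector(HTMLParser):
      """Collect link targets from resource tags, keeping first-seen order."""

      def __init__(self):
          super().__init__()
          self.links = []
          self._seen = set()

      def handle_starttag(self, tag, attrs):
          wanted = LINK_ATTRS.get(tag)
          for name, value in attrs:
              # Record each link only once, at its first occurrence.
              if name == wanted and value and value not in self._seen:
                  self._seen.add(value)
                  self.links.append(value)


  def collect_links(html):
      """Return the unique links found in an HTML string (hypothetical helper)."""
      parser = LinkCollector()
      parser.feed(html)
      return parser.links

A real crawler would then request each collected link and follow the internal pages; the sketch only shows how duplicates can be skipped while scanning.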

Quick start
###########

Install
*******
You can install ``linkmedic`` using your favorite Python package manager. For example, using ``pip``, you can install it from `PyPI <https://pypi.org/project/linkmedic/>`__:

.. code-block:: shell

  pip install linkmedic --user


Run
***
Start a test webserver serving the files in ``/var/www``, then crawl the pages and test all the links, starting from the ``/var/www/index.html`` page:

.. code-block:: shell

  linkmedic --root=/var/www


Usage & Options
###############

Mirror package repository
*************************

You can also install ``linkmedic`` from the MPCDF GitLab package repository:

.. code-block:: shell

  pip install linkmedic --user --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/5763/packages/pypi/simple


Container
*********
You can use one of the container images, which come with the required libraries (and the `linkmedkit <https://gitlab.mpcdf.mpg.de/tabriz/linkmedkit>`_ tools) already installed:

.. code-block:: shell

  quay.io/meisam/linkmedic:latest

.. code-block:: shell

  gitlab-registry.mpcdf.mpg.de/tabriz/linkmedic:latest

When using a container image, your website's files must be accessible from inside the container. Depending on your container engine, you may need to mount the path to your files inside the container. For example, using `podman <https://podman.io>`_:

.. code-block:: shell

  podman run -v /www/public:/test quay.io/meisam/linkmedic:latest linkmedic --root=/test

Here, the ``-v /www/public:/test`` flag mounts ``/www/public`` inside the container at the ``/test`` path.

.. _ci-cd:

CI/CD
*****
You can also use the container image in your CI/CD pipelines. For example, for GitLab CI in ``.gitlab-ci.yml``:

.. code-block:: yaml

  test_internal_links:
    image: quay.io/meisam/linkmedic:latest
    script:
      - linkmedic --root=/var/www/ --entry=index.html --warn-http --with-badge
    after_script:
      - gitlab_badge_sticker.sh


or for Woodpecker CI in ``.woodpecker.yml``:

.. code-block:: yaml

  test_internal_links:
    image: quay.io/meisam/linkmedic:latest
    commands:
      - linkmedic --root=/var/www/ --entry=index.html --warn-http

If you want to check the external links of your website in CI, avoid running the tests many times in a short period, e.g. on each commit to the development branches. Otherwise, the IP addresses of your CI runners may get banned by external web servers. For example, in GitLab CI you can restrict the external link checks to the default branch of your git repository:

.. code-block:: yaml

  test_external_links:
    image: quay.io/meisam/linkmedic:latest
    rules:
      - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
    script:
      - linkmedic --root=/var/www/ --ignore-local --with-badge
    after_script:
      - gitlab_badge_sticker.sh
    allow_failure: true

CLI reference
*************

* Display help, showing all the command line options and their default values.

.. code-block:: shell

  linkmedic -h

* Start the webserver with the current directory as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

  linkmedic

* Start the webserver with ``./tests/public1/`` as the root path of the server. Starting from ``index.html``, crawl the pages and test all the links.

.. code-block:: shell

  linkmedic --root=./tests/public1/

* Start the webserver with ``./tests/public1/`` as the root path of the server. Starting from ``index2.html``, crawl the pages and test all the links. The entry point must be given relative to the server root (in the example below, ``index2.html`` must be accessible at ``./tests/public1/index2.html``).

.. code-block:: shell

  linkmedic --root=./tests/public1/ --entry=index2.html

* If a missing page such as ``/directory/page`` is encountered, do not redirect to ``/directory/page.html``.

.. code-block:: shell

  linkmedic --no-redirect
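
The redirect behaviour that this flag turns off can be sketched as follows. The ``resolve_request`` function and its ``redirect`` parameter are hypothetical names for illustration; the test webserver inside ``linkmedic`` may resolve paths differently.

.. code-block:: python

  from pathlib import Path


  def resolve_request(root, url_path, redirect=True):
      """Return the file that would serve url_path, or None if it is missing."""
      candidate = Path(root) / url_path.lstrip("/")
      if candidate.is_file():
          return candidate
      if redirect and candidate.name:
          # Fall back from /directory/page to /directory/page.html.
          with_html = candidate.with_name(candidate.name + ".html")
          if with_html.is_file():
              return with_html
      return None

With ``redirect=False`` a request for ``/directory/page`` is reported as missing even when ``/directory/page.html`` exists, which mirrors the effect described for ``--no-redirect``.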

* Check links to external websites.

  [**IMPORTANT**: Avoid running the link checker on external links many times in a short period, e.g. on each commit to the develop branch. Otherwise, the IP address of your machine (or CI runners) may get banned by the CDN or the DoS mitigation solution of the external webservers. See the `CI/CD section <ci-cd_>`_ for a solution.]

.. code-block:: shell

  linkmedic --check-external

* Check only the links to external websites and ignore the local dead links.

.. code-block:: shell

  linkmedic --ignore-local

* Do not treat external links that return HTTP status code 403 or 503 as dead links.

.. code-block:: shell

  linkmedic --ignore-status 403 503

* Check links in an OpenDocument file (``.odt``, ``.odp``, ``.ods``), or a single OpenDocument XML file (``.fodt``, ``.fodp``, ``.fods``).

.. code-block:: shell

  linkmedic --entry=./presentation.odp

* Show a warning for HTTP links.

.. code-block:: shell

  linkmedic --warn-http

* Treat any link to ``mydomain.com`` as an internal link and resolve it locally.

.. code-block:: shell

  linkmedic --domain=mydomain.com
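
One way to picture the internal/external distinction behind this option is the sketch below. The ``is_internal`` function is hypothetical, and the exact-host comparison is an assumption: how ``linkmedic`` handles subdomains or non-HTTP schemes such as ``mailto:`` is not described here.

.. code-block:: python

  from urllib.parse import urlparse


  def is_internal(link, domain=None):
      """Treat relative links and links to `domain` as internal (sketch only)."""
      host = urlparse(link).hostname
      if host is None:
          # No host part, e.g. a relative link such as "docs/page.html".
          return True
      # Assumption: only an exact host match counts as the site's own domain.
      return domain is not None and host == domain.lower()

Under these assumptions, ``is_internal("https://mydomain.com/a", domain="mydomain.com")`` is true, while the same link without a configured domain counts as external.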

* Start the webserver on port 3000. If the webserver cannot be started on the requested port, the initializer automatically tries the following ports.

.. code-block:: shell

  linkmedic --port=3000
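
The port fallback described above can be sketched with the standard ``socket`` module. The ``bind_first_free_port`` function, the ``attempts`` limit, and the loopback address are assumptions for illustration, not ``linkmedic``'s own code.

.. code-block:: python

  import socket


  def bind_first_free_port(start, attempts=10, host="127.0.0.1"):
      """Bind a TCP socket to the first free port in [start, start + attempts)."""
      for port in range(start, start + attempts):
          sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          try:
              sock.bind((host, port))
          except OSError:
              sock.close()  # port is busy or not permitted; try the next one
              continue
          return sock, port
      raise OSError(f"no free port in {start}..{start + attempts - 1}")

The caller receives both the bound socket and the port that was finally used, so the crawler can build its URLs against the right port.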

* Generate a badge information file. Depending on the type of diagnosis, this file will be named ``badge.dead_internal_links.json``, ``badge.dead_external_links.json``, or ``badge.dead_links.json``. If the ``--warn-http`` flag is used, a badge file for the number of discovered HTTP links will also be written to ``badge.http_links.json``. These files can be used to generate badges (see the `linkmedkit`_ scripts) or served as a `shields.io endpoint <https://shields.io/endpoint>`_ response.

.. code-block:: shell

  linkmedic --with-badge
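
For orientation, a shields.io endpoint response is a small JSON object with ``schemaVersion``, ``label``, and ``message`` fields, plus optional ones such as ``color``. The sketch below builds such an object; the ``endpoint_badge`` function, the label text, and the color choice are assumptions, and the exact fields written by ``linkmedic`` may differ.

.. code-block:: python

  import json


  def endpoint_badge(label, count):
      """Build a shields.io endpoint response for a count of problems."""
      return {
          "schemaVersion": 1,
          "label": label,
          "message": str(count),
          # Assumed color scheme: red when anything was found, green otherwise.
          "color": "red" if count else "green",
      }


  def dump_badge(label, count):
      """Serialize the badge as a JSON string, ready to be written to a file."""
      return json.dumps(endpoint_badge(label, count))

A file with this content can be served as-is and referenced from a shields.io endpoint badge URL.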

* Check the links but always exit with code 0.

.. code-block:: shell

  linkmedic --exit-zero

* Set the verbosity of the log output. If more than one of these flags is given, the most restrictive one takes effect.

  -  ``--verbose`` : log debug information
  -  ``--quiet`` : only log errors
  -  ``--silent`` : completely silence the output logs

Example .linkignore
*******************

.. code-block:: text

  invalidfile.tar.gz
  will_add/later.html
  https://not.accessible.com
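
Reading such a file amounts to collecting one entry per line. The sketch below assumes that blank lines are skipped and that a link is ignored only on an exact string match; both the ``load_ignore_list`` and ``filter_ignored`` helpers and these matching rules are assumptions, not a description of ``linkmedic``'s behaviour.

.. code-block:: python

  def load_ignore_list(text):
      """Return the set of non-empty, stripped lines of a .linkignore-style file."""
      return {line.strip() for line in text.splitlines() if line.strip()}


  def filter_ignored(links, ignored):
      """Drop links that appear verbatim in the ignore set, keeping order."""
      return [link for link in links if link not in ignored]

Applied to the example above, ``will_add/later.html`` would be dropped from the list of links to test, while any link not listed in the file would still be checked.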


Development
###########
This project uses `PDM <https://pdm.fming.dev/latest/>`_ for packaging and dependency management, `vermin <https://pypi.org/project/vermin/>`_ and `bandit <https://pypi.org/project/bandit/>`_ for validation, `black <https://pypi.org/project/black/>`_ and `isort <https://pypi.org/project/isort/>`_ for styling, and `jsonschema <https://pypi.org/project/jsonschema/>`_ and `jq <https://jqlang.github.io/jq/>`_ for testing. See the `developers guide <DEVELOPERS.rst>`_ for more details.

History
#######
The original idea for this project came from Dr. Klaus Reuter (MPCDF). Fruitful discussions with Dr. Sebastian Kehl (MPCDF) facilitated this project’s packaging and release.

The accompanying tools for ``linkmedic`` were moved to a separate repository (`linkmedkit`_) in version 0.7.

License
#######
Copyright 2021-2023 M. Farzalipour Tabriz, Max Planck Computing and Data Facility (MPCDF)

All rights reserved.

This software may be modified and distributed under the terms of the 3-Clause BSD License. See the LICENSE file for details.

            

        