* Name: filter-url
* Version: 1.0.0
* Summary: A simple, fast, and configurable URL sensitive data filter
* Upload time: 2025-07-14 23:27:07
* Requires Python: >=3.10
* License: MIT
* Keywords: url, filter, filtering, URL filtering, logging
* Requirements: iniconfig, packaging, pluggy, pygments, pytest

filter-url
==========

[![PyPI version](https://img.shields.io/pypi/v/filter-url.svg)](https://pypi.org/project/filter-url/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/filter-url.svg)](https://pypi.org/project/filter-url/)
[![PyPI - License](https://img.shields.io/pypi/l/filter-url.svg)](https://pypi.org/project/filter-url/)
[![Coverage Status](https://coveralls.io/repos/github/alexsemenyaka/filter_url/badge.svg?branch=main)](https://coveralls.io/github/alexsemenyaka/filter_url?branch=main)
[![CI/CD Status](https://github.com/alexsemenyaka/filter_url/actions/workflows/ci.yml/badge.svg)](https://github.com/alexsemenyaka/filter_url/actions/workflows/ci.yml)

A simple, fast, and configurable Python utility to censor sensitive data (passwords, API keys, tokens) from URLs, making them safe for logging, monitoring, and debugging.

Key Features
------------

* **Comprehensive Censoring**: Censors passwords in userinfo (`user:[...]@host`), query parameter values, and parts of the URL path.
* **Flexible Rules**: Filter query parameters by exact key names or by powerful regular expressions.
* **Advanced Path Filtering**: Use regex with named capture groups to censor specific dynamic parts of a URL path while leaving the rest intact.
* **Order Preserving**: Guarantees that the order of query parameters in the output is identical to the input.
* **Logging Integration**: Provides a ready-to-use `logging.Filter` subclass for seamless integration into your application's logging setup.
* **Lightweight**: Zero external dependencies.
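
To make the query-parameter rules concrete, here is a minimal standard-library sketch of order-preserving censoring by exact key and by regular expression. It is not the library's implementation: `censor_query` and `CENSORED` are hypothetical names chosen for illustration, and the sketch compares raw (not percent-decoded) keys.

```python
import re
from urllib.parse import urlsplit, urlunsplit

CENSORED = "[...]"  # placeholder borrowed from the examples in this README


def censor_query(url, bad_keys=frozenset(), bad_keys_re=()):
    """Hypothetical helper: censor query values by exact key or by regex."""
    patterns = [re.compile(p) for p in bad_keys_re]
    parts = urlsplit(url)
    pairs = parts.query.split("&") if parts.query else []
    result = []
    for pair in pairs:  # iterate in input order so the output order matches
        key, sep, value = pair.partition("=")
        if sep and (key in bad_keys or any(p.search(key) for p in patterns)):
            value = CENSORED
        result.append(f"{key}{sep}{value}")
    # Rebuild the URL with only the query component changed
    return urlunsplit(parts._replace(query="&".join(result)))


print(censor_query(
    "https://service.com/api?b=2&api_key=key-1&user_session=s&a=1",
    bad_keys={"api_key"},
    bad_keys_re=[r"session"],
))
# >> https://service.com/api?b=2&api_key=[...]&user_session=[...]&a=1
```

Splitting the query string by hand, rather than round-tripping it through a dictionary, is what keeps the parameter order and any repeated keys intact in this sketch.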

Installation
------------

    pip install filter-url

Quick Start
-----------

The quickest way to use the library is the standalone `filter_url()` function, which uses a default set of rules to catch common sensitive keys.

    from filter_url import filter_url

    dirty_url = "https://user:my-secret-password@example.com/data?token=abc-123-xyz"

    # Use the function with default filters
    clean_url = filter_url(dirty_url)

    print(clean_url)
    # >> https://user:[...]@example.com/data?token=[...]

Usage & Examples
----------------

### Basic Filtering (Standalone Function)

The `filter_url()` function is great for one-off tasks. You can pass your own filtering rules directly to it. If a rule is not provided, a sensible default is used.

    from filter_url import filter_url

    # Define custom rules
    custom_path_re = r'/user/(?P<user_id>\d+)/profile'

    dirty_url = "https://example.com/user/123456/profile?credit_card_number=5555"

    # Censor using a custom path regex
    clean_url = filter_url(
        url=dirty_url,
        bad_path_re=custom_path_re
    )

    print(clean_url)
    # >> https://example.com/user/[...]/profile?credit_card_number=5555
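
For readers curious how named capture groups can drive this kind of path censoring, the following is a rough sketch using only the `re` module. It is an illustration under assumptions, not the library's code: `censor_path` is a hypothetical helper, and it assumes the named groups do not overlap.

```python
import re

CENSORED = "[...]"  # placeholder borrowed from the examples in this README


def censor_path(path, bad_path_re):
    """Hypothetical helper: replace the text captured by each named group."""
    match = re.search(bad_path_re, path)
    if match is None:
        return path  # no match: leave the path untouched
    # Spans of the named groups that took part in the match, rightmost first,
    # so earlier offsets stay valid while the string is rewritten.
    spans = sorted(
        (match.span(name) for name in match.groupdict() if match.start(name) != -1),
        reverse=True,
    )
    for start, end in spans:
        path = path[:start] + CENSORED + path[end:]
    return path


print(censor_path("/user/123456/profile", r"/user/(?P<user_id>\d+)/profile"))
# >> /user/[...]/profile
```

Only the captured spans are rewritten, so the literal parts of the pattern (`/user/` and `/profile` here) pass through unchanged.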

### Advanced: Using the `FilterURL` Class for Performance

When you need to filter a large number of URLs with the same configuration, it's much more efficient to instantiate the `FilterURL` class once. This pre-compiles the regular expressions and avoids redundant work in a loop.

    from filter_url import FilterURL

    # Create the filter instance ONCE with your custom rules.
    # The regexes are compiled here.
    my_filter = FilterURL(
        bad_keys={'api_key'},
        bad_keys_re=[r'session']
    )

    urls_to_process = [
        "https://service.com/api?api_key=key-1",
        "https://service.com/api?user_session=sess-2",
        "https://service.com/api?id=3"
    ]

    # Reuse the same instance in a loop for high performance
    clean_urls = [my_filter.remove_sensitive(url) for url in urls_to_process]

    # clean_urls will be:
    # [
    #   'https://service.com/api?api_key=[...]',
    #   'https://service.com/api?user_session=[...]',
    #   'https://service.com/api?id=3'
    # ]

### Integration with Python's `logging` Module

The `URLFilter` class censors URLs in your log records automatically, which makes it the most convenient option for real applications. It locates URLs in two ways:
1. **(Preferred)** It looks for a `url` key in the `extra` dictionary of your logging call.
2. **(Fallback)** If `fallback=True` (the default), it searches for URLs in the positional arguments of the log message.


```python
import logging
import sys
from filter_url import URLFilter

# 1. Configure a logger

logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()

# 2. Simply add our filter. Let's use custom rules for this example

custom_filter = URLFilter(
    bad_keys={'access_token'},
    fallback=True  # Default, but shown for clarity
)
logger.addFilter(custom_filter)

# 3. Use a standard Formatter. No special formatter is needed

handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(levelname)s: %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# --- Usage Examples ---

# Case 1: (Preferred) Pass the URL via 'extra'

logger.info(
    "User login attempt failed",
    extra={'url': "https://auth.service.com/login?access_token=12345"}
)

# Case 2: (Fallback) The URL is an argument in the message string

logger.info(
    "API call to %s resulted in a 404 error.",
    "https://api.service.com/data/v1/user?password=abc"
)

# Case 3: No URL in the message. Nothing extra is added

logger.info("Application started successfully.")
```

**Expected Output:**

    INFO: User login attempt failed | (URL data: https://auth.service.com/login?access_token=[...])
    INFO: API call to https://api.service.com/data/v1/user?password=[...] resulted in a 404 error. | (URL data: https://api.service.com/data/v1/user?password=[...])
    INFO: Application started successfully.
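
The preferred path relies on standard `logging` behaviour: keys passed through `extra` become attributes on the `LogRecord`. The sketch below shows that mechanism with a hypothetical `AppendURLFilter` class. It is not how `URLFilter` is implemented; the ` | (URL data: ...)` suffix simply mirrors the output shown in this README, and the one-line regex is a toy stand-in for real censoring rules.

```python
import logging
import re


class AppendURLFilter(logging.Filter):
    """Hypothetical filter: appends a censored copy of record.url, if present."""

    def filter(self, record):
        url = getattr(record, "url", None)  # set via extra={'url': ...}
        if url is not None:
            # Toy censoring rule (assumption): blank out the value of token=
            safe = re.sub(r"(token=)[^&]*", r"\1[...]", url)
            record.msg = f"{record.getMessage()} | (URL data: {safe})"
            record.args = None  # the message is already fully formatted
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("url_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addFilter(AppendURLFilter())
```

Attaching the filter to the logger (rather than to a handler) means the record is rewritten once, before any handler formats it.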

Corner Cases & Considerations
-----------------------------

* **Log String vs. Valid URL**: The primary goal of this library is to produce a human-readable, safe string for logging. An output string containing `[...]` in the userinfo (password) section is not a valid URL under RFC 3986, which does not permit square brackets in userinfo, and it may fail if you try to parse it again with `urllib.parse`.
* **Performance**: For filtering a large number of URLs, always instantiate the `FilterURL` class once and reuse the instance. The standalone `filter_url()` function re-compiles regexes on every call and is less performant for batch jobs.
* **Logging Filter Precedence**: When using `URLFilter`, providing a URL in the `extra` dictionary is always the preferred method. The `fallback` search will only trigger if a `url` key is not found in `extra`.

API Reference
-------------

* `filter_url(url, censored, bad_keys, bad_keys_re, bad_path_re)`: A standalone function for one-off URL censoring.
* `FilterURL(bad_keys, bad_keys_re, bad_path_re)`: A class that holds a compiled filter configuration for efficient, repeated use.
  * `.remove_sensitive(url, censored)`: The method that performs the censoring.
* `URLFilter(bad_keys, bad_keys_re, bad_path_re, fallback)`: A `logging.Filter` subclass for easy integration with Python's logging module.

License
-------

This project is licensed under the MIT License.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "filter-url",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "url, filter, filtering, URL filtering, logging",
    "author": null,
    "author_email": "Alex Semenyaka <alex.semenyaka@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d2/0c/9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0/filter_url-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple, fast, and configurable URL sensitive data filter",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/alexsemenyaka/filter_url",
        "Issues": "https://github.com/alexsemenyaka/filter_url/issues",
        "Repository": "https://github.com/alexsemenyaka/filter_url"
    },
    "split_keywords": [
        "url",
        " filter",
        " filtering",
        " url filtering",
        " logging"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ae5d3d2887a21fd1715add2cbded47814596ddb408d80171c409b90d0fe0203a",
                "md5": "6e753a1b56a326b5285340c52f564858",
                "sha256": "78a2190f9f7445058f0f81a89d90fe6cb1b0e1b6d1651e895d4c703dcff107fd"
            },
            "downloads": -1,
            "filename": "filter_url-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6e753a1b56a326b5285340c52f564858",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 8404,
            "upload_time": "2025-07-14T23:27:05",
            "upload_time_iso_8601": "2025-07-14T23:27:05.905325Z",
            "url": "https://files.pythonhosted.org/packages/ae/5d/3d2887a21fd1715add2cbded47814596ddb408d80171c409b90d0fe0203a/filter_url-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d20c9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0",
                "md5": "7a181e7f3e6fa29f24a479b192860923",
                "sha256": "672ee5a4c9af8092e66bb16deb9b0f4291e59f375f05bb9345ff7a82242042e6"
            },
            "downloads": -1,
            "filename": "filter_url-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "7a181e7f3e6fa29f24a479b192860923",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 10078,
            "upload_time": "2025-07-14T23:27:07",
            "upload_time_iso_8601": "2025-07-14T23:27:07.093693Z",
            "url": "https://files.pythonhosted.org/packages/d2/0c/9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0/filter_url-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-14 23:27:07",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "alexsemenyaka",
    "github_project": "filter_url",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "iniconfig",
            "specs": [
                [
                    "==",
                    "2.1.0"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "25.0"
                ]
            ]
        },
        {
            "name": "pluggy",
            "specs": [
                [
                    "==",
                    "1.6.0"
                ]
            ]
        },
        {
            "name": "pygments",
            "specs": [
                [
                    "==",
                    "2.19.2"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.4.1"
                ]
            ]
        }
    ],
    "lcname": "filter-url"
}
        