filter-url
==========
A simple, fast, and configurable Python utility to censor sensitive data (passwords, API keys, tokens) from URLs, making them safe for logging, monitoring, and debugging.
Key Features
------------
* **Comprehensive Censoring**: Censors passwords in userinfo (`user:[...]@host`), query parameter values, and parts of the URL path.
* **Flexible Rules**: Filter query parameters by exact key names or by powerful regular expressions.
* **Advanced Path Filtering**: Use regex with named capture groups to censor specific dynamic parts of a URL path while leaving the rest intact.
* **Order Preserving**: Guarantees that the order of query parameters in the output is identical to the input.
* **Logging Integration**: Provides a ready-to-use `logging.Filter` subclass for seamless integration into your application's logging setup.
* **Lightweight**: Zero external dependencies.
Installation
------------

    pip install filter-url

Quick Start
-----------
The quickest way to use the library is the standalone `filter_url()` function, which uses a default set of rules to catch common sensitive keys.

    from filter_url import filter_url

    dirty_url = "https://user:my-secret-password@example.com/data?token=abc-123-xyz"

    # Use the function with default filters
    clean_url = filter_url(dirty_url)

    print(clean_url)
    # >> https://user:[...]@example.com/data?token=[...]

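To make the idea of key-based censoring concrete, here is a minimal standard-library sketch of how query values could be redacted while keeping parameter order. It is not the library's implementation: `is_sensitive`, `redact_query`, the rule sets, and the use of `re.search` for the regex rules are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule sets, used only in this sketch.
BAD_KEYS = {"password", "token"}
BAD_KEYS_RE = [re.compile(r"session"), re.compile(r".*_key")]
PLACEHOLDER = "[...]"


def is_sensitive(key):
    """Return True if the key is an exact match or matches any regex rule."""
    return key in BAD_KEYS or any(rx.search(key) for rx in BAD_KEYS_RE)


def redact_query(url):
    """Replace values of sensitive query keys, leaving parameter order untouched."""
    parts = urlsplit(url)
    redacted = []
    # Split by hand so the original order and encoding of other values survive.
    for pair in parts.query.split("&"):
        key, sep, value = pair.partition("=")
        if sep and is_sensitive(key):
            value = PLACEHOLDER
        redacted.append(f"{key}{sep}{value}")
    return urlunsplit(parts._replace(query="&".join(redacted)))


print(redact_query("https://example.com/data?b=2&token=abc&a=1&user_session=s1"))
# https://example.com/data?b=2&token=[...]&a=1&user_session=[...]
```

Iterating over the query pairs in their original order, rather than building a dictionary, is what keeps the output order identical to the input in this sketch.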
Usage & Examples
----------------
### Basic Filtering (Standalone Function)
The `filter_url()` function is great for one-off tasks. You can pass your own filtering rules directly to it. If a rule is not provided, a sensible default is used.

    from filter_url import filter_url

    # Define custom rules
    custom_path_re = r'/user/(?P<user_id>\d+)/profile'

    dirty_url = "https://example.com/user/123456/profile?credit_card_number=5555"

    # Censor using a custom path regex
    clean_url = filter_url(
        url=dirty_url,
        bad_path_re=custom_path_re
    )

    print(clean_url)
    # >> https://example.com/user/[...]/profile?credit_card_number=5555

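The path rule above can be pictured with a small `re`-based sketch. This is an illustration under stated assumptions, not the library's code: `redact_path` is a hypothetical helper, it handles only the first match, it does not handle nested groups, and redacting the whole match when the pattern defines no groups is a guess at reasonable behaviour.

```python
import re

PLACEHOLDER = "[...]"


def redact_path(path, pattern):
    """Replace each captured group of the first match with a placeholder."""
    match = re.search(pattern, path)
    if match is None:
        return path
    # Assumption: a pattern without groups redacts the whole match (group 0).
    groups = range(1, match.re.groups + 1) if match.re.groups else [0]
    spans = sorted(
        (match.span(g) for g in groups if match.span(g) != (-1, -1)),
        reverse=True,
    )
    # Substitute from right to left so earlier offsets stay valid.
    for start, end in spans:
        path = path[:start] + PLACEHOLDER + path[end:]
    return path


print(redact_path("/user/123456/profile", r"/user/(?P<user_id>\d+)/profile"))
# /user/[...]/profile
print(redact_path("/user/42/delete", r"(?<=/user/)\d+(?=/delete)"))
# /user/[...]/delete
```

Only the captured span is rewritten, so the static parts of the path (`/user/` and `/profile`) stay readable in the log.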
### Advanced: Using the `FilterURL` Class for Performance
When you need to filter a large number of URLs with the same configuration, it's much more efficient to instantiate the `FilterURL` class once. This pre-compiles the regular expressions and avoids redundant work in a loop.

    from filter_url import FilterURL

    # Create the filter instance ONCE with your custom rules.
    # The regexes are compiled here.
    my_filter = FilterURL(
        bad_keys={'api_key'},
        bad_keys_re=[r'session']
    )

    urls_to_process = [
        "https://service.com/api?api_key=key-1",
        "https://service.com/api?user_session=sess-2",
        "https://service.com/api?id=3"
    ]

    # Reuse the same instance in a loop for high performance
    clean_urls = [my_filter.remove_sensitive(url) for url in urls_to_process]

    # clean_urls will be:
    # [
    #   'https://service.com/api?api_key=[...]',
    #   'https://service.com/api?user_session=[...]',
    #   'https://service.com/api?id=3'
    # ]

The class keeps an internal cache of filtered URLs. You can tune its size or turn it off completely with the `cache_size` parameter (see the API Reference below).

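As a rough picture of what such a cache buys you, the sketch below wraps an arbitrary redaction callable in `functools.lru_cache`. Whether `FilterURL` uses `lru_cache` internally is not stated here, so treat this purely as an assumption; `make_cached_filter` and `slow_redact` are hypothetical names.

```python
from functools import lru_cache


def make_cached_filter(redact, cache_size=512):
    """Wrap a redaction callable in an LRU cache; 0 or None disables caching."""
    if not cache_size:
        return redact
    return lru_cache(maxsize=cache_size)(redact)


calls = []


def slow_redact(url):
    """Stand-in for real filtering work; records every actual invocation."""
    calls.append(url)
    return url.split("?")[0] + "?[...]"


cached = make_cached_filter(slow_redact, cache_size=2)
cached("https://service.com/api?api_key=key-1")
cached("https://service.com/api?api_key=key-1")  # served from the cache

print(len(calls))
# 1
```

A cache like this helps only when the same URL string recurs; a stream of unique URLs gains nothing and still pays for the lookup.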
### Integration with Python's `logging` Module
This is the most powerful feature for real-world applications. The `URLFilter` automatically censors URLs in your logs. The filter works in two ways:
1. **(Preferred)** It looks for a `url` key in the `extra` dictionary of your logging call.
2. **(Fallback)** If `fallback=True` (the default), it searches for URLs in the positional arguments of the log message.
```python
import logging
import sys

from filter_url import URLFilter

# 1. Configure a logger
logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()

# 2. Simply add our filter. Let's use custom rules for this example
custom_filter = URLFilter(
    bad_keys={'access_token'},
    fallback=True  # Default, but shown for clarity
)
logger.addFilter(custom_filter)

# 3. Use a standard Formatter. No special formatter is needed
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(levelname)s: %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# --- Usage Examples ---

# Case 1: (Preferred) Pass the URL via 'extra'
logger.info(
    "User login attempt failed",
    extra={'url': "https://auth.service.com/login?access_token=12345"}
)

# Case 2: (Fallback) The URL is an argument in the message string
logger.info(
    "API call to %s resulted in a 404 error.",
    "https://api.service.com/data/v1/user?password=abc"
)

# Case 3: No URL in the message. Nothing extra is added
logger.info("Application started successfully.")
```
Be aware of a minor trade-off between the logging filter and the `FilterURL` class.
If each URL is logged only once, the logging filter is the better fit: it keeps your code simpler and cleaner.
If the same URLs are output several times at different stages of processing, filter them once in advance with the `FilterURL` class to save CPU cycles.
The internal cache in `FilterURL` stores filtered URLs and narrows this gap, but the difference can still be noticeable under load.
**Expected Output:**

    INFO: User login attempt failed | (URL data: https://auth.service.com/login?access_token=[...])
    INFO: API call to https://api.service.com/data/v1/user?password=[...] resulted in a 404 error. | (URL data: https://api.service.com/data/v1/user?password=[...])
    INFO: Application started successfully.

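For readers unfamiliar with `logging.Filter`, the preferred `extra`-based path can be sketched with the standard library alone. This is not the source of `URLFilter`: `ExtraURLFilter` and the toy `redact` function are hypothetical, the fallback search is left out, and the `fmt` default is copied from the API Reference in this document.

```python
import logging


class ExtraURLFilter(logging.Filter):
    """Hypothetical filter: appends a redacted `url` taken from `extra`."""

    def __init__(self, redact, fmt=" | (URL={filtered_url})", name=""):
        super().__init__(name)
        self.redact = redact
        self.fmt = fmt

    def filter(self, record):
        # Keys passed via `extra` become attributes of the LogRecord.
        url = getattr(record, "url", None)
        if url is not None:
            suffix = self.fmt.format(filtered_url=self.redact(url))
            # Render the message first, then append the redacted URL.
            record.msg = record.getMessage() + suffix
            record.args = ()
        return True  # never drop the record


def redact(url):
    """Toy redaction for the demo: hide everything after the first '='."""
    return url.split("=", 1)[0] + "=[...]"


record = logging.makeLogRecord({
    "msg": "User login attempt failed",
    "url": "https://auth.service.com/login?access_token=12345",
})
ExtraURLFilter(redact).filter(record)
print(record.getMessage())
# User login attempt failed | (URL=https://auth.service.com/login?access_token=[...])
```

Because the filter rewrites `record.msg` before any handler runs, a standard `Formatter` is enough and the raw URL never reaches the formatted output.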
Corner Cases & Considerations
-----------------------------
* **Log String vs. Valid URL**: The primary goal of this library is to produce a human-readable, safe string for logging. The output string containing `[...]` in the userinfo (password) section is not a valid URL according to RFC standards and may fail if you try to parse it again with `urllib.parse`.
* **Performance**: For filtering a large number of URLs, always instantiate the `FilterURL` class once and reuse the instance. The standalone `filter_url()` function re-compiles regexes on every call and is less performant for batch jobs.
* **Logging Filter Precedence**: When using `URLFilter`, passing the URL in the `extra` dictionary is the preferred method. The `fallback` search runs only if no `url` key is found in `extra`. The fallback search also costs extra CPU cycles, which may be undesirable.
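Regarding the first point above (log string vs. valid URL): if downstream code might re-parse a filtered string, a defensive wrapper is one option. The sketch below uses a hypothetical helper, `safe_hostname`. Whether `urllib.parse.urlsplit` rejects `[...]` in the userinfo depends on the Python version, so no fixed result is shown for the filtered string.

```python
from urllib.parse import urlsplit


def safe_hostname(url):
    """Return the hostname, or None if the string cannot be parsed as a URL."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        # Some Python versions reject brackets that do not enclose an IP literal.
        return None


filtered = "https://user:[...]@example.com/data?token=[...]"
host = safe_hostname(filtered)
print("parsed" if host is not None else "not parseable on this Python version")
```

Where possible, parse the original URL first and filter only at the point of logging.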
API Reference
-------------
* `filter_url(url, censored, bad_keys, bad_keys_re, bad_path_re)`: A standalone function for one-off URL censoring.
  * **url:str - (required)** a URL to censor
  * **censored:str - (optional)** a placeholder to use instead of redacted parts, `'[...]'` by default
  * **bad_keys:list - (optional)** a list of query (HTTP GET) parameter keys that may contain sensitive data. Default:

        [ "password", "token", "key", "secret", "auth", "apikey", "credentials", ]

  * **bad_keys_re:list - (optional)** a list of regexes matching query (HTTP GET) parameter keys that may contain sensitive data. Default:

        [ r"session", r"csrf", r".*_secret", r".*_token", r".*_key", ]

  * **bad_path_re:str - (optional)** a regex to match the path part of the URL; each group defined in it will be redacted. Default: None. Examples:

        custom_path_re_named = r"/api/v1/(?P<api_key>[^/]+)/resource"
        custom_path_re_simple = r"(?<=/user/)\d+(?=/delete)"

* `FilterURL(bad_keys, bad_keys_re, bad_path_re, cache_size)`: A class that holds a compiled filter configuration for efficient, repeated use.
  The meaning and defaults of **bad_keys:list, bad_keys_re:list, bad_path_re:str** are the same as for `filter_url()` (see above).
  * **cache_size:int - (optional)** Size of the cache that keeps filtered URLs; 0 or None means no caching. Default: 512
  * `.remove_sensitive(url, censored)`: The method that performs the censoring.
    * **censored:str - (optional)** a placeholder to use instead of redacted parts, `'[...]'` by default
* `URLFilter(bad_keys, bad_keys_re, bad_path_re, fmt, url_filter_instance, fallback, cache_size, name)`: A `logging.Filter` subclass for easy integration with Python's logging module.
  * **bad_keys:list, bad_keys_re:list, bad_path_re:str** are the same as for `filter_url()` (see above)
  * **fmt:str - (optional)** Format used to add the filtered URL to the log message. Default: `' | (URL={filtered_url})'` (`{filtered_url}` is replaced with your filtered URL)
  * **url_filter_instance:FilterURL - (optional)** Pre-configured instance of a `FilterURL`-like class to use for filtering. Default: None (one is created by the filter)
  * **fallback:bool - (optional)** Whether to search the message text for a URL when none is passed explicitly with `extra={'url': ...}`. Default: True
  * **cache_size:int - (optional)** Size of the cache that keeps filtered URLs; 0 or None means no caching. Default: 512
  * **name:str - (optional)** The name of the filter (inherited from `logging.Filter`)

License
-------
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": null,
"name": "filter-url",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "url, filter, filtering, URL filtering, logging",
"author": null,
"author_email": "Alex Semenyaka <alex.semenyaka@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ca/38/b0243052d7f287f219bd47339f523ff97e3b38b8c7902b6576aa0866e67f/filter_url-1.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A simple, fast, and configurable URL sensitive data filter",
"version": "1.2.0",
"project_urls": {
"Homepage": "https://github.com/alexsemenyaka/filter_url",
"Issues": "https://github.com/alexsemenyaka/filter_url/issues",
"Repository": "https://github.com/alexsemenyaka/filter_url"
},
"split_keywords": [
"url",
" filter",
" filtering",
" url filtering",
" logging"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "0b9ae38227deafaa6934017f94980286727548e4f4f44b0497cdb4be39649e2e",
"md5": "4c14f13da5c34b02335cac8483b805af",
"sha256": "37e47a9170d7bb7d2eb1f11bd9afc9d1d33fba9413b4ba8c503604fae853bf95"
},
"downloads": -1,
"filename": "filter_url-1.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4c14f13da5c34b02335cac8483b805af",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 9566,
"upload_time": "2025-07-15T16:07:47",
"upload_time_iso_8601": "2025-07-15T16:07:47.149530Z",
"url": "https://files.pythonhosted.org/packages/0b/9a/e38227deafaa6934017f94980286727548e4f4f44b0497cdb4be39649e2e/filter_url-1.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ca38b0243052d7f287f219bd47339f523ff97e3b38b8c7902b6576aa0866e67f",
"md5": "e39d07221feda451fe039cb42455bb32",
"sha256": "d0138995c96917aa75048227d714e0eee849dfa4fdff28a918af3f228403a66e"
},
"downloads": -1,
"filename": "filter_url-1.2.0.tar.gz",
"has_sig": false,
"md5_digest": "e39d07221feda451fe039cb42455bb32",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 11233,
"upload_time": "2025-07-15T16:07:47",
"upload_time_iso_8601": "2025-07-15T16:07:47.954222Z",
"url": "https://files.pythonhosted.org/packages/ca/38/b0243052d7f287f219bd47339f523ff97e3b38b8c7902b6576aa0866e67f/filter_url-1.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-15 16:07:47",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "alexsemenyaka",
"github_project": "filter_url",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "iniconfig",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"25.0"
]
]
},
{
"name": "pluggy",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.19.2"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.4.1"
]
]
}
],
"lcname": "filter-url"
}