filter-url
==========
A simple, fast, and configurable Python utility to censor sensitive data (passwords, API keys, tokens) from URLs, making them safe for logging, monitoring, and debugging.
Key Features
------------
* **Comprehensive Censoring**: Censors passwords in userinfo (`user:[...]@host`), query parameter values, and parts of the URL path.
* **Flexible Rules**: Filter query parameters by exact key names or by powerful regular expressions.
* **Advanced Path Filtering**: Use regex with named capture groups to censor specific dynamic parts of a URL path while leaving the rest intact.
* **Order Preserving**: Guarantees that the order of query parameters in the output is identical to the input.
* **Logging Integration**: Provides a ready-to-use `logging.Filter` subclass for seamless integration into your application's logging setup.
* **Lightweight**: Zero external dependencies.
Installation
------------
    pip install filter-url
Quick Start
-----------
The quickest way to use the library is the standalone `filter_url()` function, which uses a default set of rules to catch common sensitive keys.
    from filter_url import filter_url

    dirty_url = "https://user:my-secret-password@example.com/data?token=abc-123-xyz"

    # Use the function with default filters
    clean_url = filter_url(dirty_url)

    print(clean_url)
    # >> https://user:[...]@example.com/data?token=[...]
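To make the transformation above concrete, here is a minimal standard-library sketch of the same idea: mask the password in the userinfo part and mask the values of sensitive query keys while keeping the parameters in their original order. It is not the library's implementation. The names `censor_url_sketch`, `SENSITIVE_KEYS`, and `PLACEHOLDER` are hypothetical, and the key list is an assumption, not the library's real default set.

```python
from urllib.parse import urlsplit

# Hypothetical key list for illustration; the library's real defaults may differ.
SENSITIVE_KEYS = {"token", "password", "api_key"}
PLACEHOLDER = "[...]"


def censor_url_sketch(url: str) -> str:
    """Mask the userinfo password and sensitive query values, keeping order."""
    parts = urlsplit(url)

    # Mask the password in "user:password@host", if there is one.
    netloc = parts.netloc
    userinfo, at, hostport = netloc.rpartition("@")
    if at and ":" in userinfo:
        user = userinfo.split(":", 1)[0]
        netloc = f"{user}:{PLACEHOLDER}@{hostport}"

    # Walk the query pairs in their original order and mask matching values.
    raw_pairs = parts.query.split("&") if parts.query else []
    pairs = []
    for pair in raw_pairs:
        key, eq, _value = pair.partition("=")
        if eq and key.lower() in SENSITIVE_KEYS:
            pairs.append(f"{key}={PLACEHOLDER}")
        else:
            pairs.append(pair)

    # Reassemble by hand, because the placeholder is not valid URL syntax.
    result = f"{parts.scheme}://{netloc}{parts.path}"
    if pairs:
        result += "?" + "&".join(pairs)
    if parts.fragment:
        result += "#" + parts.fragment
    return result


print(censor_url_sketch("https://user:my-secret-password@example.com/data?token=abc-123-xyz"))
# >> https://user:[...]@example.com/data?token=[...]
```

Splitting the query string by hand, rather than parsing it into a dictionary, is what keeps the parameter order and any repeated keys intact.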
Usage & Examples
----------------
### Basic Filtering (Standalone Function)
The `filter_url()` function is great for one-off tasks. You can pass your own filtering rules directly to it. If a rule is not provided, a sensible default is used.
    from filter_url import filter_url

    # Define custom rules
    custom_path_re = r'/user/(?P<user_id>\d+)/profile'

    dirty_url = "https://example.com/user/123456/profile?credit_card_number=5555"

    # Censor using a custom path regex
    clean_url = filter_url(
        url=dirty_url,
        bad_path_re=custom_path_re
    )

    print(clean_url)
    # >> https://example.com/user/[...]/profile?credit_card_number=5555
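One way to picture how a path regex with named capture groups can censor only part of a path is the sketch below, which uses only the `re` module. It is an illustration under assumptions, not the library's code: `censor_path_groups` is a hypothetical helper, it uses `re.search`, and it assumes the named groups do not overlap.

```python
import re

PLACEHOLDER = "[...]"


def censor_path_groups(path: str, pattern: str) -> str:
    """Replace the text matched by each named group in `pattern` with a placeholder."""
    match = re.search(pattern, path)
    if match is None:
        return path  # the pattern does not apply to this path

    # Collect the spans of named groups that took part in the match and
    # replace them right to left, so earlier offsets stay valid.
    spans = sorted(
        (match.span(name) for name in match.groupdict() if match.group(name) is not None),
        reverse=True,
    )
    for start, end in spans:
        path = path[:start] + PLACEHOLDER + path[end:]
    return path


print(censor_path_groups("/user/123456/profile", r"/user/(?P<user_id>\d+)/profile"))
# >> /user/[...]/profile
```

Only the captured span is replaced, so the literal `/user/` and `/profile` segments around it stay readable in the log.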
### Advanced: Using the `FilterURL` Class for Performance
When you need to filter a large number of URLs with the same configuration, it's much more efficient to instantiate the `FilterURL` class once. This pre-compiles the regular expressions and avoids redundant work in a loop.
    from filter_url import FilterURL

    # Create the filter instance ONCE with your custom rules.
    # The regexes are compiled here.
    my_filter = FilterURL(
        bad_keys={'api_key'},
        bad_keys_re=[r'session']
    )

    urls_to_process = [
        "https://service.com/api?api_key=key-1",
        "https://service.com/api?user_session=sess-2",
        "https://service.com/api?id=3"
    ]

    # Reuse the same instance in a loop for high performance
    clean_urls = [my_filter.remove_sensitive(url) for url in urls_to_process]

    # clean_urls will be:
    # [
    #   'https://service.com/api?api_key=[...]',
    #   'https://service.com/api?user_session=[...]',
    #   'https://service.com/api?id=3'
    # ]
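The compile-once pattern described here can be sketched in plain Python as follows. `KeyMatcherSketch` is a hypothetical class written for this illustration and is not part of `filter_url`. The case-insensitive matching and the use of `search` are assumptions, chosen only because they are consistent with `r'session'` matching `user_session` in the example above.

```python
import re


class KeyMatcherSketch:
    """Hypothetical helper: compiles key patterns once, then reuses them."""

    def __init__(self, bad_keys, bad_keys_re):
        self._bad_keys = {key.lower() for key in bad_keys}
        # Compilation happens here, once per instance.
        self._patterns = [re.compile(p, re.IGNORECASE) for p in bad_keys_re]

    def is_sensitive(self, key: str) -> bool:
        """Return True if the key is listed exactly or matches any pattern."""
        if key.lower() in self._bad_keys:
            return True
        return any(p.search(key) for p in self._patterns)


matcher = KeyMatcherSketch(bad_keys={"api_key"}, bad_keys_re=[r"session"])
for key in ("api_key", "user_session", "id"):
    print(key, matcher.is_sensitive(key))
# >> api_key True
# >> user_session True
# >> id False
```

Keeping the compiled patterns on the instance means the per-URL work is limited to set lookups and pattern searches.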
### Integration with Python's `logging` Module
This is the most powerful feature for real-world applications. The `URLFilter` automatically censors URLs in your logs. The filter works in two ways:
1. **(Preferred)** It looks for a `url` key in the `extra` dictionary of your logging call.
2. **(Fallback)** If `fallback=True` (the default), it searches for URLs in the positional arguments of the log message.
```python
import logging
import sys
from filter_url import URLFilter
# 1. Configure a logger
logger = logging.getLogger('my_app')
logger.setLevel(logging.INFO)
if logger.hasHandlers():
    logger.handlers.clear()
# 2. Simply add our filter. Let's use custom rules for this example
custom_filter = URLFilter(
    bad_keys={'access_token'},
    fallback=True  # Default, but shown for clarity
)
logger.addFilter(custom_filter)
# 3. Use a standard Formatter. No special formatter is needed
handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter('%(levelname)s: %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
# --- Usage Examples ---
# Case 1: (Preferred) Pass the URL via 'extra'
logger.info(
    "User login attempt failed",
    extra={'url': "https://auth.service.com/login?access_token=12345"}
)
# Case 2: (Fallback) The URL is an argument in the message string
logger.info(
    "API call to %s resulted in a 404 error.",
    "https://api.service.com/data/v1/user?password=abc"
)
# Case 3: No URL in the message. Nothing extra is added
logger.info("Application started successfully.")
```
**Expected Output:**
    INFO: User login attempt failed | (URL data: https://auth.service.com/login?access_token=[...])
    INFO: API call to https://api.service.com/data/v1/user?password=[...] resulted in a 404 error. | (URL data: https://api.service.com/data/v1/user?password=[...])
    INFO: Application started successfully.
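For readers curious how a `logging.Filter` can rewrite a record in these two ways, the following is a rough standard-library sketch. It is not the `URLFilter` source: `ExtraUrlFilterSketch`, `censor`, and the single hard-coded `TOKEN_RE` rule are hypothetical, and the sketch does not reproduce the `| (URL data: ...)` suffix shown in the output above.

```python
import logging
import re

# Hypothetical single rule: mask the value of an access_token query parameter.
TOKEN_RE = re.compile(r"(?<=[?&]access_token=)[^&\s]+")


def censor(text: str) -> str:
    return TOKEN_RE.sub("[...]", text)


class ExtraUrlFilterSketch(logging.Filter):
    """Censor `record.url` if present, otherwise string arguments of the message."""

    def filter(self, record: logging.LogRecord) -> bool:
        url = getattr(record, "url", None)
        if isinstance(url, str):
            # Preferred path: the caller passed extra={'url': ...}.
            record.url = censor(url)
        elif isinstance(record.args, tuple):
            # Fallback path: rewrite string arguments before formatting.
            record.args = tuple(
                censor(arg) if isinstance(arg, str) else arg for arg in record.args
            )
        return True  # never drop the record, only rewrite it


logger = logging.getLogger("filter_sketch")
logger.setLevel(logging.INFO)
logger.addFilter(ExtraUrlFilterSketch())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("API call to %s failed", "https://api.example.com/v1?access_token=abc")
```

Because the filter is attached to the logger, the record is rewritten before any handler formats it, which is why a standard `Formatter` is enough.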
Corner Cases & Considerations
-----------------------------
* **Log String vs. Valid URL**: The primary goal of this library is to produce a human-readable, safe string for logging. An output string containing `[...]` in the userinfo (password) section is not a valid URL under RFC 3986, which does not allow unencoded square brackets in userinfo, and it may fail if you try to parse it again with `urllib.parse`.
* **Performance**: For filtering a large number of URLs, always instantiate the `FilterURL` class once and reuse the instance. The standalone `filter_url()` function re-compiles regexes on every call and is less performant for batch jobs.
* **Logging Filter Precedence**: When using `URLFilter`, providing a URL in the `extra` dictionary is always the preferred method. The `fallback` search will only trigger if a `url` key is not found in `extra`.
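As a small illustration of the first point in this list, the sketch below takes structured fields from the original URL and treats the censored string purely as display text. `try_split` is a hypothetical helper, not part of the library. Whether `urlsplit` accepts the censored string can depend on the Python version, so the sketch catches `ValueError` rather than assuming either outcome.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def try_split(url: str) -> Optional[SplitResult]:
    """Return the parsed URL, or None if urllib rejects the string."""
    try:
        return urlsplit(url)
    except ValueError:
        return None


original = "https://user:my-secret-password@example.com/data?token=abc-123-xyz"
censored = "https://user:[...]@example.com/data?token=[...]"

# Take structured fields from the original URL, before censoring.
host = urlsplit(original).hostname
print(host)
# >> example.com

# Treat the censored string as display text; parsing it may or may not work.
print(try_split(censored) is not None)
```

If you need the host, path, or port for metrics or routing, extract them before calling the filter and log the censored string alongside them.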
API Reference
-------------
* `filter_url(url, censored, bad_keys, bad_keys_re, bad_path_re)`: A standalone function for one-off URL censoring.
* `FilterURL(bad_keys, bad_keys_re, bad_path_re)`: A class that holds a compiled filter configuration for efficient, repeated use.
  * `.remove_sensitive(url, censored)`: The method that performs the censoring.
* `URLFilter(bad_keys, bad_keys_re, bad_path_re, fallback)`: A `logging.Filter` subclass for easy integration with Python's logging module.
License
-------
This project is licensed under the MIT License.
Raw data
{
"_id": null,
"home_page": null,
"name": "filter-url",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "url, filter, filtering, URL filtering, logging",
"author": null,
"author_email": "Alex Semenyaka <alex.semenyaka@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/d2/0c/9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0/filter_url-1.0.0.tar.gz",
"platform": null,
"description": "filter-url\n==========\n\n[](https://pypi.org/project/filter-url/)\n[](https://pypi.org/project/filter-url/)\n[](https://pypi.org/project/filter-url/)\n[](https://coveralls.io/github/alexsemenyaka/filter_url?branch=main)\n[](https://github.com/alexsemenyaka/filter_url/actions/workflows/ci.yml)\n\nA simple, fast, and configurable Python utility to censor sensitive data (passwords, API keys, tokens) from URLs, making them safe for logging, monitoring, and debugging.\n\nKey Features\n------------\n\n* **Comprehensive Censoring**: Censors passwords in userinfo (`user:[...]@host`), query parameter values, and parts of the URL path.\n* **Flexible Rules**: Filter query parameters by exact key names or by powerful regular expressions.\n* **Advanced Path Filtering**: Use regex with named capture groups to censor specific dynamic parts of a URL path while leaving the rest intact.\n* **Order Preserving**: Guarantees that the order of query parameters in the output is identical to the input.\n* **Logging Integration**: Provides a ready-to-use `logging.Filter` subclass for seamless integration into your application's logging setup.\n* **Lightweight**: Zero external dependencies.\n\nInstallation\n------------\n\n pip install filter-url\n\nQuick Start\n-----------\n\nThe quickest way to use the library is the standalone `filter_url()` function, which uses a default set of rules to catch common sensitive keys.\n\n from filter_url import filter_url\n\n dirty_url = \"https://user:my-secret-password@example.com/data?token=abc-123-xyz\"\n\n # Use the function with default filters\n clean_url = filter_url(dirty_url)\n\n print(clean_url)\n # >> https://user:[...]@example.com/data?token=[...]\n\nUsage & Examples\n----------------\n\n### Basic Filtering (Standalone Function)\n\nThe `filter_url()` function is great for one-off tasks. You can pass your own filtering rules directly to it. 
If a rule is not provided, a sensible default is used.\n\n from filter_url import filter_url\n\n # Define custom rules\n custom_path_re = r'/user/(?P<user_id>\\d+)/profile'\n\n dirty_url = \"https://example.com/user/123456/profile?credit_card_number=5555\"\n\n # Censor using a custom path regex\n clean_url = filter_url(\n url=dirty_url,\n bad_path_re=custom_path_re\n )\n\n print(clean_url)\n # >> https://example.com/user/[...]/profile?credit_card_number=5555\n\n### Advanced: Using the `FilterURL` Class for Performance\n\nWhen you need to filter a large number of URLs with the same configuration, it's much more efficient to instantiate the `FilterURL` class once. This pre-compiles the regular expressions and avoids redundant work in a loop.\n\n from filter_url import FilterURL\n\n # Create the filter instance ONCE with your custom rules.\n # The regexes are compiled here.\n my_filter = FilterURL(\n bad_keys={'api_key'},\n bad_keys_re=[r'session']\n )\n\n urls_to_process = [\n \"https://service.com/api?api_key=key-1\",\n \"https://service.com/api?user_session=sess-2\",\n \"https://service.com/api?id=3\"\n ]\n\n # Reuse the same instance in a loop for high performance\n clean_urls = [my_filter.remove_sensitive(url) for url in urls_to_process]\n\n # clean_urls will be:\n # [\n # 'https://service.com/api?api_key=[...]',\n # 'https://service.com/api?user_session=[...]',\n # 'https://service.com/api?id=3'\n # ]\n\n### Integration with Python's `logging` Module\n\nThis is the most powerful feature for real-world applications. The `URLFilter` automatically censors URLs in your logs. The filter works in two ways:\n1. **(Preferred)** It looks for a `url` key in the `extra` dictionary of your logging call.\n2. **(Fallback)** If `fallback=True` (the default), it searches for URLs in the positional arguments of the log message.\n\n\n```python\n import logging\n import sys\n from filter_url import URLFilter\n\n # 1. 
Configure a logger\n\n logger = logging.getLogger('my_app')\n logger.setLevel(logging.INFO)\n if logger.hasHandlers():\n logger.handlers.clear()\n\n # 2. Simply add our filter. Let's use custom rules for this example\n\n custom_filter = URLFilter(\n bad_keys={'access_token'},\n fallback=True # Default, but shown for clarity\n )\n logger.addFilter(custom_filter)\n\n # 3. Use a standard Formatter. No special formatter is needed\n\n handler = logging.StreamHandler(sys.stdout)\n formatter = logging.Formatter('%(levelname)s: %(message)s')\n handler.setFormatter(formatter)\n logger.addHandler(handler)\n\n # --- Usage Examples ---\n\n # Case 1: (Preferred) Pass the URL via 'extra'\n\n logger.info(\n \"User login attempt failed\",\n extra={'url': \"<https://auth.service.com/login?access_token=12345\"}>\n )\n\n # Case 2: (Fallback) The URL is an argument in the message string\n\n logger.info(\n \"API call to %s resulted in a 404 error.\",\n \"<https://api.service.com/data/v1/user?password=abc>\"\n )\n\n # Case 3: No URL in the message. Nothing extra is added\n\n logger.info(\"Application started successfully.\")\n```\n\n**Expected Output:**\n\n INFO: User login attempt failed | (URL data: https://auth.service.com/login?access_token=[...])\n INFO: API call to https://api.service.com/data/v1/user?password=[...] was made. | (URL data: https://api.service.com/data/v1/user?password=[...])\n INFO: Application started successfully.\n\nCorner Cases & Considerations\n-----------------------------\n\n* **Log String vs. Valid URL**: The primary goal of this library is to produce a human-readable, safe string for logging. The output string containing `[...]` in the userinfo (password) section is not a valid URL according to RFC standards and may fail if you try to parse it again with `urllib.parse`.\n* **Performance**: For filtering a large number of URLs, always instantiate the `FilterURL` class once and reuse the instance. 
The standalone `filter_url()` function re-compiles regexes on every call and is less performant for batch jobs.\n* **Logging Filter Precedence**: When using `URLFilter`, providing a URL in the `extra` dictionary is always the preferred method. The `fallback` search will only trigger if a `url` key is not found in `extra`.\n\nAPI Reference\n-------------\n\n* `filter_url(url, censored, bad_keys, bad_keys_re, bad_path_re)`: A standalone function for one-off URL censoring.\n* `FilterURL(bad_keys, bad_keys_re, bad_path_re)`: A class that holds a compiled filter configuration for efficient, repeated use.\n * `.remove_sensitive(url, censored)`: The method that performs the censoring.\n* `URLFilter(bad_keys, bad_keys_re, bad_path_re, fallback)`: A `logging.Filter` subclass for easy integration with Python's logging module.\n\nLicense\n-------\n\nThis project is licensed under the MIT License.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A simple, fast, and configurable URL sensitive data filter",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/alexsemenyaka/filter_url",
"Issues": "https://github.com/alexsemenyaka/filter_url/issues",
"Repository": "https://github.com/alexsemenyaka/filter_url"
},
"split_keywords": [
"url",
" filter",
" filtering",
" url filtering",
" logging"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "ae5d3d2887a21fd1715add2cbded47814596ddb408d80171c409b90d0fe0203a",
"md5": "6e753a1b56a326b5285340c52f564858",
"sha256": "78a2190f9f7445058f0f81a89d90fe6cb1b0e1b6d1651e895d4c703dcff107fd"
},
"downloads": -1,
"filename": "filter_url-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6e753a1b56a326b5285340c52f564858",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 8404,
"upload_time": "2025-07-14T23:27:05",
"upload_time_iso_8601": "2025-07-14T23:27:05.905325Z",
"url": "https://files.pythonhosted.org/packages/ae/5d/3d2887a21fd1715add2cbded47814596ddb408d80171c409b90d0fe0203a/filter_url-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "d20c9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0",
"md5": "7a181e7f3e6fa29f24a479b192860923",
"sha256": "672ee5a4c9af8092e66bb16deb9b0f4291e59f375f05bb9345ff7a82242042e6"
},
"downloads": -1,
"filename": "filter_url-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "7a181e7f3e6fa29f24a479b192860923",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 10078,
"upload_time": "2025-07-14T23:27:07",
"upload_time_iso_8601": "2025-07-14T23:27:07.093693Z",
"url": "https://files.pythonhosted.org/packages/d2/0c/9c8fa3c2bb5835715ffbc5ce570bf8f973aa9879bca45f76109b559761f0/filter_url-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-14 23:27:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "alexsemenyaka",
"github_project": "filter_url",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [
{
"name": "iniconfig",
"specs": [
[
"==",
"2.1.0"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"25.0"
]
]
},
{
"name": "pluggy",
"specs": [
[
"==",
"1.6.0"
]
]
},
{
"name": "pygments",
"specs": [
[
"==",
"2.19.2"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.4.1"
]
]
}
],
"lcname": "filter-url"
}