| Name | xtractmime |
| Version | 0.2.0 |
| home_page | https://github.com/scrapy/xtractmime |
| Summary | Implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/) |
| upload_time | 2023-08-31 13:35:23 |
| maintainer | |
| docs_url | None |
| author | Akshay Sharma |
| requires_python | >=3.7 |
| license | BSD |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# xtractmime
`xtractmime` is a [BSD-licensed](https://opensource.org/licenses/BSD-3-Clause)
Python 3.7+ implementation of the [MIME Sniffing
Standard](https://mimesniff.spec.whatwg.org/).
Install from [`PyPI`](https://pypi.python.org/pypi/xtractmime):
```
pip install xtractmime
```
---
## Basic usage
Below are some simple examples of using `xtractmime.extract_mime`:
```python
>>> from xtractmime import extract_mime
>>> extract_mime(b'Sample text content')
b'text/plain'
>>> extract_mime(b'', content_types=(b'text/html',))
b'text/html'
```
You can also check whether a MIME type belongs to a specific MIME type group using the
functions included in `xtractmime.mimegroups`:
```python
>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False
```
---
## API Reference
### function `xtractmime.extract_mime(*args, **kwargs) -> Optional[bytes]`
**Parameters:**
* `body: bytes`
* `content_types: Optional[Tuple[bytes]] = None`
* `http_origin: bool = True`
* `no_sniff: bool = False`
* `extra_types: Optional[Tuple[Tuple[bytes, bytes, Optional[Set[bytes]], bytes], ...]] = None`
* `supported_types: Set[bytes] = None`
Return the [MIME type essence](https://mimesniff.spec.whatwg.org/#mime-type-essence) (e.g. `text/html`) matching the input data, or
`None` if no match can be found.
The `body` parameter is the byte sequence whose MIME type is to be determined. `xtractmime` only considers the first few
bytes of the `body`; the exact number of bytes read is defined by the `xtractmime.RESOURCE_HEADER_BUFFER_LENGTH` constant.
`content_types` is a tuple of MIME types given in the resource metadata. For example, for resources retrieved via HTTP, users should pass the list of MIME types mentioned in the `Content-Type` header.
`http_origin` indicates if the resource has been retrieved via HTTP (`True`, default) or not (`False`).
`no_sniff` is a flag which is *`True`* if the user agent does not wish to
perform sniffing on the resource and *`False`* (by default) otherwise. Users may want to set
this parameter to *`True`* if the [`X-Content-Type-Options`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options) response header is set to `nosniff`. For more info, see [here](https://mimesniff.spec.whatwg.org/#no-sniff-flag).
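As a rough illustration of the paragraph above, the flag could be derived from the response headers before calling `extract_mime`. This is only a sketch: the helper name `wants_no_sniff` and the plain mapping of byte-string headers are assumptions made for the example, not part of `xtractmime`.

```python
from typing import Mapping


def wants_no_sniff(headers: Mapping[bytes, bytes]) -> bool:
    """Return True if X-Content-Type-Options is set to ``nosniff``.

    Hypothetical helper: header names and values are compared
    case-insensitively, and surrounding whitespace is ignored.
    """
    for name, value in headers.items():
        if name.strip().lower() == b"x-content-type-options":
            return value.strip().lower() == b"nosniff"
    return False


headers = {b"Content-Type": b"text/plain", b"X-Content-Type-Options": b"nosniff"}
no_sniff = wants_no_sniff(headers)
# The result could then be passed on, e.g. extract_mime(body, no_sniff=no_sniff).
```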
`extra_types` is a tuple of patterns to support detecting additional MIME types. Each entry in the tuple should follow the format
**(Byte Pattern, Pattern Mask, Leading Bytes, MIME type)**:
* **Byte Pattern** is a byte sequence to compare with the first few bytes (``xtractmime.RESOURCE_HEADER_BUFFER_LENGTH``) of the `body`.
* **Pattern Mask** is a byte sequence that indicates the significance of **Byte Pattern** bytes: `b"\xff"` indicates the matching byte is strictly significant, `b"\xdf"` indicates that the byte is significant in an ASCII case-insensitive way, and `b"\x00"` indicates that the byte is not significant.
* **Leading Bytes** is a set of bytes to be ignored while matching the leading bytes in the content.
* **MIME type** is the MIME type to return if the pattern matches.
**Sample `extra_types`:**
```python
extra_types = ((b'test', b'\xff\xff\xff\xff', None, b'text/test'), ...)
```
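To make the roles of the byte pattern, pattern mask, and leading bytes more concrete, the matching rule can be sketched in plain Python. This follows the pattern matching algorithm described in the MIME Sniffing Standard; it is not `xtractmime`'s own code, and the function name `pattern_matches` is hypothetical.

```python
from typing import Optional, Set


def pattern_matches(
    data: bytes,
    pattern: bytes,
    mask: bytes,
    leading_bytes: Optional[Set[bytes]] = None,
) -> bool:
    """Sketch of masked pattern matching (hypothetical helper)."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    ignored = leading_bytes or set()

    # Skip any leading bytes that should be ignored.
    start = 0
    while start < len(data) and data[start:start + 1] in ignored:
        start += 1

    window = data[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False

    # Each input byte is ANDed with its mask byte before comparison.
    return all(
        (byte & mask_byte) == pattern_byte
        for byte, mask_byte, pattern_byte in zip(window, mask, pattern)
    )


print(pattern_matches(b"test data", b"test", b"\xff\xff\xff\xff"))
print(pattern_matches(b"  <hTmL>", b"<HTML", b"\xff\xdf\xdf\xdf\xdf", {b" "}))
```

Because the input is masked before comparison, a pattern byte paired with `b"\xdf"` should be an uppercase ASCII letter, and a pattern byte paired with `b"\x00"` should itself be `b"\x00"`.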
---
**NOTE**
*Be careful while using the `extra_types` argument, as it may introduce some privilege escalation vulnerabilities for `xtractmime`. For more info, see [here](https://mimesniff.spec.whatwg.org/#ref-for-mime-type%E2%91%A1%E2%91%A8).*
---
Optional `supported_types` is a set of all [MIME types supported by the user agent](https://mimesniff.spec.whatwg.org/#supported-by-the-user-agent). If `supported_types` is not
specified, all MIME types are assumed to be supported. Using this parameter can improve the performance of `xtractmime`.
### function `xtractmime.is_binary_data(input_bytes: bytes) -> bool`
Return *`True`* if the provided byte sequence contains any binary data bytes, else *`False`*.
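For context, the MIME Sniffing Standard defines a "binary data byte" as one of a fixed set of control characters. The sketch below reproduces that definition in plain Python. The names `BINARY_DATA_BYTES` and `contains_binary_data_byte` are hypothetical, and the code illustrates the definition rather than how `xtractmime` implements it.

```python
# Byte values the MIME Sniffing Standard classifies as "binary data bytes".
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09))    # NUL to BS
    + [0x0B]                   # VT
    + list(range(0x0E, 0x1B))  # SO to SUB
    + list(range(0x1C, 0x20))  # FS to US
)


def contains_binary_data_byte(data: bytes) -> bool:
    """Return True if any byte of ``data`` is a binary data byte."""
    return any(byte in BINARY_DATA_BYTES for byte in data)


print(contains_binary_data_byte(b"plain text\r\n"))
print(contains_binary_data_byte(b"\x00\x01binary"))
```

Tab, line feed, form feed, carriage return, and ESC (`0x1B`) are not in the set, so ordinary text does not count as binary.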
### MIME type group functions
The following functions return `True` if a given MIME type belongs to a certain
[MIME type group](https://mimesniff.spec.whatwg.org/#mime-type-groups), or
`False` otherwise:
```
xtractmime.mimegroups.is_archive_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_audio_video_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_font_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_html_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_image_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_javascript_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_json_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_scriptable_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_xml_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_zip_mime_type(mime_type: bytes) -> bool
```
**Example**
```python
>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type, is_zip_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False
>>> is_zip_mime_type(mime_type)
False
```
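To give a sense of what such a group check involves, here is a sketch of two group definitions from the MIME Sniffing Standard (JSON and XML MIME types), written in plain Python. The function names `looks_like_json_mime` and `looks_like_xml_mime` are hypothetical, and the functions in `xtractmime.mimegroups` may be implemented differently.

```python
def looks_like_json_mime(essence: bytes) -> bool:
    """JSON MIME type: subtype ends in +json, or a known JSON essence."""
    essence = essence.strip().lower()
    if b"/" not in essence:
        return False
    return essence in (b"application/json", b"text/json") or essence.endswith(b"+json")


def looks_like_xml_mime(essence: bytes) -> bool:
    """XML MIME type: subtype ends in +xml, or a known XML essence."""
    essence = essence.strip().lower()
    if b"/" not in essence:
        return False
    return essence in (b"application/xml", b"text/xml") or essence.endswith(b"+xml")


print(looks_like_json_mime(b"application/ld+json"))
print(looks_like_xml_mime(b"image/svg+xml"))
print(looks_like_json_mime(b"text/html"))
```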
## Changelog
See the [changelog](CHANGELOG.md)
Raw data
{
"_id": null,
"home_page": "https://github.com/scrapy/xtractmime",
"name": "xtractmime",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "Akshay Sharma",
"author_email": "akshaysharmajs@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f6/86/a0d1a651cf4780cc6e7708f46b4de4255997bf21f22dfdd8e5d1310589ec/xtractmime-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD",
"summary": "Implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/)",
"version": "0.2.0",
"project_urls": {
"Homepage": "https://github.com/scrapy/xtractmime"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fafd2063d24613fedd7a42dcbe826f72a27173463eaf70781df1440ab8bda3e8",
"md5": "1f6447782b029df71cab347a84c8b54e",
"sha256": "348f1fe8b646877d9b97314c936542198175eb09517989d6925ae992b101b6a4"
},
"downloads": -1,
"filename": "xtractmime-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1f6447782b029df71cab347a84c8b54e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 10179,
"upload_time": "2023-08-31T13:35:21",
"upload_time_iso_8601": "2023-08-31T13:35:21.749400Z",
"url": "https://files.pythonhosted.org/packages/fa/fd/2063d24613fedd7a42dcbe826f72a27173463eaf70781df1440ab8bda3e8/xtractmime-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f686a0d1a651cf4780cc6e7708f46b4de4255997bf21f22dfdd8e5d1310589ec",
"md5": "e832e1c3376825b2ac7d3362c84cdb37",
"sha256": "9eae1b5947f37e83cae32e1d7c6e9cc43fab53fd90f90f0f16cf2859423be718"
},
"downloads": -1,
"filename": "xtractmime-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "e832e1c3376825b2ac7d3362c84cdb37",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 14540,
"upload_time": "2023-08-31T13:35:23",
"upload_time_iso_8601": "2023-08-31T13:35:23.387249Z",
"url": "https://files.pythonhosted.org/packages/f6/86/a0d1a651cf4780cc6e7708f46b4de4255997bf21f22dfdd8e5d1310589ec/xtractmime-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-31 13:35:23",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "scrapy",
"github_project": "xtractmime",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "xtractmime"
}