puremagic
=========

puremagic is a pure Python module that identifies a file based on its
magic numbers.

|CoverageStatus| |License| |PyPi|

It is designed to be minimalistic and inherently cross-platform
compatible. It is also designed to be a stand-in for python-magic: it
incorporates the functions ``from_file(filename[, mime])`` and
``from_string(string[, mime])``. However, ``magic_file()`` and
``magic_string()`` are more powerful, as they also display confidence
and duplicate matches.

It does NOT try to match files based on non-magic strings. In other
words, it will not search for a string within a certain window of
bytes like others might.

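
To make the distinction concrete, here is a minimal, stdlib-only sketch
of fixed-offset matching next to the sliding-window search that this
library avoids. The ``SIGNATURES`` table and the ``match_fixed_offset``
and ``match_sliding_window`` helpers are hypothetical illustrations, not
part of puremagic's API. The GIF, PNG and MP4 (``ftypisom`` at offset 4)
byte sequences are the standard signatures for those formats.

.. code:: python

    # Hypothetical signature table: (bytes, offset, extension).
    # Not puremagic's internal data; for illustration only.
    SIGNATURES = [
        (b"GIF87a", 0, ".gif"),
        (b"GIF89a", 0, ".gif"),
        (b"\x89PNG\r\n\x1a\n", 0, ".png"),
        (b"ftypisom", 4, ".mp4"),
    ]


    def match_fixed_offset(data):
        """Return extensions whose signature appears at its exact offset."""
        return [
            ext
            for sig, offset, ext in SIGNATURES
            if data[offset:offset + len(sig)] == sig
        ]


    def match_sliding_window(data, window=64):
        """The approach avoided here: accept a signature found anywhere
        in the first `window` bytes, ignoring its expected offset."""
        return [ext for sig, _, ext in SIGNATURES if sig in data[:window]]


    shifted = b"junk" + b"GIF89a"
    print(match_fixed_offset(b"GIF89a"))  # ['.gif']
    print(match_fixed_offset(shifted))    # []
    print(match_sliding_window(shifted))  # ['.gif']

The shifted input shows the trade-off: the fixed-offset check rejects a
signature that is not where the format says it must be, while a window
search accepts it.
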
Advantages over using a wrapper for 'file' or 'libmagic':

- Faster
- Lightweight
- Cross-platform compatible
- No dependencies

Disadvantages:

- Does not have as many file types
- No multilingual comments
- Duplications due to small or reused magic numbers

(Help fix the first two disadvantages by contributing!)

Compatibility
~~~~~~~~~~~~~

- Python 3.7+

GitHub CI is used to run continuous integration tests on the listed
platforms.

Install from PyPI
-----------------

.. code:: bash

    $ pip install puremagic

On Linux environments, you may want to be explicit that you are using
Python 3:

.. code:: bash

    $ python3 -m pip install puremagic

Install from source
-------------------

In either a virtualenv or globally, simply run:

.. code:: bash

    $ python setup.py install

Usage
-----

``from_file`` returns the most likely file extension. ``magic_file``
returns every possible result it finds, along with its confidence.

.. code:: python

    import puremagic

    filename = "test/resources/images/test.gif"

    ext = puremagic.from_file(filename)
    # '.gif'

    puremagic.magic_file(filename)
    # [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.7],
    #  ['.gif', '', 'GIF file', 0.5]]

``magic_file`` returns each match, highest confidence first, with its:

- possible extension(s)
- MIME type
- description
- confidence (every header must match exactly for a result to be
  included; results are then ordered by header length, so the longest,
  and therefore most precise, match comes first)

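
The ordering rule can be sketched as follows. This is an illustration
under stated assumptions: the ``Match`` record, the ``rank_matches``
helper and the ``CANDIDATES`` table are hypothetical and do not reflect
puremagic's internal data or its exact confidence values. It only shows
the idea of keeping exact header matches and sorting them by signature
length.

.. code:: python

    from typing import List, NamedTuple


    class Match(NamedTuple):
        # Hypothetical record; puremagic's own result type differs.
        signature: bytes
        offset: int
        extension: str
        name: str


    def rank_matches(data: bytes, candidates: List[Match]) -> List[Match]:
        """Keep only exact header matches, longest signature first."""
        hits = [
            m for m in candidates
            if data[m.offset:m.offset + len(m.signature)] == m.signature
        ]
        # A longer signature pins down more bytes, so it ranks as more precise.
        return sorted(hits, key=lambda m: len(m.signature), reverse=True)


    CANDIDATES = [
        Match(b"GIF8", 0, ".gif", "GIF file"),
        Match(b"GIF87a", 0, ".gif", "GIF87a image"),
        Match(b"GIF89a", 0, ".gif", "GIF89a image"),
    ]

    for match in rank_matches(b"GIF87a\x01\x00", CANDIDATES):
        print(match.name, len(match.signature))
    # GIF87a image 6
    # GIF file 4

Both the generic and the more specific signature match here; the longer
one simply sorts first, which is also why a subset format can outrank
its base format.
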
If you already have an open file or a raw byte string, you can also use:

* ``from_string``
* ``from_stream``
* ``magic_string``
* ``magic_stream``

.. code:: python

    import puremagic

    with open("test/resources/video/test.mp4", "rb") as file:
        print(puremagic.magic_stream(file))

    # [PureMagicWithConfidence(byte_match=b'ftypisom', offset=4, extension='.mp4', mime_type='video/mp4', name='MPEG-4 video', confidence=0.8),
    #  PureMagicWithConfidence(byte_match=b'iso2avc1mp4', offset=20, extension='.mp4', mime_type='video/mp4', name='MP4 Video', confidence=0.8)]

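
Identifying a stream only requires its first bytes. If you want to peek
at a header yourself before handing the stream to another consumer, a
minimal sketch, assuming a seekable binary stream, could look like this.
``peek_header`` and ``HEADER_SIZE`` are hypothetical names, not
puremagic functions, and this is not how puremagic reads streams
internally.

.. code:: python

    import io
    from typing import BinaryIO

    HEADER_SIZE = 32  # Hypothetical: enough bytes for the signatures of interest.


    def peek_header(stream: BinaryIO, size: int = HEADER_SIZE) -> bytes:
        """Read up to `size` bytes, then restore the original position."""
        position = stream.tell()
        try:
            return stream.read(size)
        finally:
            stream.seek(position)


    buffer = io.BytesIO(b"\x00\x00\x00\x18ftypisom" + b"\x00" * 100)
    header = peek_header(buffer)
    print(header[4:12])   # b'ftypisom'
    print(buffer.tell())  # 0, so the stream can still be read from the start
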
Script
------

*Usage*

.. code:: bash

    $ python -m puremagic [options] filename <filename2>...

*Examples*

.. code:: bash

    $ python -m puremagic test/resources/images/test.gif
    'test/resources/images/test.gif' : .gif

    $ python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3
    'test/resources/images/test.gif' : image/gif
    'test/resources/audio/test.mp3' : audio/mpeg

imghdr replacement
------------------

If you are looking for a replacement for the standard library's
deprecated ``imghdr`` module (deprecated in Python 3.11 and removed in
3.13), you can use ``puremagic.what()``:

.. code:: python

    import puremagic

    filename = "test/resources/images/test.gif"

    ext = puremagic.what(filename)
    # 'gif'

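
Note that, like ``imghdr.what()``, the result is a bare type name
without the leading dot that ``from_file`` returns. For comparison, here
is a minimal stdlib-only sketch of an imghdr-style check on raw bytes.
``what_from_bytes`` is a hypothetical helper, not part of puremagic, and
it only recognizes three formats.

.. code:: python

    from typing import Optional


    def what_from_bytes(header: bytes) -> Optional[str]:
        """Return an imghdr-style type name (no leading dot), or None."""
        if header[:6] in (b"GIF87a", b"GIF89a"):
            return "gif"
        if header[:8] == b"\x89PNG\r\n\x1a\n":
            return "png"
        # JPEG start-of-image marker (FF D8) followed by the next marker's FF.
        if header[:3] == b"\xff\xd8\xff":
            return "jpeg"
        return None


    print(what_from_bytes(b"GIF89a\x01\x00"))  # gif
    print(what_from_bytes(b"not an image"))    # None
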
FAQ
---

*The file type is actually X, but it's showing up as Y with higher
confidence?*

This can happen when the file's signature matches a subset of a file
standard. The subset signature is longer, because it contains both the
base file type signature and the additional subset bytes, so it is
reported with greater confidence.

*You don't have sliding offsets that could better detect plenty of
common formats. Why's that?*

It is a design choice that keeps the library faster and more accurate.
Without more intelligent or deeper identification than a sliding offset
search, I don't feel comfortable including it in a 'magic number'
library.

*Your version isn't as complete as I want it to be. Where else should I
look?*

Look into Python modules that wrap libmagic, or use something like
Apache Tika.

Acknowledgements
----------------

Gary C. Kessler

For use of his File Signature Tables, available at:
http://www.garykessler.net/library/file_sigs.html

Freedesktop.org

For use of their shared-mime-info file, available at:
https://cgit.freedesktop.org/xdg/shared-mime-info/

License
-------

MIT licensed, see LICENSE. Copyright (c) 2013-2024 Chris Griffith

.. |CoverageStatus| image:: https://coveralls.io/repos/github/cdgriffith/puremagic/badge.svg?branch=develop
   :target: https://coveralls.io/github/cdgriffith/puremagic?branch=develop
.. |PyPi| image:: https://img.shields.io/pypi/v/puremagic.svg?maxAge=2592000
   :target: https://pypi.python.org/pypi/puremagic/
.. |License| image:: https://img.shields.io/pypi/l/puremagic.svg
   :target: https://pypi.python.org/pypi/puremagic/

Raw data
{
"_id": null,
"home_page": "https://github.com/cdgriffith/puremagic",
"name": "puremagic",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Chris Griffith",
"author_email": "chris@cdgriffith.com",
"download_url": "https://files.pythonhosted.org/packages/09/2d/40599f25667733e41bbc3d7e4c7c36d5e7860874aa5fe9c584e90b34954d/puremagic-1.28.tar.gz",
"platform": "any",
"description": "puremagic\n=========\n\npuremagic is a pure python module that will identify a file based off\nit's magic numbers.\n\n|CoverageStatus| |License| |PyPi|\n\nIt is designed to be minimalistic and inherently cross platform\ncompatible. It is also designed to be a stand in for python-magic, it\nincorporates the functions from\\_file(filename[, mime]) and\nfrom\\_string(string[, mime]) however the magic\\_file() and\nmagic\\_string() are more powerful and will also display confidence and\nduplicate matches.\n\nIt does NOT try to match files off non-magic string. In other words it\nwill not search for a string within a certain window of bytes like\nothers might.\n\nAdvantages over using a wrapper for 'file' or 'libmagic':\n\n- Faster\n- Lightweight\n- Cross platform compatible\n- No dependencies\n\nDisadvantages:\n\n- Does not have as many file types\n- No multilingual comments\n- Duplications due to small or reused magic numbers\n\n(Help fix the first two disadvantages by contributing!)\n\nCompatibility\n~~~~~~~~~~~~~\n\n- Python 3.7+\n\nUsing github ci to run continuous integration tests on listed platforms.\n\nInstall from pypy\n-----------------\n\n.. code:: bash\n\n $ pip install puremagic\n\nOn linux environments, you may want to be clear you are using python3\n\n.. code:: bash\n\n $ python3 -m pip install puremagic\n\n\nInstall from source\n-------------------\n\nIn either a virtualenv or globally, simply run:\n\n.. code:: bash\n\n $ python setup.py install\n\nUsage\n-----\n\n\"from_file\" will return the most likely file extension. \"magic_file\"\nwill give you every possible result it finds, as well as the confidence.\n\n.. 
code:: python\n\n import puremagic\n\n filename = \"test/resources/images/test.gif\"\n\n ext = puremagic.from_file(filename)\n # '.gif'\n\n puremagic.magic_file(filename)\n # [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.7],\n # ['.gif', '', 'GIF file', 0.5]]\n\nWith \"magic_file\" it gives each match, highest confidence first:\n\n- possible extension(s)\n- mime type\n- description\n- confidence (All headers have to perfectly match to make the list,\n however this orders it by longest header, therefore most precise,\n first)\n\nIf you already have a file open, or raw byte string, you could also use:\n\n* from_string\n* from_stream\n* magic_string\n* magic_stream\n\n.. code:: python\n\n with open(r\"test\\resources\\video\\test.mp4\", \"rb\") as file:\n print(puremagic.magic_stream(file))\n\n # [PureMagicWithConfidence(byte_match=b'ftypisom', offset=4, extension='.mp4', mime_type='video/mp4', name='MPEG-4 video', confidence=0.8),\n # PureMagicWithConfidence(byte_match=b'iso2avc1mp4', offset=20, extension='.mp4', mime_type='video/mp4', name='MP4 Video', confidence=0.8)]\n\nScript\n------\n\n*Usage*\n\n.. code:: bash\n\n $ python -m puremagic [options] filename <filename2>...\n\n*Examples*\n\n.. code:: bash\n\n $ python -m puremagic test/resources/images/test.gif\n 'test/resources/images/test.gif' : .gif\n\n $ python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3\n 'test/resources/images/test.gif' : image/gif\n 'test/resources/audio/test.mp3' : audio/mpeg\n\nimghdr replacement\n------------------\n\nIf you are looking for a replacement for the standard library's depreciated imghdr, you can use `puremagic.what()`\n\n.. 
code:: python\n\n import puremagic\n\n filename = \"test/resources/images/test.gif\"\n\n ext = puremagic.what(filename)\n # 'gif'\n\nFAQ\n---\n\n*The file type is actually X but it's showing up as Y with higher\nconfidence?*\n\nThis can happen when the file's signature happens to match a subset of a\nfile standard. The subset signature will be longer, therefore report\nwith greater confidence, because it will have both the base file type\nsignature plus the additional subset one.\n\n*You don't have sliding offsets that could better detect plenty of\ncommon formats, why's that?*\n\nDesign choice, so it will be a lot faster and more accurate. Without\nmore intelligent or deeper identification past a sliding offset I don't\nfeel comfortable including it as part of a 'magic number' library.\n\n*Your version isn't as complete as I want it to be, where else should I\nlook?*\n\nLook into python modules that wrap around libmagic or use something like\nApache Tika.\n\nAcknowledgements\n----------------\n\nGary C. Kessler\n\n\nFor use of his File Signature Tables, available at:\nhttp://www.garykessler.net/library/file_sigs.html\n\nFreedesktop.org\n\nFor use of their shared-mime-info file, available at:\nhttps://cgit.freedesktop.org/xdg/shared-mime-info/\n\nLicense\n-------\n\nMIT Licenced, see LICENSE, Copyright (c) 2013-2024 Chris Griffith\n\n.. |CoverageStatus| image:: https://coveralls.io/repos/github/cdgriffith/puremagic/badge.svg?branch=develop\n :target: https://coveralls.io/github/cdgriffith/puremagic?branch=develop\n.. |PyPi| image:: https://img.shields.io/pypi/v/puremagic.svg?maxAge=2592000\n :target: https://pypi.python.org/pypi/puremagic/\n.. |License| image:: https://img.shields.io/pypi/l/puremagic.svg\n :target: https://pypi.python.org/pypi/puremagic/\n\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "Pure python implementation of magic file detection",
"version": "1.28",
"project_urls": {
"Homepage": "https://github.com/cdgriffith/puremagic"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "c553200a97332d10ed3edd7afcbc5f5543920ac59badfe5762598327999f012e",
"md5": "47c14ffee127ef084a4c0c743e182429",
"sha256": "e16cb9708ee2007142c37931c58f07f7eca956b3472489106a7245e5c3aa1241"
},
"downloads": -1,
"filename": "puremagic-1.28-py3-none-any.whl",
"has_sig": false,
"md5_digest": "47c14ffee127ef084a4c0c743e182429",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 43241,
"upload_time": "2024-09-25T19:55:55",
"upload_time_iso_8601": "2024-09-25T19:55:55.734907Z",
"url": "https://files.pythonhosted.org/packages/c5/53/200a97332d10ed3edd7afcbc5f5543920ac59badfe5762598327999f012e/puremagic-1.28-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "092d40599f25667733e41bbc3d7e4c7c36d5e7860874aa5fe9c584e90b34954d",
"md5": "de0256a7110744de7f2a3528e964a0ab",
"sha256": "195893fc129657f611b86b959aab337207d6df7f25372209269ed9e303c1a8c0"
},
"downloads": -1,
"filename": "puremagic-1.28.tar.gz",
"has_sig": false,
"md5_digest": "de0256a7110744de7f2a3528e964a0ab",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 314945,
"upload_time": "2024-09-25T19:55:58",
"upload_time_iso_8601": "2024-09-25T19:55:58.264975Z",
"url": "https://files.pythonhosted.org/packages/09/2d/40599f25667733e41bbc3d7e4c7c36d5e7860874aa5fe9c584e90b34954d/puremagic-1.28.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-25 19:55:58",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cdgriffith",
"github_project": "puremagic",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "puremagic"
}