puremagic
=========

:Name: puremagic
:Version: 1.28
:Home page: https://github.com/cdgriffith/puremagic
:Summary: Pure python implementation of magic file detection
:Upload time: 2024-09-25 19:55:58
:Author: Chris Griffith
:License: MIT

puremagic is a pure python module that will identify a file based on
its magic numbers.

|CoverageStatus| |License| |PyPi|

It is designed to be minimalistic and inherently cross platform
compatible. It is also designed to be a stand-in for python-magic: it
incorporates the functions from\_file(filename[, mime]) and
from\_string(string[, mime]). However, magic\_file() and
magic\_string() are more powerful and will also display confidence and
duplicate matches.

It does NOT try to match files based on non-magic strings. In other
words, it will not search for a string within a certain window of bytes
like others might.
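
As a rough illustration of what matching at fixed offsets means, the
sketch below checks a handful of signatures at known positions and never
scans forward for them. It does not use puremagic's code or data: the
``SIGNATURES`` table and the ``match_fixed_offsets`` helper are
hypothetical names made up for this example.

.. code:: python

        from typing import List

        # Illustrative table only: (magic bytes, fixed offset, extension).
        # This is NOT puremagic's internal signature data.
        SIGNATURES = [
            (b"GIF87a", 0, ".gif"),
            (b"GIF89a", 0, ".gif"),
            (b"\x89PNG\r\n\x1a\n", 0, ".png"),
            (b"ftypisom", 4, ".mp4"),
        ]


        def match_fixed_offsets(data: bytes) -> List[str]:
            """Return extensions whose magic bytes sit at their exact offset."""
            found = []
            for magic, offset, extension in SIGNATURES:
                # Compare only the slice at the fixed offset; no searching.
                if data[offset:offset + len(magic)] == magic:
                    found.append(extension)
            return found


        header = b"GIF89a" + b"\x00" * 10
        print(match_fixed_offsets(header))            # ['.gif']
        # The same bytes shifted by four positions no longer match.
        print(match_fixed_offsets(b"junk" + header))  # []
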

Advantages over using a wrapper for 'file' or 'libmagic':

-  Faster
-  Lightweight
-  Cross platform compatible
-  No dependencies

Disadvantages:

-  Does not have as many file types
-  No multilingual comments
-  Duplications due to small or reused magic numbers

(Help fix the first two disadvantages by contributing!)

Compatibility
~~~~~~~~~~~~~

-  Python 3.7+

Continuous integration tests are run on GitHub for the listed platforms.

Install from PyPI
-----------------

.. code:: bash

        $ pip install puremagic

On Linux environments, you may want to make it explicit that you are using python3:

.. code:: bash

        $ python3 -m pip install puremagic


Install from source
-------------------

In either a virtualenv or globally, simply run:

.. code:: bash

        $ python setup.py install

Usage
-----

"from_file" will return the most likely file extension. "magic_file"
will give you every possible result it finds, as well as the confidence.

.. code:: python

        import puremagic

        filename = "test/resources/images/test.gif"

        ext = puremagic.from_file(filename)
        # '.gif'

        puremagic.magic_file(filename)
        # [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.7],
        #  ['.gif', '', 'GIF file', 0.5]]

With "magic_file" it gives each match, highest confidence first:

-  possible extension(s)
-  mime type
-  description
-  confidence (All headers have to perfectly match to make the list,
   however this orders it by longest header, therefore most precise,
   first)

If you already have a file open, or a raw byte string, you could also use:

* from_string
* from_stream
* magic_string
* magic_stream

.. code:: python

        with open(r"test\resources\video\test.mp4", "rb") as file:
            print(puremagic.magic_stream(file))

        # [PureMagicWithConfidence(byte_match=b'ftypisom', offset=4, extension='.mp4', mime_type='video/mp4', name='MPEG-4 video', confidence=0.8),
        #  PureMagicWithConfidence(byte_match=b'iso2avc1mp4', offset=20, extension='.mp4', mime_type='video/mp4', name='MP4 Video', confidence=0.8)]
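
Because several signatures can match the same file, it can be handy to
collapse the result list before using it. The sketch below works on a
local stand-in tuple that only mirrors the field names shown in the
output above, so it runs without puremagic installed. ``Match`` and
``distinct_mime_types`` are hypothetical names for this example, not
part of the library.

.. code:: python

        from collections import namedtuple

        # Stand-in with the same field names as the printed result above.
        Match = namedtuple(
            "Match", "byte_match offset extension mime_type name confidence"
        )

        # Values copied from the example output above.
        matches = [
            Match(b"ftypisom", 4, ".mp4", "video/mp4", "MPEG-4 video", 0.8),
            Match(b"iso2avc1mp4", 20, ".mp4", "video/mp4", "MP4 Video", 0.8),
        ]


        def distinct_mime_types(results, minimum_confidence=0.5):
            """Collect each non-empty mime type once, keeping result order."""
            seen = []
            for match in results:
                if match.confidence < minimum_confidence or not match.mime_type:
                    continue
                if match.mime_type not in seen:
                    seen.append(match.mime_type)
            return seen


        print(distinct_mime_types(matches))  # ['video/mp4']
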

Script
------

*Usage*

.. code:: bash

        $ python -m puremagic [options] filename <filename2>...

*Examples*

.. code:: bash

        $ python -m puremagic test/resources/images/test.gif
        'test/resources/images/test.gif' : .gif

        $ python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3
        'test/resources/images/test.gif' : image/gif
        'test/resources/audio/test.mp3' : audio/mpeg

imghdr replacement
------------------

If you are looking for a replacement for the standard library's deprecated imghdr, you can use ``puremagic.what()``:

.. code:: python

        import puremagic

        filename = "test/resources/images/test.gif"

        ext = puremagic.what(filename)
        # 'gif'

FAQ
---

*The file type is actually X but it's showing up as Y with higher
confidence?*

This can happen when the file's signature also matches a subset of a
file standard. The subset signature is longer, and is therefore reported
with greater confidence, because it contains both the base file type
signature and the additional subset one.
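
A minimal sketch of that ordering, assuming matches are ranked purely
by signature length (puremagic's actual confidence calculation is not
reproduced here). The byte values in ``SIGNATURES`` and the
``rank_matches`` helper are assumptions made for illustration.

.. code:: python

        # Hypothetical signatures: a short base one and a longer, more
        # specific one that begins with the same bytes.
        SIGNATURES = [
            (b"GIF8", "GIF file"),
            (b"GIF87a", "Graphics interchange format file (GIF87a)"),
        ]


        def rank_matches(data: bytes):
            """Return every matching signature, longest (most specific) first."""
            hits = [(magic, name) for magic, name in SIGNATURES
                    if data.startswith(magic)]
            return sorted(hits, key=lambda hit: len(hit[0]), reverse=True)


        for magic, name in rank_matches(b"GIF87a" + b"\x00" * 4):
            print(len(magic), name)
        # 6 Graphics interchange format file (GIF87a)
        # 4 GIF file
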

*You don't have sliding offsets that could better detect plenty of
common formats, why's that?*

Design choice, so it will be a lot faster and more accurate. Without
more intelligent or deeper identification past a sliding offset I don't
feel comfortable including it as part of a 'magic number' library.

*Your version isn't as complete as I want it to be, where else should I
look?*

Look into python modules that wrap around libmagic or use something like
Apache Tika.

Acknowledgements
----------------

Gary C. Kessler


For use of his File Signature Tables, available at:
http://www.garykessler.net/library/file_sigs.html

Freedesktop.org

For use of their shared-mime-info file, available at:
https://cgit.freedesktop.org/xdg/shared-mime-info/

License
-------

MIT Licenced, see LICENSE, Copyright (c) 2013-2024 Chris Griffith

.. |CoverageStatus| image:: https://coveralls.io/repos/github/cdgriffith/puremagic/badge.svg?branch=develop
   :target: https://coveralls.io/github/cdgriffith/puremagic?branch=develop
.. |PyPi| image:: https://img.shields.io/pypi/v/puremagic.svg?maxAge=2592000
   :target: https://pypi.python.org/pypi/puremagic/
.. |License| image:: https://img.shields.io/pypi/l/puremagic.svg
   :target: https://pypi.python.org/pypi/puremagic/



            
