magika


Name: magika
Version: 0.5.1
Summary: A tool to determine the content type of a file with deep-learning
Upload time: 2024-03-07 16:44:24
Author: Yanick Fratantonio
Requires Python: >=3.8,<3.13

# Magika Python Package

Magika is a novel AI-powered file type detection tool that relies on recent advances in deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized Keras model that weighs only about 1MB and enables precise file identification within milliseconds, even when running on a single CPU.

Use Magika as a command line client or in your Python code!

Please check out Magika on GitHub for more information and documentation: [https://github.com/google/magika](https://github.com/google/magika).


## Installing Magika

```shell
$ pip install magika
```

If you intend to use Magika only as a command-line tool, you may want to use `$ pipx install magika` instead.
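
The package metadata for this release lists `requires_python` as `>=3.8,<3.13`. As a minimal sketch, the check below tests an interpreter version against that range and reports whether `magika` is already installed. The helper names `python_supported` and `installed_magika_version` are made up for this illustration and are not part of Magika.

```python
import sys
from importlib import metadata


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter falls in the >=3.8,<3.13 range."""
    return (3, 8) <= tuple(version_info[:2]) < (3, 13)


def installed_magika_version():
    """Return the installed magika version string, or None if it is absent."""
    try:
        return metadata.version("magika")
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_supported())
print("magika installed:", installed_magika_version())
```

Running this before `pip install magika` can explain a failed install on an interpreter outside the supported range.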


## Using Magika as a command-line tool

```shell
$ magika examples/*
code.asm: Assembly (code)
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
doc.ini: INI configuration file (text)
elf64.elf: ELF executable (executable)
flac.flac: FLAC audio bitstream data (audio)
image.bmp: BMP image data (image)
java.class: Java compiled bytecode (executable)
jpg.jpg: JPEG image data (image)
pdf.pdf: PDF document (document)
pe32.exe: PE executable (executable)
png.png: PNG image data (image)
README.md: Markdown document (text)
tar.tar: POSIX tar archive (archive)
webm.webm: WebM data (video)
```
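
If you need to post-process the default output, each line in the sample above appears to follow the pattern `path: description (group)`. The sketch below parses that pattern with a regular expression. It assumes the format matches the sample exactly and that paths do not contain `": "`; the names `LINE_RE`, `Detection`, and `parse_line` are hypothetical. For anything beyond quick scripts, the `--json` and `--jsonl` options are likely a sturdier choice than parsing text.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the sample output: "path: description (group)".
LINE_RE = re.compile(r"^(?P<path>.+?): (?P<description>.+) \((?P<group>[^()]+)\)$")


class Detection(NamedTuple):
    path: str
    description: str
    group: str


def parse_line(line: str) -> Optional[Detection]:
    """Parse one line of default magika output; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Detection(match["path"], match["description"], match["group"])


sample = "doc.docx: Microsoft Word 2007+ document (document)"
print(parse_line(sample))
```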

```help
$ magika --help
Usage: magika [OPTIONS] [FILE]...

  Magika - Determine type of FILEs with deep-learning.

Options:
  -r, --recursive                 When passing this option, magika scans every
                                  file within directories, instead of
                                  outputting "directory"
  --json                          Output in JSON format.
  --jsonl                         Output in JSONL format.
  -i, --mime-type                 Output the MIME type instead of a verbose
                                  content type description.
  -l, --label                     Output a simple label instead of a verbose
                                  content type description. Use --list-output-
                                  content-types for the list of supported
                                  output.
  -c, --compatibility-mode        Compatibility mode: output is as close as
                                  possible to `file` and colors are disabled.
  -s, --output-score              Output the prediction score in addition to
                                  the content type.
  -m, --prediction-mode [best-guess|medium-confidence|high-confidence]
  --batch-size INTEGER            How many files to process in one batch.
  --no-dereference                This option causes symlinks not to be
                                  followed. By default, symlinks are
                                  dereferenced.
  --colors / --no-colors          Enable/disable use of colors.
  -v, --verbose                   Enable more verbose output.
  -vv, --debug                    Enable debug logging.
  --generate-report               Generate report useful when reporting
                                  feedback.
  --version                       Print the version and exit.
  --list-output-content-types     Show a list of supported content types.
  --model-dir DIRECTORY           Use a custom model.
  -h, --help                      Show this message and exit.

  Magika version: "0.5.0"

  Default model: "standard_v1"

  Send any feedback to magika-dev@google.com or via GitHub issues.
```
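
To drive the command-line client from a script, the options listed above can be assembled into an argument list. The sketch below only builds the command and does not run it. It uses the `--recursive`, `--json`, `--jsonl`, and `--prediction-mode` options from the help text; the function name `build_magika_command` and its parameters are assumptions made for this example.

```python
import shlex
from typing import Iterable, List, Optional

# Choices copied from the --prediction-mode entry in the help text.
PREDICTION_MODES = ("best-guess", "medium-confidence", "high-confidence")


def build_magika_command(
    paths: Iterable[str],
    *,
    recursive: bool = False,
    output_format: Optional[str] = None,
    prediction_mode: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the magika CLI from a few documented options."""
    cmd = ["magika"]
    if recursive:
        cmd.append("--recursive")
    if output_format is not None:
        if output_format not in ("json", "jsonl"):
            raise ValueError(f"unsupported output format: {output_format!r}")
        cmd.append(f"--{output_format}")
    if prediction_mode is not None:
        if prediction_mode not in PREDICTION_MODES:
            raise ValueError(f"unknown prediction mode: {prediction_mode!r}")
        cmd.extend(["--prediction-mode", prediction_mode])
    cmd.extend(str(p) for p in paths)
    return cmd


cmd = build_magika_command(
    ["./samples"], recursive=True, output_format="jsonl",
    prediction_mode="high-confidence",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where `magika` is installed.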


## Using Magika as a Python module

```python
from magika import Magika
magika = Magika()
result = magika.identify_bytes(b"# Example\nThis is an example of markdown!")
print(result.output.ct_label)  # Output: "markdown"
```
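
The same two calls shown above, `identify_bytes` and `result.output.ct_label`, can be wrapped to label several files at once. This is a minimal sketch rather than the library's recommended approach: `label_files` and `identify_with_magika` are hypothetical helpers, and each file is read fully into memory, which may not suit very large files.

```python
from pathlib import Path
from typing import Dict, Iterable


def label_files(identifier, paths: Iterable[Path]) -> Dict[str, str]:
    """Map each path to its content-type label.

    `identifier` is any object with an `identify_bytes(bytes)` method whose
    result exposes `.output.ct_label`, such as a `Magika()` instance.
    """
    labels = {}
    for path in paths:
        result = identifier.identify_bytes(Path(path).read_bytes())
        labels[str(path)] = result.output.ct_label
    return labels


def identify_with_magika(paths: Iterable[Path]) -> Dict[str, str]:
    """Convenience wrapper; imports magika lazily so this module loads without it."""
    from magika import Magika  # requires `pip install magika`

    return label_files(Magika(), paths)
```

With Magika installed, `identify_with_magika([Path("README.md")])` would return a dictionary mapping the path to its label.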


## Citation
If you use this software for your research, please cite it as:
```bibtex
@software{magika,
author = {Fratantonio, Yanick and Bursztein, Elie and Invernizzi, Luca and Zhang, Marina and Metitieri, Giancarlo and Kurt, Thomas and Galilee, Francois and Petit-Bianco, Alexandre and Farah, Loua and Albertini, Ange},
title = {{Magika content-type scanner}},
url = {https://github.com/google/magika}
}
```

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "magika",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<3.13",
    "maintainer_email": "",
    "keywords": "",
    "author": "Yanick Fratantonio",
    "author_email": "yanickf@google.com",
    "download_url": "https://files.pythonhosted.org/packages/1a/58/c1d8887354d0ff2256d4d78d08a69bcc55719a0189afa706c51da04390f2/magika-0.5.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A tool to determine the content type of a file with deep-learning",
    "version": "0.5.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6679e1c167ec35060692b70bfc4f2d0aa9314dd7e37ba8e30c1c27965e2f1daa",
                "md5": "b7198531cbbf7985862259bb10653a2f",
                "sha256": "a4d1f64f71460f335841c13c3d16cfc2cb21e839c1898a1ae9bd5adc8d66cb2b"
            },
            "downloads": -1,
            "filename": "magika-0.5.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b7198531cbbf7985862259bb10653a2f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<3.13",
            "size": 1008301,
            "upload_time": "2024-03-07T16:44:22",
            "upload_time_iso_8601": "2024-03-07T16:44:22.222115Z",
            "url": "https://files.pythonhosted.org/packages/66/79/e1c167ec35060692b70bfc4f2d0aa9314dd7e37ba8e30c1c27965e2f1daa/magika-0.5.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1a58c1d8887354d0ff2256d4d78d08a69bcc55719a0189afa706c51da04390f2",
                "md5": "278586fcc194faa4b2b3df09961c7654",
                "sha256": "43dc1153a1637327225a626a1550c0a395a1d45ea33ec1f5d46b9b080238bee0"
            },
            "downloads": -1,
            "filename": "magika-0.5.1.tar.gz",
            "has_sig": false,
            "md5_digest": "278586fcc194faa4b2b3df09961c7654",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<3.13",
            "size": 1005077,
            "upload_time": "2024-03-07T16:44:24",
            "upload_time_iso_8601": "2024-03-07T16:44:24.377635Z",
            "url": "https://files.pythonhosted.org/packages/1a/58/c1d8887354d0ff2256d4d78d08a69bcc55719a0189afa706c51da04390f2/magika-0.5.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-07 16:44:24",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "magika"
}
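
The `sha256` digests in the raw data can be used to check a downloaded file. The sketch below hashes a file in chunks and compares the result with the digest recorded above for that filename. The names `sha256_of`, `EXPECTED_SHA256`, and `verify_download` are hypothetical and chosen for this illustration.

```python
import hashlib
from pathlib import Path

# Digests copied from the "urls" entries in the raw data above.
EXPECTED_SHA256 = {
    "magika-0.5.1-py3-none-any.whl":
        "a4d1f64f71460f335841c13c3d16cfc2cb21e839c1898a1ae9bd5adc8d66cb2b",
    "magika-0.5.1.tar.gz":
        "43dc1153a1637327225a626a1550c0a395a1d45ea33ec1f5d46b9b080238bee0",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Compare a downloaded file against the digest recorded for its filename."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    if expected is None:
        raise KeyError(f"no recorded digest for {Path(path).name!r}")
    return sha256_of(path) == expected
```

For example, `verify_download("magika-0.5.1.tar.gz")` should return `True` only if the file matches the recorded digest.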
        