thinc-apple-ops


Name: thinc-apple-ops
Version: 1.0.0
Home page: https://github.com/explosion/thinc-apple-ops
Summary: Improve Thinc's performance on Apple devices with native libraries
Upload time: 2024-10-01 09:54:04
Author: Explosion
Requires Python: >=3.7
License: MIT
Requirements: No requirements were recorded.

<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# thinc-apple-ops

Make [spaCy](https://spacy.io) and [Thinc](https://thinc.ai) **up to 8 &times; faster**
on macOS by calling into Apple's native libraries.

## ⏳ Install

Make sure you have [Xcode](https://developer.apple.com/xcode/) installed and
then install with `pip`:

```bash
pip install thinc-apple-ops
```
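
To check that the install landed in the environment you expect, a small script along these lines can help. This is only a sketch using the standard library: the helper name `apple_ops_status` is made up for this example, and it reports whether the `thinc_apple_ops` module can be found and which platform Python is running on. It does not confirm that Thinc is actually routing work through it.

```python
import importlib.util
import platform


def apple_ops_status():
    """Report whether thinc-apple-ops looks usable on this machine.

    Hypothetical helper for illustration, not part of thinc-apple-ops.
    """
    return {
        # True when the `thinc_apple_ops` module is found on the import path.
        "installed": importlib.util.find_spec("thinc_apple_ops") is not None,
        # Accelerate ships with macOS, which Python reports as "Darwin".
        "macos": platform.system() == "Darwin",
        # Typically "arm64" on Apple Silicon and "x86_64" on Intel Macs.
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    print(apple_ops_status())
```

On an Apple Silicon Mac, a Python build running under Rosetta may report `x86_64` as the machine, so treat that field as a hint rather than proof of the hardware.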

## 🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning.
Since matrix multiplication is computationally expensive, using a fast matrix
multiplication implementation can speed up training and prediction
significantly.

Most linear algebra libraries provide matrix multiplication in the form of the
standardized
[BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) `gemm`
functions. The work behind the scenes is done by a set of matrix multiplication
kernels that are meticulously tuned for specific architectures. Matrix
multiplication kernels use architecture-specific
[SIMD](https://en.wikipedia.org/wiki/SIMD) instructions for data-level parallelism
and can take factors such as cache sizes and instruction latency into account.
[Thinc](https://github.com/explosion/thinc) uses the
[BLIS](https://github.com/flame/blis) linear algebra library, which provides
optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.
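
For readers unfamiliar with `gemm`, the operation it computes is `alpha * (A @ B) + beta * C`. The pure-Python sketch below shows only the arithmetic. Its signature is a simplification invented for this example (real BLAS `gemm` routines also take transpose flags, dimensions and strides), and it makes no attempt at the tuning that real kernels do.

```python
def gemm(a, b, c=None, alpha=1.0, beta=0.0):
    """Reference gemm: return alpha * (a @ b) + beta * c for lists of rows."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("columns of a must match rows of b")
    if c is None:
        c = [[0.0] * n_cols for _ in range(n_rows)]
    out = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            # Dot product of row i of `a` and column j of `b`.
            dot = sum(a[i][k] * b[k][j] for k in range(n_inner))
            out_row.append(alpha * dot + beta * c[i][j])
        out.append(out_row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(gemm(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

The triple loop performs one multiply-add per `(i, j, k)` combination, which is why the cost grows so quickly with matrix size and why tuned kernels matter.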

Recent [Apple Silicon](https://en.wikipedia.org/wiki/Apple_silicon) CPUs, such
as the [M-series](https://en.wikipedia.org/wiki/Apple_silicon#M_series) used in
Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more
separate matrix co-processors called AMX. Since AMX is not well documented, it is
unclear how many AMX units Apple M CPUs have. It is certain that the (single)
performance cluster of the M1 has an AMX unit and there is [empirical
evidence](https://twitter.com/danieldekok/status/1454383754512945155?s=20) that
both performance clusters of the M1 Pro/Max have an AMX unit.


Even though AMX units use a set of [undocumented
instructions](https://gist.github.com/dougallj/7a75a3be1ec69ca550e7c36dc75e0d6f),
the units can be used through Apple's
[Accelerate](https://developer.apple.com/documentation/accelerate) linear
algebra library. Since Accelerate implements the BLAS interface, it can be used
as a replacement for the BLIS library that is used by Thinc. This is where the
`thinc-apple-ops` package comes in. `thinc-apple-ops` extends the default Thinc
ops, so that `gemm` matrix multiplication from Accelerate is used in place of
the BLIS implementation of `gemm`. As a result, matrix multiplication in Thinc
is performed on the fast AMX unit(s).
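
The override described above follows a common pattern: subclass the default ops and swap out only `gemm`, so every operation built on top of it benefits. The sketch below illustrates that pattern with made-up classes `BaseOps` and `AcceleratedOps`. These are not Thinc's or thinc-apple-ops' real classes, and the "native" backend here is an ordinary Python callable standing in for Accelerate.

```python
class BaseOps:
    """Stand-in for a default ops class (hypothetical, not Thinc's real class)."""

    name = "base"

    def gemm(self, a, b):
        # Plain-Python fallback, playing the role of the default backend.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

    def affine(self, x, w, bias):
        # Inherited operations call self.gemm, so they pick up any override.
        return [[v + bv for v, bv in zip(row, bias)] for row in self.gemm(x, w)]


class AcceleratedOps(BaseOps):
    """Same interface, but gemm is delegated to a faster backend."""

    name = "accelerated"

    def __init__(self, backend_gemm):
        # `backend_gemm` stands in for a call into a native BLAS library.
        self._backend_gemm = backend_gemm
        self.calls = 0

    def gemm(self, a, b):
        self.calls += 1
        return self._backend_gemm(a, b)


# For the demo the "native" backend is just the pure-Python implementation.
ops = AcceleratedOps(backend_gemm=BaseOps().gemm)
print(ops.affine([[1.0, 2.0]], [[3.0], [4.0]], [0.5]))  # [[11.5]]
print(ops.calls)  # 1
```

Because `affine` is inherited unchanged and still reaches the overridden `gemm`, callers do not need to know which backend is active.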

## ⏱ Benchmarks

Using `thinc-apple-ops` leads to large speedups in prediction and training on
Apple Silicon Macs, as shown by the benchmarks below.

### Prediction

This first benchmark compares prediction speed of the `de_core_news_lg` spaCy
model on the M1 with and without `thinc-apple-ops`. Results for two Intel Macs
and an AMD Ryzen 5900X are also provided for comparison. Results are in
words per second. In this prediction benchmark, using `thinc-apple-ops` makes
the M1 **4.3** times faster.

| *CPU*                      | *BLIS* | *thinc-apple-ops* | *Package power (Watt)* |
| -------------------------- | -----: | ----------------: | ---------------------: |
| Mac Mini (M1)              |   6492 |             27676 |                      5 |
| MacBook Air Core i5 2020   |   9790 |             10983 |                      9 |
| Mac Mini Core i7 Late 2018 |  16364 |             14858 |                     31 |
| AMD Ryzen 5900X            |  22568 |               N/A |                     52 |
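
The speedup figure can be reproduced from the table by dividing the `thinc-apple-ops` column by the BLIS column. The snippet below does that for every row that has both numbers; the `speedup` helper is just a name chosen for this example.

```python
# Words per second from the prediction table: (BLIS, thinc-apple-ops).
prediction = {
    "Mac Mini (M1)": (6492, 27676),
    "MacBook Air Core i5 2020": (9790, 10983),
    "Mac Mini Core i7 Late 2018": (16364, 14858),
}


def speedup(blis, apple_ops):
    """Throughput with thinc-apple-ops relative to BLIS (above 1.0 is faster)."""
    return apple_ops / blis


for cpu, (blis, apple_ops) in prediction.items():
    print(f"{cpu}: {speedup(blis, apple_ops):.2f}x")
```

The M1 row comes out at roughly 4.3x. The Intel rows land close to 1x, and the Core i7 Mac Mini is slightly below it in this table, so the large gain is specific to Apple Silicon.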

### Training

In the second benchmark, we compare the training speed of the `de_core_news_lg`
spaCy model (without NER). The results are in training iterations per second.
Using `thinc-apple-ops` improves training speed on the M1 by **3.0** times.

| *CPU*                      | *BLIS* | *thinc-apple-ops* | *Package power (Watt)* |
| -------------------------- | -----: | ----------------: | ---------------------: |
| Mac Mini M1 2020           |   3.34 |             10.07 |                      5 |
| MacBook Air Core i5 2020   |   3.10 |              3.27 |                     10 |
| Mac Mini Core i7 Late 2018 |   4.71 |              4.93 |                     32 |
| AMD Ryzen 5900X            |   6.53 |               N/A |                     53 |
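
The power column also allows a rough efficiency comparison: training iterations per second divided by package power. The sketch below compares the M1 with `thinc-apple-ops` against the Ryzen 5900X with BLIS. It assumes the single power figure in each row applies to the configuration being compared, which the table does not state explicitly, and `per_watt` is a helper name invented for this example.

```python
# Training table rows: (iterations per second, package power in watts).
# The M1 figure is with thinc-apple-ops; the Ryzen figure is with BLIS.
training = {
    "Mac Mini M1 2020": (10.07, 5),
    "AMD Ryzen 5900X": (6.53, 53),
}


def per_watt(iterations_per_second, watts):
    """Training iterations per second for each watt of package power."""
    return iterations_per_second / watts


m1 = per_watt(*training["Mac Mini M1 2020"])
ryzen = per_watt(*training["AMD Ryzen 5900X"])
print(f"M1: {m1:.2f}, Ryzen: {ryzen:.2f}, ratio: {m1 / ryzen:.1f}x")
```

Under that assumption the M1 does around 16 times more training work per watt than the Ryzen in this benchmark, even though the Ryzen's raw BLIS throughput is higher than the M1's BLIS throughput.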

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/explosion/thinc-apple-ops",
    "name": "thinc-apple-ops",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": "Explosion",
    "author_email": "contact@explosion.ai",
    "download_url": "https://files.pythonhosted.org/packages/0d/44/15fa7fd8c4f3011f29393a1aed7616b3991fb7562ef044a31a818c25558f/thinc_apple_ops-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Improve Thinc's performance on Apple devices with native libraries",
    "version": "1.0.0",
    "project_urls": {
        "Homepage": "https://github.com/explosion/thinc-apple-ops"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "84b50e006fd7a0bf0a0a153ddf3134a47854e731af231bc865413188ec5c84a3",
                "md5": "3a6329979bf203fb41fd9af5ef4a96cb",
                "sha256": "cef314b216ace57dddf403caabd12b8af9d5519b7c679b8a24871c28c7588845"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "3a6329979bf203fb41fd9af5ef4a96cb",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.7",
            "size": 161937,
            "upload_time": "2024-10-01T09:53:53",
            "upload_time_iso_8601": "2024-10-01T09:53:53.549469Z",
            "url": "https://files.pythonhosted.org/packages/84/b5/0e006fd7a0bf0a0a153ddf3134a47854e731af231bc865413188ec5c84a3/thinc_apple_ops-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f41cb2727569fadd5299c11f3eecc8d2016cbb433b7b7d12da345295b2794693",
                "md5": "f9213daf4c9e612b20b90c01cbd3c8c9",
                "sha256": "2f641448bfcefc5c5204484a62baac7c040741fa8d8ecb86574a0357f390e373"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp310-cp310-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "f9213daf4c9e612b20b90c01cbd3c8c9",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.7",
            "size": 156887,
            "upload_time": "2024-10-01T09:53:55",
            "upload_time_iso_8601": "2024-10-01T09:53:55.063789Z",
            "url": "https://files.pythonhosted.org/packages/f4/1c/b2727569fadd5299c11f3eecc8d2016cbb433b7b7d12da345295b2794693/thinc_apple_ops-1.0.0-cp310-cp310-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a2c5fe79f5aff1731a80104c7366023b71b42948a6da3dad7383cbe779bdb6db",
                "md5": "2c1d827e3684894e8d3092142c733054",
                "sha256": "24083c6f74869a3b92bb398f610271687e269c290deda124075b3577a0873445"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp311-cp311-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "2c1d827e3684894e8d3092142c733054",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.7",
            "size": 161499,
            "upload_time": "2024-10-01T09:53:56",
            "upload_time_iso_8601": "2024-10-01T09:53:56.469672Z",
            "url": "https://files.pythonhosted.org/packages/a2/c5/fe79f5aff1731a80104c7366023b71b42948a6da3dad7383cbe779bdb6db/thinc_apple_ops-1.0.0-cp311-cp311-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f3831717527eaebed4b71fd7e260deb7c8e2c33b35b9ac75a6bda6a1e7158e1b",
                "md5": "9ccb935f936887aeb8a3464a60ce4410",
                "sha256": "257aa6fcaac764fc72285bf1fb93d1986ed9de17192c2c7090a785ee999188f8"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp311-cp311-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "9ccb935f936887aeb8a3464a60ce4410",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.7",
            "size": 156643,
            "upload_time": "2024-10-01T09:53:57",
            "upload_time_iso_8601": "2024-10-01T09:53:57.857196Z",
            "url": "https://files.pythonhosted.org/packages/f3/83/1717527eaebed4b71fd7e260deb7c8e2c33b35b9ac75a6bda6a1e7158e1b/thinc_apple_ops-1.0.0-cp311-cp311-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9c08c223217366308b794105b3269a5de183e898d0eb7d316143c33ca0d44d13",
                "md5": "270e345418ec5a64b87a33b3280711d7",
                "sha256": "b2a9d2d3e8b86c9ce4750affc1b6c7350ef92fff3d33249302aeafee20f3e09e"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp312-cp312-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "270e345418ec5a64b87a33b3280711d7",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.7",
            "size": 162967,
            "upload_time": "2024-10-01T09:53:59",
            "upload_time_iso_8601": "2024-10-01T09:53:59.094359Z",
            "url": "https://files.pythonhosted.org/packages/9c/08/c223217366308b794105b3269a5de183e898d0eb7d316143c33ca0d44d13/thinc_apple_ops-1.0.0-cp312-cp312-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c95dcacc702a8c7d6b5f19181819104b3d7d534598551d75a08d1c99ae96c869",
                "md5": "5c972cfa9e91d8caa000c144879fc9b9",
                "sha256": "1594f897f7dd03212d5b1ae1e4054be02526da530201a09c32d7bb557d274902"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp312-cp312-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "5c972cfa9e91d8caa000c144879fc9b9",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.7",
            "size": 157725,
            "upload_time": "2024-10-01T09:54:00",
            "upload_time_iso_8601": "2024-10-01T09:54:00.201102Z",
            "url": "https://files.pythonhosted.org/packages/c9/5d/cacc702a8c7d6b5f19181819104b3d7d534598551d75a08d1c99ae96c869/thinc_apple_ops-1.0.0-cp312-cp312-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "93ec8f2aa7e1171343e456145fe47596d053b61d55cdadd3f04b46c99133d34b",
                "md5": "98b887f266382800ba365552e459d589",
                "sha256": "cd04a7cd379a629498f964f5c589150a885ce834228dc060967806ee32196fab"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp39-cp39-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "98b887f266382800ba365552e459d589",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.7",
            "size": 162548,
            "upload_time": "2024-10-01T09:54:02",
            "upload_time_iso_8601": "2024-10-01T09:54:02.284999Z",
            "url": "https://files.pythonhosted.org/packages/93/ec/8f2aa7e1171343e456145fe47596d053b61d55cdadd3f04b46c99133d34b/thinc_apple_ops-1.0.0-cp39-cp39-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e0a638816c30c865584ebd4ac80c21ae72d4f24518d726e755e3ff803ec47f89",
                "md5": "9c9eab2e4b241e1f034a9a08264e411f",
                "sha256": "25d16b4518642b394cc7c4cf6a6477720f35fb22e6d91dca633bff4102c59763"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0-cp39-cp39-macosx_11_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "9c9eab2e4b241e1f034a9a08264e411f",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.7",
            "size": 157509,
            "upload_time": "2024-10-01T09:54:03",
            "upload_time_iso_8601": "2024-10-01T09:54:03.407475Z",
            "url": "https://files.pythonhosted.org/packages/e0/a6/38816c30c865584ebd4ac80c21ae72d4f24518d726e755e3ff803ec47f89/thinc_apple_ops-1.0.0-cp39-cp39-macosx_11_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0d4415fa7fd8c4f3011f29393a1aed7616b3991fb7562ef044a31a818c25558f",
                "md5": "aee5aa5df47b706f3e9d26a8d8f6e113",
                "sha256": "97238eb6693e758bfdcf1ac8e92c064bbe893ab74a6a3a4237c5bdf17318e502"
            },
            "downloads": -1,
            "filename": "thinc_apple_ops-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "aee5aa5df47b706f3e9d26a8d8f6e113",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 63163,
            "upload_time": "2024-10-01T09:54:04",
            "upload_time_iso_8601": "2024-10-01T09:54:04.474962Z",
            "url": "https://files.pythonhosted.org/packages/0d/44/15fa7fd8c4f3011f29393a1aed7616b3991fb7562ef044a31a818c25558f/thinc_apple_ops-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-01 09:54:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "explosion",
    "github_project": "thinc-apple-ops",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "thinc-apple-ops"
}
        