# Mismo
The SQL/Ibis powered sklearn of record linkage.

Mismo is still in alpha. Breaking changes will happen frequently
and without warning. Once the API has stabilized, I
will publish a stability policy. Suggestions about
how you would like the API to look are greatly appreciated.
-----
## Installation
I have claimed `mismo` on PyPI, but I won't update it often
until the project is more stable. Until then, install from source:
```console
python -m pip install "mismo[viz] @ git+https://github.com/NickCrews/mismo@<SOME-SHA-OR-BRANCH>"
```
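Because a source install pins an arbitrary commit, it can help to confirm which version actually landed in the environment. The snippet below is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of Mismo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "mismo"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"mismo {found}" if found else "mismo is not installed in this environment")
```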
## Goals
Mismo tries to be the sklearn of record linkage, backed by the scalability
and power of SQL and Ibis. It is made of many small
data structures and functions, each with a well-defined and standard API
that allows them to be composed together and extended easily.
None of the other record linkage packages I have seen, such as
[Splink](https://github.com/moj-analytical-services/splink),
[Dedupe](https://www.github.com/dedupeio/dedupe), or
[Record Linkage Toolkit](https://github.com/J535D165/recordlinkage),
has all of these properties, so I decided to make my own.
See [Goals and Alternatives](https://nickcrews.github.io/mismo/concepts/goals_and_alternatives)
for a more detailed discussion of the goals of Mismo and how it compares to other
record linkage packages.
## Features
- Supports larger-than-memory datasets, executed on powerful SQL engines.
  Use DuckDB for prototyping and for jobs up to roughly 10M records,
  or Spark or another distributed backend for larger tasks, without
  needing to change your code!
- Uses Ibis's clean, strongly typed, Pythonic dataframe API.
- Small, modular functions and data structures that are easy to plug together
  and extend.
- Layered API: use the top-level APIs if your task is common enough that it is
  supported out of the box.
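To illustrate why pushing record linkage down to a SQL engine scales, here is a rough sketch of the general technique: "blocking" with a self-join so that only plausible pairs are compared. It deliberately does not use the Mismo or Ibis APIs. The standard library's `sqlite3` stands in for the SQL engine, and the table, columns, and toy records are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical toy records; ids 1 and 2 are intended to be duplicates.
RECORDS = [
    (1, "Alice Smith", "98101"),
    (2, "alice smith ", "98101"),
    (3, "Bob Jones", "98101"),
    (4, "Alice Smith", "10001"),
]


def candidate_pairs(conn):
    """Block on zip code: only records sharing a zip are paired."""
    query = """
        SELECT l.id, r.id
        FROM records AS l
        JOIN records AS r
          ON l.zip = r.zip AND l.id < r.id
        ORDER BY l.id, r.id
    """
    return conn.execute(query).fetchall()


def matching_pairs(conn):
    """Among blocked pairs, keep those whose normalized names agree."""
    query = """
        SELECT l.id, r.id
        FROM records AS l
        JOIN records AS r
          ON l.zip = r.zip AND l.id < r.id
        WHERE lower(trim(l.name)) = lower(trim(r.name))
        ORDER BY l.id, r.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", RECORDS)

print(candidate_pairs(conn))  # pairs that share a zip code
print(matching_pairs(conn))   # blocked pairs whose names also agree
```

The engine does the pairing and comparison, so Python never materializes all n² combinations. A real pipeline would use fuzzier comparisons than exact equality on a normalized name.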
## Examples
See the [example notebook](https://nickcrews.github.io/mismo/examples/patent_deduplication).
## Documentation
See the [documentation](https://nickcrews.github.io/mismo).
## Contributing
See the [contributing guide](https://nickcrews.github.io/mismo/contributing/).
## License
`mismo` is distributed under the terms of the
[LGPL-3.0-or-later](https://spdx.org/licenses/LGPL-3.0-or-later.html) license.
Raw data
{
"_id": null,
"home_page": null,
"name": "mismo",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "record linkage, entity resolution, fuzzy linking, machine learning, ibis, sql, splink, duckdb",
"author": "Nick Crews",
"author_email": "Nick Crews <nicholas.b.crews@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a6/2e/b5ac5d9f04d1e733e6b15774650cb6ed49583867e63dc46bde692f9c0b04/mismo-0.2.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "LGPL-3.0-or-later",
"summary": "The SQL/Ibis powered sklearn of record linkage.",
"version": "0.2.6",
"project_urls": {
"Documentation": "https://nickcrews.github.io/mismo",
"Homepage": "https://github.com/NickCrews/mismo",
"Issues": "https://github.com/NickCrews/mismo/issues",
"Source": "https://github.com/NickCrews/mismo"
},
"split_keywords": [
"record linkage",
" entity resolution",
" fuzzy linking",
" machine learning",
" ibis",
" sql",
" splink",
" duckdb"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "d50ac625b48e4f7a7c7d09a913c08b832d955543505d881ffa8ee60e5233a81a",
"md5": "0e89b3deca5c6d7095f7d3ee3acc3aca",
"sha256": "64ae37a7b2d3efac5da2e03c39c115f93a1ab3608288e5ab2a4301561919b9cb"
},
"downloads": -1,
"filename": "mismo-0.2.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0e89b3deca5c6d7095f7d3ee3acc3aca",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 1074143,
"upload_time": "2025-08-23T01:31:27",
"upload_time_iso_8601": "2025-08-23T01:31:27.525696Z",
"url": "https://files.pythonhosted.org/packages/d5/0a/c625b48e4f7a7c7d09a913c08b832d955543505d881ffa8ee60e5233a81a/mismo-0.2.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "a62eb5ac5d9f04d1e733e6b15774650cb6ed49583867e63dc46bde692f9c0b04",
"md5": "db3987ed1c7b5afc48786d844d1f928a",
"sha256": "ad4f3624892033576af3dcfb23c4414400a8603dc7b05ec4625671282bb093d1"
},
"downloads": -1,
"filename": "mismo-0.2.6.tar.gz",
"has_sig": false,
"md5_digest": "db3987ed1c7b5afc48786d844d1f928a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 1016953,
"upload_time": "2025-08-23T01:31:29",
"upload_time_iso_8601": "2025-08-23T01:31:29.667450Z",
"url": "https://files.pythonhosted.org/packages/a6/2e/b5ac5d9f04d1e733e6b15774650cb6ed49583867e63dc46bde692f9c0b04/mismo-0.2.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-23 01:31:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NickCrews",
"github_project": "mismo",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "mismo"
}