mplm-sim


| Field | Value |
|---|---|
| Name | mplm-sim |
| Version | 0.1.0 |
| Home page | https://github.com/cisnlp/mplm_sim |
| Summary | mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models |
| Upload time | 2024-01-19 13:26:59 |
| Author | Peiqin Lin, Chengzhi Hu |
| Requires Python | >=3.6 |
| License | Apache Software License 2.0 |
| Keywords | mplm_sim |
| Requirements | No requirements were recorded. |

# mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

[![arXiv](https://img.shields.io/badge/arXiv-2305.13684-b31b1b.svg)](https://arxiv.org/abs/2305.13684)

mplm-sim is a language similarity tool providing:

- `Loader`: Accessing high-quality language similarity results directly.
- `Executor`: Obtaining similarity results from scratch.

## Quickstart

Download the repository and use it directly, or install the package from PyPI:

`pip install mplm_sim`

or install it directly from GitHub with pip:

`pip install --upgrade git+https://github.com/cisnlp/mPLM-Sim.git#egg=mplm_sim`
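
To confirm that the installation succeeded, a small check like the one below can help. It is a minimal sketch: `installed_version` is a hypothetical helper, not part of mplm_sim, and it relies on the standard-library `importlib.metadata` module, which needs Python 3.8 or newer even though the package itself declares `>=3.6`. The distribution name is taken from the pip command above.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("mplm_sim")
print(version if version is not None else "mplm_sim is not installed")
```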

## Loader

```python
from mplm_sim import Loader

# Load precomputed results for a given model_name and corpus_name
loader = Loader.from_pretrained(model_name='cis-lmu/glot500-base', corpus_name='flores200')
# Or load results from a similarity file
# loader = Loader.from_tsv('your_similarity_file.tsv')

# Get the similarity for a language pair,
# identified either by iso3_script code
sim = loader.get_sim('eng_Latn', 'cmn_Hani')
# or by language name
sim = loader.get_sim('English', 'Chinese')
```
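
One possible use of these scores is picking source languages for cross-lingual transfer. The sketch below ranks candidate languages by their similarity to a target language. `rank_sources` is a hypothetical helper, not part of mplm_sim. It accepts any callable with the same two-argument shape as `loader.get_sim`, and it assumes that a higher score means a more similar pair. The toy scores are invented purely for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def rank_sources(
    get_sim: Callable[[str, str], float],
    target: str,
    candidates: Iterable[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k candidates most similar to target, best first.

    Hypothetical helper: get_sim is any function shaped like loader.get_sim.
    """
    scored = [(lang, get_sim(lang, target)) for lang in candidates if lang != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-in for loader.get_sim; these scores are made up.
_TOY_SCORES = {
    ("deu_Latn", "nld_Latn"): 0.9,
    ("eng_Latn", "nld_Latn"): 0.8,
    ("cmn_Hani", "nld_Latn"): 0.4,
}


def toy_get_sim(lang_a: str, lang_b: str) -> float:
    return _TOY_SCORES[(lang_a, lang_b)]


ranked = rank_sources(toy_get_sim, "nld_Latn", ["eng_Latn", "cmn_Hani", "deu_Latn"], top_k=2)
print(ranked)
```

With the real package, passing `loader.get_sim` in place of `toy_get_sim` should work the same way, assuming it returns a number for every requested pair.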

## Executor

```python
from mplm_sim import Executor

# model_name: any text/speech language model supported by Hugging Face
# corpus_name: name of the corpus, used when saving results
# corpus_path: path to the multi-parallel corpora, see corpora_demo for the file format
# corpus_type: text or speech
executor = Executor(model_name='cis-lmu/glot500-base', corpus_name='own',
                    corpus_path='corpora/own', corpus_type='text')

# Run
executor.run()
```
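
To give an intuition for what scoring a language pair from a multi-parallel corpus can look like, the sketch below averages the cosine similarity between embeddings of aligned sentences. This is an illustrative assumption, not the package's actual implementation: the function names are hypothetical, the 2-dimensional vectors are made up, and in practice the embeddings would come from the chosen language model.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def language_similarity(
    emb_a: Sequence[Sequence[float]],
    emb_b: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity over aligned sentence embeddings.

    emb_a[i] and emb_b[i] must embed translations of the same sentence.
    """
    if not emb_a or len(emb_a) != len(emb_b):
        raise ValueError("need the same non-zero number of sentences per language")
    return sum(cosine(u, v) for u, v in zip(emb_a, emb_b)) / len(emb_a)


# Made-up embeddings for two parallel sentences in two languages.
lang_a = [[1.0, 0.0], [0.0, 2.0]]
lang_b = [[2.0, 0.0], [1.0, 0.0]]
score = language_similarity(lang_a, lang_b)
print(score)
```

The first sentence pair points in the same direction and the second pair is orthogonal, so the toy score lands halfway between the two extremes.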

## Citation

```
@article{DBLP:journals/corr/abs-2305-13684,
  author       = {Peiqin Lin and
                  Chengzhi Hu and
                  Zheyu Zhang and
                  Andr{\'{e}} F. T. Martins and
                  Hinrich Sch{\"{u}}tze},
  title        = {mPLM-Sim: Unveiling Better Cross-Lingual Similarity and Transfer in
                  Multilingual Pretrained Language Models},
  journal      = {CoRR},
  volume       = {abs/2305.13684},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.13684},
  doi          = {10.48550/ARXIV.2305.13684},
  eprinttype    = {arXiv},
  eprint       = {2305.13684},
  timestamp    = {Mon, 05 Jun 2023 15:42:15 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-13684.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
```




            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/cisnlp/mplm_sim",
    "name": "mplm-sim",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "mplm_sim",
    "author": "Peiqin Lin, Chengzhi Hu",
    "author_email": "lpq29743@gmail.com, Chengzhi.Hu@campus.lmu.de",
    "download_url": "https://files.pythonhosted.org/packages/43/d9/ddc3fca946d7beec4e7a554c71a703192c7f7bc616f3e921c4ac748c0bff/mplm_sim-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache Software License 2.0",
    "summary": "mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/cisnlp/mplm_sim"
    },
    "split_keywords": [
        "mplm_sim"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "48aeb658e7624a475148d0cb1dba809fee8ca7fe5e881c2ee4a8b893ba0a1b20",
                "md5": "4b4f330aadd664345d5067af4bf6b217",
                "sha256": "8c222f2cc84892a04afb8f53132f0afb4b57006117396e71ffbe589bac404f31"
            },
            "downloads": -1,
            "filename": "mplm_sim-0.1.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4b4f330aadd664345d5067af4bf6b217",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.6",
            "size": 9839,
            "upload_time": "2024-01-19T13:26:58",
            "upload_time_iso_8601": "2024-01-19T13:26:58.053933Z",
            "url": "https://files.pythonhosted.org/packages/48/ae/b658e7624a475148d0cb1dba809fee8ca7fe5e881c2ee4a8b893ba0a1b20/mplm_sim-0.1.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "43d9ddc3fca946d7beec4e7a554c71a703192c7f7bc616f3e921c4ac748c0bff",
                "md5": "291db56c68e5494e4d4a9f471ecd3ace",
                "sha256": "2dd40f40c27ace8f745d0ab1d59b1e28eca8fda2cb67667c2b5284c7449ea776"
            },
            "downloads": -1,
            "filename": "mplm_sim-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "291db56c68e5494e4d4a9f471ecd3ace",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 9906,
            "upload_time": "2024-01-19T13:26:59",
            "upload_time_iso_8601": "2024-01-19T13:26:59.954584Z",
            "url": "https://files.pythonhosted.org/packages/43/d9/ddc3fca946d7beec4e7a554c71a703192c7f7bc616f3e921c4ac748c0bff/mplm_sim-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-19 13:26:59",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cisnlp",
    "github_project": "mplm_sim",
    "github_not_found": true,
    "lcname": "mplm-sim"
}
        