pymigbench

- Name: pymigbench
- Version: 2.2.4
- Home page: https://github.com/ualberta-smr/pymigbench
- Summary: APIs to access the PyMigBench dataset
- Upload time: 2024-08-03 04:13:46
- Author: PyMigBench Team
- Requires Python: >=3.11

PyMigBench is a benchmark of Python library migrations.
This repository contains the dataset and the code for a library that can be used to access it.

## Dataset
### PyMigBench v2
The current version, PyMigBench-2.0, includes 3,096 migration-related code changes from 335 migrations between 141 analogous library pairs.
This includes all migrations from [PyMigBench v1](#pymigbench-v1) and additional migrations borrowed from the [SALM dataset](https://ieeexplore.ieee.org/document/10123560).
Compared to v1, the data also includes additional information for each migration-related code change.

The dataset is published through the FSE 2024 paper titled *Characterizing Python Library Migrations*.
We will add the citation info once it is available.
[Release 2.0.2](https://github.com/ualberta-smr/PyMigBench/releases/v2.0.2) points to the exact dataset linked to the paper.
The data is also permanently archived in [figshare](https://doi.org/10.6084/m9.figshare.24216858.v2).
Use either of these links to reproduce the paper's results.

We may update this repository to correct mistakes or add more data, so it may not stay in sync with the paper.
For the latest data, use the [latest release](https://github.com/ualberta-smr/PyMigBench/releases/latest) in this repository.

### PyMigBench v1
We recommend using PyMigBench v2 for any new research.
However, if you want to use the v1 dataset, see [Release 1.0.3](https://github.com/ualberta-smr/PyMigBench/releases/v1.0.3).
Cite the paper below if you use the v1 dataset.

```
@INPROCEEDINGS{pymigbench,
  author={Islam, Mohayeminul and Jha, Ajay Kumar and Nadi, Sarah and Akhmetov, Ildar},
  booktitle={2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)}, 
  title={PyMigBench: A Benchmark for Python Library Migration}, 
  year={2023},
  volume={},
  number={},
  pages={511-515},
  doi={10.1109/MSR59073.2023.00075}
}
```


## Library

### Installation
The library and the dataset should be at the same version to be compatible.
To install the library, run:
```bash
pip install pymigbench==<version>
```
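Because the library and the dataset should be at the same version, a quick check before loading can avoid confusing mismatches. The sketch below assumes the dataset was downloaded from a GitHub release whose tag has the `v`-prefixed form used by the releases linked above (for example `v2.0.2`). `normalize_version` and `library_matches_dataset` are hypothetical helpers, not part of `pymigbench`; the sketch relies only on the standard-library `importlib.metadata.version`.

```python
from importlib.metadata import PackageNotFoundError, version


def normalize_version(tag: str) -> str:
    """Strip whitespace and a leading 'v' from a release tag such as 'v2.2.4'."""
    tag = tag.strip()
    return tag[1:] if tag.lower().startswith("v") else tag


def library_matches_dataset(dataset_tag: str, package: str = "pymigbench") -> bool:
    """Return True if the installed package version equals the dataset release tag.

    Hypothetical helper: returns False when the package is not installed.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return normalize_version(installed) == normalize_version(dataset_tag)
```

For example, after downloading the dataset from the `v2.0.2` release, `library_matches_dataset("v2.0.2")` should return `True` only if `pymigbench==2.0.2` is installed.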

### Basic usage
To use the library, you need to have the dataset downloaded.
You can download the dataset from the [GitHub repository](https://github.com/ualberta-smr/pymigbench).

```python
from pathlib import Path

from pymigbench.database import Database

# Directory containing the migration YAML files in your local copy of the dataset
yaml_root = Path('repo-root/migration/')

db = Database.load_from_dir(yaml_root)  # Load the dataset from the directory
migs = db.migs()  # Get all the migrations
```
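Before calling `Database.load_from_dir`, it can help to confirm that `yaml_root` really points at the migration files. Since each migration is one YAML file, counting those files is a cheap sanity check. This sketch assumes the files sit directly inside `yaml_root` and use a `.yaml` or `.yml` extension; `is_migration_file` and `count_migration_files` are hypothetical helpers, not part of `pymigbench`.

```python
from pathlib import Path

# Assumption: migration files use the .yaml or .yml extension.
YAML_SUFFIXES = {".yaml", ".yml"}


def is_migration_file(path: Path) -> bool:
    """Treat any YAML file as one migration (one migration per YAML file)."""
    return path.suffix.lower() in YAML_SUFFIXES


def count_migration_files(yaml_root: Path) -> int:
    """Count YAML files directly inside yaml_root (subdirectories are not searched)."""
    if not yaml_root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {yaml_root}")
    return sum(1 for p in yaml_root.iterdir() if p.is_file() and is_migration_file(p))
```

A count of 0 suggests the path is wrong. Assuming `db.migs()` yields one object per YAML file, the count should match `len(list(db.migs()))`.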

### The constants
Several enums help you work with the dataset.
They are all in the `pymigbench.constants` module, for example:
```python
from pymigbench.constants import ProgramElement
```
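The text names only `ProgramElement`. To see which enums `pymigbench.constants` actually exposes in your installed version, a small introspection helper built on the standard library can list them with their members. `list_enums` is a hypothetical helper, not part of `pymigbench`; it only assumes the enums are ordinary `enum.Enum` subclasses defined in that module.

```python
import enum
import importlib
import inspect


def list_enums(module_name: str) -> dict[str, list[str]]:
    """Map each Enum class defined in module_name to the names of its members."""
    module = importlib.import_module(module_name)
    result: dict[str, list[str]] = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Keep only Enum subclasses defined in this module, not ones it merely imports.
        if issubclass(obj, enum.Enum) and obj.__module__ == module.__name__:
            result[name] = [member.name for member in obj]
    return result
```

With the library installed, `list_enums("pymigbench.constants")` should include an entry for `ProgramElement`.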

### The migration-related objects
There are three main classes to encapsulate the data: `Migration`, `MigrationFile`, and `CodeChange`.

`Migration` is the top-level class representing a single migration, i.e., one YAML file.
`Migration` has a list of `MigrationFile` objects, which represent the files that were changed in the migration.
`MigrationFile` has a list of `CodeChange` objects, each of which represents a single migration-related code change.
Each of these model classes has an `id()` method that returns an identifier that is unique across the full dataset.
Each `CodeChange` object additionally has an `index` property and an `id_in_file()` method, which are unique within the containing file.
Each of the classes has some additional helper methods.
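To make the containment and ID rules concrete, here is a self-contained sketch that models the same three-level hierarchy with plain dataclasses. These are hypothetical stand-ins, not the library's classes: the attribute names (`files`, `changes`), the helpers `add_file`, `add_change`, and `all_changes`, the 0-based `index`, and the ID formats are all assumptions for illustration. What it shows is the rule described above: `id_in_file()` can repeat across files, while `id()` stays unique across the dataset because it is qualified by its containers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for Migration, MigrationFile, and CodeChange.
# Attribute names and ID formats are assumptions, not the library's API.
@dataclass
class Migration:
    mig_id: str  # hypothetical identifier for one migration (one YAML file)
    files: list[MigrationFile] = field(default_factory=list)

    def id(self) -> str:
        return self.mig_id

    def add_file(self, path: str) -> MigrationFile:
        mf = MigrationFile(path=path, migration=self)
        self.files.append(mf)
        return mf


@dataclass
class MigrationFile:
    path: str
    # Back-reference excluded from repr/eq to avoid infinite recursion.
    migration: Migration = field(repr=False, compare=False)
    changes: list[CodeChange] = field(default_factory=list)

    def id(self) -> str:
        # Unique across the dataset: qualified by the migration's ID.
        return f"{self.migration.id()}:{self.path}"

    def add_change(self) -> CodeChange:
        change = CodeChange(index=len(self.changes), file=self)
        self.changes.append(change)
        return change


@dataclass
class CodeChange:
    index: int  # assumed 0-based position within the containing file
    file: MigrationFile = field(repr=False, compare=False)

    def id_in_file(self) -> str:
        # Unique only within the containing file.
        return str(self.index)

    def id(self) -> str:
        # Unique across the dataset: qualified by the file's ID.
        return f"{self.file.id()}#{self.id_in_file()}"


def all_changes(migrations):
    """Walk Migration -> MigrationFile -> CodeChange and yield every code change."""
    for mig in migrations:
        for mf in mig.files:
            yield from mf.changes
```

With the real library, check the installed classes for their actual attribute names before traversing the hierarchy this way.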

 



## Contributors
- [Mohayeminul Islam](https://mohayemin.github.io/)
- [Ajay Kumar Jha](https://hifromajay.github.io/)
- [Sarah Nadi](https://sarahnadi.org/)
- [Ildar Akhmetov](https://ildarakhmetov.com/)  

For any queries, please contact mohayemin@ualberta.ca.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ualberta-smr/pymigbench",
    "name": "pymigbench",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": null,
    "author": "PyMigBench Team",
    "author_email": "mohayemin@ualberta.ca",
    "download_url": "https://files.pythonhosted.org/packages/df/73/4923523f7a96b8a30c9d483b4d994231f65659fbf3ca0596a1a4fe56a0b5/pymigbench-2.2.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "APIs to access the PyMigBench dataset",
    "version": "2.2.4",
    "project_urls": {
        "Homepage": "https://github.com/ualberta-smr/pymigbench"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d1a2e19a4be17c87e06df25a7213fde70b62153e5d8d0ce950d3295a8048393e",
                "md5": "c7b6d808ff5ab17cee3f8777666bc4bd",
                "sha256": "c59cfa4c71d352c8f272e2206b1eaec515ca63af766c2789778ca1e5f0c7631b"
            },
            "downloads": -1,
            "filename": "pymigbench-2.2.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c7b6d808ff5ab17cee3f8777666bc4bd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 9576,
            "upload_time": "2024-08-03T04:13:44",
            "upload_time_iso_8601": "2024-08-03T04:13:44.776252Z",
            "url": "https://files.pythonhosted.org/packages/d1/a2/e19a4be17c87e06df25a7213fde70b62153e5d8d0ce950d3295a8048393e/pymigbench-2.2.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df734923523f7a96b8a30c9d483b4d994231f65659fbf3ca0596a1a4fe56a0b5",
                "md5": "228f261f3b69ca5def332bcb57f7c91b",
                "sha256": "a40305953218ffc8c085b560fca479b7dcdfe4c0911db2ac915454cf03914a00"
            },
            "downloads": -1,
            "filename": "pymigbench-2.2.4.tar.gz",
            "has_sig": false,
            "md5_digest": "228f261f3b69ca5def332bcb57f7c91b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 9349,
            "upload_time": "2024-08-03T04:13:46",
            "upload_time_iso_8601": "2024-08-03T04:13:46.168162Z",
            "url": "https://files.pythonhosted.org/packages/df/73/4923523f7a96b8a30c9d483b4d994231f65659fbf3ca0596a1a4fe56a0b5/pymigbench-2.2.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-03 04:13:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ualberta-smr",
    "github_project": "pymigbench",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pymigbench"
}
        