oslili 0.15

* Summary: Open Source License Identification Library
* Home page: https://github.com/oscarvalenzuelab/oslili
* Author: Oscar Valenzuela B.
* License: Apache-2.0
* Upload time: 2024-07-03 05:59:19
* Requirements: none recorded

# OSLiLi - Open Source License Identification Library

OSLiLi (Open Source License Identification Library) is experimental code that uses scikit-learn to implement a Multinomial Naive Bayes classifier, trained on SPDX data, to identify open source licenses. It should be considered a proof of concept for identifying open source licenses with machine learning.

This is an experimental project; please don't use it in production. For a more robust implementation, see the Askalono project: https://github.com/jpeddicord/askalono
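
To make the classification idea concrete, here is a minimal, dependency-free sketch of a Multinomial Naive Bayes text classifier. It is not OSLiLi's code: the library relies on scikit-learn (whose counterpart is `sklearn.naive_bayes.MultinomialNB`), while the `TinyMultinomialNB` class, the `tokenize` helper, and the three-snippet training set below are hypothetical stand-ins chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyMultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        """Return (best_label, probability) for the given text."""
        # Tokens never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)       # log likelihood
            log_scores[label] = score
        # Turn log scores into probabilities; subtracting the max avoids underflow.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        best = max(exp, key=exp.get)
        return best, exp[best] / sum(exp.values())


# Toy training data (assumption): one short snippet per SPDX identifier.
training = [
    ("Permission is hereby granted, free of charge, to any person obtaining a copy", "MIT"),
    ("Licensed under the Apache License, Version 2.0", "Apache-2.0"),
    ("Redistribution and use in source and binary forms, with or without modification", "BSD-3-Clause"),
]
model = TinyMultinomialNB().fit([t for t, _ in training], [l for _, l in training])
label, proba = model.predict("Permission is hereby granted, free of charge")
print(f"License: {label} ({proba:.2f} probability)")
```

A real model would be trained on full license texts from the SPDX dataset rather than one-line snippets, so the probabilities from this toy example say nothing about the accuracy of the library.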


## Usage

### On the command line

To use OSLiLi from the terminal, install the oslili-cli package:
```
$ pip3 install oslili-cli
$ oslili-cli LICENSE
License: MIT (0.89 probability)
Copyright: ('2021', '(c)  Andrew Barrier')
```
### As a library

To use the library, import `LicenseAndCopyrightIdentifier` and call its `identify_license` or `identify_copyright` method:
```
import argparse
from oslili import LicenseAndCopyrightIdentifier


def main():
    msg = 'Identify open source license and copyright statements'
    parser = argparse.ArgumentParser(description=msg)
    parser.add_argument('file_path', help='Path to the file to analyze')
    args = parser.parse_args()

    # Read the contents of the file to analyze.
    with open(args.file_path, 'r') as f:
        text = f.read()

    identifier = LicenseAndCopyrightIdentifier()

    # identify_license returns the SPDX identifier and the classifier's probability.
    license_spdx_code, license_proba = identifier.identify_license(text)
    print(f'License: {license_spdx_code} ({license_proba:.2f} probability)')

    # identify_copyright returns a year range and the copyright statement.
    year_range, statement = identifier.identify_copyright(text)
    # Print only when a statement was found and none of its parts is None.
    if statement and None not in statement:
        print(f'Copyright: {statement}')


if __name__ == '__main__':
    main()
```
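
Because the classifier always returns its best guess, callers may want to discard low-probability results. The sketch below shows one way to do that. The `identify_with_threshold` helper and the `0.8` cutoff are hypothetical and not part of OSLiLi; `StubIdentifier` is a stand-in that mimics the `(spdx_code, probability)` return shape of `identify_license` shown above, so the example runs without the library installed.

```python
class StubIdentifier:
    """Stand-in for LicenseAndCopyrightIdentifier (assumption, for illustration only)."""

    def identify_license(self, text):
        # Pretend the classifier is confident only for MIT-like text.
        if 'Permission is hereby granted' in text:
            return 'MIT', 0.89
        return 'Apache-2.0', 0.41


def identify_with_threshold(identifier, text, threshold=0.8):
    """Return (spdx_code, probability), with spdx_code set to None below the threshold."""
    spdx_code, proba = identifier.identify_license(text)
    if proba >= threshold:
        return spdx_code, proba
    return None, proba


identifier = StubIdentifier()
print(identify_with_threshold(identifier, 'Permission is hereby granted, free of charge'))
# ('MIT', 0.89)
print(identify_with_threshold(identifier, 'Some unrelated text'))
# (None, 0.41)
```

With the real library, an instance of `LicenseAndCopyrightIdentifier` would be passed in place of `StubIdentifier()`. A suitable threshold depends on the input and would need to be chosen empirically.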
## Notice

This tool does not provide legal advice; I'm not a lawyer.

The code is an experimental implementation to match your input to a database of similar license texts and tell you if it's a close match. Refrain from relying on the accuracy of the output of this tool.

Remember: The tool can't tell you if a license works for your project or use case. Please seek independent legal advice for any licensing questions.

### Where do the licenses come from?

The SPDX license dataset is sourced directly from SPDX: https://github.com/spdx/license-list-data.

The datasets for ML training were generated by scanning different sources and were inspired by two academic publications:

* [Machine Learning-Based Detection of Open Source License Exceptions](https://ieeexplore.ieee.org/document/7985655): C. Vendome, M. Linares-Vásquez, G. Bavota, M. Di Penta, D. German and D. Poshyvanyk, "Machine Learning-Based Detection of Open Source License Exceptions," 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017, pp. 118-129, doi: 10.1109/ICSE.2017.19.


* [A Machine Learning Method for Automatic Copyright Notice Identification of Source Files](https://www.jstage.jst.go.jp/article/transinf/E103.D/12/E103.D_2020EDL8089/_article): Shi QIU, German M. DANIEL, Katsuro INOUE, A Machine Learning Method for Automatic Copyright Notice Identification of Source Files, IEICE Transactions on Information and Systems, 2020, Volume E103.D, Issue 12, Pages 2709-2712, Released December 01, 2020, Online ISSN 1745-1361, Print ISSN 0916-853.



## Contributing

Contributions are very welcome! See [CONTRIBUTING](CONTRIBUTING.md) for more info.

## License

This library is licensed under the [Apache 2.0 License](LICENSE).

            
