# OSLiLi - Open Source License Identification Library
Open Source License Identification Library (OSLiLi) is experimental code that uses scikit-learn to implement a Multinomial Naive Bayes classifier, trained on SPDX data, to identify open source licenses. It should be considered a proof of concept for identifying open source licenses using machine learning.
This is an experimental project; please don't use it in production. For a more robust implementation, see the Askalono project: https://github.com/jpeddicord/askalono
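To make the classification idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier written in plain Python. It is not the OSLiLi implementation (which uses scikit-learn and a much larger SPDX-derived dataset); the class name `TinyMultinomialNB`, the tokenizer, and the three toy training snippets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyMultinomialNB:
    """Hypothetical, minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text):
        """Return a dict mapping each label to its posterior probability."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for tok in tokens:
                # Smoothed log likelihood of this token under the label.
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid numeric underflow.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exp.values())
        return {label: v / norm for label, v in exp.items()}

    def predict(self, text):
        """Return (best_label, probability) for the given text."""
        proba = self.predict_proba(text)
        best = max(proba, key=proba.get)
        return best, proba[best]


# Toy training data: one short snippet per license (illustrative only).
train_texts = [
    "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Licensed under the Apache License, Version 2.0; you may not use this file except in compliance",
    "This program is free software: you can redistribute it under the terms of the GNU General Public License",
]
train_labels = ["MIT", "Apache-2.0", "GPL-3.0-only"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
label, proba = model.predict("Permission is hereby granted, free of charge")
print(f"License: {label} ({proba:.2f} probability)")
```

With so little training data the reported probability is not meaningful; the sketch only shows how word counts per license turn into a ranked prediction.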
## Usage
### On the command line
You can use OSLiLi from the terminal by installing the oslili-cli package:
```
$ pip3 install oslili-cli
$ oslili-cli LICENSE
License: MIT (0.89 probability)
Copyright: ('2021', '(c) Andrew Barrier')
```
### As a library
To use the library, import `LicenseAndCopyrightIdentifier` and call its `identify_license` or `identify_copyright` method.
```
import argparse

from oslili import LicenseAndCopyrightIdentifier


def main():
    # Parse the path of the file to analyze from the command line.
    msg = 'Identify open source license and copyright statements'
    parser = argparse.ArgumentParser(description=msg)
    parser.add_argument('file_path', help='Path to the file to analyze')
    args = parser.parse_args()

    with open(args.file_path, 'r') as f:
        text = f.read()

    identifier = LicenseAndCopyrightIdentifier()

    # identify_license returns the SPDX identifier and its probability.
    license_spdx_code, license_proba = identifier.identify_license(text)
    print(f'License: {license_spdx_code} ({license_proba:.2f} probability)')

    # identify_copyright returns the year range and the copyright statement.
    year_range, statement = identifier.identify_copyright(text)
    if statement and None not in statement:
        print(f'Copyright: {statement}')


if __name__ == '__main__':
    main()
```
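Because the classifier always reports a probability alongside its best guess, callers may want to discard low-confidence results. The helper below is a hypothetical sketch, not part of the oslili API: the function name `accept_license` and the default threshold of 0.8 are assumptions, and it only relies on the `(spdx_code, probability)` pair returned by `identify_license` in the example above.

```python
def accept_license(result, threshold=0.8):
    """Return the SPDX id if its probability meets the threshold, else None.

    `result` is a (spdx_code, probability) pair. The threshold value is an
    arbitrary illustration, not a recommendation from the library.
    """
    spdx_code, probability = result
    if spdx_code is None or probability < threshold:
        return None
    return spdx_code


print(accept_license(("MIT", 0.89)))  # probability above the threshold
print(accept_license(("MIT", 0.41)))  # probability below the threshold
```

A sensible threshold depends on your inputs; treat any value as something to validate against files whose licenses you already know.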
## Notice
This tool does not provide legal advice; I'm not a lawyer.
The code is an experimental implementation that matches your input against a database of similar license texts and tells you whether it's a close match. Do not rely on the accuracy of this tool's output.
Remember: the tool can't tell you whether a license works for your project or use case. Please seek independent legal advice for any licensing questions.
### Where do the licenses come from?
The SPDX license dataset is sourced directly from SPDX: https://github.com/spdx/license-list-data.
Datasets for ML training were generated by scanning different sources, and were inspired by two academic publications:
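As a rough sketch of how such a dataset could be turned into a labeled corpus, the snippet below reads a directory of plain-text license files and keys each text by its file name. It assumes a local directory holding one `<SPDX-ID>.txt` file per license (for example, a directory taken from a local clone of the SPDX data); the function name `load_license_texts` is hypothetical, and this is not necessarily how OSLiLi builds its training data.

```python
import tempfile
from pathlib import Path


def load_license_texts(text_dir):
    """Map SPDX identifier -> license text for every .txt file in text_dir."""
    corpus = {}
    for path in sorted(Path(text_dir).glob("*.txt")):
        # The file stem (e.g. "MIT" for "MIT.txt") is taken as the SPDX id.
        corpus[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return corpus


# Demonstration with a throwaway directory standing in for the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "MIT.txt").write_text("Permission is hereby granted...", encoding="utf-8")
    Path(tmp, "Apache-2.0.txt").write_text("Apache License, Version 2.0...", encoding="utf-8")
    demo_corpus = load_license_texts(tmp)

print(sorted(demo_corpus))
```

The resulting dictionary gives parallel lists of texts and labels (`list(corpus.values())`, `list(corpus.keys())`) that a classifier can be trained on.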
* [Machine Learning-Based Detection of Open Source License Exceptions](https://ieeexplore.ieee.org/document/7985655): C. Vendome, M. Linares-Vásquez, G. Bavota, M. Di Penta, D. German and D. Poshyvanyk, "Machine Learning-Based Detection of Open Source License Exceptions," 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), Buenos Aires, 2017, pp. 118-129, doi: 10.1109/ICSE.2017.19.
* [A Machine Learning Method for Automatic Copyright Notice Identification of Source Files](https://www.jstage.jst.go.jp/article/transinf/E103.D/12/E103.D_2020EDL8089/_article): S. Qiu, D. M. German and K. Inoue, "A Machine Learning Method for Automatic Copyright Notice Identification of Source Files," IEICE Transactions on Information and Systems, vol. E103.D, no. 12, pp. 2709-2712, 2020, Online ISSN 1745-1361, Print ISSN 0916-8532.
## Contributing
Contributions are very welcome! See [CONTRIBUTING](CONTRIBUTING.md) for more info.
## License
This library is licensed under the [Apache 2.0 License](LICENSE).
Raw data
{
"_id": null,
"home_page": "https://github.com/oscarvalenzuelab/oslili",
"name": "oslili",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Oscar Valenzuela B.",
"author_email": "alkamod@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/2e/b6/0afdd6f9881359937116b00475859113407e957346daa5281ecdc38a0742/oslili-0.15.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Open Source License Identification Library",
"version": "0.15",
"project_urls": {
"Homepage": "https://github.com/oscarvalenzuelab/oslili"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8da270bb3fef060675b0dc8f079524e667cbdde21b4d603b10aeeba59433d291",
"md5": "fea9e7e7373f4585e38cc938a42e8745",
"sha256": "d1d5899c755ac6f1c19dbc4044be2f3ca4cdc5c6bd10d390a784f4156ff232ef"
},
"downloads": -1,
"filename": "oslili-0.15-py3-none-any.whl",
"has_sig": false,
"md5_digest": "fea9e7e7373f4585e38cc938a42e8745",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 1635619,
"upload_time": "2024-07-03T05:59:16",
"upload_time_iso_8601": "2024-07-03T05:59:16.448855Z",
"url": "https://files.pythonhosted.org/packages/8d/a2/70bb3fef060675b0dc8f079524e667cbdde21b4d603b10aeeba59433d291/oslili-0.15-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2eb60afdd6f9881359937116b00475859113407e957346daa5281ecdc38a0742",
"md5": "661c587d7897e6de1b7f4373f124a5f7",
"sha256": "a8165575bcc618da7dbc4353871c2b70264c4dd73cc055569608cf9a17134248"
},
"downloads": -1,
"filename": "oslili-0.15.tar.gz",
"has_sig": false,
"md5_digest": "661c587d7897e6de1b7f4373f124a5f7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 911577,
"upload_time": "2024-07-03T05:59:19",
"upload_time_iso_8601": "2024-07-03T05:59:19.019335Z",
"url": "https://files.pythonhosted.org/packages/2e/b6/0afdd6f9881359937116b00475859113407e957346daa5281ecdc38a0742/oslili-0.15.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-03 05:59:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "oscarvalenzuelab",
"github_project": "oslili",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "oslili"
}