Name | gt4sd-trainer-hf-pl
Version | 1.0.0
Summary | Transformers trainer submodule of GT4SD.
Author | GT4SD team
Upload time | 2023-12-12 09:37:40
Docs URL | None
Requirements | No requirements were recorded.
# GT4SD's trainer submodule for HF transformers and PyTorch Lightning
Train Language Models via HuggingFace transformers and PyTorch Lightning.
### Development setup & installation
Create any virtual or conda environment compatible with the specs in `setup.cfg`. Then run:
```sh
pip install -e ".[dev]"
```
### Perform training via the CLI command
GT4SD provides a trainer client based on the `gt4sd-trainer-lm` CLI command.
```console
$ gt4sd-trainer-lm --help
usage: gt4sd-trainer-lm [-h] [--configuration_file CONFIGURATION_FILE]
optional arguments:
  -h, --help            show this help message and exit
  --configuration_file CONFIGURATION_FILE
                        Configuration file for the training. It can be used
                        to completely bypass pipeline-specific arguments.
                        (default: None)
```
To launch a training, you have two options.
You can either specify the path of a configuration file that contains the needed training parameters:
```sh
gt4sd-trainer-lm --training_pipeline_name ${TRAINING_PIPELINE_NAME} --configuration_file ${CONFIGURATION_FILE}
```
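The schema of the configuration file is not documented here; assuming it simply mirrors the CLI arguments of the chosen training pipeline, a hypothetical file might look like the following (every key and value below is illustrative, not taken from the package):

```json
{
  "model_name_or_path": "bert-base-uncased",
  "training_file": "/path/to/train_file.jsonl",
  "validation_file": "/path/to/valid_file.jsonl",
  "batch_size": 8,
  "learning_rate": 5e-5
}
```

Check the pipeline's own argument list for the keys it actually accepts.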
Or you can provide the needed parameters directly as arguments:
```sh
gt4sd-trainer-lm --type mlm --model_name_or_path mlm --training_file /path/to/train_file.jsonl --validation_file /path/to/valid_file.jsonl
```
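The `--training_file` and `--validation_file` arguments point to JSONL files, i.e. one JSON object per line. The exact record schema depends on the pipeline; assuming a single `text` field per record (an assumption, not documented above), such a file can be produced like this:

```python
import json

# Hypothetical example sentences; the "text" key is an assumption --
# check the pipeline's expected schema before training on real data.
samples = [
    {"text": "CCO is the SMILES string for ethanol."},
    {"text": "Language models can be fine-tuned on domain text."},
]

with open("train_file.jsonl", "w", encoding="utf-8") as fp:
    for sample in samples:
        # One JSON object per line, newline-terminated.
        fp.write(json.dumps(sample) + "\n")

# Read the file back to verify it round-trips as JSONL.
with open("train_file.jsonl", encoding="utf-8") as fp:
    records = [json.loads(line) for line in fp]
```

The same script, pointed at your validation split, yields `valid_file.jsonl`.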
### Convert PyTorch Lightning checkpoints to HuggingFace model via the CLI command
Once a training pipeline has been run via `gt4sd-trainer-lm`, it's possible to convert the PyTorch Lightning checkpoint
to a HuggingFace model via `gt4sd-pl-to-hf`:
```sh
gt4sd-pl-to-hf --hf_model_path ${HF_MODEL_PATH} --training_type ${TRAINING_TYPE} --model_name_or_path ${MODEL_NAME_OR_PATH} --ckpt ${CKPT} --tokenizer_name_or_path ${TOKENIZER_NAME_OR_PATH}
```
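Conceptually, a Lightning checkpoint stores the weights in a `state_dict` whose keys carry the attribute name of the wrapped module as a prefix (commonly `model.`), while a HuggingFace model expects the unprefixed keys. The rough sketch below illustrates that key renaming with a toy checkpoint; the `model.` prefix and the checkpoint layout are assumptions for illustration, not the actual `gt4sd-pl-to-hf` implementation:

```python
# Toy checkpoint mimicking a Lightning layout: the wrapped HF module's
# weights live under an (assumed) "model." prefix. Plain lists stand in
# for the tensors a real checkpoint would hold.
checkpoint = {
    "state_dict": {
        "model.embeddings.weight": [[0.0, 0.0], [0.0, 0.0]],
        "model.encoder.bias": [0.0, 0.0],
    }
}

def strip_prefix(state_dict, prefix="model."):
    """Drop the Lightning wrapper prefix so keys match the HF module."""
    return {
        key[len(prefix):] if key.startswith(prefix) else key: value
        for key, value in state_dict.items()
    }

hf_state_dict = strip_prefix(checkpoint["state_dict"])
# hf_state_dict could now be loaded into the HF model via load_state_dict.
```

The real converter also has to write out the HF `config.json` and tokenizer files, which is why `gt4sd-pl-to-hf` asks for `--model_name_or_path` and `--tokenizer_name_or_path`.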
### References
If you use `gt4sd` in your projects, please consider citing the following:
```bib
@article{manica2022gt4sd,
title={GT4SD: Generative Toolkit for Scientific Discovery},
author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
journal={arXiv preprint arXiv:2207.03928},
year={2022}
}
```
### License
The `gt4sd` codebase is under the MIT license.
For individual model usage, please refer to the model licenses found in the original packages.
### Raw data
{
"_id": null,
"home_page": "",
"name": "gt4sd-trainer-hf-pl",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "GT4SD team",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/86/f7/bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c/gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"platform": null,
"description": "# GT4SD's trainer submodule for HF transformers and PyTorch Lightning\n\nTrain Language Models via HuggingFace transformers and PyTorch Lightning.\n\n\n### Development setup & installation\n\nCreate any virtual or conda environment compatible with the specs in setup.cfg. Then run:\n```sh\npip install -e \".[dev]\" \n```\n\n\n\n### Perform training via the CLI command\n\nGT4SD provides a trainer client based on the `gt4sd-lm-trainer` CLI command. \n```console\n$ gt4sd-trainer-lm --help\nusage: gt4sd-trainer-lm [-h] [--configuration_file CONFIGURATION_FILE]\n\noptional arguments:\n -h, --help show this help message and exit\n --configuration_file CONFIGURATION_FILE\n Configuration file for the trainining. It can be used\n to completely by-pass pipeline specific arguments.\n (default: None)\n```\n\nTo launch a training you have two options.\n\nYou can either specify the path of a configuration file that contains the needed training parameters:\n\n```sh\ngt4sd-trainer-lm --training_pipeline_name ${TRAINING_PIPELINE_NAME} --configuration_file ${CONFIGURATION_FILE}\n```\n\nOr you can provide directly the needed parameters as arguments:\n\n```sh\ngt4sd-trainer-lm --type mlm --model_name_or_path mlm --training_file /path/to/train_file.jsonl --validation_file /path/to/valid_file.jsonl\n```\n\n\n### Convert PyTorch Lightning checkpoints to HuggingFace model via the CLI command\n\nOnce a training pipeline has been run via the `gt4sd-lm-trainer`, it's possible to convert the PyTorch Lightning checkpoint\n to HugginFace model via `gt4sd-pl-to-hf`:\n\n```sh\ngt4sd-pl-to-hf --hf_model_path ${HF_MODEL_PATH} --training_type ${TRAINING_TYPE} --model_name_or_path ${MODEL_NAME_OR_PATH} --ckpt {CKPT} --tokenizer_name_or_path {TOKENIZER_NAME_OR_PATH}\n```\n\n\n\n### References\n\nIf you use `gt4sd` in your projects, please consider citing the following:\n\n```bib\n@article{manica2022gt4sd,\n title={GT4SD: Generative Toolkit for Scientific Discovery},\n author={Manica, 
Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},\n journal={arXiv preprint arXiv:2207.03928},\n year={2022}\n}\n```\n\n### License\n\nThe `gt4sd` codebase is under MIT license.\nFor individual model usage, please refer to the model licenses found in the original packages.\n",
"bugtrack_url": null,
"license": "",
"summary": "Transformers trainer submodule of GT4SD.",
"version": "1.0.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9c76460270a33c451846f0542e99b5dd06fae6af3a07b3a4771c9246196abd40",
"md5": "11d190cc1975d4eadfbc0f964b16481a",
"sha256": "95fe13e951eb4e3009262b041437bbb92fbe9c5f34d99c1c95367fb1e1a6cf16"
},
"downloads": -1,
"filename": "gt4sd_trainer_hf_pl-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "11d190cc1975d4eadfbc0f964b16481a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 27547,
"upload_time": "2023-12-12T09:37:38",
"upload_time_iso_8601": "2023-12-12T09:37:38.589948Z",
"url": "https://files.pythonhosted.org/packages/9c/76/460270a33c451846f0542e99b5dd06fae6af3a07b3a4771c9246196abd40/gt4sd_trainer_hf_pl-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "86f7bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c",
"md5": "4d69e2378bb68b09c556291887f66416",
"sha256": "3fc457e7296e3824ca8e3dd1be857cde8a0f51e890d13170774f2483b230e21a"
},
"downloads": -1,
"filename": "gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "4d69e2378bb68b09c556291887f66416",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 17496,
"upload_time": "2023-12-12T09:37:40",
"upload_time_iso_8601": "2023-12-12T09:37:40.157956Z",
"url": "https://files.pythonhosted.org/packages/86/f7/bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c/gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-12 09:37:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "gt4sd-trainer-hf-pl"
}