Name | gt4sd-trainer-hf-pl
Version | 1.0.0
Summary | Transformers trainer submodule of GT4SD.
Author | GT4SD team
Upload time | 2023-12-12 09:37:40
Docs URL | None
Requirements | No requirements were recorded.
# GT4SD's trainer submodule for HF transformers and PyTorch Lightning
Train Language Models via HuggingFace transformers and PyTorch Lightning.
### Development setup & installation
Create any virtual or conda environment compatible with the specs in `setup.cfg`. Then run:
```sh
pip install -e ".[dev]"
```
### Perform training via the CLI command
GT4SD provides a trainer client based on the `gt4sd-trainer-lm` CLI command.
```console
$ gt4sd-trainer-lm --help
usage: gt4sd-trainer-lm [-h] [--configuration_file CONFIGURATION_FILE]
optional arguments:
  -h, --help            show this help message and exit
  --configuration_file CONFIGURATION_FILE
                        Configuration file for the training. It can be used
                        to completely bypass pipeline-specific arguments.
                        (default: None)
```
To launch a training, you have two options.
You can either specify the path of a configuration file that contains the needed training parameters:
```sh
gt4sd-trainer-lm --training_pipeline_name ${TRAINING_PIPELINE_NAME} --configuration_file ${CONFIGURATION_FILE}
```
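The schema of the configuration file is not documented here; assuming it simply mirrors the CLI arguments of the chosen training pipeline, a hypothetical file might look like the following (every key and value below is illustrative, not taken from the package):

```json
{
  "model_name_or_path": "bert-base-uncased",
  "training_file": "/path/to/train_file.jsonl",
  "validation_file": "/path/to/valid_file.jsonl",
  "batch_size": 8,
  "learning_rate": 5e-5
}
```

Check the pipeline's own argument list for the keys it actually accepts.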
Or you can provide the needed parameters directly as arguments:
```sh
gt4sd-trainer-lm --type mlm --model_name_or_path mlm --training_file /path/to/train_file.jsonl --validation_file /path/to/valid_file.jsonl
```
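The `--training_file` and `--validation_file` arguments point to JSONL files, i.e. one JSON object per line. The exact record schema depends on the pipeline; assuming a single `text` field per record (an assumption, not documented above), such a file can be produced like this:

```python
import json

# Hypothetical example sentences; the "text" key is an assumption --
# check the pipeline's expected schema before training on real data.
samples = [
    {"text": "CCO is the SMILES string for ethanol."},
    {"text": "Language models can be fine-tuned on domain text."},
]

with open("train_file.jsonl", "w", encoding="utf-8") as fp:
    for sample in samples:
        # One JSON object per line, newline-terminated.
        fp.write(json.dumps(sample) + "\n")

# Read the file back to verify it round-trips as JSONL.
with open("train_file.jsonl", encoding="utf-8") as fp:
    records = [json.loads(line) for line in fp]
```

The same script, pointed at your validation split, yields `valid_file.jsonl`.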
### Convert PyTorch Lightning checkpoints to HuggingFace model via the CLI command
Once a training pipeline has been run via `gt4sd-trainer-lm`, it's possible to convert the PyTorch Lightning checkpoint
to a HuggingFace model via `gt4sd-pl-to-hf`:
```sh
gt4sd-pl-to-hf --hf_model_path ${HF_MODEL_PATH} --training_type ${TRAINING_TYPE} --model_name_or_path ${MODEL_NAME_OR_PATH} --ckpt ${CKPT} --tokenizer_name_or_path ${TOKENIZER_NAME_OR_PATH}
```
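Conceptually, a Lightning checkpoint stores the weights in a `state_dict` whose keys carry the attribute name of the wrapped module as a prefix (commonly `model.`), while a HuggingFace model expects the unprefixed keys. The rough sketch below illustrates that key renaming with a toy checkpoint; the `model.` prefix and the checkpoint layout are assumptions for illustration, not the actual `gt4sd-pl-to-hf` implementation:

```python
# Toy checkpoint mimicking a Lightning layout: the wrapped HF module's
# weights live under an (assumed) "model." prefix. Plain lists stand in
# for the tensors a real checkpoint would hold.
checkpoint = {
    "state_dict": {
        "model.embeddings.weight": [[0.0, 0.0], [0.0, 0.0]],
        "model.encoder.bias": [0.0, 0.0],
    }
}

def strip_prefix(state_dict, prefix="model."):
    """Drop the Lightning wrapper prefix so keys match the HF module."""
    return {
        key[len(prefix):] if key.startswith(prefix) else key: value
        for key, value in state_dict.items()
    }

hf_state_dict = strip_prefix(checkpoint["state_dict"])
# hf_state_dict could now be loaded into the HF model via load_state_dict.
```

The real converter also has to write out the HF `config.json` and tokenizer files, which is why `gt4sd-pl-to-hf` asks for `--model_name_or_path` and `--tokenizer_name_or_path`.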
### References
If you use `gt4sd` in your projects, please consider citing the following:
```bib
@article{manica2022gt4sd,
title={GT4SD: Generative Toolkit for Scientific Discovery},
author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
journal={arXiv preprint arXiv:2207.03928},
year={2022}
}
```
### License
The `gt4sd` codebase is under the MIT license.
For individual model usage, please refer to the model licenses found in the original packages.
### Raw data
{
"_id": null,
"home_page": "",
"name": "gt4sd-trainer-hf-pl",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "",
"author": "GT4SD team",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/86/f7/bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c/gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"platform": null,
"description": "# GT4SD's trainer submodule for HF transformers and PyTorch Lightning\n\nTrain Language Models via HuggingFace transformers and PyTorch Lightning.\n\n\n### Development setup & installation\n\nCreate any virtual or conda environment compatible with the specs in setup.cfg. Then run:\n```sh\npip install -e \".[dev]\" \n```\n\n\n\n### Perform training via the CLI command\n\nGT4SD provides a trainer client based on the `gt4sd-lm-trainer` CLI command. \n```console\n$ gt4sd-trainer-lm --help\nusage: gt4sd-trainer-lm [-h] [--configuration_file CONFIGURATION_FILE]\n\noptional arguments:\n -h, --help show this help message and exit\n --configuration_file CONFIGURATION_FILE\n Configuration file for the trainining. It can be used\n to completely by-pass pipeline specific arguments.\n (default: None)\n```\n\nTo launch a training you have two options.\n\nYou can either specify the path of a configuration file that contains the needed training parameters:\n\n```sh\ngt4sd-trainer-lm --training_pipeline_name ${TRAINING_PIPELINE_NAME} --configuration_file ${CONFIGURATION_FILE}\n```\n\nOr you can provide directly the needed parameters as arguments:\n\n```sh\ngt4sd-trainer-lm --type mlm --model_name_or_path mlm --training_file /path/to/train_file.jsonl --validation_file /path/to/valid_file.jsonl\n```\n\n\n### Convert PyTorch Lightning checkpoints to HuggingFace model via the CLI command\n\nOnce a training pipeline has been run via the `gt4sd-lm-trainer`, it's possible to convert the PyTorch Lightning checkpoint\n to HugginFace model via `gt4sd-pl-to-hf`:\n\n```sh\ngt4sd-pl-to-hf --hf_model_path ${HF_MODEL_PATH} --training_type ${TRAINING_TYPE} --model_name_or_path ${MODEL_NAME_OR_PATH} --ckpt {CKPT} --tokenizer_name_or_path {TOKENIZER_NAME_OR_PATH}\n```\n\n\n\n### References\n\nIf you use `gt4sd` in your projects, please consider citing the following:\n\n```bib\n@article{manica2022gt4sd,\n title={GT4SD: Generative Toolkit for Scientific Discovery},\n author={Manica, 
Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},\n journal={arXiv preprint arXiv:2207.03928},\n year={2022}\n}\n```\n\n### License\n\nThe `gt4sd` codebase is under MIT license.\nFor individual model usage, please refer to the model licenses found in the original packages.\n",
"bugtrack_url": null,
"license": "",
"summary": "Transformers trainer submodule of GT4SD.",
"version": "1.0.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9c76460270a33c451846f0542e99b5dd06fae6af3a07b3a4771c9246196abd40",
"md5": "11d190cc1975d4eadfbc0f964b16481a",
"sha256": "95fe13e951eb4e3009262b041437bbb92fbe9c5f34d99c1c95367fb1e1a6cf16"
},
"downloads": -1,
"filename": "gt4sd_trainer_hf_pl-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "11d190cc1975d4eadfbc0f964b16481a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 27547,
"upload_time": "2023-12-12T09:37:38",
"upload_time_iso_8601": "2023-12-12T09:37:38.589948Z",
"url": "https://files.pythonhosted.org/packages/9c/76/460270a33c451846f0542e99b5dd06fae6af3a07b3a4771c9246196abd40/gt4sd_trainer_hf_pl-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "86f7bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c",
"md5": "4d69e2378bb68b09c556291887f66416",
"sha256": "3fc457e7296e3824ca8e3dd1be857cde8a0f51e890d13170774f2483b230e21a"
},
"downloads": -1,
"filename": "gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "4d69e2378bb68b09c556291887f66416",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 17496,
"upload_time": "2023-12-12T09:37:40",
"upload_time_iso_8601": "2023-12-12T09:37:40.157956Z",
"url": "https://files.pythonhosted.org/packages/86/f7/bbf031516c47b5709ae40343bcab0c532044315cfed593ec67f868e21d2c/gt4sd-trainer-hf-pl-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-12-12 09:37:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "gt4sd-trainer-hf-pl"
}