| Field | Value |
|---|---|
| Name | costa-utils |
| Version | 0.1.1 |
| Author | Costa Huang |
| Maintainer | None |
| Summary | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Requires Python | `<4.0,>=3.9` |
| Upload time | 2024-07-31 17:38:36 |
| Requirements | No requirements were recorded. |
# costa-utils
This repo contains personal utilities for quick tasks. It currently includes tools for visualizing Hugging Face preference and supervised fine-tuning (SFT) datasets.
# Get started
Visualizing a HF SFT dataset:
```bash
# visualizing https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture
python -m costa_utils.hf_viz \
--sft allenai/tulu-v2-sft-mixture \
--split train \
--sft_messages_column_name messages

# visualizing https://huggingface.co/datasets/AI-MO/NuminaMath-TIR
python -m costa_utils.hf_viz \
--sft AI-MO/NuminaMath-TIR \
--split train \
--sft_messages_column_name messages
```

The resulting view is a bit easier to read than the raw dataset.
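
Because the same invocation is repeated for each dataset, a small shell function can loop over several of them. This is a hypothetical sketch, not part of costa-utils: `viz_sft`, `run_or_echo`, and the `DRY_RUN`, `SPLIT`, and `MESSAGES_COLUMN` variables are assumed names. It only passes the flags shown above, assumes every dataset shares the same split and messages column, and prints the commands instead of running them when `DRY_RUN=1`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of costa-utils: it only wraps the documented CLI.

# Print the command when DRY_RUN=1; otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Visualize each SFT dataset given as an argument, one after another.
# SPLIT and MESSAGES_COLUMN (assumed names) override the defaults used above.
viz_sft() {
  for dataset in "$@"; do
    run_or_echo python -m costa_utils.hf_viz \
      --sft "$dataset" \
      --split "${SPLIT:-train}" \
      --sft_messages_column_name "${MESSAGES_COLUMN:-messages}"
  done
}
```

For example, `viz_sft allenai/tulu-v2-sft-mixture AI-MO/NuminaMath-TIR` runs the two commands above in order; prefix it with `DRY_RUN=1` to inspect them first.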

Visualizing a HF preference dataset:
```bash
# visualizing https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
python -m costa_utils.hf_viz \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
```

The resulting view of the chosen and rejected responses is a bit easier to read than the raw dataset.
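
The same pattern can cover several splits of one preference dataset. This is again a hypothetical sketch: `viz_preference_splits`, `run_or_echo`, and the `DRY_RUN`, `CHOSEN_COLUMN`, and `REJECTED_COLUMN` variables are assumed names, and `test_prefs` is assumed to be a split of `HuggingFaceH4/ultrafeedback_binarized` alongside `train_prefs`. Only the flags documented above are passed through.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of costa-utils: it only wraps the documented CLI.

# Print the command when DRY_RUN=1; otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: viz_preference_splits DATASET SPLIT [SPLIT ...]
# CHOSEN_COLUMN and REJECTED_COLUMN (assumed names) override the column defaults.
viz_preference_splits() {
  dataset=$1
  shift
  for split in "$@"; do
    run_or_echo python -m costa_utils.hf_viz \
      --preference "$dataset" \
      --split "$split" \
      --preference_chosen_column_name "${CHOSEN_COLUMN:-chosen}" \
      --preference_rejected_column_name "${REJECTED_COLUMN:-rejected}"
  done
}
```

For example, `DRY_RUN=1 viz_preference_splits HuggingFaceH4/ultrafeedback_binarized train_prefs test_prefs` prints one command per split; drop `DRY_RUN=1` to run them.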

## Dev note
Debugging is simple: replace `python -m costa_utils.hf_viz` with `python costa_utils/hf_viz.py` to run the script directly from the repository root. For example:
```bash
python costa_utils/hf_viz.py \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
```
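
The replacement maps the dotted module name `costa_utils.hf_viz` to the file path `costa_utils/hf_viz.py`, relative to the repository root. The sketch below (the helper names `module_to_script` and `debug_module` are hypothetical) applies that mapping and, as one possible workflow, launches the resulting script under Python's built-in `pdb` debugger. It assumes the module is a plain `.py` file rather than a package.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of costa-utils.

# Map a dotted module name to its script path, e.g. costa_utils.hf_viz -> costa_utils/hf_viz.py.
# Assumes the module is a single .py file (not a package with __main__.py).
module_to_script() {
  printf '%s.py\n' "$(printf '%s' "$1" | tr . /)"
}

# Run the module's script form under pdb, forwarding the remaining CLI arguments.
debug_module() {
  module=$1
  shift
  python -m pdb "$(module_to_script "$module")" "$@"
}
```

For example, `debug_module costa_utils.hf_viz --preference HuggingFaceH4/ultrafeedback_binarized --split train_prefs --preference_chosen_column_name chosen --preference_rejected_column_name rejected` starts the preference command paused at the first line of the script.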