> <span style="color:red">**⚠️ End-of-life notice.**</span>
> <span style="color:red">Effective immediately, HP has ended development, maintenance, and support of this project. The repo will be available in read-only mode until September 1st, 2024, when it will be deleted. You are welcome to create a copy and keep the project going.</span>
# ML-Git
ML-Git is a tool that provides a distributed version control system for efficient dataset management. As its name suggests, it is inspired by git concepts and workflows. ML-Git enables the following operations:
- Manage a repository of different datasets, labels and models.
- Distribute these ML artifacts between members of a team or across organizations.
- Apply the right data governance and security models to their artifacts.
To learn more about ML-Git, see the [ML-Git page](https://hpinc.github.io/ml-git/).
### How to install
**Prerequisites:**
- [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
- [Python 3.6.1+](https://www.python.org/downloads/release/python-361/)
- [Pip 20.1.1+](https://pypi.org/project/pip/)
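To check the prerequisites above, you can compare the installed versions against the listed minimums. The following is a minimal POSIX shell sketch; the function name `version_ge` is hypothetical and not part of ML-Git, and it assumes plain dotted numeric versions such as `3.6.1`.

```shell
# Hypothetical helper (not part of ML-Git): succeeds when the dotted numeric
# version in $1 is greater than or equal to the version in $2.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading component of each version; missing parts count as 0.
        x=${a%%.*}
        y=${b%%.*}
        x=${x:-0}
        y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the component that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}
```

For example, `version_ge "$(pip --version | awk '{print $2}')" 20.1.1 || echo "pip is too old"` warns when the installed pip is older than the minimum listed above.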
**From repository:**
```
pip install ml-git
```
**From source code:**
Download ML-Git from the repository and execute the commands below:
```
cd ml-git/
pip install .
```
### How to uninstall
```
pip uninstall ml-git
```
### How to configure
1 - ML-Git uses git to manage the metadata of ML entities, so you need to configure your git user name and email address:
```
git config --global user.name "Your User"
git config --global user.email "your_email@example.com"
```
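To confirm that both settings are in place before running ML-Git, a small pre-flight check can help. This is a sketch only; the function name `missing_git_identity` is hypothetical and not part of ML-Git. It relies on `git config --get` exiting with a non-zero status when a key is not set.

```shell
# Hypothetical pre-flight check (not part of ML-Git): prints each of the two
# git identity keys that is still missing from the global configuration.
missing_git_identity() {
    for key in user.name user.email; do
        # `git config --get` exits non-zero when the key is not set.
        git config --global --get "$key" >/dev/null 2>&1 || echo "$key"
    done
}
```

Running `missing_git_identity` prints nothing when both keys are configured.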
2 - **OPTIONAL CONFIGURATIONS**
- 2.1 - Some ML-Git commands offer a wizard that guides you through their execution; pass the ```--wizard``` option to enable it. To enable the wizard by default on all supported commands, run:
```
ml-git repository config --set-wizard=enabled
```
- 2.2 - You can also allow commands and options to be autocompleted with a `[Tab]` key press. For that, take a look at the following link [ML-Git Shell Completion Support](docs/shell_completion_guide.md).
3 - Storage:
ML-Git needs a configured storage to store data from managed artifacts. Please take a look at the [ML-Git architecture and internals documentation](docs/mlgit_internals.md) to better understand how ML-Git works internally with data.
- To configure the storage [see documentation about supported storages and how to configure each one.](docs/storage_configurations.md)
4 - ML-Git project:
- An ML-Git project is an initialized directory that will contain a configuration file to be used by ML-Git in managing entities.
To configure it, follow the basic steps described in the *[first project documentation.](docs/first_project.md)*
### Usage
```
ml-git --help
Usage: ml-git [OPTIONS] COMMAND [ARGS]...
Options:
--version Show the version and exit.
Commands:
clone Clone an ml-git repository ML_GIT_REPOSITORY_URL
datasets Management of datasets within this ml-git repository.
labels Management of labels sets within this ml-git repository.
models Management of models within this ml-git repository.
repository Management of this ml-git repository.
```
### Basic commands
<details markdown="1">
<summary><code>ml-git clone <repository-url></code></summary>
<br>
```
ml-git clone https://github.com/user/ml_git_configuration_file_example.git
```
If you prefer to create a new directory to clone into:
```
ml-git clone https://github.com/user/ml_git_configuration_file_example.git my-project-dir
```
If you prefer to keep git tracking files in the project:
```
ml-git clone https://github.com/user/ml_git_configuration_file_example.git --track
```
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> create</code></summary>
This command helps you start a new project by creating your project artifact metadata:
```
ml-git datasets create --categories="computer-vision, images" --bucket-name=your_bucket --import=../import-path --mutability=strict dataset-ex
```
Demonstration video:
[![asciicast](https://asciinema.org/a/435917.svg)](https://asciinema.org/a/435917)
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> status</code></summary>
Show changes in project workspace:
```
ml-git datasets status dataset-ex
```
Demonstration video:
[![asciicast](https://asciinema.org/a/385780.svg)](https://asciinema.org/a/385780)
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> add</code></summary>
Add new files to index:
```
ml-git datasets add dataset-ex
```
To increment version:
```
ml-git datasets add dataset-ex --bumpversion
```
Add a specific file:
```
ml-git datasets add dataset-ex data/file_name.ex
```
Demonstration video:
[![asciicast](https://asciinema.org/a/385781.svg)](https://asciinema.org/a/385781)
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> commit</code></summary>
Consolidate added files in the index to repository:
```
ml-git datasets commit dataset-ex
```
Demonstration video:
[![asciicast](https://asciinema.org/a/385782.svg)](https://asciinema.org/a/385782)
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> push</code></summary>
Upload metadata to remote repository and send [chunks](docs/mlgit_internals.md) to storage:
```
ml-git datasets push dataset-ex
```
Demonstration video:
[![asciicast](https://asciinema.org/a/385783.svg)](https://asciinema.org/a/385783)
</details>
<details markdown="1">
<summary><code>ml-git <ml-entity> checkout</code></summary>
Change workspace and metadata to versioned ml-entity tag:
```
ml-git datasets checkout computer-vision__images__dataset-ex__1
```
Demonstration video:
[![asciicast](https://asciinema.org/a/385784.svg)](https://asciinema.org/a/385784)
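
The tag in the example above appears to combine the categories, the entity name, and the version, joined by double underscores. Assuming that layout (it is inferred from the examples in this README, not taken from a specification), a tag can be split with plain shell parameter expansion. The function name `parse_tag` is hypothetical and not part of ML-Git.

```shell
# Hypothetical helper (not part of ML-Git): splits a tag, assuming the layout
# <category>__...__<entity-name>__<version> suggested by the example above.
parse_tag() {
    tag=$1
    version=${tag##*__}   # text after the last "__"
    rest=${tag%__*}       # everything before the version
    name=${rest##*__}     # the last remaining field is the entity name
    # Whatever precedes the name is treated as the categories (may be empty).
    case $rest in
        *__*) categories=${rest%__*} ;;
        *) categories= ;;
    esac
    echo "categories=$categories name=$name version=$version"
}
```

For the tag above, `parse_tag computer-vision__images__dataset-ex__1` prints `categories=computer-vision__images name=dataset-ex version=1`.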
</details>
[More about commands in documentation](docs/mlgit_commands.md)
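
The `add`, `commit` and `push` commands shown above are typically run one after another. The sketch below chains them for a dataset and stops at the first failure. The function name `publish_dataset` and the `DRY_RUN` variable are hypothetical conveniences, not part of ML-Git; only the three `ml-git datasets` commands come from this README.

```shell
# Hypothetical wrapper (not part of ML-Git): runs add, commit and push for a
# dataset in order and stops at the first failing step.
# With DRY_RUN=1 the commands are printed instead of executed.
publish_dataset() {
    name=$1
    for step in add commit push; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ml-git datasets $step $name"
        else
            ml-git datasets "$step" "$name" || return 1
        fi
    done
}
```

Setting `DRY_RUN=1` before calling `publish_dataset dataset-ex` prints the three commands without touching the repository, which is a convenient way to review them first.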
### How to contribute
Your contributions are always welcome!
1. Fork the repository into your own GitHub
2. Clone the repository to your local machine
3. Create a new branch for your changes using the following pattern `(feature | bugfix | hotfix)/branch_name`. Example: `feature/sftp_storage_implementation`
4. Make changes and [test](docs/developer_info.md)
5. Push the changes to your repository
6. Create a Pull Request from your forked repository to the ML-Git repository with comprehensive description of changes
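
The branch naming pattern from step 3 can be checked locally before pushing. This is a minimal sketch; the function name `valid_branch_name` is hypothetical and not part of the ML-Git tooling.

```shell
# Hypothetical check (not part of the ML-Git tooling): succeeds when the
# branch name starts with feature/, bugfix/ or hotfix/ followed by a
# non-empty name.
valid_branch_name() {
    case $1 in
        feature/?*|bugfix/?*|hotfix/?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `valid_branch_name "$(git rev-parse --abbrev-ref HEAD)" || echo "rename the branch"` checks the branch that is currently checked out.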
Another way to contribute to the community is to create an issue to track your ideas, questions, enhancements, tasks, or bugs found.
If an issue on the same topic already exists, join the discussion there.
### Links
- [ML-Git API documentation](docs/api/README.md) - Find the commands that are available in our api, usage examples and more.
- [Working with tabular data](docs/tabular_data/tabular_data.md) - Find suggestions on how to use ml-git with tabular data.
- [ml-git data specialization plugins](docs/plugins.md) - Dynamically link third-party packages to add specialized behaviors for the data type.
Raw data
{
"_id": null,
"home_page": "https://github.com/HPInc/ml-git",
"name": "ml-git",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "version control,cloud storage,machine learning,datasets,labels,models",
"author": "Sebastien Tandel",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/f8/34/a67495abfa670ed66f5987163358ac6a0d1aca916e84153fab444868a9b5/ml_git-2.9.9.tar.gz",
"platform": "Any",
"bugtrack_url": null,
"license": "GNU General Public License v2.0",
"summary": "ML-Git: version control for ML artefacts",
"version": "2.9.9",
"project_urls": {
"Bug Tracker": "https://github.com/HPInc/ml-git/issues",
"Homepage": "https://github.com/HPInc/ml-git"
},
"split_keywords": [
"version control",
"cloud storage",
"machine learning",
"datasets",
"labels",
"models"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5c568596f8e75e28abbcaa23bec6db7213761edd1ad135dabb7cbb353d3527c0",
"md5": "8416f4c3848133be765edb27d259db43",
"sha256": "4686155f567f8f97667023a5c7a4e1e689bba31970142a5de53270a7a3d09cdc"
},
"downloads": -1,
"filename": "ml_git-2.9.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8416f4c3848133be765edb27d259db43",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 254694,
"upload_time": "2023-10-03T20:56:19",
"upload_time_iso_8601": "2023-10-03T20:56:19.768629Z",
"url": "https://files.pythonhosted.org/packages/5c/56/8596f8e75e28abbcaa23bec6db7213761edd1ad135dabb7cbb353d3527c0/ml_git-2.9.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f834a67495abfa670ed66f5987163358ac6a0d1aca916e84153fab444868a9b5",
"md5": "af147d96df889126614406f21c149219",
"sha256": "d6b34512eaabbca3b12b77b69824e4b9cd32a9c030095914445e9dabe7a25810"
},
"downloads": -1,
"filename": "ml_git-2.9.9.tar.gz",
"has_sig": false,
"md5_digest": "af147d96df889126614406f21c149219",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 219958,
"upload_time": "2023-10-03T20:56:21",
"upload_time_iso_8601": "2023-10-03T20:56:21.826166Z",
"url": "https://files.pythonhosted.org/packages/f8/34/a67495abfa670ed66f5987163358ac6a0d1aca916e84153fab444868a9b5/ml_git-2.9.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-10-03 20:56:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "HPInc",
"github_project": "ml-git",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "ml-git"
}