# Common Metadata Framework (CMF)
**Common Metadata Framework (CMF)** is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics, offering Git-like metadata management across distributed environments.

---
## 🚀 Features
- ✅ Track artifacts (datasets, models, metrics) using content-based hashes
- ✅ Automatically log code versions (Git) and data versions (DVC)
- ✅ Push/pull metadata via CLI across distributed sites
- ✅ REST API for direct server interaction
- ✅ Implicit & explicit tracking of pipeline execution
- ✅ Fine-grained or coarse-grained metric logging
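
Content-based hashing means an artifact is identified by a digest of its bytes rather than by its file name, so identical data gets the same identifier wherever it is produced. Below is a minimal sketch of the idea, assuming MD5 (the digest DVC uses for file outputs); the helper `content_hash` is hypothetical and is not part of the cmflib API.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's contents (hypothetical helper).

    The file is read in fixed-size chunks so large datasets do not have
    to fit in memory; the digest depends only on the bytes, not the name.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp, "data.csv")
        renamed = Path(tmp, "renamed_copy.csv")
        original.write_bytes(b"x,y\n1,2\n")
        renamed.write_bytes(b"x,y\n1,2\n")
        # Same bytes under different names -> same identifier.
        print(content_hash(str(original)) == content_hash(str(renamed)))  # True
```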
---
## 📦 Installation
### Requirements
- Linux/Ubuntu/Debian
- Python >=3.9, <3.11
- Git (latest)
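
The supported interpreter range (>=3.9, <3.11, matching the package's `requires_python` metadata) can be checked before creating an environment. A small sketch; the helper `is_supported_python` is hypothetical and not shipped with cmflib.

```python
import sys
from typing import Optional, Tuple

# Supported range from the requirements above: >=3.9 and <3.11.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported_python(version: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if (major, minor) falls inside the supported range.

    Defaults to the running interpreter when no version is given.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    return MIN_VERSION <= tuple(version) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = "%d.%d" % sys.version_info[:2]
    if is_supported_python():
        print(f"Python {current} is within the supported range")
    else:
        print(f"Python {current} is outside the supported range (>=3.9, <3.11)")
```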
### Virtual Environment
<details><summary>Conda</summary>

```bash
conda create -n cmf python=3.10
conda activate cmf
```
</details>

<details><summary>Virtualenv</summary>

```bash
virtualenv --python=3.10 .cmf
source .cmf/bin/activate
```
</details>

### Install CMF
<details><summary>Latest from GitHub</summary>

```bash
pip install git+https://github.com/HewlettPackard/cmf
```
</details>

<details><summary>Stable from PyPI</summary>

```bash
pip install cmflib
```
</details>

### Server Setup
📖 Follow the guide in <a href="docs/cmf_server/cmf-server.md" target="_blank">docs/cmf_server/cmf-server.md</a>

---
## 📘 Documentation
- [Getting Started](https://hewlettpackard.github.io/cmf/)
- [API Reference](https://hewlettpackard.github.io/cmf/api/public/cmf)
- [Command Reference](https://hewlettpackard.github.io/cmf/cmf_client/cmf_client)
- [Related Docs](https://deepwiki.com/HewlettPackard/cmf)
---
## 🧠 How It Works
CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across data centers, edge devices, and the cloud.

- Artifacts are versioned using DVC (`.dvc` files).
- Code is tracked with Git.
- Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
- Metadata is synced with `cmf metadata push` and `cmf metadata pull`.
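
Because artifacts are identified by content hash, metadata pushed from different sites can be merged without duplicating records: the same dataset produced at two sites maps to one entry. The toy sketch below illustrates only that idea. The record layout, the placeholder hash strings, and the `merge_artifacts` function are hypothetical and do not describe how cmf-server actually stores or merges metadata.

```python
from typing import Dict, Iterable


def merge_artifacts(*sites: Iterable[dict]) -> Dict[str, dict]:
    """Merge artifact records from several sites, keyed by content hash.

    Each record is a hypothetical dict with "hash", "name" and "site" keys.
    Records that share a hash collapse into one entry that remembers every
    name and site under which that content was seen.
    """
    merged: Dict[str, dict] = {}
    for records in sites:
        for rec in records:
            entry = merged.setdefault(rec["hash"], {"names": set(), "sites": set()})
            entry["names"].add(rec["name"])
            entry["sites"].add(rec["site"])
    return merged


# Placeholder hashes; real ones would be content digests.
edge = [{"hash": "aaa1", "name": "data.xml.gz", "site": "edge"}]
cloud = [
    {"hash": "aaa1", "name": "raw/data.xml.gz", "site": "cloud"},
    {"hash": "bbb2", "name": "model.pkl", "site": "cloud"},
]
merged = merge_artifacts(edge, cloud)
print(len(merged))                      # 2
print(sorted(merged["aaa1"]["sites"]))  # ['cloud', 'edge']
```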
---
## 🏛 Architecture
CMF is composed of:

- **cmflib** – metadata library that provides an API to log and query metadata
- **cmf-client** – CLI to sync metadata with the server, push/pull artifacts to and from a user-specified repository, and push/pull code via Git
- **cmf-server** – REST API for merging metadata
- **Central Repositories** – Git (code), DVC (artifacts), CMF (metadata)
<p align="center">
<img src="docs/assets/framework.png" height="350" />
</p>
<p align="center">
<img src="docs/assets/distributed_architecture.png" height="300" />
</p>
---
## 🔧 Sample Usage
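```python
from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example parameters recorded on the execution (illustrative values).
split = 0.20
seed = 20

# Metadata is written to the local SQLite-backed file "mlmd".
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Create (or reuse) the context for the "prepare" pipeline stage.
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"},
)

# Record one execution of the stage together with its parameters.
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed},
)

# Log the input dataset (assumes `cmf init` has been run and the file exists).
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"},
)
```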
```bash
cmf # CLI to manage metadata and artifacts
cmf init # Initialize artifact repository
cmf init show # Show current CMF config
cmf metadata push # Push metadata to server
cmf metadata pull # Pull metadata from server
```
➡️ For the complete list of commands, please refer to the <a href="https://hewlettpackard.github.io/cmf/cmf_client/cmf_client">Command Reference</a>.

---
## ✅ Benefits
- Full ML pipeline observability
- Unified metadata, artifact, and code tracking
- Scalable metadata syncing
- Team collaboration on metadata
---
## 🎤 Talks & Publications
- 🎙 [Monterey Data Conference 2022](https://drive.google.com/file/d/1Oqs0AN0RsAjt_y9ZjzYOmBxI8H0yqSpB/view)
---
## 🌐 Related Projects
- [📚 Common Metadata Ontology](https://hewlettpackard.github.io/cmf/common-metadata-ontology/readme/)
- [🧠 AI Metadata Knowledge Graph (AIMKG)](https://github.com/HewlettPackard/ai-metadata-knowledge-graph)
---
## 🤝 Community
- 💬 [Join CMF on Slack](https://commonmetadata.slack.com/)
- 📧 Contact: **annmary.roy@hpe.com**
---
## 📄 License
Licensed under the [Apache 2.0 License](./LICENSE).

---
> © Hewlett Packard Enterprise. Built for reproducibility in ML.

Raw data
{
"_id": null,
"home_page": null,
"name": "cmflib",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.11,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Hewlett Packard Enterprise",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/2e/d6/44b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311/cmflib-0.0.94.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Track metadata for AI pipeline",
"version": "0.0.94",
"project_urls": {
"BugTracker": "https://github.com/HewlettPackard/cmf/issues",
"Homepage": "https://github.com/HewlettPackard/cmf"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "58b410d0ca4de0f66d2ed917b19b062d34c61d8406541521a356c327c6439243",
"md5": "ba5d6c8ef18959e82bbb8f941c78f602",
"sha256": "062844fbd0ed14fa1339282259204637966f41ea710c639ba3a983f2cbaa8f4f"
},
"downloads": -1,
"filename": "cmflib-0.0.94-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ba5d6c8ef18959e82bbb8f941c78f602",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.11,>=3.9",
"size": 170289,
"upload_time": "2025-07-10T14:32:53",
"upload_time_iso_8601": "2025-07-10T14:32:53.893179Z",
"url": "https://files.pythonhosted.org/packages/58/b4/10d0ca4de0f66d2ed917b19b062d34c61d8406541521a356c327c6439243/cmflib-0.0.94-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2ed644b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311",
"md5": "ae49699b294e20e73848bafc7ee65d05",
"sha256": "c43418ae1a27c05ef758f8826ce72c1e7ff07af2c0b162efb7b1b6c24752a80f"
},
"downloads": -1,
"filename": "cmflib-0.0.94.tar.gz",
"has_sig": false,
"md5_digest": "ae49699b294e20e73848bafc7ee65d05",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.11,>=3.9",
"size": 121683,
"upload_time": "2025-07-10T14:32:55",
"upload_time_iso_8601": "2025-07-10T14:32:55.191207Z",
"url": "https://files.pythonhosted.org/packages/2e/d6/44b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311/cmflib-0.0.94.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 14:32:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "HewlettPackard",
"github_project": "cmf",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "ml-metadata",
"specs": [
[
"==",
"1.15.0"
]
]
},
{
"name": "dvc",
"specs": [
[
"==",
"3.51.1"
]
]
},
{
"name": "pandas",
"specs": []
},
{
"name": "retrying",
"specs": []
},
{
"name": "pyarrow",
"specs": []
},
{
"name": "neo4j",
"specs": [
[
"==",
"5.26"
]
]
},
{
"name": "tabulate",
"specs": []
},
{
"name": "click",
"specs": []
},
{
"name": "minio",
"specs": []
},
{
"name": "paramiko",
"specs": [
[
"==",
"3.4.1"
]
]
},
{
"name": "scikit_learn",
"specs": []
},
{
"name": "scitokens",
"specs": []
},
{
"name": "cryptography",
"specs": []
},
{
"name": "ray",
"specs": [
[
"==",
"2.34.0"
]
]
},
{
"name": "readchar",
"specs": []
},
{
"name": "mypy",
"specs": []
},
{
"name": "pandas-stubs",
"specs": []
},
{
"name": "types-tabulate",
"specs": []
},
{
"name": "types-requests",
"specs": []
},
{
"name": "types-paramiko",
"specs": []
},
{
"name": "types-setuptools",
"specs": []
},
{
"name": "types-PyYAML",
"specs": []
},
{
"name": "types-protobuf",
"specs": []
}
],
"lcname": "cmflib"
}