cmflib

Name	cmflib
Version	0.0.94
home_page	None
Summary	Track metadata for AI pipeline
upload_time	2025-07-10 14:32:55
maintainer	None
docs_url	None
author	Hewlett Packard Enterprise
requires_python	<3.11,>=3.9
license	None
requirements	ml-metadata dvc pandas retrying pyarrow neo4j tabulate click minio paramiko scikit_learn scitokens cryptography ray readchar mypy pandas-stubs types-tabulate types-requests types-paramiko types-setuptools types-PyYAML types-protobuf

# Common Metadata Framework (CMF)

[![Deploy Docs](https://github.com/HewlettPackard/cmf/actions/workflows/deploy_docs_to_gh_pages.yaml/badge.svg)](https://github.com/HewlettPackard/cmf/actions)
[![PyPI version](https://badge.fury.io/py/cmflib.svg)](https://pypi.org/project/cmflib/)
[![Docs](https://img.shields.io/badge/docs-online-blue.svg)](https://hewlettpackard.github.io/cmf/)
[![License](https://img.shields.io/github/license/HewlettPackard/cmf)](./LICENSE)

**Common Metadata Framework (CMF)** is a metadata tracking and versioning system for ML pipelines. It tracks code, data, and pipeline metrics—offering Git-like metadata management across distributed environments.

---

## 🚀 Features

- ✅ Track artifacts (datasets, models, metrics) using content-based hashes  
- ✅ Automatically log code versions (Git) and data versions (DVC)  
- ✅ Push/pull metadata via CLI across distributed sites  
- ✅ REST API for direct server interaction  
- ✅ Implicit & explicit tracking of pipeline execution  
- ✅ Fine-grained or coarse-grained metric logging  
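
To illustrate what "content-based hashes" means in the first feature above, here is a minimal sketch that derives an identifier from a file's bytes rather than from its name or location. It is not CMF's or DVC's implementation; the function name `content_hash` and the choice of MD5 as the default are assumptions made for illustration.

```python
import hashlib


def content_hash(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file's contents (hypothetical helper).

    The file is read in fixed-size chunks so that large artifacts
    do not have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes get the same identifier regardless of where they are stored, which is what lets an artifact be recognized as the same version at different sites.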

---

## 📦 Installation

### Requirements

- Linux (Ubuntu or Debian)
- Python >=3.9, <3.11
- Git (latest)
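
The Python bounds listed above can be checked before installing. The following is a small sketch; the helper name `is_supported` is hypothetical and not part of cmflib.

```python
import sys


def is_supported(version_info=None):
    """Return True if the interpreter satisfies >=3.9, <3.11 (hypothetical helper)."""
    # Compare only (major, minor); default to the running interpreter.
    major_minor = tuple((version_info or sys.version_info)[:2])
    return (3, 9) <= major_minor < (3, 11)


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```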

### Virtual Environment

<details><summary>Conda</summary>

```bash
conda create -n cmf python=3.10
conda activate cmf
```
</details>

<details><summary>Virtualenv</summary>

```bash
virtualenv --python=3.10 .cmf
source .cmf/bin/activate
```
</details>

### Install CMF

<details><summary>Latest from GitHub</summary>

```bash
pip install git+https://github.com/HewlettPackard/cmf
```
</details>

<details><summary>Stable from PyPI</summary>

```bash
pip install cmflib
```
</details>

### Server Setup

📖 Follow the guide in <a href="docs/cmf_server/cmf-server.md" target="_blank">docs/cmf_server/cmf-server.md</a>

---

## 📘 Documentation

- [Getting Started](https://hewlettpackard.github.io/cmf/)
- [API Reference](https://hewlettpackard.github.io/cmf/api/public/cmf)
- [Command Reference](https://hewlettpackard.github.io/cmf/cmf_client/cmf_client)
- [Related Docs](https://deepwiki.com/HewlettPackard/cmf)

---

## 🧠 How It Works

CMF tracks pipeline stages, inputs/outputs, metrics, and code. It supports decentralized execution across datacenters, edge, and cloud.

- Artifacts are versioned using DVC (`.dvc` files).
- Code is tracked with Git.
- Metadata is logged to a relational database (e.g., SQLite, PostgreSQL).
- Sync metadata with `cmf metadata push` and `cmf metadata pull`.
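
Since metadata lands in a relational database, a local store can be inspected with ordinary SQL tooling. The sketch below assumes the store is a SQLite file (for example, the `mlmd` file used in Sample Usage); the helper name `list_tables` is hypothetical, and the table names you see depend on the underlying ML Metadata schema, which is not described here.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only (hypothetical helper)."""
    # mode=ro avoids creating an empty database if the path does not exist.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Opening the file read-only keeps the inspection from modifying a store that a pipeline may still be writing to.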

---

## 🏛 Architecture

CMF is composed of:

- **cmflib** – metadata library that provides an API to log and query metadata
- **cmf-client** – CLI to sync metadata with the server, push/pull artifacts to the user-specified repo, and push/pull code from Git
- **cmf-server** – REST API for metadata merge
- **Central Repositories** – Git (code), DVC (artifacts), CMF (metadata)

<p align="center">
  <img src="docs/assets/framework.png" height="350" />
</p>

<p align="center">
  <img src="docs/assets/distributed_architecture.png" height="300" />
</p>
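
As a purely conceptual illustration of what a "metadata merge" between sites could look like, the sketch below combines two sets of records keyed by content hash. The record layout and the function `merge_metadata` are hypothetical; they are not cmf-server's data model or merge logic.

```python
def merge_metadata(server, incoming):
    """Merge two {content_hash: record} maps without mutating either input.

    Each record is assumed to look like {"name": str, "executions": set};
    this layout is invented for illustration only.
    """
    merged = {
        h: {"name": r["name"], "executions": set(r["executions"])}
        for h, r in server.items()
    }
    for h, r in incoming.items():
        if h in merged:
            # Same content seen at both sites: union the execution links.
            merged[h]["executions"] |= set(r["executions"])
        else:
            merged[h] = {"name": r["name"], "executions": set(r["executions"])}
    return merged


site_a = {"abc123": {"name": "data.xml.gz", "executions": {"prepare-1"}}}
site_b = {
    "abc123": {"name": "data.xml.gz", "executions": {"prepare-2"}},
    "def456": {"name": "model.pkl", "executions": {"train-1"}},
}
combined = merge_metadata(site_a, site_b)
```

Keying on content hashes makes the merge idempotent in this toy model: pushing the same records twice does not create duplicates.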

---

## 🔧 Sample Usage

```python

from cmflib.cmf import Cmf
from ml_metadata.proto import metadata_store_pb2 as mlpb

# Example values for the custom properties below; replace with your own.
split = 0.2
seed = 42

# Create (or open) the metadata store "mlmd" for this pipeline.
cmf = Cmf(filepath="mlmd", pipeline_name="test_pipeline")

# Register the pipeline stage.
context: mlpb.Context = cmf.create_context(
    pipeline_stage="prepare",
    custom_properties={"user-metadata1": "metadata_value"},
)

# Register one execution (run) of that stage.
execution: mlpb.Execution = cmf.create_execution(
    execution_type="Prepare",
    custom_properties={"split": split, "seed": seed},
)

# Log a dataset consumed as an input by this execution.
artifact: mlpb.Artifact = cmf.log_dataset(
    "artifacts/data.xml.gz", "input",
    custom_properties={"user-metadata1": "metadata_value"},
)
```

```bash
cmf                          # CLI to manage metadata and artifacts
cmf init                     # Initialize artifact repository
cmf init show                # Show current CMF config
cmf metadata push            # Push metadata to server
cmf metadata pull            # Pull metadata from server
```
	
➡️ For the complete list of commands, please refer to the <a href="https://hewlettpackard.github.io/cmf/cmf_client/cmf_client">Command Reference</a>


---

## ✅ Benefits

- Full ML pipeline observability
- Unified metadata, artifact, and code tracking
- Scalable metadata syncing
- Team collaboration on metadata

---

## 🎤 Talks & Publications

- 🎙 [Monterey Data Conference 2022](https://drive.google.com/file/d/1Oqs0AN0RsAjt_y9ZjzYOmBxI8H0yqSpB/view)

---

## 🌐 Related Projects

- [📚 Common Metadata Ontology](https://hewlettpackard.github.io/cmf/common-metadata-ontology/readme/)
- [🧠 AI Metadata Knowledge Graph (AIMKG)](https://github.com/HewlettPackard/ai-metadata-knowledge-graph)

---

## 🤝 Community

- 💬 [Join CMF on Slack](https://commonmetadata.slack.com/)
- 📧 Contact: **annmary.roy@hpe.com**

---

## 📄 License

Licensed under the [Apache 2.0 License](./LICENSE)

---

> © Hewlett Packard Enterprise. Built for reproducibility in ML.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "cmflib",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.11,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Hewlett Packard Enterprise",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/2e/d6/44b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311/cmflib-0.0.94.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Track metadata for AI pipeline",
    "version": "0.0.94",
    "project_urls": {
        "BugTracker": "https://github.com/HewlettPackard/cmf/issues",
        "Homepage": "https://github.com/HewlettPackard/cmf"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "58b410d0ca4de0f66d2ed917b19b062d34c61d8406541521a356c327c6439243",
                "md5": "ba5d6c8ef18959e82bbb8f941c78f602",
                "sha256": "062844fbd0ed14fa1339282259204637966f41ea710c639ba3a983f2cbaa8f4f"
            },
            "downloads": -1,
            "filename": "cmflib-0.0.94-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ba5d6c8ef18959e82bbb8f941c78f602",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.11,>=3.9",
            "size": 170289,
            "upload_time": "2025-07-10T14:32:53",
            "upload_time_iso_8601": "2025-07-10T14:32:53.893179Z",
            "url": "https://files.pythonhosted.org/packages/58/b4/10d0ca4de0f66d2ed917b19b062d34c61d8406541521a356c327c6439243/cmflib-0.0.94-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2ed644b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311",
                "md5": "ae49699b294e20e73848bafc7ee65d05",
                "sha256": "c43418ae1a27c05ef758f8826ce72c1e7ff07af2c0b162efb7b1b6c24752a80f"
            },
            "downloads": -1,
            "filename": "cmflib-0.0.94.tar.gz",
            "has_sig": false,
            "md5_digest": "ae49699b294e20e73848bafc7ee65d05",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.11,>=3.9",
            "size": 121683,
            "upload_time": "2025-07-10T14:32:55",
            "upload_time_iso_8601": "2025-07-10T14:32:55.191207Z",
            "url": "https://files.pythonhosted.org/packages/2e/d6/44b5247f06bc047ce92427ef4602a76279d4de9c0b4c105eec2a3def4311/cmflib-0.0.94.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-10 14:32:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "HewlettPackard",
    "github_project": "cmf",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "ml-metadata",
            "specs": [
                [
                    "==",
                    "1.15.0"
                ]
            ]
        },
        {
            "name": "dvc",
            "specs": [
                [
                    "==",
                    "3.51.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "retrying",
            "specs": []
        },
        {
            "name": "pyarrow",
            "specs": []
        },
        {
            "name": "neo4j",
            "specs": [
                [
                    "==",
                    "5.26"
                ]
            ]
        },
        {
            "name": "tabulate",
            "specs": []
        },
        {
            "name": "click",
            "specs": []
        },
        {
            "name": "minio",
            "specs": []
        },
        {
            "name": "paramiko",
            "specs": [
                [
                    "==",
                    "3.4.1"
                ]
            ]
        },
        {
            "name": "scikit_learn",
            "specs": []
        },
        {
            "name": "scitokens",
            "specs": []
        },
        {
            "name": "cryptography",
            "specs": []
        },
        {
            "name": "ray",
            "specs": [
                [
                    "==",
                    "2.34.0"
                ]
            ]
        },
        {
            "name": "readchar",
            "specs": []
        },
        {
            "name": "mypy",
            "specs": []
        },
        {
            "name": "pandas-stubs",
            "specs": []
        },
        {
            "name": "types-tabulate",
            "specs": []
        },
        {
            "name": "types-requests",
            "specs": []
        },
        {
            "name": "types-paramiko",
            "specs": []
        },
        {
            "name": "types-setuptools",
            "specs": []
        },
        {
            "name": "types-PyYAML",
            "specs": []
        },
        {
            "name": "types-protobuf",
            "specs": []
        }
    ],
    "lcname": "cmflib"
}
