bdm-tool

Name: bdm-tool
Version: 0.2
Summary: Simple lightweight dataset versioning utility based purely on the file system and symbolic links
Upload time: 2025-10-12 22:08:30
Requires Python: >=3.9
Keywords: version-control, data-versioning, versioning, machine-learning, ai, data, developer-tools
Requirements: none recorded
# BDM Tool
__BDM__ (Big Dataset Management) Tool is a __simple__ lightweight dataset versioning utility based purely on the file system and symbolic links.

BDM Tool Features:
* __No full downloads required__: Switch to any dataset version without downloading the entire dataset to your local machine.
* __Independent of external VCS__: Does not rely on external version control systems like Git or Mercurial, and does not require integrating with one.
* __Easy dataset sharing__: Supports sharing datasets via remote file systems on a data server.
* __Fast version switching__: Switching between dataset versions does not require long synchronization processes.
* __Transparent version access__: Different dataset versions are accessed through simple and intuitive paths (e.g., dataset/v1.0/, dataset/v2.0/, etc.), making versioning fully transparent to configuration files, MLflow parameters, and other tooling.
* __Storage optimization__: Efficiently stores multiple dataset versions using symbolic links to avoid duplication.
* __Designed for large, complex datasets__: Well-suited for managing big datasets with intricate directory and subdirectory structures.
* __Python API for automation__: Provides a simple Python API to automatically create new dataset versions within MLOps pipelines, workflows, ETL jobs, and other automated processes.
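
For example, because each dataset version is just a plain path (and, as the walkthrough below shows, `bdm-tool` also maintains a `current` symlink to the latest version), a training run can select its data with nothing more than a path. The `train.py` script and its `--data` flag in this sketch are hypothetical:
```shell
# Purely illustrative: train.py and its --data flag are hypothetical.
# Versions are addressed by plain paths, so no bdm-specific glue is needed.
DATA_DIR=dataset/v1.0/        # pin an exact dataset version, or
DATA_DIR=dataset/current/     # follow the latest version via the "current" symlink
python train.py --data "$DATA_DIR"
```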

## General Principles
* Each version of a dataset is a path like `dataset/v1.0/`, `dataset/v2.0/`.
* A new dataset version is generated whenever modifications are made.
* Each dataset version is immutable and read-only.
* A new version includes only the files that have been added or modified, while unchanged files and directories are stored as symbolic links.
* Each version contains a readme.txt file with a summary of changes.
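
As a rough illustration of these principles, a new version directory is essentially the previous one reproduced through relative symbolic links, with only changed files taking up new space. The commands below are a conceptual sketch, not `bdm-tool`'s actual implementation, and the file names are made up:
```shell
# Conceptual sketch only (not bdm-tool internals).
mkdir -p dataset/v2.0/annotation
ln -s ../v1.0/data dataset/v2.0/data                              # unchanged subtree: symlink
ln -s ../../v1.0/annotation/part01 dataset/v2.0/annotation/part01 # unchanged directory: symlink
cp new_files/regions06.json dataset/v2.0/annotation/              # added file: real copy
ln -sfn ./v2.0 dataset/current                                    # repoint "current" to the new version
```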

## Installation
### Installation from PyPI (Recommended)
Use `pip` to install the tool with the following command:
```shell
pip install bdm-tool
```

### Installation from Sources
Use `pip` to install the tool directly from the source repository with the following command:
```shell
pip install git+https://github.com/aikho/bdm-tool.git
```

## Usage
### Start Versioning Dataset
Let's assume we have a dataset with the following structure:
```shell
tree testdata
testdata
├── annotation
│   ├── part01
│   │   ├── regions01.json
│   │   ├── regions02.json
│   │   ├── regions03.json
│   │   ├── regions04.json
│   │   └── regions05.json
│   ├── part02
│   │   ├── regions01.json
│   │   ├── regions02.json
│   │   ├── regions03.json
│   │   ├── regions04.json
│   │   └── regions05.json
│   └── part03
│       ├── regions01.json
│       ├── regions02.json
│       ├── regions03.json
│       ├── regions04.json
│       └── regions05.json
└── data
    ├── part01
    │   ├── image01.png
    │   ├── image02.png
    │   ├── image03.png
    │   ├── image04.png
    │   └── image05.png
    ├── part02
    │   ├── image01.png
    │   ├── image02.png
    │   ├── image03.png
    │   ├── image04.png
    │   └── image05.png
    └── part03
        ├── image01.png
        ├── image02.png
        ├── image03.png
        ├── image04.png
        └── image05.png

9 directories, 30 files
```
To put it under `bdm-tool` version control, use the `bdm init` command:
```shell
bdm init testdata
Version v0.1 of dataset has been created.
Files added: 3, updated: 0, removed: 0, symlinked: 0
```
The first version `v0.1` of the dataset has been created. Let’s take a look at the file structure: 
```shell
tree testdata
testdata
├── current -> ./v0.1
└── v0.1
    ├── annotation
    │   ├── part01
    │   │   ├── regions01.json
    │   │   ├── regions02.json
    │   │   ├── regions03.json
    │   │   ├── regions04.json
    │   │   └── regions05.json
    │   ├── part02
    │   │   ├── regions01.json
    │   │   ├── regions02.json
    │   │   ├── regions03.json
    │   │   ├── regions04.json
    │   │   └── regions05.json
    │   └── part03
    │       ├── regions01.json
    │       ├── regions02.json
    │       ├── regions03.json
    │       ├── regions04.json
    │       └── regions05.json
    ├── data
    │   ├── part01
    │   │   ├── image01.png
    │   │   ├── image02.png
    │   │   ├── image03.png
    │   │   ├── image04.png
    │   │   └── image05.png
    │   ├── part02
    │   │   ├── image01.png
    │   │   ├── image02.png
    │   │   ├── image03.png
    │   │   ├── image04.png
    │   │   └── image05.png
    │   └── part03
    │       ├── image01.png
    │       ├── image02.png
    │       ├── image03.png
    │       ├── image04.png
    │       └── image05.png
    └── readme.txt

11 directories, 31 files
```
We can see that version `v0.1` contains all the initial files along with a `readme.txt` file. Let’s take a look inside `readme.txt`:
```shell
cat testdata/v0.1/readme.txt 
Dataset version v0.1 has been created!
Created timestamp: 2023-08-07 19:40:19.498656, OS user: rock-star-ml-engineer
Files added: 2, updated: 0, removed: 0, symlinked: 0

Files added:
annotation/
data/
```
The file shows the creation date, operating system user, relevant statistics, and a summary of performed operations.
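
Because every version keeps its own `readme.txt`, the full change history can be reviewed with plain shell tools. For example, the loop below (not a built-in `bdm` subcommand) prints each version's summary in order:
```shell
# Print every version's readme.txt (versions sort lexically here).
for f in testdata/v*/readme.txt; do
    echo "==> $f"
    cat "$f"
done
```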

### Add New Files
Suppose we have additional data stored in the `new_data` directory:
```shell
tree new_data/
new_data/
├── annotation
│   ├── regions06.json
│   └── regions07.json
└── data
    ├── image06.png
    └── image07.png

2 directories, 4 files
```
New files can be added to a new dataset version using the `bdm change` command. Use the `--add` flag to add individual files, or `--add_all` to add all files from a specified directory:
```shell
bdm change --add_all new_data/annotation/:annotation/part03/ --add_all new_data/data/:data/part03/ -c -m "add new files" testdata
Version v0.2 of dataset has been created.
Files added: 4, updated: 0, removed: 0, symlinked: 14
```
The `:` character is used as a separator between the source path and the target subpath inside the dataset where the files should be added.

The `-c` flag stands for copy. When used, files are copied instead of moved. Moving files can be faster, so you may prefer moving for performance reasons.

The `-m` flag allows you to add a message, which is then stored in the `readme.txt` file of the new dataset version.

Let’s take a look inside the `readme.txt` file of the new version:
```shell
cat testdata/current/readme.txt 
Dataset version v0.2 has been created from previous version v0.1!
add new files
Created timestamp: 2023-08-07 19:38:39.758828, OS user: rock-star-ml-engineer
Files added: 4, updated: 0, removed: 0, symlinked: 14

Files added:
annotation/part03/regions06.json
annotation/part03/regions07.json
data/part03/image06.png
data/part03/image07.png
```
Next, let’s examine the updated file structure:
```shell
tree testdata
testdata
├── current -> ./v0.2
├── v0.1
│   ├── annotation
│   │   ├── part01
│   │   │   ├── regions01.json
│   │   │   ├── regions02.json
│   │   │   ├── regions03.json
│   │   │   ├── regions04.json
│   │   │   └── regions05.json
│   │   ├── part02
│   │   │   ├── regions01.json
│   │   │   ├── regions02.json
│   │   │   ├── regions03.json
│   │   │   ├── regions04.json
│   │   │   └── regions05.json
│   │   └── part03
│   │       ├── regions01.json
│   │       ├── regions02.json
│   │       ├── regions03.json
│   │       ├── regions04.json
│   │       └── regions05.json
│   ├── data
│   │   ├── part01
│   │   │   ├── image01.png
│   │   │   ├── image02.png
│   │   │   ├── image03.png
│   │   │   ├── image04.png
│   │   │   └── image05.png
│   │   ├── part02
│   │   │   ├── image01.png
│   │   │   ├── image02.png
│   │   │   ├── image03.png
│   │   │   ├── image04.png
│   │   │   └── image05.png
│   │   └── part03
│   │       ├── image01.png
│   │       ├── image02.png
│   │       ├── image03.png
│   │       ├── image04.png
│   │       └── image05.png
│   └── readme.txt
└── v0.2
    ├── annotation
    │   ├── part01 -> ../../v0.1/annotation/part01
    │   ├── part02 -> ../../v0.1/annotation/part02
    │   └── part03
    │       ├── regions01.json -> ../../../v0.1/annotation/part03/regions01.json
    │       ├── regions02.json -> ../../../v0.1/annotation/part03/regions02.json
    │       ├── regions03.json -> ../../../v0.1/annotation/part03/regions03.json
    │       ├── regions04.json -> ../../../v0.1/annotation/part03/regions04.json
    │       ├── regions05.json -> ../../../v0.1/annotation/part03/regions05.json
    │       ├── regions06.json
    │       └── regions07.json
    ├── data
    │   ├── part01 -> ../../v0.1/data/part01
    │   ├── part02 -> ../../v0.1/data/part02
    │   └── part03
    │       ├── image01.png -> ../../../v0.1/data/part03/image01.png
    │       ├── image02.png -> ../../../v0.1/data/part03/image02.png
    │       ├── image03.png -> ../../../v0.1/data/part03/image03.png
    │       ├── image04.png -> ../../../v0.1/data/part03/image04.png
    │       ├── image05.png -> ../../../v0.1/data/part03/image05.png
    │       ├── image06.png
    │       └── image07.png
    └── readme.txt

20 directories, 46 files
```
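
Since these are ordinary symbolic links, standard tools such as `readlink` and `realpath` (where available) show where each entry physically lives, using paths from the tree above:
```shell
readlink testdata/v0.2/annotation/part01
# -> ../../v0.1/annotation/part01
realpath testdata/current/data/part03/image01.png
# -> .../testdata/v0.1/data/part03/image01.png (resolves into v0.1, so the data is not duplicated)
```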

### Update Files
Files can be updated in a new dataset version using the `bdm change` command. Use the `--update` flag to update individual files, or `--update_all` to update all files in a given directory:
```shell
bdm change --update data_update/regions05.json:annotation/part03/ -c -m "update" testdata
Version v0.3 of dataset has been created.
Files added: 0, updated: 1, removed: 0, symlinked: 9
```
Let’s take a look inside the `readme.txt` file of the new version:
```shell
cat testdata/current/readme.txt 
Dataset version v0.3 has been created from previous version v0.2!
update
Created timestamp: 2023-08-07 19:40:01.753345, OS user: rock-star-data-scientist
Files added: 0, updated: 1, removed: 0, symlinked: 9

Files updated:
annotation/part03/regions05.json
```
Let’s take a look at the file structure:
```shell
tree testdata
testdata
├── current -> ./v0.3
├── v0.1
│   ├── annotation
│   │   ├── part01
│   │   │   ├── regions01.json
│   │   │   ├── regions02.json
│   │   │   ├── regions03.json
│   │   │   ├── regions04.json
│   │   │   └── regions05.json
│   │   ├── part02
│   │   │   ├── regions01.json
│   │   │   ├── regions02.json
│   │   │   ├── regions03.json
│   │   │   ├── regions04.json
│   │   │   └── regions05.json
│   │   └── part03
│   │       ├── regions01.json
│   │       ├── regions02.json
│   │       ├── regions03.json
│   │       ├── regions04.json
│   │       └── regions05.json
│   ├── data
│   │   ├── part01
│   │   │   ├── image01.png
│   │   │   ├── image02.png
│   │   │   ├── image03.png
│   │   │   ├── image04.png
│   │   │   └── image05.png
│   │   ├── part02
│   │   │   ├── image01.png
│   │   │   ├── image02.png
│   │   │   ├── image03.png
│   │   │   ├── image04.png
│   │   │   └── image05.png
│   │   └── part03
│   │       ├── image01.png
│   │       ├── image02.png
│   │       ├── image03.png
│   │       ├── image04.png
│   │       └── image05.png
│   └── readme.txt
├── v0.2
│   ├── annotation
│   │   ├── part01 -> ../../v0.1/annotation/part01
│   │   ├── part02 -> ../../v0.1/annotation/part02
│   │   └── part03
│   │       ├── regions01.json -> ../../../v0.1/annotation/part03/regions01.json
│   │       ├── regions02.json -> ../../../v0.1/annotation/part03/regions02.json
│   │       ├── regions03.json -> ../../../v0.1/annotation/part03/regions03.json
│   │       ├── regions04.json -> ../../../v0.1/annotation/part03/regions04.json
│   │       ├── regions05.json -> ../../../v0.1/annotation/part03/regions05.json
│   │       ├── regions06.json
│   │       └── regions07.json
│   ├── data
│   │   ├── part01 -> ../../v0.1/data/part01
│   │   ├── part02 -> ../../v0.1/data/part02
│   │   └── part03
│   │       ├── image01.png -> ../../../v0.1/data/part03/image01.png
│   │       ├── image02.png -> ../../../v0.1/data/part03/image02.png
│   │       ├── image03.png -> ../../../v0.1/data/part03/image03.png
│   │       ├── image04.png -> ../../../v0.1/data/part03/image04.png
│   │       ├── image05.png -> ../../../v0.1/data/part03/image05.png
│   │       ├── image06.png
│   │       └── image07.png
│   └── readme.txt
└── v0.3
    ├── annotation
    │   ├── part01 -> ../../v0.2/annotation/part01
    │   ├── part02 -> ../../v0.2/annotation/part02
    │   └── part03
    │       ├── regions01.json -> ../../../v0.2/annotation/part03/regions01.json
    │       ├── regions02.json -> ../../../v0.2/annotation/part03/regions02.json
    │       ├── regions03.json -> ../../../v0.2/annotation/part03/regions03.json
    │       ├── regions04.json -> ../../../v0.2/annotation/part03/regions04.json
    │       ├── regions05.json
    │       ├── regions06.json -> ../../../v0.2/annotation/part03/regions06.json
    │       └── regions07.json -> ../../../v0.2/annotation/part03/regions07.json
    ├── data -> ../v0.2/data
    └── readme.txt

26 directories, 54 files
```

### Remove Files
Files or directories can be removed from the dataset using the `bdm change` command with the `--remove` flag:
```shell
bdm change --remove annotation/part01/regions05.json --remove annotation/part01/regions04.json -c -m "remove obsolete data" testdata 
Version v0.4 of dataset has been created.
Files added: 0, updated: 0, removed: 2, symlinked: 8

```
### Combining Operations
Adding, updating, and removing operations can be freely combined within a single dataset version. Use the `bdm change -h` command for detailed information on the available flags and options:
```shell
bdm change -h
```
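
For instance, a single `bdm change` invocation could add, update, and remove files at once, using the flags shown earlier in this README. The source paths (`extra_data/`, `fixes/`) and file names below are hypothetical:
```shell
bdm change \
    --add_all extra_data/annotation/:annotation/part04/ \
    --update fixes/regions01.json:annotation/part01/ \
    --remove data/part02/image05.png \
    -c -m "add part04, fix part01, drop an obsolete image" testdata
```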

## License
See `LICENSE` file in the repo.



            
