h5md


Name: h5md
Version: 0.1.1
Home page: https://github.com/hyoklee/h5md
Summary: A command-line tool to convert HDF5 files to markdown format
Upload time: 2025-11-02 17:48:18
Maintainer: None
Docs URL: None
Author: Joe Lee
Requires Python: >=3.10
License: BSD-3-Clause
Keywords: hdf5 markdown converter documentation
Requirements: h5py markitdown numpy plotly pandas pytest pytest-cov black isort flake8 mypy twine build
# HDF5 to Markdown Converter

A command-line tool that converts HDF5 files to AI-friendly markdown with a key-value structure. The output describes a file's structure and metadata and can optionally preview its data, in a form meant to be easy for both humans and AI tools to read.

## Features

- **AI-friendly key-value format** - Structured output optimized for AI parsing
- **Smart data subsetting** - Preview large datasets with configurable row/column limits
- **Multiple sampling strategies** - Choose how to sample data: first, uniform, or edges
- **Flexible data preview** - Include or exclude actual data values
- **Complete metadata** - Display file structure, groups, datasets, and attributes
- **External link support** - Detect and display HDF5 external links
- **Compression info** - Show dataset compression and chunking details

## Installation

```bash
# Clone the repository
git clone https://github.com/hyoklee/h5md.git
cd h5md

# Install in development mode
pip install -e .
```

Or install directly from GitHub:

```bash
pip install git+https://github.com/hyoklee/h5md.git
```

## Usage

### Command Line

**Basic conversion** (uses defaults: 10 rows/cols, 'first' sampling):

```bash
h5md input.h5
```

This will create `input.md` in the same directory.
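
If you do not have an HDF5 file at hand, a small test input can be generated with h5py and NumPy (both listed under Requirements). This is a minimal sketch: `make_example_file` is a hypothetical helper, the values are random (seeded) and will not match the sample output, and the gzip compression and chunk shape are arbitrary choices made only so the compression and chunking properties have something to report. The layout (file attributes, a `/measurements` group, a 1-D `temperature` dataset and a 2-D `correlation_matrix` dataset) mirrors the "Sample Key-Value Markdown Output" section.

```python
import h5py
import numpy as np


def make_example_file(path: str = "example.h5") -> None:
    """Write a small HDF5 file shaped like the sample output (random values)."""
    rng = np.random.default_rng(seed=0)
    with h5py.File(path, "w") as f:
        # File-level attributes
        f.attrs["title"] = "Sample Scientific Dataset"
        f.attrs["version"] = "1.0"

        # A group with its own attribute
        grp = f.create_group("measurements")
        grp.attrs["description"] = "Experimental measurements"

        # 1-D dataset: 100 readings between 20 and 24
        temp = grp.create_dataset("temperature", data=rng.uniform(20.0, 24.0, size=100))
        temp.attrs["sensor"] = "TH-100"
        temp.attrs["unit"] = "Celsius"

        # 2-D dataset stored compressed and chunked (arbitrary settings)
        corr = grp.create_dataset(
            "correlation_matrix",
            data=rng.random((50, 20)),
            compression="gzip",
            chunks=(10, 20),
        )
        corr.attrs["description"] = "Correlation coefficients"


if __name__ == "__main__":
    make_example_file()
```

Running `h5md example.h5` afterwards should then write `example.md` next to it.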

**Custom output path:**

```bash
h5md input.h5 -o output.md
```

**Control data subsetting:**

```bash
# Limit to 5 rows and 5 columns
h5md input.h5 --max-rows 5 --max-cols 5

# Show all data: a limit of 0 disables subsetting (use carefully with large files!)
h5md input.h5 --max-rows 0 --max-cols 0

# Metadata only (no data values)
h5md input.h5 --no-data
```

**Choose sampling strategy:**

```bash
# Take first N items (default)
h5md input.h5 --sampling first

# Sample uniformly across dataset
h5md input.h5 --sampling uniform

# Show first and last items (useful for ranges)
h5md input.h5 --sampling edges
```
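
The README names the three strategies but does not define exactly which items each one picks. The following is a minimal sketch of one plausible reading, assuming `uniform` means evenly spaced indices from the first item to the last and `edges` means half of the budget from the start and the rest from the end; it also treats a limit of 0 as "no limit", matching the `--max-rows 0 --max-cols 0` example above. `select_indices` is a hypothetical name, not part of h5md.

```python
def select_indices(length: int, limit: int, strategy: str = "first") -> list[int]:
    """Return the indices to preview along one axis of a dataset.

    Hypothetical helper illustrating the documented strategy names;
    h5md's actual selection logic may differ.
    """
    if length < 0 or limit < 0:
        raise ValueError("length and limit must be non-negative")
    # A limit of 0, or one covering the whole axis, keeps every index.
    if limit == 0 or limit >= length:
        return list(range(length))
    if strategy == "first":
        # The first `limit` items.
        return list(range(limit))
    if strategy == "uniform":
        # Evenly spaced integer indices from 0 to length - 1 inclusive.
        if limit == 1:
            return [0]
        return [i * (length - 1) // (limit - 1) for i in range(limit)]
    if strategy == "edges":
        # Half of the budget from the start, the remainder from the end.
        head = (limit + 1) // 2
        tail = limit - head
        return list(range(head)) + list(range(length - tail, length))
    raise ValueError(f"unknown sampling strategy: {strategy!r}")
```

Under this reading, `--sampling edges --max-rows 5` on a 10-row dataset would show rows 0, 1, 2, 8 and 9. For 2-D data the same selection would presumably be applied separately to rows and columns.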

**Combined options:**

```bash
h5md data.h5 -o output.md --max-rows 20 --max-cols 10 --sampling edges
```
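
For wrapping h5md in your own scripts, the documented flags can be mirrored with Python's `argparse`. This is a hypothetical sketch, not h5md's actual parser: the flag names and defaults (10 rows/cols, `first` sampling, `input.md` next to the input) come from this README, and the mapping onto keyword arguments uses only the `HDF5Converter` parameters shown in the Python API section. `build_parser`, `resolve_output` and `converter_kwargs` are illustrative names.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Mirror the CLI options documented in this README (hypothetical)."""
    parser = argparse.ArgumentParser(prog="h5md")
    parser.add_argument("input", help="HDF5 file to convert")
    parser.add_argument("-o", dest="output", help="markdown output path")
    parser.add_argument("--max-rows", type=int, default=10,
                        help="rows to preview per dataset (0 = all)")
    parser.add_argument("--max-cols", type=int, default=10,
                        help="columns to preview per dataset (0 = all)")
    parser.add_argument("--sampling", choices=["first", "uniform", "edges"],
                        default="first", help="sampling strategy")
    parser.add_argument("--no-data", action="store_true",
                        help="metadata only, no data values")
    return parser


def resolve_output(args: argparse.Namespace) -> Path:
    """Use -o if given, else <input name>.md in the input's directory."""
    if args.output:
        return Path(args.output)
    return Path(args.input).with_suffix(".md")


def converter_kwargs(args: argparse.Namespace) -> dict[str, object]:
    """Map parsed flags onto the HDF5Converter keyword arguments."""
    return {
        "max_rows": args.max_rows,
        "max_cols": args.max_cols,
        "sampling_strategy": args.sampling,
        "include_data_preview": not args.no_data,
    }
```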

### Python API

```python
from h5md import HDF5Converter

# Basic conversion with defaults
converter = HDF5Converter()
markdown_content = converter.convert('input.h5', 'output.md')

# Advanced: customize subsetting and sampling
converter = HDF5Converter(
    max_rows=20,           # Limit to 20 rows per dataset
    max_cols=15,           # Limit to 15 columns per dataset
    sampling_strategy="edges",  # Show first and last items
    include_data_preview=True   # Include actual data values
)
markdown_content = converter.convert('data.h5', 'output.md')

# Metadata only (no data values)
converter = HDF5Converter(include_data_preview=False)
markdown_content = converter.convert('data.h5', 'metadata.md')
```
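
The same API can be applied to a whole directory. This sketch relies only on what the section above shows, namely an `HDF5Converter` constructor taking those keyword arguments and a `convert(input_path, output_path)` method. `convert_directory` is a hypothetical helper, and the converter factory is a parameter so the loop can be exercised without h5md installed.

```python
from __future__ import annotations

from functools import partial
from pathlib import Path
from typing import Any, Callable


def convert_directory(
    directory: str | Path,
    make_converter: Callable[[], Any] | None = None,
    pattern: str = "*.h5",
) -> list[Path]:
    """Convert each file matching `pattern` in `directory` to a sibling .md file.

    Hypothetical helper built on the HDF5Converter usage shown above.
    """
    if make_converter is None:
        # Imported lazily so the loop can be tested with a stand-in converter.
        from h5md import HDF5Converter

        make_converter = partial(HDF5Converter, max_rows=5, max_cols=5)

    converter = make_converter()
    written: list[Path] = []
    for h5_path in sorted(Path(directory).glob(pattern)):
        md_path = h5_path.with_suffix(".md")
        converter.convert(str(h5_path), str(md_path))
        written.append(md_path)
    return written
```

Reusing one converter instance for every file is a choice made here; the README does not say whether `HDF5Converter` keeps per-file state, so creating a fresh converter per file is the conservative alternative.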

## Output Format

The generated markdown uses an AI-friendly key-value structure that includes:

1. **File-level attributes** - Metadata about the HDF5 file
2. **Group hierarchy** - Nested structure with group attributes
3. **Dataset properties** - Shape, data type, size, compression, chunks
4. **Dataset attributes** - Custom metadata for each dataset
5. **Data preview** - Actual data values in key-value format (configurable)
6. **External links** - Target file and path information
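
Item 5 renders the preview as key-value lines: `index_N` keys for 1-D data, and `Row N` entries with nested `col_N` keys for 2-D data, as in the sample output that follows. The sketch below reproduces that layout from plain Python lists, assuming the rows and columns to show have already been chosen. `format_preview` is a hypothetical name, and h5md's own code may differ in details such as float precision or when the summary note is printed.

```python
from __future__ import annotations


def format_preview(
    data: list,
    row_idx: list[int],
    col_idx: list[int] | None = None,
    strategy: str = "first",
) -> str:
    """Render selected values as key-value markdown lines.

    data: the full dataset as a 1-D list or a 2-D list of rows.
    row_idx / col_idx: indices already chosen for display (col_idx only for 2-D).
    """
    lines = []
    total_rows = len(data)
    if col_idx is None:
        # 1-D layout: one `index_N` line per selected element.
        for i in row_idx:
            lines.append(f"- `index_{i}`: `{data[i]}`")
        lines.append(
            f"- *(showing {len(row_idx)} of {total_rows} rows "
            f"using '{strategy}' sampling)*"
        )
    else:
        # 2-D layout: a bold row header with nested `col_N` lines.
        total_cols = len(data[0]) if data else 0
        for i in row_idx:
            lines.append(f"- **Row {i}:**")
            for j in col_idx:
                lines.append(f"  - `col_{j}`: `{data[i][j]}`")
        lines.append(
            f"- *(showing {len(row_idx)} of {total_rows} rows, "
            f"{len(col_idx)} of {total_cols} cols using '{strategy}' sampling)*"
        )
    return "\n".join(lines)
```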

### Sample Key-Value Markdown Output

```markdown
# HDF5 File Structure: example.h5

## Attributes

- **title:** `Sample Scientific Dataset` (type: `str`)
- **version:** `1.0` (type: `str`)

## Group: /measurements

### Attributes

- **description:** `Experimental measurements` (type: `str`)

### Dataset: temperature

#### Properties

- **Shape:** `(100,)`
- **Data Type:** `float64`
- **Size:** `100` elements

**Data (Key-Value Format):**

- `index_0`: `22.935992117831265`
- `index_1`: `23.308188819527796`
- `index_2`: `20.582239974390227`
- `index_3`: `20.184652272470018`
- `index_4`: `23.397532910900622`
- *(showing 5 of 100 rows using 'first' sampling)*

#### Attributes

- **sensor:** `TH-100` (type: `str`)
- **unit:** `Celsius` (type: `str`)

### Dataset: correlation_matrix

#### Properties

- **Shape:** `(50, 20)`
- **Data Type:** `float64`
- **Size:** `1000` elements

**Data (Key-Value Format):**

- **Row 0:**
  - `col_0`: `0.175408510335`
  - `col_1`: `0.367993360963`
  - `col_2`: `0.361122287567`
- **Row 1:**
  - `col_0`: `0.504039513844`
  - `col_1`: `0.817406445579`
  - `col_2`: `0.900514954273`
- *(showing 2 of 50 rows, 3 of 20 cols using 'first' sampling)*

#### Attributes

- **description:** `Correlation coefficients` (type: `str`)
```

This format is designed to be:
- **Parseable** - Clear structure for AI to extract information
- **Readable** - Easy for humans to understand
- **Scalable** - Smart subsetting prevents overwhelming output from large datasets

## Requirements

- Python 3.10+
- h5py
- numpy

## License

BSD 3-Clause License

Copyright (c) 2025, Joe Lee
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.


Raw data

{
    "_id": null,
    "home_page": "https://github.com/hyoklee/h5md",
    "name": "h5md",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "hdf5, markdown, converter, documentation",
    "author": "Joe Lee",
    "author_email": "Joe Lee <hyoklee@hdfgroup.org>",
    "download_url": "https://files.pythonhosted.org/packages/d1/d3/e62b2db5f33f160d8ed442ec6d5099427c4c4176f17332867e2ae0995e1a/h5md-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD-3-Clause",
    "summary": "A command-line tool to convert HDF5 files to markdown format",
    "version": "0.1.1",
    "project_urls": {
        "Documentation": "https://github.com/hyoklee/h5md/blob/main/README.md",
        "Homepage": "https://github.com/hyoklee/h5md",
        "Issues": "https://github.com/hyoklee/h5md/issues",
        "Repository": "https://github.com/hyoklee/h5md.git"
    },
    "split_keywords": [
        "hdf5",
        " markdown",
        " converter",
        " documentation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dd36a03c28f5d2a145eea9d398e8d421fde9826225b195b395b49097ad8c61d5",
                "md5": "2550af85f1ae7756d0284a0692ec318e",
                "sha256": "98ca90fea1e3413ec1b35acc4a97377e84b64d049bcdeb84d5739bb3694ed397"
            },
            "downloads": -1,
            "filename": "h5md-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2550af85f1ae7756d0284a0692ec318e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 8981,
            "upload_time": "2025-11-02T17:48:17",
            "upload_time_iso_8601": "2025-11-02T17:48:17.842310Z",
            "url": "https://files.pythonhosted.org/packages/dd/36/a03c28f5d2a145eea9d398e8d421fde9826225b195b395b49097ad8c61d5/h5md-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d1d3e62b2db5f33f160d8ed442ec6d5099427c4c4176f17332867e2ae0995e1a",
                "md5": "2aa1f115ce3f7eafd4243d57d5bd80c6",
                "sha256": "d4b74f3966564696a2938d6b344243058b22ac6cea57c0d657e59256f9136e0d"
            },
            "downloads": -1,
            "filename": "h5md-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "2aa1f115ce3f7eafd4243d57d5bd80c6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 12304,
            "upload_time": "2025-11-02T17:48:18",
            "upload_time_iso_8601": "2025-11-02T17:48:18.925249Z",
            "url": "https://files.pythonhosted.org/packages/d1/d3/e62b2db5f33f160d8ed442ec6d5099427c4c4176f17332867e2ae0995e1a/h5md-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-11-02 17:48:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hyoklee",
    "github_project": "h5md",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "h5py",
            "specs": [
                [
                    ">=",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "markitdown",
            "specs": [
                [
                    ">",
                    "0.0.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.19.0"
                ]
            ]
        },
        {
            "name": "plotly",
            "specs": [
                [
                    ">=",
                    "5.13.0"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "pytest-cov",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "black",
            "specs": [
                [
                    ">=",
                    "23.3.0"
                ]
            ]
        },
        {
            "name": "isort",
            "specs": [
                [
                    ">=",
                    "5.12.0"
                ]
            ]
        },
        {
            "name": "flake8",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "mypy",
            "specs": [
                [
                    ">=",
                    "1.3.0"
                ]
            ]
        },
        {
            "name": "twine",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "build",
            "specs": [
                [
                    ">=",
                    "0.10.0"
                ]
            ]
        }
    ],
    "lcname": "h5md"
}