MarkMyMedia


| Field | Value |
| :--- | :--- |
| Name | MarkMyMedia |
| Version | 1.0.0 |
| Summary | A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis. |
| Upload time | 2025-08-04 07:11:24 |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | multimodal, llm, ffmpeg, media, image-processing, video-processing, audio-processing, utilities |

# MarkMyMedia

[![PyPI version](https://img.shields.io/pypi/v/MarkMyMedia.svg)](https://pypi.org/project/MarkMyMedia/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis.

## The Problem: Lost Context in Multimodal Sequences

When you feed a sequence of media files (e.g., `portal 2 mod.jpg`, `intro.mp3`, `my homework.mp4`) to a Large Language Model, the model sees a continuous stream of data. It lacks explicit, built-in separators or context about where one file ends and another begins, or what the original source of a particular frame or soundbite was.

This ambiguity makes it difficult to:
-   Analyze which specific file triggered a response.
-   Train the model on tasks that require knowledge of file boundaries.
-   Debug model behavior on complex, mixed-media inputs.

## The Solution: Visibly Embedded Markers

**MarkMyMedia** solves this by "stamping" each file with its own name, creating an unambiguous visual marker directly within the data.

-   **Images:** Receive a clean text overlay with the filename.
-   **Audio:** Is converted into a video that displays the filename on a black background.
-   **Videos:** Receive a short, 0.5-second marker clip showing the filename, prepended without re-encoding the entire video.

This way, the context is never lost. The model "sees" the filename associated with the content that follows.

## Key Features

-   **Multimodal Support:** Works out-of-the-box for images, audio, and video.
-   **Blazing Fast:** Uses parallel processing to handle large datasets quickly.
-   **Efficient Video Processing:** Prepends markers to videos **without re-encoding**, saving massive amounts of time and preserving original quality.
-   **Flexible Usage:** Can be used as a simple command-line tool or as a Python library.
-   **Recursive Search:** Point it at a directory, and it can process all nested media files.
-   **Simple & Focused:** Does one job and does it well.

### How It Looks

**MarkMyMedia** provides clear, unambiguous markers for each file type.

#### 🖼️ Images

A clean, readable marker with the filename is embedded directly onto the image. This ensures that even in a long sequence, the source of each image is immediately visible.

![marked_img](https://github.com/LaVashikk/MarkMyMedia-LLM/blob/main/media//marked_img.jpg)

*<p align="center">Example: A screenshot of a Discord message marked with its filename.</p>*

#### 🎧 Audio

Audio files are converted into a static video format. This clever workaround makes them visually identifiable in multimodal timelines and tools like Google AI Studio, where audio-only files might not provide visual cues. The entire audio track is preserved under a single, persistent frame showing its original filename.

![markered_audio](https://github.com/LaVashikk/MarkMyMedia-LLM/blob/main/media//markered_audio.jpg)

*<p align="center">The result is a standard video file, making the audio's presence known visually.</p>*

![AI Studio](https://github.com/LaVashikk/MarkMyMedia-LLM/blob/main/media//markered_audio_gemini.jpg)
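
One common way to build such a file with FFmpeg directly is to loop a single still frame for the length of the audio and copy the audio stream unchanged. The sketch below only assembles the command line and does not run it. The function name `still_video_command` and the pre-rendered `frame.png` are assumptions for illustration, not MarkMyMedia's internals.

```python
from pathlib import Path
from typing import List


def still_video_command(frame: Path, audio: Path, output: Path) -> List[str]:
    """Build an ffmpeg command that shows one still frame for the whole audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single input image indefinitely
        "-i", str(frame),
        "-i", str(audio),
        "-c:v", "libx264",       # the still frame has to be encoded as video
        "-tune", "stillimage",
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-c:a", "copy",          # copy the audio stream without re-encoding
        "-shortest",             # stop when the audio (the finite input) ends
        str(output),
    ]


if __name__ == "__main__":
    cmd = still_video_command(Path("frame.png"), Path("intro.mp3"), Path("intro.mp4"))
    print(" ".join(cmd))
```

Only the single marker frame is encoded here; the audio is passed through with `-c:a copy`, which matches the lossless-copy behavior described for `mark_audio`.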

#### 🎬 Video

A short, 0.5-second marker clip is prepended to the video. This process is nearly instant because it **avoids re-encoding** the entire file, preserving the original quality and saving significant time.

![some](https://github.com/LaVashikk/MarkMyMedia-LLM/blob/main/media//markered_vid.gif)

*<p align="center">The model sees the filename right before the video content begins.</p>*
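
Prepending a clip without re-encoding is typically done with FFmpeg's concat demuxer combined with `-c copy`. The sketch below illustrates that general technique and is not necessarily MarkMyMedia's exact implementation. The helper names `concat_list_text` and `concat_copy_command` and the file names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def concat_list_text(clips: Iterable[Path]) -> str:
    """Render the list file that ffmpeg's concat demuxer reads."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = clip.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_copy_command(list_file: Path, output: Path) -> List[str]:
    """Build an ffmpeg command that joins the listed clips by stream copy."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", str(list_file),
        "-c", "copy",     # copy all streams, no re-encoding
        str(output),
    ]


if __name__ == "__main__":
    print(concat_list_text([Path("marker.mp4"), Path("my homework.mp4")]), end="")
    print(" ".join(concat_copy_command(Path("list.txt"), Path("out.mp4"))))
```

Stream copy only works when the marker clip and the source video share the same codecs and stream parameters, so the marker has to be generated to match each input.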


## Technical Constraints

1. This tool relies on **FFmpeg** for all audio and video operations. You must have `ffmpeg` and `ffprobe` installed and available in your system's PATH.
2. To achieve high speed by avoiding full re-encoding, `MarkMyMedia` relies on **stream copying**. This approach is extremely fast but requires input files to meet specific format criteria.

| Modality | Requirement | Reason & Details |
| :--- | :--- | :--- |
| **Video (`mark_video`)** | <ul><li>Video Codec: `h264` or `hevc`</li><li>Audio Codec: `aac` (if present)</li></ul> | **For preserving quality and speed.** Processing other codecs (like VP9 in `.webm`) will fail, as they cannot be directly concatenated in this workflow. |
| **Audio (`mark_audio`)** | <ul><li>Always outputs a `.mp4` video file.</li><li>Audio Format: `mp3`, `flac`, `aac`, `m4a`, `ogg` or `opus`</li></ul> | **To create a visual marker.** The original audio stream is copied losslessly into the new video container, ensuring no quality is lost. |

## Installation

Install `MarkMyMedia` directly from PyPI:

```bash
pip install MarkMyMedia
```

## Usage

### As a Command-Line Tool (CLI)

The CLI is designed for batch processing entire directories.

**Mark all media in the current directory (output to `markered_modals/`):**
```bash
markmymedia
```

**Recursively process a dataset and specify an output folder:**
```bash
markmymedia ./my_dataset -r -o ./processed_data
```

**See all available options:**
`markmymedia --help`
```
usage: markmymedia [-h] [-r] [-o OUTPUT] [-j JOBS] [-p] [--version] [inputs ...]

Batch mark images, audio, and video with filename overlays.

positional arguments:
  inputs                Files or directories to process. If omitted, current directory is used.

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively traverse directories.
  -o, --output OUTPUT   Base output directory (default: markered_modals).
  -j, --jobs JOBS       Number of worker threads to use per modality (default: number of CPUs).
  -p, --preserve-structure
                        Preserve the directory structure of input files in the output directory.
  --version             show program's version number and exit

```
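
To make the `-r` and `-p` options concrete, here is a rough sketch of how input discovery and output-path mapping could work. It is an illustration under stated assumptions, not the tool's actual code: the extension sets in `MODALITIES` and the helpers `modality_of`, `find_media`, and `output_path` are all hypothetical.

```python
from pathlib import Path
from typing import Iterator, Optional

# Assumed extension lists, for illustration only.
MODALITIES = {
    "image": {".jpg", ".jpeg", ".png"},
    "audio": {".mp3", ".flac", ".aac", ".m4a", ".ogg", ".opus"},
    "video": {".mp4", ".mkv", ".mov"},
}


def modality_of(path: Path) -> Optional[str]:
    """Classify a file by extension, or return None for non-media files."""
    suffix = path.suffix.lower()
    for name, extensions in MODALITIES.items():
        if suffix in extensions:
            return name
    return None


def find_media(root: Path, recursive: bool) -> Iterator[Path]:
    """Yield media files under root, descending into subfolders if recursive."""
    candidates = root.rglob("*") if recursive else root.glob("*")
    for path in sorted(candidates):
        if path.is_file() and modality_of(path) is not None:
            yield path


def output_path(src: Path, root: Path, out_dir: Path, preserve_structure: bool) -> Path:
    """Map an input file to its destination inside out_dir."""
    relative = src.relative_to(root) if preserve_structure else Path(src.name)
    if modality_of(src) == "audio":
        # Marked audio is written as a video file.
        relative = relative.with_suffix(".mp4")
    return out_dir / relative
```

In this sketch, flattening the output (no `-p`) means two inputs with the same filename in different folders would map to the same destination, which is one reason to preserve the directory structure for nested datasets.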

### As a Python Library

You can also use the core functions directly in your Python scripts for more granular control.

```python
from markmymedia import mark_image, mark_audio, mark_video

# Mark a single image
mark_image(
    input_path='data/cat.jpg',
    output_path='processed/cat_marked.jpg'
)

# Create a marked video from an audio file
mark_audio(
    input_path='data/intro.mp3',
    output_path='processed/intro.mp4'
)

# Prepend a marker to a video file
mark_video(
    input_path='data/dog_on_beach.mp4',
    output_path='processed/dog_on_beach.mp4',
    overlay_text="Some cool video!!",
)
```
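
When calling the library from your own scripts, you may want the same kind of parallelism the CLI offers. The sketch below is one possible approach using a thread pool. `mark_batch` is a hypothetical helper, not part of MarkMyMedia, and it assumes the marking function accepts `input_path` and `output_path` keyword arguments as in the examples above, and that `out_dir` already exists.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def mark_batch(
    paths: Iterable[Path],
    out_dir: Path,
    mark_fn: Callable[..., object],
    jobs: int = 4,
) -> Dict[Path, Optional[BaseException]]:
    """Run mark_fn(input_path=..., output_path=...) for each path in a thread pool.

    Returns a mapping of input path to None on success, or to the raised exception.
    """
    results: Dict[Path, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {
            pool.submit(
                mark_fn,
                input_path=str(path),
                output_path=str(out_dir / path.name),
            ): path
            for path in paths
        }
        for future in as_completed(futures):
            # exception() returns None when the call succeeded.
            results[futures[future]] = future.exception()
    return results
```

A call such as `mark_batch(Path("data").glob("*.jpg"), Path("processed"), mark_image)` would then return a per-file report, so one failing file does not abort the rest of the batch.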

## Contributing

Contributions are welcome! If you find a bug or have a feature request, please [open an issue](https://github.com/LaVashikk/MarkMyMedia-LLM/issues).

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "MarkMyMedia",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "multimodal, llm, ffmpeg, media, image-processing, video-processing, audio-processing, utilities",
    "author": null,
    "author_email": "laVashik <me@lavashik.dev>",
    "download_url": "https://files.pythonhosted.org/packages/95/83/38219220741e6c08ef924064ac9298ee3fc1f1f9c4502285259acd088ed4/markmymedia-1.0.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis.",
    "version": "1.0.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/IaVashik/MarkMyMedia-LLM/issues",
        "Homepage": "https://github.com/LaVashikk/MarkMyMedia-LLM"
    },
    "split_keywords": [
        "multimodal",
        " llm",
        " ffmpeg",
        " media",
        " image-processing",
        " video-processing",
        " audio-processing",
        " utilities"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "48077f17d244eff84e973d7c92f2d5311d44de5cd946cb799d1013a20a373b95",
                "md5": "4a62cbd2099562cfbec9554bbf05fa66",
                "sha256": "eafa037d6a3a50525a235cf60e2e1347bc5201b4ad18bc095089ae62709de578"
            },
            "downloads": -1,
            "filename": "markmymedia-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4a62cbd2099562cfbec9554bbf05fa66",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 15585,
            "upload_time": "2025-08-04T07:11:21",
            "upload_time_iso_8601": "2025-08-04T07:11:21.785626Z",
            "url": "https://files.pythonhosted.org/packages/48/07/7f17d244eff84e973d7c92f2d5311d44de5cd946cb799d1013a20a373b95/markmymedia-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "958338219220741e6c08ef924064ac9298ee3fc1f1f9c4502285259acd088ed4",
                "md5": "dd90b8427f09156080ac40e5b697a017",
                "sha256": "b03209ffd936f8ad5881e78b367f38d398817430246f57600178b30f34d0a247"
            },
            "downloads": -1,
            "filename": "markmymedia-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "dd90b8427f09156080ac40e5b697a017",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 13027,
            "upload_time": "2025-08-04T07:11:24",
            "upload_time_iso_8601": "2025-08-04T07:11:24.207765Z",
            "url": "https://files.pythonhosted.org/packages/95/83/38219220741e6c08ef924064ac9298ee3fc1f1f9c4502285259acd088ed4/markmymedia-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-04 07:11:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "IaVashik",
    "github_project": "MarkMyMedia-LLM",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "markmymedia"
}
        