video-clip-describer

Name: video-clip-describer
Version: 0.4.0
Summary: Generate text descriptions of video clips
Author: Bendik R. Brenne
Requires Python: <4.0,>=3.12
License: MIT
Upload time: 2024-11-14 03:25:32
Requirements: none recorded
# video-clip-describer

## Installation

```bash
pip install video-clip-describer
```

## Usage

```python
import asyncio
from video_clip_describer import VisionAgent

agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url="https://my-litellm-proxy.local/v1",
    api_key="sk-apikey",
    vision_model="claude-3-5-sonnet",
    refine_model="gemini-1.5-flash",
    stack_grid=True,
    stack_grid_size=(3, 3),
    resize_video=(1024, 768),
    hashing_max_frames=200,
    hash_size=8,
    debug=True,
    debug_dir="./debug",
)

description = asyncio.run(agent.run())
print(description)
```
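
If the surrounding application already runs an event loop (a web handler, a bot, and so on), `agent.run()` is a coroutine and can be awaited directly instead of going through `asyncio.run()`. A minimal sketch using only the constructor arguments shown above; the proxy URL and API key are placeholders:

```python
import asyncio

from video_clip_describer import VisionAgent


async def describe(path: str) -> str:
    # Only arguments shown in the README example are used here; everything
    # else is left at the library defaults.
    agent = VisionAgent(
        path,
        api_base_url="https://my-litellm-proxy.local/v1",  # placeholder endpoint
        api_key="sk-apikey",                               # placeholder key
    )
    # agent.run() is a coroutine, so it can be awaited from any async context.
    return await agent.run()


if __name__ == "__main__":
    print(asyncio.run(describe("~/Videos/test.mp4")))
```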

## CLI

```bash
$ video2text path/to/video.mp4
```

```bash
$ video2text --help

 Usage: video2text [OPTIONS] VIDEO_FILE

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    video_file      FILENAME  The video file to process. [required]                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resize                      <width>x<height>  Resize frames before sending to GPT-V. [default: 1024x768]                                       │
│ --stack-grid                                    Put video frames in a grid before sending to GPT-V.                                              │
│ --stack-grid-size             <cols>x<rows>     Grid size to stack frames in. [default: 3x3]                                                     │
│ --context                                       Context to add to prompt. [default: None]                                                        │
│ --api-base-url                                  OpenAI API compatible base URL. [env var: OPENAI_BASE_URL] [default: https://api.openai.com/v1]  │
│ --api-key                                       OpenAI API key. [env var: OPENAI_API_KEY]                                                        │
│ --model                                         LLM model to use (overrides --vision-model and --refine-model). [default: None]                  │
│ --vision-model                                  LLM model to use for vision. [default: claude-3-5-sonnet]                                        │
│ --refine-model                                  LLM model to use for refinement. [default: gemini-1.5-flash]                                     │
│ --no-compress                                   Don't remove similar frames before sending to GPT-V.                                             │
│ --max-frames                                    Max number of frames to allow before decreasing hashing length. [default: 200]                   │
│ --debug                                         Enable debugging.                                                                                │
│ --debug-dir                   PATH              Directory to output debug frames to if --debug is enabled. [default: ./debug]                    │
│                       -v                        Enable verbose output. Repeat for increased verbosity.                                           │
│ --test                                          Don't send requests to LLM.                                                                      │
│ --install-completion                            Install completion for the current shell.                                                        │
│ --show-completion                               Show completion for the current shell, to copy it or customize the installation.                 │
│ --help                                          Show this message and exit.                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

```
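
The options compose as expected; for example, a run against a self-hosted OpenAI-compatible proxy might look like the following (the endpoint, key, and context string are placeholders, and the flag values follow the formats documented in the help text above):

```bash
$ OPENAI_BASE_URL=https://my-litellm-proxy.local/v1 \
  OPENAI_API_KEY=sk-apikey \
  video2text path/to/video.mp4 \
    --resize 1280x720 \
    --stack-grid \
    --stack-grid-size 2x2 \
    --context "Dashcam footage from a city drive" \
    -v
```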

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "video-clip-describer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.12",
    "maintainer_email": null,
    "keywords": null,
    "author": "Bendik R. Brenne",
    "author_email": "bendik@konstant.no",
    "download_url": "https://files.pythonhosted.org/packages/f9/56/962e4fadd8ebf3482e538b9947194f6f2422a686f9578c48a0bdf719289b/video_clip_describer-0.4.0.tar.gz",
    "platform": null,
    "description": "(full project README; reproduced in the section above)",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Generate text descriptions of video clips",
    "version": "0.4.0",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "90b081bf44e83bdc832b39f7631e43d64995ca7a13cd38dab1f147476be7553e",
                "md5": "155e18a9b8ad2ea42db56087453f58ca",
                "sha256": "c950194873b89f763f412c3dbb3c54fc185221a9bf36db7fc731744a49b847fe"
            },
            "downloads": -1,
            "filename": "video_clip_describer-0.4.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "155e18a9b8ad2ea42db56087453f58ca",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.12",
            "size": 9145,
            "upload_time": "2024-11-14T03:25:31",
            "upload_time_iso_8601": "2024-11-14T03:25:31.209271Z",
            "url": "https://files.pythonhosted.org/packages/90/b0/81bf44e83bdc832b39f7631e43d64995ca7a13cd38dab1f147476be7553e/video_clip_describer-0.4.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f956962e4fadd8ebf3482e538b9947194f6f2422a686f9578c48a0bdf719289b",
                "md5": "8213629fa43231793bf355285f84ae68",
                "sha256": "047e21bf40ac1ad9b13c0600418edf64d706310a731ccf6e21c7589d815b71ad"
            },
            "downloads": -1,
            "filename": "video_clip_describer-0.4.0.tar.gz",
            "has_sig": false,
            "md5_digest": "8213629fa43231793bf355285f84ae68",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.12",
            "size": 8388,
            "upload_time": "2024-11-14T03:25:32",
            "upload_time_iso_8601": "2024-11-14T03:25:32.808546Z",
            "url": "https://files.pythonhosted.org/packages/f9/56/962e4fadd8ebf3482e538b9947194f6f2422a686f9578c48a0bdf719289b/video_clip_describer-0.4.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-14 03:25:32",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "video-clip-describer"
}
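
The `digests` recorded for each file can be used to check a downloaded artifact before installing it. A minimal sketch in Python, using only the standard library; the URL and sha256 value are copied from the wheel entry above:

```python
import hashlib
import urllib.request

# URL and expected digest taken from the wheel entry in the metadata above.
WHEEL_URL = (
    "https://files.pythonhosted.org/packages/90/b0/"
    "81bf44e83bdc832b39f7631e43d64995ca7a13cd38dab1f147476be7553e/"
    "video_clip_describer-0.4.0-py3-none-any.whl"
)
EXPECTED_SHA256 = "c950194873b89f763f412c3dbb3c54fc185221a9bf36db7fc731744a49b847fe"

with urllib.request.urlopen(WHEEL_URL) as response:
    payload = response.read()

actual = hashlib.sha256(payload).hexdigest()
print("sha256 matches:", actual == EXPECTED_SHA256)
```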
        