# Mirk
**Mirk** is a library and a pipeline that combines classical Computer Vision (CV) models with Vision-Language Models (VLMs) to provide detailed analysis and understanding of a video. The classical CV model handles initial processing and object detection, while the VLM generates rich, contextual interpretations of the visual content.
## Overview
Mirk works by:
1. Taking an input video
2. Running a CV model to detect user-specified objects (classes) of interest
3. When a specified object is detected, triggering a VLM to explain what is happening in the video and to reason about the detected object and its context, guided by a user-provided question (see the sketch below)
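This detect-then-ask pattern can be sketched directly with Ultralytics YOLO and the OpenAI client. Note this is an illustration of the idea, not Mirk's internal code; the model name `gpt-4o-mini`, the 0.8 confidence threshold, and the file paths are assumptions:

```python
# Sketch of the detect-then-ask pattern described above, written directly
# against ultralytics and openai. This is NOT Mirk's internal code.
import base64

import cv2  # opencv-python; used only to JPEG-encode the triggering frame
from openai import OpenAI
from ultralytics import YOLO

detector = YOLO("yolo11n.pt")  # same detector seen in the example output
client = OpenAI()              # expects OPENAI_API_KEY in the environment

question = "What are the people doing in the image?"
target, threshold = "person", 0.8  # class of interest and confidence cutoff

for result in detector("input/selective_attention_test.mp4", stream=True):
    for box in result.boxes:
        if detector.names[int(box.cls)] == target and float(box.conf) >= threshold:
            # Encode the frame that triggered the detection and ask the VLM.
            _, jpeg = cv2.imencode(".jpg", result.orig_img)
            b64 = base64.b64encode(jpeg.tobytes()).decode()
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": question},
                        {"type": "image_url",
                         "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                    ],
                }],
            )
            print(response.choices[0].message.content)
            raise SystemExit  # one-shot: stop after the first confident hit
```

Mirk wraps this loop behind its own interface; see the Quick Start example below for actual usage.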
## Installation
```bash
pip install mirk
```
## Quick Start
Check out the [example](examples/one_shot.ipynb) to see how to use Mirk.
For your convenience, we provide a [bash script](examples/one_shot.sh) that downloads a sample video and runs the one-shot example:
```bash
cd examples
./one_shot.sh
```
This produces output like the following:
```bash
[download] Destination: input/selective_attention_test.mp4
...
[download] 100% of 2.63MiB in 00:00:00 at 5.80MiB/s
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n.pt to '.../mirk/mirk/models/yolo11n.pt'...
100%|███████| 5.35M/5.35M [00:00<00:00, 7.07MB/s]
video 1/1 (frame 1/2447) .../mirk/examples/input/selective_attention_test.mp4: 480x640 (no detections), 172.3ms
video 1/1 (frame 2/2447) .../mirk/examples/input/selective_attention_test.mp4: 480x640 (no detections), 145.1ms
video 1/1 (frame 3/2447) .../mirk/examples/input/selective_attention_test.mp4: 480x640 (no detections), 134.0ms
...
video 1/1 (frame 361/2447) .../mirk/examples/input/selective_attention_test.mp4: 480x640 5 persons, 160.9ms
Found person in frame 360 with confidence 0.88
Saved frame to: output/detected_person_frame_360.jpg
Question: What are the people doing in the image?
Answer: The people in the image are playing with basketballs, passing them to each other. There is a group of individuals, and some are walking while others are engaged in the activity. It's a scene from a well-known experiment involving selective attention.
```
## Credentials
Mirk uses the following APIs:
- [YOLO](https://docs.ultralytics.com/quickstart/)
- [OpenAI](https://platform.openai.com/docs/api-reference/introduction)
You need to provide your own credentials for the OpenAI API; see the [.env.example](.env.example) file.
No credentials are required for YOLO.
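If you prefer loading the key inside Python rather than exporting it in your shell, here is a minimal sketch using python-dotenv (an assumption; use whatever loader you prefer, and check [.env.example](.env.example) for the exact variable name):

```python
# Minimal sketch: copy .env.example to .env, fill in your key, then load it.
# Assumes the variable is named OPENAI_API_KEY, which the openai client
# reads automatically once it is present in the environment.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from ./.env into os.environ
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```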