| Field | Value |
| --- | --- |
| Name | vision-agent |
| Version | 0.2.243 |
| Summary | Toolset for Vision Agent |
| Author | Landing AI |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Keywords | None |
| Requires Python | <4.0,>=3.9 |
| Upload time | 2025-02-13 22:02:32 |
| Requirements | None recorded |
| Travis CI | None |
| Coveralls coverage | None |
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
<source media="(prefers-color-scheme: light)" srcset="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_dark.svg?raw=true">
<img alt="VisionAgent" height="200px" src="https://github.com/landing-ai/vision-agent/blob/main/assets/logo_light.svg?raw=true">
</picture>
[Discord](https://discord.gg/wPdN8RCYew)

[PyPI version](https://badge.fury.io/py/vision-agent)

</div>
## VisionAgent
VisionAgent is a library that helps you use agent frameworks to generate code that
solves your vision tasks. Check out our Discord for updates and roadmaps! The fastest
way to try out VisionAgent is through our web application, which you can find [here](https://va.landing.ai/).
## Installation
```bash
pip install vision-agent
```
```bash
export ANTHROPIC_API_KEY="your-api-key"
export OPENAI_API_KEY="your-api-key"
```
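Before running the agent, it can help to confirm that the exported keys are actually visible to the Python process. This is a minimal standard-library sketch, not part of the VisionAgent API; the variable names match the exports above:

```python
import os

# Check that the API keys exported above are visible to this Python process.
required = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
print("missing keys:", missing or "none")
```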
> **_NOTE:_** We found that using both Anthropic Claude-3.5 and OpenAI o1 together provides the best performance for VisionAgent. If you want to use a different LLM provider, or only one of them, see 'Using Other LLM Providers' below.
## Documentation
[VisionAgent Library Docs](https://landing-ai.github.io/vision-agent/)
## Examples
### Counting cans in an image
You can run VisionAgent in a local Jupyter notebook; see [Counting cans in an image](https://github.com/landing-ai/vision-agent/blob/main/examples/notebooks/counting_cans.ipynb) for a worked example.
### Generating code
You can use VisionAgent to generate code to count the number of people in an image:
```python
from vision_agent.agent import VisionAgentCoderV2
from vision_agent.models import AgentMessage
agent = VisionAgentCoderV2(verbose=True)
code_context = agent.generate_code(
[
AgentMessage(
role="user",
content="Count the number of people in this image",
media=["people.png"]
)
]
)
with open("generated_code.py", "w") as f:
f.write(code_context.code + "\n" + code_context.test)
```
### Using the tools directly
VisionAgent produces code that calls our tools, but you can also use the tools directly.
For example, to detect people in an image and visualize the results:
```python
import vision_agent.tools as T
import matplotlib.pyplot as plt
image = T.load_image("people.png")
dets = T.countgd_object_detection("person", image)
# visualize the countgd bounding boxes on the image
viz = T.overlay_bounding_boxes(image, dets)
# save the visualization to a file
T.save_image(viz, "people_detected.png")
# display the visualization
plt.imshow(viz)
plt.show()
```
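`countgd_object_detection` returns a list of detections that you can post-process before overlaying. Assuming each detection is a dict with a confidence `score` (the exact schema belongs to the library; dummy data stands in for it below), you could keep only high-confidence boxes:

```python
# Dummy detections standing in for countgd_object_detection output; the
# score/label/bbox keys are an assumption about the tool's schema.
dets = [
    {"score": 0.91, "label": "person", "bbox": [0.1, 0.2, 0.3, 0.4]},
    {"score": 0.42, "label": "person", "bbox": [0.5, 0.5, 0.7, 0.9]},
]

# Keep only confident detections before visualizing.
confident = [d for d in dets if d["score"] >= 0.5]
print(len(confident), "of", len(dets), "detections kept")
```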
You can also use the tools for running on video files:
```python
import vision_agent.tools as T
frames_and_ts = T.extract_frames_and_timestamps("people.mp4")
# extract the frames from the frames_and_ts list
frames = [f["frame"] for f in frames_and_ts]
# run the countgd tracking on the frames
tracks = T.countgd_sam2_video_tracking("person", frames)
# visualize the countgd tracking results on the frames and save the video
viz = T.overlay_segmentation_masks(frames, tracks)
T.save_video(viz, "people_detected.mp4")
```
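Tracking every frame of a long video can be slow. Assuming each record from `extract_frames_and_timestamps` carries a `timestamp` in seconds (an assumption about its schema; dummy data is used below), one way to cut the work is to subsample to roughly one frame per second before tracking:

```python
# Dummy frame records standing in for extract_frames_and_timestamps output;
# the "frame"/"timestamp" keys are an assumption about its schema.
frames_and_ts = [{"frame": i, "timestamp": i * 0.25} for i in range(12)]  # 4 fps

# Keep roughly one frame per second by remembering the last kept timestamp.
sampled, last_ts = [], None
for rec in frames_and_ts:
    if last_ts is None or rec["timestamp"] - last_ts >= 1.0:
        sampled.append(rec["frame"])
        last_ts = rec["timestamp"]
print(sampled)  # every 4th frame at 4 fps
```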
## Using Other LLM Providers
You can use other LLM providers by changing `config.py` in the `vision_agent/configs`
directory. For example, to switch to Anthropic, run:
```bash
cp vision_agent/configs/anthropic_config.py vision_agent/configs/config.py
```
> **_NOTE:_** VisionAgent moves fast, and we are constantly updating and changing the library. If you have any questions or need help, please reach out to us on our Discord channel.