# Spatial Reasoning
A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.

*Comparison of detection results across different models - showing the superior performance of the advanced reasoning model*
## Features
- **Multiple Detection Models**:
  - Advanced Reasoning Model (OpenAI) - A reasoning model that uses tools and other foundation models to perform object detection
  - Vanilla Reasoning Model - Uses a reasoning model directly to perform object detection
  - Vision Model - GroundingDino + SAM
  - Gemini Model (Google) - Fine-tuned large multimodal model (LMM) for object detection
- **Tool-Use Reasoning**: Our advanced model uses innovative grid-based reasoning for precise object detection

  *How the advanced reasoning model works under the hood - using grid cells for precise localization*
- **Simple API**: One function for all your detection needs
- **CLI Support**: Command-line interface for quick testing
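
To make the grid idea concrete, here is a minimal sketch of how a set of selected grid cells could be mapped back to a pixel bounding box. It is not taken from the package: the helper name `cells_to_bbox`, the 1-based row-major cell numbering, and the `(x_min, y_min, x_max, y_max)` output format are all assumptions chosen for illustration.

```python
from typing import Iterable, Tuple


def cells_to_bbox(
    cells: Iterable[int],
    image_size: Tuple[int, int],
    grid_size: Tuple[int, int],
) -> Tuple[int, int, int, int]:
    """Return the pixel box (x_min, y_min, x_max, y_max) covering the given cells.

    Hypothetical helper: cells are numbered 1..rows*cols in row-major order.
    """
    width, height = image_size
    rows, cols = grid_size
    cells = list(cells)
    if not cells:
        raise ValueError("at least one cell is required")

    row_idx, col_idx = [], []
    for cell in cells:
        if not 1 <= cell <= rows * cols:
            raise ValueError(f"cell {cell} is outside a {rows}x{cols} grid")
        r, c = divmod(cell - 1, cols)  # zero-based (row, column) of this cell
        row_idx.append(r)
        col_idx.append(c)

    # Integer cell edges; the final edge lands exactly on the image border.
    x_min = min(col_idx) * width // cols
    y_min = min(row_idx) * height // rows
    x_max = (max(col_idx) + 1) * width // cols
    y_max = (max(row_idx) + 1) * height // rows
    return x_min, y_min, x_max, y_max


# The four central cells of a 4x4 grid over an 800x600 image.
print(cells_to_bbox([6, 7, 10, 11], image_size=(800, 600), grid_size=(4, 4)))
# (200, 150, 600, 450)
```

A finer grid gives tighter boxes at the cost of more cells for the model to reason over.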
## Installation
```bash
pip install spatial-reasoning
```
Or install from source:
```bash
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
```
### Optional: Flash Attention (for better performance)
Flash Attention can speed up the transformer models. It is optional:
```bash
pip install flash-attn --no-build-isolation
```
Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.
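
To check whether the optional dependency is importable in the current environment, a small standard-library probe is enough. This is only a sketch: `flash_attn` is the import name of the `flash-attn` distribution, and `flash_attention_available` is a hypothetical helper, not part of this package. How the package itself detects Flash Attention is not described here.

```python
import importlib.util


def flash_attention_available() -> bool:
    """Return True if the `flash_attn` module can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


if flash_attention_available():
    print("flash-attn found")
else:
    print("flash-attn not installed; continuing without it")
```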
## Setup
Create a `.env` file in your project root:
```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
```
Get your API keys:
- OpenAI: https://platform.openai.com/api-keys
- Gemini: https://makersuite.google.com/app/apikey
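
Before running a detection, it can help to confirm that the keys are actually visible. The sketch below uses only the standard library and is not part of the package: `parse_env_file` and `missing_keys` are hypothetical helpers, and treating both keys as required is an assumption (you may only need the key for the model you plan to use).

```python
import os
from pathlib import Path
from typing import Dict, List

# Assumption: both variables from the .env example above are wanted.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def parse_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path: str = ".env") -> List[str]:
    """List required keys set neither in the process environment nor in the file."""
    file_values = parse_env_file(path)
    return [k for k in REQUIRED_KEYS if not (os.environ.get(k) or file_values.get(k))]


print("Missing keys:", missing_keys() or "none")
```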
## Quick Start
### Python API
```python
from spatial_reasoning import detect
# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # URL or local file path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model",
)
# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")
# Save the result
visualized_image.save("output.jpg")
```
### Command Line
```bash
# Basic usage
spatial-reasoning --image-path "image.jpg" --object-of-interest "person" # "advanced_reasoning_model" used by default
# With specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"
# From URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
```
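
The `--task-kwargs` value is a JSON string, which is easy to get wrong when quoting by hand. The sketch below builds the same command from Python using only the flags shown above. `build_cli_args` is a hypothetical helper, not part of the package.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def build_cli_args(
    image_path: str,
    object_of_interest: str,
    task_type: Optional[str] = None,
    task_kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Build an argument list for the `spatial-reasoning` command."""
    args = [
        "spatial-reasoning",
        "--image-path", image_path,
        "--object-of-interest", object_of_interest,
    ]
    if task_type is not None:
        args += ["--task-type", task_type]
    if task_kwargs:
        # json.dumps produces valid JSON (double quotes, lowercase true/false).
        args += ["--task-kwargs", json.dumps(task_kwargs)]
    return args


args = build_cli_args(
    "https://example.com/image.jpg",
    "text in image",
    task_type="advanced_reasoning_model",
    task_kwargs={"nms_threshold": 0.7},
)
print(shlex.join(args))  # shell-quoted command line, ready to paste
# subprocess.run(args, check=True) would run it without going through a shell.
```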
### Available Models
- `advanced_reasoning_model` (default) - Best accuracy, uses tool-use reasoning
- `vanilla_reasoning_model` - Faster; uses a reasoning model directly, without tool use
- `vision_model` - Uses GroundingDino + (optional) SAM2 for segmentation
- `gemini` - Google's Gemini model
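
To see how the models differ on your own image, you can loop over the task types and compare the number of boxes each returns. This is a sketch, not part of the package: `count_detections` is a hypothetical helper, and it relies only on the `detect` keyword arguments and the `'bboxs'` result key shown in the Python API example. The detection function is passed in so the helper can be tried with a stub.

```python
from typing import Any, Callable, Dict, Iterable

TASK_TYPES = (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
)


def count_detections(
    detect_fn: Callable[..., Dict[str, Any]],
    image_path: str,
    object_of_interest: str,
    task_types: Iterable[str] = TASK_TYPES,
) -> Dict[str, Any]:
    """Run detect_fn once per task type and record how many boxes each returns.

    A failure for one model (missing API key, missing weights, ...) is stored
    as an error string so the remaining models still run.
    """
    summary: Dict[str, Any] = {}
    for task_type in task_types:
        try:
            result = detect_fn(
                image_path=image_path,
                object_of_interest=object_of_interest,
                task_type=task_type,
            )
            summary[task_type] = len(result["bboxs"])
        except Exception as exc:  # keep going; report the failure instead
            summary[task_type] = f"error: {exc}"
    return summary


# Usage, assuming the package is installed and API keys are configured:
# from spatial_reasoning import detect
# print(count_detections(detect, "image.jpg", "person"))
```

Box counts alone do not measure accuracy, so inspect each `visualized_image` as well before settling on a model.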
## License
MIT License
Raw data
{
"_id": null,
"home_page": "https://github.com/QasimWani/spatial-reasoning",
"name": "spatial-reasoning",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "computer vision, object detection, AI, machine learning, OpenAI, Gemini",
"author": "Qasim Wani",
"author_email": "Qasim Wani <qasim31wani@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/78/36/7ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b/spatial_reasoning-0.1.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A PyPI package for object detection using advanced vision models",
"version": "0.1.8",
"project_urls": {
"Bug Tracker": "https://github.com/QasimWani/spatial-reasoning/issues",
"Documentation": "https://github.com/QasimWani/spatial-reasoning#readme",
"Homepage": "https://github.com/QasimWani/spatial-reasoning",
"Source Code": "https://github.com/QasimWani/spatial-reasoning"
},
"split_keywords": [
"computer vision",
" object detection",
" ai",
" machine learning",
" openai",
" gemini"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "72e4d11fc9775533584d4ef50bae1e9352320baf52161e38856e20c06a380f0a",
"md5": "d77ceb685de128dc9d85d2dde3eeb962",
"sha256": "cb3cab1b3d30750178110052eb23df286d208433d5af623620a1f7ca504e5ad3"
},
"downloads": -1,
"filename": "spatial_reasoning-0.1.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d77ceb685de128dc9d85d2dde3eeb962",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 47792,
"upload_time": "2025-08-04T09:48:11",
"upload_time_iso_8601": "2025-08-04T09:48:11.241237Z",
"url": "https://files.pythonhosted.org/packages/72/e4/d11fc9775533584d4ef50bae1e9352320baf52161e38856e20c06a380f0a/spatial_reasoning-0.1.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "78367ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b",
"md5": "4bc4e116248338f5d308123e13ec3ff6",
"sha256": "73b8fdbd771fe2a57d983f00a81bdd99aeb94f75bf841c35660a5e14760a9ce3"
},
"downloads": -1,
"filename": "spatial_reasoning-0.1.8.tar.gz",
"has_sig": false,
"md5_digest": "4bc4e116248338f5d308123e13ec3ff6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 38511,
"upload_time": "2025-08-04T09:48:12",
"upload_time_iso_8601": "2025-08-04T09:48:12.164779Z",
"url": "https://files.pythonhosted.org/packages/78/36/7ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b/spatial_reasoning-0.1.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-04 09:48:12",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "QasimWani",
"github_project": "spatial-reasoning",
"github_not_found": true,
"lcname": "spatial-reasoning"
}