spatial-reasoning

Name	spatial-reasoning
Version	0.1.8
home_page	https://github.com/QasimWani/spatial-reasoning
Summary	A PyPI package for object detection using advanced vision models
upload_time	2025-08-04 09:48:12
maintainer	None
docs_url	None
author	Qasim Wani
requires_python	>=3.8
license	MIT
keywords	computer vision object detection ai machine learning openai gemini

# Spatial Reasoning

A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.

![Example Results](assets/example_results.png)
*Comparison of detection results across different models - showing the superior performance of the advanced reasoning model*

## Features

- **Multiple Detection Models**: 
  - Advanced Reasoning Model (OpenAI) - Reasoning model that leverages tools and other foundation models to perform object detection
  - Vanilla Reasoning Model - Directly using a reasoning model to perform object detection
  - Vision Model - GroundingDino + SAM
  - Gemini Model (Google) - Fine-tuned LMM for object detection

- **Tool-Use Reasoning**: The advanced model overlays a grid on the image and reasons over grid cells to localize objects precisely
  
  ![Internal Workings](assets/internal_workings.png)
  *How the advanced reasoning model works under the hood - using grid cells for precise localization*

- **Simple API**: One function for all your detection needs
- **CLI Support**: Command-line interface for quick testing
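
To make the grid idea concrete, here is a minimal sketch of how numbered grid cells can be mapped back to a pixel bounding box. It is an illustration only, not the package's implementation: the function names, the row-major cell numbering, and the 4x4 grid are all assumptions.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def cell_to_box(cell: int, width: int, height: int, rows: int, cols: int) -> Box:
    """Return the pixel box of a cell numbered row-major from 0 (assumed scheme)."""
    if not 0 <= cell < rows * cols:
        raise ValueError(f"cell {cell} is outside a {rows}x{cols} grid")
    row, col = divmod(cell, cols)
    # Integer arithmetic keeps adjacent cells flush, with no gaps or overlaps.
    x_min, x_max = col * width // cols, (col + 1) * width // cols
    y_min, y_max = row * height // rows, (row + 1) * height // rows
    return (x_min, y_min, x_max, y_max)


def cells_to_box(cells: Iterable[int], width: int, height: int, rows: int, cols: int) -> Box:
    """Return the smallest box that covers every selected cell."""
    boxes = [cell_to_box(c, width, height, rows, cols) for c in cells]
    if not boxes:
        raise ValueError("at least one cell is required")
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


if __name__ == "__main__":
    # Cells 5 and 10 of a hypothetical 4x4 grid over a 400x200 image.
    print(cells_to_box([5, 10], width=400, height=200, rows=4, cols=4))
```

A model that can only name cells still yields a usable box this way, and a finer grid over the selected region would tighten it further.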

## Installation

```bash
pip install spatial-reasoning
```

Or install from source:
```bash
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
```

### Optional: Flash Attention (for better performance)

For improved performance with transformer models, you can optionally install Flash Attention:

```bash
pip install flash-attn --no-build-isolation
```

Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.

## Setup

Create a `.env` file in your project root:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
```

Get your API keys:
- OpenAI: https://platform.openai.com/api-keys
- Gemini: https://makersuite.google.com/app/apikey
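
If a call fails with an authentication error, it can help to confirm that both keys are actually readable. The sketch below parses a `.env` file using only the standard library and reports which keys are missing. `read_env_file` and `missing_keys` are hypothetical helpers for illustration, not part of `spatial-reasoning`, and the parser handles only simple `KEY=value` lines.

```python
import os
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path: str = ".env") -> List[str]:
    """Return required keys found neither in the environment nor in the file."""
    file_values = read_env_file(path)
    return [k for k in REQUIRED_KEYS if not (os.environ.get(k) or file_values.get(k))]


if __name__ == "__main__":
    missing = missing_keys()
    print("All keys found" if not missing else f"Missing keys: {', '.join(missing)}")
```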

## Quick Start

### Python API

```python
from spatial_reasoning import detect

# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # URL or local file path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model"
)

# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")

# Save the result
visualized_image.save("output.jpg")
```
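
The returned boxes can be post-processed with plain Python. The sketch below assumes each entry of `result['bboxs']` is an `(x_min, y_min, x_max, y_max)` sequence in pixel coordinates. That layout is an assumption, not something this README states, so inspect the actual output before relying on it. `box_area` and `largest_box` are hypothetical helpers.

```python
from typing import Any, Dict, Optional, Sequence


def box_area(box: Sequence[float]) -> float:
    """Area of an (x_min, y_min, x_max, y_max) box; degenerate boxes give 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def largest_box(result: Dict[str, Any]) -> Optional[Sequence[float]]:
    """Return the largest box in result['bboxs'], or None if nothing was found."""
    boxes = result.get("bboxs") or []
    return max(boxes, key=box_area, default=None)


if __name__ == "__main__":
    # Stand-in for a real detect() result, using the assumed box layout.
    fake_result = {"bboxs": [(10, 10, 50, 40), (0, 0, 20, 20)]}
    print(largest_box(fake_result))
```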

### Command Line

```bash
# Basic usage
spatial-reasoning --image-path "image.jpg" --object-of-interest "person"  # "advanced_reasoning_model" used by default

# With specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"

# From URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
```
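
Hand-quoting the JSON passed to `--task-kwargs` is easy to get wrong. One option is to build the command from Python, where `json.dumps` produces the JSON and an argument list avoids shell quoting altogether. `build_command` is a hypothetical helper; the flags are the ones shown above.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def build_command(
    image_path: str,
    object_of_interest: str,
    task_type: str = "advanced_reasoning_model",
    task_kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Assemble the CLI invocation as an argument list."""
    cmd = [
        "spatial-reasoning",
        "--image-path", image_path,
        "--object-of-interest", object_of_interest,
        "--task-type", task_type,
    ]
    if task_kwargs:
        # json.dumps guarantees valid JSON (double quotes, no trailing commas).
        cmd += ["--task-kwargs", json.dumps(task_kwargs)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("image.jpg", "text in image", task_kwargs={"nms_threshold": 0.7})
    # shlex.join gives a copy-pasteable, correctly quoted shell line.
    print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)` once the package is installed.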

### Available Models

- `advanced_reasoning_model` (default) - Best accuracy, uses tool-use reasoning
- `vanilla_reasoning_model` - Faster, standard detection
- `vision_model` - Uses GroundingDino + (optional) SAM2 for segmentation
- `gemini` - Google's Gemini model
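
Because every model is selected through the same `task_type` argument, comparing them on one image is a short loop. The sketch below takes the detection function as a parameter so it can be exercised without API keys. `count_detections` is a hypothetical helper, and it assumes each call returns a dict with a `bboxs` list, as in the Python API example.

```python
from typing import Any, Callable, Dict, Sequence

TASK_TYPES = (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
)


def count_detections(
    detect_fn: Callable[..., Dict[str, Any]],
    image_path: str,
    object_of_interest: str,
    task_types: Sequence[str] = TASK_TYPES,
) -> Dict[str, int]:
    """Run detect_fn once per task type and count the boxes each one returns."""
    counts: Dict[str, int] = {}
    for task_type in task_types:
        result = detect_fn(
            image_path=image_path,
            object_of_interest=object_of_interest,
            task_type=task_type,
        )
        counts[task_type] = len(result["bboxs"])
    return counts


if __name__ == "__main__":
    # Stub standing in for spatial_reasoning.detect so the sketch runs offline.
    def fake_detect(image_path: str, object_of_interest: str, task_type: str) -> Dict[str, Any]:
        return {"bboxs": [(0, 0, 1, 1)] if task_type == "gemini" else []}

    print(count_detections(fake_detect, "image.jpg", "person"))
```

With the package installed, pass the real function instead: `count_detections(detect, "image.jpg", "person")`.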

## License

MIT License

Raw data

{
    "_id": null,
    "home_page": "https://github.com/QasimWani/spatial-reasoning",
    "name": "spatial-reasoning",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "computer vision, object detection, AI, machine learning, OpenAI, Gemini",
    "author": "Qasim Wani",
    "author_email": "Qasim Wani <qasim31wani@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/78/36/7ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b/spatial_reasoning-0.1.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A PyPI package for object detection using advanced vision models",
    "version": "0.1.8",
    "project_urls": {
        "Bug Tracker": "https://github.com/QasimWani/spatial-reasoning/issues",
        "Documentation": "https://github.com/QasimWani/spatial-reasoning#readme",
        "Homepage": "https://github.com/QasimWani/spatial-reasoning",
        "Source Code": "https://github.com/QasimWani/spatial-reasoning"
    },
    "split_keywords": [
        "computer vision",
        " object detection",
        " ai",
        " machine learning",
        " openai",
        " gemini"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "72e4d11fc9775533584d4ef50bae1e9352320baf52161e38856e20c06a380f0a",
                "md5": "d77ceb685de128dc9d85d2dde3eeb962",
                "sha256": "cb3cab1b3d30750178110052eb23df286d208433d5af623620a1f7ca504e5ad3"
            },
            "downloads": -1,
            "filename": "spatial_reasoning-0.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d77ceb685de128dc9d85d2dde3eeb962",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 47792,
            "upload_time": "2025-08-04T09:48:11",
            "upload_time_iso_8601": "2025-08-04T09:48:11.241237Z",
            "url": "https://files.pythonhosted.org/packages/72/e4/d11fc9775533584d4ef50bae1e9352320baf52161e38856e20c06a380f0a/spatial_reasoning-0.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "78367ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b",
                "md5": "4bc4e116248338f5d308123e13ec3ff6",
                "sha256": "73b8fdbd771fe2a57d983f00a81bdd99aeb94f75bf841c35660a5e14760a9ce3"
            },
            "downloads": -1,
            "filename": "spatial_reasoning-0.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "4bc4e116248338f5d308123e13ec3ff6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 38511,
            "upload_time": "2025-08-04T09:48:12",
            "upload_time_iso_8601": "2025-08-04T09:48:12.164779Z",
            "url": "https://files.pythonhosted.org/packages/78/36/7ed7e5fa17c15367e993e24b484f75b363be1dfb61f51e23282023b46c3b/spatial_reasoning-0.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-04 09:48:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "QasimWani",
    "github_project": "spatial-reasoning",
    "github_not_found": true,
    "lcname": "spatial-reasoning"
}
