Screenwise Framework
====================

A Python framework for screen element detection and interaction using computer vision and machine learning.

Overview
--------

Screenwise provides automated detection and interaction with UI elements through:

* Screenshot capture and analysis
* ML-based element detection
* Coordinate-based interaction
* OCR capabilities
* Debug and capture modes
* Cross-platform support

Installation
------------

.. code-block:: bash

   pip install t-screenwise

Basic Usage
-----------

Initialize Framework
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from t_screenwise.screenwise import Framework

   # Initialize with default settings
   framework = Framework()

   # Initialize with custom settings
   framework = Framework(
       mode="CAPTURE",
       model_path="path/to/model.pth",
       labels="path/to/labels.json",
       device="cpu",
   )

Detect Elements
~~~~~~~~~~~~~~~

.. code-block:: python

   # Get all detected elements
   elements = framework.get()

   # Filter for specific element types
   buttons = framework.get(filter=["button"])
   text = framework.get(filter=["text"])

Interact with Elements
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Click the element
   element.click()

   # Click at a specific position within the element
   element.click(coords="up_right")

   # Type text
   element.send_keys("Hello World")

   # Click, then type
   element.click_and_send_keys("Hello World")

Process OCR Elements
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   framework = Framework()
   results = framework.get(image="path/to/image.png", process_ocr=True)

   # Work with both types of elements
   for element in results:
       if isinstance(element, OCRElement):
           print(f"OCR Text: {element.text} (Confidence: {element.confidence})")
       else:
           print(f"Box Label: {element.label}")

OCR Elements
~~~~~~~~~~~~

* Text content extraction
* Confidence scoring
* Spatial relationship analysis
* Text-based element search
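Since OCR elements expose ``text`` and ``confidence`` attributes, a simple text-based search can be built with a plain list comprehension. The ``OCRElement`` class below is a minimal stand-in for illustration only; in practice you would search the elements returned by ``framework.get(..., process_ocr=True)``.

```python
from dataclasses import dataclass


# Minimal stand-in for the framework's OCRElement, for illustration only;
# the real class is provided by the package.
@dataclass
class OCRElement:
    text: str
    confidence: float


elements = [
    OCRElement("Submit", 0.97),
    OCRElement("Cancel", 0.91),
    OCRElement("Subtitle", 0.80),
]

# Case-insensitive search for OCR elements containing a given string
matches = [e for e in elements if "submit" in e.text.lower()]
# matches -> [OCRElement(text='Submit', confidence=0.97)]
```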

OCR Spatial Analysis
~~~~~~~~~~~~~~~~~~~~

The ``OCRElement`` class provides spatial analysis capabilities through the ``get_nearest_boxes`` method:

.. code-block:: python

   # Get OCR elements from an image
   ocr_elements = framework.get(image="screenshot.png", process_ocr=True)

   # For a specific OCR element, find the nearest elements in all directions
   nearest = ocr_element.get_nearest_boxes(ocr_elements, n=1)

   # Access nearest elements by direction
   right_element = nearest["right"][0]  # Nearest element to the right
   left_element = nearest["left"][0]    # Nearest element to the left
   above_element = nearest["above"][0]  # Nearest element above
   below_element = nearest["below"][0]  # Nearest element below

Features:

* Finds the *n* nearest elements in each direction (right, left, above, below)
* Considers spatial overlap when determining nearest elements
* Returns elements sorted by distance
* Useful for understanding layout and relationships between text elements

Features
--------

Screen Elements
~~~~~~~~~~~~~~~

* Coordinate-based positioning
* Margin calculations
* Drawing capabilities

Interaction
~~~~~~~~~~~

* Mouse and keyboard interaction
* Debug visualization

Operating Modes
~~~~~~~~~~~~~~~

* CAPTURE: Live interaction with screen elements
* DEBUG: Visualization and testing without actual interaction

Configuration
-------------

Labels
~~~~~~

Labels are defined in a JSON file mapping element types to numeric IDs:

.. code-block:: json

   {
     "button": 1,
     "text": 2,
     "input": 3
   }
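To interpret model output, the numeric class IDs typically need to be mapped back to label names. A minimal sketch using only the standard library (the inline JSON string stands in for a labels file on disk):

```python
import json

# Inline stand-in for the contents of a labels file such as labels.json
labels_json = '{"button": 1, "text": 2, "input": 3}'

# Name -> ID mapping, as stored in the file
name_to_id = json.loads(labels_json)

# Reverse lookup: model class ID -> label name
id_to_name = {v: k for k, v in name_to_id.items()}
# id_to_name -> {1: 'button', 2: 'text', 3: 'input'}
```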

Model
~~~~~

Supports custom-trained object detection models:

* Default model trained for common UI elements
* Configurable confidence thresholds
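How the framework applies its confidence threshold internally is not shown here; the idea can be sketched as a plain filter over detection results. The ``Detection`` class and threshold value below are illustrative assumptions, not the package's API:

```python
from dataclasses import dataclass


# Illustrative detection result; the field names mirror the attributes
# (label, confidence) used elsewhere in this README.
@dataclass
class Detection:
    label: str
    confidence: float


detections = [
    Detection("button", 0.92),
    Detection("text", 0.41),
    Detection("input", 0.77),
]

# Keep only detections at or above the chosen confidence threshold
CONFIDENCE_THRESHOLD = 0.5
kept = [d for d in detections if d.confidence >= CONFIDENCE_THRESHOLD]
# kept -> detections labeled 'button' and 'input'
```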

Contributing
------------

1. Clone the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a Pull Request