smartlocate

Name	smartlocate
Version	2025.2.11.post1
home_page	https://github.com/NormanTUD/smartlocate
Summary	Similar to locate, but less stupid
upload_time	2025-02-11 10:09:32
maintainer	None
docs_url	None
author	Norman Koch
requires_python	None
license	None
keywords
VCS
bugtrack_url
requirements	torch torchvision yolov5 rich sixel easyocr transformers face-recognition rich-argparse pypandoc pdfplumber imageio pyzbar typeguard mypy types-requests flake8 pylint flameprof python-crontab
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# smartlocate - Intelligent File Indexer

smartlocate is a tool for Linux that uses YOLO and several other AI models to detect objects in images, describe images, and extract text. It stores the detected objects, image descriptions, text contents, and so on in a local SQLite database and makes them searchable. Everything runs locally, and no GPU is required.

If `--ocr` is set while indexing, all images are also run through OCR and the recognized text becomes searchable. You can set the OCR language with, for example, `--lang_ocr tr`. The default is `["de", "en"]`.

If `--describe` is set while indexing, the model `Salesforce/blip-image-captioning-large` generates descriptions of the images automatically, and these descriptions can be searched as well.

## Quickstart

```bash
# Install the tool
python3 -m venv ~/smartlocate/
source ~/smartlocate/bin/activate
pip install smartlocate

# Index files (using all available indexing methods)
smartlocate ~/Documents --index

# Index files (using OCR, face recognition, and QR code detection)
smartlocate ~/Documents --index --ocr --face_recognition --qrcodes

# Index files (using all available indexing methods) and re-run hourly (does not work with new faces)
smartlocate ~/Documents --index --run_hourly

# Search for cats
smartlocate "cat"

# Search for cat in /home/username/Documents
smartlocate cat /home/username/Documents

# Search for "cat and dog" (order doesn't matter) in /home/username/Documents
smartlocate cat and dog /home/username/Documents

# Search for "cat and dog" (exactly in that order) in /home/username/Documents
smartlocate "cat and dog" /home/username/Documents --exact

# Help
smartlocate --help
```

## Screenshots

### Indexing

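This shows the indexing process with `--face_recognition` enabled. smartlocate asks for a name the first time it sees a face; afterwards it recognizes that face automatically and associates it with the name, which makes the person easy to search for.

<p align="center">
<img src="https://raw.githubusercontent.com/NormanTUD/smartlocate/refs/heads/main/images/index.gif" alt="Indexing" width="1046"/>
</p>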

### Face recognition while indexing

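While indexing with `--face_recognition`, faces are recognized. If a face cannot be identified automatically, smartlocate asks you for the person's name. In later images, this person will most likely be recognized without any intervention.

<p align="center">
<img src="https://raw.githubusercontent.com/NormanTUD/smartlocate/refs/heads/main/images/face_recognition.gif" alt="Face Recognition" width="1046"/>
</p>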

If you don't want to sit through the prompts during a long indexing run, you can run smartlocate with `--dont_ask_new_faces`. This skips images in which people are found but cannot be identified. That way, you can index a whole folder overnight without manual intervention and then run the indexing again without that option, so that you are asked about all new faces in one go instead of waiting a long time between prompts.
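The two-pass workflow described above could be scripted roughly as in the following sketch. It only uses flags that appear in this README and passes the directory positionally, as the Quickstart does. The helper names `face_indexing_passes` and `run_passes` are made up for illustration and are not part of smartlocate.

```python
from __future__ import annotations

import shutil
import subprocess


def face_indexing_passes(directory: str) -> list[list[str]]:
    """Return the argument lists for an unattended pass and an interactive follow-up pass."""
    base = ["smartlocate", directory, "--index", "--face_recognition"]
    return [
        base + ["--dont_ask_new_faces"],  # pass 1: skip images with unknown faces, no prompts
        base,                             # pass 2: prompts for the faces that were skipped
    ]


def run_passes(directory: str) -> None:
    """Run both passes in order; abort if smartlocate is missing or a pass fails."""
    if shutil.which("smartlocate") is None:
        raise SystemExit("smartlocate is not on PATH; activate its virtualenv first")
    for argv in face_indexing_passes(directory):
        subprocess.run(argv, check=True)
```

Calling `run_passes(os.path.expanduser("~/Pictures"))` (after `import os`) would start the unattended pass first. Note that `subprocess` does not expand `~` on its own, so the path has to be expanded beforehand.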

## Searching

### Images of cats and dogs

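These images were not labelled manually. The labels were found by AI!

<p align="center">
<img src="https://raw.githubusercontent.com/NormanTUD/smartlocate/refs/heads/main/images/dog.gif" alt="Search: Dog" width="1046"/>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/NormanTUD/smartlocate/refs/heads/main/images/cat.gif" alt="Search: Cat" width="1046"/>
</p>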

### Searching through Documents

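This is a search on OCR'ed documents.

<p align="center">
<img src="https://raw.githubusercontent.com/NormanTUD/smartlocate/refs/heads/main/images/ocr.gif" alt="OCR" width="1046"/>
</p>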

## Features

- Easy to install and use.
- Object detection in images using YOLO.
- OCR via easyocr when `--ocr` is set during indexing. Searches can use `%` as a wildcard.
- QR code detection and indexing.
- Documents are converted with pandoc. Supported document types are: `['.doc', '.docx', '.pptx', '.ppt', '.odp', '.odt', '.md', '.txt', '.pdf']`. Use `--documents` while indexing to make documents searchable.
- Stores detected objects in a local SQLite database (`~/.smartlocate_db`).
- Fast searching for specific objects in images.
- Supports Sixel graphics for visualizing results.
- Automatic face recognition (use `--face_recognition` while indexing). smartlocate asks you for each person's name, ideally only once, so that it can recognize them automatically later on. If the images are very different, you may have to label a person a few times.
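How a `%` wildcard behaves can be illustrated with plain SQLite. The sketch below is not smartlocate's code: it assumes that the wildcard is passed through to an SQL `LIKE` query, which this README does not state, and the table `ocr_results` with its columns is purely hypothetical.

```python
import sqlite3

# Hypothetical stand-in for an OCR index; smartlocate's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ocr_results (file_path TEXT, extracted_text TEXT)")
conn.executemany(
    "INSERT INTO ocr_results VALUES (?, ?)",
    [
        ("/scans/invoice_2023.png", "Invoice number 4711"),
        ("/scans/letter.png", "Dear Sir or Madam"),
        ("/scans/receipt.png", "Total: 12.50 EUR, invoice attached"),
    ],
)


def search_text(pattern):
    """Return paths whose text matches a LIKE pattern; % matches any run of characters."""
    rows = conn.execute(
        "SELECT file_path FROM ocr_results "
        "WHERE extracted_text LIKE ? ORDER BY file_path",
        (pattern,),
    )
    return [row[0] for row in rows]


print(search_text("invoice%"))   # text that starts with "invoice"
print(search_text("%invoice%"))  # text that contains "invoice" anywhere
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so `invoice%` also matches text that begins with "Invoice".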

## Installation

### Get latest official release

This installs the latest officially released version from <a href="https://pypi.org/project/smartlocate">PyPI</a>.

```bash
python3 -m venv ~/smartlocate/
source ~/smartlocate/bin/activate
pip3 install smartlocate
```

### Run latest version

1. Clone the repository:

```bash
git clone --depth 1 https://github.com/NormanTUD/smartlocate.git
```

2. Change into the directory and run smartlocate. The first run also installs the tool:

```bash
cd smartlocate
./smartlocate --index --dir ~/Pictures
```

smartlocate automatically installs all necessary dependencies on first execution, and YOLO is already included. This may take some time, but it only has to be done once.

## Usage

### Indexing Images

To index images in a specific directory, run the following command:

```bash
smartlocate --dir /path/to/images --index
```

YOLO and an image description model are used to detect and describe objects in the images, pandoc is used to index documents, and the results are stored in the database.

You need to re-run the indexing every time images are added or changed.

### Searching for Objects

To search for a specific object (e.g., "cat"), run the following command:

```bash
smartlocate cat
```

The tool will search the indexed images for the object and display the results.

## Options

- `--index`: Indexes images in the specified directory.
- `--size SIZE`: Specifies the size to which images should be resized when indexing. Default is 400.
- `--dir DIR`: Specifies the directory to search or index.
- `--debug`: Enables debug mode to output detailed logs.
- `--no_sixel`: Hide Sixel graphics.
- `--qrcodes`: Enables indexing of QR codes; when searching, restricts the search to QR codes.
- `--describe`: Also saves AI-generated descriptions of images and makes them searchable.
- `--exact`: Searches for exactly what was entered, without splitting it into separate terms.
- `--ocr`: Enables OCR.
- `--documents`: Enables indexing of documents.
- `--lang_ocr`: OCR languages. Accepts multiple languages. Default: de, en.
- `--delete_non_existing_files`: Deletes non-existing files from the database.
- `--shuffle_index`: Shuffles the list of files before indexing.
- `--model MODEL`: Specifies the YOLO model for object detection.
- `--threshold THRESHOLD`: Sets the confidence threshold for object detection (0-1).
- `--dbfile DBFILE`: Specifies the path to the SQLite database file.
- `--exclude PATH`: Excludes a path from indexing/searching. Can be used multiple times.
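- `--dont_ask_new_faces`: Don't ask for new faces (useful for automatically tagging all photos that can be tagged automatically).
- `--face_recognition`: Enables face recognition while indexing.
- `--run_hourly`: Re-runs the indexing hourly (does not work with new faces).
- `--vacuum`: Vacuums the database file to free unused space.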
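To make the idea behind `--delete_non_existing_files` concrete, here is a minimal sketch of pruning stale paths from an SQLite index. It is an assumption about the general approach, not smartlocate's implementation: the table `indexed_files`, the column `file_path`, and the function `prune_missing_files` are hypothetical.

```python
import os
import sqlite3
import tempfile


def prune_missing_files(conn):
    """Delete rows whose file no longer exists on disk; return how many were removed."""
    paths = [row[0] for row in conn.execute("SELECT file_path FROM indexed_files")]
    missing = [(path,) for path in paths if not os.path.exists(path)]
    with conn:  # commits on success, rolls back on error
        conn.executemany("DELETE FROM indexed_files WHERE file_path = ?", missing)
    return len(missing)


if __name__ == "__main__":
    # Small demo with one existing file and one path that was never created.
    with tempfile.TemporaryDirectory() as tmp:
        existing = os.path.join(tmp, "cat.jpg")
        open(existing, "w").close()
        demo = sqlite3.connect(":memory:")
        demo.execute("CREATE TABLE indexed_files (file_path TEXT)")
        demo.executemany(
            "INSERT INTO indexed_files VALUES (?)",
            [(existing,), (os.path.join(tmp, "deleted.jpg"),)],
        )
        print(prune_missing_files(demo), "stale entry removed")
        demo.close()
```

Collecting the missing paths first and deleting them in a single transaction keeps the database consistent if the run is interrupted.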

## Example Commands

### Indexing images in a directory:

```bash
smartlocate --dir /home/user/images --index
```

### Search for images containing the object "cat":

```bash
smartlocate cat
```

### Indexing:

Indexing with YOLO, Description and OCR:

```bash
smartlocate --dir /home/user/images --index
```

## Database

The results of indexing are stored in the SQLite database `~/.smartlocate_db`. This database contains information about the objects detected in the images. The indexing must be re-run whenever images are added or changed.
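Because the index is an ordinary SQLite file, it can be inspected with Python's standard `sqlite3` module. The sketch below lists the tables without assuming anything about their names or columns; the helper `list_tables` is illustrative and not part of smartlocate. It opens the file read-only so that inspection cannot modify the index.

```python
import os
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    db_file = os.path.expanduser("~/.smartlocate_db")  # default location per this README
    if os.path.exists(db_file):
        print(list_tables(db_file))
    else:
        print("No index found yet; run smartlocate with --index first.")
```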

## Manage single images

Simply run `smartlocate /path/to/an/image/file.jpg` to see an overview of the image file's data and modify it.

## Requirements

- Python 3.x
- All Python dependencies are installed automatically the first time the tool is run.

## Ideas

One idea for the future is to extend this kind of content-based search to formats other than images. Imagine you could say:

```bash
smartlocate "text about cats"
```

and get all `.txt`, `.md`, `.docx`, `.tex`, and similar files that contain something about cats. Currently, documents are only indexed for full-text search.

Same for videos and audio files. If someone wants to do it, feel free to contribute!

## Troubleshooting

### The SQLite file is too large

When the SQLite file gets too large, you can vacuum it:

```bash
smartlocate --vacuum
```

This does not delete any data. It only releases space that the database file has claimed but no longer uses.
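What vacuuming does can be seen with plain SQLite. This sketch does not call smartlocate; it assumes that `--vacuum` corresponds to SQLite's `VACUUM` statement, which this README does not say explicitly. The table `blobs` and the function `vacuum_demo` exist only for the demonstration.

```python
import os
import sqlite3
import tempfile


def vacuum_demo():
    """Fill a throwaway database, delete most rows, and compare file sizes around VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE blobs (payload TEXT)")
        conn.executemany(
            "INSERT INTO blobs VALUES (?)", [("x" * 1000,) for _ in range(5000)]
        )
        conn.commit()

        # Deleting rows marks their pages as free but does not shrink the file.
        conn.execute("DELETE FROM blobs WHERE rowid % 10 != 0")
        conn.commit()
        size_before = os.path.getsize(db_path)

        # VACUUM rebuilds the file without the free pages; remaining rows are kept.
        conn.execute("VACUUM")
        size_after = os.path.getsize(db_path)
        rows_left = conn.execute("SELECT COUNT(*) FROM blobs").fetchone()[0]
        conn.close()
    return size_before, size_after, rows_left


before, after, rows = vacuum_demo()
print(f"before: {before} bytes, after: {after} bytes, rows kept: {rows}")
```

The row count is the same before and after `VACUUM`; only the size of the file on disk changes.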

## License

Licensed under GPL2.

Raw data

{
    "_id": null,
    "home_page": "https://github.com/NormanTUD/smartlocate",
    "name": "smartlocate",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Norman Koch",
    "author_email": "norman.koch@tu-dresden.de",
    "download_url": "https://files.pythonhosted.org/packages/07/04/149e0caff6b82f8096f3cf13e2b27f0394427b2b5691340dfae99afc43ee/smartlocate-2025.2.11.post1.tar.gz",
    "platform": "Linux",
    "bugtrack_url": null,
    "license": null,
    "summary": "Similiar to locate, but less stupid",
    "version": "2025.2.11.post1",
    "project_urls": {
        "Homepage": "https://github.com/NormanTUD/smartlocate"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dff4540ef4be80ae9be86fe701ef86dbb6e41cf06c9108f8e609a19f13ba97f2",
                "md5": "4836c74de36ba98838b8ccb1cc3857bd",
                "sha256": "cce06900044c68ab554c31199b7fe185da4ef8ab05f9d8242113c1c4ef63b0d5"
            },
            "downloads": -1,
            "filename": "smartlocate-2025.2.11.post1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4836c74de36ba98838b8ccb1cc3857bd",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 86055,
            "upload_time": "2025-02-11T10:09:30",
            "upload_time_iso_8601": "2025-02-11T10:09:30.716283Z",
            "url": "https://files.pythonhosted.org/packages/df/f4/540ef4be80ae9be86fe701ef86dbb6e41cf06c9108f8e609a19f13ba97f2/smartlocate-2025.2.11.post1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0704149e0caff6b82f8096f3cf13e2b27f0394427b2b5691340dfae99afc43ee",
                "md5": "f46f3cf3e5b152ba481dbbf852ea2f78",
                "sha256": "bf5f9c02b639fbf390b175c66d07142cc04083b9a7c1e1fec4a8bcd0a11560e8"
            },
            "downloads": -1,
            "filename": "smartlocate-2025.2.11.post1.tar.gz",
            "has_sig": false,
            "md5_digest": "f46f3cf3e5b152ba481dbbf852ea2f78",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 37337,
            "upload_time": "2025-02-11T10:09:32",
            "upload_time_iso_8601": "2025-02-11T10:09:32.803326Z",
            "url": "https://files.pythonhosted.org/packages/07/04/149e0caff6b82f8096f3cf13e2b27f0394427b2b5691340dfae99afc43ee/smartlocate-2025.2.11.post1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-11 10:09:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "NormanTUD",
    "github_project": "smartlocate",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "torchvision",
            "specs": []
        },
        {
            "name": "yolov5",
            "specs": []
        },
        {
            "name": "rich",
            "specs": []
        },
        {
            "name": "sixel",
            "specs": []
        },
        {
            "name": "easyocr",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "face-recognition",
            "specs": []
        },
        {
            "name": "rich-argparse",
            "specs": []
        },
        {
            "name": "pypandoc",
            "specs": []
        },
        {
            "name": "pdfplumber",
            "specs": []
        },
        {
            "name": "imageio",
            "specs": []
        },
        {
            "name": "pyzbar",
            "specs": []
        },
        {
            "name": "typeguard",
            "specs": []
        },
        {
            "name": "mypy",
            "specs": []
        },
        {
            "name": "types-requests",
            "specs": []
        },
        {
            "name": "flake8",
            "specs": []
        },
        {
            "name": "pylint",
            "specs": []
        },
        {
            "name": "flameprof",
            "specs": []
        },
        {
            "name": "python-crontab",
            "specs": []
        }
    ],
    "lcname": "smartlocate"
}
