animl 1.2.0

- Name: animl
- Version: 1.2.0
- Summary: Tools for classifying camera trap images
- Homepage: https://github.com/conservationtechlab/animl-py
- Author email: Kyra Swanson <tswanson@sdzwa.org>
- Requires Python: >=3.7
- Keywords: camera trap, ecology, conservation, zoo, SDZWA, conservationtechlab
- Uploaded: 2024-03-18 22:28:10
# animl-py
AniML comprises a variety of machine learning tools for analyzing ecological data. This Python package includes a set of functions to classify subjects within camera trap field data and can handle both images and videos. 
This package is also available in R: [animl](https://github.com/conservationtechlab/animl)

Table of Contents
1. [Installation](#installation-instructions)
2. [Usage](#usage)
3. [Models](#models)

## Installation Instructions

It is recommended that you set up a conda environment for using animl.
See **Dependencies** below for more detail. You will need to activate the conda environment
each time you run AniML from a new terminal.

### From GitHub
```
git clone https://github.com/conservationtechlab/animl-py.git
cd animl-py
conda env create --file environment.yml
conda activate animl-gpu
pip install -e .
```

### From PyPI
With NVIDIA GPU
```
conda create -n animl-gpu python=3.7
conda activate animl-gpu
conda install cudatoolkit=11.3.1 cudnn=8.2.1
pip install animl
```
CPU only
```
conda create -n animl-cpu python=3.7
conda activate animl-cpu
pip install animl
```

### Dependencies
We recommend running AniML on GPU-enabled hardware. **If using an NVIDIA GPU, ensure that drivers, cuda-toolkit, and cudnn are installed.**
The `/models/` and `/utils/` modules are from the YOLOv5 repository: https://github.com/ultralytics/yolov5

Python Package Dependencies
- pandas = 1.3.5
- tensorflow = 2.6
- torch = 1.13.1
- torchvision = 0.14.1
- numpy = 1.19.5
- cudatoolkit = 11.3.1 **
- cudnn = 8.2.1 **

A full list of dependencies can be found in `environment.yml`.
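
After installing, a quick way to confirm that PyTorch can see your GPU is the check below (a minimal sketch; torch is already one of the dependencies listed above):
```python
import torch

print(torch.__version__)          # should report the installed torch version
print(torch.cuda.is_available())  # True only if the NVIDIA driver and CUDA stack are visible
```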

### Verify Install 
We recommend downloading the [examples](https://github.com/conservationtechlab/animl-py/blob/main/examples/Southwest.zip) zip folder from this repository.
Download and unarchive it. Then, with the conda environment active:
```
python -m animl /path/to/example/folder
```
This should create an Animl-Directory subfolder within
the example folder.
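
To see where these outputs will go, you can inspect the paths exposed by the `WorkingDirectory` helper used throughout the Usage section below (a sketch; the attribute names are taken from the inference examples in this README, and the folder path is a placeholder):
```python
from animl import file_management

# Point at the unarchived example folder
workingdir = file_management.WorkingDirectory('/path/to/example/folder')
print(workingdir.filemanifest)  # file manifest csv
print(workingdir.mdresults)     # MegaDetector results
print(workingdir.results)       # final classification results
```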

Or, if using your own data and models, animl can be given the paths to those files directly:
```
python -m animl /path/to/example/folder /path/to/megadetector /path/to/classifier /path/to/classlist.txt
```

## Usage

### Inference
The functionality of animl can be broken down into its individual functions to suit your data and scripting needs.
The sandbox.ipynb notebook has all of these steps available for further exploration.

1. It is recommended that you use the animl working directory for storing intermediate steps.
```python
from animl import file_management
# imagedir is the path to your folder of images/videos
workingdir = file_management.WorkingDirectory(imagedir)
```

2. Build the file manifest of your given directory. This will find both images and videos.
```python
files = file_management.build_file_manifest('/path/to/images', out_file=workingdir.filemanifest)
```

3. If there are videos, extract individual frames for processing.
   Select either the number of frames or the fps using the arguments.
   The other option can be set to None or removed.
```python
from animl import video_processing
allframes = video_processing.images_from_videos(files, out_dir=workingdir.vidfdir,
                                                out_file=workingdir.imageframes,
                                                parallel=True, frames=3, fps=None)
```
4. Pass all images into MegaDetector. We recommend [MDv5a](https://github.com/agentmorris/MegaDetector/releases/download/v5.0/md_v5a.0.0.pt).
   parse_MD will merge detections with the original file manifest, if provided.

```python
from animl import detect, megadetector
detector = megadetector.MegaDetector('/path/to/mdmodel.pt')
mdresults = detect.detect_MD_batch(detector, allframes["Frame"], quiet=True)
mdres = detect.parse_MD(mdresults, out_file=workingdir.mdresults, threshold=0)
frame_bbox = allframes.merge(mdres, left_on="Frame", right_on="file")
```
5. For speed and efficiency, extract the empty/human/vehicle detections before classification.
```python
from animl import split
animals = split.get_animals(frame_bbox)
empty = split.get_empty(frame_bbox)
```
6. Classify using the appropriate species model. Merge the output with the rest of the detections
   if desired.
```python
from animl import classifiers, inference
import pandas as pd

classifier, class_list = classifiers.load_model('/path/to/model', '/path/to/classlist.txt')
predresults = inference.predict_species(animals, classifier, class_list, batch=16)
manifest = pd.concat([animals, empty])
manifest.to_csv(workingdir.results)
```

7. (OPTIONAL) Save the required columns of the pandas DataFrame to csv, then use it to create a json for Timelapse compatibility.

```python
from animl import timelapse, animl_results_to_md_results
csv_loc = timelapse.csv_converter(animals, empty, imagedir, only_animl=True)
animl_results_to_md_results.animl_results_to_md_results(csv_loc, imagedir + "final_result.json")
```

8. (OPTIONAL) Create symlinks within a given directory for file browser access.
```python
linked_manifest = symlink_species(manifest, workingdir.linkdir, file_col="FilePath", copy=False)
linked_manifest.to_csv(workingdir.results)
```

---
### Training

Training workflows are still under development. Please submit an Issue if you run into problems.

1. Assuming a file manifest of training data with species labels, first split the data into training, validation and test splits.
   This function splits each label proportionally by the given percentages: by default, 0.7 training, 0.2 validation, 0.1 test.
```python
from animl import split
train, val, test, stats = split.train_val_test(manifest, out_dir='path/to/save/data/',
                                               label_col="species",
                                               percentage=(0.7, 0.2, 0.1), seed=None)
```

2. Set up training configuration file. Specify the paths to the data splits from the previous step. Example .yaml file:
```
seed: 28  # random number generator seed (long integer value)
device: cuda:0  # set to local gpu device 
num_workers: 8  # number of cores

# dataset parameters
num_classes: 53 #might need to be adjusted based on the classes file
training_set: "/path/to/save/train_data.csv"
validate_set: "/path/to/save/validate_data.csv"
test_set: "/path/to/save/test_data.csv"
class_file: "/home/usr/machinelearning/Models/Animl-Test/test_classes.txt" 

# training hyperparameters
architecture: "efficientnet_v2_m"
image_size: [299, 299]
num_epochs: 100
batch_size: 16

learning_rate: 0.003
weight_decay: 0.001

# overwrite .pt files
overwrite: False

experiment_folder: '/home/usr/machinelearning/Models/Animl-Test/'
```
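
Since the config is plain yaml, you can sanity-check it before training with a short script like the one below (an illustrative sketch; it assumes the PyYAML package is available in your environment, which animl itself does not guarantee):
```python
import yaml

# Placeholder path; point this at your own config file
with open('/path/to/config.yaml') as f:
    cfg = yaml.safe_load(f)

print(cfg['architecture'], cfg['num_classes'])
print(cfg['training_set'], cfg['experiment_folder'])
```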

class_file refers to a file that contains index,label pairs. For example:<br>
test_classes.txt
```
id,species
1,cat
2,dog
```
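
For reference, one way to build such a class file from a training manifest with a "species" column is sketched below; this is purely illustrative pandas code, not part of the animl API, and the paths are placeholders:
```python
import pandas as pd

# Placeholder paths; adjust to your own manifest and class file locations
manifest = pd.read_csv('/path/to/save/train_data.csv')

# One row per unique species, with 1-based ids as in the example above
classes = (pd.Series(sorted(manifest['species'].unique()), name='species')
             .rename_axis('id')
             .reset_index())
classes['id'] += 1
classes.to_csv('/home/usr/machinelearning/Models/Animl-Test/test_classes.txt', index=False)
```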

3. (Optional) Update train.py to include an MLOps connection.

4. Using the config file, begin training
```bash
python -m animl.train --config /path/to/config.yaml
```
Every 10 epochs, a model checkpoint will be saved to the directory given by the 'experiment_folder' parameter in the config file; each checkpoint includes performance metrics for model selection.


5. A model checkpoint can be tested with the "test.py" module. Add an 'active_model' parameter to the config file that contains the path of the checkpoint to test.
   This will produce a confusion matrix on the test dataset as well as a csv containing predicted and ground-truth labels for each image.
```bash
python -m animl.test --config /path/to/config.yaml
```

## Models

The Conservation Technology Lab has several models available for use. 

* Southwest United States [v2](https://sandiegozoo.box.com/s/mzhv08cxcbunsjuh2yp6o7aa5ueetua6) [v3](https://sandiegozoo.box.com/s/p4ws6v5qnoi87otsie0ckmie0izxzqwo)
* [Amazon](https://sandiegozoo.box.com/s/dfc3ozdslku1ekahvz635kjloaaeopfl)
* [Savannah](https://sandiegozoo.box.com/s/ai6yu45jgvc0to41xzd26moqh8amb4vw)
* [MegaDetector](https://github.com/agentmorris/MegaDetector/releases/download/v5.0/md_v5a.0.0.pt)

            
