BirdSTEM


Name: BirdSTEM
Version: 0.0.2
Summary: AdaSTEM model for daily abundance estimation using eBird citizen science data
Upload time: 2023-08-29 05:49:54
Author: Yangkang Chen
Keywords: python, ebird, spatial-temporal model, citizen science, spatial temporal exploratory model, stem, adastem, abundance, phenology
Requirements: No requirements were recorded.

# BirdSTEM
AdaSTEM model for daily abundance estimation using eBird citizen science data

## Brief introduction
BirdSTEM is an AdaSTEM model for daily abundance estimation using eBird citizen science data. It leverages the "adjacency" information of surrounding bird observations in space and time to predict the occurrence and abundance at a target spatial-temporal point. In the demo, we use a two-step hurdle model as the "base model", with `XGBClassifier` for occurrence modeling and `XGBRegressor` for abundance modeling.
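
To make the two-step hurdle idea concrete, here is a minimal pure-Python sketch. It is not BirdSTEM's `Hurdle` class: `ToyHurdle` and `NearestNeighbor1D` are hypothetical stand-ins invented for illustration, and the data are made up. The first step classifies presence or absence; the second regresses abundance using only the positive observations.

```python
# Toy two-step hurdle model (illustrative only, not BirdSTEM's Hurdle class).

class NearestNeighbor1D:
    """Hypothetical stand-in estimator: returns the target of the closest training x."""

    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, xs):
        return [
            self.ys[min(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))]
            for x in xs
        ]


class ToyHurdle:
    """Step 1: classify presence/absence. Step 2: regress abundance on presences only."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, xs, ys):
        # Occurrence target: 1 where the count is positive, 0 otherwise.
        self.classifier.fit(xs, [1 if y > 0 else 0 for y in ys])
        # The abundance model only sees the positive observations.
        positive = [(x, y) for x, y in zip(xs, ys) if y > 0]
        self.regressor.fit([x for x, _ in positive], [y for _, y in positive])
        return self

    def predict(self, xs):
        present = self.classifier.predict(xs)
        abundance = self.regressor.predict(xs)
        # Predicted absences become 0; predicted presences take the regressor output.
        return [a if p == 1 else 0 for p, a in zip(present, abundance)]


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # made-up covariate
ys = [0, 0, 0, 4, 6, 5]                 # made-up counts
toy_model = ToyHurdle(NearestNeighbor1D(), NearestNeighbor1D()).fit(xs, ys)
predictions = toy_model.predict([1.5, 10.4, 12.5])
print(predictions)  # [0, 4, 5]
```

Any pair of estimators exposing `fit` and `predict` could be plugged in the same way, which is the role the XGBoost classifier and regressor play in the demo.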

Users can define the size of a stixel (spatial-temporal pixel) in both space and time. Larger stixels favor generalizability but lose precision at fine resolution; smaller stixels may predict better within their own area but extrapolate less well to points outside the stixel.

In the demo, we first split the training data using temporal sliding windows with a size of 50 days of year (DOY) and a step of 20 DOY (`temporal_start = 0`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval = 50`). For each temporal slice, a spatial gridding is applied: a stixel is forced to split into four smaller quarters if its edge is longer than 50 units (measured in longitude and latitude, `grid_len_lon_upper_threshold=50`, `grid_len_lat_upper_threshold=50`), and splitting stops to prevent the edge length from shrinking below 10 units (`grid_len_lon_lower_threshold=10`, `grid_len_lat_lower_threshold=10`) or the stixel from containing fewer than 50 checklists (`points_lower_threshold = 50`).
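
The temporal slicing can be sketched in a few lines of plain Python. The function below is a hypothetical helper, not part of the BirdSTEM API. It assumes half-open windows `[start, start + interval)` that are not clipped at `temporal_end`; how the package treats the last windows is not specified here.

```python
def temporal_windows(temporal_start, temporal_end, temporal_step, temporal_bin_interval):
    """Return (start, end) day-of-year windows; they overlap when step < interval."""
    windows = []
    start = temporal_start
    while start < temporal_end:
        # Assumption: windows are not truncated at temporal_end.
        windows.append((start, start + temporal_bin_interval))
        start += temporal_step
    return windows


windows = temporal_windows(0, 366, 20, 50)
print(len(windows))   # 19
print(windows[:3])    # [(0, 50), (20, 70), (40, 90)]

# Because the step (20) is smaller than the window size (50), a single day
# falls into several windows. Day 45, for example:
covering = [w for w in windows if w[0] <= 45 < w[1]]
print(covering)       # [(0, 50), (20, 70), (40, 90)]
```

The overlap means each observation can contribute to more than one temporal slice.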

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 usable ensembles (`min_ensemble_required = 7`) are predicted; all others are set to `np.nan`.
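
As a rough illustration of how per-ensemble predictions for a single point might be combined, consider the sketch below. `aggregate_ensembles` is a hypothetical helper, not BirdSTEM code. It follows the wording above literally (strictly more than `min_ensemble_required` usable ensembles) and uses the population standard deviation; the package's exact comparison and choice of estimator may differ.

```python
import math
import statistics


def aggregate_ensembles(per_ensemble_predictions, min_ensemble_required):
    """Combine one point's per-ensemble predictions into (mean, std).

    Entries are math.nan where an ensemble had no usable stixel for the point.
    """
    usable = [p for p in per_ensemble_predictions if not math.isnan(p)]
    # Assumption: "more than" is read strictly; the package may use >= instead.
    if len(usable) <= min_ensemble_required:
        return math.nan, math.nan
    # Assumption: population standard deviation across the usable ensembles.
    return statistics.fmean(usable), statistics.pstdev(usable)


nan = math.nan
enough = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0, nan, nan]  # 8 usable ensembles
too_few = [1.0] * 7 + [nan] * 3                               # 7 usable ensembles

mean_ok, std_ok = aggregate_ensembles(enough, min_ensemble_required=7)
mean_nan, std_nan = aggregate_ensembles(too_few, min_ensemble_required=7)
print(mean_ok, round(std_ok, 4))  # 2.0 0.7071
print(mean_nan, std_nan)          # nan nan
```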

Fitting and prediction follow the scikit-learn estimator convention:

```py
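import numpy as np

## fit (model, X_train and y_train are assumed to be defined beforehand)
model.fit(X_train, y_train)

## predict: mean and standard deviation across ensembles
pred_mean, pred_std = model.predict(X_test)
## set non-positive predictions to 0 (NaN > 0 is False, so NaN entries also become 0)
pred_mean = np.where(pred_mean > 0, pred_mean, 0)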
```

Here, `pred_mean` and `pred_std` are the mean and standard deviation of the predicted values across ensembles.


## Full usage:

```py
import numpy as np
from xgboost import XGBClassifier, XGBRegressor

from BirdSTEM.model.AdaSTEM import AdaSTEM, AdaSTEMHurdle
from BirdSTEM.model.Hurdle import Hurdle

SAVE_DIR = './'

## two-step hurdle base model: classifier for occurrence, regressor for abundance
base_model = Hurdle(
    classifier=XGBClassifier(tree_method='hist', random_state=42, verbosity=0, n_jobs=1),
    regressor=XGBRegressor(tree_method='hist', random_state=42, verbosity=0, n_jobs=1),
)

model = AdaSTEMHurdle(
    base_model=base_model,
    ensemble_fold=10,
    min_ensemble_required=7,
    grid_len_lon_upper_threshold=50,
    grid_len_lon_lower_threshold=10,
    grid_len_lat_upper_threshold=50,
    grid_len_lat_lower_threshold=10,
    points_lower_threshold=50,
    temporal_start=0,
    temporal_end=366,
    temporal_step=20,
    temporal_bin_interval=50,
    stixel_training_size_threshold=50,  ## important, should be consistent with points_lower_threshold
    save_gridding_plot=True,
    save_tmp=True,
    save_dir=SAVE_DIR,
    sample_weights_for_classifier=True,
)

## fit (X_train, y_train, X_test and y_test are assumed to be prepared beforehand)
model.fit(X_train, y_train)

## predict
pred_mean, pred_std = model.predict(X_test)
pred_mean = np.where(pred_mean > 0, pred_mean, 0)
eval_metrics = AdaSTEM.eval_STEM_res('hurdle', y_test, pred_mean)
print(eval_metrics)

```
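
One detail of the snippet above deserves a sketch: a comparison such as `pred_mean > 0` is false for NaN, so `np.where(pred_mean > 0, pred_mean, 0)` turns NaN predictions (points with too few usable ensembles) into 0 as well. If those points should stay missing and be left out of evaluation, something like the following could be used. It is plain Python with made-up numbers; both helpers are hypothetical and not part of BirdSTEM, and how `AdaSTEM.eval_STEM_res` treats NaN is not specified here.

```python
import math


def clip_negative_keep_nan(values):
    """Set negative predictions to 0 but leave NaN (unpredicted points) untouched."""
    # NaN < 0 is False, so NaN falls through to the else branch unchanged.
    return [0.0 if v < 0 else v for v in values]


def mae_ignoring_nan(y_true, y_pred):
    """Mean absolute error over the points that received a prediction."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if not math.isnan(p)]
    if not pairs:
        return math.nan, 0
    return sum(abs(t - p) for t, p in pairs) / len(pairs), len(pairs)


toy_pred = [-0.5, 2.0, math.nan, 4.0]  # made-up predictions, one unpredicted point
toy_true = [0.0, 3.0, 5.0, 2.0]        # made-up observed counts

clipped = clip_negative_keep_nan(toy_pred)
mae, n_used = mae_ignoring_nan(toy_true, clipped)
print(clipped)      # [0.0, 2.0, nan, 4.0]
print(mae, n_used)  # 1.0 3
```

With NumPy arrays, `np.where(pred_mean < 0, 0, pred_mean)` has the same NaN-preserving effect.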


----
## Documentation:
[BirdSTEM Documentation](https://chenyangkang.github.io/BirdSTEM/)
<!-- BirdSTEM -->

----
![QuadTree example](QuadTree.png)
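
The kind of adaptive splitting described in the introduction can be sketched as a small recursive function. This is a simplified illustration under stated assumptions, not the package's implementation: `split_quadtree` is a hypothetical name, the random jitter and rotation are omitted, an oversized cell is always split, and otherwise a cell is split only if it holds at least `min_points` points and its quarters would not fall below the lower edge threshold.

```python
def split_quadtree(points, x0, y0, width, height, upper=50, lower=10, min_points=50):
    """Return leaf cells as (x0, y0, width, height, n_points) tuples."""
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + width and y0 <= y < y0 + height]
    too_large = width > upper or height > upper
    children_too_small = width / 2 < lower or height / 2 < lower
    too_few_points = len(inside) < min_points
    # Assumption: the forced split of oversized cells takes precedence
    # over both stopping rules.
    if not too_large and (children_too_small or too_few_points):
        return [(x0, y0, width, height, len(inside))]
    half_w, half_h = width / 2, height / 2
    leaves = []
    for dx in (0, half_w):
        for dy in (0, half_h):
            leaves += split_quadtree(inside, x0 + dx, y0 + dy, half_w, half_h,
                                     upper, lower, min_points)
    return leaves


# 64 made-up checklist locations on an 8 x 8 lattice in the lower-left corner.
points = [(1.25 + 2.5 * i, 1.25 + 2.5 * j) for i in range(8) for j in range(8)]
leaves = split_quadtree(points, 0.0, 0.0, 80.0, 80.0)

print(len(leaves))                               # 10
print(sorted({leaf[2] for leaf in leaves}))      # [10.0, 20.0, 40.0]
print(sum(leaf[4] for leaf in leaves))           # 64
```

In this toy run the dense corner ends up in four 10 x 10 cells of 16 points each, while the empty regions stay as coarse 20 x 20 and 40 x 40 cells.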

-----
References:

1. [Fink, D., Damoulas, T., & Dave, J. (2013, June). Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 27, No. 1, pp. 1284-1290).](https://ojs.aaai.org/index.php/AAAI/article/view/8484)

2. [Fink, D., Auer, T., Johnston, A., Ruiz-Gutierrez, V., Hochachka, W. M., & Kelling, S. (2020). Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications, 30(3), e02056.](https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/eap.2056)

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "BirdSTEM",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "python,ebird,spatial-temporal model,citizen science,spatial temporal exploratory model,STEM,AdaSTEM,abundance,phenology",
    "author": "Yangkang Chen",
    "author_email": "chenyangkang24@outlook.com",
    "download_url": "https://files.pythonhosted.org/packages/86/1b/eed4fd2a926cad779dc03680603c9443501ccaa2e967a58c3507acf1689c/BirdSTEM-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "AdaSTEM model for daily abundance estimation using eBird citizen science data",
    "version": "0.0.2",
    "project_urls": null,
    "split_keywords": [
        "python",
        "ebird",
        "spatial-temporal model",
        "citizen science",
        "spatial temporal exploratory model",
        "stem",
        "adastem",
        "abundance",
        "phenology"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "583ba5d1023f9b698d39deefdce6de54c32245a9272978322884a4da73bf4951",
                "md5": "64ac51b56925be1a24765493332d03e5",
                "sha256": "cddfa5e62621e6d58245bd4845511dfe1aa9e7a3bed82916c733665376810fca"
            },
            "downloads": -1,
            "filename": "BirdSTEM-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "64ac51b56925be1a24765493332d03e5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 20875,
            "upload_time": "2023-08-29T05:49:52",
            "upload_time_iso_8601": "2023-08-29T05:49:52.201574Z",
            "url": "https://files.pythonhosted.org/packages/58/3b/a5d1023f9b698d39deefdce6de54c32245a9272978322884a4da73bf4951/BirdSTEM-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "861beed4fd2a926cad779dc03680603c9443501ccaa2e967a58c3507acf1689c",
                "md5": "ccf485686436bc46b2da21c7d15ade7a",
                "sha256": "b12c99439c071fd4b5f31a3cbfeb0e9d49183c815448df09a012e43c9bcf3f26"
            },
            "downloads": -1,
            "filename": "BirdSTEM-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "ccf485686436bc46b2da21c7d15ade7a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 18135,
            "upload_time": "2023-08-29T05:49:54",
            "upload_time_iso_8601": "2023-08-29T05:49:54.094999Z",
            "url": "https://files.pythonhosted.org/packages/86/1b/eed4fd2a926cad779dc03680603c9443501ccaa2e967a58c3507acf1689c/BirdSTEM-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-29 05:49:54",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "birdstem"
}
        