Geolet


| Field | Value |
| --- | --- |
| Name | Geolet |
| Version | 0.0.1 |
| Home page | https://github.com/cri98li/Geolet |
| Summary | Package description |
| Upload time | 2023-06-27 10:51:20 |
| Author | cri98li |
| License | BSD-Clause-2 |
| Keywords | keyword1 keyword2 keyword3 |
| Requirements | No requirements were recorded. |
# Geolet - Interpretable GPS Trajectory Classifier



Researchers, businesses, and governments use mobility data to make decisions that affect people's lives in many ways, often relying on accurate but opaque deep learning models that are hard for humans to interpret.

To address these limitations, we propose Geolet, a human-interpretable machine-learning model for trajectory classification. 

We use discriminative sub-trajectories extracted from mobility data to turn trajectories into a simplified representation that can be used as input by any machine learning classifier. 
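
To make the idea of a "simplified representation" concrete, here is a minimal sketch, assuming a shapelet-style transform: each trajectory becomes a row of distances to a fixed set of sub-trajectories. The helpers `window_distance`, `best_fit_distance`, and `distance_features` are hypothetical names used only for illustration and are not part of Geolet's API; the distance measures Geolet actually uses may differ.

```python
import math

def window_distance(window, geolet):
    # Mean point-wise Euclidean distance between two equal-length point sequences.
    return sum(math.dist(p, q) for p, q in zip(window, geolet)) / len(geolet)

def best_fit_distance(trajectory, geolet):
    # Slide the sub-trajectory along the trajectory and keep the best alignment.
    m = len(geolet)
    if len(trajectory) < m:
        return math.inf
    return min(
        window_distance(trajectory[i:i + m], geolet)
        for i in range(len(trajectory) - m + 1)
    )

def distance_features(trajectories, geolets):
    # One row per trajectory, one column per sub-trajectory.
    return [[best_fit_distance(t, g) for g in geolets] for t in trajectories]

geolet = [(0.0, 0.0), (1.0, 0.0)]
trajectories = [
    [(5.0, 5.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # contains the pattern exactly
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],              # same shape, shifted by 1
]
features = distance_features(trajectories, [geolet])
```

Each column of the resulting matrix refers to one concrete sub-trajectory, so a downstream classifier's feature importances can be read as "how close is this trajectory to that pattern".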





## Setup



### Using PyPI



```bash
pip install geolet
```



### Manual Setup



```bash
git clone https://github.com/cri98li/Geolet
cd Geolet
pip install -e .
```



Dependencies are listed in `requirements.txt`.





## Running the code



```python

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from Geolet.classifier.geoletclassifier import GeoletClassifier, prepare_y
from Geolet import distancers

# Load the points and order them by trajectory id (tid) and timestamp (t)
df = pd.read_csv("animals_prepared.zip").sort_values(by=["tid", "t"])
df = df[["tid", "class", "t", "c1", "c2"]]

# One row per trajectory, used for a stratified split of the trajectory ids
traj = df.groupby(by=["tid"]).max().reset_index()
tid_train, tid_test, _, _ = train_test_split(traj["tid"],
                                             traj["class"],
                                             test_size=.3,
                                             stratify=traj["class"],
                                             random_state=3)

transform = GeoletClassifier(
    precision=3,  # Geohash precision for the partitioning phase
    geolet_per_class=10,  # Number of candidate geolets to subsample randomly before the selection phase
    selector='MutualInformation',  # Name of the selector to use. Possible values are ["Random", "MutualInformation"]
    top_k=5,  # Top k geolets, according to the selector score, to use for transforming the entire dataset
    trajectory_for_stats=100,  # Number of trajectories to subsample for selector scoring
    bestFittingMeasure=distancers.InterpolatedRouteDistance.interpolatedRootDistanceBestFitting,  # Best-fitting measure to use
    distancer='IRD',  # Distance measure to use for the final transformation. Possible values are ["E", "IRD"]
    verbose=True,
    n_jobs=4
)

# Fit the transformer on the training trajectories only
X_train = df[df.tid.isin(tid_train)].drop(columns="class").values
y_train = df[df.tid.isin(tid_train)].values[:, 1]

# Transform the whole dataset; prepare_y yields one label per trajectory
X = df.drop(columns="class").values
y = prepare_y(classes=df.values[:, 1], tids=df.values[:, 0])

X_index, X_dist = transform.fit(X_train, y_train).transform(X)

# Train and evaluate any scikit-learn classifier on the distance features
X_train, X_test, y_train, y_test = train_test_split(X_dist, y, test_size=.3, stratify=y, random_state=3)
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

```
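
The `precision` argument in the example above is described as the geohash precision for the partitioning phase. As a rough illustration of what that parameter controls, the sketch below implements the standard geohash encoding and cuts a trajectory whenever two consecutive points fall in different cells. It assumes points are given as `(lat, lon)` pairs; `geohash_encode` and `partition_by_cell` are hypothetical helpers, not part of Geolet's API, and Geolet's own partitioning may work differently.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision):
    # Standard geohash: interleave longitude and latitude bisection bits,
    # then emit one base-32 character for every 5 bits.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, n_bits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)

def partition_by_cell(points, precision):
    # Start a new sub-trajectory every time the geohash cell changes.
    parts = []
    current_cell = None
    for lat, lon in points:
        cell = geohash_encode(lat, lon, precision)
        if cell != current_cell:
            parts.append((cell, []))
            current_cell = cell
        parts[-1][1].append((lat, lon))
    return parts

points = [(57.64911, 10.40744), (57.65, 10.41), (48.8566, 2.3522)]
parts = partition_by_cell(points, precision=3)
```

A higher precision means smaller cells, so the same trajectory is cut into more, shorter pieces.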



Jupyter notebooks with examples on real datasets can be found in the `examples/` directory.
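
The `selector='MutualInformation'` and `top_k` arguments in the usage example rank candidate geolets by a score and keep the best ones. The sketch below shows one simple way such a ranking could work, assuming each column of distances is binarized at its median before computing mutual information with the class labels. The functions `mutual_information` and `top_k_columns` are hypothetical, and the scoring Geolet actually applies may be different.

```python
import math
from collections import Counter
from statistics import median

def mutual_information(xs, ys):
    # I(X;Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))), in nats.
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

def top_k_columns(dist_matrix, labels, k):
    # Score each column (one candidate sub-trajectory) and keep the k best.
    scores = []
    for j in range(len(dist_matrix[0])):
        column = [row[j] for row in dist_matrix]
        threshold = median(column)
        is_close = [value <= threshold for value in column]
        scores.append((mutual_information(is_close, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # ties broken by column index
    return [j for _, j in scores[:k]]

dist_matrix = [
    [0.1, 5.0, 0.2],
    [0.2, 1.0, 0.9],
    [0.9, 5.0, 0.3],
    [0.8, 1.0, 0.8],
]
labels = ["a", "a", "b", "b"]
selected = top_k_columns(dist_matrix, labels, k=1)
```

In this toy matrix only the first column separates the two classes, so it is the one that survives the selection.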





## Docs and reference





The software documentation is in the `/docs/` folder, and a PowerPoint presentation on Geolet can be found [here](http://example.org).

You can cite this work with:

```
@inproceedings{DBLP:conf/ida/LandiSGMN23,
  author       = {Cristiano Landi and
                  Francesco Spinnato and
                  Riccardo Guidotti and
                  Anna Monreale and
                  Mirco Nanni},
  title        = {Geolet: An Interpretable Model for Trajectory Classification},
  booktitle    = {{IDA}},
  series       = {Lecture Notes in Computer Science},
  volume       = {13876},
  pages        = {236--248},
  publisher    = {Springer},
  year         = {2023}
}
```





## Extending the algorithm



The original Geolet code, i.e., the code used for the experiments in the paper, is available in the `/original_code` branch.

The code in the main branch is a reimplementation that speeds up execution by about 7%.

 


            
