niimpy


| Field | Value |
| --- | --- |
| Name | niimpy |
| Version | 1.2.2 |
| Summary | Python module for analysis of behavioral data |
| upload_time | 2024-10-22 12:42:12 |
| requires_python | >=3.8 |
| home_page | None |
| maintainer | None |
| docs_url | None |
| author | None |
| license | None |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Niimpy

![maintenance-status](https://img.shields.io/badge/maintenance-actively--developed-brightgreen.svg)
[![Test](https://github.com/digitraceslab/niimpy/actions/workflows/test.yml/badge.svg)](https://github.com/digitraceslab/niimpy/actions/workflows/test.yml)
[![Build](https://github.com/digitraceslab/niimpy/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/digitraceslab/niimpy/actions/workflows/pages/pages-build-deployment)
[![Test installation from source](https://github.com/digitraceslab/niimpy/actions/workflows/install.yml/badge.svg)](https://github.com/digitraceslab/niimpy/actions/workflows/install.yml)
[![codecov](https://codecov.io/gh/digitraceslab/niimpy/branch/master/graph/badge.svg?token=SEEOOF7A70)](https://codecov.io/gh/digitraceslab/niimpy)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

What
----

Niimpy is a Python package for analyzing and quantifying behavioral data. It uses pandas to read data from disk and perform basic manipulations, provides exploratory data analysis functions, offers many high-level preprocessing functions for various types of data, and includes functions for behavioral data analysis.

For Whom
--------

Niimpy is intended for researchers and data scientists analyzing digital behavioral data. Its purpose is to facilitate data analysis by providing a standardized, replicable workflow.

Why
---

Digital behavioral studies using personal digital devices typically produce rich, multi-sensor, longitudinal datasets of mixed data types. Analyzing such data requires domain knowledge in multiple fields, programming expertise, and software designed for the purpose. Currently, no standardized workflow or tools exist for analyzing such datasets. The Niimpy package is specifically designed to analyze longitudinal, multimodal behavioral data. It is a user-friendly open-source package that can be easily expanded and adapted to specific research requirements. The toolbox facilitates the analysis phase by providing tools for data management, preprocessing, feature extraction, and visualization. More advanced analysis methods will be incorporated into the toolbox in the future.


How
---

The toolbox is divided into four layers by functionality: 1) reading, 2) preprocessing, 3) exploration, and 4) analysis. For more information about the layers, refer to the toolbox architecture chapter (:doc:`architecture`). The quickstart guide (:doc:`quick-start`) is a good place to start. More detailed demo Jupyter notebooks are provided in the user guide chapter (:doc:`demo_notebooks/Exploration`). Instructions for individual functions can be found in the API chapter (:doc:`api/niimpy`).


## Installation

- Niimpy supports Python 3 only (tested on 3.8 and above).

- Install the released package from PyPI with pip:

  ```
  pip install niimpy
  ```

- It can also be installed directly from the `master` branch archive on GitHub:

  ```
  pip install https://github.com/digitraceslab/niimpy/archive/master.zip
  ```
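To confirm that the installation worked, one option is to ask Python's standard `importlib.metadata` module for the installed version. This is a minimal sketch and not part of Niimpy itself; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if niimpy is installed, otherwise None.
print(installed_version("niimpy"))
```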

### Getting started with location data

All of the functions for reading, preprocessing, and feature extraction for location data are in [`location.py`](niimpy/location.py). Currently implemented features are:

- `dist_total`: total distance a person traveled, in meters.
- `variance`, `log_variance`: variance is defined as the sum of the variances of the latitudes and longitudes.
- `speed_average`, `speed_variance`, and `speed_max`: statistics of speed (m/s). Speed, if not given, can be calculated by dividing the distance between two consecutive bins by their time difference.
- `n_bins`: number of location bins that a user recorded in the dataset.
- `n_static`: number of static points. Static points are defined as bins whose speed is lower than a threshold.
- `n_moving`: number of moving points. Equivalent to `n_bins - n_static`.
- `n_home`: number of static bins that are close to the person's home. Home is defined as the place most visited during the night. More formally, all locations recorded between 12 AM and 6 AM are clustered, and the center of the largest cluster is assumed to be home.
- `max_dist_home`: maximum distance from home.
- `n_sps`: number of significant places. All static bins are clustered using the DBSCAN algorithm. Each cluster represents a Significant Place (SP) for a user.
- `n_rare`: number of rarely visited places (referred to as outliers in DBSCAN).
- `n_transitions`: number of transitions between significant places.
- `n_top1`, `n_top2`, `n_top3`, `n_top4`, `n_top5`: number of bins in each of the top `N` clusters. In other words, `n_top1` is the number of times the person has visited the most frequently visited place.
- `entropy`, `normalized_entropy`: entropy of time spent in clusters. Normalized entropy is the entropy divided by the number of clusters.
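To make a few of the definitions in the list above concrete, here is a rough sketch in plain Python. It is not Niimpy's implementation: the function names, the use of the haversine formula for distance, the natural logarithm in the entropy, and the assumption of equally sized time bins (so that the share of bins stands in for the share of time) are all choices made for this illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the valid asin range.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def movement_features(bins):
    """Compute distance and speed features for one user.

    bins: list of (unix_seconds, latitude, longitude), sorted by time,
    with strictly increasing timestamps and at least two entries.
    """
    dists, speeds = [], []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(bins, bins[1:]):
        d = haversine_m(lat0, lon0, lat1, lon1)
        dists.append(d)
        # Speed: distance between consecutive bins divided by their time difference.
        speeds.append(d / (t1 - t0))
    return {
        "n_bins": len(bins),
        "dist_total": sum(dists),
        "speed_average": sum(speeds) / len(speeds),
        "speed_max": max(speeds),
    }


def cluster_entropy(labels):
    """Entropy of the share of bins per cluster, plus the normalized variant.

    labels: one cluster label per static bin (noise points already removed).
    Returns (entropy, entropy divided by the number of clusters).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy, entropy / len(counts)
```

For example, `movement_features([(0, 0.0, 0.0), (600, 0.0, 0.01), (1200, 0.0, 0.01)])` describes a user who moves about 1.1 km in the first ten minutes and then stays put, and `cluster_entropy([0, 0, 1, 1])` describes time split evenly between two places.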

Usage:

```python
import pandas as pd
import niimpy
import niimpy.location as nilo

CONTROL_PATH = "PATH/TO/CONTROL/DATA"
PATIENT_PATH = "PATH/TO/PATIENT/DATA"

# Read data of control and patients from database
location_control = niimpy.read_sqlite(CONTROL_PATH, table='AwareLocation', add_group='control', tz='Europe/Helsinki')
location_patient = niimpy.read_sqlite(PATIENT_PATH, table='AwareLocation', add_group='patient', tz='Europe/Helsinki')

# Concatenate the two dataframes to have one dataframe
location = pd.concat([location_control, location_patient])

# Remove low-quality and outlier locations
location = nilo.filter_location(location)

# Downsample locations (median filter). Bin size is 10 minutes.
location = niimpy.util.aggregate(location, freq='10min', method_numerical='median')
location = location.reset_index(0).dropna()

# Feature extraction
features = nilo.extract_features(
  lats=location['double_latitude'],
  lons=location['double_longitude'],
  users=location['user'],
  groups=location['group'],
  times=location.index,
  speeds=location['double_speed']
)
```
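The downsampling step in the usage example replaces all samples that fall in the same 10-minute bin with their median. The following standard-library sketch illustrates that idea for a single user. It is only an approximation of what the step does conceptually, not Niimpy's code: the name `median_downsample` is hypothetical, timestamps are assumed to be `datetime` objects, and the bin size is assumed to divide 60 minutes evenly.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def median_downsample(samples, bin_minutes=10):
    """Median-filter location samples into fixed-size time bins.

    samples: iterable of (datetime, latitude, longitude) for one user.
    Returns {bin_start: (median_latitude, median_longitude)}, ordered by time.
    Bins without samples are simply absent from the result.
    """
    binned = defaultdict(list)
    for ts, lat, lon in samples:
        # Floor the timestamp to the start of its bin.
        start = ts - timedelta(
            minutes=ts.minute % bin_minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        binned[start].append((lat, lon))
    return {
        start: (median(p[0] for p in pts), median(p[1] for p in pts))
        for start, pts in sorted(binned.items())
    }


samples = [
    (datetime(2024, 1, 1, 12, 1), 60.0, 24.0),
    (datetime(2024, 1, 1, 12, 4), 60.2, 24.2),
    (datetime(2024, 1, 1, 12, 9), 60.1, 24.4),
    (datetime(2024, 1, 1, 12, 11), 61.0, 25.0),
]
downsampled = median_downsample(samples)
```

Here the three samples between 12:00 and 12:10 collapse into one bin, and the sample at 12:11 forms its own bin. Taking the median rather than the mean keeps a single stray GPS fix from shifting the bin's position.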

## Documentation

Niimpy documentation is hosted on [GitHub Pages](https://digitraceslab.github.io/niimpy/).

## Development

This is a fairly typical Python project, with code and documentation
laid out as you might expect.

`requirements-dev.txt` contains some basic dev requirements, including
an editable dev install of niimpy itself (`pip install -e`).

Run tests with:
```
pytest .
```

Documentation is built with Sphinx:
```
cd docs
make html
# output in _build/html/
```

Enable nbdime Jupyter notebook diff and merge via git with:
```
nbdime config-git --enable
```


## See also

* To learn about pandas, see its documentation.  It is *not* the most
  clearly written documentation you will find, but you should try
  starting with the "Package overview" and "10 minutes to pandas"
  sections.

* [Matplotlib](https://matplotlib.org/) is the standard Python
  plotting package, but [Seaborn](https://seaborn.pydata.org/) will
  produce nicer graphics by default.  Hint: look for examples and copy
  them.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "niimpy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Digitraceslab <talayeh.aledavood@aalto.fi>",
    "download_url": "https://files.pythonhosted.org/packages/12/68/200ce3c3dcaf15724c692f3cd2b0aad3422f87d1cae3372e604bd0f7dad5/niimpy-1.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python module for analysis of behavioral data",
    "version": "1.2.2",
    "project_urls": {
        "Repository": "https://github.com/digitraceslab/niimpy"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1b6d8019d2c19c9b25f84e24b1c86a5f4ecd249eaa78272b93697bf9d73a7fc0",
                "md5": "2df15766b02ca457d2993e248436b75b",
                "sha256": "fcec75ab67dfb4074e77fd2af3f3de97f6c20071fce8f4595609d4755a3e7e58"
            },
            "downloads": -1,
            "filename": "niimpy-1.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2df15766b02ca457d2993e248436b75b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 863169,
            "upload_time": "2024-10-22T12:42:05",
            "upload_time_iso_8601": "2024-10-22T12:42:05.906499Z",
            "url": "https://files.pythonhosted.org/packages/1b/6d/8019d2c19c9b25f84e24b1c86a5f4ecd249eaa78272b93697bf9d73a7fc0/niimpy-1.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "1268200ce3c3dcaf15724c692f3cd2b0aad3422f87d1cae3372e604bd0f7dad5",
                "md5": "321ac9af62017ded7ac2b8cb91d0b648",
                "sha256": "df6ca13fa8560cc6d3a4ad35e24d2fbfe81e7196fd0d3d4f037df7e49daa3c45"
            },
            "downloads": -1,
            "filename": "niimpy-1.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "321ac9af62017ded7ac2b8cb91d0b648",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 2213645,
            "upload_time": "2024-10-22T12:42:12",
            "upload_time_iso_8601": "2024-10-22T12:42:12.816947Z",
            "url": "https://files.pythonhosted.org/packages/12/68/200ce3c3dcaf15724c692f3cd2b0aad3422f87d1cae3372e604bd0f7dad5/niimpy-1.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-22 12:42:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "digitraceslab",
    "github_project": "niimpy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "niimpy"
}
        