# phenome-outlier-analysis

- Version: 0.1.0
- Summary: A package for outlier detection in phenome datasets
- Home page: https://github.com/yourusername/phenome-outlier-analysis
- Author: Your Name
- Uploaded: 2024-08-12 15:40:56

# OutlierDetector Class Documentation

## Overview

The `OutlierDetector` class detects outliers in a DataFrame by normalizing the selected analyte columns, using either the double MAD or the z-score method, and applying percentile-based cutoffs. It supports both context-specific (segmented) and global outlier detection.

## Class Initialization

```python
OutlierDetector(df, analyte_columns, segment_columns=['sex'])
```

### Parameters:
- `df` (pandas.DataFrame): The input DataFrame containing the data to be analyzed.
- `analyte_columns` (list): A list of column names to be analyzed for outliers.
- `segment_columns` (list, optional): A list of column names used for segmentation in context-specific outlier detection. Defaults to `['sex']`.

## Main Methods

### 1. perform_outlier_detection

```python
perform_outlier_detection(lower_percentile=0.01, upper_percentile=0.99, method='double_mad', take_log=False)
```

This is the primary entry point. It runs both the context-specific and the super-global outlier detection on the DataFrame supplied at initialization.

#### Parameters:
- `lower_percentile` (float): Lower percentile for cutoff calculation, given as a fraction (the default `0.01` corresponds to the 1st percentile).
- `upper_percentile` (float): Upper percentile for cutoff calculation, given as a fraction (the default `0.99` corresponds to the 99th percentile).
- `method` (str): Normalization method, either `'double_mad'` or `'zscore'`. Default is `'double_mad'`.
- `take_log` (bool): Whether to apply a log transformation before normalization. Default is `False`.

#### Returns:
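A tuple containing two dictionaries, in this order:
1. Context-specific results, keyed by `(segment, value)` pairs as in the usage example below.
2. Super-global results, stored under the key `('global', 'global')`.

Each entry is itself a dictionary; the usage example reads its `'binary_matrix'` item.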

### 2. context_specific_outlier_detection

```python
context_specific_outlier_detection(method='double_mad', take_log=False)
```

Performs context-specific outlier detection by segmenting the DataFrame based on the `segment_columns`.
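
The segmentation step can be pictured with the minimal sketch below. It is not the package's code: the function name `split_by_segment` and the sample rows are hypothetical, and plain dictionaries stand in for a DataFrame to keep the example dependency-free. The `(column, value)` key shape mirrors the keys iterated in the usage example; whether the library also splits on combinations of columns is not stated.

```python
from collections import defaultdict

def split_by_segment(rows, segment_columns):
    """Group rows under (column, value) keys, one segment column at a time."""
    segments = defaultdict(list)
    for column in segment_columns:
        for row in rows:
            segments[(column, row[column])].append(row)
    return dict(segments)

# Hypothetical sample data: one segment column and one analyte.
rows = [
    {"sex": "F", "glucose": 5.1},
    {"sex": "M", "glucose": 5.9},
    {"sex": "F", "glucose": 4.8},
]

for key, members in split_by_segment(rows, ["sex"]).items():
    print(key, len(members))
```

Each group would then be normalized and checked for outliers on its own, so a value is judged against its peers rather than against the whole dataset.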

### 3. super_global_outlier_detection

```python
super_global_outlier_detection(method='double_mad', take_log=False)
```

Evaluates outliers on a global scale, considering all data points together.

## Helper Methods

### calculate_double_mad

Calculates left and right Median Absolute Deviations (MADs) from the median.
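
As a rough illustration of the idea, the sketch below computes a left and a right MAD with the standard library only. The function name `double_mad` is hypothetical, and including the median itself in both halves is an assumption; conventions differ, and the package may handle ties or apply a scaling constant differently.

```python
from statistics import median

def double_mad(values):
    """Return (left_mad, right_mad) around the median of `values`."""
    med = median(values)
    # Deviations of the points at or below the median, and at or above it.
    left = [med - v for v in values if v <= med]
    right = [v - med for v in values if v >= med]
    return median(left), median(right)

# A right-skewed sample: the upper half is far more spread out.
left_mad, right_mad = double_mad([1, 2, 3, 4, 10, 20, 100])
print(left_mad, right_mad)  # 1.5 11.0
```

Measuring spread separately on each side is what lets the method cope with skewed distributions, where a single MAD would understate the spread of the long tail.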

### normalize_series

Normalizes a series using the specified method (double_mad or zscore).
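
The two methods can be compared with the following sketch, which assumes a plain list rather than a pandas Series. The function name `normalize` is hypothetical, the z-score uses the sample standard deviation, and the double MAD branch omits the consistency constant (about 1.4826) that some implementations apply; none of these choices is confirmed by the package.

```python
from statistics import mean, median, stdev

def normalize(values, method="double_mad"):
    """Return a list of normalized scores for `values`."""
    if method == "zscore":
        mu, sigma = mean(values), stdev(values)  # sample standard deviation
        return [(v - mu) / sigma for v in values]
    if method == "double_mad":
        med = median(values)
        left_mad = median([med - v for v in values if v <= med])
        right_mad = median([v - med for v in values if v >= med])
        scores = []
        for v in values:
            if v == med:
                scores.append(0.0)
            else:
                # Scale by the MAD of the side the point falls on.
                # A MAD of zero would raise ZeroDivisionError here.
                scale = left_mad if v < med else right_mad
                scores.append((v - med) / scale)
        return scores
    raise ValueError(f"unknown method: {method!r}")

print(normalize([1, 2, 3], method="zscore"))     # [-1.0, 0.0, 1.0]
print(normalize([1, 2, 3, 4, 10, 20, 100])[0])   # -2.0
```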

### calculate_percentile_cutoffs

Calculates global percentile cutoffs based on the specified columns of a DataFrame.
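
One plausible reading of "global" is that values from all specified columns are pooled before the cutoffs are taken; the sketch below makes that assumption explicit. The names `quantile` and `percentile_cutoffs` are hypothetical, and the interpolation rule (linear between neighbouring order statistics) is chosen for illustration only.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])

def percentile_cutoffs(columns, lower=0.01, upper=0.99):
    """Pool every column's values and return (lower_cutoff, upper_cutoff)."""
    pooled = sorted(v for values in columns.values() for v in values)
    return quantile(pooled, lower), quantile(pooled, upper)

# Hypothetical normalized scores for two analytes.
columns = {"a": [0, 1, 2, 3], "b": [4, 5, 6, 7, 8]}
print(percentile_cutoffs(columns, lower=0.25, upper=0.75))  # (2.0, 6.0)
```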

### create_binary_matrix

Creates a binary matrix indicating outliers based on specified cutoffs.
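
A minimal sketch of the flagging step follows, with a dictionary of lists standing in for a DataFrame. The function name `binary_matrix` is hypothetical, and treating values exactly equal to a cutoff as non-outliers (strict inequalities) is an assumption.

```python
def binary_matrix(columns, lower_cutoff, upper_cutoff):
    """Mark each value 1 if it lies outside [lower_cutoff, upper_cutoff], else 0."""
    return {
        name: [int(v < lower_cutoff or v > upper_cutoff) for v in values]
        for name, values in columns.items()
    }

# Hypothetical normalized scores and cutoffs.
scores = {"a": [-5.0, 0.0, 1.0], "b": [0.5, 9.0, 2.0]}
flags = binary_matrix(scores, lower_cutoff=-3.0, upper_cutoff=3.0)
print(flags)  # {'a': [1, 0, 0], 'b': [0, 1, 0]}

# Column sums give the number of outliers per analyte.
print({name: sum(col) for name, col in flags.items()})  # {'a': 1, 'b': 1}
```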

### normalize_dataframe

Normalizes specified columns in a DataFrame.

### detect_outliers

Detects outliers in the specified columns of a DataFrame.

### get_global_cutoffs

Gets global cutoffs for outlier detection.

## Usage Example

```python
import pandas as pd
from outlier_detection import OutlierDetector

# Load your data
df = pd.read_csv('your_data.csv')

# Define columns
analyte_columns = ['column1', 'column2', 'column3']
segment_columns = ['sex', 'age_group']

# Create OutlierDetector instance
detector = OutlierDetector(df, analyte_columns, segment_columns)

# Perform outlier detection
context_results, global_results = detector.perform_outlier_detection(
    lower_percentile=0.01,
    upper_percentile=0.99,
    method='double_mad',
    take_log=True
)

# Analyze results
for (segment, value), result in context_results.items():
    print(f"Outliers for {segment}={value}:")
    print(result['binary_matrix'].sum())

print("Global outliers:")
print(global_results[('global', 'global')]['binary_matrix'].sum())
```

## Notes

- The class uses logging to provide information and warnings during the outlier detection process.
- The `tqdm` library is used to show progress bars for long-running operations.
- The class can handle both context-specific (segmented) and global outlier detection.
- Two normalization methods are supported: 'double_mad' (double Median Absolute Deviation) and 'zscore'.
- Log transformation can be applied before normalization if needed.
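
To illustrate the last note, the sketch below applies a log transformation before any normalization. It assumes the natural logarithm and maps non-positive values to NaN; the package may use a different base or handle such values differently, and the name `log_transform` is hypothetical.

```python
import math

def log_transform(values):
    """Natural log of each value; non-positive inputs become NaN."""
    return [math.log(v) if v > 0 else float("nan") for v in values]

# A right-skewed sample becomes far more compact on the log scale.
raw = [1, 10, 100, 1000]
print([round(x, 3) for x in log_transform(raw)])
```

Taking logs first is a common choice for measurements with a long right tail, since it stops a handful of large values from dominating the spread estimate.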

            
