[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
# IBDCluster v1.2.1:
## Documentation:
___
This readme is a more technical description of the project, providing information about the class structures and relationships. More practical documentation about how to install and use the program can be found here: [IBDCluster documentation (still a work in progress)](https://jtb324.github.io/IBDCluster/)
## Purpose of the project:
___
This project is a CLI tool that clusters shared IBD segments within biobanks around a gene of interest. These networks are then analyzed to determine how many individuals within a network are affected by a phenotype of interest.
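
To make the idea of a network concrete, here is a minimal sketch that treats each pair of individuals sharing a segment as an edge and groups individuals into connected components. This is only an illustration under that assumption and is not necessarily the algorithm IBDCluster uses internally; the function name `build_networks`, the individual IDs, and the affected set are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_networks(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group individuals into networks (connected components of sharing)."""
    # Adjacency list: individual -> individuals they share a segment with
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for id1, id2 in pairs:
        neighbors[id1].add(id2)
        neighbors[id2].add(id1)

    seen: Set[str] = set()
    networks: List[Set[str]] = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting everyone reachable from `start`
        network: Set[str] = set()
        stack = [start]
        while stack:
            ind = stack.pop()
            if ind in network:
                continue
            network.add(ind)
            stack.extend(neighbors[ind] - network)
        seen |= network
        networks.append(network)
    return networks


# Hypothetical pairs sharing an IBD segment that overlaps the gene of interest
example_pairs = [("A", "B"), ("B", "C"), ("D", "E")]
networks = build_networks(example_pairs)

# Hypothetical set of individuals affected by the phenotype of interest
affected = {"A", "C"}
affected_counts = [len(network & affected) for network in networks]
```

Here `affected_counts` holds, for each network, how many of its members are affected, which is the quantity the later enrichment step works with.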
## General Pipeline:
___
```mermaid
flowchart LR
A(IBD Information) --> B(Identified Networks) --> C(Binomial test for enrichment of Phenotypes)
```
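
The last step of the pipeline can be sketched with the standard library alone. The example below computes a one-sided binomial p-value for seeing at least `n_affected` affected individuals in a network of `n_members`, given a background `prevalence`. The one-sided form, the function name, and the numbers are assumptions made for illustration; the readme does not state which variant of the test IBDCluster runs.

```python
from math import comb


def binomial_enrichment_pvalue(n_affected: int, n_members: int, prevalence: float) -> float:
    """One-sided P(X >= n_affected) for X ~ Binomial(n_members, prevalence)."""
    return sum(
        comb(n_members, k) * prevalence**k * (1 - prevalence) ** (n_members - k)
        for k in range(n_affected, n_members + 1)
    )


# Hypothetical network: 4 of 10 members affected, 10% background prevalence
p_value = binomial_enrichment_pvalue(n_affected=4, n_members=10, prevalence=0.1)
print(f"p = {p_value:.4f}")  # roughly 0.0128
```

A small p-value here would suggest that the network carries more affected individuals than the background prevalence alone would predict.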
## Installing:
___
***Cloning from GitHub and modifying permissions:***
1. Clone the project into the appropriate directory using git clone.
2. cd into the IBDCluster directory:
```
cd IBDCluster
```
3. Run the following command to make the IBDCluster.py file executable:
```
chmod +x IBDCluster/IBDCluster.py
```
***Installing dependencies:***
Next install all the necessary dependencies. The steps for this vary depending on what package manager you are using.
*If using conda:*
1. There is an environment.yml file in the main IBDCluster directory. Run the following command to create an environment called IBDCluster:
```
conda env create -f environment.yml
```
2. You can now activate the environment by calling:
```
conda activate IBDCluster
```
*If using mamba:*
1. The steps are the same as in the conda section, except that the environment is created with mamba:
```
mamba env create -f environment.yml
```
2. You can activate this environment using:
```
conda activate IBDCluster
```
*If using Poetry:*
1. The Poetry project files (pyproject.toml and poetry.lock) are also in the main IBDCluster directory. Ideally, activate some type of virtual environment first, either a conda environment or a virtualenv. Once this environment is activated you can call:
```
poetry install
```
2. At this point all necessary dependencies should be installed.
* If you wish to find more information about Poetry, you can find the documentation here: https://python-poetry.org/
***Adding IBDCluster to the user's $PATH:***
To run the IBDCluster program without being in the source code directory, add the directory that contains the IBDCluster.py file to your PATH.
1. In your .bashrc or .zshrc file, add the line:
```
export PATH="{Path to the directory that the program was cloned into}/IBDCluster/IBDCluster:$PATH"
```
2. Reload your shell configuration by running:
```
source ~/.bashrc
```
or
```
source ~/.zshrc
```
This will allow you to run the code by just typing IBDCluster.py from any directory.
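
If you want to confirm that the PATH change took effect, a quick check from Python is sketched below. It relies only on the standard library's `shutil.which`, which looks for an executable file on the current PATH; the helper name `find_on_path` is made up for this example.

```python
import shutil
from typing import Optional


def find_on_path(program: str = "IBDCluster.py") -> Optional[str]:
    """Return the full path of `program` if it is an executable on PATH, else None."""
    return shutil.which(program)


location = find_on_path()
if location is None:
    print("IBDCluster.py was not found on PATH; check the export line and the chmod step.")
else:
    print(f"IBDCluster.py found at {location}")
```

Because `shutil.which` only reports executable files, a `None` result can mean either that the directory is missing from PATH or that the `chmod +x` step was skipped.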
***Running IBDCluster***
* You can find all the optional parameters by running:
```
IBDCluster.py --help
```
## Running the code:
___
*
## Reporting Issues:
___
All issues can be reported using the templates in the .github/ folder. There are templates for bug reports and for feature requests.
## Technical Details of the project:
___
* This part is mainly for keeping track of the directory structure.
## Project Structure:
___
```
├── IBDCluster
│ ├── analysis
│ │ ├── main.py
│ │ ├── percentages.py
│ ├── callbacks
│ │ ├── check_inputs.py
│ ├── models
│ │ ├── cluster_class.py
│ │ ├── indices.py
│ │ ├── pairs.py
│ │ ├── writers.py
│ ├── log
│ │ ├── logger.py
│ ├── cluster
│ │ ├── main.py
│ ├── IBDCluster.py
├── .env
├── environment.yml
├── .gitignore
├── poetry.lock
├── pyproject.toml
├── README.md
├── requirements.txt
│ ├── tests
│ │ ├── test_data
│ │ ├── test_integration
```
## Comments about models:
___
* Classes in cluster_class.py:
```mermaid
classDiagram
class Cluster {
ibd_file: str
ibd_program: str
indices: models.FileInfo
count: int=0
ibd_df: pd.DataFrame=pd.DataFrame
network_id: str=1
inds_in_network: Set[str]=set
network_list: List[Network]=list
}
class Network {
gene_name: str
gene_chr: str
network_id: int
pairs: List[Pairs]=list
iids: Set[str]=set
haplotypes: Set[str]=set
+filter_for_seed(ibd_df: pd.DataFrame, ind_seed: List[str], indices: FileInfo, exclusion: Set[str]=None) -> pd.DataFrame
#determine_pairs(ibd_row: pd.Series, indices: FileInfo) -> Pairs
+gather_grids(dataframe: pd.DataFrame, pair_1_indx: int, pair_2_indx: int) -> Set[str]
+update(ibd_df: pd.DataFrame, indices: FileInfo) -> None
}
class FileInfo {
<<interface>>
id1_indx: int
ind1_with_phase: int
id2_indx: int
ind2_with_phase: int
chr_indx: int
str_indx: int
end_indx: int
+set_program_indices(program_name: str) -> None
}
Cluster o-- Network
```
## Entity relationships:
___
```mermaid
erDiagram
NETWORK }|--|{ PAIRS : contains
NETWORK {
string gene_name
string chromosome
int network_id
}
NETWORK }|--|{ IIDS : contains
NETWORK }|--|{ HAPLOTYPES : contains
PAIRS {
string pair_1_id
string pair_1_phase
string pair_2_id
string pair_2_phase
int chromosome_number
int segment_start
int segment_end
float length
series affected_statuses
}
IIDS {
string Individual-ids
}
HAPLOTYPES {
string haplotype-phase
}
```
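
The relationships in the diagrams above can be mirrored with plain dataclasses. The sketch below copies the field names from the diagrams but is not the project's actual source: the `affected_statuses` series is left out to avoid a pandas dependency, `add_pair` is a hypothetical helper that does not appear in the class diagram, and all example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Pairs:
    pair_1_id: str
    pair_1_phase: str
    pair_2_id: str
    pair_2_phase: str
    chromosome_number: int
    segment_start: int
    segment_end: int
    length: float


@dataclass
class Network:
    gene_name: str
    gene_chr: str
    network_id: int
    # default_factory gives every Network its own empty containers
    pairs: List[Pairs] = field(default_factory=list)
    iids: Set[str] = field(default_factory=set)
    haplotypes: Set[str] = field(default_factory=set)

    def add_pair(self, pair: Pairs) -> None:
        """Hypothetical helper: record one pair plus its individuals and haplotypes."""
        self.pairs.append(pair)
        self.iids.update({pair.pair_1_id, pair.pair_2_id})
        self.haplotypes.update({pair.pair_1_phase, pair.pair_2_phase})


# Invented example values
network = Network(gene_name="GENE1", gene_chr="7", network_id=1)
network.add_pair(Pairs("ind1", "ind1.1", "ind2", "ind2.2", 7, 1000, 5000, 4.2))
```

Each `Network` therefore contains one or more pairs, and the sets of individual IDs and haplotypes are derived from those pairs, matching the "contains" edges in the entity diagram.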
## Plugins: (all the plugins are classes)
___
**NetworkWriter**
```mermaid
classDiagram
class NetworkWriter {
gene_name: str
chromosome: str
carrier_cols: List[str]
#_form_header() -> str
#_find_min_phecode(analysis_dict: Dict) -> Tuple[str, str]
#_form_analysis_string(analysis_dict: Dict) -> str
+write(**kwargs) -> None
}
```
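
As a rough illustration of how a plugin with this shape could behave, here is a sketch of a writer with the same attribute and method names. Everything beyond those names is an assumption: the tab-separated layout, the idea that `analysis_dict` maps a phecode to a p-value, and the `handle` and `results` keyword arguments accepted by `write` are invented for the example, as are the phecodes used.

```python
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO, Tuple


@dataclass
class NetworkWriter:
    gene_name: str
    chromosome: str
    carrier_cols: List[str]

    def _form_header(self) -> str:
        # Assumed layout: fixed columns followed by one column per phecode
        fixed = ["network_id", "gene", "chromosome", "min_phecode", "min_pvalue"]
        return "\t".join(fixed + self.carrier_cols) + "\n"

    @staticmethod
    def _find_min_phecode(analysis_dict: Dict[str, float]) -> Tuple[str, str]:
        # Assumes analysis_dict maps phecode -> p-value
        phecode = min(analysis_dict, key=analysis_dict.get)
        return phecode, str(analysis_dict[phecode])

    def _form_analysis_string(self, analysis_dict: Dict[str, float]) -> str:
        # One value per carrier column, "NA" when a phecode was not tested
        return "\t".join(str(analysis_dict.get(col, "NA")) for col in self.carrier_cols)

    def write(self, **kwargs) -> None:
        # Assumed keyword arguments: `handle` (writable text stream) and
        # `results` (network_id -> {phecode: p-value})
        handle: TextIO = kwargs["handle"]
        results: Dict[int, Dict[str, float]] = kwargs["results"]
        handle.write(self._form_header())
        for network_id, analysis_dict in results.items():
            phecode, pvalue = self._find_min_phecode(analysis_dict)
            row = [str(network_id), self.gene_name, self.chromosome, phecode, pvalue]
            handle.write("\t".join(row) + "\t" + self._form_analysis_string(analysis_dict) + "\n")


# Invented example values, written to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = NetworkWriter("GENE1", "7", ["250.2", "411.1"])
writer.write(handle=buffer, results={1: {"250.2": 0.03, "411.1": 0.4}})
print(buffer.getvalue())
```

Passing the output stream and the results through `**kwargs` keeps the `write` signature identical across plugins, so a caller could loop over several writers without knowing what each one needs.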
## Work in Progress:
---
Raw data
{
"_id": null,
"home_page": "https://jtb324.github.io/IBDCluster/",
"name": "ibdcluster",
"maintainer": "jtb324",
"docs_url": null,
"requires_python": ">=3.11,<4.0.0",
"maintainer_email": "james.baker@vanderbilt.edu",
"keywords": "python,clustering,IBD,genetics,relatedness",
"author": "jtb324",
"author_email": "james.baker@vanderbilt.edu",
"download_url": "https://files.pythonhosted.org/packages/20/77/3d268d74b8e412a2e67b1a510fe1e239270ad76308962cc36df13cfc1e75/ibdcluster-1.2.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A CLI tool to help identify ibd sharing within networks across a locus of interest at biobank scale and then test for phenotypic enrichment within these networks.",
"version": "1.2.9",
"split_keywords": [
"python",
"clustering",
"ibd",
"genetics",
"relatedness"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7d48dca0273b2b8ce44852db25de55edca806e13c9264aedd7f02f05b59c2ae7",
"md5": "b3ab3ed13f5a12ed76c26fb13dcee0b4",
"sha256": "e15ec4caa1a017174049788ee89b4d6afa12eb48dfe65e00bb99fe4ba68ac26d"
},
"downloads": -1,
"filename": "ibdcluster-1.2.9-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b3ab3ed13f5a12ed76c26fb13dcee0b4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11,<4.0.0",
"size": 27218,
"upload_time": "2023-03-28T21:44:52",
"upload_time_iso_8601": "2023-03-28T21:44:52.812662Z",
"url": "https://files.pythonhosted.org/packages/7d/48/dca0273b2b8ce44852db25de55edca806e13c9264aedd7f02f05b59c2ae7/ibdcluster-1.2.9-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "20773d268d74b8e412a2e67b1a510fe1e239270ad76308962cc36df13cfc1e75",
"md5": "f96cac5febb005c99fa24812c22edea6",
"sha256": "547717a8f696f55a22cb8f6a3a1e01af398f9d5570440eb2e0aa875176960dd0"
},
"downloads": -1,
"filename": "ibdcluster-1.2.9.tar.gz",
"has_sig": false,
"md5_digest": "f96cac5febb005c99fa24812c22edea6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11,<4.0.0",
"size": 21771,
"upload_time": "2023-03-28T21:44:54",
"upload_time_iso_8601": "2023-03-28T21:44:54.464085Z",
"url": "https://files.pythonhosted.org/packages/20/77/3d268d74b8e412a2e67b1a510fe1e239270ad76308962cc36df13cfc1e75/ibdcluster-1.2.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-28 21:44:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "ibdcluster"
}