plynk


- Name: plynk
- Version: 0.1.2
- Home page: https://gitlab.com/achwalt/plynk
- Summary: Easy command over the PLINK software directly from Python
- Upload time: 2024-07-06 21:01:19
- Author: Benjamin Albrechts
- Maintainer, docs URL, required Python version, license: none recorded
- Keywords: plink, genomics, genetics
- Requirements: No requirements were recorded.

# README

## Known Limitations
This package is still under development. Use at your own risk!

1. While this software attempts to adhere to the official Plink guidelines and list of commands, not all available Plink commands have been properly tested yet. Further development will therefore focus on improving the robustness of testing.
2. `PlinkFileReader.load_file` can only read ONE file per file type. If several files match, use `PlinkFileReader.read_file` with a specific file path instead. `args` and `kwargs` follow `pandas.read_csv`.


## Feature Requests and Bug Reports
**Please don't send e-mails!** Instead, create a new issue on GitLab: [https://gitlab.com/achwalt/plynk](https://gitlab.com/achwalt/plynk)


## Installation

Plynk can be installed with `pip install plynk`.

1. Set up a new conda environment (assuming the name `"my_env"`).
    1. `conda create --name my_env python=3.12 -y`
    2. `conda activate my_env`
2. Install `plynk`.
    1. This always works: `pip install plynk`.
    2. This is faster but only works if you installed `uv`: `uv pip install plynk`. Install `uv` with `pip install uv`.
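
To confirm the installation from within Python, the standard library's `importlib.metadata` can report the installed version. This is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of plynk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when plynk is not installed.
print(installed_version("plynk"))
```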

## Using PLINK in Python to process genetic data

`Plink` is an abstraction which allows you to use Plink commands easily from Python, with automatic management of path variables so you maintain full control.

## Import the Plink class


```python
from plynk import Plink
```

**`Plink` lets you choose three _optional_ parameters:**
1.  `plink_binary_path`: The path to your plink binary. If you leave it empty, plink must be installed globally.
2.  `plink_prefix`: The path prefix (folder plus file prefix) where your **output** data lives, starting with the initial files to process. If you leave it blank, you won't be able to use the file reader. Remember to always keep a backup of your source data.
3.  `encoding`: Defaults to `utf-8`. If your data uses a different encoding, set it here.
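
If `plink_binary_path` is left empty, a `plink` executable has to be discoverable on your `PATH`. How plynk looks it up internally is not documented here; the following sketch only shows how you could check this yourself with the standard library. `find_global_binary` is a hypothetical helper, not part of plynk.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_global_binary(name: str = "plink") -> Optional[Path]:
    """Return the path of an executable found on PATH, or None if absent."""
    found = shutil.which(name)
    return Path(found) if found else None


binary = find_global_binary()
if binary is None:
    print("No global plink found; pass plink_binary_path explicitly.")
else:
    print(f"Global plink found at {binary}")
```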

In the following example, we want to work with a local version of plink which should be at `"binaries/plink"` in our working directory. Our data is stored in the folder `"data_src/demo_data"`.


```python
# Constants
PLINK_BINARY_PATH = "binaries/plink"
PLINK_PREFIX = "data_src/demo_data/demo_01"  # folder: data_src/demo_data, file_prefix: demo_01
```


```python
plink = Plink(
    plink_binary_path=PLINK_BINARY_PATH,
    plink_prefix=PLINK_PREFIX,
)
```


```python
try:
    print(plink.info)
except RuntimeError:
    print("Plink is not yet available. Let's download it!")
```

    Plink is not yet available. Let's download it!


## Get the plink binary

All versions of Plink are available here: [https://www.cog-genomics.org/plink2/](https://www.cog-genomics.org/plink2/)

`Plink` can download the binary for us so that we do not have to take care of it ourselves.


```python
plink.download_binaries()
```




    PosixPath('/home/achwalt/Development/plynk/binaries/plink')




```python
plink.info
```




    {'name': 'PLINK',
     'version': 'v1.90b7.2',
     'architecture': '64-bit',
     'release_date': datetime.datetime(2023, 12, 11, 0, 0)}
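
The dictionary above mirrors the pieces of PLINK's version banner. The sketch below shows how such a banner could be split into the same fields. It assumes the banner has the form `PLINK v1.90b7.2 64-bit (11 Dec 2023)`; `parse_banner` is a hypothetical helper and not necessarily how plynk builds `info`.

```python
import re
from datetime import datetime

# Assumed banner format (an assumption, not taken from plynk's source).
BANNER = "PLINK v1.90b7.2 64-bit (11 Dec 2023)"

BANNER_PATTERN = re.compile(
    r"^(?P<name>\S+)\s+(?P<version>\S+)\s+(?P<architecture>\S+)\s+\((?P<date>[^)]+)\)"
)


def parse_banner(banner: str) -> dict:
    """Split a version banner into name, version, architecture and release date."""
    match = BANNER_PATTERN.match(banner.strip())
    if match is None:
        raise ValueError(f"Unrecognised banner: {banner!r}")
    info = match.groupdict()
    # "%b" parses English month abbreviations under the default C locale.
    info["release_date"] = datetime.strptime(info.pop("date"), "%d %b %Y")
    return info


print(parse_banner(BANNER))
```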



## Working with data

Now that plink is available, we can work with it natively from Python. To see what data we have in our data folder, we can use either
1. the exposed `Path` object or
2. the plink file reader.

Since the **plink prefix already contains both the folder and the file's prefix**, we need to use its parent!


```python
files_available = [file_name for file_name in plink.plink_prefix.parent.iterdir()]
files_available
```




    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map')]




```python
plink.file_reader.find_files(plink.plink_prefix)
```




    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped')]
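
Unlike `iterdir()`, the file reader returns only files that share the prefix, and the result above is sorted. A comparable lookup can be sketched with `pathlib` alone. `files_for_prefix` is a hypothetical helper that illustrates the idea; it is not plynk's implementation.

```python
import tempfile
from pathlib import Path


def files_for_prefix(prefix: Path) -> list:
    """List files named '<prefix>.<extension>' next to the prefix, sorted."""
    return sorted(prefix.parent.glob(f"{prefix.name}.*"))


# Demonstration in a throwaway folder with empty stand-in files.
with tempfile.TemporaryDirectory() as folder:
    prefix = Path(folder) / "demo_01"
    for extension in ("ped", "map"):
        prefix.with_name(f"{prefix.name}.{extension}").touch()
    (Path(folder) / "other_02.ped").touch()  # different prefix, ignored
    print([path.name for path in files_for_prefix(prefix)])
    # ['demo_01.map', 'demo_01.ped']
```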



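### We can inspect the data by using the path:

Note that in the output below the first sample has been consumed as the header row (65 rows instead of 66). Since `args` and `kwargs` follow `pandas.read_csv`, passing `header=None` should avoid this.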


```python
ped_data = plink.file_reader.read_file(files_available[0])
ped_data
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>TYPE_3c64f9b442261a43a88a89606d363421</th>
      <th>COW_3c935a5fb77c7462d6535787c846bd24</th>
      <th>0</th>
      <th>0.1</th>
      <th>0.2</th>
      <th>-9</th>
      <th>A</th>
      <th>G</th>
      <th>C</th>
      <th>A.1</th>
      <th>...</th>
      <th>C.7482</th>
      <th>C.7483</th>
      <th>A.37157</th>
      <th>A.37158</th>
      <th>G.31809</th>
      <th>G.31810</th>
      <th>G.31811</th>
      <th>G.31812</th>
      <th>G.31813</th>
      <th>A.37159</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>1</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_133f937b04029126ee01146a0c1bb594</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>2</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>3</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>4</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_08020b767145f321d189fcac4f94cd1c</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>60</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>61</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>62</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>63</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>64</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_e507d53129af542a4f49f4fc14299061</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
  </tbody>
</table>
<p>65 rows × 76694 columns</p>
</div>



### If there is only one file per plink file type, we can also just load it with some pre-formatted column names.


```python
ped_data = plink.file_reader.load_data("ped", numerated_colname="Nucleotide")
ped_data
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Family ID</th>
      <th>Individual ID</th>
      <th>Paternal ID</th>
      <th>Maternal ID</th>
      <th>Sex</th>
      <th>Phenotype</th>
      <th>Nucleotide 1</th>
      <th>Nucleotide 2</th>
      <th>Nucleotide 3</th>
      <th>Nucleotide 4</th>
      <th>...</th>
      <th>Nucleotide 76679</th>
      <th>Nucleotide 76680</th>
      <th>Nucleotide 76681</th>
      <th>Nucleotide 76682</th>
      <th>Nucleotide 76683</th>
      <th>Nucleotide 76684</th>
      <th>Nucleotide 76685</th>
      <th>Nucleotide 76686</th>
      <th>Nucleotide 76687</th>
      <th>Nucleotide 76688</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_3c935a5fb77c7462d6535787c846bd24</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>2</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_133f937b04029126ee01146a0c1bb594</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>3</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>4</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>A</td>
      <td>G</td>
      <td>C</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>61</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>62</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>63</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>C</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
    </tr>
    <tr>
      <th>64</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>C</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
    <tr>
      <th>65</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_e507d53129af542a4f49f4fc14299061</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>-9</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
      <td>A</td>
      <td>...</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>A</td>
      <td>G</td>
      <td>G</td>
      <td>G</td>
      <td>A</td>
    </tr>
  </tbody>
</table>
<p>66 rows × 76694 columns</p>
</div>
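
The column count above is consistent with the PED layout: six leading fields followed by two allele columns per marker, i.e. 6 + 2 × 38344 = 76694 for the 38344 markers of the accompanying `.map` file. Below is a minimal sketch of how such column names could be generated. `ped_column_names` is a hypothetical helper, not plynk's implementation, and the six leading names are copied from the output above.

```python
LEADING_COLUMNS = [
    "Family ID", "Individual ID", "Paternal ID",
    "Maternal ID", "Sex", "Phenotype",
]


def ped_column_names(n_markers: int, numerated_colname: str = "Nucleotide") -> list:
    """Six fixed PED columns followed by two numbered allele columns per marker."""
    alleles = [f"{numerated_colname} {i}" for i in range(1, 2 * n_markers + 1)]
    return LEADING_COLUMNS + alleles


columns = ped_column_names(38344)
print(len(columns))  # 76694
print(columns[-1])   # Nucleotide 76688
```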




```python
map_data = plink.file_reader.load_data("map")
map_data
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Chromosome</th>
      <th>Marker ID</th>
      <th>Genetic Distance</th>
      <th>Position Base Pairs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>COW_bee91d0199f55f7ceac640dd11d140c0</td>
      <td>0</td>
      <td>135098</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>COW_1913080647293a3848ad937ca9653aec</td>
      <td>0</td>
      <td>149772</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>COW_2eb599df7f168d419f8c2757da552295</td>
      <td>0</td>
      <td>163995</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>COW_96317923c4b14b72dc44ebbdfa25d279</td>
      <td>0</td>
      <td>183040</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>COW_81d964d5d890f1b262d1cb2d8e784523</td>
      <td>0</td>
      <td>267940</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>38339</th>
      <td>29</td>
      <td>COW_7319e9361f794ba225c89a83edfd620b</td>
      <td>0</td>
      <td>51038808</td>
    </tr>
    <tr>
      <th>38340</th>
      <td>29</td>
      <td>COW_e0c1b1b57e7e6d7b6c83573badbaa991</td>
      <td>0</td>
      <td>51042316</td>
    </tr>
    <tr>
      <th>38341</th>
      <td>29</td>
      <td>COW_a7f1212ead02b209c58c1766a914b01a</td>
      <td>0</td>
      <td>51152056</td>
    </tr>
    <tr>
      <th>38342</th>
      <td>29</td>
      <td>COW_5b06fd46612881ff9e7d2604de917a46</td>
      <td>0</td>
      <td>51358815</td>
    </tr>
    <tr>
      <th>38343</th>
      <td>29</td>
      <td>COW_34d9e4d41e8d8c8b8cadc8505a4f09d6</td>
      <td>0</td>
      <td>51484561</td>
    </tr>
  </tbody>
</table>
<p>38344 rows × 4 columns</p>
</div>



## Processing the data for further analysis

Plink enables you to process your data so that it becomes easy to analyse. Plink itself only exposes a command-line interface, yet you still need to keep a record of every instruction used in your analysis: research must be reproducible.

This is why this package was created: It allows you to do your analysis in Python while also commanding Plink natively from within Python.

`Plink` allows you to use pythonic arguments and keyword arguments, which will then be translated into command line instructions.


```python
# Encouraged: Use a pythonic way to declare args and kwargs
output = plink.run(
    make_bed=True,
    file=plink.plink_prefix,
    out=plink.plink_prefix,
    chr_set=34,
    chr="1-29",
    geno=0.1,
    mind=0.1,
    maf=0.05
)
```


```python
# Supports supplying arguments as strings directly, such as '--make-bed'
output = plink.run(
    "--make-bed",
    file=plink.plink_prefix,
    out=plink.plink_prefix,
    chr_set=34,
    chr="1-29",
    geno=0.1,
    mind=0.1,
    maf=0.05
)
```
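
The translation from keyword arguments to command-line flags can be pictured as follows. This is a simplified sketch under the assumption that underscores become hyphens and `True` marks a bare switch; `kwargs_to_flags` is a hypothetical helper and does not reproduce plynk's type-safe keyword handling.

```python
def kwargs_to_flags(**kwargs) -> list:
    """Translate keyword arguments into command-line tokens."""
    tokens = []
    for key, value in kwargs.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            tokens.append(flag)  # bare switch, e.g. --make-bed
        elif value is False or value is None:
            continue  # switch turned off: emit nothing
        else:
            tokens.extend([flag, str(value)])
    return tokens


print(kwargs_to_flags(make_bed=True, chr_set=34, chr="1-29", geno=0.1, maf=0.05))
# ['--make-bed', '--chr-set', '34', '--chr', '1-29', '--geno', '0.1', '--maf', '0.05']
```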

### Type Safety for Plink keywords

`Plink` prefers type-safe conversion where possible, but you can also provide basic arguments such as `--make-bed` directly as strings.

Currently, a subsidiary method converts such string arguments without type checks, so that a missing feature does not block your work. This is why providing arguments as strings is discouraged.


```python
cmd_used = plink.make_cmd(
    "--make-bed",
    file=plink.plink_prefix,
    out=plink.plink_prefix,
    chr_set=34,
    chr="1-29",
    geno=0.1,
    mind=0.1,
    maf=0.05
)
cmd_used
```




    [PosixPath('/home/achwalt/Development/plynk/binaries/plink'),
     '--make-bed',
     <PlinkKeyword.FILE: '--file'>,
     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',
     <PlinkKeyword.OUT: '--out'>,
     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',
     <PlinkKeyword.CHR_SET: '--chr-set'>,
     '34',
     <PlinkKeyword.CHR: '--chr'>,
     '1-29',
     <PlinkKeyword.GENO: '--geno'>,
     '0.1',
     <PlinkKeyword.MIND: '--mind'>,
     '0.1',
     <PlinkKeyword.MAF: '--maf'>,
     '0.05']




```python
[str(cmd) for cmd in cmd_used]
```




    ['/home/achwalt/Development/plynk/binaries/plink',
     '--make-bed',
     '--file',
     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',
     '--out',
     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',
     '--chr-set',
     '34',
     '--chr',
     '1-29',
     '--geno',
     '0.1',
     '--mind',
     '0.1',
     '--maf',
     '0.05']
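
The two outputs above show keyword members whose string form is the flag itself. An enum with a custom `__str__` behaves this way. The class below is a hypothetical stand-in with three members; it is not plynk's `PlinkKeyword`.

```python
from enum import Enum


class DemoKeyword(Enum):
    """Hypothetical stand-in for a keyword enum; members map names to flags."""

    FILE = "--file"
    OUT = "--out"
    CHR_SET = "--chr-set"

    def __str__(self) -> str:
        # Return the flag itself rather than "DemoKeyword.FILE".
        return self.value

    @classmethod
    def from_kwarg(cls, name: str) -> "DemoKeyword":
        """Look up a member by keyword name; unknown names raise KeyError."""
        return cls[name.upper()]


keyword = DemoKeyword.from_kwarg("chr_set")
print(repr(keyword))  # <DemoKeyword.CHR_SET: '--chr-set'>
print(str(keyword))   # --chr-set
```

Because unknown names raise a `KeyError`, a misspelled keyword fails early instead of reaching the command line.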




```python
from pathlib import Path

plink.make_cmd(
    "--make-bed",
    "--file",
    'data_src/demo_data/demo_01', # ❌ No Path safety will be applied!
    "--out",
    Path('data_src/demo_data/demo_01'), # ✅ Path safety will be applied.
    chr_set=34,
    chr="1-29",
    geno=0.1,
    mind=0.1,
    maf=0.05
)
```




    [PosixPath('/home/achwalt/Development/plynk/binaries/plink'),
     '--make-bed',
     '--file',
     'data_src/demo_data/demo_01',
     '--out',
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01'),
     <PlinkKeyword.CHR_SET: '--chr-set'>,
     '34',
     <PlinkKeyword.CHR: '--chr'>,
     '1-29',
     <PlinkKeyword.GENO: '--geno'>,
     '0.1',
     <PlinkKeyword.MIND: '--mind'>,
     '0.1',
     <PlinkKeyword.MAF: '--maf'>,
     '0.05']
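
Judging from the output above, path safety at least means that `Path` objects are resolved to absolute paths, while plain strings pass through unchanged. The sketch below reproduces only that observed behaviour. `normalise_token` is a hypothetical helper, and plynk may apply further checks.

```python
from pathlib import Path


def normalise_token(token):
    """Resolve Path objects to absolute paths; leave everything else as is."""
    if isinstance(token, Path):
        return token.resolve()
    return token


print(normalise_token("data_src/demo_data/demo_01"))        # unchanged string
print(normalise_token(Path("data_src/demo_data/demo_01")))  # absolute path
```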



### You can also run a command created by `make_cmd`


```python
cmd = plink.make_cmd(
    make_bed=True,
    file=plink.plink_prefix,
    out=plink.plink_prefix,
    chr_set=34,
    chr="1-29",
    geno=0.1,
    mind=0.1,
    maf=0.05
)
output = plink.run_cmd(cmd)
```
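
How `run_cmd` executes the command internally is not shown here. Conceptually, a token list like the one from `make_cmd` can be handed to `subprocess.run` once every token is converted to a string. The sketch below uses the Python interpreter as a stand-in binary so that it runs without PLINK; `run_tokens` is a hypothetical helper.

```python
import subprocess
import sys


def run_tokens(tokens, encoding: str = "utf-8") -> str:
    """Run a command given as a list of tokens and return its standard output."""
    completed = subprocess.run(
        [str(token) for token in tokens],  # Path and enum tokens become strings
        capture_output=True,
        encoding=encoding,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Stand-in command; with PLINK the list would start with the binary path.
print(run_tokens([sys.executable, "-c", "print('ok')"]))
```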


```python
# Let's see what files we have now
plink.file_reader.find_files(plink.plink_prefix)
```




    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.bed'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.bim'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.fam'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.log'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.nosex'),
     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped')]




```python
# More examples
cmd = plink.make_cmd(file=plink.plink_prefix, out=plink.plink_prefix, cow=True, het=True)
output = plink.run_cmd(cmd)
```


```python
plink.file_reader.load_data("het")
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>FID</th>
      <th>IID</th>
      <th>O(HOM)</th>
      <th>E(HOM)</th>
      <th>N(NM)</th>
      <th>F</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_3c935a5fb77c7462d6535787c846bd24</td>
      <td>22120</td>
      <td>22280.0</td>
      <td>38015</td>
      <td>-0.010110</td>
    </tr>
    <tr>
      <th>1</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>
      <td>22102</td>
      <td>22280.0</td>
      <td>38017</td>
      <td>-0.011320</td>
    </tr>
    <tr>
      <th>2</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_133f937b04029126ee01146a0c1bb594</td>
      <td>22005</td>
      <td>22280.0</td>
      <td>38017</td>
      <td>-0.017490</td>
    </tr>
    <tr>
      <th>3</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>
      <td>22194</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>-0.005507</td>
    </tr>
    <tr>
      <th>4</th>
      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>
      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>
      <td>21689</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>-0.037600</td>
    </tr>
    <tr>
      <th>...</th>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <th>61</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>
      <td>23399</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>0.071060</td>
    </tr>
    <tr>
      <th>62</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>
      <td>23515</td>
      <td>22280.0</td>
      <td>38017</td>
      <td>0.078470</td>
    </tr>
    <tr>
      <th>63</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>
      <td>23970</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>0.107300</td>
    </tr>
    <tr>
      <th>64</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>
      <td>23874</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>0.101200</td>
    </tr>
    <tr>
      <th>65</th>
      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>
      <td>COW_e507d53129af542a4f49f4fc14299061</td>
      <td>25778</td>
      <td>22280.0</td>
      <td>38018</td>
      <td>0.222200</td>
    </tr>
  </tbody>
</table>
<p>66 rows × 6 columns</p>
</div>
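
The `F` column is the inbreeding coefficient estimate reported by PLINK's `--het`, computed as (O(HOM) − E(HOM)) / (N(NM) − E(HOM)). The sketch below recomputes it from the other columns of the last row. Small deviations from the table are expected because `E(HOM)` is displayed in rounded form.

```python
def inbreeding_coefficient(observed_hom: float, expected_hom: float, n_non_missing: int) -> float:
    """F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM))."""
    return (observed_hom - expected_hom) / (n_non_missing - expected_hom)


# Values of the last row above (index 65).
f_value = inbreeding_coefficient(25778, 22280.0, 38018)
print(round(f_value, 3))  # 0.222
```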





Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.com/achwalt/plynk",
    "name": "plynk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "plink genomics genetics",
    "author": "Benjamin Albrechts",
    "author_email": "benjamin.albrechts@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/dc/c3/c5d79c5dad5753c53c8d7d767c2524e5964b42539a9ebfe16c1d4bd5afc4/plynk-0.1.2.tar.gz",
    "platform": null,
    "description": "# README\n\n## Known Limitations\nThis is still undergoing development - Use at your own risk!\n\n1. While this software attempts to adhere to the official Plink guidelines and list of commands, NO proper testing of all available plink commands has yet completed. Further development will thus focus on improving robustness of testing.\n2. `PlinkFileReader.load_file` can only read ONE file - Instead use `PlinkFileReader.read_file` with a specific file path. `args` and `kwargs` follow `pandas.read_csv`.\n\n\n## Feature Requests and Bug Reports\n**Please don't write E-Mails!** Instead, consider creating a new issue on GitLab: [https://gitlab.com/achwalt/plynk](https://gitlab.com/achwalt/plynk)\n\n\n## Installation\n\nPlynk can be installed with `pip install plynk`.\n\n1. Set up a new conda environment (assuing name `\"my_env\"`).\n    1. `conda create --name my_env python=3.12 -y`\n    2. `conda activate my_env`\n2. Install `plynk`.\n    1. This always works: `pip install plynk`.\n    2. This is faster but only works if you installed `uv`: `uv pip install plynk`. Install `uv` with `pip install uv`.\n\n## Using PLINK in python to process genetic data\n\n`Plink` is an abstraction which allows you to use Plink commands easily from Python, with automatic management of path variables so you maintain full control.\n\n## Import the Plink class\n\n\n```python\nfrom plynk import Plink\n```\n\n`Plink` is an abstraction which allows you to use the Plink software commands easily from Python, with automatic management of path variables so you maintain full control.\n\n**Plink let's you choose three _optional_ parameters:**\n1.  `plink_binary_path`: This is the path to where you have your plink binary available. If you leave it empty, you must have plink globally installed.\n2.  `plink_prefix`: This is the path to the folder where your **output** data lives, starting with the initial files to process. 
If you leave it blank, you won't be able to use the file reader. Remember to always create/keep a backup of your source data.\n3.  `encoding`: Default is `utf-8`. If your data is encoded in a different format, consider setting this here.\n\nIn the following example, we want to work with a local version of plink which should be at `\"binaries/plink\"` in our working directory. Our data is stored in the folder `\"data_src/demo_data\"`.\n\n\n```python\n# Constants\nPLINK_BINARY_PATH = \"binaries/plink\"\nPLINK_PREFIX = \"data_src/demo_data/demo_01\" # folder: data_scr/demo_data, file_prefix: demo_01\n```\n\n\n```python\nplink = Plink(\n    plink_binary_path=PLINK_BINARY_PATH,\n    plink_prefix=PLINK_PREFIX,\n)\n```\n\n\n```python\ntry:\n    print(plink.info)\nexcept RuntimeError:\n    print(\"Plink is not yet available. Let's download it!\")\n```\n\n    Plink is not yet available. Let's download it!\n\n\n## Get the plink binary\n\nAll versions of Plink are available here: [https://www.cog-genomics.org/plink2/](https://www.cog-genomics.org/plink2/)\n\n`Plink` can download this data for us so that we do not need to take care about it.\n\n\n```python\nplink.download_binaries()\n```\n\n\n\n\n    PosixPath('/home/achwalt/Development/plynk/binaries/plink')\n\n\n\n\n```python\nplink.info\n```\n\n\n\n\n    {'name': 'PLINK',\n     'version': 'v1.90b7.2',\n     'architecture': '64-bit',\n     'release_date': datetime.datetime(2023, 12, 11, 0, 0)}\n\n\n\n## Working with data\n\nNow that plink is available, we can work with it natively from python. We can use either \n1. the exposed `Path` object or\n2. the plink file reader\n\nto see what data we have in our data folder to work with. 
Since the **plink prefix contains already both the folder and the file's prefix**, we need to use it's parent!\n\n\n```python\nfiles_available = [file_name for file_name in plink.plink_prefix.parent.iterdir()]\nfiles_available\n```\n\n\n\n\n    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map')]\n\n\n\n\n```python\nplink.file_reader.find_files(plink.plink_prefix)\n```\n\n\n\n\n    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped')]\n\n\n\n### We can inspect the data by using the path:\n\n\n```python\nped_data = plink.file_reader.read_file(files_available[0])\nped_data\n```\n\n\n\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>TYPE_3c64f9b442261a43a88a89606d363421</th>\n      <th>COW_3c935a5fb77c7462d6535787c846bd24</th>\n      <th>0</th>\n      <th>0.1</th>\n      <th>0.2</th>\n      <th>-9</th>\n      <th>A</th>\n      <th>G</th>\n      <th>C</th>\n      <th>A.1</th>\n      <th>...</th>\n      <th>C.7482</th>\n      <th>C.7483</th>\n      <th>A.37157</th>\n      <th>A.37158</th>\n      <th>G.31809</th>\n      <th>G.31810</th>\n      <th>G.31811</th>\n      <th>G.31812</th>\n      <th>G.31813</th>\n      <th>A.37159</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n     
 <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_133f937b04029126ee01146a0c1bb594</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_08020b767145f321d189fcac4f94cd1c</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n   
   <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>60</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>61</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>62</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      
<td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>63</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>64</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_e507d53129af542a4f49f4fc14299061</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n  </tbody>\n</table>\n<p>65 rows \u00d7 76694 columns</p>\n</div>\n\n\n\n### If there is only one file per plink file type, we can also just load it with some pre-formatted column names.\n\n\n```python\nped_data = plink.file_reader.load_data(\"ped\", numerated_colname=\"Nucleotide\")\nped_data\n```\n\n\n\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Family ID</th>\n      <th>Individual ID</th>\n      <th>Paternal ID</th>\n      <th>Maternal ID</th>\n      <th>Sex</th>\n      <th>Phenotype</th>\n      <th>Nucleotide 1</th>\n      <th>Nucleotide 2</th>\n      
<th>Nucleotide 3</th>\n      <th>Nucleotide 4</th>\n      <th>...</th>\n      <th>Nucleotide 76679</th>\n      <th>Nucleotide 76680</th>\n      <th>Nucleotide 76681</th>\n      <th>Nucleotide 76682</th>\n      <th>Nucleotide 76683</th>\n      <th>Nucleotide 76684</th>\n      <th>Nucleotide 76685</th>\n      <th>Nucleotide 76686</th>\n      <th>Nucleotide 76687</th>\n      <th>Nucleotide 76688</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_3c935a5fb77c7462d6535787c846bd24</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_133f937b04029126ee01146a0c1bb594</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      
<td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>A</td>\n      <td>G</td>\n      <td>C</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>61</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>62</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>\n      
<td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>63</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>C</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n    </tr>\n    <tr>\n      <th>64</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>C</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n    <tr>\n      <th>65</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_e507d53129af542a4f49f4fc14299061</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>-9</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n      <td>A</td>\n      <td>...</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>A</td>\n      <td>G</td>\n      <td>G</td>\n      <td>G</td>\n      <td>A</td>\n    </tr>\n  </tbody>\n</table>\n<p>66 rows \u00d7 76694 columns</p>\n</div>\n\n\n\n\n```python\nmap_data = plink.file_reader.load_data(\"map\")\nmap_data\n```\n\n\n\n\n<div>\n<style 
scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Chromosome</th>\n      <th>Marker ID</th>\n      <th>Genetic Distance</th>\n      <th>Position Base Pairs</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1</td>\n      <td>COW_bee91d0199f55f7ceac640dd11d140c0</td>\n      <td>0</td>\n      <td>135098</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>COW_1913080647293a3848ad937ca9653aec</td>\n      <td>0</td>\n      <td>149772</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>1</td>\n      <td>COW_2eb599df7f168d419f8c2757da552295</td>\n      <td>0</td>\n      <td>163995</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1</td>\n      <td>COW_96317923c4b14b72dc44ebbdfa25d279</td>\n      <td>0</td>\n      <td>183040</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>1</td>\n      <td>COW_81d964d5d890f1b262d1cb2d8e784523</td>\n      <td>0</td>\n      <td>267940</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>38339</th>\n      <td>29</td>\n      <td>COW_7319e9361f794ba225c89a83edfd620b</td>\n      <td>0</td>\n      <td>51038808</td>\n    </tr>\n    <tr>\n      <th>38340</th>\n      <td>29</td>\n      <td>COW_e0c1b1b57e7e6d7b6c83573badbaa991</td>\n      <td>0</td>\n      <td>51042316</td>\n    </tr>\n    <tr>\n      <th>38341</th>\n      <td>29</td>\n      <td>COW_a7f1212ead02b209c58c1766a914b01a</td>\n      <td>0</td>\n      <td>51152056</td>\n    </tr>\n    <tr>\n      <th>38342</th>\n      <td>29</td>\n      <td>COW_5b06fd46612881ff9e7d2604de917a46</td>\n      <td>0</td>\n      <td>51358815</td>\n   
 </tr>\n    <tr>\n      <th>38343</th>\n      <td>29</td>\n      <td>COW_34d9e4d41e8d8c8b8cadc8505a4f09d6</td>\n      <td>0</td>\n      <td>51484561</td>\n    </tr>\n  </tbody>\n</table>\n<p>38344 rows \u00d7 4 columns</p>\n</div>\n\n\n\n## Processing the data for further analysis\n\nPlink lets you process your data so that it is easy to analyse. Although Plink itself only exposes a command line API, you must still keep a record of every instruction used in your analysis for it to be scientific: research must be reproducible.\n\nThis is why this package was created: it lets you do your analysis in Python while also commanding Plink natively from within Python.\n\n`Plink` accepts pythonic arguments and keyword arguments, which are then translated into command line instructions.\n\n\n```python\n# Encouraged: Use a pythonic way to declare args and kwargs\noutput = plink.run(\n    make_bed=True,\n    file=plink.plink_prefix,\n    out=plink.plink_prefix,\n    chr_set=34,\n    chr=\"1-29\",\n    geno=0.1,\n    mind=0.1,\n    maf=0.05\n)\n```\n\n\n```python\n# Also supported: supplying arguments directly as strings, such as '--make-bed'\noutput = plink.run(\n    \"--make-bed\",\n    file=plink.plink_prefix,\n    out=plink.plink_prefix,\n    chr_set=34,\n    chr=\"1-29\",\n    geno=0.1,\n    mind=0.1,\n    maf=0.05\n)\n```\n\n### Type Safety for Plink keywords\n\n`Plink` prefers type-safe conversion where possible, but you can also provide basic arguments such as `--make-bed` directly.\n\nCurrently, a subsidiary method converts such string arguments unsafely, so that a missing feature does not block your work. 
That's why it is discouraged to provide arguments as strings.\n\n\n```python\ncmd_used = plink.make_cmd(\n    \"--make-bed\",\n    file=plink.plink_prefix,\n    out=plink.plink_prefix,\n    chr_set=34,\n    chr=\"1-29\",\n    geno=0.1,\n    mind=0.1,\n    maf=0.05\n)\ncmd_used\n```\n\n\n\n\n    [PosixPath('/home/achwalt/Development/plynk/binaries/plink'),\n     '--make-bed',\n     <PlinkKeyword.FILE: '--file'>,\n     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',\n     <PlinkKeyword.OUT: '--out'>,\n     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',\n     <PlinkKeyword.CHR_SET: '--chr-set'>,\n     '34',\n     <PlinkKeyword.CHR: '--chr'>,\n     '1-29',\n     <PlinkKeyword.GENO: '--geno'>,\n     '0.1',\n     <PlinkKeyword.MIND: '--mind'>,\n     '0.1',\n     <PlinkKeyword.MAF: '--maf'>,\n     '0.05']\n\n\n\n\n```python\n[str(cmd) for cmd in cmd_used]\n```\n\n\n\n\n    ['/home/achwalt/Development/plynk/binaries/plink',\n     '--make-bed',\n     '--file',\n     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',\n     '--out',\n     '/home/achwalt/Development/plynk/data_src/demo_data/demo_01',\n     '--chr-set',\n     '34',\n     '--chr',\n     '1-29',\n     '--geno',\n     '0.1',\n     '--mind',\n     '0.1',\n     '--maf',\n     '0.05']\n\n\n\n\n```python\nfrom pathlib import Path\n\nplink.make_cmd(\n    \"--make-bed\",\n    \"--file\",\n    'data_src/demo_data/demo_01', # \u274c No Path safety will be applied!\n    \"--out\",\n    Path('data_src/demo_data/demo_01'), # \u2705 Path safety will be applied.\n    chr_set=34,\n    chr=\"1-29\",\n    geno=0.1,\n    mind=0.1,\n    maf=0.05\n)\n```\n\n\n\n\n    [PosixPath('/home/achwalt/Development/plynk/binaries/plink'),\n     '--make-bed',\n     '--file',\n     'data_src/demo_data/demo_01',\n     '--out',\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01'),\n     <PlinkKeyword.CHR_SET: '--chr-set'>,\n     '34',\n     <PlinkKeyword.CHR: '--chr'>,\n     
'1-29',\n     <PlinkKeyword.GENO: '--geno'>,\n     '0.1',\n     <PlinkKeyword.MIND: '--mind'>,\n     '0.1',\n     <PlinkKeyword.MAF: '--maf'>,\n     '0.05']\n\n\n\n### You can also run a command created by `make_cmd`\n\n\n```python\ncmd = plink.make_cmd(\n    make_bed=True,\n    file=plink.plink_prefix,\n    out=plink.plink_prefix,\n    chr_set=34,\n    chr=\"1-29\",\n    geno=0.1,\n    mind=0.1,\n    maf=0.05\n)\noutput = plink.run_cmd(cmd)\n```\n\n\n```python\n# Let's see what files we have now\nplink.file_reader.find_files(plink.plink_prefix)\n```\n\n\n\n\n    [PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.bed'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.bim'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.fam'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.log'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.map'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.nosex'),\n     PosixPath('/home/achwalt/Development/plynk/data_src/demo_data/demo_01.ped')]\n\n\n\n\n```python\n## More Examples\ncmd = plink.make_cmd(file=plink.plink_prefix, out=plink.plink_prefix, cow=True, het=True)\noutput = plink.run_cmd(cmd)\n```\n\n\n```python\nplink.file_reader.load_data(\"het\")\n```\n\n\n\n\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>FID</th>\n      <th>IID</th>\n      <th>O(HOM)</th>\n      <th>E(HOM)</th>\n      <th>N(NM)</th>\n      <th>F</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      
<td>COW_3c935a5fb77c7462d6535787c846bd24</td>\n      <td>22120</td>\n      <td>22280.0</td>\n      <td>38015</td>\n      <td>-0.010110</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_8ffcd41500450a2afc2a7e18740709d1</td>\n      <td>22102</td>\n      <td>22280.0</td>\n      <td>38017</td>\n      <td>-0.011320</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_133f937b04029126ee01146a0c1bb594</td>\n      <td>22005</td>\n      <td>22280.0</td>\n      <td>38017</td>\n      <td>-0.017490</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_7d49f51b714f3dbd2c25a04254295b5d</td>\n      <td>22194</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>-0.005507</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>TYPE_3c64f9b442261a43a88a89606d363421</td>\n      <td>COW_15e3a8fe1a0172f527dfb8451492c671</td>\n      <td>21689</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>-0.037600</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>61</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_63b051a5b94f870a92026fa87043f7a1</td>\n      <td>23399</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>0.071060</td>\n    </tr>\n    <tr>\n      <th>62</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_ff04e2afe8a65c7ab4a6cb89d3be51f8</td>\n      <td>23515</td>\n      <td>22280.0</td>\n      <td>38017</td>\n      <td>0.078470</td>\n    </tr>\n    <tr>\n      <th>63</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_4a2448cc04232570fee82a7d68d94bda</td>\n      <td>23970</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>0.107300</td>\n    </tr>\n    
<tr>\n      <th>64</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_5346afe3589590c3fe017de8c4b8cad5</td>\n      <td>23874</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>0.101200</td>\n    </tr>\n    <tr>\n      <th>65</th>\n      <td>TYPE_3cda26e0c8aedbf662adb42f923ef3ec</td>\n      <td>COW_e507d53129af542a4f49f4fc14299061</td>\n      <td>25778</td>\n      <td>22280.0</td>\n      <td>38018</td>\n      <td>0.222200</td>\n    </tr>\n  </tbody>\n</table>\n<p>66 rows \u00d7 6 columns</p>\n</div>\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Easy command over the PLINK software directly from Python",
    "version": "0.1.2",
    "project_urls": {
        "Homepage": "https://gitlab.com/achwalt/plynk"
    },
    "split_keywords": [
        "plink",
        "genomics",
        "genetics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1bfe2b6258b39aa6d2eee3eecaf336226b8b041b53c6e06f3e658bfc3b223fe5",
                "md5": "d2e27795bcce3d43c73668df32c03475",
                "sha256": "521b0307cbd739d74da5a86be14cbab2caf51e39a0b120b605813432e7cf0815"
            },
            "downloads": -1,
            "filename": "plynk-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d2e27795bcce3d43c73668df32c03475",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 29157,
            "upload_time": "2024-07-06T21:01:17",
            "upload_time_iso_8601": "2024-07-06T21:01:17.205266Z",
            "url": "https://files.pythonhosted.org/packages/1b/fe/2b6258b39aa6d2eee3eecaf336226b8b041b53c6e06f3e658bfc3b223fe5/plynk-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dcc3c5d79c5dad5753c53c8d7d767c2524e5964b42539a9ebfe16c1d4bd5afc4",
                "md5": "9655e7829ac0d323955cb4d16f62bf05",
                "sha256": "ac6363af64c432d8555b23742bedbae9dd152f419f1786c7a1632f8c0354d42f"
            },
            "downloads": -1,
            "filename": "plynk-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "9655e7829ac0d323955cb4d16f62bf05",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 23229,
            "upload_time": "2024-07-06T21:01:19",
            "upload_time_iso_8601": "2024-07-06T21:01:19.120496Z",
            "url": "https://files.pythonhosted.org/packages/dc/c3/c5d79c5dad5753c53c8d7d767c2524e5964b42539a9ebfe16c1d4bd5afc4/plynk-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-06 21:01:19",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "codeberg": false,
    "gitlab_user": "achwalt",
    "gitlab_project": "plynk",
    "lcname": "plynk"
}
        