# FoldKit
A Python toolkit for working with AlphaFold3 results and storing them efficiently.
## Installation
`pip install foldkit`
## Use Cases
There are two main use cases for this package:

(1) A convenient Python interface for accessing AlphaFold3 (AF3) prediction confidence metrics. This is particularly useful for inter-chain metrics in protein complexes, which have been shown to be predictive of binding and are useful criteria for protein design and specificity prediction.

(2) Efficient storage of AlphaFold3 confidence results. The default AF3 JSON outputs are large and take up a lot of unnecessary space. FoldKit provides a CLI for exporting the AF3 confidence JSONs to space-efficient `.npz` files. These `.npz` files can also be used as inputs to the Python interface in (1).
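To get a feel for the savings, here is a small sketch (not FoldKit's code) comparing a synthetic 400 x 400 PAE matrix serialized as JSON text against the same matrix stored as `float32` in a compressed `.npz` via `np.savez_compressed`. The matrix size, value range, two-decimal rounding, and `float32` dtype are assumptions for illustration; FoldKit's actual storage choices may differ.

```python
import io
import json

import numpy as np

# Synthetic stand-in for an AF3 PAE matrix (assumed: ~400 tokens, values in
# Angstroms rounded to 2 decimals). Not real prediction data.
rng = np.random.default_rng(0)
n_tokens = 400
pae = np.round(rng.uniform(0.5, 31.75, size=(n_tokens, n_tokens)), 2)

# Size when stored as JSON text, the way AF3 writes its confidence files.
json_bytes = len(json.dumps({"pae": pae.tolist()}).encode("utf-8"))

# Size when stored as float32 in a compressed .npz (assumed dtype choice).
buf = io.BytesIO()
np.savez_compressed(buf, pae=pae.astype(np.float32))
npz_bytes = len(buf.getvalue())

print(f"JSON: {json_bytes:,} bytes, npz: {npz_bytes:,} bytes")

# Round-trip check: float32 keeps these values far below 0.01 A of error.
buf.seek(0)
with np.load(buf) as npz:
    restored = npz["pae"]
assert np.allclose(restored, pae, atol=1e-4)
```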
## foldkit - Python Interface Tutorial
Let's say you have a directory containing the results of an AlphaFold3 prediction for a protein complex, here a TCR-pMHC complex with the chains ["A", "B", "M", "P"] (TCR alpha, TCR beta, MHC alpha, and peptide, respectively). The results are stored in the directory `structures/tcr_pmhc_1/`.

We can load the results:
```
import foldkit
result_obj = foldkit.AF3Result.load_result("structures/tcr_pmhc_1/")
```
The resulting object gives access to all of the confidence metadata and can compute statistics on it. For example, the chain IDs:
```
>>> result_obj.chains
[np.str_('A'), np.str_('B'), np.str_('M'), np.str_('P')]
```
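The chain list is presumably derived from per-token chain labels. As a sketch, assuming a `token_chain_ids` array like the one in AF3's `*_confidences.json`, the unique chain IDs can be recovered in order of first appearance. The helper `chains_in_order` is hypothetical and not part of FoldKit.

```python
import numpy as np

# Hypothetical per-token chain labels, shaped like "token_chain_ids" in an AF3
# *_confidences.json (here: 5 TCRa, 4 TCRb, 3 MHC and 2 peptide tokens).
token_chain_ids = np.array(["A"] * 5 + ["B"] * 4 + ["M"] * 3 + ["P"] * 2)


def chains_in_order(chain_ids: np.ndarray) -> list:
    """Return unique chain IDs in the order they first appear (hypothetical helper)."""
    # np.unique sorts its output, so use first-occurrence indices to restore order.
    _, first_idx = np.unique(chain_ids, return_index=True)
    return [chain_ids[i] for i in sorted(first_idx)]


print(chains_in_order(token_chain_ids))
```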
The structure-wide pTM:
```
>>> result_obj.get_ptm()
0.81
```
Or just the average pTM for the TCRa chain (A):
```
>>> result_obj.get_ptm("A")
0.82
```
Here is the average interaction PAE (iPAE) between the TCRb chain (B) and the peptide (P):
```
>>> result_obj.get_ipae(chain1="B", chain2="P")
np.float64(6.253699186991869)
```
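For intuition, here is one way an average inter-chain PAE could be computed from a raw PAE matrix and per-token chain labels. This is a hypothetical sketch (`mean_interchain_pae` is not a FoldKit function). It assumes the AF2/AF3 convention that `pae[i, j]` is the expected error at token `j` when aligned on token `i`, so the matrix is not symmetric; whether FoldKit averages one block or both is not stated here, so both options are shown.

```python
import numpy as np


def mean_interchain_pae(pae, token_chain_ids, chain1, chain2, symmetric=False):
    """Average PAE over token pairs (i in chain1, j in chain2). Hypothetical helper.

    With symmetric=True, the (chain2, chain1) block is pooled with the
    (chain1, chain2) block before averaging.
    """
    pae = np.asarray(pae, dtype=float)
    ids = np.asarray(token_chain_ids)
    rows = ids == chain1  # boolean mask of tokens in chain1
    cols = ids == chain2  # boolean mask of tokens in chain2
    block = pae[np.ix_(rows, cols)].ravel()
    if symmetric:
        block = np.concatenate([block, pae[np.ix_(cols, rows)].ravel()])
    return float(np.mean(block))


# Toy example: two TCRb tokens and one peptide token.
ids = ["B", "B", "P"]
pae = [[0.0, 1.0, 2.0],
       [3.0, 4.0, 6.0],
       [8.0, 9.0, 0.0]]
print(mean_interchain_pae(pae, ids, "B", "P"))
print(mean_interchain_pae(pae, ids, "B", "P", symmetric=True))
```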
By default, these methods compute the average. If you want a different aggregation function, pass a custom `agg`, for example the minimum iPAE:
```
>>> import numpy as np
>>> result_obj.get_ipae(chain1="B", chain2="P", agg=np.min)
np.float64(1.3)
```
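Assuming `agg` is applied to the array of PAE values for the chain pair (as the `np.min` example suggests), any callable that reduces an array to a single number should work, e.g. `agg=np.median`. The sketch below applies a few such reducers to a toy block of values; the 10th-percentile and fraction-below-5 Å aggregators are illustrative choices, not FoldKit built-ins.

```python
import numpy as np

# Toy block of PAE values (Angstroms) between two chains; not real data.
block = np.array([[1.3, 4.0, 12.5],
                  [2.2, 8.7, 20.1]])

# Any callable mapping an array to a scalar can serve as an aggregator
# (assumption about how FoldKit's `agg` is used).
aggregators = {
    "mean": np.mean,
    "min": np.min,
    "median": np.median,
    "p10": lambda x: np.percentile(x, 10),
    "frac_below_5A": lambda x: np.mean(np.asarray(x) < 5.0),
}

for name, agg in aggregators.items():
    print(f"{name}: {float(agg(block)):.3f}")
```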
## foldkit - CLI Tutorial
```
usage: foldkit [-h] [--verbose] {export-single-result,export-multi-result,batch-export-multi-result} ...

Export AlphaFold3 result directories into compressed format.Converts confidences into npz format and copies over the rest of the data as is (except the _input_data.json which is not kept since it is redundant).

positional arguments:
  {export-single-result,export-multi-result,batch-export-multi-result}
    export-single-result
                        Export a single AlphaFold3 result directory to compressed format
    export-multi-result
                        Export multiseed/multisample AlphaFold3 results to compressed format.
    batch-export-multi-result
                        Export multiple AlphaFold3 results to compressed format.

options:
  -h, --help            show this help message and exit
  --verbose, -v         Print detailed output.
```
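The help text describes what an export does: confidence JSONs become `.npz` files, `_input_data.json` is dropped, and everything else is copied unchanged. A minimal sketch of that behavior, assuming confidence files end in `confidences.json` and storing each JSON key as one array, could look like this. `export_result_dir` is hypothetical and is not FoldKit's implementation.

```python
import json
import shutil
from pathlib import Path

import numpy as np


def export_result_dir(src, dst):
    """Hypothetical sketch of a FoldKit-style export of one AF3 result directory.

    - files ending in "confidences.json" become compressed .npz (one array per key)
    - files ending in "_input_data.json" are skipped
    - everything else is copied unchanged; subdirectories are handled recursively
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.iterdir():
        if path.is_dir():
            export_result_dir(path, dst / path.name)
        elif path.name.endswith("_input_data.json"):
            continue  # redundant input copy, not kept
        elif path.name.endswith("confidences.json"):
            data = json.loads(path.read_text())
            arrays = {key: np.asarray(value) for key, value in data.items()}
            out_name = path.name[: -len(".json")] + ".npz"
            np.savez_compressed(dst / out_name, **arrays)
        else:
            shutil.copy2(path, dst / path.name)
```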
There are 3 main entry points, depending on the data you are exporting:
1) A single prediction directory (i.e., one prediction corresponding to a single seed and sample)
2) A prediction directory (i.e., N*K predictions for the same input, with N seeds and K samples)
3) A directory of prediction directories (i.e., a directory containing many prediction directories like those in (2))
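If you are unsure which entry point a directory needs, a rough heuristic is to look for per-seed/per-sample subdirectories. This sketch assumes AF3's `seed-<N>_sample-<K>` naming for those subdirectories; the helpers `list_seed_samples` and `classify` are hypothetical and simply map each case to the matching subcommand name from the help text.

```python
import re
from pathlib import Path

# Assumption: AF3 writes each seed/sample to a subdirectory named like "seed-1_sample-0".
SEED_SAMPLE_RE = re.compile(r"^seed-(\d+)_sample-(\d+)$")


def list_seed_samples(prediction_dir):
    """Return the sorted (seed, sample) pairs found directly inside prediction_dir."""
    pairs = []
    for child in Path(prediction_dir).iterdir():
        match = SEED_SAMPLE_RE.match(child.name)
        if match and child.is_dir():
            pairs.append((int(match.group(1)), int(match.group(2))))
    return sorted(pairs)


def classify(path):
    """Heuristically pick the export subcommand for a directory (hypothetical helper)."""
    path = Path(path)
    if list_seed_samples(path):
        # Case 2: one input, N seeds x K samples.
        return "export-multi-result"
    if any(list_seed_samples(child) for child in path.iterdir() if child.is_dir()):
        # Case 3: a directory of prediction directories.
        return "batch-export-multi-result"
    # Case 1: a single seed/sample prediction.
    return "export-single-result"
```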
### 1- Export a single result (i.e., one structure from a single seed and sample)
```
foldkit export-single-result /path/to/specific_structure_directory /path/to/outdir
```
### 2- Export a single result with multiple seeds and/or samples
```
foldkit export-multi-result /path/to/specific_structure_parent_directory /path/to/outdir
```
### 3- Batch export many results
```
foldkit -v batch-export-multi-result /path/to/directory_of_subdirectories/ /path/to/outdir
```
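If you would rather split a large batch across cluster jobs or re-run only failed results, one option is to generate one `export-multi-result` command per prediction subdirectory and dispatch them yourself. This sketch assumes each result should be written to `<outdir>/<subdirectory name>`, which may not match the layout `batch-export-multi-result` produces; `build_export_commands` and `run_all` are hypothetical helpers.

```python
import shlex
import subprocess
from pathlib import Path


def build_export_commands(parent_dir, out_dir):
    """One `foldkit export-multi-result` argv per prediction subdirectory.

    Assumption: each result goes to out_dir/<subdirectory name>.
    """
    parent, out = Path(parent_dir), Path(out_dir)
    return [
        ["foldkit", "export-multi-result", str(sub), str(out / sub.name)]
        for sub in sorted(parent.iterdir())
        if sub.is_dir()
    ]


def run_all(commands, dry_run=True):
    """Print the commands (dry run) or execute them one by one, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```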
Raw data
{
"_id": null,
"home_page": null,
"name": "foldkit",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "protein, protein design, alphafold, bioinformatics",
"author": null,
"author_email": "Jonathan Levine <jonalevine1@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/c2/1a/b76f6d37432f058e0f6a01557ffd81f1be83a6c9fa3383e52996bddc9fbf/foldkit-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Toolkit for working with AlphaFold3 results and confidence metadata",
"version": "0.1.1",
"project_urls": null,
"split_keywords": [
"protein",
" protein design",
" alphafold",
" bioinformatics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "cf426d2f82b165458fe3dd1503fa9c1075077f1540584ca95d91ecf4353fd012",
"md5": "37ecfa94dff6a73c65fbd2e555ce62cf",
"sha256": "e7b6a936d38771a12b556e8d4ba7822370c5182e0c0eff8aad9f133cce57e653"
},
"downloads": -1,
"filename": "foldkit-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "37ecfa94dff6a73c65fbd2e555ce62cf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 9191,
"upload_time": "2025-10-23T14:52:14",
"upload_time_iso_8601": "2025-10-23T14:52:14.920168Z",
"url": "https://files.pythonhosted.org/packages/cf/42/6d2f82b165458fe3dd1503fa9c1075077f1540584ca95d91ecf4353fd012/foldkit-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c21ab76f6d37432f058e0f6a01557ffd81f1be83a6c9fa3383e52996bddc9fbf",
"md5": "b70f35d780fa0e49c591f4ff276d6454",
"sha256": "4756b6a2974bbbc7f70fbf83424f65898c329b332cf2cb2cb8dcb78c23643e00"
},
"downloads": -1,
"filename": "foldkit-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "b70f35d780fa0e49c591f4ff276d6454",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 10389,
"upload_time": "2025-10-23T14:52:16",
"upload_time_iso_8601": "2025-10-23T14:52:16.098369Z",
"url": "https://files.pythonhosted.org/packages/c2/1a/b76f6d37432f058e0f6a01557ffd81f1be83a6c9fa3383e52996bddc9fbf/foldkit-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-23 14:52:16",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "foldkit"
}