pybiotech


- Name: pybiotech
- Version: 0.2.6
- Summary: A collection of reusable python biotech library from AI Lingues.
- Upload time: 2025-10-21 16:41:24
- Requires Python: >=3.11
- License: MIT
- Keywords: ailingues, components, biotech, library
            <!--
 * @Author: Zeng Shengbo shengbo.zeng@ailingues.com
 * @Date: 2025-06-19 15:05:41
 * @LastEditors: Zeng Shengbo shengbo.zeng@ailingues.com
 * @LastEditTime: 10/22/2025 00:19:32
 * @FilePath: //pybiotech//README.md
 * @Description:
 * 
 * Copyright (c) 2025 by AI Lingues, All Rights Reserved. 
-->

AI Lingues Biotech Python Library
====

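This library provides data processing and analysis components for the life sciences and healthcare.

- Query access to the public NIH PubChem database
- High-speed reading of SDF files
- Force-field conformer optimization and feature calculation for molecular compounds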

<hr>

**Latest version**

version **0.2.6**

**Main dependencies**

- python >= 3.11
- pycorelibs >= 0.2.6
- rdkit == 2024.9.6

**Copyright**

    AI Lingues Team

**email**

    support@ailingues.com

<hr>

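# core module

<hr>

## molecule module

### calculator module

<hr>

#### MolCaculator: molecule calculator class

Computes parameters and features of molecular compounds. The available functions are:

| Function                          | Description                                                                                                              |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| approximate_oe_hydrophobe         | Computes "hydrophobic clusters"                                                                                          |
| calc_surface_area                 | Computes the surface area of a molecule                                                                                  |
| calc_hydrophobic_surface_area     | Computes the hydrophobic surface area of a molecule                                                                      |
| calc_diameter                     | Computes the maximum diameter of a single molecule                                                                       |
| calc_sssr                         | Computes ring information (SSSR and a RingInfo snapshot)                                                                 |
| calc_logp                         | Computes the lipophilicity (LogP) of a molecule                                                                          |
| calc_morgan_fringerprint          | Computes the Morgan fingerprint of a molecule                                                                            |
| calc_maccs_fingerprint            | Computes the MACCS fingerprint of a molecule                                                                             |
| calc_crippen_contribs             | Computes the Crippen contributions: per-atom logP and molar refractivity contributions                                   |
| calc_pharmacophore_features       | Computes pharmacophore features                                                                                          |
| calc_intra_pharmacophore_distance | Computes the Euclidean distance matrix among all atoms within each pharmacophore type                                    |
| calc_inter_pharmacophore_distance | Computes the Euclidean distance matrix between two different pharmacophore types                                         |
| get_ring_atoms                    | Extracts the indices of all ring atoms (1-based)                                                                         |
| get_hydrophobic_clusters          | Identifies hydrophobic atoms and groups them into clusters by 3D distance, matching the multiple hydrophobe lines in SDF |
| get_anions_cations                | Returns the anion atom list and the cation atom list (1-based)                                                           |
| get_hbond_acceptors               | Identifies hydrogen-bond acceptor atoms (0-based indices)                                                                |
| get_hbond_donors                  | Identifies hydrogen-bond donor atoms (0-based indices)                                                                   |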

<hr>

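##### approximate_oe_hydrophobe

Computes "hydrophobic clusters".  
In a pure RDKit environment, this approximates the hydrophobic-atom clustering logic of OEShape and returns several "hydrophobic clusters".
Each cluster can be read as one hydrophobe line, for example:  
3  38 41 42 hydrophobe  
4  17 19 20 21 hydrophobe

and so on.

###### Parameters

- mol : RDKit Mol object

    If it already carries a reasonable 3D conformer, no further embedding is needed. If it has no 3D coordinates and embed_if_needed=True, Embed + Optimize runs automatically.

- distance_threshold : float

    Two candidate atoms closer than this threshold are assigned to the same hydrophobic cluster. OEShape commonly uses about 1.0 or 1.5 Å.

- exclude_aromatic : bool

    Whether to exclude all aromatic carbons (for example benzene-ring carbons). OEShape may count some aromatic carbons as hydrophobic, so this is optional here.

- partial_charge_cutoff : float or None

    If not None, Gasteiger partial charges are used and carbons whose absolute charge is >= the cutoff are excluded, which removes polar carbons.  
    For example, 0.2 means carbons with |q| ≥ 0.2 are treated as not hydrophobic.

- min_cluster_size : int

    Minimum cluster size. A cluster with fewer than min_cluster_size atoms can be treated as noise and discarded (or kept).

- extended_filter : bool

    If True, carbons are filtered with the "no O, N, S, P, or halogen" rule; otherwise a carbon is kept as long as none of its neighbors is O or N.

- embed_if_needed : bool

    Whether to call ETKDG embedding when mol has no 3D conformer.

- max_attempts : int

    Number of attempts when embedding fails.

###### Returns

- hydrophobe_lines : list of tuples  [(atom_count, [a1,a2,...]), ...]

    a1, a2, ... are sorted 1-based atom indices that belong to one hydrophobic cluster.
    Each entry can be turned into an SDF-like string:
        f"{atom_count} {' '.join(map(str, atom_ids))} hydrophobe"
    or used directly as a data structure.

###### Notes

1. Results are not guaranteed to match OEShape exactly; this only follows the idea of "force field + distance clustering + various filters".
2. If embedding fails, the coordinates are unreasonable, or the molecule is very large, the results may still be unsatisfactory.
3. Try several values of distance_threshold, partial_charge_cutoff, and so on, and observe how the results change.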
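
The documented return shape, `[(atom_count, [a1, a2, ...]), ...]`, can be rendered as SDF-like hydrophobe lines with the f-string quoted above. The following is a minimal sketch; `format_hydrophobe_lines` is a hypothetical helper that is not part of pybiotech, and the cluster values are made up for illustration.

```python
from typing import List, Tuple

HydrophobeLine = Tuple[int, List[int]]


def format_hydrophobe_lines(hydrophobe_lines: List[HydrophobeLine]) -> List[str]:
    """Render (atom_count, atom_ids) tuples as SDF-like hydrophobe lines."""
    return [
        f"{atom_count} {' '.join(map(str, atom_ids))} hydrophobe"
        for atom_count, atom_ids in hydrophobe_lines
    ]


# Illustrative data shaped like the documented return value.
clusters = [(3, [38, 41, 42]), (4, [17, 19, 20, 21])]
lines = format_hydrophobe_lines(clusters)
```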

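##### calc_surface_area

Computes the surface area of a molecule.

Before calling it, the molecule must be prepared as follows:

 1. Add explicit hydrogen atoms
    ```python
    from rdkit import Chem

    mol.UpdatePropertyCache(strict=False)
    mol_with_H = Chem.AddHs(mol)
    ```

 2. Generate 3D coordinates
    ```python
    from rdkit.Chem import AllChem

    AllChem.EmbedMolecule(mol_with_H)
    AllChem.MMFFOptimizeMolecule(mol_with_H)
    ```

###### Args

- mol (Chem.Mol): the molecule object

###### Raises

- Exception: _description_

###### Returns

- float: total surface area of the molecule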

<hr>

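##### calc_hydrophobic_surface_area

Computes the hydrophobic surface area of a molecule.

Before calling it, the molecule must be prepared as follows:

1. Add explicit hydrogen atoms
    ```python
    from rdkit import Chem

    mol.UpdatePropertyCache(strict=False)
    mol_with_H = Chem.AddHs(mol)
    ```

2. Generate 3D coordinates
    ```python
    from rdkit.Chem import AllChem

    AllChem.EmbedMolecule(mol_with_H)
    AllChem.MMFFOptimizeMolecule(mol_with_H)
    ```

###### Args

- mol (Chem.Mol): the molecule object
- is_high_precise (bool): whether to use higher precision; defaults to False

###### Raises

- e: _description_

###### Returns

- float: hydrophobic surface area of the molecule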

<hr>

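##### calc_diameter

Computes the maximum diameter of a single molecule.

###### Args

- mol (Chem.Mol): RDKit molecule object, assumed to already have 3D coordinates.

###### Returns

- float: maximum diameter of the molecule (Angstrom).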
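
The README does not say how the diameter is defined. A common reading, assumed here, is the largest distance between any two atoms. The sketch below works on plain coordinate tuples; `max_pairwise_distance` is a hypothetical name, and with RDKit the coordinates could come from `mol.GetConformer().GetPositions()`.

```python
from itertools import combinations
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def max_pairwise_distance(coords: Sequence[Point]) -> float:
    """Largest Euclidean distance between any two points (0.0 for fewer than two)."""
    return max((dist(a, b) for a, b in combinations(coords, 2)), default=0.0)


# Three made-up atom positions in Angstrom.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
diameter = max_pairwise_distance(coords)
```

The brute-force loop is quadratic in the number of atoms, which is usually acceptable for small molecules.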

<hr>

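##### calc_sssr

Computes ring information for a molecule (SSSR and a RingInfo snapshot).

###### Functionality

- Calls GetSymmSSSR(mol) to trigger ring perception and obtain the "symmetrized SSSR" ring set (expressed as atom indices).
- Reads RingInfo (mol.GetRingInfo()) to report per-atom and per-bond ring counts, plus the ring lists by atom and by bond.
- Returns a structured result for downstream analysis and statistics (nothing is printed).

###### Parameters

- mol : rdkit.Chem.rdchem.Mol

    RDKit molecule object. The function only reads it and does not modify it.

###### Returns

- Dict[str, Any]
    {

    "num_rings": int,                     # number of SSSR rings

    "atom_rings": List[List[int]],        # SSSR: atom index list for each ring

    "by_size": Dict[int, List[List[int]]],# SSSR rings grouped by ring size

    "ri_atom_rings": List[List[int]],     # RingInfo.AtomRings() (not necessarily identical to the SSSR)

    "ri_bond_rings": List[List[int]],     # RingInfo.BondRings()

    "atom_ring_count": List[int],         # number of rings each atom belongs to

    "bond_ring_count": List[int],         # number of rings each bond belongs to

    "atom_in_ring": List[bool],           # whether the atom is in any ring (derived from count > 0)

    "bond_in_ring": List[bool],           # whether the bond is in any ring (derived from count > 0)

    "algorithm": str,                     # 'SymmSSSR'

    }

###### Notes

- GetSymmSSSR() ensures ring perception has run and caches the information in RingInfo.
- `atom_rings` in the result is the SSSR (symmetrized smallest set of smallest rings); `ri_atom_rings/ri_bond_rings` come from RingInfo and may enumerate rings that differ from the SSSR (an implementation-level difference).
- Whether an atom or bond is in a ring is derived from count > 0, which is efficient and straightforward.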
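
To make the derived fields concrete, the sketch below rebuilds `by_size`, a per-atom ring count, and the `count > 0` membership flags from a list of rings. It is an illustration only: the helper names are hypothetical, and it counts from the ring list it is given, whereas the documented `atom_ring_count` is read from RingInfo and may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_rings_by_size(atom_rings: List[List[int]]) -> Dict[int, List[List[int]]]:
    """Group rings (lists of 0-based atom indices) by their size."""
    by_size: Dict[int, List[List[int]]] = defaultdict(list)
    for ring in atom_rings:
        by_size[len(ring)].append(ring)
    return dict(by_size)


def ring_membership(atom_rings: List[List[int]], num_atoms: int) -> Tuple[List[int], List[bool]]:
    """Return (ring count per atom, in-ring flag per atom)."""
    counts = [0] * num_atoms
    for ring in atom_rings:
        for idx in ring:
            counts[idx] += 1
    return counts, [count > 0 for count in counts]


# Toy input: a six-membered and a five-membered ring fused at atoms 4 and 5,
# plus one non-ring atom (index 9).
rings = [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8]]
by_size = group_rings_by_size(rings)
atom_ring_count, atom_in_ring = ring_membership(rings, num_atoms=10)
```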

<hr>

##### calc_logp

Computes the lipophilicity (LogP) of a molecule.

###### Args

- mol (Chem.Mol): Mol instance of the molecule; instances of Chem.Mol subclasses are supported

###### Returns

- float: lipophilicity (LogP) value

<hr>

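##### calc_morgan_fringerprint

Computes the Morgan fingerprint of a molecule.

###### Description

The Morgan fingerprint is a commonly used molecular fingerprint in RDKit for describing molecular structure and similarity.

It is generated from the molecule's topology and a radius parameter, and has the following characteristics:

1. The fingerprint is a fixed-length binary vector in which each bit records the presence or absence of a substructure.
2. The fingerprint length and the radius can be tuned to balance information content against computational cost.
3. Different hash functions can be used to generate the fingerprint, which increases its diversity and robustness.

GetMorganGenerator signature: see doc/specifications/interface/GetMorganGenerator.md

###### Notes

About the countSimulation parameter

1. Default behavior of the Morgan fingerprint:

    - By default (countSimulation=False):

        The Morgan fingerprint is a bit vector of 0/1 values indicating whether a chemical environment is present.

    - With count simulation enabled (countSimulation=True):

        The Morgan fingerprint contains integer values giving the number of times a chemical environment occurs.

2. In classification problems:

    2.1 If the presence or absence of a chemical environment is what matters, the 0/1 bit-vector form is usually sufficient.

      - Suitable when:

        a. Presence or absence of a chemical environment is enough to describe the target property.

        b. The task is a classification problem (for example, toxic or not, active or not).

        c. The data are sparse, or substructure occurrence counts are fairly evenly distributed.

    2.2 If the occurrence frequency of a chemical environment is a potential deciding factor, keeping the count information may help more.

      - Suitable when:

        a. The occurrence frequency of a chemical environment has an important effect on the classification task.

        b. The task needs to describe the strength of functional groups in a molecule (for example, highly toxic molecules).

        c. The extra value of quantity information needs to be captured.

        d. Other prediction or analysis scenarios.

###### Reference: [fingerprint](https://github.com/daiyizheng/DL/blob/master/07-rdkit/08-rdkit%E5%8C%96%E5%AD%A6%E6%8C%87%E7%BA%B9.ipynb)

###### Args

- mol (Chem.Mol): Mol instance of the molecule; instances of Chem.Mol subclasses are supported
- countSimulation (bool): whether to enable counting; defaults to False (see the Notes section for details)
- bitSize (int): length of the bit vector; defaults to 2048

###### Returns

- np.array: array holding the Morgan fingerprint data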

<hr>

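##### calc_maccs_fingerprint

Computes the MACCS fingerprint of a molecule.

###### Method

Uses the rdkit.Chem.MACCSkeys.GenMACCSKeys function.

###### Description

The MACCS (Molecular ACCess System) fingerprint is a binary fingerprint that encodes molecular structure information.  
It is defined by whether the molecule contains specific substructures and covers 166 distinct molecular features.  
Each feature corresponds to a specific chemical substructure, such as a hydroxyl group, a benzene ring, or a nitrogen atom.  
If the feature is present in the molecule, its bit is 1; otherwise it is 0.  
The MACCS fingerprint has 166 key bits (RDKit's GenMACCSKeys returns a 167-bit vector in which bit 0 is unused) and is used in cheminformatics for molecular similarity comparison, classification, clustering, screening, and more.

###### Notes

None

###### Reference: [fingerprint](https://github.com/daiyizheng/DL/blob/master/07-rdkit/08-rdkit%E5%8C%96%E5%AD%A6%E6%8C%87%E7%BA%B9.ipynb)

###### Args

- mol (Chem.Mol): Mol instance of the molecule; instances of Chem.Mol subclasses are supported

###### Returns

- np.array: array holding the MACCS fingerprint data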

<hr>

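##### calc_crippen_contribs

Computes the Crippen contributions of a molecule: the per-atom logP contribution and molar refractivity contribution.

###### Method

None

###### Description

The contributions follow the Crippen atom-typing scheme: each atom is classified by its chemical environment and receives a logP contribution and a molar refractivity contribution. They can be used to describe solubility, bioavailability, and other properties.  
This function is typically used with RDKit molecule objects.

###### Notes

  1. Call UpdatePropertyCache on the Chem.Mol object before passing it in
  2. Although the Crippen contributions are computed over the whole molecule, the results should be assigned to the corresponding atoms by index position and used as part of the atom features

###### Args

- mol (Chem.Mol): Mol instance of the molecule; instances of Chem.Mol subclasses are supported

###### Returns

- tuple: the per-atom logP contributions and molar refractivity contributions.
        A tuple of two lists, each as long as the number of atoms in the molecule.
        The first list holds each atom's logP contribution,
        the second list holds each atom's molar refractivity contribution.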

<hr>

##### calc_pharmacophore_features

Computes pharmacophore features.

###### Description

  1. Checks whether a 3D conformer is present; if not, runs a simple Embed + Optimize (optional).
  2. Computes hydrogen-bond acceptors/donors, anions/cations, ring atoms, hydrophobic atoms, and so on.
  3. Returns (features, features_count, atom_list).
  where:
      - features = {"rings":0/1,...}
      - features_count = {"rings":N,...}
      - atom_list = {"rings":[...],...} (1-based or list of lists)

<hr>

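##### calc_intra_pharmacophore_distance

Computes the Euclidean distance matrix among all atoms within each pharmacophore type.

###### Parameters

- mol: RDKit Mol object; must contain 3D coordinates (a Conformer).
- atom_list: dict mapping each pharmacophore type to its list of atom numbers, e.g. {'rings': [1,2,3], 'anion': [4,5], ...}
- conf_id: int, optional; selects which conformer is used for the distance calculation.

###### Returns

- intra_distances: dict

    Keys are pharmacophore types and values are the corresponding distance matrices (2D lists),  
    e.g. {'rings': [[0.0, 1.2, ...], [...], ...], 'anion': [...], ...}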

<hr>

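##### calc_inter_pharmacophore_distance

Computes the Euclidean distance matrix between two different pharmacophore types.

###### Parameters

- mol: RDKit Mol object; must contain 3D coordinates (a Conformer).
- atom_list: dict mapping each pharmacophore type to its list of atom numbers, for example:

    {'rings': [1,2,3], 'anion': [4,5], 'cation': [], ...}

- type1: str, the first pharmacophore type (e.g. 'rings', 'anion', 'cation', 'acceptor', 'donor', 'hydrophobe')
- type2: str, the second pharmacophore type
- conf_id: int, optional; selects which conformer is used for the distance calculation.

###### Returns

- inter_distance_matrix:

    2D list with shape (number of type1 atoms, number of type2 atoms)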
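
The intra- and inter-type distance matrices can be illustrated without RDKit. The sketch below assumes a flat list of 1-based atom numbers per pharmacophore type and a plain list of coordinates ordered by atom; `distance_matrix` and `intra_distances` are hypothetical names, not the library's functions.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point], atoms_a: List[int], atoms_b: List[int]) -> List[List[float]]:
    """Distances between two groups of 1-based atom numbers; shape (len(atoms_a), len(atoms_b))."""
    return [[dist(coords[i - 1], coords[j - 1]) for j in atoms_b] for i in atoms_a]


def intra_distances(coords: Sequence[Point], atom_list: Dict[str, List[int]]) -> Dict[str, List[List[float]]]:
    """One square distance matrix per pharmacophore type."""
    return {ftype: distance_matrix(coords, atoms, atoms) for ftype, atoms in atom_list.items()}


# Made-up coordinates for atoms 1..3 and a made-up type assignment.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
atom_list = {"rings": [1, 2], "anion": [3], "cation": []}
intra = intra_distances(coords, atom_list)
inter = distance_matrix(coords, atom_list["rings"], atom_list["anion"])
```

An empty type, such as `cation` here, simply yields an empty matrix.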

<hr>

##### get_ring_atoms

Extracts the indices of all ring atoms (1-based).

###### Parameters

- mol: RDKit Mol object
- use_ringinfo: bool

    If True, ring atoms are identified with RingInfo;  
    if False, GetSymmSSSR is used.

###### Returns

- List[List[int]] : one list per ring, holding that ring's atoms (1-based).

    For example: [[1,2,3,4,5,6],[8,9,10]].

<hr>

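##### get_hydrophobic_clusters

Identifies hydrophobic atoms and groups them into clusters by 3D distance, matching the multiple hydrophobe lines in an SDF.

Returns: List[List[int]], where each sublist holds the 1-based indices of one cluster of hydrophobic atoms.

###### Parameters

- mol : RDKit Mol object (must have a 3D conformer; if not, run Embed + optimization first)
- distance_threshold : float

    Any two candidate hydrophobic atoms whose 3D distance is below this value are assigned to the same cluster.  
    Defaults to 1.0 Å; 1.5, 2.0, and so on are also worth trying.
- extended : bool

    True: carbons with no O, N, S, P, or halogen (F, Cl, Br, I) are treated as hydrophobic  
    False: only requires that no neighbor is O or N

###### Returns

- clusters_1based : List[List[int]]

    For example [[10,12,14],[18,22,23]] means two clusters of hydrophobic atoms (1-based).  
    If there are no hydrophobic atoms, an empty list [] is returned.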
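
The distance rule can be read as single-linkage clustering: atoms closer than the threshold are merged, and merges chain transitively. Whether the library chains merges this way is an assumption. The sketch uses a small union-find over made-up coordinates; `cluster_by_distance` is a hypothetical name.

```python
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def cluster_by_distance(coords: Dict[int, Point], threshold: float = 1.0) -> List[List[int]]:
    """Single-linkage clusters of atom ids whose pairwise distance is below threshold."""
    ids = sorted(coords)
    parent = {i: i for i in ids}

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if dist(coords[a], coords[b]) < threshold:
                parent[find(a)] = find(b)

    clusters: Dict[int, List[int]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# 1-based atom ids mapped to made-up positions along the x axis.
coords = {10: (0.0, 0.0, 0.0), 12: (1.2, 0.0, 0.0), 14: (2.4, 0.0, 0.0), 18: (10.0, 0.0, 0.0)}
clusters = cluster_by_distance(coords, threshold=1.5)
```

Atoms 10 and 14 are 2.4 Å apart, yet they end up together because both are within 1.5 Å of atom 12.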

<hr>

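##### get_anions_cations

Returns the anion atom list and the cation atom list (1-based).

Classification is currently based on formal charge:

- atom.GetFormalCharge() < 0 => anion
- atom.GetFormalCharge() > 0 => cation

Multiply charged atoms fall into the same groups, e.g. +2 => cation.  
Partial charges would require an additional force-field or quantum-chemical calculation.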

<hr>

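##### get_hbond_acceptors

Identifies hydrogen-bond acceptor atoms (0-based indices).

Returns a list of SubstructMatch tuples, each of the form (atom_idx, ...).  
If only the atom index is needed, take match[0].  
A somewhat broader SMARTS pattern is used here, covering aromatic-ring N, carbonyl O, and so on.

<hr>

##### get_hbond_donors

Identifies hydrogen-bond donor atoms (0-based indices).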

<hr>

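### optimizer module

Molecular conformer optimization, plus serialization and deserialization

#### Optimizer: conformer optimization class

<hr>

##### embed_and_optimize (static method)

Generates an initial 3D conformer with RDKit and optimizes its geometry. MMFF94 force-field optimization is tried first, with a fallback to UFF on failure.

###### Args

- mol (Mol): RDKit molecule object (may be unsanitized). The function works on an internal copy and does not modify the input.
- max_embed_attempts (int, optional): Maximum number of attempts for 3D conformer embedding (ETKDG).

    Larger values raise the success rate for difficult molecules but take longer.  
    `Recommendation`: 200–1000 in general; raise it for macrocycles or complex fused rings. Defaults to 1000.
- random_seed (int, optional): Random seed. A fixed value makes results reproducible;

    -1 means fully random (non-deterministic).  
    `Recommendation`: fix it for research and debugging; in production batches, -1 can increase diversity. Defaults to 0xC0FFEE.
- use_small_ring_torsions (bool, optional): Small-ring torsion parameters for ETKDG.

    Enabling them usually matches empirical conformations of small rings (such as 3–5-membered rings) better and improves embedding quality. Defaults to True.
- use_macrocycle_torsions (bool, optional): Macrocycle torsion handling for ETKDG.

    Enabling it helps find a more reasonable initial conformer for macrocyclic and polycyclic systems. Defaults to True.
- prune_rms_thresh (float, optional):

    RMSD threshold for conformer pruning (the strategy for removing duplicate conformers).  
    Even when only one conformer is embedded, this threshold affects the retry logic that looks for a conformer sufficiently different from existing ones.  
    The larger the value, the more readily similar conformers are treated as "duplicates" and another attempt is made.  
    `Recommendation`: 0.1–0.5 Å is common. Defaults to 0.1.
- max_ff_iters (int, optional): Maximum number of force-field minimization iterations.

    Larger values give more chance to converge but take longer.

    `Recommendation`: 200–1000. If runs often fail to converge, increase this first and then consider structure preprocessing. Defaults to 500.

###### **Raises:**

- ValueError: the RDKit molecule object is empty

###### **Returns:**

Tuple[ Mol, bool, Dict[str, Any]]:

- optimized_mol : rdkit.Chem.Mol. Molecule with explicit hydrogens added and a single 3D conformer.

    If a step fails, the current working copy is still returned for diagnosis.
- ok : bool
    Whether the force-field convergence criterion was met (True = converged; False = not converged or failed).
- meta : Dict[str, Any]
    Dictionary of diagnostic information. Common keys (some may be absent depending on the case):
  - stage : str
      Current execution stage: "init" | "sanitize" | "embed" | "optimize".
  - method : str
      Force field actually used: "MMFF94" or "UFF".
  - energy : float
      Final force-field energy (in force-field units, usually read as kcal/mol; comparable only within the same force field).
  - steps : int
      _RDKit Minimize return code_ (note: not the actual step count). 0 means converged; non-zero means not converged.
  - message : str
      Hint, warning, or error message (for example "ETKDG embedding failed" or "MMFF params unavailable, fallback to UFF").

###### Behavior and guarantees

- Does not modify the `mol` passed in; works on a copy.
- Runs sanitization (Sanitize) and stereochemistry assignment (AssignStereochemistry).
- Adds explicit hydrogens (AddHs).
- Uses ETKDGv3 for 3D embedding; clears existing conformers and keeps exactly one.
- Tries MMFF94 first; if the molecule is not supported, falls back to UFF.
- `ok=True` means the force-field minimization return code was 0 (convergence reached); otherwise False.
- Common chemistry problems (embedding failure, force field unavailable, and so on) do not raise; instead `ok=False` and `meta['message']` gives the reason.
    If `mol is None` or the input is unusable, `ValueError` may be raised.

###### Usage recommendations

- Need reproducible results: keep `random_seed` fixed.
- Macrocycles or complex systems: keep `use_macrocycle_torsions=True` and raise `max_embed_attempts` as appropriate.
- Frequent non-convergence: increase `max_ff_iters`, or first preprocess charges, valence states, metal coordination, and so on.
- For batch processing, record or persist `meta` for later traceability and quality filtering (for example, preferring results that used MMFF94 and converged).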
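
The quality-filtering advice can be sketched over recorded `(ok, meta)` pairs. In the example below a string label stands in for the returned molecule, and the `meta` dictionaries are made up but use only the documented keys; `select_converged_mmff` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# (label standing in for the molecule, ok, meta)
Result = Tuple[str, bool, Dict[str, Any]]


def select_converged_mmff(results: List[Result]) -> List[str]:
    """Keep only entries that converged and were optimized with MMFF94."""
    return [
        label
        for label, ok, meta in results
        if ok and meta.get("method") == "MMFF94"
    ]


results: List[Result] = [
    ("mol-a", True, {"stage": "optimize", "method": "MMFF94", "energy": 12.3, "steps": 0}),
    ("mol-b", True, {"stage": "optimize", "method": "UFF", "energy": 40.1, "steps": 0}),
    ("mol-c", False, {"stage": "embed", "message": "ETKDG embedding failed"}),
]
preferred = select_converged_mmff(results)
```

Using `meta.get` keeps the filter safe when a key such as `method` is absent, which the description above says can happen.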

<hr>

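##### **to_serialize (static method)**

Serializes a collection of RDKit Mol objects into a single binary container blob (high performance, no intermediate text form).

The binary container is intended for **cross-process or cross-application IPC and for persistence**, and fully preserves conformers, coordinates, chirality, and (optionally) properties.

###### **Args:**

- mols : Iterable[Optional[rdkit.Chem.Mol]]

    Sequence of molecules; may contain None (an empty placeholder record is written so positions stay aligned).
- include_props : Chem.PropertyPickleOptions, default Chem.GetDefaultPickleProperties()

    Controls which molecule properties (props) are packed along with each molecule. Including them is recommended.
- with_checksum : bool, default True

    Whether to append a CRC32 checksum to each record. Strongly recommended in production (enabled by default).

###### **Returns:**

- bytes
    Custom binary container (v2). It can be passed between processes or applications through pipes, sockets, shared memory, or files.

###### **Raises**

- ValueError: the input sequence is empty.
- RDKit-related exceptions: raised when binary output fails, for example because an individual molecule is corrupted.

###### **Performance and reliability**

- Performance: usually smaller and faster than text formats (SDF/MolBlock/JSON). CRC32 is implemented in C and has very low overhead (on the order of GB per second).
- Reliability: length prefixes plus CRC32 guard against truncation, partial packets, and corruption; big-endian encoding helps cross-language consistency.
- Compatibility: well suited to online collaboration and IPC; for very long-term archiving, also keep a text/JSON copy in case of incompatibilities across major versions.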

<hr>

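##### **to_unserialize (static method)**

Restores a list of Mol/None entries (positions preserved one to one) from the **binary container** produced by Optimizer.to_serialize().

###### **Compatibility**

- v2 container: b"RDKB\\x02" (recommended; includes the RDKit version and flags/CRC)
- v1 container: b"RDKB\\x01" (backward compatible; only header + count + [len+payload], with no version/flags/CRC)
- Bare RDKit single-molecule binary: if the magic number does not match, the input is tried as the RDKit binary of a **single Mol**; on success a list of length 1 is returned.

###### **Args:**

- serialized_mols : bytes

    Binary container blob.

###### **Returns:**

- List[Optional[rdkit.Chem.Mol]]

    Parsed list of molecules; `None` marks an empty placeholder at that position (or a corrupted record rejected when CRC is enabled).

###### **Raises**

- ValueError
  - the argument is empty;
  - the container header is invalid and the data is not a bare RDKit binary either;
  - the v2/v1 container structure is truncated or a record length is abnormal;
  - the container is v2 and the CRC check fails (the data is corrupted, truncated, or tampered with).

###### **Notes**

- Decoding is big-endian (network order).
- For v2 containers, the RDKit version string is read and ignored (it can be logged or checked for compatibility if needed).
- CRC32 checking is enabled by default for v2 (if the writer set that flag).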
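
The general technique behind such a container (big-endian length prefixes plus a per-record CRC32) can be sketched with the standard library. The layout below is hypothetical and is not the library's actual v1/v2 format: the `DEMO` magic, the one-byte presence flag, and the function names are assumptions made for illustration. With RDKit, each payload could be the bytes from `mol.ToBinary()`.

```python
import struct
import zlib
from typing import List, Optional

MAGIC = b"DEMO\x01"  # hypothetical header, not the library's real magic


def pack_records(payloads: List[Optional[bytes]]) -> bytes:
    """Frame payloads as: magic, count, then per record a flag, length, CRC32 and payload."""
    out = [MAGIC, struct.pack(">I", len(payloads))]
    for payload in payloads:
        if payload is None:
            out.append(struct.pack(">B", 0))  # placeholder keeps positions aligned
            continue
        out.append(struct.pack(">BII", 1, len(payload), zlib.crc32(payload)))
        out.append(payload)
    return b"".join(out)


def unpack_records(blob: bytes) -> List[Optional[bytes]]:
    """Inverse of pack_records; raises ValueError on a bad header, truncation, or CRC mismatch."""
    if not blob.startswith(MAGIC):
        raise ValueError("bad container header")
    offset = len(MAGIC)
    records: List[Optional[bytes]] = []
    try:
        (count,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        for _ in range(count):
            (present,) = struct.unpack_from(">B", blob, offset)
            offset += 1
            if not present:
                records.append(None)
                continue
            length, crc = struct.unpack_from(">II", blob, offset)
            offset += 8
            payload = blob[offset:offset + length]
            offset += length
            if len(payload) != length:
                raise ValueError("truncated record")
            if zlib.crc32(payload) != crc:
                raise ValueError("CRC mismatch")
            records.append(payload)
    except struct.error as exc:
        raise ValueError("truncated container") from exc
    return records


blob = pack_records([b"abc", None, b""])
restored = unpack_records(blob)
```

The explicit presence flag distinguishes a `None` placeholder from a genuinely empty payload, which a bare length prefix could not do.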

<hr>

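# loaders module

Data loading module

## sdf_loader module

### SDFLoader class

Loader for SDF-format data

Reads molecule data from SDF-format data. Three sources are supported: a file, a directory, and text.

<hr>

#### **closeWarning: disable warning messages**

Disables RDKit warning output.

##### Args

None

##### Returns

None

<hr>

#### openWarning: enable warning messages

Restores RDKit warning output.

##### Args

None

##### Returns

None

<hr>

#### readDataFromFile

Reads molecule data from the specified SDF file.

##### Args

- sdfDoc (str):

    SDF file name (including path)
- startIndex (int, optional):

    Start index (inclusive). Defaults to 0.
- endIndex (int, optional):

    End index (exclusive; the default -1 means read to the end of the file). Defaults to -1.
- ignore_error (bool, optional):

    Whether to ignore errors. Defaults to True.

##### **Raises:**

- FileNotFoundError: the specified file does not exist
- ValueError: the start index must be non-negative
- ValueError: the end index must be greater than the start index, or -1
- ValueError: data segment error

##### Returns

- tuple[list[Mol], int, int]:

    List of molecules read, expected count, actual count read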
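
The index rules (inclusive start, exclusive end, `-1` meaning "to the end") and the two index-related `ValueError` cases can be sketched on an ordinary list. `select_range` is a hypothetical helper that mirrors the documented rules; it is not the loader's implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_range(records: Sequence[T], start_index: int = 0, end_index: int = -1) -> List[T]:
    """Return records[start_index:end_index], where end_index == -1 means 'to the end'."""
    if start_index < 0:
        raise ValueError("start index must be non-negative")
    if end_index != -1 and end_index <= start_index:
        raise ValueError("end index must be greater than start index, or -1")
    stop = len(records) if end_index == -1 else end_index
    return list(records[start_index:stop])


records = ["mol0", "mol1", "mol2", "mol3"]
middle = select_range(records, 1, 3)
tail = select_range(records, 2)
```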

----

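#### readDataFromDir

Finds all SDF files in the specified directory and reads the molecule data from every file.

##### Args

- sdfDir (str):

    Directory to read from
- recursive (bool, optional):

    Whether to recurse into subdirectories. Defaults to True.
- ignore_error (bool, optional):

    Whether to ignore errors. Defaults to True.

##### Raises

- FileNotFoundError: the specified directory does not exist
- TypeError: the specified path is not a directory

##### Returns

- tuple[list[Mol], int, int]:

    List of molecules read, expected count, actual count read

<hr>

#### readDataFromText

Reads molecule data from the given text.

##### Args

- sdfText (str):

    Text containing SDF-format content for the molecules
- ignore_error (bool, optional):

    Whether to ignore errors. Defaults to True.

##### Returns

- tuple[list[Mol], int, int]:

    List of molecules read, expected count, actual count read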

<hr>

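#### splitDataByMarker

Splits SDF text into multiple molecule data segments at the specified marker.

##### Args

- sdfText (str):

    Text containing SDF-format content for at least one molecule

- marker (str, optional):

    Marker string used for splitting. Defaults to "$$$$\n".

- strict (bool, optional):

    Whether to use strict mode. Defaults to True.

##### Raises

- ValueError: in strict mode, if the marker is not found

##### Returns

- List[str]:

    List of the molecule data segments after splitting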
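
Splitting on the record terminator can be sketched as follows. This is an illustration under assumptions, not the library's code: `split_by_marker` is a hypothetical name, each segment keeps its terminator, and in non-strict mode text without a marker is returned as a single segment.

```python
from typing import List


def split_by_marker(sdf_text: str, marker: str = "$$$$\n", strict: bool = True) -> List[str]:
    """Split SDF text into per-molecule segments, keeping the marker on each segment."""
    if marker not in sdf_text:
        if strict:
            raise ValueError("marker not found in SDF text")
        return [sdf_text] if sdf_text.strip() else []
    pieces = sdf_text.split(marker)
    # Every piece except the last was followed by a marker in the input.
    blocks = [piece + marker for piece in pieces[:-1] if piece.strip()]
    if pieces[-1].strip():
        blocks.append(pieces[-1])  # trailing text without a terminator is kept as-is
    return blocks


text = "mol A\n$$$$\nmol B\n$$$$\n"
segments = split_by_marker(text)
```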

<hr>

## nih.pubchem.online module

Fetches NIH PubChem data online in real time

<hr>

### compound module

Module for fetching PubChem compound records and conformers over the NIH PubChem REST API,
parsing the returned SDF/JSON payloads, and returning structured ALNPCompound / ALNPConformer
objects.

#### Primary responsibilities

- Request compound SDF blocks for a list of PubChem CIDs.
- Optionally request conformer metadata (ConformerID) and then fetch conformer SDFs.
- Parse SDF content via SDFLoader and construct ALNPCompound and ALNPConformer instances.
- Return a mapping from CID (string) to ALNPCompound instances, with conformer data
    attached when requested.

#### Provided functions

- **get_compound**(cid_list: List[str], include_conformer: bool = False) -> Dict[str, ALNPCompound]

- **get_similarity_compound**(
    input: EInputType,
    value: str | int,
    operation: EOperationType,
    output: EOutputType,
    threshold: int = 90,
    max_records: int = 10
) -> List[str]

----

#### get_compound method

##### **Parameters**

- **cid_list (List[str])**

    Sequence of PubChem CIDs to fetch. Each CID should be convertible to string; callers
    typically pass strings (e.g. ["2244","3672"]) or integers converted to strings.
- **include_conformer (bool, default False)**

    When False, only the primary compound SDF/metadata is fetched and returned.
    When True, the function also queries the conformer metadata endpoint to obtain
    ConformerIDs for each CID, and then requests SDFs for those conformers and attaches
    ALNPConformer objects under each ALNPCompound.

##### **Return value**

- Dict[str, ALNPCompound]
    A dictionary keyed by the CID string. Each value is an ALNPCompound instance
    populated with:  
    - PUBCHEM_COMPOUND_CID (from SDF properties)
    - ROW (raw SDF block text for the molecule)
    - If include_conformer True:
      - CONFORMER_ID: List[str] of conformer IDs reported by PubChem for that CID
      - CONFORMERS: Dict[str, ALNPConformer] keyed by conformer ID; each conformer
          contains `PUBCHEM_CONFORMER_ID`, `PUBCHEM_COMPOUND_CID` and `ROW` (raw conformer SDF).

##### **Behavior and error handling**

- Uses HTTP endpoints:
  - Compound SDF by CID(s): /rest/pug/compound/cid/{cid_list}/SDF?response_type=display
  - Conformer metadata for CIDs: /rest/pug/compound/cid/{cid_list}/conformers/JSON
  - Conformer SDFs by Conformer ID(s): /rest/pug/conformers/{conformer_id_list}/SDF?response_type=display
- Interprets fetch_url(...) return value as a dict with at least "status_code", "success",
    and "content" keys.
- For 200 responses with "success" False, raises ValueError indicating CID(s) not found.
- For HTTP 404 or 503 at top-level requests, raises ValueError with descriptive messages.
- For conformer fetching, any exceptions raised while parsing conformer SDF content are
    caught and printed; partial results may still be returned for other CIDs.
- Logs important events and warnings:
  - Missing molecules in SDF payloads
  - CID mismatches between JSON metadata and SDF properties
  - Conformer IDs being processed

##### **Notes, constraints and assumptions**

- The SDF parsing relies on SDFLoader.splitDataByMarker and SDFLoader.readDataFromText.
    Those functions are expected to return lists of RDKit-like molecule objects (mol.GetProp(...))
    and counts. The code assumes SDF blocks include `PUBCHEM_COMPOUND_CID` and (for conformers)
    `PUBCHEM_CONFORMER_ID` properties.
- The returned dictionary is pre-populated with keys from the input cid_list (strings)
    and values set to ALNPCompound instances only when parsed successfully; entries may remain None
    if the SDF for a given CID could not be parsed or was absent.
- Network reliability and rate limits are outside this module's control; callers should
    handle transient failures or consider retry/backoff when calling get_compound with large lists.
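
Two of the points above can be sketched with the standard library: composing the compound SDF endpoint for several CIDs, and separating parsed entries from those left as `None`. The helper names are hypothetical, the host is the public PubChem server, and joining CIDs with commas is an assumption about how `{cid_list}` is filled in.

```python
from typing import Dict, Iterable, List, Optional, Tuple, TypeVar
from urllib.parse import quote

T = TypeVar("T")

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov"  # public PubChem host


def compound_sdf_url(cid_list: Iterable[str]) -> str:
    """Build the compound SDF URL for a list of CIDs (comma-joined by assumption)."""
    cids = ",".join(quote(str(cid), safe="") for cid in cid_list)
    return f"{BASE_URL}/rest/pug/compound/cid/{cids}/SDF?response_type=display"


def split_found_missing(compounds: Dict[str, Optional[T]]) -> Tuple[Dict[str, T], List[str]]:
    """Separate successfully parsed entries from CIDs whose value stayed None."""
    found = {cid: value for cid, value in compounds.items() if value is not None}
    missing = [cid for cid, value in compounds.items() if value is None]
    return found, missing


url = compound_sdf_url(["2244", "3672"])
# Strings stand in for ALNPCompound instances in this illustration.
found, missing = split_found_missing({"2244": "compound-2244", "3672": None})
```

Checking for `None` before use avoids attribute errors on CIDs whose SDF was absent or could not be parsed.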

##### **Example**

- Simple usage:

```python

    cids = ["2244", "3672"]
    compounds = get_compound(cids, include_conformer=False)
    # compounds is a dict mapping "2244" -> ALNPCompound(...)

```

- Fetch compounds with conformers:

```python

    compounds_with_confs = get_compound(["2244"], include_conformer=True)
    conf_ids = compounds_with_confs["2244"].CONFORMER_ID
    conformers_map = compounds_with_confs["2244"].CONFORMERS
```

##### **Types referenced**

- ALNPCompound: container/dataclass representing a PubChem compound record and any attached conformer info.
- ALNPConformer: container/dataclass representing a single conformer (ID, parent CID, raw SDF).

##### **Security and privacy**

- Requests are made to the public PubChem REST endpoints; no credentials are required.
- Raw SDF content is stored in returned objects' ROW fields and may contain structural or identifier data;
    treat returned data according to your privacy/security policies.

----

#### get_similarity_compound method

##### **Summary**

Retrieve similar compounds from PubChem and return detailed compound data.
This function builds a PubChem similarity search URL from the provided
input parameters, performs the HTTP request, parses the returned data
(SDF/JSON/TXT), extracts matching PubChem Compound IDs (CIDs), and then
fetches full compound records (and optionally conformers) for those CIDs.

##### Parameters

- **input (EInputType):**

  The type of the search input (for example SMILES, InChI, CID, etc.).

- **value (str | int):**

    The search value. For CID input this must be an integer or a numeric
        string. Value must not be empty.

- **operation (EOperationType):**

  The PubChem operation type (for example CIDS or RECORD).

- **output (EOutputType):**

    The requested output format from PubChem (SDF, JSON, TXT, ...).

- **threshold (int,optional):**

  Similarity threshold (percent). Default is 90.

- **max_records (int, optional):**

    Maximum number of similar records to request. Default is 10.


##### Return value

- List[str]
    A list of PubChem CIDs, each given as a string.

##### Raises

- ValueError
  - If `value` is empty.
  - If `input` is EInputType.CID and `value` cannot be parsed as an integer.
  - If `operation` is CIDS and `output` is EOutputType.SDF (disallowed).
  - If a RECORD operation requests TXT output (invalid combination).
  - If the HTTP fetch returns a non-200 status or indicates failure.
  - If the returned content is empty when parsing JSON/SDF/TXT responses.
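
The argument checks listed under Raises can be sketched as a standalone validator. The `Demo*` enums below are stand-ins defined only for this example; the library's real `EInputType`, `EOperationType`, and `EOutputType` may have different members and values, and `validate_query` is a hypothetical name.

```python
from enum import Enum
from typing import Union


class DemoInputType(Enum):  # stand-in for EInputType
    CID = "cid"
    SMILES = "smiles"


class DemoOperationType(Enum):  # stand-in for EOperationType
    CIDS = "cids"
    RECORD = "record"


class DemoOutputType(Enum):  # stand-in for EOutputType
    SDF = "SDF"
    JSON = "JSON"
    TXT = "TXT"


def validate_query(input_type: DemoInputType, value: Union[str, int],
                   operation: DemoOperationType, output: DemoOutputType) -> None:
    """Raise ValueError for the invalid argument combinations described above."""
    if value is None or str(value).strip() == "":
        raise ValueError("value must not be empty")
    if input_type is DemoInputType.CID:
        try:
            int(value)
        except (TypeError, ValueError) as exc:
            raise ValueError("CID value must be an integer or a numeric string") from exc
    if operation is DemoOperationType.CIDS and output is DemoOutputType.SDF:
        raise ValueError("CIDS operation cannot be combined with SDF output")
    if operation is DemoOperationType.RECORD and output is DemoOutputType.TXT:
        raise ValueError("RECORD operation cannot be combined with TXT output")


validate_query(DemoInputType.CID, "2244", DemoOperationType.CIDS, DemoOutputType.TXT)
```

Validating before any HTTP request means a bad combination fails fast without network traffic.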

##### Behavior and error handling

- Uses HTTP endpoints:
    - Compound SDF by CID(s): /rest/pug/compound/cid/{cid_list}/SDF?response_type=display
    - Conformer metadata for CIDs: /rest/pug/compound/cid/{cid_list}/conformers/JSON
    - Conformer SDFs by Conformer ID(s): /rest/pug/conformers/{conformer_id_list}/SDF?response_type=display
- Interprets fetch_url(...) return value as a dict with at least "status_code", "success",
    and "content" keys.
- For 200 responses with "success" False, raises ValueError indicating CID(s) not found.
- For HTTP 404 or 503 at top-level requests, raises ValueError with descriptive messages.
- For conformer fetching, any exceptions raised while parsing conformer SDF content are
    caught and printed; partial results may still be returned for other CIDs.
- Logs important events and warnings:
    - Missing molecules in SDF payloads
    - CID mismatches between JSON metadata and SDF properties
    - Conformer IDs being processed

##### Notes, constraints and assumptions

- This function uses an internal URL template to request PubChem similarity
    results, then parses the response according to `output`:
    - SDF: parsed via SDFLoader.readDataFromText
    - JSON: parsed via json.loads and expected keys vary with `operation`
    - TXT: expected to contain one CID per line

- After extracting CIDs, the function calls get_compound(...) to obtain
    detailed compound (and optional conformer) data.
- Progress reporting, error continuation, and exact returned ALNPCompound
    structure are delegated to the underlying fetch/get routines.

- Other notes
  - `input` accepts cid, smiles, or InChI.

  - `output` accepts SDF, JSON, or TXT.

  - `operation` accepts record, cids, or sids.

  - Additional search options:

| Option     | Type    | Meaning                                             | Default   |
| ---------- | ------- | --------------------------------------------------- | --------- |
| Threshold  | integer | minimum Tanimoto score for a hit                    | 90        |
| MaxSeconds | integer | maximum search time in seconds                      | unlimited |
| MaxRecords | integer | maximum number of hits                              | 2M        |
| listkey    | string  | restrict to matches within hits from a prior search | none      |
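
As an illustration of the TXT case and of the search options in the table, the sketch below parses a one-CID-per-line response and encodes `Threshold` and `MaxRecords` as a query string. Both helpers are hypothetical, and the library's internal URL template may compose its request differently.

```python
from typing import List
from urllib.parse import urlencode


def parse_cid_text(content: str) -> List[str]:
    """Parse a TXT response containing one CID per line; blank lines are skipped."""
    if not content or not content.strip():
        raise ValueError("empty response content")
    return [line.strip() for line in content.splitlines() if line.strip()]


def similarity_options(threshold: int = 90, max_records: int = 10) -> str:
    """Encode the Threshold and MaxRecords options as a URL query string."""
    return urlencode({"Threshold": threshold, "MaxRecords": max_records})


cids = parse_cid_text("2244\n3672\n\n")
query = similarity_options()
```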

##### Example

- Simple usage:

```python

    cid = 2244
    similar_cids = get_similarity_compound(
                                input=EInputType.CID,
                                value=cid,
                                operation=EOperationType.CIDS,
                                output=EOutputType.TXT)
    # similar_cids is a list of CID strings for compounds similar to CID 2244

```

or

```python

    smiles = "CCCCCC1C(C(OC(=O)C(C(OC1=O)C)NC(=O)C2=C(C(=CC=C2)NC=O)O)C)OC(=O)CC(C)C"
    print(get_similarity_compound(input=EInputType.SMILES,
                                  value=smiles,
                                  operation=EOperationType.CIDS,
                                  output=EOutputType.TXT))
    # prints a list of CID strings for compounds similar to the SMILES query

```

- Query with a CID taken from a list:

```python

    cid_list = ["2244", "3672"]
    print(get_similarity_compound(input=EInputType.CID,
                                  value=cid_list[-1],
                                  operation=EOperationType.CIDS,
                                  output=EOutputType.TXT))
    # prints a list of CID strings similar to the last CID in cid_list

```

##### Types referenced

- ALNPCompound: container/dataclass representing a PubChem compound record and any attached conformer info.
- ALNPConformer: container/dataclass representing a single conformer (ID, parent CID, raw SDF).

##### Security and privacy

- Requests are made to the public PubChem REST endpoints; no credentials are required.
- Raw SDF content is stored in returned objects' ROW fields and may contain structural or identifier data;
    treat returned data according to your privacy/security policies.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pybiotech",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "AILingues, components, biotech, library",
    "author": null,
    "author_email": "AI Lingues <support@ailingues.com>",
    "download_url": null,
    "platform": null,
    "description": "<!--\n * @Author: Zeng Shengbo shengbo.zeng@ailingues.com\n * @Date: 2025-06-19 15:05:41\n * @LastEditors: Zeng Shengbo shengbo.zeng@ailingues.com\n * @LastEditTime: 10/22/2025 00:19:32\n * @FilePath: //pybiotech//README.md\n * @Description:\n * \n * Copyright (c) 2025 by AI Lingues, All Rights Reserved. \n-->\n\nAI Lingues Biotech Python Library\n====\n\n\u672c\u7ec4\u4ef6\u5e93\u7528\u4e8e\u751f\u7269\u79d1\u5b66\u548c\u533b\u7597\u65b9\u9762\u7684\u6570\u636e\u5904\u7406\u3001\u5206\u6790\u7b49\u3002\n\n- \u652f\u6301NIH PubChem\u516c\u5f00\u6570\u636e\u5e93\u6570\u636e\u67e5\u8be2\u8bbf\u95ee\n- \u652f\u6301SDF\u683c\u5f0f\u6587\u4ef6\u9ad8\u901f\u8bfb\u53d6\n- \u652f\u6301\u5206\u5b50\u5316\u5408\u7269\u6784\u8c61\u529b\u573a\u4f18\u5316\u53ca\u7279\u5f81\u8ba1\u7b97\n\n<hr>\n\n**\u6700\u65b0\u7248\u672c**\n\nversion **0.2.6**\n\n**\u4e3b\u8981\u4f9d\u8d56**\n\n- python >= 3.11\n- pycorelibs >= 0.2.6\n- rdkit == 2024.9.6\n\n**CopyRight**\n\n    AI Lingues Team\n\n**email**\n\n    support@ailingues.com\n\n<hr>\n\n# core \u6a21\u5757\n\n<hr>\n\n## molecule \u6a21\u5757\n\n### calculator \u6a21\u5757\n\n<hr>\n\n#### MolCaculator \u5206\u5b50\u8ba1\u7b97\u5668\u7c7b\n\n\u8ba1\u7b97\u5206\u5b50\u5316\u5408\u7269\u76f8\u5173\u7684\u53c2\u6570\u3001\u7279\u5f81\u7b49\uff0c\u51fd\u6570\u6e05\u5355\u5982\u4e0b\uff1a\n\n| \u51fd\u6570                              | \u63cf\u8ff0                                                                   |\n| --------------------------------- | ---------------------------------------------------------------------- |\n| approximate_oe_hydrophobe         | \u8ba1\u7b97\"\u758f\u6c34\u7c07\"                                                           |\n| calc_surface_area                 | \u8ba1\u7b97\u5206\u5b50\u5316\u5408\u7269\u7684\u8868\u9762\u79ef                                                 |\n| calc_hydrophobic_surface_area     | \u8ba1\u7b97\u5206\u5b50\u758f\u6c34\u8868\u9762\u79ef                     
                                |\n| calc_diameter                     | \u8ba1\u7b97\u5355\u4e2a\u5206\u5b50\u7684\u6700\u5927\u76f4\u5f84                                                 |\n| calc_sssr                         | \u8ba1\u7b97\u5206\u5b50\u7684\u73af\u4fe1\u606f(SSSR \u4e0e RingInfo \u5feb\u7167\uff09                               |\n| calc_logp                         | \u8ba1\u7b97\u5206\u5b50\u5316\u5408\u7269\u7684\u8102\u6eb6\u6027(LogP)                                           |\n| calc_morgan_fringerprint          | \u8ba1\u7b97\u5206\u5b50\u5316\u5408\u7269\u7684Morgan\u6307\u7eb9                                             |\n| calc_maccs_fingerprint            | \u8ba1\u7b97\u5206\u5b50\u5316\u5408\u7269\u7684MACCS\u6307\u7eb9                                              |\n| calc_crippen_contribs             | \u8ba1\u7b97\u5206\u5b50\u7684Crippen\u89c4\u8303\u5316\u8d21\u732e,\u5305\u62ec\u6bcf\u4e2a\u539f\u5b50\u7684 logP \u8d21\u732e\u548c\u6469\u5c14\u6298\u5c04\u7387\u8d21\u732e   |\n| calc_pharmacophore_features       | \u8ba1\u7b97\u836f\u6548\u56e2\u7279\u5f81                                                         |\n| calc_intra_pharmacophore_distance | \u8ba1\u7b97\u6bcf\u79cd\u836f\u6548\u56e2\u7c7b\u578b\u5185\u90e8\u6240\u6709\u539f\u5b50\u7684\u6b27\u51e0\u91cc\u5f97\u8ddd\u79bb\u77e9\u9635                       |\n| calc_inter_pharmacophore_distance | \u8ba1\u7b97\u4e24\u4e2a\u4e0d\u540c\u836f\u6548\u56e2\u7c7b\u578b\u4e4b\u95f4\u7684\u6b27\u51e0\u91cc\u5f97\u8ddd\u79bb\u77e9\u9635                           |\n| get_ring_atoms                    | \u63d0\u53d6\u6240\u6709\u73af\u4e2d\u539f\u5b50\u7684\u7d22\u5f15 (1-based)                                       |\n| get_hydrophobic_clusters          | \u8bc6\u522b\u758f\u6c34\u539f\u5b50\u5e76\u6839\u636e 3D \u8ddd\u79bb\u805a\u5408\u6210\u591a\u4e2a\u201c\u7c07\u201d\uff0c\u4ee5\u5bf9\u5e94 SDF \u4e2d\u591a\u884c hydrophobe |\n| get_anions_cations                | 
\u83b7\u53d6\u9634\u79bb\u5b50\u539f\u5b50\u5217\u8868\u3001\u9633\u79bb\u5b50\u539f\u5b50\u5217\u8868 (1-based)                           |\n| get_hbond_acceptors               | \u8bc6\u522b\u6c22\u952e\u53d7\u4f53\u539f\u5b50 (0-based \u7d22\u5f15)                                        |\n| get_hbond_donors                  | \u8bc6\u522b\u6c22\u952e\u4f9b\u4f53\u539f\u5b50 (0-based \u7d22\u5f15)                                        |\n\n<hr>\n\n##### approximate_oe_hydrophobe\n\n\u8ba1\u7b97\"\u758f\u6c34\u7c07\"\u3002  \n\u5728\u7eaf RDKit \u73af\u5883\u4e0b\u8fd1\u4f3c\u5730\u6a21\u4eff OEShape \u7684\u758f\u6c34\u539f\u5b50\u805a\u5408\u903b\u8f91, \u8fd4\u56de\u591a\u4e2a\"\u758f\u6c34\u7c07\"\u3002\n\u6bcf\u4e2a\u7c07\u53ef\u89c6\u4e3a\u4e00\u884c hydrophobe, \u7c7b\u4f3c:  \n3  38 41 42 hydrophobe  \n4  17 19 20 21 hydrophobe\n\n\u7b49\u7b49.\n\n###### \u53c2\u6570\u8bf4\u660e\n\n- mol : RDKit Mol \u5bf9\u8c61\n\n    \u5982\u679c\u5df2\u5e26\u6709\u5408\u7406\u7684 3D conformer\uff0c\u53ef\u4e0d\u518d embed\u3002\u5982\u679c\u6ca13D,\u4e14embed_if_needed=True\u5219\u4f1a\u81ea\u52a8Embed+Optimize\u3002\n\n- distance_threshold : float\n\n    \u4e24\u4e2a\u5019\u9009\u539f\u5b50\u82e5\u8ddd\u79bb < \u6b64\u9608\u503c\u5219\u89c6\u4e3a\u540c\u4e00\u758f\u6c34\u7c07. OEShape\u5e38\u7528 ~1.0 or 1.5\u00c5.\n\n- exclude_aromatic : bool\n\n    \u662f\u5426\u6392\u9664\u6240\u6709\u82b3\u9999\u78b3(\u4f8b\u5982\u82ef\u73afC). OEShape \u91cc\u901a\u5e38\u67d0\u4e9b\u82b3\u9999C\u4e5f\u53ef\u80fd\u7b97\u758f\u6c34, \u8fd9\u91cc\u53ef\u9009\u3002\n\n- partial_charge_cutoff : float or None\n\n    \u82e5\u4e0d\u4e3a None, \u5219\u4f7f\u7528 Gasteiger \u90e8\u5206\u7535\u8377, \u6392\u9664\u7edd\u5bf9\u503c>=\u8be5\u9608\u503c\u7684\u78b3, \u4ee5\u6392\u9664\u6781\u6027\u78b3.  
\n    For example, with 0.2, carbons with |q| \u2265 0.2 are treated as non-hydrophobic.\n\n- min_cluster_size : int\n\n    Minimum cluster size. A cluster with fewer than min_cluster_size atoms can be treated as noise and discarded (or optionally kept).\n\n- extended_filter : bool\n\n    If True, carbons are filtered with the \"no O, N, S, P or halogen\" rule; otherwise a carbon is kept as long as it has no O or N neighbor.\n\n- embed_if_needed : bool\n\n    Whether to run ETKDG embedding when mol has no 3D conformer.\n\n- max_attempts : int\n\n    Number of attempts when embedding fails.\n\n###### Returns\n\n- hydrophobe_lines : list of tuples  [(atom_count, [a1,a2,...]), ...]\n\n    Here a1,a2,... are sorted 1-based atom indices that belong to one hydrophobic cluster.\n    Each tuple can be turned into an SDF-like string:\n        f\"{atom_count} {' '.join(map(str, atom_ids))} hydrophobe\"\n    or used directly as a data structure.\n\n###### Notes\n\n1. Results are not guaranteed to match OEShape exactly; this is only a \"force field + distance clustering + configurable filters\" approach.\n2. If embedding fails, the coordinates are unreasonable, or the molecule is very large, results may still be poor.\n3. Try several values of distance_threshold, partial_charge_cutoff, etc. and observe how the result changes.\n\n<hr>\n\n##### calc_surface_area\n\nCompute the surface area of a compound.\n\nBefore calling, the molecule must be prepared as follows:\n\n 1. Add explicit hydrogens\n    ```python\n        from rdkit import Chem\n        mol.UpdatePropertyCache(strict=False)\n        mol_with_H = Chem.AddHs(mol)\n    ```\n\n 2. 
Generate 3D coordinates\n    ```python\n        from rdkit.Chem import AllChem\n        AllChem.EmbedMolecule(mol_with_H)  # returns -1 if embedding fails\n        AllChem.MMFFOptimizeMolecule(mol_with_H)\n    ```\n\n###### Args\n\n- mol (Chem.Mol): the compound object\n\n###### Raises\n\n- Exception: (not documented)\n\n###### Returns\n\n- float: total molecular surface area\n\n<hr>\n\n##### calc_hydrophobic_surface_area\n\nCompute the hydrophobic surface area of a molecule.\n\nBefore calling, the molecule must be prepared as follows:\n\n1. Add explicit hydrogens\n    ```python\n        from rdkit import Chem\n        mol.UpdatePropertyCache(strict=False)\n        mol_with_H = Chem.AddHs(mol)\n    ```\n\n2. Generate 3D coordinates\n    ```python\n        from rdkit.Chem import AllChem\n        AllChem.EmbedMolecule(mol_with_H)  # returns -1 if embedding fails\n        AllChem.MMFFOptimizeMolecule(mol_with_H)\n    ```\n\n###### Args\n\n- mol (Chem.Mol): the compound object\n- is_high_precise (bool): whether to use the more precise calculation; defaults to False\n\n###### Raises\n\n- e: (not documented)\n\n###### Returns\n\n- float: hydrophobic surface area of the molecule\n\n<hr>\n\n##### calc_diameter\n\nCompute the maximum diameter of a single molecule.\n\n###### Args\n\n- mol (Chem.Mol): RDKit molecule object, assumed to already have 3D coordinates.\n\n###### Returns\n\n- float: maximum diameter of the molecule (Angstrom).\n\n<hr>\n\n##### calc_sssr\n\nCompute ring information for a molecule (SSSR and a RingInfo snapshot).\n\n###### Functionality\n\n- Calls GetSymmSSSR(mol) to trigger ring perception and obtain the \"symmetrized SSSR\" ring set (expressed as atom indices).\n- Reads 
RingInfo (mol.GetRingInfo()) to report per-atom and per-bond ring counts, plus the ring lists by atom and by bond.\n- Returns a structured result for later analysis and statistics (nothing is printed).\n\n###### Parameters\n\n- mol : rdkit.Chem.rdchem.Mol\n\n    RDKit molecule object. The function only reads it and does not modify it.\n\n###### Returns\n\n- Dict[str, Any]\n    {\n\n    \"num_rings\": int,                     # number of SSSR rings\n\n    \"atom_rings\": List[List[int]],        # SSSR: list of atom indices for each ring\n\n    \"by_size\": Dict[int, List[List[int]]],# SSSR grouped by ring size\n\n    \"ri_atom_rings\": List[List[int]],     # RingInfo.AtomRings() (not necessarily identical to the SSSR)\n\n    \"ri_bond_rings\": List[List[int]],     # RingInfo.BondRings()\n\n    \"atom_ring_count\": List[int],         # number of rings each atom belongs to\n\n    \"bond_ring_count\": List[int],         # number of rings each bond belongs to\n\n    \"atom_in_ring\": List[bool],           # whether the atom is in any ring (derived from count > 0)\n\n    \"bond_in_ring\": List[bool],           # whether the bond is in any ring (derived from count > 0)\n\n    \"algorithm\": str,                     # 'SymmSSSR'\n\n    }\n\n###### Remarks\n\n- GetSymmSSSR() ensures ring perception has run and caches the information in RingInfo.\n- `atom_rings` in the result is the SSSR (symmetrized smallest set of smallest rings); `ri_atom_rings/ri_bond_rings` come from 
RingInfo and may contain a ring enumeration that does not exactly match the SSSR (an implementation-level difference).\n- Whether an atom or bond is in a ring is derived from its count being > 0, which is efficient and easy to read.\n\n<hr>\n\n##### calc_logp\n\nCompute the lipophilicity (LogP) of a compound.\n\n###### Args\n\n- mol (Chem.Mol): Mol instance of the compound; instances of Chem.Mol subclasses are supported\n\n###### Returns\n\n- float: the lipophilicity (LogP) value\n\n<hr>\n\n##### calc_morgan_fringerprint\n\nCompute the Morgan fingerprint of a compound.\n\n###### Description\n\nThe Morgan fingerprint is a commonly used molecular fingerprint type in RDKit that describes molecular structure and similarity.\n\nIt is generated from the molecule's topology and a radius parameter, and has the following characteristics:\n\n1. The fingerprint is a fixed-length binary vector in which each bit marks the presence or absence of a substructure.\n2. The fingerprint length and radius can be adjusted as needed to balance information content against computational cost.\n3. Different hash functions can be used to generate the fingerprint, increasing its diversity and robustness.\n\nGetMorganGenerator signature: see doc/specifications/interface/GetMorganGenerator.md\n\n###### Notes\n\nAbout the countSimulation parameter\n\n1. 
Default behavior of the Morgan fingerprint:\n\n    - By default (countSimulation=False):\n\n        The Morgan fingerprint is a bit vector whose values are 0 or 1, indicating whether a given chemical environment is present.\n\n    - With count simulation enabled (countSimulation=True):\n\n        The Morgan fingerprint holds integer values giving the number of times a chemical environment occurs.\n\n2. For classification problems:\n\n    2.1 If the presence or absence of a chemical environment is what matters, the 0/1 bit-vector form is usually sufficient.\n\n      - Suitable when:\n\n        a. Presence or absence of chemical environments is enough to describe the target property.\n\n        b. The task is a classification problem (e.g. toxic or not, active or not).\n\n        c. The data are sparse, or substructure occurrence counts are fairly evenly distributed.\n\n    2.2 If how often a chemical environment occurs may drive the classification, keeping count information is likely to help more.\n\n      - Suitable when:\n\n        a. The frequency of chemical environments has a significant effect on the classification task.\n\n        b. The task needs to capture the strength of functional groups in a molecule (e.g. highly toxic molecules).\n\n        c. The extra value of quantity information needs to be captured.\n\n        d. Other prediction or analysis scenarios.\n\n###### 
Reference [fingerprint](https://github.com/daiyizheng/DL/blob/master/07-rdkit/08-rdkit%E5%8C%96%E5%AD%A6%E6%8C%87%E7%BA%B9.ipynb)\n\n###### Args\n\n- mol (Chem.Mol): Mol instance of the compound; instances of Chem.Mol subclasses are supported\n- countSimulation (bool): whether to enable counting; defaults to False (see the Notes section for details on this parameter)\n- bitSize (int): length of the bit vector; defaults to 2048\n\n###### Returns\n\n- np.array: array holding the Morgan fingerprint data\n\n<hr>\n\n##### calc_maccs_fingerprint\n\nCompute the MACCS fingerprint of a compound.\n\n###### Method\n\nUses rdkit.Chem.MACCSkeys.GenMACCSKeys to compute the fingerprint.\n\n###### Description\n\nThe MACCS (Molecular ACCess System) fingerprint is a binary fingerprint that encodes molecular structure information.  \nIt is defined by whether the molecule contains specific substructures, covering 166 distinct molecular features.  \nEach feature corresponds to a particular chemical substructure, such as a hydroxyl group, a benzene ring, or a nitrogen atom.  \nIf the feature is present in the molecule, the corresponding bit is 1; otherwise it is 0.  \nThe MACCS fingerprint has 166 keys (RDKit's GenMACCSKeys returns a 167-bit vector whose bit 0 is unused), and it is used in cheminformatics work such as molecular similarity comparison, classification, clustering, and screening.\n\n###### 
Notes\n\nNone\n\n###### Reference [fingerprint](https://github.com/daiyizheng/DL/blob/master/07-rdkit/08-rdkit%E5%8C%96%E5%AD%A6%E6%8C%87%E7%BA%B9.ipynb)\n\n###### Args\n\n- mol (Chem.Mol): Mol instance of the compound; instances of Chem.Mol subclasses are supported\n\n###### Returns\n\n- np.array: array holding the MACCS fingerprint data\n\n<hr>\n\n##### calc_crippen_contribs\n\nCompute the molecule's Crippen contributions, i.e. each atom's logP contribution and molar refractivity contribution.\n\n###### Method\n\nNone\n\n###### Description\n\nCrippen contributions are assigned per atom from an atom-type classification of the molecular graph (the Wildman-Crippen scheme), and can be used to describe solubility, bioavailability, and other properties of a molecule.  \nThis function is normally used together with an RDKit molecule object.\n\n###### Notes\n\n  1. The Chem.Mol object passed in should first be processed with its UpdatePropertyCache method\n  2. 
Although the Crippen contributions are computed for the whole compound, the resulting values should be assigned by index position to the corresponding atoms and used as part of the atom features\n\n###### Args\n\n- mol (Chem.Mol): Mol instance of the compound; instances of Chem.Mol subclasses are supported\n\n###### Returns\n\n- tuple: each atom's logP contribution and molar refractivity contribution\n        A tuple of two lists, each as long as the number of atoms in the molecule.\n        The first list holds each atom's logP contribution,\n        the second list holds each atom's molar refractivity contribution.\n\n<hr>\n\n##### calc_pharmacophore_features\n\nCompute pharmacophore features.\n\n###### Description\n\n  1. Check whether a 3D conformer is present; if not, run a simple Embed + Optimize (optional).\n  2. Compute hydrogen-bond acceptors/donors, anions and cations, ring atoms, hydrophobic atoms, etc.\n  3. 
Return (features, features_count, atom_list).\n  where:\n      - features = {\"rings\":0/1,...}\n      - features_count = {\"rings\":N,...}\n      - atom_list = {\"rings\":[...],...} (1-based or list of lists)\n\n<hr>\n\n##### calc_intra_pharmacophore_distance\n\nCompute the Euclidean distance matrix between all atoms within each pharmacophore type.\n\n###### Parameters\n\n- mol: RDKit Mol object; must contain 3D coordinates (a Conformer).\n- atom_list: dict mapping each pharmacophore type to its list of atom numbers, e.g. {'rings': [1,2,3], 'anion': [4,5], ...}\n- conf_id: int, optional; selects which conformer is used to compute distances.\n\n###### Returns\n\n- intra_distances: dict\n\n    Keys are pharmacophore types and values are the corresponding distance matrices (2D lists),  \n    e.g. {'rings': [[0.0, 1.2, ...], [...], ...], 'anion': [...], ...}\n\n<hr>\n\n##### calc_inter_pharmacophore_distance\n\nCompute the Euclidean distance matrix between two different pharmacophore types.\n\n###### Parameters\n\n- mol: RDKit Mol object; must contain 3D coordinates (a Conformer).\n- atom_list: dict mapping each pharmacophore type to its list of atom numbers, for example:\n\n    {'rings': [1,2,3], 'anion': [4,5], 'cation': [], ...}\n\n- type1: str, the first pharmacophore type (e.g. 'rings', 'anion', 'cation', 'acceptor', 'donor', 'hydrophobe')\n- type2: str, the second pharmacophore type\n- conf_id: int, optional; selects which conformer is used to compute distances.\n\n###### Returns\n\n- inter_distance_matrix:\n\n    2D list of shape (len(type1 atoms), 
len(type2 atoms))\n\n<hr>\n\n##### get_ring_atoms\n\nExtract the indices of all ring atoms (1-based).\n\n###### Parameters\n\n- mol: RDKit Mol object\n- use_ringinfo: bool\n\n    If True, ring atoms are identified through ringinfo;  \n    if False, GetSymmSSSR is used.\n\n###### Returns\n\n- List[List[int]] : each ring is a list holding the atoms in that ring (1-based).\n\n    For example: [[1,2,3,4,5,6],[8,9,10]].\n\n<hr>\n\n##### get_hydrophobic_clusters\n\nIdentify hydrophobic atoms and group them into clusters by 3D distance, matching the multiple hydrophobe lines in an SDF.\n\nReturns: List[List[int]], where each sub-list holds the 1-based indices of one cluster of hydrophobic atoms.\n\n###### Parameters\n\n- mol : RDKit Mol object (must have a 3D conformer; if not, run Embed + optimization first)\n- distance_threshold : float\n\n    Any two candidate hydrophobic atoms whose 3D distance is below this value are placed in the same cluster.  \n    Defaults to 1.0 \u00c5; values such as 1.5 or 2.0 are also worth trying.\n- extended : bool\n\n    True: carbons with no O, N, S, P or halogen (F, Cl, Br, I) are treated as hydrophobic  \n    False: only requires that no neighbor is O or N\n\n###### Returns\n\n- clusters_1based : List[List[int]]\n\n    For example [[10,12,14],[18,22,23]], meaning two clusters of hydrophobic atoms (1-based).  \n    If there are no hydrophobic atoms, the empty list [] is returned.\n\n<hr>\n\n##### get_anions_cations\n\nGet the list of anion atoms and the list of cation atoms (1-based).\n\nThe current rule uses the formal charge 
to decide:  \n- `atom.GetFormalCharge() < 0` => anion\n- `atom.GetFormalCharge() > 0` => cation\n\nMultiply charged atoms fall into the same groups, e.g. +2 => cation.  \nPartial charges, if needed, require a separate force-field or quantum-chemical calculation.\n\n<hr>\n\n##### get_hbond_acceptors\n\nIdentify hydrogen-bond acceptor atoms (0-based indices).\n\nReturns the SubstructMatch list of tuples, where each element is (atom_idx, ...).  \nIf only the atom index is needed, take match[0].  \nA somewhat more complete SMARTS pattern is used here, covering aromatic-ring N, carbonyl O, and so on.\n\n<hr>\n\n##### get_hbond_donors\n\nIdentify hydrogen-bond donor atoms (0-based indices).\n\n<hr>\n\n### optimizer module\n\nMolecular conformer optimization and serialization/deserialization\n\n#### Optimizer conformer optimization class\n\n<hr>\n\n##### embed_and_optimize (static method)\n\nGenerates an initial 3D conformer with RDKit and optimizes the geometry. MMFF94 is tried first, with a fallback to UFF on failure.\n\n###### Args\n\n- mol (Mol): RDKit molecule object (may be unsanitized). The function works on an internal copy and does not modify the input.\n- max_embed_attempts (int, optional): maximum number of attempts for 3D conformer embedding (ETKDG).\n\n    Larger values raise the success rate for difficult molecules but take longer.  \n    `Recommendation`: 200\u20131000 in general; raise it for macrocycles or complex fused rings. 
Defaults to 1000.\n- random_seed (int, optional): random seed. A fixed value makes results reproducible;\n\n    -1 means fully random (non-deterministic).  \n    `Recommendation`: keep it fixed during research and debugging; production batches can use -1 for more diversity. Defaults to 0xC0FFEE.\n- use_small_ring_torsions (bool, optional): ETKDG small-ring torsion parameters.\n\n    Enabling this usually matches empirical conformations of small rings (e.g. 3\u20135 membered rings) better and improves embedding quality. Defaults to True.\n- use_macrocycle_torsions (bool, optional): ETKDG macrocycle torsion handling.\n\n    Enabling this for macrocyclic or polycyclic systems helps find a more reasonable initial conformer. Defaults to True.\n- prune_rms_thresh (float, optional):\n\n    RMSD threshold for conformer pruning (the de-duplication strategy for repeated conformers).  \n    Even when only one conformer is embedded, this threshold affects the retry logic that looks for a conformer \"sufficiently different from existing ones\".  \n    The larger the value, the more readily similar conformers are treated as \"duplicates\" and embedding is retried.  \n    `Recommendation`: values between 0.1 and 0.5 \u00c5 are common. 
Defaults to 0.1.\n- max_ff_iters (int, optional): maximum number of iterations for force-field minimization.\n\n    Larger values give a better chance of convergence but take longer.\n\n    `Recommendation`: 200\u20131000. If runs often fail to converge, increase this first and then consider structure preprocessing. Defaults to 500.\n\n###### **Raises:**\n\n- ValueError: the RDKit molecule object is None\n\n###### **Returns:**\n\nTuple[Mol, bool, Dict[str, Any]]:\n\n- optimized_mol : rdkit.Chem.Mol with explicit hydrogens added and a single 3D conformer.\n\n    If a step fails, the current working copy is still returned for diagnosis.\n- ok : bool\n    Whether the force-field convergence criterion was met (True = converged; False = not converged or failed).\n- meta : Dict[str, Any]\n    Dictionary of diagnostic information. Common keys are listed below (some may be absent depending on the situation):\n  - stage : str\n      Current execution stage: \"init\" | \"sanitize\" | \"embed\" | \"optimize\".\n  - method : str\n      Force field actually used: \"MMFF94\" or \"UFF\".\n  - energy : float\n      Final force-field energy (in force-field units, usually read as kcal/mol; only comparable within the same force field).\n  - steps : int\n      _RDKit Minimize return code_ (note: not the actual number of steps). 0 means converged, non-zero means not converged.\n  - message : str\n      
Hint, warning, or error message (e.g. \"ETKDG embedding failed\", \"MMFF params unavailable, fallback to UFF\").\n\n###### Behavior and guarantees\n\n- The `mol` passed in is not modified; all work is done on a copy.\n- Runs sanitization (Sanitize) and stereochemistry assignment (AssignStereochemistry).\n- Adds explicit hydrogens (AddHs).\n- Uses ETKDGv3 for 3D embedding; existing conformers are cleared and only one conformer is kept.\n- Tries MMFF94 first; if the molecule is not supported, falls back to UFF.\n- `ok=True` means the force-field minimization return code is 0 (convergence criterion met); otherwise it is False.\n- Common chemistry problems (embedding failure, force field unavailable, etc.) do not raise; instead `ok=False` is returned and `meta['message']` gives the reason.\n    If `mol is None` or the input is unusable, `ValueError` may be raised.\n\n###### Usage tips\n\n- For reproducible results: keep `random_seed` fixed.\n- Macrocycles and complex systems: keep `use_macrocycle_torsions=True` and increase `max_embed_attempts` as needed.\n- Frequent non-convergence: increase `max_ff_iters`, or first preprocess charges, valence states, metal coordination, and so on.\n- In batch processing, log or persist `meta` so results can be traced and quality-filtered later (for example, preferring results that used MMFF94 and converged).\n\n<hr>\n\n##### **to_serialize 
(static method)**\n\nSerializes a collection of RDKit Mol objects into a \"single binary container blob\" (high performance, no intermediate text form).\n\nThe binary container is intended for **cross-process / cross-application IPC or persistence** and fully preserves conformers, coordinates, chirality, and (optionally) properties.\n\n###### **Args:**\n\n- mols : Iterable[Optional[rdkit.Chem.Mol]]\n\n    Sequence of molecules; may contain None (an empty placeholder record is written so positions stay aligned).\n- include_props : Chem.PropertyPickleOptions, default Chem.GetDefaultPickleProperties()\n\n    Controls whether molecule properties (props) are packed along with each molecule. Including them is recommended.\n- with_checksum : bool, default True\n\n    Whether to attach a CRC32 checksum to each record. Strongly recommended in production (enabled by default).\n\n###### **Returns:**\n\n- bytes\n    Custom binary container (v2). It can be passed between processes or applications through pipes, sockets, shared memory, or files.\n\n###### **Raises**\n\n- ValueError: the input sequence is empty.\n- RDKit-related exceptions: raised when binary output fails, e.g. because an individual molecule is corrupted.\n\n###### **Performance and reliability**\n\n- Performance: usually smaller and faster than text formats (SDF/MolBlock/JSON). CRC32 is implemented in C and has very low overhead (on the order of GB per second).\n- Reliability: length prefix + CRC32 
guard against truncation, partial packets, and corruption; big-endian encoding helps cross-language consistency.\n- Compatibility: well suited to online collaboration and IPC; for very long-term archiving, keep an additional text/JSON copy in case of extreme changes across major versions.\n\n<hr>\n\n##### **to_unserialize (static method)**\n\nRestores the list of Mol/None entries (positions match one to one) from the **binary container** produced by Optimizer.to_serialize().\n\n###### **Compatibility**\n\n- v2 container: b\"RDKB\\\\x02\" (recommended; includes the RDKit version and flags/CRC)\n- v1 container: b\"RDKB\\\\x01\" (backward compatible; only header + count + [len+payload], with no version/flags/CRC)\n- Bare RDKit single-molecule binary: if the magic number does not match, the data are tried as the RDKit binary of a **single Mol**; on success a list of length 1 is returned.\n\n###### **Args:**\n\n- serialized_mols : bytes\n\n    Binary container blob.\n\n###### **Returns:**\n\n- List[Optional[rdkit.Chem.Mol]]\n\n    The parsed list of molecules; `None` marks an empty placeholder at that position (or a corrupted record rejected when CRC is enabled).\n\n###### **Raises**\n\n- ValueError\n  - the argument is empty;\n  - the container header is invalid and the data are not a bare RDKit binary either;\n  - the v2/v1 container structure is truncated or a record length is abnormal;\n  - the container is v2 and the CRC 
check fails (the data are corrupted, truncated, or tampered with).\n\n###### **Remarks**\n\n- Decoding is big-endian (network order).\n- For a v2 container the RDKit version string is read and ignored (it can be logged or checked for compatibility if needed).\n- CRC32 verification is enabled by default for v2 (if the writer set that flag).\n\n<hr>\n\n# loaders module\n\nData loading module\n\n## sdf_loader module\n\n### SDFLoader class\n\nLoader for SDF-format data\n\nReads compound data from SDF-format input, with three sources supported: a file, a directory, or text.\n\n<hr>\n\n#### closeWarning: disable warnings\n\nDisables RDKit's warning output.\n\n##### Args\n\nNone\n\n##### Returns\n\nNone\n\n<hr>\n\n#### openWarning: enable warnings\n\nRestores RDKit's warning output.\n\n##### Args\n\nNone\n\n##### Returns\n\nNone\n\n<hr>\n\n#### readDataFromFile\n\nRead molecule data from the specified sdf file\n\n##### Args\n\n- sdfDoc (str):\n\n    sdf file name (including path)\n- startIndex (int, optional):\n\n    Start index (inclusive). Defaults to 0.\n- endIndex (int, optional):\n\n    End index (exclusive; the default -1 means read to the end of the file). Defaults to -1.\n- ignore_error (bool, optional):\n\n    Whether to ignore errors. 
Defaults to True.\n\n##### **Raises:**\n\n- FileNotFoundError: the specified file does not exist\n- ValueError: the start index must be non-negative\n- ValueError: the end index must be greater than the start index, or be -1\n- ValueError: malformed data segment\n\n##### Returns\n\n- tuple[list[Mol], int, int]:\n\n    List of molecules read, number expected, number actually read\n\n<hr>\n\n#### readDataFromDir\n\nRead every sdf file in the specified directory and load the molecule data from all of them\n\n##### Args\n\n- sdfDir (str):\n\n    Directory to read from\n- recursive (bool, optional):\n\n    Whether to recurse into subdirectories. Defaults to True.\n- ignore_error (bool, optional):\n\n    Whether to ignore errors. Defaults to True.\n\n##### Raises\n\n- FileNotFoundError: the specified directory does not exist\n- TypeError: the specified path is not a directory\n\n##### Returns\n\n- tuple[list[Mol], int, int]:\n\n    List of molecules read, number expected, number actually read\n\n<hr>\n\n#### readDataFromText\n\nRead compound data from the given text\n\n##### Args\n\n- sdfText (str):\n\n    Text containing compounds in sdf format\n- ignore_error (bool, optional):\n\n    Whether to ignore errors. 
Defaults to True.\n\n##### Returns\n\n- tuple[list[Mol], int, int]:\n\n    The list of molecules read, the expected number of molecules, and the number actually read.\n\n<hr>\n\n#### splitDataByMarker\n\nSplits SDF text into multiple per-molecule data segments at the specified marker.\n\n##### Args\n\n- sdfText (str):\n\n    Text containing at least one molecular compound in SDF format.\n\n- marker (str, optional):\n\n    Marker string used for splitting. Defaults to \"$$$$\\n\".\n\n- strict (bool, optional):\n    \n    Whether to use strict mode. Defaults to True.\n\n##### Raises:\n\n- ValueError: if the marker is not found in strict mode.\n\n##### Returns\n\n- List[str]:\n\n    The list of molecule data segments produced by the split.\n\n<hr>\n\n## nih.pubchem.online module\n\nFetches NIH PubChem data online in real time.\n\n<hr>\n\n### compound module\n\nModule for fetching PubChem compound records and conformers over the NIH PubChem REST API,\nparsing the returned SDF/JSON payloads, and returning structured ALNPCompound / ALNPConformer\nobjects.\n\n#### Primary responsibilities\n\n- Request compound SDF blocks for a list of PubChem CIDs.\n- Optionally request conformer metadata (ConformerID) and then fetch conformer SDFs.\n- Parse SDF content via SDFLoader and construct ALNPCompound and ALNPConformer instances.\n- Return a mapping from CID (string) to ALNPCompound instances, with conformer data\n    attached when requested.\n\n#### Provided functions\n\n- **get_compound**(cid_list: List[str], include_conformer: bool = False) -> Dict[str, ALNPCompound]\n\n- **get_similarity_compound**(\n    input: EInputType,\n    value: str | int,\n    operation: EOperationType,\n    output: EOutputType,\n    threshold: int = 90,\n    max_records: 
int = 10\n) -> List[str]\n\n----\n\n#### get_compound method\n\n##### **Parameters**\n\n- **cid_list (List[str])**\n\n    Sequence of PubChem CIDs to fetch. Each CID should be convertible to string; callers\n    typically pass strings (e.g. [\"2244\",\"3672\"]) or integers converted to strings.\n- **include_conformer (bool, default False)**\n\n    When False, only the primary compound SDF/metadata is fetched and returned.\n    When True, the function also queries the conformer metadata endpoint to obtain\n    ConformerIDs for each CID, and then requests SDFs for those conformers and attaches\n    ALNPConformer objects under each ALNPCompound.\n\n##### **Return value**\n\n- Dict[str, ALNPCompound]\n    A dictionary keyed by the CID string. Each value is an ALNPCompound instance\n    populated with:  \n    - PUBCHEM_COMPOUND_CID (from SDF properties)\n    - ROW (raw SDF block text for the molecule)\n    - If include_conformer is True:\n      - CONFORMER_ID: List[str] of conformer IDs reported by PubChem for that CID\n      - CONFORMERS: Dict[str, ALNPConformer] keyed by conformer ID; each conformer\n          contains `PUBCHEM_CONFORMER_ID`, `PUBCHEM_COMPOUND_CID` and `ROW` (raw conformer SDF).\n\n##### **Behavior and error handling**\n\n- Uses HTTP endpoints:\n  - Compound SDF by CID(s): /rest/pug/compound/cid/{cid_list}/SDF?response_type=display\n  - Conformer metadata for CIDs: /rest/pug/compound/cid/{cid_list}/conformers/JSON\n  - Conformer SDFs by Conformer ID(s): /rest/pug/conformers/{conformer_id_list}/SDF?response_type=display\n- Interprets fetch_url(...) 
return value as a dict with at least \"status_code\", \"success\",\n    and \"content\" keys.\n- For 200 responses with \"success\" False, raises ValueError indicating CID(s) not found.\n- For HTTP 404 or 503 at top-level requests, raises ValueError with descriptive messages.\n- For conformer fetching, any exceptions raised while parsing conformer SDF content are\n    caught and printed; partial results may still be returned for other CIDs.\n- Logs important events and warnings:\n  - Missing molecules in SDF payloads\n  - CID mismatches between JSON metadata and SDF properties\n  - Conformer IDs being processed\n\n##### **Notes, constraints and assumptions**\n\n- The SDF parsing relies on SDFLoader.splitDataByMarker and SDFLoader.readDataFromText.\n    Those functions are expected to return lists of RDKit-like molecule objects (mol.GetProp(...))\n    and counts. The code assumes SDF blocks include `PUBCHEM_COMPOUND_CID` and (for conformers)\n    `PUBCHEM_CONFORMER_ID` properties.\n- The returned dictionary is pre-populated with keys from the input cid_list (strings)\n    and values set to ALNPCompound instances only when parsed successfully; entries may remain None\n    if the SDF for a given CID could not be parsed or was absent.\n- Network reliability and rate limits are outside this module's control; callers should\n    handle transient failures or consider retry/backoff when calling get_compound with large lists.\n\n##### **Example**\n\n- Simple usage:\n\n```python\n\n    cids = [\"2244\", \"3672\"]\n    compounds = get_compound(cids, include_conformer=False)\n    # compounds is a dict mapping \"2244\" -> ALNPCompound(...)\n\n```\n\n- Fetch compounds with conformers:\n\n```python\n\n    compounds_with_confs = get_compound([\"2244\"], include_conformer=True)\n    conf_ids = compounds_with_confs[\"2244\"].CONFORMER_ID\n    conformers_map = compounds_with_confs[\"2244\"].CONFORMERS\n```\n\n##### **Types referenced**\n\n- ALNPCompound: container/dataclass 
representing a PubChem compound record and any attached conformer info.\n- ALNPConformer: container/dataclass representing a single conformer (ID, parent CID, raw SDF).\n\n##### **Security and privacy**\n\n- Requests are made to the public PubChem REST endpoints; no credentials are required.\n- Raw SDF content is stored in returned objects' ROW fields and may contain structural or identifier data;\n    treat returned data according to your privacy/security policies.\n\n----\n\n#### get_similarity_compound method\n\n##### **Summary**\n\nRetrieve similar compounds from PubChem and return detailed compound data.\nThis function builds a PubChem similarity search URL from the provided\ninput parameters, performs the HTTP request, parses the returned data\n(SDF/JSON/TXT), extracts matching PubChem Compound IDs (CIDs), and then\nfetches full compound records (and optionally conformers) for those CIDs.\n\n##### Parameters\n\n- **input (EInputType):**\n\n    The type of the search input (for example SMILES, InChI, CID, etc.).\n\n- **value (str | int):**\n\n    The search value. For CID input this must be an integer or a numeric\n    string. Value must not be empty.\n\n- **operation (EOperationType):**\n\n    The PubChem operation type (for example CIDS or RECORD).\n\n- **output (EOutputType):**\n\n    The requested output format from PubChem (SDF, JSON, TXT, ...).\n\n- **threshold (int, optional):**\n\n    Similarity threshold (percent). Default is 90.\n\n- **max_records (int, optional):**\n\n    Maximum number of similar records to request. Default is 10.\n\n\n##### Return value\n\n- List[str]\n    A list of CID strings. 
Each element is the CID of a similar compound, as a string.\n\n##### Raises\n\n- ValueError\n  - If `value` is empty.\n  - If `input` is EInputType.CID and `value` cannot be parsed as an integer.\n  - If `operation` is CIDS and `output` is EOutputType.SDF (disallowed).\n  - If a RECORD operation requests TXT output (invalid combination).\n  - If the HTTP fetch returns a non-200 status or indicates failure.\n  - If the returned content is empty when parsing JSON/SDF/TXT responses.\n\n##### Behavior and error handling\n\n- Uses HTTP endpoints:\n    - Compound SDF by CID(s): /rest/pug/compound/cid/{cid_list}/SDF?response_type=display\n    - Conformer metadata for CIDs: /rest/pug/compound/cid/{cid_list}/conformers/JSON\n    - Conformer SDFs by Conformer ID(s): /rest/pug/conformers/{conformer_id_list}/SDF?response_type=display\n- Interprets fetch_url(...) return value as a dict with at least \"status_code\", \"success\",\n    and \"content\" keys.\n- For 200 responses with \"success\" False, raises ValueError indicating CID(s) not found.\n- For HTTP 404 or 503 at top-level requests, raises ValueError with descriptive messages.\n- For conformer fetching, any exceptions raised while parsing conformer SDF content are\n    caught and printed; partial results may still be returned for other CIDs.\n- Logs important events and warnings:\n    - Missing molecules in SDF payloads\n    - CID mismatches between JSON metadata and SDF properties\n    - Conformer IDs being processed\n\n##### Notes, constraints and assumptions\n\n- This function uses an internal URL template to request PubChem similarity\n    results, then parses the response according to `output`:\n    - SDF: parsed via SDFLoader.readDataFromText\n    - JSON: parsed via json.loads; the expected keys vary with `operation`\n    - TXT: expected to contain one CID per line\n\n- After extracting CIDs, the function calls get_compound(...) 
to obtain\n    detailed compound (and optional conformer) data.\n- Progress reporting, error continuation, and exact returned ALNPCompound\n    structure are delegated to the underlying fetch/get routines.\n\n- Other notes\n  - Allowed values for input: cid, smiles, InChI.\n\n  - Allowed values for output: SDF, JSON, or TXT.\n\n  - Allowed values for operation: record, cids, sids.\n\n  - PubChem similarity search options:\n\n| Option     | Type    | Meaning                                             | Default   |\n| ---------- | ------- | --------------------------------------------------- | --------- |\n| Threshold  | integer | minimum Tanimoto score for a hit                    | 90        |\n| MaxSeconds | integer | maximum search time in seconds                      | unlimited |\n| MaxRecords | integer | maximum number of hits                              | 2M        |\n| listkey    | string  | restrict to matches within hits from a prior search | none      |\n\n##### Example\n\n- Simple usage:\n\n```python\n\n    cid = 2244\n    compounds = get_similarity_compound(\n                                input=EInputType.CID, \n                                value=cid, \n                                operation=EOperationType.CIDS, \n                                output=EOutputType.TXT)\n    # per the documented return type (List[str]), compounds is a list of CID strings\n\n```\n\nor\n\n```python\n\n    smiles = \"CCCCCC1C(C(OC(=O)C(C(OC1=O)C)NC(=O)C2=C(C(=CC=C2)NC=O)O)C)OC(=O)CC(C)C\"\n    print(get_similarity_compound(input=EInputType.SMILES, \n                                  value=smiles, \n                                  operation=EOperationType.CIDS, \n                                  output=EOutputType.TXT))\n    # per the documented return type (List[str]), this prints a list of CID strings\n\n```\n\n- Using a CID taken from a list:\n\n```python\n\n    cid_list = [\"2244\", \"3672\"]  # example CIDs, reused from the get_compound example\n    print(get_similarity_compound(input=EInputType.CID, \n                           
       value=cid_list[-1], \n                                  operation=EOperationType.CIDS, \n                                  output=EOutputType.TXT))\n    # per the documented return type (List[str]), this prints a list of CID strings\n\n```\n\n##### Types referenced\n\n- ALNPCompound: container/dataclass representing a PubChem compound record and any attached conformer info.\n- ALNPConformer: container/dataclass representing a single conformer (ID, parent CID, raw SDF).\n\n##### Security and privacy\n\n- Requests are made to the public PubChem REST endpoints; no credentials are required.\n- Raw SDF content is stored in returned objects' ROW fields and may contain structural or identifier data;\n    treat returned data according to your privacy/security policies.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A collection of reusable python biotech library from AI Lingues.",
    "version": "0.2.6",
    "project_urls": null,
    "split_keywords": [
        "ailingues",
        " components",
        " biotech",
        " library"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9aff19720aa037eeb99118595daa300f3471363cf0b4c21f695e55eb18e90737",
                "md5": "3759a0f6e55552b06b6d04a49fe9becf",
                "sha256": "f9df5b1550c7ea1e4d56a80bd4981ad75f7b3c5388c17e72d8081134eae2662a"
            },
            "downloads": -1,
            "filename": "pybiotech-0.2.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "3759a0f6e55552b06b6d04a49fe9becf",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 5261104,
            "upload_time": "2025-10-21T16:41:24",
            "upload_time_iso_8601": "2025-10-21T16:41:24.742945Z",
            "url": "https://files.pythonhosted.org/packages/9a/ff/19720aa037eeb99118595daa300f3471363cf0b4c21f695e55eb18e90737/pybiotech-0.2.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9810cb5790f3c1faae83dc44fc1c683885ac0532cc8e6d7ccedf2ffff624c8d7",
                "md5": "a1e0b9f699601b574fe450780da979f6",
                "sha256": "04a360450eee4bd4016a9886377b2f477402a81a9cc8e32beacb71f5b89c9827"
            },
            "downloads": -1,
            "filename": "pybiotech-0.2.6-cp311-cp311-win_amd64.whl",
            "has_sig": false,
            "md5_digest": "a1e0b9f699601b574fe450780da979f6",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.11",
            "size": 2494774,
            "upload_time": "2025-10-21T16:41:26",
            "upload_time_iso_8601": "2025-10-21T16:41:26.462749Z",
            "url": "https://files.pythonhosted.org/packages/98/10/cb5790f3c1faae83dc44fc1c683885ac0532cc8e6d7ccedf2ffff624c8d7/pybiotech-0.2.6-cp311-cp311-win_amd64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-21 16:41:24",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "pybiotech"
}
        