mmcif-gen


- Name: mmcif-gen
- Version: 1.1.0
- Home page: https://github.com/PDBeurope/Investigations/
- Summary: CLI tool for creating mmCIF files from various facility data sources
- Upload time: 2025-08-15 09:26:21
- Author: Syed Ahsan Tanweer
- Requires Python: >=3.6
- License: None
- Keywords: mmcif, crystallography, structural-biology, pdbe, synchrotron
- Requirements: certifi, charset-normalizer, gemmi, idna, requests, urllib3, pyjq

# mmcif-gen

A versatile command-line tool for generating mmCIF files from various data sources. The tool can be used to create:

1. Metadata mmCIF files (To capture experimental metadata from different facilities)
2. Investigation mmCIF files (like: https://ftp.ebi.ac.uk/pub/databases/msd/fragment_screening/investigations/)

As is standard practice at the Protein Data Bank (PDB), the generated files are given the extension `.cif`, even though the file format is called mmCIF.
More on the mmCIF file format can be found at [mmcif.wwpdb.org](https://mmcif.wwpdb.org/).

The tool uses transformation mappings to convert data, as stored at the various facilities, into the corresponding categories and items in mmCIF format.
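
To make "categories and items" concrete, here is a minimal Python sketch (not part of `mmcif-gen`) that renders a facility-style record as mmCIF `_category.item value` lines. The facility field names, the values, and the mapping are hypothetical, and the quoting rule is deliberately simplified.

```python
def format_category(category, row):
    """Render one single-row mmCIF category as `_category.item value` lines."""

    def quote(value):
        text = str(value)
        if text == "":
            return "?"  # mmCIF placeholder for an unknown value
        # Simplified quoting: wrap values containing whitespace in single quotes.
        return f"'{text}'" if any(ch.isspace() for ch in text) else text

    width = max(len(item) for item in row)
    return "\n".join(
        f"_{category}.{item.ljust(width)} {quote(value)}"
        for item, value in row.items()
    )


# Hypothetical facility record and a hypothetical mapping to mmCIF item names.
facility_record = {"beamline": "I04-1", "wavelength_angstrom": 0.9212}
mapping = {
    "beamline": "pdbx_synchrotron_beamline",
    "wavelength_angstrom": "pdbx_wavelength_list",
}

row = {mapping[key]: value for key, value in facility_record.items()}
cif_text = "data_example\n#\n" + format_category("diffrn_source", row)
print(cif_text)
```

In the tool itself, this kind of mapping is driven by the JSON operations files rather than hard-coded dictionaries.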

## Installation

Install directly from PyPI:

```bash
pip install mmcif-gen
```

## Usage

The tool provides two main commands:

1. `fetch-facility-json`: Fetch facility-specific JSON configuration files
2. `make-mmcif`: Generate mmCIF files using the configurations

### Fetching Facility JSON Files

The JSON operations files determine how data from the original source is mapped and translated into mmCIF format.

These files can be written by hand, or fetched from the GitHub repository with the commands below.

```bash
# Fetch configuration for a specific facility
mmcif-gen fetch-facility-json dls-metadata

# Specify custom output directory
mmcif-gen fetch-facility-json dls-metadata -o ./mapping_operations
```

### Generating metadata mmCIF Files

The facilities currently supported for generating mmCIF files are `pdbe`, `maxiv`, `dls`, and `xchem`.

The general syntax for generating mmCIF files is:

```bash
mmcif-gen make-mmcif <facility> [options]
```

Full list of options:

```
$ mmcif-gen make-mmcif --help
usage: mmcif-gen make-mmcif [-h] [--json JSON] [--output-folder OUTPUT_FOLDER]
                            [--id ID]
                            {pdbe,maxiv,dls,xchem} ...

positional arguments:
  {pdbe,maxiv,dls,xchem}
                        Specifies facility for which mmcif files will be used
                        for
    pdbe                Parameter requirements for investigation files from
                        PDBe data
    maxiv               Parameter requirements for investigation files from
                        MAX IV data
    dls                 Parameter requirements for creating investigation
                        files from DLS data
    xchem               Parameter requirements for creating investigation
                        files from XChem data

optional arguments:
  -h, --help            show this help message and exit
  --json JSON           Path to transformation JSON file
  --output-folder OUTPUT_FOLDER
                        Output folder for mmCIF files
  --id ID               File identifier
```
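
The usage line above puts the shared options (`--json`, `--output-folder`, `--id`) before the facility name and the facility-specific options after it. When calling the tool from a script, a small helper can keep that order consistent. This is a sketch only: `build_make_mmcif_command` is a hypothetical helper, not part of `mmcif-gen`, and the file names are placeholders.

```python
import shlex

# Facility choices as listed in the help output above.
FACILITIES = ("pdbe", "maxiv", "dls", "xchem")


def build_make_mmcif_command(facility, facility_args,
                             json_path=None, output_folder=None, file_id=None):
    """Return an argv list with shared options placed before the facility name."""
    if facility not in FACILITIES:
        raise ValueError(f"unknown facility: {facility!r}")
    command = ["mmcif-gen", "make-mmcif"]
    if json_path is not None:
        command += ["--json", json_path]
    if output_folder is not None:
        command += ["--output-folder", output_folder]
    if file_id is not None:
        command += ["--id", file_id]
    command.append(facility)
    command += list(facility_args)  # facility-specific options go last
    return command


command = build_make_mmcif_command(
    "dls",
    ["--dls-json", "metadata.json"],  # placeholder input file
    json_path="dls_metadata.json",
    output_folder="./out",
    file_id="I_1234",
)
print(" ".join(shlex.quote(part) for part in command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once `mmcif-gen` is installed.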

Each facility has its own set of required parameters, which can be checked by running the command with the `--help` flag.


```bash
mmcif-gen make-mmcif pdbe --help
```

#### Example Usage

#### DLS (Diamond Light Source)

```bash
# Using metadata configuration
mmcif-gen make-mmcif --json dls_metadata.json --output-folder ./out --id I_1234 dls --dls-json metadata-from-ispyb.json
```

#### XChem
Required parameters:

```
$ mmcif-gen make-mmcif xchem --help
usage: mmcif-gen make-mmcif xchem [-h] [--sqlite SQLITE] [--cif-type {model,investigation}]

options:
  -h, --help            show this help message and exit
  --sqlite SQLITE       Path to the .sqlite file for each data set
  --cif-type {model,investigation}
                        Type of the CIF file that will be generated
```

Example command:

```bash
mmcif-gen make-mmcif --id 001 --json mmcif_gen/operations/xchem/xchem_metadata.json --output-folder pdbedeposit xchem --sqlite mmcif_gen/test/data/lb32633-1-soakDBDataFile.sqlite --cif-type model
```

### Working with Investigation Files

Investigation files are a specialized type of mmCIF file that captures metadata across multiple experiments.

Investigation files are created in a very similar way:

#### PDBe

```bash
# Using model folder
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out --id I_1234 pdbe --model-folder ./models

# Using PDB IDs
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --pdb-ids 6dmn 6dpp 6do8

# Using CSV input
mmcif-gen make-mmcif --json pdbe_investigation.json --output-folder ./out pdbe --csv-file groups.csv
```

#### MAX IV

```bash
# Using SQLite database
mmcif-gen make-mmcif --json maxiv_investigation.json --output-folder ./out --id I_1234 maxiv --sqlite fragmax.sqlite
```

#### XChem

```bash
# Using SQLite database with additional information
mmcif-gen make-mmcif --json xchem_investigation.json --output-folder ./out xchem --sqlite soakdb.sqlite --txt ./metadata --deposit ./deposit
```


## Data Enrichment

For investigation files that need enrichment with additional data (e.g., ground state information):

```bash
# Using the miss_importer utility
python miss_importer.py --investigation-file inv.cif --sf-file structure.sf --pdb-id 1ABC
```

## Operation JSON Files

The tool uses JSON configuration files to define how data should be transformed into mmCIF format. These files can be:

1. Official configurations fetched with the `fetch-facility-json` command
2. Modified versions of the official configurations

### Configuration File Structure

```json
    {
        "source_category" : "_audit_author",
        "source_items" : ["name"],
        "target_category" : "_audit_author",
        "target_items" : "_same",
        "operation" : "distinct_union",
        "operation_parameters" :{
            "primary_parameters" : ["name"]
        }
    }
```

Refer to existing JSON files in the `operations/` directory for examples.
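
As a rough illustration of how an entry like the one above could be read, the Python sketch below checks that the keys from the example are present and applies one plausible reading of `distinct_union`: merge rows from several inputs and keep one row per distinct value of the primary parameters. That reading is an assumption based on the operation's name, not the tool's confirmed behaviour, and the author rows are made-up placeholders.

```python
import json

# The operation entry from the example above.
entry = json.loads("""
{
    "source_category": "_audit_author",
    "source_items": ["name"],
    "target_category": "_audit_author",
    "target_items": "_same",
    "operation": "distinct_union",
    "operation_parameters": {"primary_parameters": ["name"]}
}
""")

# Keys taken from the example entry; the real schema may allow more or fewer.
EXPECTED_KEYS = {"source_category", "source_items", "target_category",
                 "target_items", "operation"}
missing = EXPECTED_KEYS - set(entry)


def distinct_union(rows_per_input, primary_keys):
    """Assumed semantics: merge all rows, keeping the first row per key tuple."""
    seen = set()
    merged = []
    for rows in rows_per_input:
        for row in rows:
            key = tuple(row[name] for name in primary_keys)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Placeholder author rows, as if read from two model files.
models = [
    [{"name": "Smith, J."}, {"name": "Doe, A."}],
    [{"name": "Doe, A."}, {"name": "Lee, K."}],
]
primary = entry["operation_parameters"]["primary_parameters"]
authors = distinct_union(models, primary)
print(missing, authors)
```

Under this reading, an author who appears in several models would be listed once in the resulting investigation file.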


## Development

### Project Structure

```
mmcif-gen/
├── facilities/            # Facility-specific implementations
│   ├── pdbe.py
│   ├── maxiv.py
│   └── ...
├── operations/           # JSON configuration files
│   ├── dls/
│   ├── maxiv/
│   └── ...
├── tests/               # Test cases
├── setup.py            # Package configuration
└── README.md          # Documentation
```

### Running Tests

```bash
python -m unittest discover -s tests
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.


## Support

For issues and questions, please use the [GitHub issue tracker](https://github.com/PDBeurope/Investigations/issues).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/PDBeurope/Investigations/",
    "name": "mmcif-gen",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "mmcif, crystallography, structural-biology, pdbe, synchrotron",
    "author": "Syed Ahsan Tanweer",
    "author_email": "ahsan@ebi.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/4b/b6/ae650e119d491cb3e9b788669f6d9aa4b09bbfeb1f8affc8df0bff75e0fc/mmcif_gen-1.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "CLI tool for creating mmCIF files from various facility data sources",
    "version": "1.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/PDBeurope/Investigations/issues",
        "Documentation": "https://github.com/PDBeurope/Investigations/",
        "Homepage": "https://github.com/PDBeurope/Investigations/",
        "Source Code": "https://github.com/PDBeurope/Investigations/"
    },
    "split_keywords": [
        "mmcif",
        " crystallography",
        " structural-biology",
        " pdbe",
        " synchrotron"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "dfc0cfdb1870f6ea19c4452dee4d89daee0cd65cdb8da515e4727ce9bfe5f71f",
                "md5": "1be58f852d42e727784416cd202fdf85",
                "sha256": "038653ad88c1f84f0272cb750251065bde8131adb7a0acb69edf186426786d3f"
            },
            "downloads": -1,
            "filename": "mmcif_gen-1.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1be58f852d42e727784416cd202fdf85",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 34892,
            "upload_time": "2025-08-15T09:26:21",
            "upload_time_iso_8601": "2025-08-15T09:26:21.033000Z",
            "url": "https://files.pythonhosted.org/packages/df/c0/cfdb1870f6ea19c4452dee4d89daee0cd65cdb8da515e4727ce9bfe5f71f/mmcif_gen-1.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4bb6ae650e119d491cb3e9b788669f6d9aa4b09bbfeb1f8affc8df0bff75e0fc",
                "md5": "44fa09fc4a67e96dfdbd8c49ce14f0f0",
                "sha256": "cda5506b678f6fc47e44f0b2bc5df5d397a7366b8e023f25311156154529a6cb"
            },
            "downloads": -1,
            "filename": "mmcif_gen-1.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "44fa09fc4a67e96dfdbd8c49ce14f0f0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 30876,
            "upload_time": "2025-08-15T09:26:21",
            "upload_time_iso_8601": "2025-08-15T09:26:21.899302Z",
            "url": "https://files.pythonhosted.org/packages/4b/b6/ae650e119d491cb3e9b788669f6d9aa4b09bbfeb1f8affc8df0bff75e0fc/mmcif_gen-1.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-15 09:26:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "PDBeurope",
    "github_project": "Investigations",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2023.7.22"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.2.0"
                ]
            ]
        },
        {
            "name": "gemmi",
            "specs": [
                [
                    "==",
                    "0.6.4"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.4"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.31.0"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.0.5"
                ]
            ]
        },
        {
            "name": "pyjq",
            "specs": [
                [
                    ">=",
                    "2.3.1"
                ]
            ]
        }
    ],
    "lcname": "mmcif-gen"
}
        