maggma


Name: maggma
Version: 0.71.4
Summary: Framework to develop data pipelines from files on disk to full dissemination API
Upload time: 2025-02-06 16:21:33
Requires Python: >=3.9
License: maggma Copyright (c) 2017, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: (1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. (2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. (3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory or its contributors, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.
# ![Maggma](docs/logo_w_text.svg)

[![Static Badge](https://img.shields.io/badge/documentation-blue?logo=github)](https://materialsproject.github.io/maggma) [![testing](https://github.com/materialsproject/maggma/workflows/testing/badge.svg)](https://github.com/materialsproject/maggma/actions?query=workflow%3Atesting) [![codecov](https://codecov.io/gh/materialsproject/maggma/branch/main/graph/badge.svg)](https://codecov.io/gh/materialsproject/maggma) [![python](https://img.shields.io/badge/Python-3.9+-blue.svg?logo=python&logoColor=white)]()

## What is Maggma

Maggma is a framework to build scientific data processing pipelines from data stored in
a variety of formats -- databases, Azure Blobs, files on disk, etc., all the way to a
REST API. The rest of this README contains a brief, high-level overview of what `maggma` can do.
For more, please refer to [the documentation](https://materialsproject.github.io/maggma).


## Installation from PyPI

Maggma is published on the [Python Package Index](https://pypi.org/project/maggma/). The preferred tool for installing
packages from *PyPI* is **pip**, which is included with all modern
versions of Python.

Open your terminal and run the following command.

``` shell
pip install --upgrade maggma
```
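
To confirm that the install succeeded, one option is to ask Python's standard library which version it can see. The snippet below is a minimal sketch; the helper name `installed_version` is an illustrative assumption and is not part of `maggma`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "maggma") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"maggma {found}" if found else "maggma is not installed in this environment")
```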

## Basic Concepts

`maggma`'s core classes -- [`Store`](#store) and [`Builder`](#builder) -- provide building blocks for
modular data pipelines. Data resides in one or more `Store`s and is processed by a
`Builder`. The results of the processing are saved in another `Store`, and so on:

```mermaid
flowchart LR
    s1(Store 1) --Builder 1--> s2(Store 2) --Builder 2--> s3(Store 3)
s2 -- Builder 3-->s4(Store 4)
```
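
The topology in the diagram can be mimicked in plain Python to show why ordering matters: Builder 2 and Builder 3 both read from Store 2, so Builder 1 has to run first. This is only a sketch of the idea. Lists stand in for stores, and `run_builder` and the three transform functions are hypothetical names, not `maggma` APIs.

```python
from typing import Callable, Dict, List

Doc = Dict[str, int]


def run_builder(source: List[Doc], target: List[Doc], transform: Callable[[Doc], Doc]) -> None:
    """Read every document from `source`, transform it, and append it to `target`."""
    target.extend(transform(doc) for doc in source)


def add_double(doc: Doc) -> Doc:
    return {"x": doc["x"], "double": doc["x"] * 2}


def add_sum(doc: Doc) -> Doc:
    return {"x": doc["x"], "sum": doc["x"] + doc["double"]}


def add_parity(doc: Doc) -> Doc:
    return {"x": doc["x"], "is_even": int(doc["x"] % 2 == 0)}


stores: Dict[str, List[Doc]] = {name: [] for name in ("s1", "s2", "s3", "s4")}
stores["s1"].extend([{"x": 1}, {"x": 2}])

# (source, target, transform) triples, listed in dependency order.
pipeline = [
    ("s1", "s2", add_double),  # Builder 1
    ("s2", "s3", add_sum),     # Builder 2
    ("s2", "s4", add_parity),  # Builder 3
]
for source_name, target_name, transform in pipeline:
    run_builder(stores[source_name], stores[target_name], transform)

print(stores["s3"])  # [{'x': 1, 'sum': 3}, {'x': 2, 'sum': 6}]
print(stores["s4"])  # [{'x': 1, 'is_even': 0}, {'x': 2, 'is_even': 1}]
```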

### Store

A major challenge in building scalable data pipelines is dealing with the many different types of data sources. Maggma's `Store` class provides a consistent, unified interface for querying data from arbitrary data sources. It was originally built around MongoDB, so its interface closely resembles `PyMongo` syntax. However, Maggma makes it possible to use that same syntax to query other types of data sources, such as Amazon S3, GridFS, or files on disk, [and many others](https://materialsproject.github.io/maggma/getting_started/stores/#list-of-stores). Stores implement methods to `connect`, `query`, find `distinct` values, `groupby` fields, `update` documents, and `remove` documents.

The example below demonstrates inserting 4 documents (Python `dict`s) into a `MongoStore` with `update`, then
accessing the data using `count`, `query_one`, and `distinct`.

```python
>>> from maggma.stores import MongoStore
>>> turtles = [
...     {"name": "Leonardo", "color": "blue", "tool": "sword"},
...     {"name": "Donatello", "color": "purple", "tool": "staff"},
...     {"name": "Michelangelo", "color": "orange", "tool": "nunchuks"},
...     {"name": "Raphael", "color": "red", "tool": "sai"},
... ]
>>> store = MongoStore(
...     database="my_db_name",
...     collection_name="my_collection_name",
...     username="my_username",
...     password="my_password",
...     host="my_hostname",
...     port=27017,
...     key="name",  # field that uniquely identifies each document
... )
>>> store.connect()  # the store must stay connected for the calls below
>>> store.update(turtles)
>>> store.count()
4
>>> store.query_one({})
{'_id': ObjectId('66746d29a78e8431daa3463a'), 'name': 'Leonardo', 'color': 'blue', 'tool': 'sword'}
>>> store.distinct('color')
['purple', 'orange', 'blue', 'red']
>>> store.close()
```
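
For readers without a MongoDB server at hand, the idea behind the `Store` interface can be sketched with a plain in-memory class. `ToyStore` below is a hypothetical stand-in written for this illustration and is not part of `maggma`. It supports only exact-match criteria, and it assumes that `update` replaces any existing document that has the same `key` value.

```python
from typing import Any, Dict, Iterator, List, Optional


class ToyStore:
    """Hypothetical in-memory stand-in for a Store; not part of maggma."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._docs: Dict[Any, Dict[str, Any]] = {}

    def update(self, docs: List[Dict[str, Any]]) -> None:
        # Upsert: a document whose key already exists replaces the old one.
        for doc in docs:
            self._docs[doc[self.key]] = dict(doc)

    def query(self, criteria: Optional[Dict[str, Any]] = None) -> Iterator[Dict[str, Any]]:
        # Only exact-match criteria are supported in this sketch.
        criteria = criteria or {}
        for doc in self._docs.values():
            if all(doc.get(field) == value for field, value in criteria.items()):
                yield dict(doc)

    def count(self, criteria: Optional[Dict[str, Any]] = None) -> int:
        return sum(1 for _ in self.query(criteria))

    def distinct(self, field: str) -> List[Any]:
        # Sorted so the result is deterministic.
        return sorted({doc[field] for doc in self._docs.values() if field in doc})


store = ToyStore(key="name")
store.update([
    {"name": "Leonardo", "color": "blue", "tool": "sword"},
    {"name": "Raphael", "color": "red", "tool": "sai"},
])
# Re-submitting a document with an existing key overwrites it instead of duplicating it.
store.update([{"name": "Raphael", "color": "red", "tool": "twin sai"}])

print(store.count())                        # 2
print(list(store.query({"color": "red"})))  # the single, updated Raphael document
print(store.distinct("color"))              # ['blue', 'red']
```

Because calling code only touches `update`, `query`, `count`, and `distinct`, the backing storage could be swapped without changing the caller, which is the point of a unified interface.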

### Builder

Builders represent a data processing step, analogous to an extract-transform-load (ETL) operation in a data
warehouse model. Much like `Store` provides a consistent interface for accessing data, the `Builder` classes
provide a consistent interface for transforming it. Each `Builder` transformation is broken into three phases: `get_items`, `process_item`, and `update_targets`:

1. `get_items`: Retrieve items from the source Store(s) for processing by the next phase
2. `process_item`: Manipulate the input item and create an output document that is sent to the next phase for storage.
3. `update_targets`: Add the processed items to the target Store(s).

Both `get_items` and `update_targets` can perform IO (input/output) to the data stores. `process_item` is expected to not perform any IO so that it can be parallelized by Maggma. Builders can be chained together into an array and then saved as a JSON file to be run on a production system.
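
The division of labor between the three phases can be illustrated without `maggma` itself. The sketch below is a plain-Python analogy, not the library's `Builder` class: `ToyBuilder`, its `run` method, and the use of a thread pool are all assumptions made for illustration. It shows why keeping `process_item` free of IO makes it safe to run many calls concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


class ToyBuilder:
    """Hypothetical three-phase builder over plain lists; not maggma's Builder."""

    def __init__(self, source: List[Doc], target: List[Doc]) -> None:
        self.source = source
        self.target = target

    def get_items(self) -> Iterable[Doc]:
        # Phase 1: read from the source (IO would happen here).
        return iter(self.source)

    def process_item(self, item: Doc) -> Doc:
        # Phase 2: pure transformation with no IO, so calls are independent.
        return {"name": item["name"], "tool_length": len(item["tool"])}

    def update_targets(self, items: List[Doc]) -> None:
        # Phase 3: write results to the target (IO would happen here).
        self.target.extend(items)

    def run(self, workers: int = 2) -> None:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() preserves input order even though the work runs concurrently.
            processed = list(pool.map(self.process_item, self.get_items()))
        self.update_targets(processed)


source = [{"name": "Leonardo", "tool": "sword"}, {"name": "Raphael", "tool": "sai"}]
target: List[Doc] = []
ToyBuilder(source, target).run()
print(target)
# [{'name': 'Leonardo', 'tool_length': 5}, {'name': 'Raphael', 'tool_length': 3}]
```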

## Origin and Maintainers

Maggma has been developed and is maintained by the [Materials Project](https://materialsproject.org/) team at Lawrence Berkeley National Laboratory and the [Materials Project Software Foundation](https://github.com/materialsproject/foundation).

Maggma is written in [Python](http://docs.python-guide.org/en/latest/) and supports Python 3.9+.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "maggma",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "The Materials Project <feedback@materialsproject.org>",
    "download_url": "https://files.pythonhosted.org/packages/fe/cf/77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116/maggma-0.71.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "summary": "Framework to develop datapipelines from files on disk to full dissemenation API",
    "version": "0.71.4",
    "project_urls": {
        "Docs": "https://materialsproject.github.io/maggma/",
        "Package": "https://pypi.org/project/maggma",
        "Repo": "https://github.com/materialsproject/maggma"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "53b8e302bd16aef60875c597c41eb822d302d45bb00ce833f04242509265d84b",
                "md5": "710afa6af899aab35277e425c013a771",
                "sha256": "9d0ec7781f4dda6e5f822e57430a4debf5d8e4172179eb9ac4cfb0acb3ff6971"
            },
            "downloads": -1,
            "filename": "maggma-0.71.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "710afa6af899aab35277e425c013a771",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 122427,
            "upload_time": "2025-02-06T16:21:31",
            "upload_time_iso_8601": "2025-02-06T16:21:31.356297Z",
            "url": "https://files.pythonhosted.org/packages/53/b8/e302bd16aef60875c597c41eb822d302d45bb00ce833f04242509265d84b/maggma-0.71.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fecf77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116",
                "md5": "fe838e353aefdb3fa50bed07a8566b38",
                "sha256": "ad267385ff8778ba95859abd9eb302da6a19d22370b03a5703c8fd0ffb4a72f9"
            },
            "downloads": -1,
            "filename": "maggma-0.71.4.tar.gz",
            "has_sig": false,
            "md5_digest": "fe838e353aefdb3fa50bed07a8566b38",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 226088,
            "upload_time": "2025-02-06T16:21:33",
            "upload_time_iso_8601": "2025-02-06T16:21:33.912945Z",
            "url": "https://files.pythonhosted.org/packages/fe/cf/77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116/maggma-0.71.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-06 16:21:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "materialsproject",
    "github_project": "maggma",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "maggma"
}