Name | maggma |
Version | 0.71.4 |
home_page | None |
Summary | Framework to develop data pipelines from files on disk to full dissemination API |
upload_time | 2025-02-06 16:21:33 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | maggma Copyright (c) 2017, The Regents of the University of
California, through Lawrence Berkeley National Laboratory (subject
to receipt of any required approvals from the U.S. Dept. of Energy).
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
(1) Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided with
the distribution.
(3) Neither the name of the University of California, Lawrence
Berkeley National Laboratory, U.S. Dept. of Energy nor the names of
its contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes,
patches, or upgrades to the features, functionality or performance
of the source code ("Enhancements") to anyone; however, if you
choose to make your Enhancements available either publicly, or
directly to Lawrence Berkeley National Laboratory or its
contributors, without imposing a separate written license agreement
for such Enhancements, then you hereby grant the following license:
a non-exclusive, royalty-free perpetual license to install, use,
modify, prepare derivative works, incorporate into other computer
software, distribute, and sublicense such enhancements or derivative
works thereof, in binary and source code form.
|
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | |
# Maggma
## What is Maggma
Maggma is a framework to build scientific data processing pipelines from data stored in
a variety of formats -- databases, Azure Blobs, files on disk, etc., all the way to a
REST API. The rest of this README contains a brief, high-level overview of what `maggma` can do.
For more, please refer to [the documentation](https://materialsproject.github.io/maggma).
## Installation from PyPI
Maggma is published on the [Python Package Index](https://pypi.org/project/maggma/). The preferred tool for installing
packages from *PyPI* is **pip**. This tool is provided with all modern
versions of Python.
Open your terminal and run the following command.
``` shell
pip install --upgrade maggma
```
## Basic Concepts
`maggma`'s core classes -- [`Store`](#store) and [`Builder`](#builder) -- provide building blocks for
modular data pipelines. Data resides in one or more `Store`s and is processed by a
`Builder`. The results of the processing are saved in another `Store`, and so on:
```mermaid
flowchart LR
s1(Store 1) --Builder 1--> s2(Store 2) --Builder 2--> s3(Store 3)
s2 -- Builder 3-->s4(Store 4)
```
### Store
A major challenge in building scalable data pipelines is dealing with the many different types of data sources. Maggma's `Store` class provides a consistent, unified interface for querying data from arbitrary data sources. It was originally built around MongoDB, so its interface closely resembles `PyMongo` syntax. However, Maggma makes it possible to use that same syntax to query other types of data sources, such as Amazon S3, GridFS, or files on disk, [and many others](https://materialsproject.github.io/maggma/getting_started/stores/#list-of-stores). Stores implement methods to `connect`, `query`, find `distinct` values, `groupby` fields, `update` documents, and `remove` documents.
The example below demonstrates inserting 4 documents (Python `dict`s) into a `MongoStore` with `update`, then
accessing the data using `count`, `query_one`, and `distinct`.
```python
>>> from maggma.stores import MongoStore
>>> turtles = [
...     {"name": "Leonardo", "color": "blue", "tool": "sword"},
...     {"name": "Donatello", "color": "purple", "tool": "staff"},
...     {"name": "Michelangelo", "color": "orange", "tool": "nunchuks"},
...     {"name": "Raphael", "color": "red", "tool": "sai"},
... ]
>>> store = MongoStore(
...     database="my_db_name",
...     collection_name="my_collection_name",
...     username="my_username",
...     password="my_password",
...     host="my_hostname",
...     port=27017,
...     key="name",  # field that uniquely identifies each document
... )
>>> store.connect()  # keep the connection open for the queries below
>>> store.update(turtles)
>>> store.count()
4
>>> store.query_one({})
{'_id': ObjectId('66746d29a78e8431daa3463a'), 'name': 'Leonardo', 'color': 'blue', 'tool': 'sword'}
>>> store.distinct('color')
['purple', 'orange', 'blue', 'red']
>>> store.close()
```
### Builder
Builders represent a data processing step, analogous to an extract-transform-load (ETL) operation in a data
warehouse model. Much like `Store` provides a consistent interface for accessing data, the `Builder` classes
provide a consistent interface for transforming it. Each `Builder` transformation is broken into 3 phases: `get_items`, `process_item`, and `update_targets`:
1. `get_items`: Retrieve items from the source Store(s) for processing by the next phase.
2. `process_item`: Manipulate the input item and create an output document that is sent to the next phase for storage.
3. `update_targets`: Add the processed items to the target Store(s).
Both `get_items` and `update_targets` can perform IO (input/output) to the data stores. `process_item` is expected not to perform any IO so that it can be parallelized by Maggma. Builders can be chained together into an array and then saved as a JSON file to be run on a production system.
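The three phases can be illustrated without Maggma itself. The following is a minimal, standalone sketch of the pattern, not Maggma's actual `Builder` API: plain Python lists stand in for Stores, and the class name `ToyBuilder`, its `run` method, and the `tool_length` field are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyBuilder:
    """Hypothetical stand-in for a Builder; plain lists play the role of Stores."""

    def __init__(self, source, target):
        self.source = source  # list of input documents (dicts)
        self.target = target  # list that receives the output documents

    def get_items(self):
        # Phase 1: read items from the source. IO is allowed here.
        yield from self.source

    def process_item(self, item):
        # Phase 2: pure transformation with no IO, so calls can run in parallel.
        return {"name": item["name"], "tool_length": len(item["tool"])}

    def update_targets(self, items):
        # Phase 3: write the processed documents to the target. IO is allowed here.
        self.target.extend(items)

    def run(self, workers=2):
        # process_item does no IO and shares no state, so it can be mapped over
        # the items concurrently; map() returns results in input order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            processed = list(pool.map(self.process_item, self.get_items()))
        self.update_targets(processed)


source = [
    {"name": "Leonardo", "tool": "sword"},
    {"name": "Donatello", "tool": "staff"},
]
target = []
ToyBuilder(source, target).run()
print(target)
```

Keeping all reads in `get_items` and all writes in `update_targets` is what leaves `process_item` free of side effects, which in turn is what makes the middle phase safe to distribute across workers.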
## Origin and Maintainers
Maggma has been developed and is maintained by the [Materials Project](https://materialsproject.org/) team at Lawrence Berkeley National Laboratory and the [Materials Project Software Foundation](https://github.com/materialsproject/foundation).
Maggma is written in [Python](http://docs.python-guide.org/en/latest/) and supports Python 3.9+.
Raw data
{
"_id": null,
"home_page": null,
"name": "maggma",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "The Materials Project <feedback@materialsproject.org>",
"download_url": "https://files.pythonhosted.org/packages/fe/cf/77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116/maggma-0.71.4.tar.gz",
"platform": null,
"description": "\n# \n\n[](https://materialsproject.github.io/maggma) [](https://github.com/materialsproject/maggma/actions?query=workflow%3Atesting) [](https://codecov.io/gh/materialsproject/maggma) []()\n\n## What is Maggma\n\nMaggma is a framework to build scientific data processing pipelines from data stored in\na variety of formats -- databases, Azure Blobs, files on disk, etc., all the way to a\nREST API. The rest of this README contains a brief, high-level overview of what `maggma` can do.\nFor more, please refer to [the documentation](https://materialsproject.github.io/maggma).\n\n\n## Installation from PyPI\n\nMaggma is published on the [Python Package Index](https://pypi.org/project/maggma/). The preferred tool for installing\npackages from *PyPi* is **pip**. This tool is provided with all modern\nversions of Python.\n\nOpen your terminal and run the following command.\n\n``` shell\npip install --upgrade maggma\n```\n\n## Basic Concepts\n\n`maggma`'s core classes -- [`Store`](#store) and [`Builder`](#builder) -- provide building blocks for\nmodular data pipelines. Data resides in one or more `Store` and is processed by a\n`Builder`. The results of the processing are saved in another `Store`, and so on:\n\n```mermaid\nflowchart\u00a0LR\n\u00a0\u00a0\u00a0\u00a0s1(Store 1)\u00a0--Builder 1-->\u00a0s2(Store 2) --Builder 2--> s3(Store 3)\ns2 -- Builder 3-->s4(Store 4)\n```\n\n### Store\n\nA major challenge in building scalable data pipelines is dealing with all the different types of data sources out there. Maggma's `Store` class provides a consistent, unified interface for querying data from arbitrary data sources. It was originally built around MongoDB, so it's interface closely resembles `PyMongo` syntax. However, Maggma makes it possible to use that same syntax to query other types of databases, such as Amazon S3, GridFS, or files on disk, [and many others](https://materialsproject.github.io/maggma/getting_started/stores/#list-of-stores). 
Stores implement methods to `connect`, `query`, find `distinct` values, `groupby` fields, `update` documents, and `remove` documents.\n\nThe example below demonstrates inserting 4 documents (python `dicts`) into a `MongoStore` with `update`, then\naccessing the data using `count`, `query`, and `distinct`.\n\n```python\n>>> turtles = [{\"name\": \"Leonardo\", \"color\": \"blue\", \"tool\": \"sword\"},\n {\"name\": \"Donatello\",\"color\": \"purple\", \"tool\": \"staff\"},\n {\"name\": \"Michelangelo\", \"color\": \"orange\", \"tool\": \"nunchuks\"},\n {\"name\":\"Raphael\", \"color\": \"red\", \"tool\": \"sai\"}\n ]\n>>> store = MongoStore(database=\"my_db_name\",\n collection_name=\"my_collection_name\",\n username=\"my_username\",\n password=\"my_password\",\n host=\"my_hostname\",\n port=27017,\n key=\"name\",\n )\n>>> with store:\n store.update(turtles)\n>>> store.count()\n4\n>>> store.query_one({})\n{'_id': ObjectId('66746d29a78e8431daa3463a'), 'name': 'Leonardo', 'color': 'blue', 'tool': 'sword'}\n>>> store.distinct('color')\n['purple', 'orange', 'blue', 'red']\n```\n\n### Builder\n\nBuilders represent a data processing step, analogous to an extract-transform-load (ETL) operation in a data\nwarehouse model. Much like `Store` provides a consistent interface for accessing data, the `Builder` classes\nprovide a consistent interface for transforming it. `Builder` transformation are each broken into 3 phases: `get_items`, `process_item`, and `update_targets`:\n\n1. `get_items`: Retrieve items from the source Store(s) for processing by the next phase\n2. `process_item`: Manipulate the input item and create an output document that is sent to the next phase for storage.\n3. `update_target`: Add the processed item to the target Store(s).\n\nBoth `get_items` and `update_targets` can perform IO (input/output) to the data stores. `process_item` is expected to not perform any IO so that it can be parallelized by Maggma. 
Builders can be chained together into an array and then saved as a JSON file to be run on a production system.\n\n## Origin and Maintainers\n\nMaggma has been developed and is maintained by the [Materials Project](https://materialsproject.org/) team at Lawrence Berkeley National Laboratory and the [Materials Project Software Foundation](https://github.com/materialsproject/foundation).\n\nMaggma is written in [Python](http://docs.python-guide.org/en/latest/) and supports Python 3.9+.\n",
"bugtrack_url": null,
"license": "maggma Copyright (c) 2017, The Regents of the University of\n California, through Lawrence Berkeley National Laboratory (subject\n to receipt of any required approvals from the U.S. Dept. of Energy).\n All rights reserved.\n \n Redistribution and use in source and binary forms, with or without\n modification, are permitted provided that the following conditions\n are met:\n \n (1) Redistributions of source code must retain the above copyright\n notice, this list of conditions and the following disclaimer.\n \n (2) Redistributions in binary form must reproduce the above\n copyright notice, this list of conditions and the following\n disclaimer in the documentation and/or other materials provided with\n the distribution.\n \n (3) Neither the name of the University of California, Lawrence\n Berkeley National Laboratory, U.S. Dept. of Energy nor the names of\n its contributors may be used to endorse or promote products derived\n from this software without specific prior written permission.\n \n THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS\n FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE\n COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,\n INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,\n BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;\n LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\n CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\n LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN\n ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\n POSSIBILITY OF SUCH DAMAGE.\n \n You are under no obligation whatsoever to provide any bug fixes,\n patches, or upgrades to the features, functionality or performance\n of the source code (\"Enhancements\") to anyone; however, if you\n choose to make your Enhancements available either publicly, or\n directly to Lawrence Berkeley National Laboratory or its\n contributors, without imposing a separate written license agreement\n for such Enhancements, then you hereby grant the following license:\n a non-exclusive, royalty-free perpetual license to install, use,\n modify, prepare derivative works, incorporate into other computer\n software, distribute, and sublicense such enhancements or derivative\n works thereof, in binary and source code form.\n ",
"summary": "Framework to develop datapipelines from files on disk to full dissemenation API",
"version": "0.71.4",
"project_urls": {
"Docs": "https://materialsproject.github.io/maggma/",
"Package": "https://pypi.org/project/maggma",
"Repo": "https://github.com/materialsproject/maggma"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "53b8e302bd16aef60875c597c41eb822d302d45bb00ce833f04242509265d84b",
"md5": "710afa6af899aab35277e425c013a771",
"sha256": "9d0ec7781f4dda6e5f822e57430a4debf5d8e4172179eb9ac4cfb0acb3ff6971"
},
"downloads": -1,
"filename": "maggma-0.71.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "710afa6af899aab35277e425c013a771",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 122427,
"upload_time": "2025-02-06T16:21:31",
"upload_time_iso_8601": "2025-02-06T16:21:31.356297Z",
"url": "https://files.pythonhosted.org/packages/53/b8/e302bd16aef60875c597c41eb822d302d45bb00ce833f04242509265d84b/maggma-0.71.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "fecf77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116",
"md5": "fe838e353aefdb3fa50bed07a8566b38",
"sha256": "ad267385ff8778ba95859abd9eb302da6a19d22370b03a5703c8fd0ffb4a72f9"
},
"downloads": -1,
"filename": "maggma-0.71.4.tar.gz",
"has_sig": false,
"md5_digest": "fe838e353aefdb3fa50bed07a8566b38",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 226088,
"upload_time": "2025-02-06T16:21:33",
"upload_time_iso_8601": "2025-02-06T16:21:33.912945Z",
"url": "https://files.pythonhosted.org/packages/fe/cf/77f5100b41c8ad1934d784f570a8bf0a975892b9177c28874d2b4fe64116/maggma-0.71.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-06 16:21:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "materialsproject",
"github_project": "maggma",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "maggma"
}