graphique


* Name: graphique
* Version: 1.8
* Summary: GraphQL service for arrow tables and parquet data sets.
* Upload time: 2024-11-02 00:29:00
* Requires Python: >=3.9
* License: Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0), Copyright 2022 Aric Coady
* Keywords: graphql, arrow, parquet

[![image](https://img.shields.io/pypi/v/graphique.svg)](https://pypi.org/project/graphique/)
![image](https://img.shields.io/pypi/pyversions/graphique.svg)
[![image](https://pepy.tech/badge/graphique)](https://pepy.tech/project/graphique)
![image](https://img.shields.io/pypi/status/graphique.svg)
[![build](https://github.com/coady/graphique/actions/workflows/build.yml/badge.svg)](https://github.com/coady/graphique/actions/workflows/build.yml)
[![image](https://codecov.io/gh/coady/graphique/branch/main/graph/badge.svg)](https://codecov.io/gh/coady/graphique/)
[![CodeQL](https://github.com/coady/graphique/actions/workflows/github-code-scanning/codeql/badge.svg)](https://github.com/coady/graphique/actions/workflows/github-code-scanning/codeql)
[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/coady/graphique)
[![image](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![image](https://mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)

[GraphQL](https://graphql.org) service for [arrow](https://arrow.apache.org) tables and [parquet](https://parquet.apache.org) data sets. The schema for a query API is derived automatically.

## Usage
```console
% env PARQUET_PATH=... uvicorn graphique.service:app
```

Open http://localhost:8000/ to try out the API in [GraphiQL](https://github.com/graphql/graphiql/tree/main/packages/graphiql#readme). There is a test fixture at `./tests/fixtures/zipcodes.parquet`.

```console
% env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema
```
This outputs the GraphQL schema for a parquet data set.

### Configuration
Graphique uses [Starlette's config](https://www.starlette.io/config/), which reads settings from environment variables or a `.env` file. Config variables are used as input to a [parquet dataset](https://arrow.apache.org/docs/python/dataset.html).

* PARQUET_PATH: path to the parquet directory or file
* FEDERATED = '': field name to extend type `Query` with a federated `Table`
* DEBUG = False: run service in debug mode, which includes metrics
* COLUMNS = None: list of names, or mapping of aliases, of columns to select
* FILTERS = None: JSON `filter` query for which rows to read at startup

For more options, create a custom [ASGI](https://asgi.readthedocs.io/en/latest/index.html) app. Call graphique's `GraphQL` on an arrow [Dataset](https://arrow.apache.org/docs/python/api/dataset.html), [Scanner](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html), or [Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html). The GraphQL `Table` type will be the root Query type.

Supply a mapping of names to datasets for multiple roots, and to enable federation.

```python
import pyarrow.dataset as ds
from graphique import GraphQL

source = ds.dataset(...)
app = GraphQL(source)  # Table is root query type
app = GraphQL.federated({<name>: source, ...}, keys={<name>: [], ...})  # Tables on federated fields
```

Start like any ASGI app.

```console
% uvicorn <module>:app
```

Configuration options exist to provide a convenient no-code solution, but are subject to change. Using a custom app is recommended for production use.

### API
#### types
* `Dataset`: interface for an arrow dataset, scanner, or table.
* `Table`: implements the `Dataset` interface. Adds typed `row`, `columns`, and `filter` fields from introspecting the schema.
* `Column`: interface for an arrow column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, List, Struct. All columns have a `values` field for their list of scalars. Additional fields vary by type.
* `Row`: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single `row` field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.

#### selection
* `slice`: contiguous selection of rows
* `filter`: select rows with simple predicates
* `scan`: select rows and project columns with expressions

#### projection
* `columns`: provides a field for every `Column` in the schema
* `column`: access a column of any type by name
* `row`: provides a field for each scalar of a single row
* `apply`: transform columns by applying a function
* `join`: join tables by key columns

#### aggregation
* `group`: group by given columns, and aggregate the others
* `runs`: partition on adjacent values in given columns, transforming the others into list columns
* `tables`: return a list of tables by splitting on the scalars in list columns
* `flatten`: flatten list columns with repeated scalars
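
The difference between `group`, `runs`, and `flatten` can be illustrated in plain Python. This is only a sketch of the semantics described above, not graphique's implementation (which operates on arrow columns). The sample data and the helper functions are hypothetical, and list aggregation is assumed as the only aggregate.

```python
from itertools import groupby

# Hypothetical sample data in column-oriented form.
table = {"state": ["CA", "CA", "NY", "CA"], "zipcode": [90001, 90002, 10001, 94101]}


def runs(table, by):
    """Partition on adjacent equal values of `by`; other columns become lists."""
    keys = table[by]
    parts = [list(part) for _, part in groupby(range(len(keys)), key=keys.__getitem__)]
    result = {by: [keys[part[0]] for part in parts]}
    for name in table:
        if name != by:
            result[name] = [[table[name][i] for i in part] for part in parts]
    return result


def group(table, by):
    """Group all equal values of `by`, adjacent or not, using list aggregation."""
    order = sorted(range(len(table[by])), key=table[by].__getitem__)
    return runs({name: [column[i] for i in order] for name, column in table.items()}, by)


def flatten(table, by):
    """Expand list columns back into scalars, repeating the `by` value."""
    others = [name for name in table if name != by]
    result = {name: [] for name in table}
    for row, key in enumerate(table[by]):
        for name in others:
            result[name].extend(table[name][row])
        result[by].extend([key] * len(table[others[0]][row]))
    return result


print(runs(table, "state"))   # {'state': ['CA', 'NY', 'CA'], 'zipcode': [[90001, 90002], [10001], [94101]]}
print(group(table, "state"))  # {'state': ['CA', 'NY'], 'zipcode': [[90001, 90002, 94101], [10001]]}
```

Here `runs` keeps the second `CA` partition separate because it is not adjacent to the first, while `group` merges all three `CA` rows. Flattening the output of `runs` restores the original table.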

#### ordering
* `sort`: sort table by given columns
* `rank`: select rows with smallest or largest values
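
As a plain-Python sketch of what these two operations mean on column-oriented data (not graphique's implementation): `sort` reorders every column by one key column, and `rank` keeps only the rows with the smallest or largest keys. The data, the function signatures, and the `count` and `largest` parameters are assumptions made for the example. How ties are handled is not covered here.

```python
# Hypothetical sample data in column-oriented form.
table = {"name": ["a", "b", "c", "d"], "value": [3, 1, 4, 2]}


def sort(table, by, reverse=False):
    """Reorder all columns by the values of the `by` column."""
    order = sorted(range(len(table[by])), key=table[by].__getitem__, reverse=reverse)
    return {name: [column[i] for i in order] for name, column in table.items()}


def rank(table, by, count=1, largest=False):
    """Keep the `count` rows with the smallest (or largest) `by` values."""
    ordered = sort(table, by, reverse=largest)
    return {name: column[:count] for name, column in ordered.items()}


print(sort(table, "value"))                # {'name': ['b', 'd', 'a', 'c'], 'value': [1, 2, 3, 4]}
print(rank(table, "value", count=2))       # {'name': ['b', 'd'], 'value': [1, 2]}
print(rank(table, "value", largest=True))  # {'name': ['c'], 'value': [4]}
```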

### Performance
Graphique relies on native [PyArrow](https://arrow.apache.org/docs/python/index.html) routines wherever possible. Otherwise it falls back to using [NumPy](https://numpy.org/doc/stable/) or custom optimizations.

By default, datasets are read on-demand, with only the necessary rows and columns scanned. Although graphique is a running service, [parquet is performant](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html) at reading a subset of data. Optionally specify `FILTERS` in the JSON `filter` format to read a subset of rows at startup, trading higher memory usage for lower latency. An empty filter (`{}`) will read the whole table.

Specifying `COLUMNS` will limit memory usage when reading at startup (that is, when `FILTERS` is set). There is little speed difference otherwise, as unused columns are inherently ignored. Optional aliasing can also be used for camel casing.

If index columns are detected in the schema metadata, then an initial `filter` will also attempt a binary search on tables.
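
The idea behind a binary search on a sorted index column can be sketched with the standard library. This is an illustration of the technique only, not graphique's code; the column contents and the `equal_range` helper are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table sorted by its index column `zipcode`.
zipcodes = [10001, 10001, 60601, 90001, 94101]
cities = ["New York", "New York", "Chicago", "Los Angeles", "San Francisco"]


def equal_range(column, value):
    """Return the slice of rows equal to `value` in a sorted column, in O(log n)."""
    return slice(bisect_left(column, value), bisect_right(column, value))


rows = equal_range(zipcodes, 10001)
print(rows, cities[rows])  # slice(0, 2, None) ['New York', 'New York']
```

Because the matching rows are contiguous, the result is a slice rather than a mask, so no full scan of the column is needed.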

## Installation
```console
% pip install 'graphique[server]'
```

## Dependencies
* pyarrow
* strawberry-graphql[asgi,cli]
* numpy
* isodate
* uvicorn (or other [ASGI server](https://asgi.readthedocs.io/en/latest/implementations.html))

## Tests
100% branch coverage.

```console
% pytest [--cov]
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "graphique",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "graphql, arrow, parquet",
    "author": null,
    "author_email": "Aric Coady <aric.coady@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/91/0d/3de4e1e8a4bdf32f2e97903213e6620ede65c0ed5b1c76952c14550b47f4/graphique-1.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Copyright 2022 Aric Coady  Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at  http://www.apache.org/licenses/LICENSE-2.0  Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ",
    "summary": "GraphQL service for arrow tables and parquet data sets.",
    "version": "1.8",
    "project_urls": {
        "Changelog": "https://github.com/coady/graphique/blob/main/CHANGELOG.md",
        "Documentation": "https://coady.github.io/graphique",
        "Homepage": "https://github.com/coady/graphique",
        "Issues": "https://github.com/coady/graphique/issues"
    },
    "split_keywords": [
        "graphql",
        " arrow",
        " parquet"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8061ec7f151cf2e3b948e90b26a16e38148f4de0016928243f8a91d450cba8a7",
                "md5": "2ea108ad1ae967867f7b728dab56d232",
                "sha256": "50b9d0ae0976c8890bfc5a81b1c3d9b1d22d0d3b846372adec525fcd318886f0"
            },
            "downloads": -1,
            "filename": "graphique-1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2ea108ad1ae967867f7b728dab56d232",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 34579,
            "upload_time": "2024-11-02T00:28:58",
            "upload_time_iso_8601": "2024-11-02T00:28:58.307949Z",
            "url": "https://files.pythonhosted.org/packages/80/61/ec7f151cf2e3b948e90b26a16e38148f4de0016928243f8a91d450cba8a7/graphique-1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "910d3de4e1e8a4bdf32f2e97903213e6620ede65c0ed5b1c76952c14550b47f4",
                "md5": "3c42c126f095553b6da274c69f3ac445",
                "sha256": "927f45183e8c673c259383dcfd7de495b1f226331a1a776d4e32b01d2de56faf"
            },
            "downloads": -1,
            "filename": "graphique-1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "3c42c126f095553b6da274c69f3ac445",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 45898,
            "upload_time": "2024-11-02T00:29:00",
            "upload_time_iso_8601": "2024-11-02T00:29:00.233031Z",
            "url": "https://files.pythonhosted.org/packages/91/0d/3de4e1e8a4bdf32f2e97903213e6620ede65c0ed5b1c76952c14550b47f4/graphique-1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-02 00:29:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "coady",
    "github_project": "graphique",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "graphique"
}
        