queryish

* Name: queryish
* Version: 0.2
* Home page: https://github.com/wagtail/queryish
* Summary: A library for constructing queries on arbitrary data sources following Django's QuerySet API
* Upload time: 2023-09-05 17:38:28
* Docs URL: None
* Author: Matthew Westcott
* Requires Python: >=3.7
* License: BSD
* Requirements: No requirements were recorded.

# queryish

A Python library for constructing queries on arbitrary data sources following Django's QuerySet API.

## Motivation

Django's QuerySet API is a powerful tool for constructing queries on a database. It allows you to compose queries incrementally, with the query only being executed when the results are needed:

```python
books = Book.objects.all()
python_books = books.filter(topic='python')
latest_python_books = python_books.order_by('-publication_date')[:5]
print(latest_python_books)  # Query is executed here
```

This pattern is a good fit for building web interfaces for listing data, as it allows filtering, ordering and pagination to be handled as separate steps.

We may often be required to implement similar interfaces for data taken from sources other than a database, such as a REST API or a search engine. In these cases, we would like to have a similarly rich API for constructing queries to these data sources. Even better would be to follow the QuerySet API as closely as possible, so that we can take advantage of ready-made tools such as [Django's generic class-based views](https://docs.djangoproject.com/en/stable/topics/class-based-views/) that are designed to work with this API.

_queryish_ is a library for building wrappers around data sources that replicate the QuerySet API, allowing you to work with the data in the same way that you would with querysets and models.

## Installation

Install using pip:

```bash
pip install queryish
```

## Usage - REST APIs

_queryish_ provides a base class `queryish.rest.APIModel` for wrapping REST APIs. By default, this follows the out-of-the-box structure served by [Django REST Framework](https://www.django-rest-framework.org/), but various options are available to customise this.

```python
from queryish.rest import APIModel

class Party(APIModel):
    class Meta:
        base_url = "https://demozoo.org/api/v1/parties/"
        fields = ["id", "name", "start_date", "end_date", "location", "country_code"]
        pagination_style = "page-number"
        page_size = 100

    def __str__(self):
        return self.name
```

The resulting class has an `objects` property that supports the usual filtering, ordering and slicing operations familiar from Django's QuerySet API, although these may be limited by the capabilities of the REST API being accessed.

```python
>>> Party.objects.count()
4623
>>> Party.objects.filter(country_code="GB")[:10]
<PartyQuerySet [<Party: 16 Bit Show 1991>, <Party: Acorn User Show 1991>, <Party: Anarchy Easter Party 1992>, <Party: Anarchy Winter Conference 1991>, <Party: Atari Preservation Party 2007>, <Party: Commodore Computer Club UK 1st Meet>, <Party: Commodore Show 1987>, <Party: Commodore Show 1988>, <Party: Deja Vu 1998>, <Party: Deja Vu 1999>]>
>>> Party.objects.get(name="Nova 2023")
<Party: Nova 2023>
```

Methods supported include `all`, `count`, `filter`, `order_by`, `get`, `first`, and `in_bulk`. The result set can be sliced at arbitrary indices - these do not have to match the pagination supported by the underlying API. `APIModel` will automatically make multiple API requests as required.
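
To illustrate why a single slice can require several requests, here is a minimal standalone sketch of the page arithmetic involved. `pages_for_slice` is a hypothetical helper written for this example and is not part of queryish; the library's own implementation may differ.

```python
def pages_for_slice(offset, limit, page_size):
    """Return the 1-based page numbers covering records [offset, offset + limit)."""
    if limit <= 0:
        return []
    first_page = offset // page_size + 1
    last_page = (offset + limit - 1) // page_size + 1
    return list(range(first_page, last_page + 1))


# A slice such as [95:105] against an API serving 100 records per page
# straddles a page boundary, so two requests are needed.
print(pages_for_slice(offset=95, limit=10, page_size=100))  # [1, 2]
```

Records fetched from the first and last page that fall outside the requested range would then be discarded before the results are returned.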

The following attributes are available on `APIModel.Meta`:

* `base_url`: The base URL of the API from which results can be fetched.
* `pk_field_name`: The name of the primary key field. Defaults to `"id"`. Lookups on the field name `"pk"` will be mapped to this field.
* `detail_url`: A string template for the URL of a single object, such as `"https://demozoo.org/api/v1/parties/%s/"`. If this is specified, lookups on the primary key and no other fields will be directed to this URL rather than `base_url`.
* `fields`: A list of field names defined in the API response that will be copied to attributes of the returned object.
* `pagination_style`: The style of pagination used by the API. Recognised values are `"page-number"` and `"offset-limit"`; all others (including the default of `None`) indicate no pagination.
* `page_size`: Required if `pagination_style` is `"page-number"` - the number of results per page returned by the API.
* `page_query_param`: The name of the URL query parameter used to specify the page number. Defaults to `"page"`.
* `offset_query_param`: The name of the URL query parameter used to specify the offset. Defaults to `"offset"`.
* `limit_query_param`: The name of the URL query parameter used to specify the limit. Defaults to `"limit"`.
* `ordering_query_param`: The name of the URL query parameter used to specify the ordering. Defaults to `"ordering"`.
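
As a rough illustration of how these options could translate into a request URL, the following standalone sketch builds a query string with the standard library. `build_list_url` is a hypothetical helper, not part of queryish, and joining multiple ordering fields with commas is an assumption modelled on Django REST Framework's `OrderingFilter`.

```python
from urllib.parse import urlencode


def build_list_url(base_url, filters, ordering=None, pagination_style=None,
                   page=1, offset=0, limit=None,
                   page_query_param="page", offset_query_param="offset",
                   limit_query_param="limit", ordering_query_param="ordering"):
    """Combine filters, ordering and pagination into a listing URL (illustrative only)."""
    params = list(filters)
    if ordering:
        params.append((ordering_query_param, ",".join(ordering)))
    if pagination_style == "page-number":
        params.append((page_query_param, page))
    elif pagination_style == "offset-limit":
        params.append((offset_query_param, offset))
        if limit is not None:
            params.append((limit_query_param, limit))
    # Any other pagination_style (including None) adds no pagination parameters
    if not params:
        return base_url
    return base_url + "?" + urlencode(params)


print(build_list_url(
    "https://example.com/api/parties/",
    [("country_code", "GB")],
    ordering=["name"],
    pagination_style="page-number",
    page=2,
))
# https://example.com/api/parties/?country_code=GB&ordering=name&page=2
```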

To accommodate APIs where the returned JSON does not map cleanly to the intended set of model attributes, the class methods `from_query_data` and `from_individual_data` on `APIModel` can be overridden:

```python
import re

from queryish.rest import APIModel


class Pokemon(APIModel):
    class Meta:
        base_url = "https://pokeapi.co/api/v2/pokemon/"
        detail_url = "https://pokeapi.co/api/v2/pokemon/%s/"
        fields = ["id", "name"]
        pagination_style = "offset-limit"
        verbose_name_plural = "pokemon"

    @classmethod
    def from_query_data(cls, data):
        """
        Given a record returned from the listing endpoint (base_url), return an instance of the model.
        """
        # Records within the listing endpoint return a `url` field, from which we want to extract the ID
        return cls(
            id=int(re.match(r'https://pokeapi\.co/api/v2/pokemon/(\d+)/', data['url']).group(1)),
            name=data['name'],
        )

    @classmethod
    def from_individual_data(cls, data):
        """
        Given a record returned from the detail endpoint (detail_url), return an instance of the model.
        """
        return cls(
            id=data['id'],
            name=data['name'],
        )

    def __str__(self):
        return self.name
```
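
The ID extraction used in `from_query_data` can be tried in isolation. The sketch below is a standalone variant that raises a clear error when the URL does not match, rather than failing with an `AttributeError` on `None`. The helper name `extract_id` and the sample record are assumptions made for this example.

```python
import re

POKEMON_URL_PATTERN = re.compile(r"https://pokeapi\.co/api/v2/pokemon/(\d+)/")


def extract_id(url):
    """Pull the numeric ID out of a record URL, or raise ValueError if it does not match."""
    match = POKEMON_URL_PATTERN.fullmatch(url)
    if match is None:
        raise ValueError(f"Unrecognised record URL: {url!r}")
    return int(match.group(1))


# Sample record in the shape described above: a name plus a `url` field
record = {"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon/1/"}
print(extract_id(record["url"]))  # 1
```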

## Customising the REST API queryset class

The `objects` attribute of an `APIModel` subclass is an instance of `queryish.rest.APIQuerySet` which initially consists of the complete set of records. As with Django's QuerySet, methods such as `filter` return a new instance.
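
The "return a new instance" behaviour can be pictured with a small standalone sketch. `TinyQuery` is a hypothetical class written for this example; it only mimics the copy-on-modify pattern and is not how `APIQuerySet` is necessarily implemented.

```python
import copy


class TinyQuery:
    """Toy query object whose filter() leaves the original untouched."""

    def __init__(self):
        self.filters = []

    def filter(self, **kwargs):
        clone = copy.copy(self)
        # Build a fresh list so the clone does not share state with the original
        clone.filters = self.filters + list(kwargs.items())
        return clone


base = TinyQuery()
narrowed = base.filter(country_code="GB")
print(base.filters)      # []
print(narrowed.filters)  # [('country_code', 'GB')]
```

Because each step yields an independent object, a base query can safely be reused as the starting point for several different refinements.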

It may be necessary to subclass `APIQuerySet` and override methods in order to support certain API responses. For example, the base implementation expects unpaginated API endpoints to return a list as the top-level JSON object, and paginated API endpoints to return a dict with a `results` item. If the API you are working with returns a different structure, you can override the `get_results_from_response` method to extract the list of results from the response:

```python
from queryish.rest import APIQuerySet

class TreeQuerySet(APIQuerySet):
    base_url = "https://api.data.amsterdam.nl/v1/bomen/stamgegevens/"
    pagination_style = "page-number"
    page_size = 20
    http_headers = {"Accept": "application/hal+json"}

    def get_results_from_response(self, response):
        return response["_embedded"]["stamgegevens"]
```
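
To make the three response shapes concrete, here is a standalone sketch that handles each of them on plain Python data. `extract_results` and its `embedded_key` parameter are hypothetical and exist only for this illustration; in queryish the equivalent logic lives in `get_results_from_response`.

```python
def extract_results(response, paginated, embedded_key=None):
    """Return the list of records from a decoded JSON response (illustrative only)."""
    if embedded_key is not None:
        # HAL-style responses nest records under "_embedded"
        return response["_embedded"][embedded_key]
    if paginated:
        # Paginated responses are expected to be a dict with a "results" item
        return response["results"]
    # Unpaginated responses are expected to be a top-level list
    return response


print(extract_results([{"id": 1}], paginated=False))
# [{'id': 1}]
print(extract_results({"count": 1, "results": [{"id": 1}]}, paginated=True))
# [{'id': 1}]
print(extract_results({"_embedded": {"stamgegevens": [{"id": 1}]}}, paginated=True,
                      embedded_key="stamgegevens"))
# [{'id': 1}]
```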

`APIQuerySet` subclasses can be instantiated independently of an `APIModel`, but results will be returned as plain JSON values:

```python
>>> TreeQuerySet().filter(jaarVanAanleg=1986).first()
{'_links': {'schema': 'https://schemas.data.amsterdam.nl/datasets/bomen/dataset#stamgegevens', 'self': {'href': 'https://api.data.amsterdam.nl/v1/bomen/stamgegevens/1101570/', 'title': '1101570', 'id': 1101570}, 'gbdBuurt': {'href': 'https://api.data.amsterdam.nl/v1/gebieden/buurten/03630980000211/', 'title': '03630980000211', 'identificatie': '03630980000211'}}, 'id': 1101570, 'gbdBuurtId': '03630980000211', 'geometrie': {'type': 'Point', 'coordinates': [115162.72, 485972.68]}, 'boomhoogteklasseActueel': 'c. 9 tot 12 m.', 'jaarVanAanleg': 1986, 'soortnaam': "Salix alba 'Chermesina'", 'stamdiameterklasse': '0,5 tot 1 m.', 'typeObject': 'Gekandelaberde boom', 'typeSoortnaam': 'Bomen', 'soortnaamKort': 'Salix', 'soortnaamTop': 'Wilg (Salix)'}
```

This can be overridden by defining a `model` attribute on the queryset, or overriding the `get_instance` / `get_individual_instance` methods. To use a customised queryset with an `APIModel`, define the `base_query_class` attribute on the model class:

```python
class Tree(APIModel):
    base_query_class = TreeQuerySet
    class Meta:
        fields = ["id", "geometrie", "boomhoogteklasseActueel", "jaarVanAanleg", "soortnaam", "soortnaamKort"]

# >>> Tree.objects.filter(jaarVanAanleg=1986).first()
# <Tree: Tree object (1101570)>
```

## Other data sources

_queryish_ is not limited to REST APIs - the base class `queryish.Queryish` can be used to build a QuerySet-like API around any data source. At minimum, this requires defining a `run_query` method that returns an iterable of records that is filtered, ordered and sliced according to the queryset's attributes. For example, a queryset implementation that works from a simple in-memory list of objects might look like this:

```python
from queryish import Queryish

class CountryQuerySet(Queryish):
    def run_query(self):
        countries = [
            {"code": "nl", "name": "Netherlands"},
            {"code": "de", "name": "Germany"},
            {"code": "fr", "name": "France"},
            {"code": "gb", "name": "United Kingdom"},
            {"code": "us", "name": "United States"},
        ]

        # Filter the list of countries by `self.filters` - a list of (key, value) tuples
        for (key, val) in self.filters:
            countries = [c for c in countries if c[key] == val]

        # Sort the list of countries by `self.ordering` - a tuple of field names
        countries.sort(key=lambda c: [c.get(field, None) for field in self.ordering])

        # Slice the list of countries by `self.offset` and `self.limit`. `offset` is always numeric
        # and defaults to 0 for an unsliced list; `limit` is either numeric or None (denoting no limit).
        # Test against None explicitly so that a limit of 0 yields an empty result.
        stop = self.offset + self.limit if self.limit is not None else None
        return countries[self.offset:stop]
```
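
The sort in the example above treats every entry of `self.ordering` as a plain field name. If ordering entries can carry a Django-style `-` prefix for descending order (an assumption here, based on the `order_by('-publication_date')` convention shown earlier), a `run_query` implementation would need to handle it. A minimal standalone sketch, using a hypothetical `sort_records` helper:

```python
def sort_records(records, ordering):
    """Sort dicts by a sequence of field names, where a leading '-' means descending."""
    result = list(records)
    # list.sort is stable, so applying keys from least to most significant
    # produces a correct multi-field ordering.
    for field in reversed(ordering):
        descending = field.startswith("-")
        name = field[1:] if descending else field
        result.sort(key=lambda record: record[name], reverse=descending)
    return result


countries = [
    {"code": "nl", "name": "Netherlands"},
    {"code": "de", "name": "Germany"},
    {"code": "fr", "name": "France"},
]
print([c["code"] for c in sort_records(countries, ("-name",))])  # ['nl', 'de', 'fr']
```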

Subclasses will also typically override the method `run_count`, which returns the number of records in the queryset accounting for any filtering and slicing. If this is not overridden, the default implementation will call `run_query` and count the results.
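
Many data sources can report the number of matching records without returning them all, in which case `run_count` only needs to adjust that total for slicing. The arithmetic can be sketched as follows. `count_after_slice` is a hypothetical helper for illustration, and how the total is obtained depends entirely on the data source.

```python
def count_after_slice(total_matching, offset=0, limit=None):
    """Number of records left after applying offset and limit to a known total."""
    remaining = max(total_matching - offset, 0)
    if limit is None:
        return remaining
    return min(remaining, limit)


print(count_after_slice(5))                     # 5
print(count_after_slice(5, offset=2))           # 3
print(count_after_slice(5, offset=2, limit=2))  # 2
```

A `run_count` override could call a helper like this with `self.offset` and `self.limit` once it has the filtered total.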

            
