elasticsearch-django

Name: elasticsearch-django
Version: 8.5.2
Home page: https://github.com/yunojuno/elasticsearch-django
Summary: Elasticsearch Django app.
Upload time: 2023-11-25 13:27:07
Maintainer/Author: YunoJuno
Requires Python: >=3.8,<4.0
License: MIT
**This project now requires Python 3.8+ and Django 3.2+.
For previous versions please refer to the relevant tag or branch.**

# Elasticsearch for Django

This is a lightweight Django app for people who are using Elasticsearch
with Django, and want to manage their indexes.

## Compatibility

The master branch is now based on `elasticsearch-py` 8. If you are
using older versions, please switch to the relevant branch (released on
PyPI as 2.x, 5.x, 6.x).

## Search Index Lifecycle

The basic lifecycle for a search index is simple:

1. Create an index
2. Post documents to the index
3. Query the index

Relating this to our use of search within a Django project it looks like this:

1. Create mapping file for a named index
2. Add index configuration to Django settings
3. Map models to document types in the index
4. Post document representation of objects to the index
5. Update the index when an object is updated
6. Remove the document when an object is deleted
7. Query the index
8. Convert search results into a QuerySet (preserving relevance)

# Django Implementation

This section shows how to set up Django to recognise ES indexes, and the
models that should appear in an index. From this setup you should be
able to run the management commands that will create and populate each
index, and keep the indexes in sync with the database.

## Create index mapping file

The prerequisite to configuring Django to work with an index is having
the mapping for the index available. This is a bit chicken-and-egg, but
the underlying assumption is that you are capable of creating the index
mappings outside of Django itself, as raw JSON. (The easiest way to
spoof this is to POST a JSON document representing your document type
to a URL on your ES instance (`POST http://ELASTICSEARCH_URL/{{index_name}}`)
and then retrieve the auto-magic mapping that ES created via `GET
http://ELASTICSEARCH_URL/{{index_name}}/_mapping`.)

Once you have the JSON mapping, you should save it in the root of the
Django project as `search/mappings/{{index_name}}.json`.
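For illustration, the spoofing steps above might look like this with
curl (the `blog` index name and document body are placeholders):

```shell
# POST a representative document so ES infers a mapping (placeholder index/body)
curl -X POST "http://$ELASTICSEARCH_URL/blog/_doc" \
    -H "Content-Type: application/json" \
    -d '{"title": "First post", "published": "2023-01-01"}'

# GET the auto-generated mapping and save it in the expected location
curl "http://$ELASTICSEARCH_URL/blog/_mapping" -o search/mappings/blog.json
```

You will usually want to hand-tune the inferred mapping before
committing it, but it is a convenient starting point.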

## Configure Django settings

The Django settings for search are contained in a dictionary called
`SEARCH_SETTINGS`, which should be in the main `django.conf.settings`
file. The dictionary has three root nodes, `connections`, `indexes` and
`settings`. Below is an example:

```python
SEARCH_SETTINGS = {
    'connections': {
        'default': getenv('ELASTICSEARCH_URL'),
        'backup': {
            # all Elasticsearch init kwargs can be used here
            'cloud_id': '{{ cloud_id }}'
        }
    },
    'indexes': {
        'blog': {
            'models': [
                'website.BlogPost',
            ]
        }
    },
    'settings': {
        # batch size for ES bulk api operations
        'chunk_size': 500,
        # default page size for search results
        'page_size': 25,
        # set to True to connect post_save/delete signals
        'auto_sync': True,
        # list of models which will never auto_sync even if auto_sync is True
        'never_auto_sync': [],
        # if True, then indexes must have mapping files
        'strict_validation': False
    }
}
```

The `connections` node is (hopefully) self-explanatory - we support
multiple connections, but in practice you should only need one - the
'default' connection. This is the URL used to connect to your ES
instance. The `settings` node contains site-wide search settings. The
`indexes` node is where we configure how Django and ES play together,
and is where most of the work happens.

Note that prior to v8.2 the connection value had to be a connection
string; since v8.2 this can still be a connection string, but can also
be a dictionary that contains any kwarg that can be passed to the
`Elasticsearch` init method.

**Index settings**

Inside the index node we have a collection of named indexes - in this
case just the single index called `blog`. Inside each index we have a
`models` key which contains a list of Django models that should appear
in the index, denoted in `app.ModelName` format. You can have multiple
models in an index, and a model can appear in multiple indexes. How
models and indexes interact is described in the next section.
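For illustration, a hypothetical layout in which `website.BlogPost`
appears in two indexes (the `site_search` index and `website.Page`
model are invented for this sketch):

```python
# hypothetical settings: BlogPost appears in both 'blog' and 'site_search'
SEARCH_SETTINGS = {
    "connections": {"default": "http://localhost:9200"},
    "indexes": {
        "blog": {"models": ["website.BlogPost"]},
        "site_search": {"models": ["website.BlogPost", "website.Page"]},
    },
    "settings": {"chunk_size": 500, "page_size": 25, "auto_sync": True},
}

# a model can be looked up across all indexes it belongs to
indexes_for_blogpost = [
    name
    for name, cfg in SEARCH_SETTINGS["indexes"].items()
    if "website.BlogPost" in cfg["models"]
]
```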

**Configuration Validation**

When the app boots up it validates the settings, which involves the
following:

1. Does each index specified have a mapping file?
2. Does each model implement the required mixins?
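A minimal sketch of the first check, assuming the
`search/mappings/{{index_name}}.json` convention described above (the
helper name is hypothetical; the real validation lives inside the app):

```python
import os


def check_mapping_files(search_settings, mappings_dir="search/mappings"):
    """Return the list of configured indexes that have no mapping file."""
    missing = []
    for index_name in search_settings["indexes"]:
        path = os.path.join(mappings_dir, f"{index_name}.json")
        if not os.path.isfile(path):
            missing.append(index_name)
    return missing
```

With `strict_validation` enabled, a non-empty result here would be a
configuration error at startup.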

## Implement search document mixins

So far we have configured Django to know the names of the indexes we
want, and the models that we want to index. What it doesn't yet know is
which objects to index, and how to convert an object to its search index
document. This is done by implementing two separate mixins -
`SearchDocumentMixin` and `SearchDocumentManagerMixin`. The
configuration validation routine will tell you if these are not
implemented.

**SearchDocumentMixin**

This mixin is responsible for the search index document format. We are
indexing JSON representations of each object, and we have two methods on
the mixin responsible for outputting the correct format -
`as_search_document` and `as_search_document_update`.

An aside on the mechanics of the `auto_sync` process, which is hooked up
using Django's `post_save` and `post_delete` model signals. ES supports
partial updates to documents that already exist, and we make a
fundamental assumption about indexing models - that **if you pass the
`update_fields` kwarg to a `model.save` method call, then you are
performing a partial update**, and this will be propagated to ES as a
partial update only.

To this end, we have two methods for generating the model's JSON
representation - `as_search_document`, which should return a dict that
represents the entire object; and `as_search_document_update`, which
takes the `update_fields` kwarg. This method handles two partial update
'strategies', defined in the `SEARCH_SETTINGS`, 'full' and 'partial'.
The default 'full' strategy simply proxies the `as_search_document`
method - i.e. partial updates are treated as a full document update. The
'partial' strategy is more intelligent - it will map the update_fields
specified to the field names defined in the index mapping files. If a
field name is passed into the save method but is not in the mapping
file, it is ignored. In addition, if the underlying Django model field
is a related object, a `ValueError` will be raised, as we cannot
serialize this automatically. In this scenario, you will need to
override the method in your subclass - see the code for more details.
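To illustrate the 'partial' strategy's rules in isolation (the helper,
field names, and mapped/related sets here are hypothetical, not the
library's API):

```python
def build_partial_update(obj, update_fields, mapped_fields, related_fields):
    """Sketch of the 'partial' strategy: ignore unmapped fields, reject related ones."""
    doc = {}
    for field in update_fields:
        if field not in mapped_fields:
            # not in the index mapping file - silently ignored
            continue
        if field in related_fields:
            # related objects cannot be serialized automatically
            raise ValueError(f"Cannot serialize related field '{field}'")
        doc[field] = getattr(obj, field)
    return doc
```

A field like `slug` that is not in the mapping is dropped, while a
related field like `author` forces you to override the method yourself.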

To better understand this, let us say that we have a model (`MyModel`)
that is configured to be included in an index called `myindex`. If we
save an object, without passing `update_fields`, then this is considered
a full document update, which triggers the object's
`index_search_document` method:

```python
obj = MyModel.objects.first()
obj.save()
...
# AUTO_SYNC=true will trigger a re-index of the complete object document:
obj.index_search_document(index='myindex')
```

However, if we only want to update a single field (say the `timestamp`),
and we pass this in to the save method, then this will trigger the
`update_search_document` method, passing in the names of the fields that
we want updated.

```python
# save a single field on the object
obj.save(update_fields=['timestamp'])
...
# AUTO_SYNC=true will trigger a partial update of the object document
obj.update_search_document(index, update_fields=['timestamp'])
```

We pass the name of the index being updated as the first arg, as objects may have different representations in different indexes:

```python
def as_search_document(self, index):
    return {'name': "foo"} if index == 'foo' else {'name': "bar"}
```

In the case of the second method, the simplest possible implementation
would be a dictionary containing the names of the fields being updated
and their new values, and this is the default implementation. If the
fields passed in are simple fields (numbers, dates, strings, etc.) then
a simple `{'field_name': getattr(obj, field_name)}` is returned. However,
if the field name relates to a complex object (e.g. a related object)
then this method will raise an `InvalidUpdateFields` exception. In this
scenario you should override the default implementation with one of your
own.

```python
def as_search_document_update(self, index, update_fields):
    if 'user' in update_fields:
        # remove so that it won't raise a ValueError
        update_fields.remove('user')
        doc = super().as_search_document_update(index, update_fields)
        doc['user'] = self.user.get_full_name()
        return doc
    return super().as_search_document_update(index, update_fields)
```

The reason we have split out the update from the full-document index
comes from a real problem that we ourselves suffered. The full object
representation that we were using was quite DB intensive - we were
storing properties of the model that required walking the ORM tree.
However, because we were also touching the objects (see below) to record
activity timestamps, we ended up flooding the database with queries
simply to update a single field in the output document. Partial updates
solve this issue:

```python
def touch(self):
    self.timestamp = now()
    self.save(update_fields=['timestamp'])

def as_search_document_update(self, index, update_fields):
    if list(update_fields) == ['timestamp']:
        # only propagate changes if it's +1hr since the last timestamp change
        if now() - self.timestamp < timedelta(hours=1):
            return {}
        else:
            return {'timestamp': self.timestamp}
    ....
```

**Processing updates async**

If you are generating a lot of index updates you may want to run them
async (via some kind of queueing mechanism). There is no built-in method
to do this, given the range of queueing libraries and patterns
available, however it is possible using the `pre_index`, `pre_update`
and `pre_delete` signals. In this case, you should also turn off
`AUTO_SYNC` (as this will run the updates synchronously), and process
the updates yourself. The signals pass in the kwargs required by the
relevant model methods, as well as the `instance` involved:

```python
# ensure that SEARCH_AUTO_SYNC=False

from django.dispatch import receiver
import django_rq
from elasticsearch_django.signals import (
    pre_index,
    pre_update,
    pre_delete
)

queue = django_rq.get_queue("elasticsearch")


@receiver(pre_index, dispatch_uid="async_index_document")
def index_search_document_async(sender, **kwargs):
    """Queue up full search index document update via RQ."""
    instance = kwargs.pop("instance")
    queue.enqueue(
        instance.index_search_document,
        index=kwargs.pop("index"),
    )


@receiver(pre_update, dispatch_uid="async_update_document")
def update_search_document_async(sender, **kwargs):
    """Queue up partial search index document update via RQ."""
    instance = kwargs.pop("instance")
    queue.enqueue(
        instance.update_search_document,
        index=kwargs.pop("index"),
        update_fields=kwargs.pop("update_fields"),
    )


@receiver(pre_delete, dispatch_uid="async_delete_document")
def delete_search_document_async(sender, **kwargs):
    """Queue up search index document deletion via RQ."""
    instance = kwargs.pop("instance")
    queue.enqueue(
        instance.delete_search_document,
        index=kwargs.pop("index"),
    )
```

**SearchDocumentManagerMixin**

This mixin must be implemented by the model's default manager
(`objects`). It also requires a single method implementation -
`get_search_queryset()` - which returns a queryset of objects that are
to be indexed. This can also use the `index` kwarg to provide different
sets of objects to different indexes.

```python
def get_search_queryset(self, index='_all'):
    return self.get_queryset().filter(foo='bar')
```

We now have the bare bones of our search implementation. We can now use
the included management commands to create and populate our search
index:

```shell
# create the index 'foo' from the 'foo.json' mapping file
$ ./manage.py create_search_index foo

# populate foo with all the relevant objects
$ ./manage.py update_search_index foo
```

The next step is to ensure that our models stay in sync with the index.

## Add model signal handlers to update index

If the setting `auto_sync` is True, then on `AppConfig.ready` each model
configured for use in an index has its `post_save` and `post_delete`
signals connected. This means that they will be kept in sync across all
indexes that they appear in whenever the relevant model method is
called. (There is some very basic caching to prevent too many updates -
the object document is cached for one minute, and if there is no change
in the document the index update is ignored.)
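The basic caching described above might be sketched as follows (the
in-memory store, key format, and injectable clock are assumptions for
illustration; the actual backend is an implementation detail of the
library):

```python
import time


class DocumentCache:
    """Skip index updates when the document hasn't changed within the TTL."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self._store = {}  # key -> (document, expiry timestamp)

    def should_update(self, key, document, now=None):
        now = time.time() if now is None else now
        cached = self._store.get(key)
        if cached is not None:
            doc, expiry = cached
            if doc == document and now < expiry:
                return False  # unchanged within TTL - skip the index update
        self._store[key] = (document, now + self.ttl)
        return True
```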

There is a **VERY IMPORTANT** caveat to the signal handling. It will
**only** pick up on changes to the model itself, and not on related
(`ForeignKey`, `ManyToManyField`) model changes. If the search document
is affected by such a change then you will need to implement additional
signal handling yourself.

In addition to `object.save()`, `SearchDocumentMixin` also provides the
`update_search_index(self, action, index='_all', update_fields=None,
force=False)` method. Action should be 'index', 'update' or 'delete'.
The difference between 'index' and 'update' is that 'update' is a
partial update that only changes the fields specified, rather than
re-updating the entire document. If `action` is 'update' whilst
`update_fields` is None, the action will be changed to 'index'.
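That coercion can be sketched as (a simplification with a hypothetical
helper; the real method also dispatches to the underlying
index/update/delete calls):

```python
def resolve_action(action, update_fields=None):
    """Coerce 'update' without update_fields into a full 'index' action."""
    if action not in ("index", "update", "delete"):
        raise ValueError(f"Unknown action: {action}")
    if action == "update" and update_fields is None:
        return "index"  # nothing specific to update - re-index the whole doc
    return action
```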

We now have documents in our search index, kept up to date with their
Django counterparts. We are ready to start querying ES.

---

# Search Queries (How to Search)

## Running search queries

**SearchQuery**

The `elasticsearch_django.models.SearchQuery` model wraps this
functionality up and provides helper properties, as well as logging the
query:

```python
from elasticsearch_django.settings import get_client
from elasticsearch_django.models import execute_search

# run a default match_all query
sq = execute_search(index="foo", query={"match_all": {}})
# the raw response is stored on the return object,
# but is not stored on the object in the database.
print(sq.response)
```

Calling the `execute_search` function will execute the underlying
search, log the query JSON, the number of hits, and the list of hit meta
information for future analysis. The `execute_search` function also
accepts these additional kwargs:

* `user` - the user who is making the query, useful for logging
* `search_terms` - the search query supplied by the user (as opposed to
  the DSL) - not used by ES, but stored in the logs
* `reference` - a free text reference field - used for grouping searches
  together - could be session id.
* `save` - by default the SearchQuery created will be saved, but passing
  in False will prevent this.

## Converting search hits into Django objects

Running a search against an index will return a page of results, each
containing the `_source` attribute which is the search document itself
(as created by the `SearchDocumentMixin.as_search_document` method),
together with meta info about the result - most significantly the
relevance **score**, which is the magic value used for ranking
(ordering) results. However, the search document probably doesn't
contain all of the information that you need to display the result,
so what you really need is a standard Django QuerySet, containing the
objects in the search results, but maintaining the order. This means
injecting the ES score into the queryset, and then using it for
ordering. There is a method on the `SearchDocumentManagerMixin` called
`from_search_query` which will do this for you. It uses raw SQL to add
the score as an annotation to each object in the queryset. (It also adds
the 'rank' - so that even if the score is identical for all hits, the
ordering is preserved.)

```python
from website.models import BlogPost

# run a default match_all query
sq = execute_search(index="blog", query={"match_all": {}})
for obj in BlogPost.objects.from_search_query(sq):
    print(obj.search_score, obj.search_rank)
```
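The score/rank annotation can be sketched in plain Python (the real
`from_search_query` does this in SQL as a queryset annotation; the hit
shape and helper name here are illustrative):

```python
def order_by_search_results(objects_by_id, hits):
    """Attach score/rank from search hits and return objects in hit order."""
    results = []
    for rank, hit in enumerate(hits, start=1):
        obj = objects_by_id[hit["id"]]
        obj.search_score = hit["score"]
        obj.search_rank = rank
        results.append(obj)
    return results
```

Because `search_rank` follows the hit order, results with identical
scores still come back in a stable, relevance-preserving order.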

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yunojuno/elasticsearch-django",
    "name": "elasticsearch-django",
    "maintainer": "YunoJuno",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "code@yunojuno.com",
    "keywords": "",
    "author": "YunoJuno",
    "author_email": "code@yunojuno.com",
    "download_url": "https://files.pythonhosted.org/packages/88/62/dd52e2112c8ce1ae32feb38788d3d8af5b598d28fa02c6a179fc7c6dd0e7/elasticsearch_django-8.5.2.tar.gz",
    "platform": null,
    "description": "**This project now requires Python 3.8+ and Django 3.2+.\nFor previous versions please refer to the relevant tag or branch.**\n\n# Elasticsearch for Django\n\nThis is a lightweight Django app for people who are using Elasticsearch\nwith Django, and want to manage their indexes.\n\n## Compatibility\n\nThe master branch is now based on `elasticsearch-py` 8. If you are\nusing older versions, please switch to the relevant branch (released on\nPyPI as 2.x, 5.x, 6.x).\n\n## Search Index Lifecycle\n\nThe basic lifecycle for a search index is simple:\n\n1. Create an index\n2. Post documents to the index\n3. Query the index\n\nRelating this to our use of search within a Django project it looks like this:\n\n1. Create mapping file for a named index\n2. Add index configuration to Django settings\n3. Map models to document types in the index\n4. Post document representation of objects to the index\n5. Update the index when an object is updated\n6. Remove the document when an object is deleted\n7. Query the index\n8. Convert search results into a QuerySet (preserving relevance)\n\n# Django Implementation\n\nThis section shows how to set up Django to recognise ES indexes, and the\nmodels that should appear in an index. From this setup you should be\nable to run the management commands that will create and populate each\nindex, and keep the indexes in sync with the database.\n\n## Create index mapping file\n\nThe prerequisite to configuring Django to work with an index is having\nthe mapping for the index available. This is a bit chicken-and-egg, but\nthe underlying assumption is that you are capable of creating the index\nmappings outside of Django itself, as raw JSON. 
(The easiest way to\nspoof this is to POST a JSON document representing your document type at\nURL on your ES instance (`POST http://ELASTICSEARCH_URL/{{index_name}}`)\nand then retrieving the auto-magic mapping that ES created via `GET\nhttp://ELASTICSEARCH_URL/{{index_name}}/_mapping`.)\n\nOnce you have the JSON mapping, you should save it in the root of the\nDjango project as `search/mappings/{{index_name}}.json`.\n\n## Configure Django settings\n\nThe Django settings for search are contained in a dictionary called\n`SEARCH_SETTINGS`, which should be in the main `django.conf.settings`\nfile. The dictionary has three root nodes, `connections`, `indexes` and\n`settings`. Below is an example:\n\n```python\n\n    SEARCH_SETTINGS = {\n        'connections': {\n            'default': getenv('ELASTICSEARCH_URL'),\n            'backup': {\n                # all Elasticsearch init kwargs can be used here\n                'cloud_id': '{{ cloud_id }}'\n            }\n        },\n        'indexes': {\n            'blog': {\n                'models': [\n                    'website.BlogPost',\n                ]\n            }\n        },\n        'settings': {\n            # batch size for ES bulk api operations\n            'chunk_size': 500,\n            # default page size for search results\n            'page_size': 25,\n            # set to True to connect post_save/delete signals\n            'auto_sync': True,\n            # List of models which will never auto_sync even if auto_sync is True\n            'never_auto_sync': [],\n            # if true, then indexes must have mapping files\n            'strict_validation': False\n        }\n    }\n```\n\nThe `connections` node is (hopefully) self-explanatory - we support\nmultiple connections, but in practice you should only need the one -\n'default' connection. This is the URL used to connect to your ES\ninstance. The `settings` node contains site-wide search settings. 
The\n`indexes` nodes is where we configure how Django and ES play together,\nand is where most of the work happens.\n\nNote that prior to v8.2 the connection value had to be a connection\nstring; since v8.2 this can still be a connection string, but can also\nbe a dictionary that contains any kwarg that can be passed to the\n`Elasticsearch` init method.\n\n**Index settings**\n\nInside the index node we have a collection of named indexes - in this\ncase just the single index called `blog`. Inside each index we have a\n`models` key which contains a list of Django models that should appear\nin the index, denoted in `app.ModelName` format. You can have multiple\nmodels in an index, and a model can appear in multiple indexes. How\nmodels and indexes interact is described in the next section.\n\n**Configuration Validation**\n\nWhen the app boots up it validates the settings, which involves the\nfollowing:\n\n1. Do each of the indexes specified have a mapping file?\n2. Do each of the models implement the required mixins?\n\n## Implement search document mixins\n\nSo far we have configured Django to know the names of the indexes we\nwant, and the models that we want to index. What it doesn't yet know is\nwhich objects to index, and how to convert an object to its search index\ndocument. This is done by implementing two separate mixins -\n`SearchDocumentMixin` and `SearchDocumentManagerMixin`. The\nconfiguration validation routine will tell you if these are not\nimplemented. **SearchDocumentMixin**\n\nThis mixin is responsible for the seaerch index document format. We are\nindexing JSON representations of each object, and we have two methods on\nthe mixin responsible for outputting the correct format -\n`as_search_document` and `as_search_document_update`.\n\nAn aside on the mechanics of the `auto_sync` process, which is hooked up\nusing Django's `post_save` and `post_delete` model signals. 
ES supports\npartial updates to documents that already exist, and we make a\nfundamental assumption about indexing models - that **if you pass the\n`update_fields` kwarg to a `model.save` method call, then you are\nperforming a partial update**, and this will be propagated to ES as a\npartial update only.\n\nTo this end, we have two methods for generating the model's JSON\nrepresentation - `as_search_document`, which should return a dict that\nrepresents the entire object; and `as_search_document_update`, which\ntakes the `update_fields` kwarg. This method handler two partial update\n'strategies', defined in the `SEARCH_SETTINGS`, 'full' and 'partial'.\nThe default 'full' strategy simply proxies the `as_search_document`\nmethod - i.e. partial updates are treated as a full document update. The\n'partial' strategy is more intelligent - it will map the update_fields\nspecified to the field names defined in the index mapping files. If a\nfield name is passed into the save method but is not in the mapping\nfile, it is ignored. In addition, if the underlying Django model field\nis a related object, a `ValueError` will be raised, as we cannot\nserialize this automatically. In this scenario, you will need to\noverride the method in your subclass - see the code for more details.\n\nTo better understand this, let us say that we have a model (`MyModel`)\nthat is configured to be included in an index called `myindex`. 
If we\nsave an object, without passing `update_fields`, then this is considered\na full document update, which triggers the object's\n`index_search_document` method:\n\n```python\nobj = MyModel.objects.first()\nobj.save()\n...\n# AUTO_SYNC=true will trigger a re-index of the complete object document:\nobj.index_search_document(index='myindex')\n```\n\nHowever, if we only want to update a single field (say the `timestamp`),\nand we pass this in to the save method, then this will trigger the\n`update_search_document` method, passing in the names of the fields that\nwe want updated.\n\n```python\n# save a single field on the object\nobj.save(update_fields=['timestamp'])\n...\n# AUTO_SYNC=true will trigger a partial update of the object document\nobj.update_search_document(index, update_fields=['timestamp'])\n```\n\nWe pass the name of the index being updated as the first arg, as objects may have different representations in different indexes:\n\n```python\n    def as_search_document(self, index):\n        return {'name': \"foo\"} if index == 'foo' else {'name': \"bar\"}\n```\n\nIn the case of the second method, the simplest possible implementation\nwould be a dictionary containing the names of the fields being updated\nand their new values, and this is the default implementation. If the\nfields passed in are simple fields (numbers, dates, strings, etc.) then\na simple `{'field_name': getattr(obj, field_name}` is returned. However,\nif the field name relates to a complex object (e.g. a related object)\nthen this method will raise an `InvalidUpdateFields` exception. 
In this\nscenario you should override the default implementationwith one of your\nown.\n\n```python\n\n    def as_search_document_update(self, index, update_fields):\n        if 'user' in update_fields:\n            # remove so that it won't raise a ValueError\n            update_fields.remove('user')\n            doc = super().as_search_document_update(index, update_fields)\n            doc['user'] = self.user.get_full_name()\n            return doc\n        return super().as_search_document_update(index, update_fields)\n```\n\nThe reason we have split out the update from the full-document index\ncomes from a real problem that we ourselves suffered. The full object\nrepresentation that we were using was quite DB intensive - we were\nstoring properties of the model that required walking the ORM tree.\nHowever, because we were also touching the objects (see below) to record\nactivity timestamps, we ended up flooding the database with queries\nsimply to update a single field in the output document. Partial updates\nsolves this issue:\n\n```python\n\n    def touch(self):\n        self.timestamp = now()\n        self.save(update_fields=['timestamp'])\n\n    def as_search_document_update(self, index, update_fields):\n        if list(update_fields) == ['timestamp']:\n            # only propagate changes if it's +1hr since the last timestamp change\n            if now() - self.timestamp < timedelta(hours=1):\n                return {}\n            else:\n                return {'timestamp': self.timestamp}\n        ....\n```\n\n**Processing updates async**\n\nIf you are generating a lot of index updates you may want to run them\nasync (via some kind of queueing mechanism). There is no built-in method\nto do this, given the range of queueing libraries and patterns\navailable, however it is possible using the `pre_index`, `pre_update`\nand `pre_delete` signals. 
In this case, you should also turn off\n`AUTO_SYNC` (as this will run the updates synchronously), and process\nthe updates yourself. The signals pass in the kwargs required by the\nrelevant model methods, as well as the `instance` involved:\n\n```python\n# ensure that SEARCH_AUTO_SYNC=False\n\nfrom django.dispatch import receiver\nimport django_rq\nfrom elasticsearch_django.signals import (\n    pre_index,\n    pre_update,\n    pre_delete\n)\n\nqueue = django_rq.get_queue(\"elasticsearch\")\n\n\n@receiver(pre_index, dispatch_uid=\"async_index_document\")\ndef index_search_document_async(sender, **kwargs):\n    \"\"\"Queue up search index document update via RQ.\"\"\"\n    instance = kwargs.pop(\"instance\")\n    queue.enqueue(\n        instance.update_search_document,\n        index=kwargs.pop(\"index\"),\n    )\n\n\n@receiver(pre_update, dispatch_uid=\"async_update_document\")\ndef update_search_document_async(sender, **kwargs):\n    \"\"\"Queue up search index document update via RQ.\"\"\"\n    instance = kwargs.pop(\"instance\")\n    queue.enqueue(\n        instance.index_search_document,\n        index=kwargs.pop(\"index\"),\n        update_fields=kwargs.pop(\"update_fields\"),\n    )\n\n\n@receiver(pre_delete, dispatch_uid=\"async_delete_document\")\ndef delete_search_document_async(sender, **kwargs):\n    \"\"\"Queue up search index document deletion via RQ.\"\"\"\n    instance = kwargs.pop(\"instance\")\n    queue.enqueue(\n        instance.delete_search_document,\n        index=kwargs.pop(\"index\"),\n    )\n```\n\n**SearchDocumentManagerMixin**\n\nThis mixin must be implemented by the model's default manager\n(`objects`). It also requires a single method implementation -\n`get_search_queryset()` - which returns a queryset of objects that are\nto be indexed. 
This can also use the `index` kwarg to provide different\nsets of objects to different indexes.\n\n```python\n    def get_search_queryset(self, index='_all'):\n        return self.get_queryset().filter(foo='bar')\n```\n\nWe now have the bare bones of our search implementation. We can now use\nthe included management commands to create and populate our search\nindex:\n\n```shell\n# create the index 'foo' from the 'foo.json' mapping file\n$ ./manage.py create_search_index foo\n\n# populate foo with all the relevant objects\n$ ./manage.py update_search_index foo\n```\n\nThe next step is to ensure that our models stay in sync with the index.\n\n## Add model signal handlers to update index\n\nIf the setting `auto_sync` is True, then on `AppConfig.ready` each model\nconfigured for use in an index has its `post_save` and `post_delete`\nsignals connected. This means that they will be kept in sync across all\nindexes that they appear in whenever the relevant model method is\ncalled. (There is some very basic caching to prevent too many updates -\nthe object document is cached for one minute, and if there is no change\nin the document the index update is ignored.)\n\nThere is a **VERY IMPORTANT** caveat to the signal handling. It will\n**only** pick up on changes to the model itself, and not on related\n(`ForeignKey`, `ManyToManyField`) model changes. If the search document\nis affected by such a change then you will need to implement additional\nsignal handling yourself.\n\nIn addition to `object.save()`, SeachDocumentMixin also provides the\n`update_search_index(self, action, index='_all', update_fields=None,\nforce=False)` method. Action should be 'index', 'update' or 'delete'.\nThe difference between 'index' and 'update' is that 'update' is a\npartial update that only changes the fields specified, rather than\nre-updating the entire document. 
If `action` is 'update' whilst\n`update_fields` is None, action will be changed to `index`.\n\nWe now have documents in our search index, kept up to date with their\nDjango counterparts. We are ready to start querying ES.\n\n---\n\n# Search Queries (How to Search)\n\n## Running search queries\n\n**SearchQuery**\n\nThe `elasticsearch_django.models.SearchQuery` model wraps this\nfunctionality up and provides helper properties, as well as logging the\nquery:\n\n```python\nfrom elasticsearch_django.settings import get_client\nfrom elasticsearch_django.models import execute_search\n\n# run a default match_all query\nsq = execute_search(index=\"foo\", query={\"match_all\": {}})\n# the raw response is stored on the return object,\n# but is not stored on the object in the database.\nprint(sq.response)\n```\n\nCalling the `execute_search` function will execute the underlying\nsearch, log the query JSON, the number of hits, and the list of hit meta\ninformation for future analysis. The `execute` method also includes\nthese additional kwargs:\n\n* `user` - the user who is making the query, useful for logging\n* `search_terms` - the search query supplied by the user (as opposed to\n  the DSL) - not used by ES, but stored in the logs\n* `reference` - a free text reference field - used for grouping searches\n  together - could be session id.\n* `save` - by default the SearchQuery created will be saved, but passing\n  in False will prevent this.\n\n## Converting search hits into Django objects\n\nRunning a search against an index will return a page of results, each\ncontaining the `_source` attribute which is the search document itself\n(as created by the `SearchDocumentMixin.as_search_document` method),\ntogether with meta info about the result - most significantly the\nrelevance **score**, which is the magic value used for ranking\n(ordering) results. 
However, the search document probably doesn't
contain all of the information that you need to display the result,
so what you really need is a standard Django QuerySet, containing the
objects in the search results, but maintaining the order. This means
injecting the ES score into the queryset, and then using it for
ordering. There is a method on the `SearchDocumentManagerMixin` called
`from_search_query` which will do this for you. It uses raw SQL to add
the score as an annotation to each object in the queryset. (It also adds
the 'rank', so that even if the score is identical for all hits, the
ordering is preserved.)

```python
from models import BlogPost

# run a default match_all query
sq = execute_search(index="blog", query={"match_all": {}})
for obj in BlogPost.objects.from_search_query(sq):
    print(obj.search_score, obj.search_rank)
```
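
The core idea behind `from_search_query` can be sketched without Django:
given the hit list from a search response, annotate each object with its
score and rank, then order by rank so ties on score still preserve the
ES ordering. The class, function and hit data below are illustrative
stand-ins, not the library's internals:

```python
class BlogPost:
    # minimal stand-in for a Django model instance
    def __init__(self, id, title):
        self.id, self.title = id, title


def order_by_search_rank(objects, hits):
    """Annotate objects with search_score / search_rank, sort by rank."""
    rank = {hit["_id"]: i for i, hit in enumerate(hits, start=1)}
    score = {hit["_id"]: hit["_score"] for hit in hits}
    for obj in objects:
        obj.search_rank = rank[str(obj.id)]
        obj.search_score = score[str(obj.id)]
    # rank, not score, drives the sort: identical scores keep ES order
    return sorted(objects, key=lambda o: o.search_rank)


hits = [  # as found in an ES response hit list, best match first
    {"_id": "2", "_score": 1.8},
    {"_id": "1", "_score": 0.4},
]
posts = [BlogPost(1, "first"), BlogPost(2, "second")]
ordered = order_by_search_rank(posts, hits)
print([(p.id, p.search_rank) for p in ordered])  # [(2, 1), (1, 2)]
```

The real implementation does this inside the database with raw SQL
annotations, so the result remains a genuine QuerySet.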