wagtail-meilisearch

* Name: wagtail-meilisearch
* Version: 0.17.1
* Home page: https://github.com/hactar-is/wagtail-meilisearch
* Summary: A MeiliSearch backend for Wagtail
* Upload time: 2024-10-01 09:28:09
* Author: Hactar
* Requires Python: <4.0,>=3.8
* License: MIT
* Keywords: wagtail, django, search, meilisearch

# Wagtail MeiliSearch

This is a (beta) Wagtail search backend for the [MeiliSearch](https://github.com/meilisearch/MeiliSearch) search engine.


## Installation

`poetry add wagtail_meilisearch` or `pip install wagtail_meilisearch`

## Upgrading

If you're upgrading MeiliSearch from 0.9.x to anything higher, you will need to destroy and re-create MeiliSearch's data.ms directory.

## Configuration

See the [MeiliSearch docs](https://docs.meilisearch.com/guides/advanced_guides/installation.html#environment-variables-and-flags) for info on the values you want to add here.

```
import os

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        # Address of the MeiliSearch server; HOST includes the scheme.
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        # The master key the MeiliSearch server was started with.
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
}
```
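Before running a long indexing job it can help to confirm that the server named in these settings is reachable. The following is a minimal sketch using only the standard library, not part of wagtail-meilisearch. It assumes the server URL is simply `HOST` and `PORT` joined with a colon, and it calls MeiliSearch's `/health` endpoint; the function names `health_url` and `meilisearch_is_available` are made up for this example.

```python
import json
import urllib.request


def health_url(host: str, port: str) -> str:
    """Build '<host>:<port>/health' (assumes HOST carries the scheme, as above)."""
    return f"{host.rstrip('/')}:{port}/health"


def meilisearch_is_available(host: str, port: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with status 'available'."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return json.load(resp).get("status") == "available"
    except (OSError, ValueError, AttributeError):
        # Connection refused, timeout, HTTP error, or an unexpected body.
        return False


# Example call, using the same defaults as the settings above:
# meilisearch_is_available('http://127.0.0.1', '7700')
```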

## Update strategies

Indexing a very large site with `python manage.py update_index` can be taxing on the CPU, take a long time, and reduce the responsiveness of the MeiliSearch server. Wagtail-MeiliSearch offers two main update strategies, `soft` and `hard`, plus the `delta` strategy described further down. The default `soft` strategy makes an "add or update" call for each document sent to it, while the `hard` strategy deletes every document in the index and then replaces them.

There are tradeoffs with either strategy: `hard` guarantees that your search data matches your model data, but works the CPU hard for longer. `soft` is faster and less CPU intensive, but if a field is removed from your model between indexings, that field's data will remain in the search index.

One useful trick is to tell Wagtail that you have two search backends, with the default backend set to do `soft` updates that you can run nightly, and a second backend with `hard` updates that you can run less frequently.

```
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
    'hard': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'hard'
    }
}
```

If you use this technique, remember to pass the backend name into the `update_index` command, otherwise both will run.

* `python manage.py update_index --backend default` for a soft update
* `python manage.py update_index --backend hard` for a hard update
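The two backends differ only in `UPDATE_STRATEGY`, so the shared connection settings can be written once. This is just one way to lay out the settings file, using plain dict unpacking; `MEILISEARCH_BASE` is a name invented for this example and is not something wagtail-meilisearch looks for.

```python
import os

# Connection settings shared by every backend entry (hypothetical helper name).
MEILISEARCH_BASE = {
    'BACKEND': 'wagtail_meilisearch.backend',
    'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
    'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
    'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
}

WAGTAILSEARCH_BACKENDS = {
    # No UPDATE_STRATEGY key, so this entry uses the default `soft` strategy.
    'default': dict(MEILISEARCH_BASE),
    # Same server, but every run deletes and replaces the documents.
    'hard': {**MEILISEARCH_BASE, 'UPDATE_STRATEGY': 'hard'},
}
```

Each entry gets its own copy of the base dict, so changing one backend later does not silently change the other.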

### Delta strategy

The `delta` strategy is useful if you habitually add `created_at` and `updated_at` timestamps to your models. This strategy will check the fields...

* `first_published_at`
* `last_published_at`
* `created_at`
* `updated_at`

It then updates only the records for objects where one or more of these fields has a date more recent than the time delta specified in the settings.

```
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        # Only re-index objects whose timestamps fall within the last week.
        'UPDATE_DELTA': {
            'weeks': -1
        }
    }
}
```

If the delta is set to `{'weeks': -1}`, wagtail-meilisearch will only update indexes for documents where one of the timestamp fields has a date within the last week. Your time delta _must_ be negative.

Under the hood we use [Arrow](https://arrow.readthedocs.io), so you can use any keyword args supported by [Arrow's `shift()`](https://arrow.readthedocs.io/en/latest/index.html#replace-shift).

If you set `UPDATE_STRATEGY` to `delta` but don't provide a value for `UPDATE_DELTA`, wagtail-meilisearch will default to `{'weeks': -1}`.
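To make the delta check concrete, here is a rough sketch of the idea, not the package's actual code. It uses the standard library's `timedelta` as a stand-in for Arrow's `shift()`, so it only accepts units `timedelta` knows about (`weeks`, `days`, `hours` and so on, but not `months` or `years`). The function names and the rejection of non-negative values are choices made for this example.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

# The four timestamp fields listed above.
TIMESTAMP_FIELDS = ('first_published_at', 'last_published_at', 'created_at', 'updated_at')


def cutoff_for(update_delta, now):
    """Shift `now` by a negative delta such as {'weeks': -1} to get the cutoff."""
    if any(value >= 0 for value in update_delta.values()):
        raise ValueError('UPDATE_DELTA values must be negative')
    return now + timedelta(**update_delta)


def changed_since(obj, cutoff):
    """True if any timestamp field that exists and is set is newer than the cutoff."""
    stamps = (getattr(obj, name, None) for name in TIMESTAMP_FIELDS)
    return any(stamp is not None and stamp > cutoff for stamp in stamps)


now = datetime(2024, 10, 1, tzinfo=timezone.utc)
cutoff = cutoff_for({'weeks': -1}, now)  # 2024-09-24 00:00 UTC

# Stand-in objects; real ones would be model instances.
fresh = SimpleNamespace(updated_at=now - timedelta(days=2))
stale = SimpleNamespace(created_at=now - timedelta(days=30), updated_at=None)

to_index = [obj for obj in (fresh, stale) if changed_since(obj, cutoff)]
```

Here only `fresh` ends up in `to_index`, because `stale` has no timestamp inside the last week.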

## Skip models

Sometimes a site has a page model that is guaranteed not to change, for instance an archive section. After creating your initial search index, you can add a `SKIP_MODELS` key to the config to tell wagtail-meilisearch to ignore specific models when running `update_index`. Behind the scenes, wagtail-meilisearch returns a dummy model index to the `update_index` management command for every model listed in `SKIP_MODELS`. This means the setting only affects `update_index`, so if you manually edit an instance of one of the listed models, it should still get re-indexed by the update signal.

```
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        'SKIP_MODELS': [
            'core.ArchivePage',
        ]
    }
}
```
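The "dummy model index" idea can be illustrated with a small, self-contained sketch. This is not the package's implementation: the class and function names are invented, the exact-match comparison on `'app_label.ModelName'` is an assumption, and the two methods shown are simply the ones a bulk indexing run would be expected to call.

```python
class DummyModelIndex:
    """Accepts the calls a bulk indexing run makes and does nothing with them."""

    def add_model(self, model):
        pass

    def add_items(self, model, items):
        pass


class RecordingIndex:
    """Hypothetical stand-in for a real index; it only remembers what it was given."""

    def __init__(self):
        self.items = []

    def add_model(self, model):
        pass

    def add_items(self, model, items):
        self.items.extend(items)


def index_for(model_label, skip_models, real_index):
    """Hand back the no-op index for labels listed in SKIP_MODELS."""
    return DummyModelIndex() if model_label in skip_models else real_index


real = RecordingIndex()
skip = ['core.ArchivePage']

# The archive page is swallowed by the dummy index; the blog page is indexed.
index_for('core.ArchivePage', skip, real).add_items(None, ['archive-1'])
index_for('core.BlogPage', skip, real).add_items(None, ['blog-1'])
```

Because the caller never needs to know which kind of index it received, the skip logic stays out of the indexing loop itself.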

## Stop Words

Stop words are words whose frequency we don't want to treat as significant. For instance, the search query `tom and jerry` would return far less relevant results if the word `and` were given the same importance as `tom` and `jerry`. A fairly sane list of English stop words is supplied, but you can also supply your own. This is particularly useful if you have a lot of content in another language.

```
MY_STOP_WORDS = ['a', 'list', 'of', 'words']

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}
```

Alternatively, you can extend the built-in list.

```
from wagtail_meilisearch.settings import STOP_WORDS

# WELSH_STOP_WORDS and FRENCH_STOP_WORDS are your own lists of strings,
# defined or imported elsewhere in your settings.
MY_STOP_WORDS = STOP_WORDS + WELSH_STOP_WORDS + FRENCH_STOP_WORDS

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}
```
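When several lists are combined, the same word can appear more than once or in different cases. A small helper like the one below tidies that up before the list goes into `STOP_WORDS`. It is only a sketch: `merge_stop_words` is a made-up name, and the two sample lists are short stand-ins rather than the lists shipped with the package.

```python
def merge_stop_words(*word_lists):
    """Combine stop word lists into one sorted, de-duplicated, lowercase list."""
    merged = set()
    for words in word_lists:
        merged.update(word.strip().lower() for word in words if word.strip())
    return sorted(merged)


# Stand-in lists for illustration only; in a real project the first would be
# the STOP_WORDS imported from wagtail_meilisearch.settings.
ENGLISH_SAMPLE = ['a', 'and', 'the']
WELSH_SAMPLE = ['a', 'ac', 'y', 'Yr']

MY_STOP_WORDS = merge_stop_words(ENGLISH_SAMPLE, WELSH_SAMPLE)
```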


## Query limits

If you have a lot of documents in your database, the final database query can be quite a heavy load. Because Meilisearch ranks results by relevance, it's usually pretty safe to restrict the number of documents Meilisearch returns, and therefore the number of documents your app needs to fetch from the database. The limit is **per model**, so if your project has 10 page types and you set a limit of 1000, a single search could return up to 10,000 results.

```
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'QUERY_LIMIT': 1000
    },
}
```
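Since the limit applies per model, it can be worth working backwards from the total number of rows you are willing to fetch. The arithmetic is sketched below; both function names are invented for this example.

```python
def max_results(query_limit: int, indexed_models: int) -> int:
    """Upper bound on documents fetched from the database for one search."""
    return query_limit * indexed_models


def limit_for_budget(total_budget: int, indexed_models: int) -> int:
    """Largest per-model QUERY_LIMIT that keeps the total within the budget."""
    return total_budget // indexed_models


worst_case = max_results(1000, 10)        # the example from the text
per_model = limit_for_budget(2500, 10)    # keeps one search under 2500 rows
```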

## Contributing

If you want to help with the development I'd be more than happy. The vast majority of the heavy lifting is done by MeiliSearch itself, but there is a TODO list...


### TODO

* Faceting
* Write tests
* Performance improvements
* Make use of the async in meilisearch-python
* ~~Implement boosting in the sort algorithm~~
* ~~Implement stop words~~
* ~~Search results~~
* ~~Add support for the autocomplete api~~
* ~~Ensure we're getting results by relevance~~

## Change Log

#### 0.17.1
* Fixes a bug where multi_search can fail when a model index doesn't exist. For models that have no documents, meilisearch doesn't create the empty index, so we need to check active indexes before calling multi_search, otherwise the entire call fails.

#### 0.17.0
* A few small performance and reliability improvements, and a lot of refactoring of the code into multiple files to make future development a bit simpler.

#### 0.16.0
* Thanks to @BertrandBordage, a massive speed improvement through using the /multi-search endpoint introduced in Meilisearch 1.1.0

#### 0.14.0
* Adds Django 4 support and compatibility with the latest meilisearch server (0.30.2) and meilisearch python (0.23.0)

#### 0.14.0
* Updates to work with the latest versions of Meilisearch (v0.28.1) and meilisearch-python (^0.19.1)

#### 0.13.0
* Yanked, sorry

#### 0.12.0
* Adds QUERY_LIMIT option to settings

#### 0.11.0
* Compatibility changes to keep up with MeiliSearch and [meilisearch-python](https://github.com/meilisearch/meilisearch-python)
* We've also switched to more closely tracking the major and minor version numbers of meilisearch-python so that it's easier to see compatibility at a glance.
* Note: if you're upgrading from an old version of MeiliSearch you may need to destroy MeiliSearch's data directory and start with a clean index.

#### 0.1.5
* Adds the delta update strategy
* Adds the SKIP_MODELS setting
* Adds support for using boost on your search fields


### Thanks

Thank you to the devs of [Wagtail-Whoosh](https://github.com/wagtail/wagtail-whoosh). Reading the code over there was the only way I could work out how Wagtail Search backends are supposed to work.

            
