<img src="https://raw.githubusercontent.com/wallneradam/esorm/main/docs/_static/img/esorm.svg" width="110" height="110" align="left" style="margin-right: 1em; margin-bottom: 0.5em" alt="Logo"/>
# ESORM - Python ElasticSearch ORM based on Pydantic
ESORM is an ElasticSearch Object Relational Mapper (ORM), or Object Document Mapper (ODM) if you prefer,
for Python, based on Pydantic. It is a high-level library for managing ElasticSearch documents
in Python. It is fully async and uses annotations and type hints for type checking and IDE autocompletion.
## ☰ Table of Contents
- [💾 Installation](#installation)
- [🚀 Features](#features)
  - [Supported ElasticSearch versions](#supported-elasticsearch-versions)
  - [Supported Python versions](#supported-python-versions)
- [📖 Usage](#usage)
  - [Define a model](#define-a-model)
    - [Python basic types](#python-basic-types)
    - [ESORM field types](#esorm-field-types)
    - [Nested documents](#nested-documents)
    - [List primitive fields](#list-primitive-fields)
    - [ESBaseModel](#esbasemodel)
    - [Id field](#id-field)
    - [Model Settings](#model-settings)
    - [Describe fields](#describe-fields)
    - [ESModelTimestamp](#esmodeltimestamp)
  - [Connecting to ElasticSearch](#connecting-to-elasticsearch)
    - [Client](#client)
  - [Create index templates](#create-index-templates)
  - [Create indices and mappings](#create-indices-and-mappings)
  - [Model instances](#model-instances)
  - [CRUD: Create](#crud-create)
  - [CRUD: Read](#crud-read)
  - [CRUD: Update](#crud-update)
  - [CRUD: Delete](#crud-delete)
  - [Bulk operations](#bulk-operations)
  - [Search](#search)
    - [General search](#general-search)
    - [Search with field value terms (dictionary search)](#search-with-field-value-terms-dictionary-search)
  - [Aggregations](#aggregations)
  - [Pagination and sorting](#pagination-and-sorting)
- [🔬 Advanced usage](docs/advanced.md#advanced-usage)
  - [Optimistic concurrency control](docs/advanced.md#optimistic-concurrency-control)
  - [Lazy properties](docs/advanced.md#lazy-properties)
  - [Shard routing](docs/advanced.md#shard-routing)
  - [Retrieve Selected Fields Only](docs/advanced.md#retreive-selected-fields-only)
  - [Watchers](docs/advanced.md#watchers)
  - [FastAPI integration](docs/advanced.md#fastapi-integration)
- [🧪 Testing](#testing)
- [🛡 License](#license)
- [📃 Citation](#citation)
<a id="installation"></a>
## 💾 Installation
```bash
pip install pyesorm
```
<a id="features"></a>
## 🚀 Features
- Pydantic model representation of ElasticSearch documents
- Automatic mapping and index creation
- CRUD operations
- Full async support (no sync version at all)
- Mapping to and from ElasticSearch types
- Support for nested documents
- Automatic optimistic concurrency control
- Custom id field
- Context for bulk operations
- IDE autocompletion and type checking support (tested with PyCharm)
- Everything in the source code is documented and annotated
- `TypedDict`s for ElasticSearch queries and aggregations
- Docstring support for fields
- Shard routing support
- Lazy properties
- Support >= Python 3.8 (tested with 3.8 through 3.12)
- Support for ElasticSearch 8.x and 7.x
- Watcher support (You may need ElasticSearch subscription license for this)
- Pagination and sorting
- FastAPI integration
Not all ElasticSearch features are supported yet; pull requests are welcome.
<a id="supported-elasticsearch-versions"></a>
### Supported ElasticSearch versions
It is tested with ElasticSearch 7.x and 8.x.
<a id="supported-python-versions"></a>
### Supported Python versions
Tested with Python 3.8 through 3.12.
<a id="usage"></a>
## 📖 Usage
<a id="define-a-model"></a>
### Define a model
You can use all [Pydantic](https://pydantic-docs.helpmanual.io/usage/models/) model features, because `ESModel` is a subclass of `pydantic.BaseModel`.
(Actually it is a subclass of `ESBaseModel`, see more [below...](#esbasemodel))
`ESModel` extends the Pydantic `BaseModel` with ElasticSearch-specific features. It serializes and deserializes
documents to and from ElasticSearch types and handles ElasticSearch operations in the background.
<a id="python-basic-types"></a>
#### Python basic types
```python
from esorm import ESModel


class User(ESModel):
    name: str
    age: int
```
This is how Python types are converted to ES types:
| Python type | ES type | Comment |
|---------------------|-----------|-----------------------------|
| `str` | `text` | |
| `int` | `long` | |
| `float` | `double` | |
| `bool` | `boolean` | |
| `datetime.datetime` | `date` | |
| `datetime.date` | `date` | |
| `datetime.time` | `date` | Stored as 1970-01-01 + time |
| `typing.Literal` | `keyword` | |
| `UUID` | `keyword` | |
| `Path` | `keyword` | |
| `IntEnum` | `integer` | |
| `Enum` | `keyword` | also StrEnum |
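The `datetime.time` row is the least obvious one, so here is a minimal standard-library sketch of what "stored as 1970-01-01 + time" means. The helper names `time_to_es_datetime` and `es_datetime_to_time` are hypothetical and only illustrate the idea; they are not part of ESORM.

```python
from datetime import date, datetime, time

# The epoch date used as the fixed "carrier" day for time-only values
EPOCH_DATE = date(1970, 1, 1)


def time_to_es_datetime(value: time) -> datetime:
    """Anchor a time of day on the epoch date (illustrative helper, not ESORM API)."""
    return datetime.combine(EPOCH_DATE, value)


def es_datetime_to_time(value: datetime) -> time:
    """Drop the carrier date again and keep only the time of day."""
    return value.time()


stored = time_to_es_datetime(time(14, 30, 5))
print(stored.isoformat())  # 1970-01-01T14:30:05
print(es_datetime_to_time(stored))  # 14:30:05
```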
Some special Pydantic types are also supported:
| Pydantic type | ES type | Comment |
|-----------------|-----------|---------|
| `URL` | `keyword` | |
| `IPvAnyAddress` | `ip` | |
<a id="esorm-field-types"></a>
#### ESORM field types
You can specify ElasticSearch-specific field types using the `esorm.fields` module.
```python
from esorm import ESModel
from esorm.fields import keyword, text, byte, geo_point


class User(ESModel):
    name: text
    email: keyword
    age: byte
    location: geo_point
    ...
```
The supported fields are:
| Field name | ES type |
|-----------------------------|-----------------|
| `keyword` | `keyword` |
| `text` | `text` |
| `binary` | `binary` |
| `byte` | `byte` |
| `short` | `short` |
| `integer` or `int32` | `integer` |
| `long` or `int64` | `long` |
| `unsigned_long` or `uint64` | `unsigned_long` |
| `float16` or `half_float` | `half_float` |
| `float32` | `float` |
| `double` | `double` |
| `boolean` | `boolean` |
| `geo_point` | `geo_point` |
The `binary` field accepts **base64** encoded strings. However, if you provide `bytes` to it, they
will be automatically converted to a **base64** string during serialization. When you retrieve the
field, it will always be a **base64** encoded string. You can easily convert it back to bytes using
the `bytes()` method: `binary_field.bytes()`.
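For readers unfamiliar with the encoding itself, this is the round trip the paragraph above describes, shown with Python's standard `base64` module only. It is a sketch of the conversion, not of ESORM's internal code, and the sample payload is made up.

```python
import base64

# Arbitrary binary payload (made-up sample data)
raw = b"\x00\x01binary payload"

# What gets stored: a base64 encoded ASCII string
encoded = base64.b64encode(raw).decode("ascii")

# What you do after reading the field back: decode to bytes again
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)  # True
```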
You can also use `Annotated` types to specify the ES type, like Pydantic `PositiveInt` and
`NegativeInt` and similar.
##### geo_point
You can use the `geo_point` field type for location data:
```python
from esorm import ESModel
from esorm.fields import geo_point


class Place(ESModel):
    name: str
    location: geo_point


async def create_place():
    place = Place(name='Budapest', location=geo_point(lat=47.4979, long=19.0402))
    # ESORM is fully async, so `save` must be awaited
    await place.save()
```
<a id="nested-documents"></a>
#### Nested documents
```python
from esorm import ESModel
from esorm.fields import keyword, text, byte


class User(ESModel):
    name: text
    email: keyword
    age: byte = 18


class Post(ESModel):
    title: text
    content: text
    writer: User  # User is a nested document
```
<a id="list-primitive-fields"></a>
#### List primitive fields
You can use lists of primitive types as fields:
```python
from typing import List
from esorm import ESModel


class User(ESModel):
    emails: List[str]
    favorite_ids: List[int]
    ...
```
<a id="esbasemodel"></a>
#### ESBaseModel
`ESBaseModel` is the base of `ESModel`.
##### Use it for abstract models
```python
from esorm import ESModel, ESBaseModel
from esorm.fields import keyword, text, byte


# This way the `BaseUser` model won't be in the index
class BaseUser(ESBaseModel):  # <---------------
    # This config will be inherited by UserExtended
    class ESConfig:
        id_field = 'email'

    name: text
    email: keyword


# This will be in the index because it is a subclass of ESModel
class UserExtended(BaseUser, ESModel):
    age: byte = 18


async def create_user():
    user = UserExtended(
        name='John Doe',
        email="john@example.com",
        age=25
    )
    await user.save()
```
##### Use it for nested documents
`ESBaseModel` is also useful for nested documents, because a model based on it does not get
its own ElasticSearch index.
```python
from esorm import ESModel, ESBaseModel
from esorm.fields import keyword, text, byte


# This way `User` model won't be in the index
class User(ESBaseModel):  # <---------------
    name: text
    email: keyword
    age: byte = 18


class Post(ESModel):
    title: text
    content: text
    writer: User  # User is a nested document
```
<a id="id-field"></a>
#### Id field
You can specify the id field in the [model settings](#model-settings):
```python
from esorm import ESModel
from esorm.fields import keyword, text, byte


class User(ESModel):
    class ESConfig:
        id_field = 'email'

    name: text
    email: keyword
    age: byte = 18
```
This way the field specified in `id_field` will be removed from the document and used as the document `_id` in the
index.
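To make the "removed from the document and used as `_id`" behaviour concrete, here is a plain-Python sketch that works on dictionaries. The function `split_id` is hypothetical and only mirrors the description above; ESORM's own serialization code may be organized differently.

```python
from typing import Any, Dict, Tuple


def split_id(doc: Dict[str, Any], id_field: str) -> Tuple[str, Dict[str, Any]]:
    """Return (_id, source) where the id field no longer appears in the source.

    Illustrative helper only, not part of ESORM.
    """
    source = dict(doc)  # copy, so the caller's dict is left untouched
    doc_id = str(source.pop(id_field))
    return doc_id, source


doc_id, source = split_id(
    {'name': 'John Doe', 'email': 'john@example.com', 'age': 18},
    id_field='email',
)
print(doc_id)  # john@example.com
print(source)  # {'name': 'John Doe', 'age': 18}
```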
If you specify a field named `id` in your model, it will be used as the document `_id` in the index
(it will automatically override the `id_field` setting):
```python
from esorm import ESModel


class User(ESModel):
    id: int  # This will be used as the document _id in the index
    name: str
```
You can also create an `__id__` property in your model to return a custom id:
```python
from esorm import ESModel
from esorm.fields import keyword, text, byte


class User(ESModel):
    name: text
    email: keyword
    age: byte = 18

    @property
    def __id__(self) -> str:
        return self.email
```
NOTE: the return type annotation of `__id__` is important, and `__id__` must be declared as a property.
<a id="model-settings"></a>
#### Model Settings
You can specify model settings using an inner `ESConfig` class.
```python
from typing import Optional, List, Dict, Any
from esorm import ESModel


class User(ESModel):
    class ESConfig:
        """ ESModel Config """
        # The index name
        index: Optional[str] = None
        # The name of the 'id' field
        id_field: Optional[str] = None
        # Default sort
        default_sort: Optional[List[Dict[str, Dict[str, str]]]] = None
        # ElasticSearch index settings (https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html)
        settings: Optional[Dict[str, Any]] = None
        # Maximum recursion depth of lazy properties
        lazy_property_max_recursion_depth: int = 1
```
<a id="esmodeltimestamp"></a>
#### ESModelTimestamp
You can use the `ESModelTimestamp` class to add `created_at` and `updated_at` fields to your model:
```python
from esorm import ESModelTimestamp


class User(ESModelTimestamp):
    name: str
    age: int
```
These fields are automatically set to the current `datetime` when you create or update a document.
The `created_at` field is set only when you create a document. The `updated_at` field is set
both when you create and when you update a document.
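The rule above can be pictured with a small standard-library sketch. `TimestampedDoc` and its `touch` method are hypothetical names used only for illustration; they do not show how `ESModelTimestamp` is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TimestampedDoc:
    """Toy document that follows the created_at / updated_at rule (not ESORM code)."""
    name: str
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def touch(self) -> None:
        """Call on every save: set created_at once, refresh updated_at always."""
        now = datetime.now(timezone.utc)
        if self.created_at is None:
            self.created_at = now
        self.updated_at = now


doc = TimestampedDoc(name='John Doe')
doc.touch()  # first save: both fields are set
first_created = doc.created_at
doc.touch()  # later save: only updated_at moves forward
print(doc.created_at == first_created)  # True
```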
<a id="describe-fields"></a>
#### Describe fields
You can use the usual `Pydantic` field description, but you can also use docstrings like this:
```python
from esorm import ESModel
from esorm.fields import TextField


class User(ESModel):
    name: str = 'John Doe'
    """ The name of the user """
    age: int = 18
    """ The age of the user """

    # This is the usual Pydantic way, but I think docstrings are more intuitive and readable
    address: str = TextField(description="The address of the user")
```
The descriptions are useful if you build an API and want to generate documentation from the model.
It can be used in [FastAPI](https://fastapi.tiangolo.com/) for example.
<a id="aliases"></a>
### Aliases
You can specify aliases for fields:
```python
from esorm import ESModel
from esorm.fields import keyword, Field


class User(ESModel):
    full_name: keyword = Field(alias='fullName')  # In ES `fullName` will be the field name
```
This is useful for renaming a field in the model without changing the ElasticSearch field name.
<a id="connecting-to-elasticsearch"></a>
### Connecting to ElasticSearch
You can connect with a simple connection string:
```python
from esorm import connect


async def es_init():
    await connect('localhost:9200')
```
You can also connect to multiple hosts if you have a cluster:
```python
from esorm import connect


async def es_init():
    await connect(['localhost:9200', 'localhost:9201'])
```
You can wait for the node or cluster to be ready (recommended):
```python
from esorm import connect


async def es_init():
    await connect('localhost:9200', wait=True)
```
This will ping the node at 2-second intervals until it is ready, which can take a long time.
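The waiting behaviour described above is essentially a polling loop. The sketch below shows that pattern with `asyncio` only; `wait_until_ready`, the fake `ping` coroutine and the `max_attempts` limit are all assumptions made for illustration, and `connect(wait=True)` itself may behave differently (for example, the text above does not mention any attempt limit).

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_until_ready(ping: Callable[[], Awaitable[bool]],
                           interval: float = 2.0,
                           max_attempts: Optional[int] = None) -> int:
    """Poll `ping` until it returns True; return the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        try:
            if await ping():
                return attempts
        except ConnectionError:
            pass  # node not reachable yet, try again after the pause
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f'not ready after {attempts} attempts')
        await asyncio.sleep(interval)


def make_fake_ping(ready_after: int) -> Callable[[], Awaitable[bool]]:
    """Build a fake ping that starts succeeding on the given call number."""
    calls = {'n': 0}

    async def ping() -> bool:
        calls['n'] += 1
        return calls['n'] >= ready_after

    return ping


# Short interval so the demo finishes quickly
attempts = asyncio.run(wait_until_ready(make_fake_ping(3), interval=0.01))
print(attempts)  # 3
```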
You can pass any arguments that `AsyncElasticsearch` supports:
```python
from esorm import connect


async def es_init():
    await connect('localhost:9200', wait=True, sniff_on_start=True, sniff_on_connection_fail=True)
```
<a id="client"></a>
#### Client
The `connect` function is a wrapper for the `AsyncElasticsearch` constructor. It creates and stores
a global instance of a proxy to an `AsyncElasticsearch` instance. The model operations use this
instance to communicate with ElasticSearch. You can import the proxy client and use it
the same way as an `AsyncElasticsearch` instance:
```python
from esorm import es


async def es_init():
    await es.ping()
```
<a id="create-index-templates"></a>
### Create index templates
You can create index templates easily:
```python
from esorm import model as esorm_model


# Create index template
async def prepare_es():
    await esorm_model.create_index_template('default_template',
                                            prefix_name='esorm_',
                                            shards=3,
                                            auto_expand_replicas='1-5')
```
This template is applied to all indices with the `esorm_` prefix (the default).
All indices created by ESORM have a prefix, which you can modify globally if you want:
```python
from esorm.model import set_default_index_prefix
set_default_index_prefix('custom_prefix_')
```
The default prefix is `esorm_`.
<a id="create-indices-and-mappings"></a>
### Create indices and mappings
You can create indices and mappings automatically from your models:
```python
from esorm import setup_mappings


# Create indices and mappings
async def prepare_es():
    import models  # Import your models
    # Here models argument is not needed, but you can pass it to prevent unused import warning
    await setup_mappings(models)
```
First you must create (import) all model classes. Model classes are registered in a global registry.
Then you can call the `setup_mappings` function to create indices and mappings for all registered models.
**IMPORTANT:** This method ignores mapping errors if an index with the same name already exists. It can
add new fields to existing indices, but it cannot modify or delete existing fields. For that you need to
reindex your ES database. This is an ElasticSearch limitation.
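Before running `setup_mappings` against an existing index, it can help to know which of your model changes are purely additive. The sketch below compares two `properties` dictionaries in plain Python; `classify_mapping_changes` and the sample mappings are hypothetical and not part of ESORM.

```python
from typing import Any, Dict, List, Tuple


def classify_mapping_changes(current: Dict[str, Any],
                             desired: Dict[str, Any]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, changed, removed) field names between two `properties` dicts."""
    added = sorted(name for name in desired if name not in current)
    removed = sorted(name for name in current if name not in desired)
    changed = sorted(name for name in desired
                     if name in current and desired[name] != current[name])
    return added, changed, removed


# Made-up example mappings
current = {'name': {'type': 'text'}, 'age': {'type': 'long'}}
desired = {'name': {'type': 'keyword'}, 'age': {'type': 'long'}, 'country': {'type': 'keyword'}}

added, changed, removed = classify_mapping_changes(current, desired)
print(added)    # ['country']  -> can be applied to the existing index
print(changed)  # ['name']     -> needs a reindex
print(removed)  # []
```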
<a id="model-instances"></a>
### Model instances
When you get a model instance from ElasticSearch with the `search` or `get` methods, the following private
attributes are filled in automatically:
| Attribute | Description |
|-----------------|-------------------------------------|
| `_id` | The ES id of the document |
| `_routing` | The routing value of the document |
| `_version` | Version of the document |
| `_primary_term` | The primary term of the document |
| `_seq_no` | The sequence number of the document |
<a id="crud-create"></a>
### CRUD: Create
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def create_user():
    # Create a new user
    user = User(name='John Doe', age=25)
    # Save the user to ElasticSearch
    new_user_id = await user.save()
    print(new_user_id)
```
<a id="crud-read"></a>
### CRUD: Read
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def get_user(user_id: str):
    user = await User.get(user_id)
    print(user.name)
```
<a id="crud-update"></a>
### CRUD: Update
On update, race conditions are checked automatically (with the help of the `_primary_term` and `_seq_no` fields).
This way an optimistic locking mechanism is implemented.
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def update_user(user_id: str):
    user = await User.get(user_id)
    user.name = 'Jane Doe'
    await user.save()
```
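To illustrate the optimistic locking idea mentioned above, here is a toy in-memory model of a conditional write. `InMemoryIndex` and `ConflictError` are hypothetical names, and the sketch tracks only a sequence number (ElasticSearch additionally checks the primary term); it is not ESORM's or ElasticSearch's implementation.

```python
from typing import Any, Dict, Optional, Tuple


class ConflictError(Exception):
    """Raised when a write is based on a stale sequence number."""


class InMemoryIndex:
    """Toy stand-in for an index that keeps one sequence number per document."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}
        self._next_seq_no = 0

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        seq_no, source = self._docs[doc_id]
        return seq_no, dict(source)

    def index(self, doc_id: str, source: Dict[str, Any],
              if_seq_no: Optional[int] = None) -> int:
        # Conditional write: refuse if the document changed since it was read
        if if_seq_no is not None:
            current = self._docs.get(doc_id)
            if current is None or current[0] != if_seq_no:
                raise ConflictError(f'stale write to {doc_id!r}')
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = (seq_no, dict(source))
        return seq_no


index = InMemoryIndex()
index.index('u1', {'name': 'John Doe'})

# Two clients read the same version of the document
seq_a, doc_a = index.get('u1')
seq_b, doc_b = index.get('u1')

doc_a['name'] = 'Jane Doe'
index.index('u1', doc_a, if_seq_no=seq_a)  # succeeds, sequence number advances

doc_b['name'] = 'Jim Doe'
try:
    index.index('u1', doc_b, if_seq_no=seq_b)  # based on stale data
    conflict = False
except ConflictError:
    conflict = True

print(conflict)  # True
```

The second writer has to re-read the document and retry, which is the usual way to resolve such a conflict.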
<a id="crud-delete"></a>
### CRUD: Delete
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def delete_user(user_id: str):
    user = await User.get(user_id)
    await user.delete()
```
<a id="bulk-operations"></a>
### Bulk operations
Bulk operations can be much faster than single operations if you have a lot of documents to
create, update or delete.
You can use a context manager for bulk operations:
```python
from typing import List
from esorm import ESModel, ESBulk


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def bulk_create_users():
    async with ESBulk() as bulk:
        # Creating or modifying models
        for i in range(10):
            user = User(name=f'User {i}', age=i)
            await bulk.save(user)


async def bulk_delete_users(users: List[User]):
    async with ESBulk(wait_for=True) as bulk:  # Here we wait for the bulk operation to finish
        # Deleting models
        for user in users:
            await bulk.delete(user)
```
The `wait_for` argument is optional. If it is `True`, the context will wait for the bulk operation to finish.
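For background, the ElasticSearch Bulk API takes a newline-delimited JSON body in which each operation is an action line, followed by a source line for everything except deletes. The sketch below builds such a body with the standard library; `build_bulk_body` and the index name are made up for illustration, and this is not how `ESBulk` is implemented internally.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

# (operation, index name, document id, source or None for deletes)
Action = Tuple[str, str, str, Optional[Dict[str, Any]]]


def build_bulk_body(actions: Iterable[Action]) -> str:
    """Serialize actions into the NDJSON format expected by the Bulk API."""
    lines = []
    for op, index_name, doc_id, source in actions:
        lines.append(json.dumps({op: {'_index': index_name, '_id': doc_id}}))
        if op != 'delete':
            lines.append(json.dumps(source))  # deletes have no source line
    return '\n'.join(lines) + '\n'  # the body must end with a newline


body = build_bulk_body([
    ('index', 'esorm_user', '1', {'name': 'User 1', 'age': 1}),
    ('delete', 'esorm_user', '2', None),
])
print(body, end='')
```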
<a id="search"></a>
### Search
<a id="general-search"></a>
#### General search
You can search for documents using the `search` method, where an ES query can be specified as a dictionary.
You can use the `res_dict=True` argument to get the result as a dictionary instead of a list. The key will be the
`id` of the document: `await User.search(query, res_dict=True)`.
If you only need one result, you can use the `search_one` method.
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def search_users():
    # Search for users at least 18 years old
    users = await User.search(
        query={
            'bool': {
                'must': [{
                    'range': {
                        'age': {
                            'gte': 18
                        }
                    }
                }]
            }
        }
    )
    for user in users:
        print(user.name)


async def search_one_user():
    # Search a user named John Doe
    user = await User.search_one(
        query={
            'bool': {
                'must': [{
                    'match': {
                        'name': {
                            'query': 'John Doe'
                        }
                    }
                }]
            }
        }
    )
    print(user.name)
```
Queries are type checked, because they are annotated as `TypedDict`s. You can use IDE autocompletion and type checking.
<a id="search-with-field-value-terms-dictionary-search"></a>
#### Search with field value terms (dictionary search)
You can search for documents using the `search_by_fields` method, where you specify field names and values as a dictionary.
It also has a `res_dict` argument and a `search_one_by_fields` variant.
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int


async def search_users():
    # Search for users whose age is 18
    users = await User.search_by_fields({'age': 18})
    for user in users:
        print(user.name)
```
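One plausible way to picture a dictionary search is as a `bool` query with one `term` clause per field. The helper `fields_to_query` below is hypothetical and written only to illustrate that idea; the query ESORM actually builds may differ.

```python
from typing import Any, Dict, Union

FieldValue = Union[str, int, float, bool]


def fields_to_query(fields: Dict[str, FieldValue]) -> Dict[str, Any]:
    """Translate {'field': value} pairs into a bool/must query (illustrative only)."""
    return {
        'bool': {
            'must': [
                {'term': {name: {'value': value}}}
                for name, value in fields.items()
            ]
        }
    }


query = fields_to_query({'age': 18})
print(query)  # {'bool': {'must': [{'term': {'age': {'value': 18}}}]}}
```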
<a id="aggregations"></a>
### Aggregations
You can use the `aggregate` method to get aggregations.
You can specify an ES aggregation query as a dictionary. It also accepts normal ES queries,
so that you can filter which documents you want to aggregate.
Both the `aggs` parameter and the `query` parameter are type checked, because they are annotated as `TypedDict`s.
You can use IDE autocompletion and type checking.
```python
from esorm import ESModel


# Here the model has an automatically generated id
class User(ESModel):
    name: str
    age: int
    country: str


async def aggregate_avg():
    # Get average age of users
    aggs_def = {
        'avg_age': {
            'avg': {
                'field': 'age'
            }
        }
    }
    aggs = await User.aggregate(aggs_def)
    print(aggs['avg_age']['value'])


async def aggregate_avg_by_country(country='Hungary'):
    # Get average age of users by country
    aggs_def = {
        'avg_age': {
            'avg': {
                'field': 'age'
            }
        }
    }
    query = {
        'bool': {
            'must': [{
                'match': {
                    'country': {
                        'query': country
                    }
                }
            }]
        }
    }
    aggs = await User.aggregate(aggs_def, query)
    print(aggs['avg_age']['value'])


async def aggregate_terms():
    # Get number of users by country
    aggs_def = {
        'countries': {
            'terms': {
                'field': 'country'
            }
        }
    }
    aggs = await User.aggregate(aggs_def)
    for bucket in aggs['countries']['buckets']:
        print(bucket['key'], bucket['doc_count'])
```
<a id="pagination-and-sorting"></a>
### Pagination and sorting
You can use the `Pagination` and `Sort` classes to decorate your models. They simply wrap your models
and add pagination and sorting functionality to them.
#### Pagination
You can add a callback parameter to the `Pagination` class which will be invoked after the search with
the total number of documents found.
```python
from esorm.model import ESModel, Pagination


class User(ESModel):
    id: int  # This will be used as the document _id in the index
    name: str
    age: int


async def get_users(page=1, page_size=10):
    def pagination_callback(total: int):
        # You may set a header value or something else here
        print(f'Total users: {total}')

    # 1st create the decorator itself; hand `pagination_callback` to it through the
    # callback parameter described above (see the API docs for the exact signature)
    pagination = Pagination(page=page, page_size=page_size)
    # Then decorate your model; the search is a coroutine, so it must be awaited
    res = await pagination(User).search_by_fields({'age': 18})
    # Here the result has at most `page_size` (10 by default) items
    return res
```
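In ElasticSearch terms, a page number and a page size correspond to the `from` and `size` search parameters. The sketch below shows that arithmetic in plain Python; `page_to_from_size` is a hypothetical helper, and whether `Pagination` computes the offset in exactly this way is an assumption.

```python
from typing import Dict


def page_to_from_size(page: int, page_size: int) -> Dict[str, int]:
    """Convert a 1-based page number into ElasticSearch `from` / `size` values."""
    if page < 1 or page_size < 1:
        raise ValueError('page and page_size must be positive')
    return {'from': (page - 1) * page_size, 'size': page_size}


print(page_to_from_size(1, 10))  # {'from': 0, 'size': 10}
print(page_to_from_size(3, 10))  # {'from': 20, 'size': 10}
```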
#### Sorting
It is similar to pagination:
```python
from esorm.model import ESModel, Sort


class User(ESModel):
    id: int  # This will be used as the document _id in the index
    name: str
    age: int


async def get_users():
    # 1st create the decorator itself
    sort = Sort(sort=[
        {'age': {'order': 'desc'}},
        {'name': {'order': 'asc'}}
    ])
    # Then decorate your model; the search is a coroutine, so it must be awaited
    res = await sort(User).search_by_fields({'age': 18})
    # Here the result is sorted by age descending, then by name ascending
    return res


async def get_user_sorted_by_name():
    # You can also use this simplified syntax
    sort = Sort(sort='name')
    # Then decorate your model
    res = await sort(User).all()
    # Here the result is sorted by name
    return res
```
<a id="testing"></a>
## 🧪 Testing
For testing you can use the `test.sh` script in the root directory. It runs the
tests on multiple Python interpreters in virtual environments. At the top of the file you can specify
which Python interpreters you want to test. The ES versions are specified in the `tests/docker-compose.yml` file.
If you already have a virtual environment, simply use `pytest` to run the tests.
<a id="license"></a>
## 🛡 License
This project is licensed under the terms of the [Mozilla Public License 2.0](https://www.mozilla.org/en-US/MPL/2.0/)
(MPL 2.0) license.
<a id="citation"></a>
## 📃 Citation
If you use this project in your research, please cite it using the following BibTeX entry:
```bibtex
@misc{esorm,
author = {Adam Wallner},
title = {ESORM: ElasticSearch Object Relational Mapper},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/wallneradam/esorm}},
}
```
Raw data
{
"_id": null,
"home_page": "https://github.com/wallneradam/esorm",
"name": "pyesorm",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "ElasticSearch, ORM, Pydantic",
"author": "Adam Wallner",
"author_email": "adam.wallner@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/ce/74/1dfe9dbd515b777feb5a3f65c27a8516bb30928949ffa540716c8aa20654/pyesorm-0.6.7.tar.gz",
"platform": null,
"description": "<img src=\"https://raw.githubusercontent.com/wallneradam/esorm/main/docs/_static/img/esorm.svg\" width=\"110\" height=\"110\" align=\"left\" style=\"margin-right: 1em; margin-bottom: 0.5em\" alt=\"Logo\"/>\n\n# ESORM - Python ElasticSearch ORM based on Pydantic\n\nESORM is an ElasticSearch Object Relational Mapper or Object Document Mapper (ODM) if you like,\n for Python based on Pydantic. It is a high-level library for managing ElasticSearch documents\n in Python. It is fully async and uses annotations and type hints for type checking and IDE autocompletion.\n \n## \u2630 Table of Contents\n\n- [\ud83d\udcbe\u2003Installation](#installation)\n- [\ud83d\ude80\u2003Features](#features)\n - [Supported ElasticSearch versions](#supported-elasticsearch-versions)\n - [Supported Python versions](#supported-python-versions)\n- [\ud83d\udcd6\u2003Usage](#usage)\n - [Define a model](#define-a-model)\n - [Python basic types](#python-basic-types)\n - [ESORM field types](#esorm-field-types)\n - [Nested documents](#nested-documents)\n - [List primitive fields](#list-primitive-fields)\n - [ESBaseModel](#esbasemodel)\n - [Id field](#id-field)\n - [Model Settings](#model-settings)\n - [Describe fields](#describe-fields)\n - [ESModelTimestamp](#esmodeltimestamp)\n - [Connecting to ElasticSearch](#connecting-to-elasticsearch)\n - [Client](#client)\n - [Create index templates](#create-index-templates)\n - [Create indices and mappings](#create-indices-and-mappings)\n - [Model instances](#model-instances)\n - [CRUD: Create](#crud-create)\n - [CRUD: Read](#crud-read)\n - [CRUD: Update](#crud-update)\n - [CRUD: Delete](#crud-delete)\n - [Bulk operations](#bulk-operations)\n - [Search](#search)\n - [General search](#general-search)\n - [Search with field value terms (dictioanry search)](#search-with-field-value-terms-dictionary-search)\n - [Aggregations](#aggregations)\n - [Pagination and sorting](#pagination-and-sorting)\n- [\ud83d\udd2c\u2003Advanced 
usage](docs/advanced.md#advanced-usage) \n - [Optimistic concurrency control](docs/advanced.md#optimistic-concurrency-control)\n - [Lazy properties](docs/advanced.md#lazy-properties)\n - [Shard routing](docs/advanced.md#shard-routing)\n - [Retreive Selected Fields Only](docs/advanced.md#retreive-selected-fields-only)\n - [Watchers](docs/advanced.md#watchers)\n - [FastAPI integration](docs/advanced.md#fastapi-integration)\n- [\ud83e\uddea\u2003Testing](#testing)\n- [\ud83d\udee1\u2003License](#license)\n- [\ud83d\udcc3\u2003Citation](#citation)\n\n<a id=\"installation\"></a>\n## \ud83d\udcbe\u2003Installation\n\n\n```bash\npip install pyesorm\n```\n\n<a id=\"features\"></a>\n## \ud83d\ude80\u2003Features\n\n- Pydantic model representation of ElasticSearch documents\n- Automatic mapping and index creation\n- CRUD operations\n- Full async support (no sync version at all)\n- Mapping to and from ElasticSearch types\n- Support for nested documents\n- Automatic optimistic concurrency control\n- Custom id field\n- Context for bulk operations\n- Supported IDE autocompletion and type checking (PyCharm tested)\n- Everything in the source code is documented and annotated\n- `TypedDict`s for ElasticSearch queries and aggregations\n- Docstring support for fields\n- Shard routing support\n- Lazy properties\n- Support >= Python 3.8 (tested with 3.8 through 3.12)\n- Support for ElasticSearch 8.x and 7.x\n- Watcher support (You may need ElasticSearch subscription license for this)\n- Pagination and sorting\n- FastAPI integration\n\nNot all ElasticSearch features are supported yet, pull requests are welcome.\n\n<a id=\"supported-elasticsearch-versions\"></a>\n### Supported ElasticSearch versions\n\nIt is tested with ElasticSearch 7.x and 8.x.\n\n<a id=\"supported-python-versions\"></a>\n### Supported Python versions\n\nTested with Python 3.8 through 3.12.\n\n<a id=\"usage\"></a>\n## \ud83d\udcd6\u2003Usage\n\n<a id=\"define-a-model\"></a>\n### Define a model\n\nYou can use all 
[Pydantic](https://pydantic-docs.helpmanual.io/usage/models/) model features, because `ESModel` is a subclass of `pydantic.BaseModel`.\n(Actually it is a subclass of `ESBaseModel`, see more [below...](#esbasemodel))\n\n`ESModel` extends pydantic `BaseModel` with ElasticSearch specific features. It serializes and deserializes\ndocuments to and from ElasticSearch types and handle ElasticSearch operations in the background.\n\n<a id=\"python-basic-types\"></a>\n#### Python basic types\n\n```python\nfrom esorm import ESModel\n\n\nclass User(ESModel):\n name: str\n age: int\n```\n\nThis is how the python types are converted to ES types:\n\n| Python type | ES type | Comment |\n|---------------------|-----------|-----------------------------|\n| `str` | `text` | |\n| `int` | `long` | |\n| `float` | `double` | |\n| `bool` | `boolean` | |\n| `datetime.datetime` | `date` | |\n| `datetime.date` | `date` | |\n| `datetime.time` | `date` | Stored as 1970-01-01 + time |\n| `typing.Literal` | `keyword` | |\n| `UUID` | `keyword` | |\n| `Path` | `keyword` | |\n| `IntEnum` | `integer` | |\n| `Enum` | `keyword` | also StrEnum |\n\nSome special pydanctic types are also supported:\n\n| Pydantic type | ES type | Comment |\n|-----------------|-----------|---------|\n| `URL` | `keyword` | |\n| `IPvAddressAny` | `ip` | |\n\n\n<a id=\"esorm-field-types\"></a>\n#### ESORM field types\n\nYou can specify ElasticSearch special fields using `esorm.fields` module.\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import keyword, text, byte, geo_point\n\n\nclass User(ESModel):\n name: text\n email: keyword\n age: byte\n location: geo_point\n ...\n```\n\nThe supported fields are:\n\n| Field name | ES type |\n|-----------------------------|-----------------|\n| `keyword` | `keyword` |\n| `text` | `text` |\n| `binary` | `binary` |\n| `byte` | `byte` |\n| `short` | `short` |\n| `integer` or `int32` | `integer` |\n| `long` or `int64` | `long` |\n| `unsigned_long` or `uint64` | `unsigned_long` 
|\n| `float16` or `half_float` | `half_float` |\n| `float32` | `float` |\n| `double` | `double` |\n| `boolean` | `boolean` |\n| `geo_point` | `geo_point` |\n\nThe `binary` field accepts **base64** encoded strings. However, if you provide `bytes` to it, they \nwill be automatically converted to a **base64** string during serialization. When you retrieve the \nfield, it will always be a **base64** encoded string. You can easily convert it back to bytes using \nthe `bytes()` method: `binary_field.bytes()`.\n\nYou can also use `Annotated` types to specify the ES type, like Pydantic `PositiveInt` and \n`NegativeInt` and similar.\n\n##### geo_point\n\nYou can use geo_point field type for location data:\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import geo_point\n\n\nclass Place(ESModel):\n name: str\n location: geo_point\n \n\ndef create_place():\n place = Place(name='Budapest', location=geo_point(lat=47.4979, long=19.0402))\n place.save()\n```\n\n\n<a id=\"nested-documents\"></a>\n#### Nested documents\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import keyword, text, byte\n\n\nclass User(ESModel):\n name: text\n email: keyword\n age: byte = 18\n\n\nclass Post(ESModel):\n title: text\n content: text\n writer: User # User is a nested document\n```\n\n<a id=\"list-primitive-fields\"></a>\n#### List primitive fields\n\nYou can use list of primitive fields:\n\n```python \nfrom typing import List\nfrom esorm import ESModel\n\n\nclass User(ESModel):\n emails: List[str]\n favorite_ids: List[int] \n ... 
\n```\n\n<a id=\"esbasemodel\"></a>\n#### ESBaseModel\n\n`ESBaseModel` is the base of `ESModel`.\n\n##### Use it for abstract models\n\n```python\nfrom esorm import ESModel, ESBaseModel\nfrom esorm.fields import keyword, text, byte\n\n\n# This way `User` model won't be in the index\nclass BaseUser(ESBaseModel): # <---------------\n # This config will be inherited by User\n class ESConfig:\n id_field = 'email' \n \n name: text\n email: keyword\n \n\n# This will be in the index because it is a subclass of ESModel\nclass UserExtended(BaseUser, ESModel):\n age: byte = 18\n\n\nasync def create_user():\n user = UserExtended(\n name='John Doe',\n email=\"john@example.com\",\n age=25\n )\n await user.save()\n```\n\n##### Use it for nested documents \n\nIt is useful to use it for nested documents, because by using it will not be included in the \nElasticSearch index.\n\n```python\nfrom esorm import ESModel, ESBaseModel\nfrom esorm.fields import keyword, text, byte\n\n\n# This way `User` model won't be in the index\nclass User(ESBaseModel): # <---------------\n name: text\n email: keyword\n age: byte = 18\n\n\nclass Post(ESModel):\n title: text\n content: text\n writer: User # User is a nested document\n\n``` \n\n<a id=\"id-field\"></a>\n#### Id field\n\nYou can specify id field in [model settings](#model-settings):\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import keyword, text, byte\n\n\nclass User(ESModel):\n class ESConfig:\n id_field = 'email'\n\n name: text\n email: keyword\n age: byte = 18\n```\n\nThis way the field specified in `id_field` will be removed from the document and used as the document `_id` in the\nindex.\n\nIf you specify a field named `id` in your model, it will be used as the document `_id` in the index\n(it will automatically override the `id_field` setting):\n\n```python\nfrom esorm import ESModel\n\n\nclass User(ESModel):\n id: int # This will be used as the document _id in the index\n name: str\n```\n\nYou can also create an 
`__id__` property in your model to return a custom id:\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import keyword, text, byte\n\n\nclass User(ESModel):\n    name: text\n    email: keyword\n    age: byte = 18\n\n    @property\n    def __id__(self) -> str:\n        return self.email\n```\n\nNOTE: the return type annotation of `__id__` is important, and it must be declared as a property.\n\n<a id=\"model-settings\"></a>\n#### Model Settings\n\nYou can specify model settings using the nested `ESConfig` class.\n\n```python\nfrom typing import Optional, List, Dict, Any\nfrom esorm import ESModel\n\n\nclass User(ESModel):\n    class ESConfig:\n        \"\"\" ESModel Config \"\"\"\n        # The index name\n        index: Optional[str] = None\n        # The name of the 'id' field\n        id_field: Optional[str] = None\n        # Default sort\n        default_sort: Optional[List[Dict[str, Dict[str, str]]]] = None\n        # ElasticSearch index settings (https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html)\n        settings: Optional[Dict[str, Any]] = None\n        # Maximum recursion depth of lazy properties\n        lazy_property_max_recursion_depth: int = 1\n```\n\n<a id=\"esmodeltimestamp\"></a>\n#### ESModelTimestamp\n\nYou can use the `ESModelTimestamp` class to add `created_at` and `updated_at` fields to your model:\n\n```python\nfrom esorm import ESModelTimestamp\n\n\nclass User(ESModelTimestamp):\n    name: str\n    age: int\n```\n\nThese fields will be automatically set to the current `datetime` when you create or update a document.\nThe `created_at` field will be set only when you create a document. 
The `updated_at` field will be set\nwhen you create or update a document.\n\n<a id=\"describe-fields\"></a>\n#### Describe fields\n\nYou can use the usual `Pydantic` field description, but you can also use docstrings like this:\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import TextField\n\n\nclass User(ESModel):\n    name: str = 'John Doe'\n    \"\"\" The name of the user \"\"\"\n    age: int = 18\n    \"\"\" The age of the user \"\"\"\n\n    # This is the usual Pydantic way, but I think docstrings are more intuitive and readable\n    address: str = TextField(description=\"The address of the user\")\n```\n\nThe documentation is useful if you create an API and want to generate documentation from the model.\nIt can be used in [FastAPI](https://fastapi.tiangolo.com/), for example.\n\n<a id=\"aliases\"></a>\n### Aliases\n\nYou can specify aliases for fields:\n\n```python\nfrom esorm import ESModel\nfrom esorm.fields import keyword, Field\n\n\nclass User(ESModel):\n    full_name: keyword = Field(alias='fullName')  # In ES `fullName` will be the field name\n```\n\nThis is good for renaming fields in the model without changing the ElasticSearch field name.\n\n<a id=\"connecting-to-elasticsearch\"></a>\n### Connecting to ElasticSearch\n\nYou can connect with a simple connection string:\n\n```python\nfrom esorm import connect\n\n\nasync def es_init():\n    await connect('localhost:9200')\n```\n\nYou can also connect to multiple hosts if you have a cluster:\n\n```python\nfrom esorm import connect\n\n\nasync def es_init():\n    await connect(['localhost:9200', 'localhost:9201'])\n```\n\nYou can wait for the node or cluster to be ready (recommended):\n\n```python\nfrom esorm import connect\n\n\nasync def es_init():\n    await connect('localhost:9200', wait=True)\n```\n\nThis will ping the node at 2-second intervals until it is ready. 
This can take a long time.\n\nYou can pass any arguments that `AsyncElasticsearch` supports:\n\n```python\nfrom esorm import connect\n\n\nasync def es_init():\n    await connect('localhost:9200', wait=True, sniff_on_start=True, sniff_on_connection_fail=True)\n```\n\n<a id=\"client\"></a>\n#### Client\n\nThe `connect` function is a wrapper for the `AsyncElasticsearch` constructor. It creates and stores\na global instance of a proxy to an `AsyncElasticsearch` instance. The model operations will use this\ninstance to communicate with ElasticSearch. You can retrieve the proxy client instance and\nuse it the same way as an `AsyncElasticsearch` instance:\n\n```python\nfrom esorm import es\n\n\nasync def es_init():\n    await es.ping()\n```\n\n<a id=\"create-index-templates\"></a>\n### Create index templates\n\nYou can create index templates easily:\n\n```python\nfrom esorm import model as esorm_model\n\n\n# Create index template\nasync def prepare_es():\n    await esorm_model.create_index_template(\n        'default_template',\n        prefix_name='esorm_',\n        shards=3,\n        auto_expand_replicas='1-5',\n    )\n```\n\nThis template will be applied to all indices with the `esorm_` prefix (the default).\n\nAll indices created by ESORM have a prefix, which you can modify globally if you want:\n\n```python\nfrom esorm.model import set_default_index_prefix\n\nset_default_index_prefix('custom_prefix_')\n```\n\nThe default prefix is `esorm_`.\n\n<a id=\"create-indices-and-mappings\"></a>\n### Create indices and mappings\n\nYou can create indices and mappings automatically from your models:\n\n```python\nfrom esorm import setup_mappings\n\n\n# Create indices and mappings\nasync def prepare_es():\n    import models  # Import your models\n    # Here the models argument is not needed, but you can pass it to prevent an unused import warning\n    await setup_mappings(models)\n```\n\nFirst you must create (import) all model classes. 
Model classes will be registered into a global registry.\nThen you can call the `setup_mappings` function to create indices and mappings for all registered models.\n\n**IMPORTANT:** This method will ignore mapping errors if you already have an index with the same name. It can extend the\nindices with new fields, but cannot modify or delete existing fields! For that you need to reindex your ES database. This is an ElasticSearch\nlimitation.\n\n<a id=\"model-instances\"></a>\n### Model instances\n\nWhen you get a model instance from ElasticSearch via the `search` or `get` methods, the following private\nattributes are filled automatically:\n\n| Attribute | Description |\n|-----------------|-------------------------------------|\n| `_id` | The ES id of the document |\n| `_routing` | The routing value of the document |\n| `_version` | Version of the document |\n| `_primary_term` | The primary term of the document |\n| `_seq_no` | The sequence number of the document |\n\n<a id=\"crud-create\"></a>\n### CRUD: Create\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def create_user():\n    # Create a new user\n    user = User(name='John Doe', age=25)\n    # Save the user to ElasticSearch\n    new_user_id = await user.save()\n    print(new_user_id)\n```\n\n<a id=\"crud-read\"></a>\n### CRUD: Read\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def get_user(user_id: str):\n    user = await User.get(user_id)\n    print(user.name)\n```\n\n<a id=\"crud-update\"></a>\n### CRUD: Update\n\nOn update, race conditions are checked automatically (with the help of the `_primary_term` and `_seq_no` fields).\nThis way an optimistic locking mechanism is implemented.\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def 
update_user(user_id: str):\n    user = await User.get(user_id)\n    user.name = 'Jane Doe'\n    await user.save()\n```\n\n<a id=\"crud-delete\"></a>\n### CRUD: Delete\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def delete_user(user_id: str):\n    user = await User.get(user_id)\n    await user.delete()\n```\n\n<a id=\"bulk-operations\"></a>\n### Bulk operations\n\nBulk operations can be much faster than single operations if you have a lot of documents to \ncreate, update or delete.\n\nYou can use a context manager for bulk operations:\n\n```python\nfrom typing import List\nfrom esorm import ESModel, ESBulk\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def bulk_create_users():\n    async with ESBulk() as bulk:\n        # Creating or modifying models\n        for i in range(10):\n            user = User(name=f'User {i}', age=i)\n            await bulk.save(user)\n\n\nasync def bulk_delete_users(users: List[User]):\n    async with ESBulk(wait_for=True) as bulk:  # Here we wait for the bulk operation to finish\n        # Deleting models\n        for user in users:\n            await bulk.delete(user)\n```\n\nThe `wait_for` argument is optional. If it is `True`, the context will wait for the bulk operation to finish.\n\n<a id=\"search\"></a>\n### Search\n\n<a id=\"general-search\"></a>\n#### General search\n\nYou can search for documents using the `search` method, where an ES query can be specified as a dictionary.\nYou can use the `res_dict=True` argument to get the result as a dictionary instead of a list. 
The key will be the\n`id` of the document: `await User.search(query, res_dict=True)`.\n\nIf you only need one result, you can use the `search_one` method.\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def search_users():\n    # Search for users at least 18 years old\n    users = await User.search(\n        query={\n            'bool': {\n                'must': [{\n                    'range': {\n                        'age': {\n                            'gte': 18\n                        }\n                    }\n                }]\n            }\n        }\n    )\n    for user in users:\n        print(user.name)\n\n\nasync def search_one_user():\n    # Search for a user named John Doe\n    user = await User.search_one(\n        query={\n            'bool': {\n                'must': [{\n                    'match': {\n                        'name': {\n                            'query': 'John Doe'\n                        }\n                    }\n                }]\n            }\n        }\n    )\n    print(user.name)\n```\n\nQueries are type checked, because they are annotated as `TypedDict`s. You can use IDE autocompletion and type checking.\n\n<a id=\"search-with-field-value-terms-dictionary-search\"></a>\n#### Search with field value terms (dictionary search)\n\nYou can search for documents using the `search_by_fields` method, where you specify fields and their values as a dictionary.\nIt also has a `res_dict` argument and a `search_one_by_fields` variant.\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n\n\nasync def search_users():\n    # Search for users whose age is 18\n    users = await User.search_by_fields({'age': 18})\n    for user in users:\n        print(user.name)\n```\n\n<a id=\"aggregations\"></a>\n### Aggregations\n\nYou can use the `aggregate` method to get aggregations.\nYou can specify an ES aggregation query as a dictionary. It also accepts normal ES queries,\nso you can filter which documents you want to aggregate. 
\nBoth the aggs parameter and the query parameter are type checked, because they are annotated as `TypedDict`s.\nYou can use IDE autocompletion and type checking.\n\n```python\nfrom esorm import ESModel\n\n\n# Here the model has an automatically generated id\nclass User(ESModel):\n    name: str\n    age: int\n    country: str\n\n\nasync def aggregate_avg():\n    # Get average age of users\n    aggs_def = {\n        'avg_age': {\n            'avg': {\n                'field': 'age'\n            }\n        }\n    }\n    aggs = await User.aggregate(aggs_def)\n    print(aggs['avg_age']['value'])\n\n\nasync def aggregate_avg_by_country(country: str = 'Hungary'):\n    # Get average age of users in the given country\n    aggs_def = {\n        'avg_age': {\n            'avg': {\n                'field': 'age'\n            }\n        }\n    }\n    query = {\n        'bool': {\n            'must': [{\n                'match': {\n                    'country': {\n                        'query': country\n                    }\n                }\n            }]\n        }\n    }\n    aggs = await User.aggregate(aggs_def, query)\n    print(aggs['avg_age']['value'])\n\n\nasync def aggregate_terms():\n    # Get number of users by country\n    aggs_def = {\n        'countries': {\n            'terms': {\n                'field': 'country'\n            }\n        }\n    }\n    aggs = await User.aggregate(aggs_def)\n    for bucket in aggs['countries']['buckets']:\n        print(bucket['key'], bucket['doc_count'])\n```\n\n<a id=\"pagination-and-sorting\"></a>\n### Pagination and sorting\n\nYou can use the `Pagination` and `Sort` classes to decorate your models. 
They simply wrap your models\nand add pagination and sorting functionality to them.\n\n#### Pagination\n\nYou can pass a callback to the `Pagination` class, which will be invoked after the search with\nthe total number of documents found.\n\n```python\nfrom esorm.model import ESModel, Pagination\n\n\nclass User(ESModel):\n    id: int  # This will be used as the document _id in the index\n    name: str\n    age: int\n\n\nasync def get_users(page: int = 1, page_size: int = 10):\n\n    def pagination_callback(total: int):\n        # You may set a header value or something else here\n        print(f'Total users: {total}')\n\n    # 1st create the decorator itself\n    # (pass `pagination_callback` through the callback parameter to receive the total)\n    pagination = Pagination(page=page, page_size=page_size)\n\n    # Then decorate your model\n    res = await pagination(User).search_by_fields({'age': 18})\n\n    # Here the result has at most `page_size` items\n    return res\n```\n\n#### Sorting\n\nIt is similar to pagination:\n\n```python\nfrom esorm.model import ESModel, Sort\n\n\nclass User(ESModel):\n    id: int  # This will be used as the document _id in the index\n    name: str\n    age: int\n\n\nasync def get_users():\n    # 1st create the decorator itself\n    sort = Sort(sort=[\n        {'age': {'order': 'desc'}},\n        {'name': {'order': 'asc'}}\n    ])\n\n    # Then decorate your model\n    res = await sort(User).search_by_fields({'age': 18})\n\n    # Here the result is sorted by age descending, then by name ascending\n    return res\n\n\nasync def get_user_sorted_by_name():\n    # You can also use this simplified syntax\n    sort = Sort(sort='name')\n\n    # Then decorate your model\n    res = await sort(User).all()\n\n    # Here the result is sorted by name\n    return res\n```\n\n<a id=\"testing\"></a>\n## \ud83e\uddea\u2003Testing\n\nFor testing you can use the `test.sh` script in the root directory. It runs the\ntests on multiple Python interpreters in virtual environments. At the top of the file you can specify\nwhich Python interpreters you want to test. 
The ES versions are specified in the `tests/docker-compose.yml` file.\n\nIf you already have a virtual environment, simply use `pytest` to run the tests.\n\n<a id=\"license\"></a>\n## \ud83d\udee1\u2003License\n\nThis project is licensed under the terms of the [Mozilla Public License 2.0](https://www.mozilla.org/en-US/MPL/2.0/)\n(MPL 2.0).\n\n<a id=\"citation\"></a>\n## \ud83d\udcc3\u2003Citation\n\nIf you use this project in your research, please cite it using the following BibTeX entry:\n\n```bibtex\n@misc{esorm,\n  author = {Adam Wallner},\n  title = {ESORM: ElasticSearch Object Relational Mapper},\n  year = {2023},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/wallneradam/esorm}},\n}\n```\n",
"bugtrack_url": null,
"license": "MPL-2.0",
"summary": "Python ElasticSearch ORM based on Pydantic",
"version": "0.6.7",
"project_urls": {
"Bug Tracker": "https://github.com/wallneradam/esorm/issues",
"Changelog": "https://esorm.readthedocs.io/en/latest/changelog.html",
"Documentation": "https://esorm.readthedocs.io/en/latest/",
"Homepage": "https://github.com/wallneradam/esorm",
"Source Code": "https://github.com/wallneradam/esorm"
},
"split_keywords": [
"elasticsearch",
" orm",
" pydantic"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ce741dfe9dbd515b777feb5a3f65c27a8516bb30928949ffa540716c8aa20654",
"md5": "d1a6a4ea7b4890f0f0d272c520c47c52",
"sha256": "b7ee6f51eda2296760ad44fe5cb908a30f01e75da9ed44473c1d552c426cd833"
},
"downloads": -1,
"filename": "pyesorm-0.6.7.tar.gz",
"has_sig": false,
"md5_digest": "d1a6a4ea7b4890f0f0d272c520c47c52",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 52717,
"upload_time": "2024-11-23T10:57:44",
"upload_time_iso_8601": "2024-11-23T10:57:44.051829Z",
"url": "https://files.pythonhosted.org/packages/ce/74/1dfe9dbd515b777feb5a3f65c27a8516bb30928949ffa540716c8aa20654/pyesorm-0.6.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-23 10:57:44",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "wallneradam",
"github_project": "esorm",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "pyesorm"
}