- Name: cerializer
- Version: 1.4.0
- Home page: https://gitlab.com/quantlane/libs/cerializer
- Summary: Even faster alternative to FastAvro
- Upload time: 2023-09-26 09:59:02
- Maintainer: Quantlane
- Author: matejmicek
- Requires Python: >=3.10,<3.13
- Keywords: packaging, poetry
- Requirements: No requirements were recorded.

# Cerializer

![PyPI](https://img.shields.io/pypi/v/Cerializer)

Cerializer is an Avro serialization and deserialization library that aims to provide an even faster alternative to FastAvro and the standard Avro library.

This speed increase does not come without a cost. Cerializer works only with a predefined set of schemata, for which it generates tailor-made Cython code. This avoids the overhead that other serialization libraries incur by having to handle arbitrary schemata at runtime.
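
To make the trade-off concrete, the sketch below contrasts a generic serializer that walks the schema on every call with one whose source is generated once per schema. It only illustrates the general idea and is not Cerializer's actual generator: the `write_string` and `write_long` functions and the `generate_serializer` helper are hypothetical stand-ins that record calls instead of producing Avro bytes.

```python
# Illustrative only: NOT Cerializer's generator, just the general idea.
SCHEMA = {
    'name': 'order',
    'type': 'record',
    'fields': [
        {'name': 'order_id', 'type': 'string'},
        {'name': 'quantity', 'type': 'long'},
    ],
}


def write_string(buffer, value):
    # Hypothetical stand-in writer: records the call instead of encoding bytes.
    buffer.append(('string', value))


def write_long(buffer, value):
    # Hypothetical stand-in writer: records the call instead of encoding bytes.
    buffer.append(('long', value))


WRITERS = {'string': write_string, 'long': write_long}


def serialize_generic(schema, data, buffer):
    # Generic path: inspects the schema and looks up a writer on every call.
    for field in schema['fields']:
        WRITERS[field['type']](buffer, data[field['name']])


def generate_serializer(schema):
    # Specialized path: emit straight-line source once, then compile it.
    lines = ['def serialize(data, buffer):']
    for field in schema['fields']:
        lines.append(f"    write_{field['type']}(buffer, data[{field['name']!r}])")
    source = '\n'.join(lines)
    namespace = {'write_string': write_string, 'write_long': write_long}
    exec(source, namespace)
    return source, namespace['serialize']


source, serialize_specialized = generate_serializer(SCHEMA)
print(source)

record = {'order_id': 'aaaa', 'quantity': 3}
generic_out, specialized_out = [], []
serialize_generic(SCHEMA, record, generic_out)
serialize_specialized(record, specialized_out)
```

Both paths produce the same sequence of writes. The generated function does no schema lookups at call time, which is the kind of overhead a schema-specific serializer can avoid.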

Special credit goes to the [FastAvro](https://github.com/fastavro/fastavro) library, which heavily inspired this project.

## Example of a schema and the corresponding code

SCHEMA
```python
{
    'name': 'array_schema',
    'doc': 'Array schema',
    'namespace': 'cerializer',
    'type': 'record',
    'fields': [
        {
            'name': 'order_id',
            'doc': 'Id of order',
            'type': 'string'
        },
        {
            'name': 'trades',
            'type': {
                'type': 'array',
                'items': ['string', 'int']
            }
        }
    ]
}
```

CORRESPONDING CODE
```cython
def serialize(data, output):
    cdef bytearray buffer = bytearray()
    cdef dict datum
    cdef str type_0
    write.write_string(buffer, data['order_id'])
    if len(data['trades']) > 0:
        write.write_long(buffer, len(data['trades']))
        for val_0 in data['trades']:
            if type(val_0) is tuple:
                type_0, val_1 = val_0

                if type_0 == 'string':
                    write.write_long(buffer, 0)
                    write.write_string(buffer, val_1)

                elif type_0 == 'int':
                    write.write_long(buffer, 1)
                    write.write_int(buffer, val_1)

            else:
                if type(val_0) is str:
                    write.write_long(buffer, 0)
                    write.write_string(buffer, val_0)
                elif type(val_0) is int:
                    write.write_long(buffer, 1)
                    write.write_int(buffer, val_0)
    write.write_long(buffer, 0)
    output.write(buffer)



def deserialize(fo):
    cdef long long i_0
    cdef long long i_1
    cdef long i_2
    data = {}
    data['order_id'] = read.read_string(fo)
    data['trades'] = []

    i_1 = read.read_long(fo)
    while i_1 != 0:
        if i_1 < 0:
            i_1 = -i_1
            read.read_long(fo)
        for i_0 in range(i_1):
            i_2 = read.read_int(fo)
            if i_2 == 0:
                val_2 = read.read_string(fo)
            if i_2 == 1:
                val_2 = read.read_int(fo)
            data['trades'].append(val_2)
        i_1 = read.read_long(fo)
    return data
```
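
The generated code calls helpers such as `write.write_long` and `write.write_string`, whose bodies are not shown here. As a rough pure-Python sketch of what such helpers have to do under the Avro binary encoding: ints and longs are zigzag-encoded and written as variable-length integers, strings are a length followed by UTF-8 bytes, a union value is preceded by the index of its branch, and an array is written as blocks ending with a zero count. The names `encode_long`, `encode_string` and `encode_array_schema_record` are hypothetical and are not Cerializer's API.

```python
def encode_long(n: int) -> bytes:
    """Zigzag-map a signed 64-bit value, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length prefix (as a long) followed by the UTF-8 bytes."""
    raw = s.encode('utf-8')
    return encode_long(len(raw)) + raw


def encode_array_schema_record(order_id: str, trades: list) -> bytes:
    """Mirror the control flow of the generated serialize() for 'array_schema'."""
    out = bytearray(encode_string(order_id))
    if trades:
        out += encode_long(len(trades))  # one block holding every item
        for item in trades:
            if isinstance(item, str):
                out += encode_long(0) + encode_string(item)  # union branch 0: string
            else:
                out += encode_long(1) + encode_long(item)  # union branch 1: int
    out += encode_long(0)  # zero count terminates the array
    return bytes(out)


encoded = encode_array_schema_record('aaaa', [123, 456, 765])
print(encoded)
```

For this record the sketch yields the same bytes as the serialized data shown in the usage example.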


## Usage Example
1. Create an instance of `CerializerSchemata`.
To initialize `CerializerSchemata`, supply a list of tuples of the form `(schema_identifier, schema)`,
where `schema_identifier` is a string and `schema` is a dict representing the Avro schema.
Each schema tuple has the form `(namespace.schema_name, schema)`, e.g.:
    ```python
    import cerializer.schema_handler
    import os
    import yaml
    
    def list_schemata():
        # Iterate through all your schemata and yield (schema_identifier, path to schema folder) pairs.
        raise NotImplementedError
    
    def schemata() -> cerializer.schema_handler.CerializerSchemata:
        schemata = []
        for schema_identifier, schema_root in list_schemata():
            # Open the schema file in a context manager so the handle is closed promptly.
            with open(os.path.join(schema_root, 'schema.yaml')) as schema_file:
                schema_tuple = schema_identifier, yaml.unsafe_load(schema_file) # type: ignore
            schemata.append(schema_tuple)
        return cerializer.schema_handler.CerializerSchemata(schemata)
    ```

2. Create an instance of `Cerializer` for each of your schemata by calling `cerializer_handler.Cerializer`,
e.g. `cerializer_instance = cerializer_handler.Cerializer(cerializer_schemata, schema_namespace, schema_name)`.
This creates an instance of `Cerializer` that can serialize and deserialize data in that particular schema format.

3. Use the instance accordingly.
    e.g.:
    ```python
    data_record = {
        'order_id': 'aaaa',
        'trades': [123, 456, 765]
    }
    
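    # The namespace and name refer to the 'array_schema' example above,
    # which is assumed to be registered in cerializer_schemata.
    cerializer_instance = cerializer.cerializer_handler.Cerializer(cerializer_schemata, 'cerializer', 'array_schema')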
    serialized_data = cerializer_instance.serialize(data_record)
    print(serialized_data)
    ```

Serialized data
```
b'\x08aaaa\x06\x02\xf6\x01\x02\x90\x07\x02\xfa\x0b\x00'
```
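
The bytes above can be checked by hand against the `array_schema` example. The sketch below decodes them with a small pure-Python reader that follows the same control flow as the generated `deserialize` function. The helpers `read_long`, `read_string` and `decode_record` are hypothetical stand-ins written for this illustration, not Cerializer's own `read` module.

```python
import io


def read_long(fo) -> int:
    """Read a variable-length integer and undo the zigzag mapping."""
    shift = 0
    result = 0
    while True:
        byte = fo.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(fo) -> str:
    """Read a length prefix, then that many UTF-8 bytes."""
    return fo.read(read_long(fo)).decode('utf-8')


def decode_record(payload: bytes) -> dict:
    fo = io.BytesIO(payload)
    record = {'order_id': read_string(fo), 'trades': []}
    count = read_long(fo)
    while count != 0:
        if count < 0:
            # A negative count is followed by the block size in bytes.
            count = -count
            read_long(fo)
        for _ in range(count):
            branch = read_long(fo)  # union branch index: 0 = string, 1 = int
            value = read_string(fo) if branch == 0 else read_long(fo)
            record['trades'].append(value)
        count = read_long(fo)
    return record


payload = b'\x08aaaa\x06\x02\xf6\x01\x02\x90\x07\x02\xfa\x0b\x00'
decoded = decode_record(payload)
print(decoded)
```

Reading left to right: `\x08` is the string length 4, `\x06` is the array count 3, each `\x02` selects union branch 1 (`int`), and the final `\x00` ends the array.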

You can also use `serialize_into` if you already have an IO buffer:

```python
import io

output = io.BytesIO()
cerializer_instance.serialize_into(output, data_record)
print(output.getvalue())
```
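
When a schema is already available as a Python dict rather than a YAML file, the `(schema_identifier, schema)` tuple from step 1 can be built directly. The sketch below assumes the identifier is `namespace.schema_name`, as step 1 describes. The helper name `schema_tuple` is hypothetical, and the `CerializerSchemata` call is left as a comment so the snippet runs without Cerializer installed.

```python
ARRAY_SCHEMA = {
    'name': 'array_schema',
    'doc': 'Array schema',
    'namespace': 'cerializer',
    'type': 'record',
    'fields': [
        {'name': 'order_id', 'doc': 'Id of order', 'type': 'string'},
        {'name': 'trades', 'type': {'type': 'array', 'items': ['string', 'int']}},
    ],
}


def schema_tuple(schema: dict) -> tuple[str, dict]:
    # Hypothetical helper: derive 'namespace.schema_name' from the schema itself.
    return f"{schema['namespace']}.{schema['name']}", schema


schemata = [schema_tuple(ARRAY_SCHEMA)]
print(schemata[0][0])

# With Cerializer installed, the list would be passed on as in step 1:
# cerializer_schemata = cerializer.schema_handler.CerializerSchemata(schemata)
```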

## Benchmark
```
cerializer.default_schema:3            2.5661 times faster,   0.0209s : 0.0082s
cerializer.fixed_decimal_schema:1      1.2795 times faster,   0.1588s : 0.1241s
cerializer.int_date_schema:1           2.8285 times faster,   0.0273s : 0.0097s
cerializer.plain_int:1                 2.2334 times faster,   0.0146s : 0.0065s
cerializer.timestamp_schema_micros:1   2.3759 times faster,   0.0577s : 0.0243s
cerializer.default_schema:2            2.8129 times faster,   0.0240s : 0.0085s
cerializer.array_schema:3              1.2177 times faster,   0.3088s : 0.2536s
cerializer.timestamp_schema:1          2.5928 times faster,   0.0577s : 0.0223s
cerializer.array_schema:2              1.4756 times faster,   0.6542s : 0.4434s
cerializer.union_schema:1              3.0796 times faster,   0.0284s : 0.0092s
cerializer.bytes_decimal_schema:1      1.8449 times faster,   0.0490s : 0.0266s
cerializer.array_schema:1              2.1771 times faster,   0.0344s : 0.0158s
cerializer.string_uuid_schema:1        1.8887 times faster,   0.0494s : 0.0262s
cerializer.map_schema:2                2.0896 times faster,   0.0331s : 0.0158s
cerializer.fixed_schema:1              3.4042 times faster,   0.0213s : 0.0062s
cerializer.long_time_micros_schema:1   2.3747 times faster,   0.0352s : 0.0148s
cerializer.array_schema:4              2.8779 times faster,   0.0591s : 0.0205s
cerializer.default_schema:1            2.0182 times faster,   0.0393s : 0.0195s
cerializer.map_schema:1                3.4610 times faster,   0.0597s : 0.0172s
cerializer.string_schema:1             2.2048 times faster,   0.0352s : 0.0159s
cerializer.reference_schema:1          2.9309 times faster,   0.1525s : 0.0520s
cerializer.enum_schema:1               3.0065 times faster,   0.0217s : 0.0072s
cerializer.tree_schema:1               4.0494 times faster,   0.0869s : 0.0215s
cerializer.huge_schema:1               2.8161 times faster,   0.1453s : 0.0516s
AVERAGE: 1.7814 times faster
```

Measured against FastAvro using the benchmark in Cerializer/tests. Each row shows the schema, the speedup, and the two timings (FastAvro : Cerializer).

Device: ASUS ZenBook 14 UM425QA, AMD Ryzen 7 5800H, 16 GB 2133 MHz LPDDR4X
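
The AVERAGE figure is lower than most of the per-schema speedups, which suggests it is the ratio of the summed timings rather than the mean of the individual ratios. The sketch below checks this using the rounded timings copied from the table. It assumes the first timing in each row is FastAvro and the second is Cerializer. Because the inputs are rounded, the results can differ from the table in the last digit.

```python
# (fastavro_seconds, cerializer_seconds), copied from the benchmark table.
TIMINGS = [
    (0.0209, 0.0082), (0.1588, 0.1241), (0.0273, 0.0097), (0.0146, 0.0065),
    (0.0577, 0.0243), (0.0240, 0.0085), (0.3088, 0.2536), (0.0577, 0.0223),
    (0.6542, 0.4434), (0.0284, 0.0092), (0.0490, 0.0266), (0.0344, 0.0158),
    (0.0494, 0.0262), (0.0331, 0.0158), (0.0213, 0.0062), (0.0352, 0.0148),
    (0.0591, 0.0205), (0.0393, 0.0195), (0.0597, 0.0172), (0.0352, 0.0159),
    (0.1525, 0.0520), (0.0217, 0.0072), (0.0869, 0.0215), (0.1453, 0.0516),
]

total_fastavro = sum(fast for fast, _ in TIMINGS)
total_cerializer = sum(cer for _, cer in TIMINGS)

# Ratio of totals: dominated by the slowest schemata (the large array schemas).
ratio_of_totals = total_fastavro / total_cerializer

# Mean of per-schema ratios: every schema counts equally.
mean_of_ratios = sum(fast / cer for fast, cer in TIMINGS) / len(TIMINGS)

print(f'ratio of totals:           {ratio_of_totals:.4f}')
print(f'mean of per-schema ratios: {mean_of_ratios:.4f}')
```

The ratio of totals comes out at about 1.78, in line with the AVERAGE line, while the mean of the per-schema ratios is noticeably higher because the two slowest array schemas, which show the smallest speedups, no longer dominate.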

            
