pySOS: Simple Objects Storage
=============================
> Persistent dictionaries and lists for Python

This is ideal for lists or dictionaries that need persistence,
are too big to fit in memory, or both.
There are existing alternatives, such as `shelve`, which are also very good.
The main differences with `pysos` are that:

- only the index is kept in memory, not the values (so you can store more data than would fit in memory)
- it provides both persistent dicts *and* lists
- objects must be JSON-serializable (no cyclic references, etc.)
- it's fast (much faster than `shelve` on Windows, but slightly slower than native `dbm` implementations on Linux)
- it's unbuffered by design: when a write call returns, you can be sure the data has been written to disk
- it's safe: even if the machine crashes in the middle of a big write, the data will not be corrupted
- it's platform-independent, unlike `shelve`, which relies on an underlying `dbm` implementation that may vary from system to system
- the data is stored in a plain-text format

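
To make the first and last points concrete, here is a minimal sketch of the general technique: records are appended to a plain-text file as JSON lines, and only a key-to-offset index is kept in memory. This is *not* pysos's actual file format or code; the class name `TinyStore`, the one-record-per-line layout and the string-only keys are assumptions made for illustration.

```python
import json
import os


class TinyStore:
    """Hypothetical sketch: values stay on disk, only key -> offset lives in memory.

    Not pysos's real format; it just illustrates the index-in-memory idea.
    """

    def __init__(self, path):
        self.index = {}  # key -> byte offset of the latest record for that key
        # 'a+b' creates the file if needed, allows reads, and appends every write
        self.file = open(path, 'a+b')
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the file once at startup to recover the in-memory index."""
        self.file.seek(0)
        offset = 0
        for line in self.file:
            if not line.endswith(b'\n'):
                # Torn record from an interrupted write: ignore it
                # (a fuller version would also truncate it away).
                break
            key, _value = json.loads(line)
            self.index[key] = offset  # later records for the same key win
            offset += len(line)

    def __setitem__(self, key, value):
        # One JSON line per record (keys assumed to be strings);
        # json.dumps raises for values that are not JSON-serializable.
        line = (json.dumps([key, value]) + '\n').encode('utf-8')
        self.file.seek(0, os.SEEK_END)
        offset = self.file.tell()
        self.file.write(line)
        self.file.flush()
        self.index[key] = offset  # only updated after the record is written

    def __getitem__(self, key):
        self.file.seek(self.index[key])
        _key, value = json.loads(self.file.readline())
        return value

    def __len__(self):
        return len(self.index)

    def close(self):
        self.file.close()
```

Updating a key appends a new line and leaves the old one in the file, which is the kind of fragmentation discussed under Performance.
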
Usage
-----

`pip install pysos`

Dictionaries:
```python
import pysos

db = pysos.Dict('somefile')   # the dict is backed by the file 'somefile'
db['hello'] = 'persistence!'  # written to the file right away
```
Lists:
```python
import pysos

db = pysos.List('somefile')
db.append('it is now saved in the file')  # written to the file right away
```
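
Because stored objects must be JSON-serializable, it helps to know what a JSON round-trip does to ordinary Python values. The sketch below uses only the standard `json` module and the hypothetical helper name `json_roundtrip`; it shows general JSON behavior, and exactly how pysos encodes keys versus values is not covered here.

```python
import json


def json_roundtrip(value):
    """Return what a value looks like after a JSON dump/load cycle.

    Raises TypeError or ValueError if the value cannot be dumped at all.
    """
    return json.loads(json.dumps(value))


# Tuples come back as lists, and non-string dict keys become strings.
print(json_roundtrip({'point': (1, 2), 3: 'three'}))  # {'point': [1, 2], '3': 'three'}

# Types that JSON does not know about (here a set) cannot be dumped.
try:
    json_roundtrip({1, 2, 3})
except TypeError as err:
    print('not storable:', err)

# Cyclic references are rejected as well.
cyclic = []
cyclic.append(cyclic)
try:
    json_roundtrip(cyclic)
except ValueError as err:
    print('not storable:', err)
```
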
Performance
-----------
Just to give a ballpark figure, there is a mini benchmark included in `test_benchmark.py`.
Here are the results on my laptop:

    Writes: 28521 / second
    Reads: 188502 / second

The test simply writes 100k small key/value pairs and then reads them all back.
It is only meant to give a rough idea.
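
A measurement of this kind can be approximated with a small harness like the one below. It is a sketch, not the bundled `test_benchmark.py`; the function name `rough_benchmark` and the use of integer values are assumptions. Any object supporting `store[key] = value` and `store[key]` can be passed in, for example `pysos.Dict('bench.db')` as in the Usage section (the file name is illustrative).

```python
import time


def rough_benchmark(store, n=100_000):
    """Write n small key/value pairs into a dict-like store, then read them back.

    Returns (writes_per_second, reads_per_second). Timings are only indicative.
    """
    keys = [f'key{i}' for i in range(n)]

    start = time.perf_counter()
    for i, key in enumerate(keys):
        store[key] = i
    write_seconds = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero

    start = time.perf_counter()
    for i, key in enumerate(keys):
        if store[key] != i:  # sanity check that each value survived
            raise AssertionError(f'wrong value for {key}')
    read_seconds = max(time.perf_counter() - start, 1e-9)

    return n / write_seconds, n / read_seconds


if __name__ == '__main__':
    # A plain dict gives an in-memory baseline; pass a persistent store to compare.
    writes, reads = rough_benchmark({})
    print(f'Writes: {writes:.0f} / second')
    print(f'Reads: {reads:.0f} / second')
```
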

Each time you set a value, it is written to the file, but only that key/value pair is written,
so the cost of adding an item stays constant however large the store grows.
On the other hand, many updates, deletes, and re-inserts lead to fragmentation in the data file,
which may degrade performance in the long run.

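
Why fragmentation grows can be seen with a toy model of an append-only data file: every set writes a new record and every delete writes a marker, so superseded records pile up until the file is rewritten. The model below, including the `None`-as-delete convention and the `compact` step, is hypothetical and says nothing about pysos's actual record format or whether it offers compaction.

```python
def fragmentation(log):
    """Fraction of records in an append-only log that are dead.

    `log` is a list of (key, value) records in write order. A record is dead
    when a later record for the same key supersedes it; deletes are modelled
    as records whose value is None (a hypothetical convention).
    """
    if not log:
        return 0.0
    latest = {}
    for position, (key, _value) in enumerate(log):
        latest[key] = position  # remember only the newest record per key
    live = sum(1 for pos in latest.values() if log[pos][1] is not None)
    return 1 - live / len(log)


def compact(log):
    """Rewrite the log keeping only the latest live record per key."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return [(key, value) for key, value in latest.items() if value is not None]


# set a, set b, update a, delete b: 3 of the 4 records are now dead
log = [('a', 1), ('b', 2), ('a', 3), ('b', None)]
print(fragmentation(log))  # 0.75
print(compact(log))        # [('a', 3)]
```
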
F.A.Q.
------
### Is it thread safe?
No, it is not thread-safe.
In practice, synchronization is usually wanted at a higher level anyway.

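
Since synchronization is left to the caller, one simple option is to wrap the store in a class that holds a `threading.Lock` around every access. `LockedStore` and `update_value` are hypothetical names, not part of pysos, and the sketch assumes the wrapped object raises `KeyError` for missing keys, as a plain `dict` does.

```python
import threading


class LockedStore:
    """Hypothetical helper (not part of pysos): one lock around a dict-like store."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()

    def __setitem__(self, key, value):
        with self._lock:
            self._store[key] = value

    def __getitem__(self, key):
        with self._lock:
            return self._store[key]

    def update_value(self, key, func, default=None):
        """Read, transform and write back one value as a single atomic step."""
        with self._lock:
            try:
                current = self._store[key]
            except KeyError:
                current = default
            new_value = func(current)
            self._store[key] = new_value
            return new_value
```

The read-modify-write in `update_value` is why the lock belongs at this higher level: locking each individual get and set inside the store would still let two threads interleave between the read and the write.
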
### Why not make writes asynchronous?

In the original version, there was a switch to choose between sync and async mode.
However, it turned out to have only a relatively small impact on overall performance:
less than 25% on the hardware/OS/data I tested, if I remember correctly.
Since the benefit seemed rather low, I removed the flag and the associated code altogether
to ensure safety by default.
IMHO, it's preferable to lose a few microseconds rather than data upon a crash.

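
For readers unfamiliar with the trade-off, the sketch below shows what a fully synchronous write typically looks like in Python: `flush()` hands the data to the operating system and `os.fsync()` waits until the OS reports it has reached the storage device. It is a generic illustration; the README does not say whether pysos calls `fsync`, and `durable_append` is a hypothetical name.

```python
import os


def durable_append(path, text):
    """Append text to a file and return only once the OS reports it is on disk."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(text)
        f.flush()             # move Python's internal buffer into the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to the device
```

An asynchronous mode would return before the `fsync` (or before the write itself), which is faster but can lose the most recent writes if the machine crashes.
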
### Why not use memory-mapped files?

I experimented with that too. In my experience, with the hardware/OS/data I tested,
it turned out to ...*suck*. Using memory-mapped files led to inconsistent and unpredictable performance,
often much slower than direct file access.

Raw data
{
"_id": null,
"home_page": "https://github.com/dagnelies/pysos",
"name": "pysos",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "persistent persistence dict list file",
"author": "Arnaud Dagnelies",
"author_email": "arnaud.dagnelies@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/5d/bb/43b90c2f743e958f66af0909e09f8d0256ff305c02219c977a8bb9b6ae52/pysos-1.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Simple Object Storage - Persistent dicts and lists for python.",
"version": "1.3.0",
"split_keywords": [
"persistent",
"persistence",
"dict",
"list",
"file"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6ec80d0d6aac6ef9889e66e41083834210cccb0482c8d3113190c162522e151d",
"md5": "98518ef237fa996bfc51cb6300ae68c7",
"sha256": "c526763b6a238115fea141fb043439373e0b78ad5693efdb4969185f4b6b5c6b"
},
"downloads": -1,
"filename": "pysos-1.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "98518ef237fa996bfc51cb6300ae68c7",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 15930,
"upload_time": "2023-01-31T09:11:22",
"upload_time_iso_8601": "2023-01-31T09:11:22.242571Z",
"url": "https://files.pythonhosted.org/packages/6e/c8/0d0d6aac6ef9889e66e41083834210cccb0482c8d3113190c162522e151d/pysos-1.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "5dbb43b90c2f743e958f66af0909e09f8d0256ff305c02219c977a8bb9b6ae52",
"md5": "df9cd899b6aa4283d2a6a89ac05b21bd",
"sha256": "4993c197482afcfec9d0549b110578c96181e3dc4a264aa68b1e49e768436c8f"
},
"downloads": -1,
"filename": "pysos-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "df9cd899b6aa4283d2a6a89ac05b21bd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 10892,
"upload_time": "2023-01-31T09:11:24",
"upload_time_iso_8601": "2023-01-31T09:11:24.120636Z",
"url": "https://files.pythonhosted.org/packages/5d/bb/43b90c2f743e958f66af0909e09f8d0256ff305c02219c977a8bb9b6ae52/pysos-1.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-01-31 09:11:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "dagnelies",
"github_project": "pysos",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pysos"
}