swh.storage


:Version: 2.9.0
:Summary: Software Heritage storage manager
:Upload time: 2024-11-27 15:31:25
:Requires Python: >=3.7

Software Heritage - Storage
===========================

Abstraction layer over the archive, allowing to access all stored source code
artifacts as well as their metadata.

Quick start
-----------

Dependencies
^^^^^^^^^^^^

Python tests for this module include tests that cannot be run without a local
PostgreSQL database, so you need the PostgreSQL server executable on your
machine (there is no need to have a PostgreSQL server running). They also
expect a Cassandra server.

Debian-like host
""""""""""""""""

.. code-block:: shell

   $ sudo apt install libpq-dev postgresql-11 cassandra


Non Debian-like host
""""""""""""""""""""

By default, the tests look for the ``cassandra`` executable at
``/usr/sbin/cassandra``. A different path can be specified through the
``SWH_CASSANDRA_BIN`` environment variable.
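
The lookup rule above can be sketched in POSIX shell. ``resolve_cassandra_bin``
is a hypothetical helper written for illustration only (it is not part of the
test suite), and ``/opt/cassandra/bin/cassandra`` is just an example path.

.. code-block:: shell

   # Hypothetical helper mirroring the documented rule: use $SWH_CASSANDRA_BIN
   # when it is set and non-empty, otherwise fall back to /usr/sbin/cassandra.
   resolve_cassandra_bin() {
     printf '%s\n' "${SWH_CASSANDRA_BIN:-/usr/sbin/cassandra}"
   }

   # Example: use a Cassandra installed outside /usr/sbin (adjust the path).
   export SWH_CASSANDRA_BIN=/opt/cassandra/bin/cassandra

   # Warn early if nothing executable lives at the resolved path.
   cassandra_bin=$(resolve_cassandra_bin)
   if [ ! -x "$cassandra_bin" ]; then
     echo "warning: no executable Cassandra at $cassandra_bin" >&2
   fi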

Alternatively, you can skip the Cassandra tests:

.. code-block:: shell

   (swh) :~/swh-storage$ tox -- -m 'not cassandra'


Installation
^^^^^^^^^^^^

It is strongly recommended to use a virtualenv. In the following, we
assume you work in a virtualenv named ``swh``. See the
`developer setup guide <https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup>`_
for more details on how to set up a working environment.


You can install the package directly from
`PyPI <https://pypi.org/p/swh.storage>`_:

.. code-block:: shell

   (swh) :~$ pip install swh.storage
   [...]


Or from sources:

.. code-block:: shell

   (swh) :~$ git clone https://gitlab.softwareheritage.org/swh/devel/swh-storage.git
   [...]
   (swh) :~$ cd swh-storage
   (swh) :~/swh-storage$ pip install .
   [...]


Then you can check that it is properly installed:

.. code-block:: shell

   (swh) :~$ swh storage --help
   Usage: swh storage [OPTIONS] COMMAND [ARGS]...

     Software Heritage Storage tools.

   Options:
     -h, --help  Show this message and exit.

   Commands:
     rpc-serve  Software Heritage Storage RPC server.


Tests
-----

The recommended way to run the Python tests for this module is to use
`tox <https://tox.readthedocs.io>`_.

.. code-block:: shell

   (swh) :~$ pip install tox


tox
^^^

From the sources directory, simply use tox:

.. code-block:: shell

   (swh) :~/swh-storage$ tox
   [...]
   ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ==========
   _______________________________ summary ________________________________
     flake8: commands succeeded
     py3: commands succeeded
     congratulations :)


Note: you can set the ``JAVA_HOME`` environment variable to select the JVM
used by Cassandra. At the time of writing, Cassandra is meant to run with
Java 11. On Debian bookworm, you need to manually install
``openjdk-11-jre-headless`` from bullseye or unstable and set the variable
accordingly:

.. code-block:: shell

   (swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
   (swh) :~/swh-storage$ tox
   [...]
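
Before starting a long tox run, you may want to check that ``JAVA_HOME``
actually points at a JVM. The sketch below assumes a POSIX shell;
``check_java_home`` is a hypothetical helper that only checks that
``$JAVA_HOME/bin/java`` is executable and prints its version banner. It does
not enforce Java 11.

.. code-block:: shell

   # Hypothetical helper: succeed only if $JAVA_HOME contains an executable
   # bin/java, then print its version banner (java writes it to stderr).
   check_java_home() {
     if [ -z "${JAVA_HOME:-}" ]; then
       echo "JAVA_HOME is not set" >&2
       return 1
     fi
     if [ ! -x "$JAVA_HOME/bin/java" ]; then
       echo "no executable java in $JAVA_HOME/bin" >&2
       return 1
     fi
     "$JAVA_HOME/bin/java" -version
   }

For example, ``check_java_home && tox`` only starts the test run when a JVM is
found under ``JAVA_HOME``.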


Development
-----------

The storage server can be started locally. It requires a configuration file and
a running PostgreSQL database.

Sample configuration
^^^^^^^^^^^^^^^^^^^^

A typical configuration ``storage.yml`` file is:

.. code-block:: yaml

   storage:
     cls: postgresql
     db: "dbname=softwareheritage-dev user=<user> password=<pwd>"
     objstorage:
       cls: pathslicing
       root: /tmp/swh-storage/
       slicing: 0:2/2:4/4:6


This configuration uses:

- a local PostgreSQL storage whose database connection points to the local
  ``softwareheritage-dev`` database,

- a local objstorage whose:

  - ``root`` path is ``/tmp/swh-storage``,

  - slicing scheme is ``0:2/2:4/4:6``: each content is stored in a
    three-level directory tree derived from its identifier (sha1). The
    first level is named after the first 2 hex characters, the second level
    after the next 2, and the third level after the next 2; the file itself
    is named after the complete hash and holds the raw content. For example,
    ``00062f8bd330715c4f819373653d97b3cd34394c`` is stored at
    ``00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c``.
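
To make the slicing scheme concrete, here is a POSIX shell sketch that computes
the relative on-disk path of a content from its hex sha1 and a slicing
specification. ``slice_path`` is a hypothetical helper written for this
illustration, not the swh.objstorage implementation; it assumes each slice is
written ``start:end`` over the hex string, as in ``0:2/2:4/4:6``.

.. code-block:: shell

   # Hypothetical helper: map a hex identifier to its pathslicing path.
   # $1: hex sha1, $2: slicing spec such as 0:2/2:4/4:6
   slice_path() {
     hex=$1
     spec=$2
     result=
     old_ifs=$IFS
     IFS=/                      # split the spec into its start:end slices
     for slice in $spec; do
       start=${slice%%:*}       # text before the colon
       end=${slice##*:}         # text after the colon
       # cut numbers characters from 1, hence start + 1
       dir=$(printf '%s' "$hex" | cut -c "$((start + 1))-$end")
       result="$result$dir/"
     done
     IFS=$old_ifs
     # the file itself is named after the complete identifier
     printf '%s%s\n' "$result" "$hex"
   }

   slice_path 00062f8bd330715c4f819373653d97b3cd34394c 0:2/2:4/4:6
   # prints 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c

The path is relative to the objstorage ``root``, so with the sample
configuration this content would end up under ``/tmp/swh-storage/``.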

Note that the ``root`` path should exist on disk before starting the server.
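
For the sample configuration, the objstorage root can be prepared as follows.
The ``/tmp/swh-storage`` path comes from the sample ``storage.yml``; adapt it if
you changed ``root``. The writability check is an extra precaution against
permission problems, not a documented requirement of the server.

.. code-block:: shell

   # Create the objstorage root from the sample storage.yml (no-op if present).
   root=/tmp/swh-storage
   mkdir -p "$root"

   # Warn early if the directory cannot be written to.
   [ -w "$root" ] || echo "warning: objstorage root $root is not writable" >&2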


Starting the storage server
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the Python package has been properly installed (e.g. in a virtualenv), you
can start the server with:

.. code-block:: shell

   (swh) :~/swh-storage$ swh storage -C storage.yml rpc-serve


This runs a local swh-storage API on port 5002:

.. code-block:: shell

   (swh) :~/swh-storage$ curl http://127.0.0.1:5002
   <html>
   <head><title>Software Heritage storage server</title></head>
   <body>
   <p>You have reached the
   <a href="https://www.softwareheritage.org/">Software Heritage</a>
   storage server.<br />
   See its
   <a href="https://docs.softwareheritage.org/devel/swh-storage/">documentation
   and API</a> for more information</p>
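
When scripting a local setup, it can be handy to start the server in the
background and wait until it answers. The sketch below polls the server with
``curl``; ``wait_for_storage`` is a hypothetical helper, and its default of 30
attempts one second apart is an arbitrary choice.

.. code-block:: shell

   # Hypothetical helper: poll the storage server until it answers.
   # $1: URL (default http://127.0.0.1:5002/)
   # $2: number of attempts (default 30)
   # $3: delay between attempts, in seconds (default 1)
   wait_for_storage() {
     url=${1:-http://127.0.0.1:5002/}
     attempts=${2:-30}
     delay=${3:-1}
     i=0
     while [ "$i" -lt "$attempts" ]; do
       # -f: treat HTTP errors as failures; -s: no progress output
       if curl -fs -o /dev/null "$url"; then
         return 0
       fi
       i=$((i + 1))
       sleep "$delay"
     done
     echo "storage server at $url did not answer" >&2
     return 1
   }

   # Usage, from the sources directory:
   #   swh storage -C storage.yml rpc-serve &
   #   wait_for_storage && echo "storage server is up"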


And then what?
^^^^^^^^^^^^^^

In your upper layer
(`loader-git <https://forge.softwareheritage.org/source/swh-loader-git>`_,
`loader-svn <https://forge.softwareheritage.org/source/swh-loader-svn>`_,
etc.), you can define a remote storage with this YAML configuration snippet:

.. code-block:: yaml

   storage:
     cls: remote
     url: http://localhost:5002/


Alternatively, you can directly define a PostgreSQL storage with the following snippet:

.. code-block:: yaml

   storage:
     cls: postgresql
     db: service=swh-dev
     objstorage:
       cls: pathslicing
       root: /home/storage/swh-storage/
       slicing: 0:2/2:4/4:6


Cassandra
---------

As an alternative to PostgreSQL, swh-storage can use Cassandra as a database
backend. It can be configured like this:

.. code-block:: yaml

   storage:
     cls: cassandra
     hosts:
       - localhost
     keyspace: swh
     objstorage:
       cls: pathslicing
       root: /home/storage/swh-storage/
       slicing: 0:2/2:4/4:6


The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2
and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested).

While the main code supports both transparently, running the tests or
configuring the schema requires ScyllaDB-specific code, which is enabled by
setting the ``SWH_USE_SCYLLADB=1`` environment variable.

Project links
-------------

:Author: Software Heritage developers <swh-devel@inria.fr>
:Homepage: https://gitlab.softwareheritage.org/swh/devel/swh-storage
:Source: https://gitlab.softwareheritage.org/swh/devel/swh-storage.git
:Documentation: https://docs.softwareheritage.org/devel/swh-storage/
:Bug Reports: https://gitlab.softwareheritage.org/swh/devel/swh-storage/-/issues
:Funding: https://www.softwareheritage.org/donate