dspace-stats-collector

* Version: 0.6.9
* Home page: https://github.com/lareferencia/dspace-stats-collector
* Summary: A Python library for sending usage stats events from DSpace to Matomo
* Upload time: 2023-05-05 15:15:58
* Author: LA Referencia (lareferencia.dev@gmail.com)
* License: GNU General Public License v3
* Keywords: dspace_stats_collector

============================
DSpace Usage Stats Collector
============================

.. image:: https://img.shields.io/pypi/v/dspace-stats-collector.svg
        :target: https://pypi.python.org/pypi/dspace-stats-collector

.. image:: https://img.shields.io/travis/lareferencia/dspace-stats-collector.svg
        :target: https://travis-ci.org/lareferencia/dspace-stats-collector

.. image:: https://readthedocs.org/projects/dspace-stats-collector/badge/?version=latest
        :target: https://dspace-stats-collector.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status
        
.. image:: https://img.shields.io/pypi/l/dspace-stats-collector.svg
        :target: https://pypi.python.org/pypi/dspace-stats-collector
        :alt: License


A Python agent for sending DSpace usage statistics events to Matomo/OpenAIRE.

* Free software: GNU General Public License v3

* Documentation: http://doc.lareferencia.info/es/public/estadisticas/dspace-stats-collector


This package implements a lightweight, easy-to-deploy, read-only alternative DSpace usage data collector that is compatible with the Matomo and OpenAIRE usage statistics infrastructure. It sends usage data from individual repositories to an external regional aggregator, and it only issues read-only queries to the out-of-the-box DSpace Solr statistics subsystem.
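
To illustrate what such a read-only query can look like, the sketch below builds a request for the Solr ``select`` handler that returns usage events newer than a stored timestamp, oldest first. It is a minimal sketch under stated assumptions, not the collector's actual code: the helper name ``build_solr_query_url``, the core URL, and the field names ``time``, ``type`` and ``statistics_type`` (with ``0`` = bitstream and ``2`` = item) are assumptions based on the stock DSpace statistics schema.

.. code-block:: python

    from urllib.parse import urlencode


    def build_solr_query_url(solr_core_url, since, rows=1000, start=0):
        """Build a read-only Solr select URL for usage events later than `since`.

        Hypothetical helper. Assumptions (not taken from the collector's code):
        * `solr_core_url` is the DSpace statistics core,
          e.g. http://localhost:8080/solr/statistics
        * events store their date in `time`, the object type in `type`
          (0 = bitstream, 2 = item) and the event kind in `statistics_type`
        """
        params = {
            "q": "statistics_type:view",
            "fq": [
                "time:{%s TO *]" % since,  # '{' makes the lower bound exclusive
                "type:(0 OR 2)",           # bitstream downloads and item views
            ],
            "sort": "time asc",            # oldest first, so the stored timestamp can advance
            "rows": rows,                  # page size
            "start": start,                # page offset
            "wt": "json",
        }
        # doseq=True emits one fq=... pair per filter query
        return "%s/select?%s" % (solr_core_url.rstrip("/"), urlencode(params, doseq=True))

Because only the ``select`` handler is called, with an HTTP GET, a query of this kind cannot modify the statistics core; paging through results is done by increasing ``start``.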

A regional usage statistics service allows repositories, e-journals and CRIS systems to share item access data in order to support evaluation, management and reporting. Because the success of such a service depends on installing a collector component in every repository, one of the main requirements was a user-friendly, non-invasive and reliable deployment process for repository managers.

This development is part of LA Referencia's tasks in the OpenAIRE Advance project, which aimed to build a pilot for usage data exchange between Latin American and European open science infrastructures.

The design and development of this usage data collector agent are based on the following fundamental principles:

* open-source, collaborative development

* straightforward installation procedure for non-expert Linux users without root or superuser privileges

* capable of running in a sandbox, without installing system-wide packages on the host system

* lightweight, preserving system stability and performance

* fully compatible with the OpenAIRE Usage Statistics Service [1]

* adaptable to other software platforms and aggregator services


Implementation highlights
-------------------------

The solution is based on a “pipe and filter” architecture with input, filter and output stages for events. This approach factors the problem into independent components, so that more stages can be added or connected in the future to support other software platforms.
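
Since Python generators are listed among the references, one plausible way to wire such stages together is to chain generator functions, each consuming the event stream produced by the previous one. The sketch below assumes events are plain dictionaries; ``run_pipeline``, ``drop_bots`` and ``add_title`` are hypothetical names chosen for illustration, not the package's API.

.. code-block:: python

    from typing import Callable, Dict, Iterable, Iterator

    Event = Dict[str, object]
    Stage = Callable[[Iterable[Event]], Iterator[Event]]


    def run_pipeline(source: Iterable[Event], *stages: Stage) -> Iterator[Event]:
        """Chain stages lazily: each stage wraps the stream of the previous one."""
        stream: Iterable[Event] = source
        for stage in stages:
            stream = stage(stream)
        return iter(stream)


    def drop_bots(events: Iterable[Event]) -> Iterator[Event]:
        """Filter stage (hypothetical): discard events flagged as robots."""
        for event in events:
            if not event.get("is_bot"):
                yield event


    def add_title(events: Iterable[Event]) -> Iterator[Event]:
        """Enrichment stage (hypothetical): attach a title to each event."""
        titles = {"123": "Sample item"}  # stand-in for a database lookup
        for event in events:
            yield {**event, "title": titles.get(str(event["item_id"]), "")}

Calling ``list(run_pipeline(events, drop_bots, add_title))`` pulls events through one at a time, and nothing is read from the source until the final consumer iterates, which keeps memory use flat. Adding a stage only means passing one more function to ``run_pipeline``.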

In this first version of the agent, the following stages have been implemented for DSpace versions 4, 5 and 6. They send events to a Matomo instance, the analytics platform used by OpenAIRE [1]:

* DSpace Solr Statistics Input: this initial input stage queries the internal DSpace Solr statistics core for new usage events (item views and item downloads), that is, events later than a given or stored timestamp. Each event contains fields such as timestamp, item ID, user agent and IP address.

* COUNTER Robots Filter: this filter excludes events generated by internet robots and crawlers, based on a list of user agent values provided by Project COUNTER [3].

* DSpace Database Filter: this stage queries the internal DSpace relational database (currently only PostgreSQL is supported) for complementary item information that is not stored in the Solr core but is required by the OpenAIRE specifications. This filter adds the item title, the bitstream filename and the OAI identifier (``oai_identifier``) as event fields.

* Matomo API Filter: this filter transforms the previously gathered data into the set of parameters required by the Matomo Tracking API [4].

* Matomo Sender Output: this output stage buffers events and sends them in batches to the regional tracker, using the bulk tracking feature of the Matomo HTTP Tracking API [4].

.. image::  https://raw.githubusercontent.com/lareferencia/dspace-stats-collector/master/docs/pipeline-diagram.png
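
Two of these stages can be sketched as follows: a robots filter that drops events whose user agent matches any robot pattern, and a mapping from an enriched event to Matomo Tracking API parameters. This is a minimal sketch under stated assumptions: the four patterns are a tiny hypothetical sample standing in for the COUNTER list (applied here as case-insensitive regular expressions), the event field names are invented for illustration, and encoding the OAI identifier as custom variable ``oaipmhID`` is an assumption about what OpenAIRE expects. ``idsite``, ``rec``, ``action_name``, ``url``, ``ua``, ``cip``, ``cdt``, ``download`` and ``cvar`` are Matomo Tracking API parameters.

.. code-block:: python

    import json
    import re
    from typing import Dict, Iterable, Iterator, List
    from urllib.parse import urlencode

    Event = Dict[str, object]

    # Hypothetical sample only; the real COUNTER robots list is far longer.
    ROBOT_PATTERNS: List[str] = [r"bot", r"crawler", r"spider", r"^curl/"]


    def counter_robots_filter(events: Iterable[Event],
                              patterns: List[str] = ROBOT_PATTERNS) -> Iterator[Event]:
        """Drop events whose user agent matches any robot pattern."""
        robot_re = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
        for event in events:
            if not robot_re.search(str(event.get("user_agent", ""))):
                yield event


    def to_matomo_params(event: Event, site_id: int) -> Dict[str, object]:
        """Map an enriched event to Matomo Tracking API parameters.

        Event field names are hypothetical. Matomo only honours `cip` and `cdt`
        for requests authenticated with token_auth, which is assumed to be
        added by the sending stage.
        """
        params: Dict[str, object] = {
            "idsite": site_id,
            "rec": 1,                      # required for the hit to be recorded
            "action_name": event["title"],
            "url": event["url"],
            "ua": event["user_agent"],
            "cip": event["ip"],            # original visitor IP
            "cdt": event["timestamp"],     # original time in UNIX seconds (assumed)
        }
        if event.get("download_url"):      # bitstream download rather than an item view
            params["download"] = event["download_url"]
        if event.get("oai_identifier"):
            # Assumption: OpenAIRE reads the OAI identifier from custom variable 1
            params["cvar"] = json.dumps({"1": ["oaipmhID", event["oai_identifier"]]})
        return params


    def to_query_string(params: Dict[str, object]) -> str:
        """Encode parameters as one entry of a Matomo bulk tracking request."""
        return "?" + urlencode(params)

Keeping the robot check and the parameter mapping as separate functions mirrors the separate filter stages described above, so either can be replaced (for example, loading the full COUNTER list from a file) without touching the other.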

The resulting pipeline is run by the main collector script, which stores the timestamp of the last successfully sent event as state for subsequent runs.
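
A minimal sketch of that checkpointing logic, combined with batched delivery, is shown below. It assumes events arrive ordered by timestamp as ``(timestamp, query_string)`` pairs and that the state is a small JSON file; ``send_with_checkpoint``, ``load_last_timestamp``, ``make_http_post`` and the state file layout are hypothetical. The payload follows Matomo's bulk tracking format (a JSON body with a ``requests`` list and a ``token_auth`` value), which is POSTed to the tracker's ``matomo.php`` endpoint.

.. code-block:: python

    import json
    import urllib.request
    from itertools import islice
    from pathlib import Path
    from typing import Callable, Iterable, Iterator, List, Tuple

    Record = Tuple[int, str]  # (event timestamp, query string such as "?idsite=1&rec=1&...")


    def batched(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
        """Yield consecutive lists of at most `size` records."""
        it = iter(records)
        while True:
            chunk = list(islice(it, size))
            if not chunk:
                return
            yield chunk


    def bulk_payload(query_strings: Iterable[str], token_auth: str) -> str:
        """JSON body for Matomo bulk tracking: many hits in one POST."""
        return json.dumps({"requests": list(query_strings), "token_auth": token_auth})


    def make_http_post(tracker_url: str, timeout: float = 30) -> Callable[[str], None]:
        """Return a sender for an endpoint such as https://tracker.example.org/matomo.php
        (hypothetical URL). urlopen raises HTTPError on 4xx/5xx responses."""
        def post(payload: str) -> None:
            req = urllib.request.Request(
                tracker_url, data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"}, method="POST")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                resp.read()
        return post


    def load_last_timestamp(state_file: str, default: int) -> int:
        """Read the stored checkpoint, or fall back to `default` on the first run."""
        state = Path(state_file)
        if not state.exists():
            return default
        return json.loads(state.read_text())["last_timestamp"]


    def send_with_checkpoint(records: Iterable[Record], post: Callable[[str], None],
                             state_file: str, token_auth: str, batch_size: int = 100) -> None:
        """Send records in batches; persist the timestamp only after each batch succeeds."""
        state = Path(state_file)
        for batch in batched(records, batch_size):
            post(bulk_payload((qs for _, qs in batch), token_auth))  # raises on failure
            tmp = state.with_suffix(".tmp")
            tmp.write_text(json.dumps({"last_timestamp": batch[-1][0]}))
            tmp.replace(state)  # move into place so an interrupted write cannot truncate the state

Because the timestamp is written only after the tracker accepts a batch, a failed batch is sent again on the next run instead of being skipped. Injecting ``post`` as a plain callable also lets the batching and checkpoint logic be tested without a live Matomo instance.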

Credits
-------

This component is part of an alternative DSpace usage statistics collection strategy developed by LA Referencia, CONCYTEC (Peru), IBICT (Brazil) and OpenAIRE within the OpenAIRE Advance project, WP5, Subtask 5.2.2: "Pilot common methods for usage statistics across Europe & Latin America".


References
----------

[1] Schirrwagen, Jochen, Pierrakos, Dimitris, MacIntyre, Ross, Needham, Paul, Simeonov, Georgi, Príncipe, Pedro, & Dazy, André. (2017). OpenAIRE2020 - Usage Statistics Services - D8.5. doi: https://doi.org/10.5281/zenodo.1034164

[2] Python generators. https://wiki.python.org/moin/Generators

[3] Project COUNTER. https://www.projectcounter.org/

[4] Matomo Tracking API. https://developer.matomo.org/api-reference/tracking-api

[5] DSpace Statistics. https://wiki.lyrasis.org/display/DSDOC3x/DSpace+Statistics



            
