# CVMFS server scraper and prometheus exporter
This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. It grabs:
- cvmfs/info/v1/repositories.json
Then, for every repo it finds (and is not told to ignore), it grabs:
- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
## Installation
`pip install cvmfs-server-scraper`
## Usage
````python
#!/usr/bin/env python3
import logging
from cvmfsscraper import scrape, scrape_server, set_log_level
# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")
set_log_level(logging.DEBUG)
servers = scrape(
    stratum0_servers=[
        "stratum0.tld",
    ],
    stratum1_servers=[
        "stratum1-no.tld",
        "stratum1-au.tld",
    ],
    repos=[],
    ignore_repos=[],
)

# Note that the order of servers is undefined.
print(servers[0])

for repo in servers[0].repositories:
    print(f"Repo: {repo.name}")
    print(f"Root size: {repo.root_size}")
    print(f"Revision: {repo.revision}")
    print(f"Revision timestamp: {repo.revision_timestamp}")
    print(f"Last snapshot: {repo.last_snapshot}")
````
Note that if you are using a Stratum1 server with S3 as its backend, you need to set `repos` explicitly, because the S3 backend does not serve a `cvmfs/info/v1/repositories.json` file. The GeoAPI status will also be `NOT_FOUND` for these servers.
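Explicitly listed repos can still be scraped without `repositories.json` because the per-repo metadata lives at well-known paths. A minimal sketch of the URLs involved (server and repo names are placeholders; this is not the package's own code):

```python
def repo_metadata_urls(server: str, repo: str) -> dict[str, str]:
    """Build the well-known metadata URLs for one repo on one server."""
    base = f"http://{server}/cvmfs/{repo}"
    return {
        "status": f"{base}/.cvmfs_status.json",
        "published": f"{base}/.cvmfspublished",
    }

urls = repo_metadata_urls("s3-stratum1.tld", "software.example.org")
```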
# Data structure
## Server
A server object, representing a specific server that has been scraped.
````python
servers = scrape(...)
server_one = servers[0]
````
### Name
#### Type: Attribute
`server.name`
#### Returns
The name of the server, usually its fully qualified domain name.
### GeoApi status
#### Type: Attribute
`server.geoapi_status`
#### Returns
A GeoAPI status enum object, defined in `constants.py`. The possible values are:
- OK (0: OK)
- LOCATION_ERROR (1: GeoApi gives wrong location)
- NO_RESPONSE (2: No response)
- NOT_FOUND (9: The server has no repository available so the GeoApi cannot be tested)
- NOT_YET_TESTED (99: The server has not yet been tested)
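For illustration, the values above can be modelled as a plain Python enum (a re-creation for this example, not an import of the package's actual class):

```python
from enum import Enum

class GeoAPIStatus(Enum):
    # Values mirror the list above.
    OK = 0
    LOCATION_ERROR = 1
    NO_RESPONSE = 2
    NOT_FOUND = 9
    NOT_YET_TESTED = 99

def geoapi_is_healthy(status: GeoAPIStatus) -> bool:
    """Only a correct GeoAPI answer counts as healthy."""
    return status is GeoAPIStatus.OK
```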
### Repositories
#### Type: Attribute
`server.repositories`
#### Returns
A list of repository objects, sorted by name. Empty if no repositories were scraped on the server.
### Ignored repositories
#### Type: Attribute
`server.ignored_repositories`
#### Returns
A list of repository names that the scraper will ignore.
### Forced repositories
#### Type: Attribute
`server.forced_repositories`
#### Returns
A list of repository names that will always be scraped on the server. If a repo name appears in both `ignored_repositories` and `forced_repositories`, it is scraped.
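The precedence rule can be sketched as follows (illustrative only; the package's internal logic may be organised differently):

```python
def should_scrape(repo: str, ignored: set[str], forced: set[str]) -> bool:
    # Forced repos win over ignored ones; everything else is
    # scraped unless explicitly ignored.
    if repo in forced:
        return True
    return repo not in ignored
```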
## Repository
A repository object, representing a single repository on a scraped server.
````python
servers = scrape(...)
repo_one = servers[0].repositories[0]
````
### Name
#### Type: Attribute
`repo_one.name`
#### Returns
The fully qualified name of the repository.
### Server
#### Type: Attribute
`repo_one.server`
#### Returns
The server object to which the repository belongs.
### Path
#### Type: Attribute
`repo_one.path`
#### Returns
The path of the repository on the server, which may differ from its name. To get a complete URL, one can do:
`url = "http://" + repo_one.server.name + repo_one.path`
### Status attributes
These attributes are populated from `.cvmfs_status.json`:
| Attribute | Value |
| --- | --- |
| last_gc | Timestamp of last garbage collection |
| last_snapshot | Timestamp of the last snapshot |
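A hypothetical `.cvmfs_status.json` payload with these two keys could be decoded like this (the sample values and their asctime-like format are invented for the example; real files come from the server):

```python
import json
from datetime import datetime

# Invented sample payload with the two keys from the table above.
raw = '{"last_snapshot": "Fri Jun 14 23:00:02 UTC 2024", "last_gc": "Fri Jun 14 03:00:11 UTC 2024"}'
status = json.loads(raw)

# Parse the asctime-like timestamp; %Z accepts the literal "UTC".
last_snapshot = datetime.strptime(status["last_snapshot"], "%a %b %d %H:%M:%S %Z %Y")
```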
Information from `.cvmfspublished` is also provided. For explanations of these keys, please see the official CVMFS [documentation](https://cvmfs.readthedocs.io/en/stable/cpt-details.html). The Field column in the table below gives the corresponding key in `.cvmfspublished`.
| Attribute | Field |
| --- | --- |
| alternative_name | A |
| full_name | N |
| is_garbage_collectable | G |
| metadata_cryptographic_hash | M |
| micro_cataogues | L |
| reflog_checksum_cryptographic_hash | Y |
| revision_timestamp | T |
| root_catalogue_ttl | D |
| root_cryptographic_hash | C |
| root_size | B |
| root_path_hash | R |
| signature | The end signature blob |
| signing_certificate_cryptographic_hash | X |
| tag_history_cryptographic_hash | H |
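A `.cvmfspublished` file is essentially a series of lines, each starting with a one-character key, with a `--` line separating the fields from the trailing signature blob. A minimal decoding sketch using the mapping above (not the package's own parser):

```python
# Key-to-attribute mapping, mirroring the table above.
FIELD_NAMES = {
    "A": "alternative_name",
    "N": "full_name",
    "G": "is_garbage_collectable",
    "M": "metadata_cryptographic_hash",
    "L": "micro_cataogues",
    "Y": "reflog_checksum_cryptographic_hash",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
    "R": "root_path_hash",
    "X": "signing_certificate_cryptographic_hash",
    "H": "tag_history_cryptographic_hash",
}

def parse_cvmfspublished(text: str) -> dict[str, str]:
    """Map each one-letter field key to its value; stop at the signature."""
    fields = {}
    for line in text.splitlines():
        if line == "--":  # everything after this is the signature blob
            break
        if line:
            fields[FIELD_NAMES.get(line[0], line[0])] = line[1:]
    return fields

sample = "Nexample.org\nT1718407200\nB4096\n--\nsignature-bytes"
parsed = parse_cvmfspublished(sample)
```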