cvmfs-server-scraper

Name	cvmfs-server-scraper
Version	0.0.3
home_page	https://github.com/eessi/cvmfs-server-scraper
Summary	Scrape metadata from CVMFS Stratum servers.
upload_time	2024-01-21 21:17:13
maintainer	Terje Kvernes
docs_url	None
author	Terje Kvernes
requires_python	>=3.8,<4.0
license	GPLv2
keywords	cvmfs scrape eessi

# CVMFS server scraper and Prometheus exporter

This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. For each server, it grabs:

    - cvmfs/info/v1/repositories.json

Then, for every repository it finds (unless it has been told to ignore it), it grabs:

    - cvmfs/<repo>/.cvmfs_status.json
    - cvmfs/<repo>/.cvmfspublished
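
The paths above can be expanded into full URLs with a few lines of Python. This is a minimal sketch, not the library's own code: the `metadata_urls` helper is hypothetical, the `http://` scheme is an assumption, and the repository name is only a placeholder.

```python
from typing import Dict


def metadata_urls(server: str, repo: str) -> Dict[str, str]:
    """Build the metadata URLs for one repository on one server.

    Hypothetical helper for illustration; it is not part of cvmfsscraper.
    Assumes the server is reachable over plain HTTP.
    """
    base = f"http://{server}/cvmfs"
    return {
        "repositories": f"{base}/info/v1/repositories.json",
        "status": f"{base}/{repo}/.cvmfs_status.json",
        "published": f"{base}/{repo}/.cvmfspublished",
    }


if __name__ == "__main__":
    # Placeholder server and repository names.
    for kind, url in metadata_urls("stratum1-no.tld", "example.repo.tld").items():
        print(f"{kind}: {url}")
```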

## Usage

````python
#!/usr/bin/env python3

import logging
from cvmfsscraper import scrape, scrape_server, set_log_level

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

set_log_level(logging.DEBUG)

servers = scrape(
    stratum0_servers=[
        "stratum0.tld",
    ],
    stratum1_servers=[
        "stratum1-no.tld",
        "stratum1-au.tld",
    ],
    repos=[],
    ignore_repos=[],
)

# Note that the order of servers is undefined.
print(servers[0])

for repo in servers[0].repositories:
    # str() keeps the concatenation safe if a value is not already a string.
    print("Repo: " + repo.name)
    print("Root size: " + str(repo.root_size))
    print("Revision: " + str(repo.revision))
    print("Revision timestamp: " + str(repo.revision_timestamp))
    print("Last snapshot: " + str(repo.last_snapshot))
````

# Data structure

## Server

A server object, representing a specific server that has been scraped.

````python
servers = scrape(...)
server_one = servers[0]
````

### Name

#### Type: Attribute

`server.name`

#### Returns

The name of the server, usually its fully qualified domain name.

### GeoApi status

#### Type: Attribute

`server.geoapi_status`

#### Returns

A `GeoAPIstatus` enum member, defined in `constants.py`. The possible values are:

- OK (0: OK)
- LOCATION_ERROR (1: GeoApi gives wrong location)
- NO_RESPONSE (2: No response)
- NOT_FOUND (9: The server has no repository available so the GeoApi cannot be tested)
- NOT_YET_TESTED (99: The server has not yet been tested)
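
To show how such a status could be handled in calling code, here is a small stand-in enum that mirrors the values listed above. It is an illustration only: `GeoAPIStatusSketch` and `geoapi_is_healthy` are hypothetical names, and the real enum in `constants.py` may differ in naming or structure.

```python
from enum import Enum


class GeoAPIStatusSketch(Enum):
    """Illustrative stand-in that mirrors the values listed above."""

    OK = 0
    LOCATION_ERROR = 1
    NO_RESPONSE = 2
    NOT_FOUND = 9
    NOT_YET_TESTED = 99


def geoapi_is_healthy(status: GeoAPIStatusSketch) -> bool:
    """Return True only when the GeoAPI test ran and succeeded."""
    return status is GeoAPIStatusSketch.OK


if __name__ == "__main__":
    # Look up a member by its numeric value, then check its health.
    status = GeoAPIStatusSketch(9)
    print(status.name, geoapi_is_healthy(status))
```

Treating everything except `OK` as unhealthy is a design choice for this sketch; a monitoring setup might want to handle `NOT_YET_TESTED` separately.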

### Repositories

#### Type: Attribute

`server.repositories`

#### Returns

A list of repository objects, sorted by name. The list is empty if no repositories were scraped on the server.

### Ignored repositories

#### Type: Attribute

`server.ignored_repositories`

#### Returns

A list of repository names that the scraper ignores.

### Forced repositories

#### Type: Attribute

`server.forced_repositories`

#### Returns

A list of repository names that the server is forced to scrape. If a repository name appears in both `ignored_repositories` and `forced_repositories`, it is scraped.
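
The precedence rule can be expressed as a small predicate. This is a sketch of the rule as stated here, not the scraper's actual implementation; `should_scrape` is a hypothetical helper and the repository names are placeholders.

```python
from typing import Iterable


def should_scrape(name: str, ignored: Iterable[str], forced: Iterable[str]) -> bool:
    """Decide whether a repository is scraped.

    A forced repository is always scraped, even if it is also ignored.
    Otherwise a repository is scraped unless it is on the ignore list.
    """
    if name in set(forced):
        return True
    return name not in set(ignored)


if __name__ == "__main__":
    ignored = ["test.repo.tld", "old.repo.tld"]
    forced = ["test.repo.tld"]
    for repo in ("test.repo.tld", "old.repo.tld", "prod.repo.tld"):
        print(repo, should_scrape(repo, ignored, forced))
```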

## Repository

A repository object, representing a single repository on a scraped server.

````python
servers = scrape(...)
repo_one = servers[0].repositories[0]
````

### Name

#### Type: Attribute

`repo_one.name`

#### Returns

The fully qualified name of the repository.

### Server

#### Type: Attribute

`repo_one.server`

#### Returns

The server object to which the repository belongs.

### Path

#### Type: Attribute

`repo_one.path`

#### Returns

The path for the repository on the server. May differ from the name. To get a complete URL, one can do:

`url = "http://" + repo_one.server.name + repo_one.path`

### Status attributes

These attributes are populated from `.cvmfs_status.json`:

| Attribute | Value |
| --- | --- |
| last_gc | Timestamp of last garbage collection |
| last_snapshot | Timestamp of the last snapshot |
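
As a rough illustration of where these two values come from, the sketch below reads a status document and converts both fields to `datetime` objects. Several things here are assumptions rather than facts from this README: the JSON keys are taken to match the attribute names, the timestamp layout (`Sun Jan 21 21:17:13 UTC 2024`) is assumed, the sample values are made up, and `parse_status` is a hypothetical helper that is not part of the library.

```python
import json
from datetime import datetime
from typing import Dict, Optional

# Assumed timestamp layout, e.g. "Sun Jan 21 21:17:13 UTC 2024".
ASSUMED_FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def parse_status(raw: str) -> Dict[str, Optional[datetime]]:
    """Extract last_gc and last_snapshot from a status JSON document.

    Missing or empty fields are returned as None.
    """
    data = json.loads(raw)
    result: Dict[str, Optional[datetime]] = {}
    for key in ("last_gc", "last_snapshot"):
        value = data.get(key)
        result[key] = datetime.strptime(value, ASSUMED_FORMAT) if value else None
    return result


if __name__ == "__main__":
    # Made-up sample document with only one of the two fields present.
    sample = '{"last_snapshot": "Sun Jan 21 21:17:13 UTC 2024"}'
    print(parse_status(sample))
```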

Information from `.cvmfspublished` is also provided. For an explanation of each key, see the CVMFS [official documentation](https://cvmfs.readthedocs.io/en/stable/cpt-details.html). The Field column in the table gives the corresponding key in `.cvmfspublished`.

| Attribute | Field |
| --- | --- |
| alternative_name | A |
| full_name | N |
| is_garbage_collectable | G |
| metadata_cryptographic_hash | M |
| micro_cataogues | L |
| reflog_checksum_cryptographic_hash | Y |
| revision_timestamp | T |
| root_catalogue_ttl | D |
| root_cryptographic_hash | C |
| root_size | B |
| root_path_hash | R |
| signature | The end signature blob |
| signing_certificate_cryptographic_hash | X |
| tag_history_cryptographic_hash | H |
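
The mapping in the table can be pictured as a simple lookup applied to each line of the manifest. The sketch below assumes the `.cvmfspublished` layout described in the CVMFS documentation linked above: one field per line, starting with a one-letter key, followed by a `--` separator line and the signature section. `parse_manifest` is a hypothetical helper, the sample bytes are made up, and the scraper's own parser may work differently.

```python
from typing import Dict

# Field keys and attribute names, copied from the table above
# (attribute names are spelled exactly as they appear there).
FIELD_NAMES = {
    "A": "alternative_name",
    "N": "full_name",
    "G": "is_garbage_collectable",
    "M": "metadata_cryptographic_hash",
    "L": "micro_cataogues",
    "Y": "reflog_checksum_cryptographic_hash",
    "T": "revision_timestamp",
    "D": "root_catalogue_ttl",
    "C": "root_cryptographic_hash",
    "B": "root_size",
    "R": "root_path_hash",
    "X": "signing_certificate_cryptographic_hash",
    "H": "tag_history_cryptographic_hash",
}


def parse_manifest(content: bytes) -> Dict[str, str]:
    """Map the one-letter keys of a manifest to attribute names.

    Values are kept as raw strings; keys not in the table are skipped.
    """
    # Everything after the "--" separator line is the signature section,
    # which may contain binary data, so only the header is decoded.
    header, _, _signature = content.partition(b"\n--\n")
    fields: Dict[str, str] = {}
    for line in header.decode("utf-8").splitlines():
        if not line:
            continue
        name = FIELD_NAMES.get(line[0])
        if name is not None:
            fields[name] = line[1:]
    return fields


if __name__ == "__main__":
    # Made-up manifest, for illustration only.
    sample = b"Cabc123\nB1024\nNexample.repo.tld\nT1705871833\nD240\nGyes\n--\ndeadbeef\n\x00\x01"
    for attribute, value in parse_manifest(sample).items():
        print(f"{attribute} = {value}")
```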

Raw data

{
    "_id": null,
    "home_page": "https://github.com/eessi/cvmfs-server-scraper",
    "name": "cvmfs-server-scraper",
    "maintainer": "Terje Kvernes",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "terje@kvernes.no",
    "keywords": "cvmfs,scrape,eessi",
    "author": "Terje Kvernes",
    "author_email": "terje@kvernes.no",
    "download_url": "https://files.pythonhosted.org/packages/66/4d/45db8e4a37f4376bddbce08f8cf0e0bf153f21fbfe3ae5aabacc6852182c/cvmfs_server_scraper-0.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPLv2",
    "summary": "Scrape metadata from CVMFS Stratum servers.",
    "version": "0.0.3",
    "project_urls": {
        "Documentation": "https://github.com/eessi/cvmfs-server-scraper",
        "Homepage": "https://github.com/eessi/cvmfs-server-scraper",
        "Repository": "https://github.com/eessi/cvmfs-server-scraper"
    },
    "split_keywords": [
        "cvmfs",
        "scrape",
        "eessi"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "387697629acab1a6398800fb4c9f606964eead93d494af08bd8088ad352c285b",
                "md5": "bf9a8ffeda597295f3ee299ca3edf100",
                "sha256": "ecb4160ff9b3c8ae19941068dd0eabc4f4ce1b1be8d4272d33ce43faf340a923"
            },
            "downloads": -1,
            "filename": "cvmfs_server_scraper-0.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bf9a8ffeda597295f3ee299ca3edf100",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 45042,
            "upload_time": "2024-01-21T21:17:11",
            "upload_time_iso_8601": "2024-01-21T21:17:11.938413Z",
            "url": "https://files.pythonhosted.org/packages/38/76/97629acab1a6398800fb4c9f606964eead93d494af08bd8088ad352c285b/cvmfs_server_scraper-0.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "664d45db8e4a37f4376bddbce08f8cf0e0bf153f21fbfe3ae5aabacc6852182c",
                "md5": "22245b73d26c262d3d836afa5b9a4728",
                "sha256": "608a3c6aaa0747ac10abc79b3997995825f9cdd0bc911f945c9855587b9e53fb"
            },
            "downloads": -1,
            "filename": "cvmfs_server_scraper-0.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "22245b73d26c262d3d836afa5b9a4728",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 31852,
            "upload_time": "2024-01-21T21:17:13",
            "upload_time_iso_8601": "2024-01-21T21:17:13.889159Z",
            "url": "https://files.pythonhosted.org/packages/66/4d/45db8e4a37f4376bddbce08f8cf0e0bf153f21fbfe3ae5aabacc6852182c/cvmfs_server_scraper-0.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-01-21 21:17:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "eessi",
    "github_project": "cvmfs-server-scraper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "cvmfs-server-scraper"
}
