| Field           | Value                                                                                           |
|-----------------|-------------------------------------------------------------------------------------------------|
| Name            | datacite-websnap                                                                                |
| Version         | 1.0.3                                                                                           |
| Summary         | CLI tool that bulk exports DataCite metadata records for a specific repository to an S3 bucket. |
| Upload time     | 2025-08-27 09:38:52                                                                             |
| Author          | Rebecca Buchholz                                                                                |
| Requires Python | >=3.11                                                                                          |
| License         | MIT                                                                                             |
| Keywords        | s3, boto3, api, backup, aws, aws sdk, aws sdk for python, datacite                              |

# datacite-websnap

<div>
    <img alt="Supported Python Versions" src="https://img.shields.io/badge/python-3.11%20|%203.12%20|%203.13-blue">
     <a href="https://pypi.org/project/datacite-websnap" target="_blank">
        <img alt="PyPI - Version" src="https://img.shields.io/pypi/v/datacite-websnap">
    </a>
    <a href="https://github.com/EnviDat/datacite-websnap/blob/main/LICENSE" target="_blank">
      <img alt="License" src="https://img.shields.io/pypi/l/websnap?color=%232780C1">
    </a>
    <img alt="Code Style - ruff" src="https://img.shields.io/badge/style-ruff-41B5BE?style=flat">
</div>

### CLI tool that bulk exports DataCite metadata records for a specific repository to an S3 bucket. 
#### Also supports exporting repository records to a local machine.

---


## Purpose

`datacite-websnap` was developed to facilitate interoperability between the data platforms of the ETH research institutions in Switzerland. 

`datacite-websnap` empowers research institutions to share their DataCite metadata records by exporting the records to publicly accessible S3 cloud storage.  


## Installation

```bash
pip install datacite-websnap
```


## Terminal Documentation

To access CLI documentation:
```bash
datacite-websnap --help
```

To access more detailed documentation for the `export` command:
```bash
datacite-websnap export --help
```

## CLI Options

<details>
  <summary>Click to unfold</summary>

### Command: `export`

Bulk export DataCite XML metadata records that correspond to the records for a particular DataCite repository and/or DOI prefix.

By default the command exports DataCite XML records to an S3 bucket, but it also supports exporting the records to a local machine.

| Option             | Default                    | Description                                                                                                                                                                                                                                                                                                                                           |
|--------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `--doi-prefix`     | `None`                     | <ul><li>DataCite DOI prefix used to filter results</li><li>Accepts single or multiple prefix arguments</li><li>*Example*: `--doi-prefix 10.16904 --doi-prefix 10.25678`</li></ul>                                                                                                                                                                     |
| `--client-id`      | `None`                     | <ul><li>DataCite repository account ID used to filter results</li><li>*Example*: `--client-id ethz.wsl`</li></ul>                                                                                                                                                                                                                                     |
| `--destination`    | `S3`                       | <ul><li>Export destination for the DataCite XML records</li><li>`S3` (default) for an S3 bucket</li><li>`local` for local file system</li></ul>                                                                                                                                                                                                       |
| `--bucket`         | `None`                     | <ul><li>Name of the S3 bucket that DataCite XML records (as S3 objects) will be written to</li><li>*Example*: `--bucket opendataswiss`</li></ul> |
| `--key-prefix`     | `None`                     | <ul><li>Optional key prefix for objects in S3 bucket</li><li>If omitted then objects are written in S3 bucket without a prefix</li><li>*Example*: `--key-prefix wsl`</li></ul>                                                                                                                                                                        |
| `--directory-path` | `None`                     | <ul><li>Only used if exporting to `local` destination</li><li>Path of the local directory that DataCite XML records will be written to</li></ul> |
| `--file-logs`      | `False`                    | <ul><li>Enables logging info messages and errors to a file log</li></ul>                                                                                                                                                                                                                                                                              |
| `--log-level`      | `INFO`                     | <ul><li>Level to use for logging if using `--file-logs` option</li><li>Default value is `INFO`</li><li>Valid logging levels are `DEBUG`, `INFO`, `WARNING`, `ERROR`, or `CRITICAL`</li><li><a href="https://docs.python.org/3/library/logging.html#logging-levels" target="_blank">Click here to learn more about Python logging levels</a></li></ul> |
| `--early-exit`     | `False`                    | <ul><li>If enabled, terminates the program immediately after an export error occurs</li><li>Default value is `False` (not enabled)</li><li>If `False`, only logs the export error and continues trying to export the other DataCite XML records returned by the search query</li></ul> |
| `--api-url`        | `https://api.datacite.org` | <ul><li>DataCite API base URL used for queries</li><li>Can also be set using a DataCite API configuration variable</li></ul>                                                                                                                                                                                                                          |
| `--page-size`      | `250`                      | <ul><li>Number of records returned per page of DataCite API response using pagination</li><li>Can also be set using a DataCite API configuration variable</li></ul>                                                                                                                                                                                   |
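
The difference between the default behavior and `--early-exit` can be sketched roughly as below. This is only an illustration of the behavior described in the table, not the tool's actual code; `export_records` and the `write` callable are hypothetical names.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("early-exit-sketch")


def export_records(
    records: Iterable[tuple[str, str]],
    write: Callable[[str, str], None],
    early_exit: bool = False,
) -> int:
    """Write each (doi, xml) pair with `write` and return the number of failures."""
    failures = 0
    for doi, xml in records:
        try:
            write(doi, xml)
        except Exception as err:
            if early_exit:
                # --early-exit: stop at the first export error
                raise
            # default: log the error and carry on with the next record
            failures += 1
            logger.error("Export failed for %s: %s", doi, err)
    return failures
```

With `early_exit=False` a single failing record does not prevent the remaining records from being exported.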

</details>

## DataCite Filters

<details>
  <summary>
  Click to unfold
  </summary>

Repository account ID and DOI prefix are the supported filters used to select DataCite records that will be exported. 

The filters can be applied for both S3 bucket and local machine usage.  

### Repository Account ID

_Please note that applying this filter will bulk export ALL records for the specified repository account ID!_

Repositories with records on DataCite each have their own DataCite repository account ID.

To confirm you have the correct repository ID you can call the [DataCite API client endpoint](https://support.datacite.org/reference/get_clients-id). 

If you do not know the repository ID but do know a specific DOI that belongs to the repository:
1. Navigate to [DataCite Commons](https://commons.datacite.org/)
2. Enter the DOI in the search box. For example: 10.16904/envidat.576
3. Click on the record, then click "Download Metadata" and select "DataCite JSON"
4. The repository account ID is the value for `"clientId"`. For DOI 10.16904/envidat.576 the `"clientId"` value is `"ethz.wsl"`.

Example usage as a command line argument: `--client-id ethz.wsl`
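
The lookup steps above can also be sketched in Python. The snippet reads `"clientId"` from a downloaded DataCite JSON file and builds the URL of the client endpoint that can be used to confirm the ID. The function names and the trimmed sample document are hypothetical; the base URL and `/clients` endpoint are the defaults listed in this README.

```python
import json

API_URL = "https://api.datacite.org"  # default of --api-url
CLIENTS_ENDPOINT = "/clients"         # default of DATACITE_API_CLIENTS_ENDPOINT


def client_id_from_metadata(datacite_json: str) -> str:
    """Return the "clientId" value from a downloaded DataCite JSON document."""
    return json.loads(datacite_json)["clientId"]


def client_url(client_id: str) -> str:
    """Build the client endpoint URL for a repository account ID."""
    return f"{API_URL}{CLIENTS_ENDPOINT}/{client_id}"


# Trimmed, hypothetical excerpt of a downloaded metadata file.
sample = '{"doi": "10.16904/envidat.576", "clientId": "ethz.wsl"}'
print(client_url(client_id_from_metadata(sample)))
# https://api.datacite.org/clients/ethz.wsl
```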

### DOI Prefix

_Please note that applying this filter will bulk export ALL records for the specified DOI prefix!_

Records can also be exported by their DOI prefix. 

The `--doi-prefix` argument accepts single or multiple prefix arguments.

Example usage as a command line argument: `--doi-prefix 10.16904 --doi-prefix 10.25678`

It can also be combined with the `--client-id` argument.

</details>


## Usage: S3 Bucket

<details>
  <summary>
  Click to unfold
  </summary>

Utilizes the AWS SDK for Python (Boto3) to export DataCite XML metadata records for a specific repository and/or DOI prefix as objects in an S3 bucket. 

### Environment Variables 

The environment variables listed below are **required** to export records to an S3 bucket.

| Environment Variable    | Description                              |
|-------------------------|------------------------------------------|
| `ENDPOINT_URL`          | URL to use for the constructed S3 client |
| `AWS_ACCESS_KEY_ID`     | AWS access key ID                        |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key                    |


Supports setting environment variables in a `.env` file. 

The `.env` file **must** be located in the directory where the CLI is being executed.

For example, if you are running the program from `my-drive/cli-tools/datacite-websnap` then the `.env` file **must** be in that directory.

Example `.env` file:

```
ENDPOINT_URL=https://dreamycloud.com
AWS_ACCESS_KEY_ID=1234567abcdefg
AWS_SECRET_ACCESS_KEY=hijklmn1234567
```

### Examples

To export the records to an S3 bucket:
- `--bucket` option **must** be assigned to an existing S3 bucket

#### Basic Usage

- Return all DataCite records for the EnviDat repository (using client-id `ethz.wsl`)
- Write XML records to a bucket called "opendataswiss" 

```bash
datacite-websnap export --client-id ethz.wsl --bucket opendataswiss
```

#### Advanced Usage

- Return all DataCite records for the EnviDat repository (using client-id `ethz.wsl`)
- Write XML records to a bucket called "opendataswiss" 
- Use key prefix `wsl`
- Enable logging to a file

```bash
datacite-websnap export --client-id ethz.wsl --bucket opendataswiss --key-prefix wsl --file-logs
```

</details>



## Usage: Local Machine

<details>
  <summary>
  Click to unfold
  </summary>

Export DataCite XML metadata records for a specific repository and/or DOI prefix to a local machine. 

To write the records locally:
- `--destination` option **must** be assigned to `local`
- `--directory-path` option **must** be assigned to an existing local directory

### Example

- Return all DataCite records for the EnviDat repository (using client-id `ethz.wsl`)
- Write XML records locally
- Write XML records to a directory called "opendata/wsl"

```bash
datacite-websnap export --client-id ethz.wsl --destination local --directory-path "opendata/wsl"
```
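
The requirement that the directory already exists can be pictured with the short sketch below. It is an illustration under assumptions: `write_record` is a hypothetical helper, and the file name follows the naming rule described in the "Record Name Formatting" section.

```python
from pathlib import Path


def write_record(directory: str, doi: str, xml: str) -> Path:
    """Write one XML record into an existing directory and return the file path."""
    target_dir = Path(directory)
    if not target_dir.is_dir():
        # Mirrors the requirement that --directory-path must already exist
        raise NotADirectoryError(f"Directory does not exist: {target_dir}")
    path = target_dir / (doi.replace("/", "_") + ".xml")
    path.write_text(xml, encoding="utf-8")
    return path
```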

</details>


## Record Name Formatting

<details>
  <summary>
  Click to unfold
  </summary>

Exported DataCite XML records are assigned file names (or S3 keys) using the DOI that corresponds to the record.

- The "/" slash character that divides the DOI prefix and suffix is replaced with a "_" underscore character
- ".xml" is appended to the DOI as a file extension 

### Example

Record DOI: `10.16904/envidat.31`

File name (or S3 key) for exported record: `10.16904_envidat.31.xml`
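
The naming rule can be expressed as a one-line function. This is a minimal sketch: `record_name` is a hypothetical name, and replacing every slash (rather than only the one between prefix and suffix) is an assumption, since this README does not say how slashes inside a DOI suffix are handled.

```python
def record_name(doi: str) -> str:
    """Turn a DOI into the file name (or S3 key) of its exported XML record."""
    # Assumption: any further "/" inside the suffix is replaced as well
    return doi.replace("/", "_") + ".xml"


print(record_name("10.16904/envidat.31"))  # 10.16904_envidat.31.xml
```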

</details>


## Logs

<details>
  <summary>
  Click to unfold
  </summary>

Info messages and errors are logged to the console.

Optionally, info messages and errors can also be written to a log file, which is named `"datacite-websnap.log"` by default.

To enable the log file, pass the `--file-logs` option.

### Example   
```bash
datacite-websnap export --client-id ethz.wsl --bucket opendataswiss --file-logs
```

### Configuration: Logs

The default logging configuration variables are assigned in `config.py`.

To override the defaults, change the variables listed in the table below in `config.py`.

`LOG_NAME` is the name of the file log (used if the `--file-logs` option is enabled).

<a href="https://docs.python.org/3/library/logging.html#logging.basicConfig" target="_blank">Python logging basic configuration documentation.</a>

| Configuration Variable | Default                                                                               |
|------------------------|---------------------------------------------------------------------------------------|
| `LOG_NAME`             | `"datacite-websnap.log"`                                                              |
| `LOG_FORMAT`           | `"%(asctime)s \| %(levelname)s \| %(module)s.%(funcName)s:%(lineno)d \| %(message)s"` |
| `LOG_DATE_FORMAT`      | `"%Y-%m-%d %H:%M:%S"`                                                                 |
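
To show how these defaults fit together, the sketch below wires them into Python's standard `logging` module. It is only an approximation of the documented behavior (console logging always, file logging when requested); `build_logger` and the logger name are hypothetical. Note that the `\|` in the table is Markdown escaping, and the actual format string uses a plain `|`.

```python
import logging
import sys

LOG_NAME = "datacite-websnap.log"
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(module)s.%(funcName)s:%(lineno)d | %(message)s"
LOG_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_logger(file_logs: bool = False, level: str = "INFO") -> logging.Logger:
    """Return a console logger that also writes to LOG_NAME when file_logs is set."""
    logger = logging.getLogger("datacite-websnap-sketch")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.setLevel(level)
    formatter = logging.Formatter(LOG_FORMAT, datefmt=LOG_DATE_FORMAT)
    handlers: list[logging.Handler] = [logging.StreamHandler(sys.stdout)]
    if file_logs:
        handlers.append(logging.FileHandler(LOG_NAME))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```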


</details>


## DataCite API

<details>
  <summary>
  Click to unfold
  </summary>

`datacite-websnap` retrieves XML metadata records from the DataCite API.

Documentation for the DataCite API endpoints and pagination used in `datacite-websnap`:
- <a href="https://support.datacite.org/reference/get_dois" target="_blank">Return a list of DOIs</a>
- <a href="https://support.datacite.org/docs/pagination#method-2-cursor" target="_blank">Cursor-based pagination</a>
- <a href="https://support.datacite.org/reference/get_clients-id" target="_blank">Return a client (DataCite repository)</a>
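
Cursor-based pagination can be sketched as a loop that follows the `links.next` URL of each response until no further page is offered. The example below is a hedged illustration rather than the tool's implementation: `iter_xml_records` and the injected `fetch_json` callable are hypothetical, and it assumes each returned record carries a base64-encoded `attributes.xml` value alongside `attributes.doi`.

```python
import base64
from typing import Callable, Iterator


def iter_xml_records(
    first_url: str, fetch_json: Callable[[str], dict]
) -> Iterator[tuple[str, str]]:
    """Yield (doi, xml) pairs from every page, following links.next."""
    url = first_url
    while url:
        page = fetch_json(url)
        records = page.get("data", [])
        if not records:
            break  # an empty page means there is nothing left to export
        for record in records:
            attributes = record["attributes"]
            # Assumption: the XML is delivered base64-encoded
            xml = base64.b64decode(attributes["xml"]).decode("utf-8")
            yield attributes["doi"], xml
        # The next cursor URL is absent on the last page
        url = page.get("links", {}).get("next")
```

Passing the HTTP call in as `fetch_json` keeps the paging logic testable without network access; in practice it could wrap a GET request that uses the `TIMEOUT` value.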

### Configuration: DataCite API 

The default configuration variables for the DataCite API base URL, endpoints, page size and timeout are assigned in `config.py`.

To override the defaults, change the variables listed in the table below in `config.py`.

| Configuration Variable          | Default                    | Description                                                                                                      |
|---------------------------------|----------------------------|------------------------------------------------------------------------------------------------------------------|
| `TIMEOUT`                       | `32`                       | Timeout of API requests in seconds.                                                                              |
| `DATACITE_API_URL`              | `https://api.datacite.org` | DataCite base URL used for API requests.<br>Value is assigned as default to `--api-url` CLI option.              |
| `DATACITE_API_CLIENTS_ENDPOINT` | `/clients`                 | Endpoint used to retrieve client.                                                                                |
| `DATACITE_API_DOIS_ENDPOINT`    | `/dois`                    | Endpoint used to retrieve list of DOIs.                                                                          |
| `DATACITE_PAGE_SIZE`            | `250`                      | Number of DOIs retrieved per page using pagination.<br>Value is assigned as default to `--page-size` CLI option. |


</details>


## Author

<a href="http://www.linkedin.com/in/rebeccabuchholz" target="_blank">Rebecca Buchholz</a>,
EnviDat Software Engineer

<a href="https://www.envidat.ch" target="_blank">EnviDat</a> is the environmental data 
portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. 


## Inspiration

<h3><a href="https://pypi.org/project/websnap" target="_blank">websnap</a></h3>

An EnviDat PyPI package that copies files retrieved from an API to an S3 bucket or a local machine.

## License

<a href="https://github.com/EnviDat/datacite-websnap/blob/main/LICENSE" target="_blank">MIT License</a>
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "datacite-websnap",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "EnviDat <envidat@wsl.ch>",
    "keywords": "S3, Boto3, boto3, API, backup, AWS, AWS SDK, AWS SDK for Python, DataCite",
    "author": "Rebecca Buchholz",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/68/a5/ed94584ce8a5254049a4548f7847e375e72983b2ea8e0f37608bce564f60/datacite_websnap-1.0.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "CLI tool that bulk exports DataCite metadata records for a specific repository to an S3 bucket.",
    "version": "1.0.3",
    "project_urls": {
        "changelog": "https://github.com/EnviDat/datacite-websnap/blob/main/CHANGELOG.md",
        "documentation": "https://github.com/EnviDat/datacite-websnap/blob/main/README.md",
        "repository": "https://github.com/EnviDat/datacite-websnap"
    },
    "split_keywords": [
        "s3",
        " boto3",
        " boto3",
        " api",
        " backup",
        " aws",
        " aws sdk",
        " aws sdk for python",
        " datacite"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8b87a5e6d5a9d938330e585197bf3290149d9ff02628b1e631f345223f63f58b",
                "md5": "acaf7025778ff57af86d1c4687738636",
                "sha256": "827c4e1eb8125ba410fefabb65fb2d8dcc6f32d9080c6af7a60107bcbd51103a"
            },
            "downloads": -1,
            "filename": "datacite_websnap-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "acaf7025778ff57af86d1c4687738636",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 16459,
            "upload_time": "2025-08-27T09:38:51",
            "upload_time_iso_8601": "2025-08-27T09:38:51.732780Z",
            "url": "https://files.pythonhosted.org/packages/8b/87/a5e6d5a9d938330e585197bf3290149d9ff02628b1e631f345223f63f58b/datacite_websnap-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "68a5ed94584ce8a5254049a4548f7847e375e72983b2ea8e0f37608bce564f60",
                "md5": "b3c06202aa44613aa555f9ba45be59af",
                "sha256": "00d2dc7f76715a96866055c6dcf13f9b0e1623ad7ab7acf738ba0cbf8a18be31"
            },
            "downloads": -1,
            "filename": "datacite_websnap-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "b3c06202aa44613aa555f9ba45be59af",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 20440,
            "upload_time": "2025-08-27T09:38:52",
            "upload_time_iso_8601": "2025-08-27T09:38:52.734241Z",
            "url": "https://files.pythonhosted.org/packages/68/a5/ed94584ce8a5254049a4548f7847e375e72983b2ea8e0f37608bce564f60/datacite_websnap-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-27 09:38:52",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "EnviDat",
    "github_project": "datacite-websnap",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "tox": true,
    "lcname": "datacite-websnap"
}
        