semanticscholar-datasetapi


- Name: semanticscholar-datasetapi
- Version: 0.1.2
- Home page: https://github.com/k1000dai/semanticscholar-datasetapi
- Summary: A Python wrapper for the Semantic Scholar Dataset API that provides easy access to academic papers, citations, and related data
- Upload time: 2025-02-19 08:07:39
- Author: Kohei Sendai
- Requires Python: >=3.7
- Keywords: semantic scholar, dataset, academic papers, citations, research, api

# Semantic Scholar Dataset API Wrapper

A Python wrapper for the Semantic Scholar Dataset API that provides easy access to academic papers, citations, and related data.

## Description

This library provides a simple interface to interact with the Semantic Scholar Dataset API, allowing you to:
- Access various academic datasets (papers, citations, authors, etc.)
- Download dataset releases
- Get diffs between releases
- Download large datasets in parts by selecting a range of file indices

## Installation

```bash
pip install semanticscholar-datasetapi
```

## Requirements

- Python 3.7+
- requests

## Basic Usage

```python
from semanticscholar_datasetapi import SemanticScholarDataset
import os

# Initialize the client with your API key
api_key = os.getenv("SEMANTIC_SCHOLAR_API_KEY")
client = SemanticScholarDataset(api_key=api_key)

# List available datasets
datasets = client.get_available_datasets()
print(datasets)

# List all available releases
releases = client.get_available_releases()
print(releases)

# Download latest release of a specific dataset
client.download_latest_release(datasetname="papers", save_dir="downloads")

# Download the diffs between two releases
client.download_diffs(
    start_release_id="2024-12-31",
    end_release_id="latest",
    datasetname="papers",
    save_dir="diffs"
)
```

## Available Datasets

The API provides access to the following datasets:
- abstracts
- authors
- citations
- embeddings-specter_v1
- embeddings-specter_v2
- paper-ids
- papers
- publication-venues
- s2orc
- tldrs
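
A dataset name can be checked locally before any request is made. The following is a minimal sketch, not part of the library: `KNOWN_DATASETS` and `check_dataset_name` are hypothetical names, and the set simply mirrors the list above, so it may drift from what the live API offers.

```python
# Dataset names copied from the list above; the live API may add or remove entries.
KNOWN_DATASETS = frozenset({
    "abstracts", "authors", "citations",
    "embeddings-specter_v1", "embeddings-specter_v2",
    "paper-ids", "papers", "publication-venues", "s2orc", "tldrs",
})


def check_dataset_name(name: str) -> str:
    """Return the name unchanged if it is in KNOWN_DATASETS, else raise ValueError."""
    if name not in KNOWN_DATASETS:
        choices = ", ".join(sorted(KNOWN_DATASETS))
        raise ValueError(f"Unknown dataset {name!r}; expected one of: {choices}")
    return name
```

For an up-to-date list, prefer the result of `get_available_datasets()` over a hard-coded set.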

## API Reference

### Main Methods

#### `SemanticScholarDataset(api_key: Optional[str] = None)`
Initialize the API client with an optional API key.

- `api_key`: API key for accessing the Semantic Scholar Dataset API. Required for most operations.

#### `get_available_releases() -> list`
Get a list of all available dataset releases.
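
If you need a concrete release ID rather than `"latest"`, the newest entry can be picked from the returned list. This sketch assumes the list holds date strings such as `"2024-12-31"` (the form used in the usage example); `newest_release` is a hypothetical helper, not a library function.

```python
from typing import List


def newest_release(release_ids: List[str]) -> str:
    """Return the most recent ID, assuming ISO-style YYYY-MM-DD strings."""
    if not release_ids:
        raise ValueError("no releases available")
    # ISO dates sort chronologically when compared as plain strings.
    return max(release_ids)
```

A typical call would be `newest_release(client.get_available_releases())`.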

#### `get_available_datasets() -> list`
Get a list of all available datasets.

#### `get_download_urls_from_release(datasetname: Optional[str] = None, release_id: str = "latest") -> Dict[str, Any]`
Get download URLs for a specific release of a dataset.

- `datasetname`: Name of the dataset to get URLs for
- `release_id`: ID of the release (defaults to "latest")
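
The shape of the returned dictionary is not documented here. The sketch below assumes it carries a `files` key holding a list of download URLs, as the underlying Semantic Scholar endpoint does; `list_file_names` and the sample data are hypothetical and only show how such a response could be inspected before downloading.

```python
import posixpath
from typing import Any, Dict, List
from urllib.parse import urlparse


def list_file_names(release_info: Dict[str, Any]) -> List[str]:
    """Extract the base file name from each URL under the assumed 'files' key."""
    names = []
    for url in release_info.get("files", []):
        # urlparse(...).path excludes the query string, where signed URLs keep their tokens.
        names.append(posixpath.basename(urlparse(url).path))
    return names


# Made-up response used only to illustrate the assumed structure.
sample = {
    "name": "papers",
    "files": [
        "https://example.com/data/papers/part-000.jsonl.gz?Signature=abc",
        "https://example.com/data/papers/part-001.jsonl.gz?Signature=def",
    ],
}
file_names = list_file_names(sample)
```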

#### `get_download_urls_from_diffs(start_release_id: Optional[str], end_release_id: str = "latest", datasetname: Optional[str]) -> Dict[str, Any]`
Get download URLs for differences between two releases.

- `start_release_id`: Starting release ID
- `end_release_id`: Ending release ID (defaults to "latest")
- `datasetname`: Name of the dataset to get diff URLs for
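
A diff between two releases may span several intermediate releases, each with its own update and delete files. The sketch below flattens such a response into one sequence. It assumes the wrapper returns the API's JSON unchanged, with a `diffs` list whose entries contain `from_release`, `to_release`, `update_files`, and `delete_files`; `iter_diff_files` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_diff_files(diff_info: Dict[str, Any]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (from_release, to_release, kind, url) for every file in the response."""
    for step in diff_info.get("diffs", []):
        for kind, key in (("update", "update_files"), ("delete", "delete_files")):
            for url in step.get(key, []):
                yield step["from_release"], step["to_release"], kind, url


# Made-up response used only to illustrate the assumed structure.
sample = {
    "dataset": "papers",
    "diffs": [
        {
            "from_release": "2024-12-31",
            "to_release": "2025-01-07",
            "update_files": ["https://example.com/u0"],
            "delete_files": ["https://example.com/d0"],
        },
    ],
}
diff_files = list(iter_diff_files(sample))
```

Diffs are normally applied in order, with each step's updates and deletes handled before moving to the next step.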

#### `download_latest_release(datasetname: Optional[str] = None, save_dir: Optional[str] = None, range: Optional[range] = None) -> None`
Download the latest release of a specific dataset.

- `datasetname`: Name of the dataset to download
- `save_dir`: Directory to save downloaded files (defaults to current directory)
- `download_range`: Optional range of indices to download from the list of files

#### `download_past_release(release_id: str, datasetname: Optional[str] = None, save_dir: Optional[str] = None, range: Optional[range] = None) -> None`
Download a specific past release of a dataset.

- `release_id`: ID of the release to download
- `datasetname`: Name of the dataset to download
- `save_dir`: Directory to save downloaded files (defaults to current directory)
- `download_range`: Optional range of indices to download from the list of files
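
Because the download methods accept a range of file indices, a large dataset can be fetched in batches. The sketch below only builds the ranges; `split_into_ranges` is a hypothetical helper, and the total number of files is assumed to be known, for example from the list of download URLs.

```python
from typing import List


def split_into_ranges(total_files: int, batch_size: int) -> List[range]:
    """Split indices 0..total_files-1 into consecutive ranges of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        range(start, min(start + batch_size, total_files))
        for start in range(0, total_files, batch_size)
    ]


batches = split_into_ranges(10, 4)
```

Each resulting `range` can then be passed to a download call, so an interrupted run can resume from the first batch that did not complete.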

#### `download_diffs(start_release_id: str, end_release_id: str, datasetname: Optional[str] = None, save_dir: Optional[str] = None) -> None`
Download the differences between two releases of a dataset.

- `start_release_id`: Starting release ID
- `end_release_id`: Ending release ID
- `datasetname`: Name of the dataset to download diffs for
- `save_dir`: Directory to save downloaded files (defaults to current directory)

### Error Handling

The library handles the following error conditions:
- Invalid dataset names
- Missing API keys
- Network errors
- Invalid release IDs
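
Which exception types the library raises is not documented here, so callers may want their own retry layer around long downloads. The following is a generic sketch, not library code: `with_retries` is a hypothetical helper, and the default of retrying on `OSError` is an assumption chosen because the exceptions in `requests` derive from it.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call action() up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)
            delay_seconds *= 2
    raise AssertionError("unreachable")
```

A download could then be wrapped as `with_retries(lambda: client.download_latest_release(datasetname="papers", save_dir="downloads"))`.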

### File Naming

Downloaded files follow these naming patterns:
- Latest release: `{datasetname}_latest_{index}.json.gz`
- Past release: `{datasetname}_{release_id}_{index}.json.gz`
- Diffs: 
  - Updates: `{datasetname}_{from_release}_{to_release}_update_{index}.json.gz`
  - Deletes: `{datasetname}_{from_release}_{to_release}_delete_{index}.json.gz`
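
The patterns above can be reproduced in code, for instance to check which files already exist in `save_dir`. This is a sketch with hypothetical helper names; the parser additionally assumes that release IDs look like `YYYY-MM-DD` (or the literal `latest` for the end release), which the patterns themselves do not guarantee.

```python
import re
from typing import Dict, Optional


def release_file_name(datasetname: str, index: int, release_id: str = "latest") -> str:
    """Build the file name for a latest-release or past-release download."""
    return f"{datasetname}_{release_id}_{index}.json.gz"


def diff_file_name(datasetname: str, from_release: str, to_release: str,
                   kind: str, index: int) -> str:
    """Build the file name for a diff file; kind is 'update' or 'delete'."""
    if kind not in ("update", "delete"):
        raise ValueError("kind must be 'update' or 'delete'")
    return f"{datasetname}_{from_release}_{to_release}_{kind}_{index}.json.gz"


# Assumes date-like release IDs; dataset names may themselves contain '_' or '-'.
_DIFF_RE = re.compile(
    r"^(?P<datasetname>.+)_(?P<from_release>\d{4}-\d{2}-\d{2})"
    r"_(?P<to_release>\d{4}-\d{2}-\d{2}|latest)"
    r"_(?P<kind>update|delete)_(?P<index>\d+)\.json\.gz$"
)


def parse_diff_file_name(file_name: str) -> Optional[Dict[str, str]]:
    """Split a diff file name into its parts, or return None if it does not match."""
    match = _DIFF_RE.match(file_name)
    return match.groupdict() if match else None
```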

## Environment Variables

- `SEMANTIC_SCHOLAR_API_KEY`: Your API key for the Semantic Scholar Dataset API
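
Since `os.getenv` returns `None` when the variable is unset, it can help to fail early with a clear message instead of passing an empty key to the client. A minimal sketch, where `load_api_key` is a hypothetical helper and the optional `environ` argument exists only to make the function easy to test:

```python
import os
from typing import Mapping, Optional

ENV_VAR = "SEMANTIC_SCHOLAR_API_KEY"


def load_api_key(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if environ is None else environ
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating the client")
    return key
```

The result can be passed directly as `SemanticScholarDataset(api_key=load_api_key())`.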

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- Semantic Scholar for providing the Dataset API
- The academic community for maintaining and contributing to the datasets

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/k1000dai/semanticscholar-datasetapi",
    "name": "semanticscholar-datasetapi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "semantic scholar, dataset, academic papers, citations, research, api",
    "author": "Kohei Sendai",
    "author_email": "your.email@example.com",
    "download_url": "https://files.pythonhosted.org/packages/7c/7b/f9330fa576028da50199ec8ff400be965801bf9d760c8c509185c1b0c7fc/semanticscholar_datasetapi-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python wrapper for the Semantic Scholar Dataset API that provides easy access to academic papers, citations, and related data",
    "version": "0.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/k1000dai/semanticscholar-datasetapi/issues",
        "Documentation": "https://github.com/k1000dai/semanticscholar-datasetapi#readme",
        "Homepage": "https://github.com/k1000dai/semanticscholar-datasetapi",
        "Source Code": "https://github.com/k1000dai/semanticscholar-datasetapi"
    },
    "split_keywords": [
        "semantic scholar",
        " dataset",
        " academic papers",
        " citations",
        " research",
        " api"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6fe985ee3f2042f25b0841c3b5c51ee427ca85b4ff114a785ed34615497fd7d4",
                "md5": "a9f05db8b107155d696f2643667dee3c",
                "sha256": "72446c7dc2369e85281ded7bdc6ff960c6e0a7fbe976a8a77b8b3d5b02b87758"
            },
            "downloads": -1,
            "filename": "semanticscholar_datasetapi-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a9f05db8b107155d696f2643667dee3c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 6991,
            "upload_time": "2025-02-19T08:07:37",
            "upload_time_iso_8601": "2025-02-19T08:07:37.581655Z",
            "url": "https://files.pythonhosted.org/packages/6f/e9/85ee3f2042f25b0841c3b5c51ee427ca85b4ff114a785ed34615497fd7d4/semanticscholar_datasetapi-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7c7bf9330fa576028da50199ec8ff400be965801bf9d760c8c509185c1b0c7fc",
                "md5": "5b30f410969ff6893dd3fc67b7ec0573",
                "sha256": "b221f2af3596c9074b7a5cd3989fef10509559278b0e7bcf31d3902bf42511f7"
            },
            "downloads": -1,
            "filename": "semanticscholar_datasetapi-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "5b30f410969ff6893dd3fc67b7ec0573",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 6160,
            "upload_time": "2025-02-19T08:07:39",
            "upload_time_iso_8601": "2025-02-19T08:07:39.341305Z",
            "url": "https://files.pythonhosted.org/packages/7c/7b/f9330fa576028da50199ec8ff400be965801bf9d760c8c509185c1b0c7fc/semanticscholar_datasetapi-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-19 08:07:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "k1000dai",
    "github_project": "semanticscholar-datasetapi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "semanticscholar-datasetapi"
}
        