tap-dune 1.0.0

- **Summary:** Singer tap for extracting data from Dune Analytics API
- **Home page:** https://github.com/blueprint-data/tap-dune
- **Author:** Blueprint Data
- **License:** Apache-2.0
- **Requires Python:** >=3.9, <4.0
- **Keywords:** singer, tap, dune, analytics, meltano
- **Uploaded:** 2025-08-08 20:13:02
# tap-dune

This is a [Singer](https://singer.io) tap that produces JSON-formatted data following the [Singer spec](https://hub.meltano.com/spec).

This tap:
- Pulls data from the [Dune Analytics API](https://dune.com/docs/api/)
- Extracts data from specified Dune queries
- Produces [Singer](https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md)-formatted data
- Supports incremental replication using query parameters
- Automatically infers the schema from query results
- Advertises configurable primary keys for correct upsert/dedup behavior in targets

## Installation

```bash
pip install tap-dune
```

## Configuration

### Accepted Config Options

A full list of supported settings and capabilities is available by running:

```bash
tap-dune --about
```

### Config File Setup

1. Copy the example config file:
   ```bash
   cp config.json.example config.json
   ```

2. Edit `config.json` with your settings:

```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "performance": "medium",
    "query_parameters": [
        {
            "key": "date_from",
            "value": "2025-08-01",
            "type": "date",
            "replication_key": true,
            "replication_key_field": "day"
        }
    ]
}
```

### Configuration Fields

| Field | Required | Description |
|-------|----------|-------------|
| `api_key` | Yes | Your Dune Analytics API key |
| `query_id` | Yes | The ID of the Dune query to execute |
| `performance` | No | Query execution performance tier: `medium` (10 credits) or `large` (20 credits). Defaults to `medium` |
| `query_parameters` | No | Array of parameters to pass to your Dune query |
| `schema` | No | JSON Schema definition of your query's output fields. If not provided, the schema is inferred from query results |
| `primary_keys` | No | Array of field names that uniquely identify each record. Used by targets for upsert/dedup |
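
For reference, here is a configuration that exercises every optional field except `schema` (covered in the Schema Configuration section below); all values are placeholders:

```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "performance": "large",
    "primary_keys": ["day", "source"],
    "query_parameters": [
        {
            "key": "date_from",
            "value": "2025-08-01",
            "type": "date",
            "replication_key": true,
            "replication_key_field": "day"
        }
    ]
}
```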

#### Query Parameters

Each query parameter object can have:
- `key`: Parameter name in your Dune query
- `value`: Parameter value
- `replication_key`: Set to `true` for the parameter that should be used for incremental replication
- `replication_key_field`: The field in the query results used for tracking replication state (required when `replication_key` is true)
- `type`: The data type of the parameter value. Can be one of:
  - `string` (default)
  - `integer`
  - `number`
  - `date`
  - `date-time`

#### Schema Configuration

The schema can be:
1. Automatically inferred from query results (recommended)
2. Explicitly defined in the config file

When automatically inferring the schema:
- The tap will execute the query once to get sample data
- Data types are detected based on the values in the results
- Special formats like dates and timestamps are automatically recognized
- Null values are handled by looking at other rows to determine the correct type
- If a type cannot be determined, it defaults to string
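
For illustration, given a sample result row like this (hypothetical fields):

```json
{
    "day": "2025-08-01",
    "tx_count": 1200,
    "volume_usd": 345.75,
    "source": "dex"
}
```

the inferred field definitions would look roughly like:

```json
{
    "day": {"type": "string", "format": "date"},
    "tx_count": {"type": "integer"},
    "volume_usd": {"type": "number"},
    "source": {"type": "string"}
}
```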

If you need to explicitly define the schema, each field should specify:
- `type`: The data type (`string`, `number`, `integer`, `boolean`, `object`, `array`)
- `format` (optional): Special format for string fields (e.g., `date`, `date-time`)
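
A sketch of an explicit `schema` entry in `config.json`, assuming it accepts the same field-to-definition mapping and reusing the hypothetical fields above:

```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "schema": {
        "day": {"type": "string", "format": "date"},
        "tx_count": {"type": "integer"},
        "volume_usd": {"type": "number"},
        "source": {"type": "string"}
    }
}
```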

When using incremental replication, the schema configuration is particularly important for the replication key field:
- The field's type in the schema determines how values are compared for incremental replication
- You can specify any type that supports ordering (string, number, integer)
- For date/time fields, you can add the appropriate format (`date` or `date-time`)

Examples of query parameter configurations with different replication key types:

1. Date-based replication (most common):
```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "primary_keys": ["date", "source"],
    "query_parameters": [
        {
            "key": "start_date",
            "value": "2025-08-01",
            "type": "date",
            "replication_key": true
        }
    ]
}
```

2. Numeric replication (e.g., for block numbers):
```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "query_parameters": [
        {
            "key": "min_block",
            "value": "1000000",
            "type": "integer",
            "replication_key": true
        }
    ]
}
```

3. Timestamp replication:
```json
{
    "api_key": "YOUR_DUNE_API_KEY",
    "query_id": "YOUR_QUERY_ID",
    "query_parameters": [
        {
            "key": "start_time",
            "value": "2025-08-01T00:00:00Z",
            "type": "date-time",
            "replication_key": true
        }
    ]
}
```

### Source Authentication and Authorization

1. Visit [Dune Analytics](https://dune.com)
2. Create an account and obtain an API key
3. Add the API key to your config file

## Usage

### Basic Usage

1. Generate a catalog file:
   ```bash
   tap-dune --config config.json --discover > catalog.json
   ```

2. Run the tap:
   ```bash
   tap-dune --config config.json --catalog catalog.json
   ```

### Incremental Replication

To use incremental replication:

1. Mark one of your query parameters with `"replication_key": true`
2. Ensure the parameter value is in a format that can be ordered (e.g., dates, timestamps, numbers)
3. The tap will track the last value processed and resume from there in subsequent runs
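
For example, a typical two-run sequence with a Singer target (most targets echo state messages to stdout, so the usual pattern is to capture the last one):

```bash
# First run: capture the target's output and keep the final state message
tap-dune --config config.json --catalog catalog.json | target-jsonl > out.log
tail -1 out.log > state.json

# Subsequent runs: resume from the recorded bookmark
tap-dune --config config.json --catalog catalog.json --state state.json | target-jsonl
```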

When using incremental replication, you need to configure:

1. The query parameter used for filtering (`"replication_key": true`)
2. The field in the query results that will be used for state tracking (`replication_key_field`)
3. The data type of the parameter (`type`)

For example, if your query:
- Takes a `date_from` parameter for filtering
- Returns records with a `day` field containing dates
- You want to use that `day` field for tracking progress

Your configuration would look like:
```json
{
    "query_parameters": [
        {
            "key": "date_from",
            "value": "2025-08-01",
            "type": "date",
            "replication_key": true,
            "replication_key_field": "day"
        }
    ]
}
```

The tap will:
1. Use `date_from` to filter the query results
2. Track the `day` field values from the results
3. Use those values to set `date_from` in subsequent runs
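
Conceptually, the bookmark the tap persists for this example might look like the following; the stream name (`dune_query`) and exact layout are illustrative, since the precise state shape depends on the Singer SDK version in use:

```json
{
    "bookmarks": {
        "dune_query": {
            "replication_key": "day",
            "replication_key_value": "2025-08-07"
        }
    }
}
```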

The parameter type can be:
- `date` or `date-time` for date-based parameters
- `integer` or `number` for numeric parameters
- `string` (default) for text parameters

### Pipeline Usage

You can run `tap-dune` in a pipeline with [Meltano](https://meltano.com/) or any other Singer-compatible tool.

Example with `target-jsonl`:
```bash
tap-dune --config config.json --catalog catalog.json | target-jsonl
```
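
With Meltano, the equivalent pipeline can be wired up roughly like this (assuming `tap-dune` is available on MeltanoHub or registered as a custom plugin in your project):

```bash
meltano add extractor tap-dune
meltano add loader target-jsonl
meltano config tap-dune set api_key YOUR_DUNE_API_KEY
meltano config tap-dune set query_id YOUR_QUERY_ID
meltano run tap-dune target-jsonl
```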

When loading to a database target that performs upserts (e.g., Snowflake):

- Set `primary_keys` in the tap config to the fields that uniquely identify a row in your query output (e.g., `["date", "source"]`).
- Ensure your loader configuration (e.g., PipelineWise or Meltano target) uses the same primary keys for merge/upsert.
- For append-only behavior, leave `primary_keys` empty and configure your loader for pure inserts.

## Development

### Initialize your Development Environment

```bash
# Clone the repository
git clone https://github.com/blueprint-data/tap-dune.git
cd tap-dune

# Install Poetry
pipx install poetry

# Install dependencies
poetry install
```

### Development Workflow

This project follows [Semantic Versioning](https://semver.org/) and uses [Conventional Commits](https://www.conventionalcommits.org/) for automatic versioning.

1. Create a feature branch:
   ```bash
   git checkout -b feat/your-feature
   # or
   git checkout -b fix/your-bugfix
   ```

2. Make your changes and commit using conventional commits:
   ```bash
   # For new features
   git commit -m "feat: add new feature X"

   # For bug fixes
   git commit -m "fix: resolve issue with Y"

   # For breaking changes
   git commit -m "feat: redesign API

   BREAKING CHANGE: This changes the API interface"
   ```

   Commit types:
   - `feat`: A new feature (minor version bump)
   - `fix`: A bug fix (patch version bump)
   - `docs`: Documentation only changes
   - `style`: Changes that don't affect the code's meaning
   - `refactor`: Code change that neither fixes a bug nor adds a feature
   - `perf`: Code change that improves performance
   - `test`: Adding missing tests
   - `chore`: Changes to the build process or auxiliary tools
   - `BREAKING CHANGE`: Any change that breaks backward compatibility (major version bump)

3. Run tests:
   ```bash
   poetry run pytest
   ```

4. Create a pull request to main

### Release Process

1. Create a release branch from main:
   ```bash
   git checkout main
   git pull
   git checkout -b release
   ```

2. Push the branch:
   ```bash
   git push -u origin release
   ```

3. The release workflow will automatically:
   - Analyze commits since the last release
   - Determine the next version number based on commit types:
     - `fix:` → patch version (1.0.0 → 1.0.1)
     - `feat:` → minor version (1.0.0 → 1.1.0)
     - `BREAKING CHANGE:` → major version (1.0.0 → 2.0.0)
   - Update CHANGELOG.md
   - Create a git tag with the new version
   - Create a GitHub release
   - Build and publish to PyPI

   Note: Only commits following the [Conventional Commits](https://www.conventionalcommits.org/) format will trigger version updates.

4. After successful release:
   - Create a PR from the release branch to main
   - This PR will contain all the version updates (CHANGELOG.md, version number)
   - Merge to keep main up-to-date with the latest release
   - Note: Only blueprint-data team members can merge to main

5. Clean up:
   ```bash
   git checkout main
   git pull
   git branch -d release
   ```

### Repository Permissions

This repository follows these security practices:
- Only blueprint-data team members can merge to main
- All PRs require at least one review
- All tests must pass before merging
- Branch protection rules prevent bypassing these requirements

### Testing

```bash
poetry run pytest
```

### SDK Dev Guide

See the [dev guide](https://sdk.meltano.com/en/latest/dev_guide.html) for more instructions on how to use the SDK to develop your own taps and targets.
            
