datateer-upload-agent

- Name: datateer-upload-agent
- Version: 0.5.1
- Summary: An agent that can be installed inside a firewall or VPN and used to push data to Datateer
- Author: Datateer
- Requires Python: >=3.8,<4.0
- Upload time: 2023-08-23 03:45:48
# Datateer upload-agent

This is a command-line tool for uploading data into your Datateer data lake.

The upload agent pushes files into an AWS S3 bucket, where they are picked up for ingestion and further processing.

## Quick start

Ensure you have Python and pip installed, then follow these steps:

1. Install with `pip install datateer-upload-agent`
1. Do one-time agent configuration with `datateer config upload-agent`
1. Do one-time feed configuration with `datateer config feed`
1. Upload data with `datateer upload <feed_key> <path>`

## Concepts

All data in the data lake has the following metadata:

- A **provider** is an organization that is providing data. This could be your organization if you are pushing data from an internal database or application
- A **source** is the system or application that is providing data. A provider can provide data from one or more systems
- A **feed** is an independent data feed. A source can provide one or more feeds. For example, if the source is a database, each feed could represent a single table or view. If the source is an API, each feed could represent a single entity.
- A **file** is a data file like a CSV file. It is a point-in-time extraction of a feed, and it is what you upload using the agent.
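The four-level hierarchy above can be sketched as a small data model. This is an illustrative sketch only (the class and field names are hypothetical, not the agent's internal representation):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    """One independent data feed within a source (e.g. a table or API entity)."""
    provider: str  # organization providing the data, e.g. "xyz"
    source: str    # system or application, e.g. "internal_app1"
    feed: str      # the feed itself, e.g. "orders"


@dataclass(frozen=True)
class DataFile:
    """A point-in-time extraction of a feed, e.g. a CSV file you upload."""
    feed: Feed
    path: str


orders = Feed(provider="xyz", source="internal_app1", feed="orders")
extract = DataFile(feed=orders, path="./my_exported_data/orders.csv")
```

Each uploaded file carries this metadata, which is what lets the data lake route it to the right place.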

## Commands

### Uploading

#### Upload a file

`datateer upload orders_feed ./my_exported_data/orders.csv` will upload the file at `./my_exported_data/orders.csv` using the feed key `orders_feed`

### Configuring

#### Configure the upload agent

`datateer config upload-agent` will ask you a series of questions to configure your agent

```text
Datateer client code:
Raw bucket name:
Access key:
Access secret:
```

If you need to reconfigure the agent, just rerun `datateer config upload-agent`

#### Configure a new feed

`datateer config feed` will ask a series of questions to configure a new feed

```text
Provider: xyz
Data Source: internal_app1
Feed: orders
Feed key [orders]: orders_feed
```

#### Reconfigure an existing feed

`datateer config feed --update orders_feed` will rerun the configuration questions for the feed with the key `orders_feed`

#### Show config

`datateer config upload-agent --show` will show you your existing configuration

```yaml
client-code: xyz
raw-bucket: xyz-pipeline-raw-202012331213123432341213
access-key: ABC***
access-secret: 123***
feeds: 3
```

```text
1) Feed "customers" will upload to xyz/internal_app1/customers/
2) Feed "orders_feed" will upload to xyz/internal_app1/orders/
3) Feed "leads" will upload to salesforce/salesforce/leads
```

In general, each feed uploads to a prefix built from its provider, source, and feed names:

```text
Feed "abc" will upload to provider/source/feed
```
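The `provider/source/feed` pattern shown in the `--show` output can be expressed as a small helper. This is a hypothetical function for illustration, not part of the agent's API:

```python
def upload_prefix(provider: str, source: str, feed: str) -> str:
    """Build the raw-bucket key prefix a feed uploads to (assumed pattern)."""
    return f"{provider}/{source}/{feed}/"


print(upload_prefix("xyz", "internal_app1", "orders"))  # xyz/internal_app1/orders/
```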

## Data File Requirements

- The data lake supports CSV, TSV, and JSONL files
- The first row of the data file must contain header names
- Adding new data fields and removing existing ones are both supported
- You should strive to be consistent with your header names over time. The data lake can handle changes, but it will likely confuse anyone using the feeds
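For example, a file meeting these requirements (header names in the first row, consistent column names) can be produced with Python's standard `csv` module:

```python
import csv

rows = [
    {"order_id": "1001", "customer": "acme", "total": "250.00"},
    {"order_id": "1002", "customer": "globex", "total": "99.50"},
]

with open("orders.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["order_id", "customer", "total"])
    writer.writeheader()  # first row must contain the header names
    writer.writerows(rows)
```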

## Configuration - detailed info

Configuration can be handled completely through the `datateer config` commands. This section explains in more detail how configuration works and where it is stored.

### Location

Here is where the Datateer upload agent will look for configuration information, in order of preference:

1. In a relative directory named `.datateer`, in a file named `config.yml`.
1. In the future, we may add global configuration in the user's home directory or in environment variables
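The lookup described above amounts to checking for `.datateer/config.yml` relative to the working directory. A minimal sketch of that check (the function name is hypothetical; the agent's own lookup code may differ):

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path = Path(".")) -> Optional[Path]:
    """Return ./.datateer/config.yml under `start` if it exists,
    the only lookup location currently documented."""
    candidate = start / ".datateer" / "config.yml"
    return candidate if candidate.is_file() else None
```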

### Schema

An example configuration file looks like this:

```yaml
client-code: xyz
upload-agent:
  raw-bucket: xyz-pipeline-raw-202012331213123432341213
  access-key: ABC***
  access-secret: 123***
  feeds:
    customers:
      provider: xyz
      source: internal_app1
      feed: customers
    orders_feed:
      provider: xyz
      source: internal_app1
      feed: orders
    leads:
      provider: salesforce
      source: salesforce
      feed: leads
```
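Once parsed with any YAML library, this schema maps naturally onto nested dictionaries. A sketch of looking up a feed's destination from the example above (using a plain dict literal here so the snippet stays dependency-free):

```python
config = {
    "client-code": "xyz",
    "upload-agent": {
        "raw-bucket": "xyz-pipeline-raw-202012331213123432341213",
        "feeds": {
            "orders_feed": {"provider": "xyz", "source": "internal_app1", "feed": "orders"},
        },
    },
}

# Resolve a feed key to its provider/source/feed upload prefix.
feed = config["upload-agent"]["feeds"]["orders_feed"]
destination = "{provider}/{source}/{feed}/".format(**feed)
print(destination)  # xyz/internal_app1/orders/
```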


## Development

To develop in this repo:

1. Install Poetry and activate a shell with `poetry shell`
2. Run `poetry install`
3. To run tests, run `pytest` or `ptw`
4. To run locally, install with `pip install -e .`

            
