# AWS Analytics Reference Architecture

* Name: aws-analytics-reference-architecture
* Version: 2.12.13
* Home page: https://aws-samples.github.io/aws-analytics-reference-architecture/
* Author: Amazon Web Services
* License: MIT-0
* Requires Python: ~=3.8
* Upload time: 2024-02-29 12:25:08

The AWS Analytics Reference Architecture is a set of analytics solutions put together as end-to-end examples.
It brings together AWS best practices for designing, implementing, and operating analytics platforms through purpose-built patterns that handle common requirements and solve customers' challenges.

This project is composed of:

* Reusable core components exposed in an AWS CDK (Cloud Development Kit) library currently available in [TypeScript](https://www.npmjs.com/package/aws-analytics-reference-architecture) and [Python](https://pypi.org/project/aws-analytics-reference-architecture/). This library contains [AWS CDK constructs](https://constructs.dev/packages/aws-analytics-reference-architecture/?lang=python) that can be used to quickly provision analytics solutions in demos, prototypes, proofs of concept and end-to-end reference architectures.
* Reference architectures consuming the reusable components to demonstrate end-to-end examples in a business context. Currently, the [AWS native reference architecture](https://aws-samples.github.io/aws-analytics-reference-architecture/) is available.

This documentation explains how to get started with the core components of the AWS Analytics Reference Architecture.

## Getting started

* [AWS Analytics Reference Architecture](#aws-analytics-reference-architecture)

  * [Getting started](#getting-started)

    * [Prerequisites](#prerequisites)
    * [Initialization (in Python)](#initialization-in-python)
    * [Development](#development)
    * [Deployment](#deployment)
    * [Cleanup](#cleanup)
  * [API Reference](#api-reference)
  * [Contributing](#contributing)
* [License Summary](#license-summary)

### Prerequisites

1. [Create an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/)
2. The core components can be deployed in any AWS Region.
3. Install the following components, in the specified versions, on the machine from which the deployment will be executed:

   1. Python (3.8 to 3.9.2) or TypeScript
   2. AWS CDK v2: Please refer to the [Getting started](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) guide.
4. Bootstrap AWS CDK in your Region (here **eu-west-1**). This provisions the resources required to deploy AWS CDK applications.

```bash
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=eu-west-1
cdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION
```
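
Before bootstrapping, it can help to confirm that the local interpreter falls inside the Python range listed in the prerequisites. The following is a minimal sketch using only the standard library. The helper name `is_supported_python` is hypothetical and not part of the library, and the bounds are simply the 3.8 to 3.9.2 range stated above.

```python
import sys

# Bounds taken from the prerequisites above (Python 3.8 to 3.9.2, inclusive).
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 9, 2)


def is_supported_python(version=None):
    """Return True if `version` (major, minor, micro) is inside the documented range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "inside" if is_supported_python() else "outside"
    print(f"Python {current} is {status} the documented 3.8 to 3.9.2 range")
```

A version outside the range does not necessarily mean the library will fail, only that it falls outside what the prerequisites list.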

### Initialization (in Python)

1. Initialize a new AWS CDK application in Python and use a virtual environment to install dependencies

```bash
mkdir my_demo
cd my_demo
cdk init app --language python
python3 -m venv .env
source .env/bin/activate
```

2. Add the AWS Analytics Reference Architecture library to the dependencies of your project by updating **requirements.txt**

```text
aws-cdk-lib==2.51.0
constructs>=10.0.0,<11.0.0
aws_analytics_reference_architecture>=2.0.0
```

3. Install the packages with **pip**

```bash
python -m pip install -r requirements.txt
```
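
To confirm that the dependencies from **requirements.txt** were actually installed in the active virtual environment, a small check along these lines may help. This is a sketch that relies only on the standard library `importlib.metadata` module. The helper `installed_versions` is a hypothetical name introduced for illustration, and the distribution names are the PyPI names of the three requirements above.

```python
from importlib import metadata

# PyPI distribution names for the three entries in requirements.txt.
REQUIRED = ("aws-cdk-lib", "constructs", "aws-analytics-reference-architecture")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment activated; a `not installed` entry suggests that `pip install` ran against a different interpreter.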

### Development

1. Import the AWS Analytics Reference Architecture in your code in **my_demo/my_demo_stack.py**

```python
import aws_analytics_reference_architecture as ara
```

2. Now you can use all the constructs available from the core components library to quickly provision resources in your AWS CDK stack. For example:

* The DataLakeStorage to provision a full set of pre-configured Amazon S3 buckets for a data lake

```python
        # Create a new DataLakeStorage with Raw, Clean and Transform buckets configured with data lake best practices
        storage = ara.DataLakeStorage(self, "storage")
```

* The DataLakeCatalog to provision a full set of AWS Glue databases for registering tables in your data lake

```python
        # Create a new DataLakeCatalog with Raw, Clean and Transform databases
        catalog = ara.DataLakeCatalog(self, "catalog")
```

* The BatchReplayer to generate live data in the data lake from a pre-configured retail dataset

```python
        # Generate the Sales Data
        sales_data = ara.BatchReplayer(
            scope=self,
            id="sale-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
            sink_object_key="sale",
            sink_bucket=storage.raw_bucket,
        )
```

```python
        # Generate the Customer Data
        customer_data = ara.BatchReplayer(
            scope=self,
            id="customer-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
            sink_object_key="customer",
            sink_bucket=storage.raw_bucket,
        )
```

* Additionally, the library provides some helpers to quickly run demos:

```python
        # Configure defaults for Athena console
        athena_defaults = ara.AthenaDemoSetup(scope=self, id="demo_setup")
```

```python
        # Configure a default role for AWS Glue jobs
        ara.GlueDemoRole.get_or_create(self)
```
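
The snippets above are indented because they are meant to sit inside the stack's constructor. As a rough sketch of how **my_demo/my_demo_stack.py** might look once a few of them are combined: the class name `MyDemoStack` and the constructor signature are assumptions based on what `cdk init` typically generates for a project named `my_demo`, so adapt them to your generated file. The construct calls are the ones shown above.

```python
from aws_cdk import Stack
from constructs import Construct

import aws_analytics_reference_architecture as ara


class MyDemoStack(Stack):
    """Demo stack combining some of the constructs shown above."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Buckets and AWS Glue databases for the data lake
        storage = ara.DataLakeStorage(self, "storage")
        ara.DataLakeCatalog(self, "catalog")

        # Replay the prepared retail sales dataset into the raw bucket
        ara.BatchReplayer(
            scope=self,
            id="sale-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
            sink_object_key="sale",
            sink_bucket=storage.raw_bucket,
        )
```

This file needs the packages from **requirements.txt** installed and is only synthesized when the AWS CDK app that instantiates the stack runs.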

### Deployment

Deploy the AWS CDK application:

```bash
cdk deploy
```

The deployment time depends on the constructs you are using.

### Cleanup

Delete the AWS CDK application:

```bash
cdk destroy
```

## API Reference

More constructs, helpers and datasets are available in the AWS Analytics Reference Architecture. See the full [API specification](https://constructs.dev/packages/aws-analytics-reference-architecture).

## Contributing

Please refer to the [contributing guidelines](../CONTRIBUTING.md) and [contributing FAQ](../CONTRIB_FAQ.md) for details.

# License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.

The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
