# AWS Analytics Reference Architecture
The AWS Analytics Reference Architecture is a set of analytics solutions put together as end-to-end examples.
It brings together AWS best practices for designing, implementing, and operating analytics platforms through purpose-built patterns that handle common requirements and solve customers' challenges.
This project is composed of:
* Reusable core components exposed in an AWS CDK (Cloud Development Kit) library currently available in [TypeScript](https://www.npmjs.com/package/aws-analytics-reference-architecture) and [Python](https://pypi.org/project/aws-analytics-reference-architecture/). This library contains [AWS CDK constructs](https://constructs.dev/packages/aws-analytics-reference-architecture/?lang=python) that can be used to quickly provision analytics solutions in demos, prototypes, proofs of concept and end-to-end reference architectures.
* Reference architectures consuming the reusable components to demonstrate end-to-end examples in a business context. Currently, the [AWS native reference architecture](https://aws-samples.github.io/aws-analytics-reference-architecture/) is available.
This documentation explains how to get started with the core components of the AWS Analytics Reference Architecture.
## Getting started
* [AWS Analytics Reference Architecture](#aws-analytics-reference-architecture)
  * [Getting started](#getting-started)
    * [Prerequisites](#prerequisites)
    * [Initialization (in Python)](#initialization-in-python)
    * [Development](#development)
    * [Deployment](#deployment)
    * [Cleanup](#cleanup)
  * [API Reference](#api-reference)
  * [Contributing](#contributing)
* [License Summary](#license-summary)
### Prerequisites
1. [Create an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/)
2. The core components can be deployed in any AWS region.
3. Install the following components, at the specified versions, on the machine from which you will run the deployment:
   1. Python [3.8-3.9.2] or TypeScript
   2. AWS CDK v2: refer to the [Getting started](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) guide.
4. Bootstrap AWS CDK in your region (here **eu-west-1**). This provisions the resources required to deploy AWS CDK applications.
```bash
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=eu-west-1
cdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION
```
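Before bootstrapping, it can help to check the local environment against the prerequisites. The following is a minimal sketch, not part of the library: the helper names `is_supported_python` and `bootstrap_target` are hypothetical, and the version bounds are simply the range quoted in the prerequisites above.

```python
import re
import sys

# Supported range taken from the prerequisites above: Python 3.8 up to 3.9.2.
MIN_PYTHON = (3, 8, 0)
MAX_PYTHON = (3, 9, 2)


def is_supported_python(version=None):
    """Return True if the (major, minor, micro) version is within the documented range."""
    version = tuple(version or sys.version_info[:3])
    return MIN_PYTHON <= version <= MAX_PYTHON


def bootstrap_target(account_id, region):
    """Build the aws://ACCOUNT_ID/REGION argument passed to `cdk bootstrap`."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    if not region:
        raise ValueError("region must not be empty")
    return f"aws://{account_id}/{region}"


if __name__ == "__main__":
    print("Python supported:", is_supported_python())
    # Placeholder account ID, for illustration only.
    print(bootstrap_target("123456789012", "eu-west-1"))
```

The string returned by `bootstrap_target` has the same shape as the argument built from `$ACCOUNT_ID` and `$AWS_REGION` in the shell commands above.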
### Initialization (in Python)
1. Initialize a new AWS CDK application in Python and use a virtual environment to install dependencies
```bash
mkdir my_demo
cd my_demo
cdk init app --language python
python3 -m venv .env
source .env/bin/activate
```
1. Add the AWS Analytics Reference Architecture library to the dependencies of your project by updating **requirements.txt**
```text
aws-cdk-lib==2.51.0
constructs>=10.0.0,<11.0.0
aws_analytics_reference_architecture>=2.0.0
```
1. Install the packages with **pip**
```bash
python -m pip install -r requirements.txt
```
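To confirm that the dependencies landed in the active virtual environment, a short check with the standard library can be run. This is a sketch only: the function name `installed_versions` is hypothetical, and the distribution names are the ones listed in **requirements.txt** above.

```python
from importlib import metadata

# Distribution names as listed in requirements.txt above.
REQUIRED = [
    "aws-cdk-lib",
    "constructs",
    "aws_analytics_reference_architecture",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as missing usually means the virtual environment was not activated before running **pip**.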
### Development
1. Import the AWS Analytics Reference Architecture library in **my_demo/my_demo_stack.py**
```python
import aws_analytics_reference_architecture as ara
```
1. Now you can use all the constructs available from the core components library to quickly provision resources in your AWS CDK stack. For example:
* The DataLakeStorage to provision a full set of pre-configured Amazon S3 buckets for a data lake
```python
# Create a new DataLakeStorage with Raw, Clean and Transform buckets configured with data lake best practices
storage = ara.DataLakeStorage(self, "storage")
```
* The DataLakeCatalog to provision a full set of AWS Glue databases for registering tables in your data lake
```python
# Create a new DataLakeCatalog with Raw, Clean and Transform databases
catalog = ara.DataLakeCatalog(self, "catalog")
```
* The BatchReplayer to generate live data in the data lake from a pre-configured retail dataset
```python
# Generate the sales data
sales_data = ara.BatchReplayer(
    scope=self,
    id="sale-data",
    dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
    sink_object_key="sale",
    sink_bucket=storage.raw_bucket,
)
```
```python
# Generate the customer data
customer_data = ara.BatchReplayer(
    scope=self,
    id="customer-data",
    dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
    sink_object_key="customer",
    sink_bucket=storage.raw_bucket,
)
```
* Additionally, the library provides some helpers to quickly run demos:
```python
# Configure defaults for Athena console
athena_defaults = ara.AthenaDemoSetup(scope=self, id="demo_setup")
```
```python
# Configure a default role for AWS Glue jobs
ara.GlueDemoRole.get_or_create(self)
```
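The snippets above all use `self`, so they are meant to live inside the stack's constructor. As a sketch of how they might fit together in **my_demo/my_demo_stack.py**, assuming the stack class generated by `cdk init` is named `MyDemoStack` (check your generated file, the name may differ):

```python
from aws_cdk import Stack
from constructs import Construct

import aws_analytics_reference_architecture as ara


class MyDemoStack(Stack):
    """Demo stack combining the constructs shown above (class name is an assumption)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Raw, Clean and Transform buckets and the matching AWS Glue databases
        storage = ara.DataLakeStorage(self, "storage")
        catalog = ara.DataLakeCatalog(self, "catalog")

        # Replay the prepared retail datasets into the raw bucket
        sales_data = ara.BatchReplayer(
            scope=self,
            id="sale-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,
            sink_object_key="sale",
            sink_bucket=storage.raw_bucket,
        )
        customer_data = ara.BatchReplayer(
            scope=self,
            id="customer-data",
            dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,
            sink_object_key="customer",
            sink_bucket=storage.raw_bucket,
        )

        # Demo helpers: Athena console defaults and a default role for AWS Glue jobs
        athena_defaults = ara.AthenaDemoSetup(scope=self, id="demo_setup")
        ara.GlueDemoRole.get_or_create(self)
```

Both replayers write to the same raw bucket and are separated only by their `sink_object_key` values, which is why `storage` is created first.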
### Deployment
Deploy the AWS CDK application
```bash
cdk deploy
```
Deployment time depends on the constructs you use.
### Cleanup
Delete the AWS CDK application
```bash
cdk destroy
```
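For repeated demos, the deploy and destroy steps can be scripted. The following is a minimal sketch under stated assumptions: the helper names `cdk_command` and `run_cdk` are hypothetical, and the non-interactive mode relies on the AWS CDK CLI options `--require-approval never` (for deploy) and `--force` (for destroy).

```python
import subprocess


def cdk_command(action, stack=None, non_interactive=False):
    """Build the argument list for `cdk deploy` or `cdk destroy`."""
    if action not in ("deploy", "destroy"):
        raise ValueError(f"unsupported action: {action}")
    command = ["cdk", action]
    if stack:
        command.append(stack)
    if non_interactive:
        # deploy: skip the approval prompt; destroy: skip the confirmation prompt
        command += ["--require-approval", "never"] if action == "deploy" else ["--force"]
    return command


def run_cdk(action, **options):
    """Run the command from the current directory and raise if it fails."""
    subprocess.run(cdk_command(action, **options), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_cdk(...) to execute.
    print(" ".join(cdk_command("deploy", non_interactive=True)))
    print(" ".join(cdk_command("destroy", non_interactive=True)))
```

Skipping the prompts is convenient for throwaway demo accounts; for anything else, the interactive commands shown above are the safer default.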
## API Reference
More constructs, helpers and datasets are available in the AWS Analytics Reference Architecture. See the full API specification [here](https://constructs.dev/packages/aws-analytics-reference-architecture).
## Contributing
Please refer to the [contributing guidelines](../CONTRIBUTING.md) and [contributing FAQ](../CONTRIB_FAQ.md) for details.
# License Summary
The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.
Raw data
{
"_id": null,
"home_page": "https://aws-samples.github.io/aws-analytics-reference-architecture/",
"name": "aws-analytics-reference-architecture",
"maintainer": "",
"docs_url": null,
"requires_python": "~=3.8",
"maintainer_email": "",
"keywords": "",
"author": "Amazon Web Services",
"author_email": "",
"download_url": "https://files.pythonhosted.org/packages/42/ef/683ea4decc4f4c7956de9b0e92879519e18346b8eb69ac1a6d9de4d438a4/aws_analytics_reference_architecture-2.12.13.tar.gz",
"platform": null,
"description": "# AWS Analytics Reference Architecture\n\nThe AWS Analytics Reference Architecture is a set of analytics solutions put together as end-to-end examples.\nIt regroups AWS best practices for designing, implementing, and operating analytics platforms through different purpose-built patterns, handling common requirements, and solving customers' challenges.\n\nThis project is composed of:\n\n* Reusable core components exposed in an AWS CDK (Cloud Development Kit) library currently available in [Typescript](https://www.npmjs.com/package/aws-analytics-reference-architecture) and [Python](https://pypi.org/project/aws-analytics-reference-architecture/). This library contains [AWS CDK constructs](https://constructs.dev/packages/aws-analytics-reference-architecture/?lang=python) that can be used to quickly provision analytics solutions in demos, prototypes, proof of concepts and end-to-end reference architectures.\n* Reference architectures consumming the reusable components to demonstrate end-to-end examples in a business context. Currently, the [AWS native reference architecture](https://aws-samples.github.io/aws-analytics-reference-architecture/) is available.\n\nThis documentation explains how to get started with the core components of the AWS Analytics Reference Architecture.\n\n## Getting started\n\n* [AWS Analytics Reference Architecture](#aws-analytics-reference-architecture)\n\n * [Getting started](#getting-started)\n\n * [Prerequisites](#prerequisites)\n * [Initialization (in Python)](#initialization-in-python)\n * [Development](#development)\n * [Deployment](#deployment)\n * [Cleanup](#cleanup)\n * [API Reference](#api-reference)\n * [Contributing](#contributing)\n* [License Summary](#license-summary)\n\n### Prerequisites\n\n1. [Create an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/)\n2. The core components can be deployed in any AWS region\n3. 
Install the following components with the specified version on the machine from which the deployment will be executed:\n\n 1. Python [3.8-3.9.2] or Typescript\n 2. AWS CDK v2: Please refer to the [Getting started](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html) guide.\n4. Bootstrap AWS CDK in your region (here **eu-west-1**). It will provision resources required to deploy AWS CDK applications\n\n```bash\nexport ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)\nexport AWS_REGION=eu-west-1\ncdk bootstrap aws://$ACCOUNT_ID/$AWS_REGION\n```\n\n### Initialization (in Python)\n\n1. Initialize a new AWS CDK application in Python and use a virtual environment to install dependencies\n\n```bash\nmkdir my_demo\ncd my_demo\ncdk init app --language python\npython3 -m venv .env\nsource .env/bin/activate\n```\n\n1. Add the AWS Analytics Reference Architecture library in the dependencies of your project. Update **requirements.txt**\n\n```bash\naws-cdk-lib==2.51.0\nconstructs>=10.0.0,<11.0.0\naws_analytics_reference_architecture>=2.0.0\n```\n\n1. Install The Packages via **pip**\n\n```bash\npython -m pip install -r requirements.txt\n```\n\n### Development\n\n1. Import the AWS Analytics Reference Architecture in your code in **my_demo/my_demo_stack.py**\n\n```bash\nimport aws_analytics_reference_architecture as ara\n```\n\n1. Now you can use all the constructs available from the core components library to quickly provision resources in your AWS CDK stack. 
For example:\n\n* The DataLakeStorage to provision a full set of pre-configured Amazon S3 Bucket for a data lake\n\n```bash\n # Create a new DataLakeStorage with Raw, Clean and Transform buckets configured with data lake best practices\n storage = ara.DataLakeStorage (self,\"storage\")\n```\n\n* The DataLakeCatalog to provision a full set of AWS Glue databases for registring tables in your data lake\n\n```bash\n # Create a new DataLakeCatalog with Raw, Clean and Transform databases\n catalog = ara.DataLakeCatalog (self,\"catalog\")\n```\n\n* The DataGenerator to generate live data in the data lake from a pre-configured retail dataset\n\n```bash\n # Generate the Sales Data\n sales_data = ara.BatchReplayer(\n scope=self,\n id=\"sale-data\",\n dataset=ara.PreparedDataset.RETAIL_1_GB_STORE_SALE,\n sink_object_key=\"sale\",\n sink_bucket=storage.raw_bucket,\n )\n\n```\n\n```bash\n # Generate the Customer Data\n customer_data = ara.BatchReplayer(\n scope=self,\n id=\"customer-data\",\n dataset=ara.PreparedDataset.RETAIL_1_GB_CUSTOMER,\n sink_object_key=\"customer\",\n sink_bucket=storage.raw_bucket,\n )\n\n```\n\n* Additionally, the library provides some helpers to quickly run demos:\n\n```bash\n # Configure defaults for Athena console\n athena_defaults = ara.AthenaDemoSetup(scope=self, id=\"demo_setup\")\n```\n\n```bash\n # Configure a default role for AWS Glue jobs\n ara.GlueDemoRole.get_or_create(self)\n```\n\n### Deployment\n\nDeploy the AWS CDK application\n\n```bash\ncdk deploy\n```\n\nThe time to deploy the application is depending on the constructs you are using\n\n### Cleanup\n\nDelete the AWS CDK application\n\n```bash\ncdk destroy\n```\n\n## API Reference\n\nMore contructs, helpers and datasets are available in the AWS Analytics Reference Architecture. 
See the full API specification [here](https://constructs.dev/packages/aws-analytics-reference-architecture)\n\n## Contributing\n\nPlease refer to the [contributing guidelines](../CONTRIBUTING.md) and [contributing FAQ](../CONTRIB_FAQ.md) for details.\n\n# License Summary\n\nThe documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.\n\nThe sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.\n",
"bugtrack_url": null,
"license": "MIT-0",
"summary": "aws-analytics-reference-architecture",
"version": "2.12.13",
"project_urls": {
"Homepage": "https://aws-samples.github.io/aws-analytics-reference-architecture/",
"Source": "https://github.com/aws-samples/aws-analytics-reference-architecture.git"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "80280ddfae2dc0b9f95a8ea3a735c2e43babc1a29b750625e9889ec9dd845e39",
"md5": "4a1994c9f9612becabe11d797306e661",
"sha256": "613b091e6cae3c7a767fbb0f7db76e7dd92719d8b2ed1281d7dcb8e44d49d7df"
},
"downloads": -1,
"filename": "aws_analytics_reference_architecture-2.12.13-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4a1994c9f9612becabe11d797306e661",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "~=3.8",
"size": 122701236,
"upload_time": "2024-02-29T12:25:00",
"upload_time_iso_8601": "2024-02-29T12:25:00.302636Z",
"url": "https://files.pythonhosted.org/packages/80/28/0ddfae2dc0b9f95a8ea3a735c2e43babc1a29b750625e9889ec9dd845e39/aws_analytics_reference_architecture-2.12.13-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "42ef683ea4decc4f4c7956de9b0e92879519e18346b8eb69ac1a6d9de4d438a4",
"md5": "c28bc83149eab33ebc855f72c2ddecad",
"sha256": "9d08c44130ea2733b132a0f767d67202dfff3f7cb4df08152979c2bd4808ad64"
},
"downloads": -1,
"filename": "aws_analytics_reference_architecture-2.12.13.tar.gz",
"has_sig": false,
"md5_digest": "c28bc83149eab33ebc855f72c2ddecad",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "~=3.8",
"size": 122703158,
"upload_time": "2024-02-29T12:25:08",
"upload_time_iso_8601": "2024-02-29T12:25:08.877668Z",
"url": "https://files.pythonhosted.org/packages/42/ef/683ea4decc4f4c7956de9b0e92879519e18346b8eb69ac1a6d9de4d438a4/aws_analytics_reference_architecture-2.12.13.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-29 12:25:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aws-samples",
"github_project": "aws-analytics-reference-architecture",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "aws-analytics-reference-architecture"
}