cdklabs.cdk-appflow

Name: cdklabs.cdk-appflow
Version: 0.0.34
Summary: @cdklabs/cdk-appflow
Home page: https://github.com/cdklabs/cdk-appflow.git
Author: Amazon Web Services <aws-cdk-dev@amazon.com>
License: Apache-2.0
Requires Python: ~=3.8
Upload time: 2024-04-05 18:15:19
# Amazon AppFlow Construct Library

*Note:* this library is currently in technical preview.

## Introduction

Amazon AppFlow is a service that enables creating managed, bi-directional data transfer integrations between various SaaS applications and AWS services.

For more information, see the [Amazon AppFlow User Guide](https://docs.aws.amazon.com/appflow/latest/userguide/what-is-appflow.html).

## Example

```python
from datetime import datetime

from aws_cdk import SecretValue
from aws_cdk.aws_s3 import Bucket
from aws_cdk.aws_secretsmanager import ISecret
from cdklabs.cdk_appflow import ISource, IDestination, Field, Filter, FilterCondition, Mapping, OnDemandFlow, S3Destination, S3Location, SalesforceConnectorProfile, SalesforceOAuthFlow, SalesforceOAuthRefreshTokenGrantFlow, SalesforceOAuthSettings, SalesforceSource, Transform, Validation, ValidationAction, ValidationCondition

# client_secret: ISecret
# access_token: SecretValue
# refresh_token: SecretValue
# instance_url: str


profile = SalesforceConnectorProfile(self, "MyConnectorProfile",
    o_auth=SalesforceOAuthSettings(
        access_token=access_token,
        flow=SalesforceOAuthFlow(
            refresh_token_grant=SalesforceOAuthRefreshTokenGrantFlow(
                refresh_token=refresh_token,
                client=client_secret
            )
        )
    ),
    instance_url=instance_url,
    is_sandbox=False
)

source = SalesforceSource(
    profile=profile,
    object="Account"
)

bucket = Bucket(self, "DestinationBucket")

destination = S3Destination(
    location=S3Location(bucket=bucket)
)

OnDemandFlow(self, "SfAccountToS3",
    source=source,
    destination=destination,
    mappings=[Mapping.map_all()],
    transforms=[
        Transform.mask(Field(name="Name"), "*")
    ],
    validations=[
        Validation.when(ValidationCondition.is_null("Name"), ValidationAction.ignore_record())
    ],
    filters=[
        Filter.when(FilterCondition.timestamp_less_than_equals(Field(name="LastModifiedDate", data_type="datetime"), datetime(2022, 2, 2)))
    ]
)
```

# Concepts

Amazon AppFlow introduces several concepts that abstract away the technicalities of setting up and managing data integrations.

An `Application` is any SaaS data integration component that can be either a *source* or a *destination* for Amazon AppFlow. A source is an application from which Amazon AppFlow will retrieve data, whereas a destination is an application to which Amazon AppFlow will send data.

A `Flow` is Amazon AppFlow's integration between a source and a destination.

A `ConnectorProfile` is Amazon AppFlow's abstraction over authentication/authorization with a particular SaaS application. The per-SaaS application permissions given to a particular `ConnectorProfile` will determine whether the connector profile can support the application as a source or as a destination (see whether a particular application is supported as either a source or a destination in [the documentation](https://docs.aws.amazon.com/appflow/latest/userguide/app-specific.html)).

## Types of Flows

The library introduces three separate types of flows:

* `OnDemandFlow` - a construct representing a flow that can be triggered programmatically with the use of a [StartFlow API call](https://docs.aws.amazon.com/appflow/1.0/APIReference/API_StartFlow.html).
* `OnEventFlow` - a construct representing a flow that is triggered by a SaaS application event published to AppFlow. At the time of writing only a Salesforce source is able to publish events that can be consumed by AppFlow flows.
* `OnScheduleFlow` - a construct representing a flow that is triggered on a [`Schedule`](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_events.Schedule.html).
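
An `OnDemandFlow` is started outside of CDK, for example with the AWS SDK's StartFlow call. A minimal sketch (the helper name and the flow name are made up; the client is assumed to be something like `boto3.client("appflow")`):

```python
def start_flow(client, flow_name: str) -> str:
    """Trigger an on-demand flow via the StartFlow API and return the execution id."""
    response = client.start_flow(flowName=flow_name)
    return response["executionId"]

# Usage (assumes configured AWS credentials):
#   import boto3
#   execution_id = start_flow(boto3.client("appflow"), "SfAccountToS3")
```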

## Tasks

Tasks are steps that can be performed on fields. Tasks compose higher-level objects that in this library are named `Operations`. There are four types of operations:

* Transforms - 1-1 transforms on source fields, like truncation or masking
* Mappings - 1-1 or many-to-1 operations from source fields to a destination field
* Filters - operations that limit the source data based on particular conditions
* Validations - operations that work on a per-record level and can have either a record-level consequence (i.e. dropping the record) or a global one (terminating the flow).
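
To make the distinction concrete, here is a plain-Python illustration of what each operation type does to a batch of records (illustrative only; this is not how the AppFlow service executes tasks):

```python
records = [
    {"Name": "Acme", "LastModifiedDate": "2022-01-15"},
    {"Name": None, "LastModifiedDate": "2022-03-01"},
]

# Filter: keep only records modified on or before 2022-02-02
filtered = [r for r in records if r["LastModifiedDate"] <= "2022-02-02"]

# Validation: a record-level consequence - drop records where Name is null
validated = [r for r in filtered if r["Name"] is not None]

# Transform: a 1-1 transform - mask the Name field
transformed = [{**r, "Name": "*" * len(r["Name"])} for r in validated]

# Mapping: map the source field onto a destination field
mapped = [{"AccountName": r["Name"]} for r in transformed]
```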

Each flow exposes dedicated properties for each of the operation types, which can be used as in the example below:

```python
from datetime import datetime

from cdklabs.cdk_appflow import Field, Filter, FilterCondition, IDestination, ISource, Mapping, OnDemandFlow, S3Destination, SalesforceConnectorProfile, SalesforceSource, Transform, Validation, ValidationAction, ValidationCondition

# stack: Stack
# source: ISource
# destination: IDestination


flow = OnDemandFlow(stack, "OnDemandFlow",
    source=source,
    destination=destination,
    transforms=[
        Transform.mask(Field(name="Name"), "*")
    ],
    mappings=[
        Mapping.map(Field(name="Name", data_type="String"), name="Name", data_type="string")
    ],
    filters=[
        Filter.when(FilterCondition.timestamp_less_than_equals(Field(name="LastModifiedDate", data_type="datetime"), datetime(2022, 2, 2)))
    ],
    validations=[
        Validation.when(ValidationCondition.is_null("Name"), ValidationAction.ignore_record())
    ]
)
```

## Monitoring

### Metrics

Each flow allows access to metrics through the following methods:

* `metricFlowExecutionsStarted`
* `metricFlowExecutionsFailed`
* `metricFlowExecutionsSucceeded`
* `metricFlowExecutionTime`
* `metricFlowExecutionRecordsProcessed`

For detailed information about AppFlow metrics refer to [the documentation](https://docs.aws.amazon.com/appflow/latest/userguide/monitoring-cloudwatch.html).

These metrics can be consumed by a CloudWatch alarm, as in the example below:

```python
from cdklabs.cdk_appflow import IFlow

# flow: IFlow
# stack: Stack


metric = flow.metric_flow_executions_started()

metric.create_alarm(stack, "FlowExecutionsStartedAlarm",
    threshold=1000,
    evaluation_periods=2
)
```

### EventBridge notifications

Each flow publishes events to the default EventBridge bus, which can be consumed through the following methods:

* `onRunStarted`
* `onRunCompleted`
* `onDeactivated` (only for the `OnEventFlow` and the `OnScheduleFlow`)
* `onStatus` (only for the `OnEventFlow`)

This way one can consume the notifications as in the example below:

```python
from aws_cdk.aws_sns import ITopic
from aws_cdk.aws_events_targets import SnsTopic
from cdklabs.cdk_appflow import IFlow

# flow: IFlow
# my_topic: ITopic


flow.on_run_completed("OnRunCompleted",
    target=SnsTopic(my_topic)
)
```

# Notable distinctions from CloudFormation specification

## `OnScheduleFlow` and `incrementalPullConfig`

In CloudFormation the definition of the `incrementalPullConfig` (which effectively gives a name of the field used for tracking the last pulled timestamp) is on the [`SourceFlowConfig`](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-appflow-flow-sourceflowconfig.html#cfn-appflow-flow-sourceflowconfig-incrementalpullconfig) property. In the library this has been moved to the `OnScheduleFlow` constructor properties.
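For reference, this is where the property sits in the CloudFormation shape (shown here as a plain dict with made-up values):

```python
# The SourceFlowConfig property of AWS::AppFlow::Flow, with the
# incremental-pull field the library surfaces on OnScheduleFlow instead.
source_flow_config = {
    "ConnectorType": "Salesforce",
    "ConnectorProfileName": "MyConnectorProfile",
    "SourceConnectorProperties": {"Salesforce": {"Object": "Account"}},
    "IncrementalPullConfig": {"DatetimeTypeFieldName": "LastModifiedDate"},
}
```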

## `S3Destination` and Glue Catalog

Although in CloudFormation the Glue Catalog configuration is settable at the flow level, it works only when the destination is S3. That is why the library shifts the Glue Catalog properties definition to the `S3Destination`, which in turn requires using `Lazy` for populating `metadataCatalogConfig` in the flow.
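
The deferred-resolution idea behind `Lazy` can be sketched in plain Python (a toy illustration only; aws-cdk-lib's `Lazy` and token resolution are considerably more involved):

```python
class LazyValue:
    """Toy stand-in for a lazily-resolved value: the consumer holds a slot,
    and a producer is registered later, before resolution time."""

    def __init__(self):
        self._producer = None

    def produce(self, producer):
        self._producer = producer

    def resolve(self):
        if self._producer is None:
            raise ValueError("no producer registered")
        return self._producer()


# The flow holds a lazy slot at construction time...
metadata_catalog_config = LazyValue()

# ...and the S3 destination fills it in later, once it is bound to the flow.
metadata_catalog_config.produce(
    lambda: {"glueDataCatalog": {"databaseName": "mydb"}}
)
```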

# Security considerations

It is *recommended* to follow [data protection mechanisms for Amazon AppFlow](https://docs.aws.amazon.com/appflow/latest/userguide/data-protection.html).

## Confidential information

Amazon AppFlow application integration is done using `ConnectorProfile`s. A `ConnectorProfile` requires providing sensitive information, e.g. access and refresh tokens. It is *recommended* that such information is stored securely and passed to AWS CDK securely. All sensitive fields are effectively `IResolvable`, which means they can be resolved at deploy time. With that, one should follow the [best practices for credentials with CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/security-best-practices.html#creds). In this library, the sensitive fields are typed as `SecretValue` to emphasize that these should not be plain strings.

An example of using a predefined AWS Secrets Manager secret for storing sensitive information can be found below:

```python
from aws_cdk.aws_secretsmanager import Secret
from cdklabs.cdk_appflow import GoogleAnalytics4ConnectorProfile, GoogleAnalytics4OAuthFlow, GoogleAnalytics4OAuthSettings, GoogleAnalytics4RefreshTokenGrantFlow

# stack: Stack


secret = Secret.from_secret_name_v2(stack, "GA4Secret", "appflow/ga4")

profile = GoogleAnalytics4ConnectorProfile(stack, "GA4Connector",
    o_auth=GoogleAnalytics4OAuthSettings(
        flow=GoogleAnalytics4OAuthFlow(
            refresh_token_grant=GoogleAnalytics4RefreshTokenGrantFlow(
                refresh_token=secret.secret_value_from_json("refreshToken"),
                client_id=secret.secret_value_from_json("clientId"),
                client_secret=secret.secret_value_from_json("clientSecret")
            )
        )
    )
)
```

## An approach to managing permissions

This library relies on an internal `AppFlowPermissionsManager` class to automatically infer and apply appropriate resource policy statements to the S3 Bucket, KMS Key, and Secrets Manager Secret resources. `AppFlowPermissionsManager` places the statements exactly once for the `appflow.amazonaws.com` principal no matter how many times a resource is reused in the code.
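
The deduplication idea can be sketched as follows (a toy illustration only, not the actual `AppFlowPermissionsManager` implementation, which works on CDK resource policies):

```python
class PermissionsManager:
    """Toy sketch: attach a policy statement to each resource exactly once,
    no matter how many times the resource is reused."""

    def __init__(self):
        self._granted = set()

    def grant(self, resource_arn: str, statement: dict) -> bool:
        """Return True if the statement was attached, False if it was already there."""
        if resource_arn in self._granted:
            return False
        self._granted.add(resource_arn)
        # ... attach `statement` to the resource policy here ...
        return True
```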

### Confused Deputy Problem

Amazon AppFlow is an account-bound and regional service. With this, it is invulnerable to the confused deputy problem (see, e.g., [here](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html)). However, `AppFlowPermissionsManager` still introduces the `aws:SourceAccount` condition to the resource policies as a *best practice*.
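
The resulting statements look roughly like this (shown as a plain dict; the account id and bucket name are made up):

```python
# Illustrative resource policy statement for an S3 destination bucket;
# the actual statements are generated by AppFlowPermissionsManager.
statement = {
    "Effect": "Allow",
    "Principal": {"Service": "appflow.amazonaws.com"},
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-destination-bucket/*",
    "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333"}},
}
```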
