# Amazon AppFlow Construct Library
*Note:* this library is currently in technical preview.
## Introduction
Amazon AppFlow is a service that enables creating managed, bi-directional data transfer integrations between various SaaS applications and AWS services.
For more information, see the [Amazon AppFlow User Guide](https://docs.aws.amazon.com/appflow/latest/userguide/what-is-appflow.html).
## Example
```python
from datetime import datetime

from aws_cdk import SecretValue
from aws_cdk.aws_s3 import Bucket
from aws_cdk.aws_secretsmanager import ISecret
from cdklabs.cdk_appflow import Field, Filter, FilterCondition, IDestination, ISource, Mapping, OnDemandFlow, S3Destination, S3Location, SalesforceConnectorProfile, SalesforceOAuthFlow, SalesforceOAuthRefreshTokenGrantFlow, SalesforceOAuthSettings, SalesforceSource, Transform, Validation, ValidationAction, ValidationCondition
# client_secret: ISecret
# access_token: SecretValue
# refresh_token: SecretValue
# instance_url: str
profile = SalesforceConnectorProfile(self, "MyConnectorProfile",
o_auth=SalesforceOAuthSettings(
access_token=access_token,
flow=SalesforceOAuthFlow(
refresh_token_grant=SalesforceOAuthRefreshTokenGrantFlow(
refresh_token=refresh_token,
client=client_secret
)
)
),
instance_url=instance_url,
is_sandbox=False
)
source = SalesforceSource(
profile=profile,
object="Account"
)
bucket = Bucket(self, "DestinationBucket")
destination = S3Destination(
location=S3Location(bucket=bucket)
)
OnDemandFlow(self, "SfAccountToS3",
source=source,
destination=destination,
mappings=[Mapping.map_all()],
transforms=[
Transform.mask(Field(name="Name"), "*")
],
validations=[
Validation.when(ValidationCondition.is_null("Name"), ValidationAction.ignore_record())
],
filters=[
Filter.when(FilterCondition.timestamp_less_than_equals(Field(name="LastModifiedDate", data_type="datetime"), datetime.fromisoformat("2022-02-02")))
]
)
```
# Concepts
Amazon AppFlow introduces several concepts that abstract away the technicalities of setting up and managing data integrations.
An `Application` is any SaaS data integration component that can be either a *source* or a *destination* for Amazon AppFlow. A source is an application from which Amazon AppFlow will retrieve data, whereas a destination is an application to which Amazon AppFlow will send data.
A `Flow` is Amazon AppFlow's integration between a source and a destination.
A `ConnectorProfile` is Amazon AppFlow's abstraction over authentication/authorization with a particular SaaS application. The per-SaaS application permissions given to a particular `ConnectorProfile` will determine whether the connector profile can support the application as a source or as a destination (see whether a particular application is supported as either a source or a destination in [the documentation](https://docs.aws.amazon.com/appflow/latest/userguide/app-specific.html)).
## Types of Flows
The library introduces three separate types of flows:
* `OnDemandFlow` - a construct representing a flow that can be triggered programmatically with the use of a [StartFlow API call](https://docs.aws.amazon.com/appflow/1.0/APIReference/API_StartFlow.html).
* `OnEventFlow` - a construct representing a flow that is triggered by a SaaS application event published to AppFlow. At the time of writing only a Salesforce source is able to publish events that can be consumed by AppFlow flows.
* `OnScheduleFlow` - a construct representing a flow that is triggered on a [`Schedule`](https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_events.Schedule.html)
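An `OnScheduleFlow` pairs a flow definition with an EventBridge `Schedule`. The sketch below is illustrative only: the `schedule` and `pull_config` property names are assumed from the library's TypeScript API and should be verified against the current release.

```python
from aws_cdk import Duration
from aws_cdk.aws_events import Schedule
from cdklabs.cdk_appflow import Mapping, OnScheduleFlow

# source: ISource
# destination: IDestination

OnScheduleFlow(self, "SfAccountToS3Daily",
    source=source,
    destination=destination,
    mappings=[Mapping.map_all()],
    # Schedule.rate and Schedule.cron come from aws_cdk.aws_events;
    # the exact OnScheduleFlow property names are hypothetical here.
    schedule=Schedule.rate(Duration.days(1))
)
```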
## Tasks
Tasks are steps performed on fields. Tasks compose higher-level objects that this library calls `Operations`. Four types of operation are identified:
* Transforms - 1-1 transforms on source fields, like truncation or masking
* Mappings - 1-1 or many-to-1 operations from source fields to a destination field
* Filters - operations that limit the source data based on particular conditions
* Validations - operations that work on a per-record level and can have either a record-level consequence (e.g. dropping the record) or a global one (terminating the flow).
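To make the four operation types concrete, here is a plain-Python illustration (this is local model code, not AppFlow API code) of how each one acts on a batch of records:

```python
from datetime import datetime

# Local model of the four operation types; NOT the AppFlow API.
records = [
    {"Name": "Acme", "LastModifiedDate": datetime(2022, 1, 15)},
    {"Name": None, "LastModifiedDate": datetime(2022, 1, 20)},
    {"Name": "Globex", "LastModifiedDate": datetime(2023, 6, 1)},
]

# Filter: keep only records modified on or before a cut-off date.
cutoff = datetime(2022, 2, 2)
filtered = [r for r in records if r["LastModifiedDate"] <= cutoff]

# Validation: drop records where Name is null (record-level consequence).
validated = [r for r in filtered if r["Name"] is not None]

# Transform: mask the Name field with '*'.
transformed = [{**r, "Name": "*" * len(r["Name"])} for r in validated]

# Mapping: project a source field onto a destination field (1-1 here).
mapped = [{"AccountName": r["Name"]} for r in transformed]

print(mapped)  # [{'AccountName': '****'}]
```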
Each flow exposes dedicated properties for each operation type, which can be used as in the example below:
```python
from datetime import datetime

from cdklabs.cdk_appflow import Field, Filter, FilterCondition, IDestination, ISource, Mapping, OnDemandFlow, S3Destination, SalesforceConnectorProfile, SalesforceSource, Transform, Validation, ValidationAction, ValidationCondition
# stack: Stack
# source: ISource
# destination: IDestination
flow = OnDemandFlow(stack, "OnDemandFlow",
source=source,
destination=destination,
transforms=[
Transform.mask(Field(name="Name"), "*")
],
mappings=[
Mapping.map(Field(name="Name", data_type="String"), name="Name", data_type="string")
],
filters=[
Filter.when(FilterCondition.timestamp_less_than_equals(Field(name="LastModifiedDate", data_type="datetime"), datetime.fromisoformat("2022-02-02")))
],
validations=[
Validation.when(ValidationCondition.is_null("Name"), ValidationAction.ignore_record())
]
)
```
## Monitoring
### Metrics
Each flow exposes metrics through the following methods:
* `metricFlowExecutionsStarted`
* `metricFlowExecutionsFailed`
* `metricFlowExecutionsSucceeded`
* `metricFlowExecutionTime`
* `metricFlowExecutionRecordsProcessed`
For detailed information about AppFlow metrics refer to [the documentation](https://docs.aws.amazon.com/appflow/latest/userguide/monitoring-cloudwatch.html).
These metrics can be used to create CloudWatch alarms, as in the example below:
```python
from cdklabs.cdk_appflow import IFlow
# flow: IFlow
# stack: Stack
metric = flow.metric_flow_executions_started()
metric.create_alarm(stack, "FlowExecutionsStartedAlarm",
threshold=1000,
evaluation_periods=2
)
```
### EventBridge notifications
Each flow publishes events to the default EventBridge bus:
* `onRunStarted`
* `onRunCompleted`
* `onDeactivated` (only for the `OnEventFlow` and the `OnScheduleFlow`)
* `onStatus` (only for the `OnEventFlow`)
These notifications can be consumed as in the example below:
```python
from aws_cdk.aws_sns import ITopic
from aws_cdk.aws_events_targets import SnsTopic
from cdklabs.cdk_appflow import IFlow
# flow: IFlow
# my_topic: ITopic
flow.on_run_completed("OnRunCompleted",
target=SnsTopic(my_topic)
)
```
# Notable distinctions from CloudFormation specification
## `OnScheduleFlow` and `incrementalPullConfig`
In CloudFormation the definition of the `incrementalPullConfig` (which effectively specifies the field used for tracking the last pulled timestamp) is on the [`SourceFlowConfig`](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-appflow-flow-sourceflowconfig.html#cfn-appflow-flow-sourceflowconfig-incrementalpullconfig) property. In the library this has been moved to the `OnScheduleFlow` constructor properties.
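To illustrate what the incremental-pull tracking field does (this is a conceptual plain-Python model, not library code): each scheduled run pulls only records whose tracking field exceeds a stored watermark, then advances the watermark.

```python
from datetime import datetime

# Conceptual model of incremental pull; NOT the AppFlow API.
source_records = [
    {"Id": 1, "LastModifiedDate": datetime(2024, 1, 1)},
    {"Id": 2, "LastModifiedDate": datetime(2024, 2, 1)},
    {"Id": 3, "LastModifiedDate": datetime(2024, 3, 1)},
]

# Watermark: the last pulled timestamp remembered from the previous run.
watermark = datetime(2024, 1, 15)

# Pull only records modified after the watermark...
pulled = [r for r in source_records if r["LastModifiedDate"] > watermark]
# ...and advance the watermark for the next scheduled run.
watermark = max(r["LastModifiedDate"] for r in pulled)

print([r["Id"] for r in pulled])  # [2, 3]
```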
## `S3Destination` and Glue Catalog
Although in CloudFormation the Glue Catalog configuration is set at the flow level, it only takes effect when the destination is S3. For this reason the library moves the Glue Catalog properties definition to the `S3Destination`, which in turn requires using `Lazy` to populate `metadataCatalogConfig` in the flow.
# Security considerations
It is *recommended* to follow [data protection mechanisms for Amazon AppFlow](https://docs.aws.amazon.com/appflow/latest/userguide/data-protection.html).
## Confidential information
Amazon AppFlow application integration is done using `ConnectorProfile`s. A `ConnectorProfile` requires sensitive information, e.g. access and refresh tokens. It is *recommended* that such information is stored securely and passed to the AWS CDK securely. All sensitive fields are effectively `IResolvable`, which means they can be resolved at deploy time. Accordingly, one should follow the [best practices for credentials with CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/security-best-practices.html#creds). In this library, the sensitive fields are typed as `SecretValue` to emphasize that they should not be plain strings.
An example of using a predefined AWS Secrets Manager secret for storing sensitive information can be found below:
```python
from aws_cdk.aws_secretsmanager import Secret
from cdklabs.cdk_appflow import GoogleAnalytics4ConnectorProfile
# stack: Stack
secret = Secret.from_secret_name_v2(stack, "GA4Secret", "appflow/ga4")
profile = GoogleAnalytics4ConnectorProfile(stack, "GA4Connector",
o_auth=GoogleAnalytics4OAuthSettings(
flow=GoogleAnalytics4OAuthFlow(
refresh_token_grant=GoogleAnalytics4RefreshTokenGrantFlow(
refresh_token=secret.secret_value_from_json("refreshToken"),
client_id=secret.secret_value_from_json("clientId"),
client_secret=secret.secret_value_from_json("clientSecret")
)
)
)
)
```
## An approach to managing permissions
This library relies on an internal `AppFlowPermissionsManager` class to automatically infer and apply appropriate resource policy statements to the S3 Bucket, KMS Key, and Secrets Manager Secret resources. `AppFlowPermissionsManager` places the statements exactly once for the `appflow.amazonaws.com` principal no matter how many times a resource is reused in the code.
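The "exactly once" behavior can be sketched in plain Python (this is a conceptual illustration, not the library's actual implementation): statements are keyed by resource and principal, so repeated grants for the same pair are no-ops.

```python
# Conceptual sketch of placing a resource policy statement exactly once
# per (resource, principal) pair; NOT the library's real implementation.
class PermissionsManagerSketch:
    def __init__(self):
        self._granted = {}  # (resource_id, principal) -> statement

    def grant(self, resource_id, principal, statement):
        key = (resource_id, principal)
        if key in self._granted:
            return False  # statement already placed; skip the duplicate
        self._granted[key] = statement
        return True

mgr = PermissionsManagerSketch()
first = mgr.grant("my-bucket", "appflow.amazonaws.com", {"Action": "s3:GetObject"})
# Reusing the same bucket elsewhere in the code does not duplicate the statement.
second = mgr.grant("my-bucket", "appflow.amazonaws.com", {"Action": "s3:GetObject"})
print(first, second)  # True False
```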
### Confused Deputy Problem
Amazon AppFlow is an account-bound, regional service, which makes it largely immune to the confused deputy problem (see, e.g., [here](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html)). However, `AppFlowPermissionsManager` still adds the `aws:SourceAccount` condition to the resource policies as a *best practice*.
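A resource policy statement carrying this condition looks roughly like the following (the account ID and bucket name are placeholders, and the exact statements the library emits may differ):

```python
import json

# Illustrative S3 bucket policy statement with the aws:SourceAccount guard;
# the account ID and bucket name below are placeholder values.
statement = {
    "Effect": "Allow",
    "Principal": {"Service": "appflow.amazonaws.com"},
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::my-destination-bucket/*",
    "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333"}},
}

print(json.dumps(statement, indent=2))
```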