c8connector


- Name: c8connector
- Version: 0.0.32
- Summary: C8 Connector Interface
- Upload time: 2023-08-22 06:03:30
- Author: Macrometa
- Requires Python: >=3.8.1,<3.11
- License: Apache-2.0
- Keywords: elt, connectors, workflows, macrometa
- Requirements: No requirements were recorded.

# Implementing C8 Connectors.

Users can extend the `C8Connector` interface to develop three types of connectors.

1. Source Connectors (Connectors that ingest data)
2. Target Connectors (Connectors that export data)
3. Integration Connectors (Generic integrations for other services)

When developing these connectors, developers must adhere to the guidelines below.

## Naming the Connector

- The package name of the connector must be in the `macrometa-{type}-{connector}` format (e.g. `macrometa-source-postgres`).
- The module name of the connector must be in the `macrometa_{type}_{connector}` format (e.g. `macrometa_source_postgres`).
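
The two naming rules above can be sketched as a small helper. This is only an illustration: the function name `connector_names`, the lowercase type strings (in particular `integration`) and the restriction on the connector name are assumptions, not part of the `c8connector` package.

```python
import re
from typing import Tuple

# Assumed lowercase spellings of the three connector types listed in this guide.
CONNECTOR_TYPES = ("source", "target", "integration")


def connector_names(connector_type: str, connector: str) -> Tuple[str, str]:
    """Return (package_name, module_name) for a connector (hypothetical helper)."""
    if connector_type not in CONNECTOR_TYPES:
        raise ValueError(f"unknown connector type: {connector_type!r}")
    # Assumption: connector names are lowercase letters and digits only.
    if not re.fullmatch(r"[a-z][a-z0-9]*", connector):
        raise ValueError(f"unexpected connector name: {connector!r}")
    package_name = f"macrometa-{connector_type}-{connector}"
    # The module name is the package name with hyphens replaced by underscores.
    module_name = package_name.replace("-", "_")
    return package_name, module_name


package, module = connector_names("source", "postgres")
print(package)  # macrometa-source-postgres
print(module)   # macrometa_source_postgres
```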

## Project structure (package names and structure)

- Project source code must follow the structure below.
```text
.
├── LICENSE
├── README.md
├── GETTING_STARTED.md
├── macrometa_{type}_{connector}
│        ├── __init__.py
│        ├── main.py
│        └── {other source files or modules}
├── pyproject.toml
└── setup.cfg
```
- Within `/macrometa_{type}_{connector}/__init__.py` there must be a class that implements the `C8Connector` interface.

## Dependencies/Libraries and their versions to use

- Connectors must use only the following dependencies/libraries, at the versions listed.
```text
python = ">=3.7"
c8connector = "latest"
pipelinewise-singer-python = "1.2.0"
```
- Developers must not use `singer-sdk` or any Singer SDK variant other than `pipelinewise-singer-python`.

## Connector specific documentation

- Every connector project should have a `GETTING_STARTED.md` file that documents the connector configuration and all other requirements for the connector.
  It should be written as a user-facing document and provide the instructions an end user needs to use the connector.
  
  Developers can start from the generic template available [here](https://github.com/Macrometacorp/c8connector/blob/main/GETTING_STARTED.md) and adapt it as needed for the specific connector.

## Resolving reserved key conflicts between macrometa collection and external DB

- For Source connectors:
  A Macrometa collection (document) has the reserved keys `_key`, `_id` and `_rev`. `_key` is the primary key, so it takes the value of the source data's primary key, while `_id` and `_rev` are always autogenerated. If `_key`, `_id` or `_rev` also exist in the source data (and `_key` is not the source data's primary key), their source values would be lost.
  To avoid this, append an additional `_` to each of these reserved keys that is present in the source data. If `_key` is itself the primary key of the source data, there is no need to append `_` to `_key`. Also check that the newly generated key does not already exist in the source data; if it does, keep appending `_`.
  
  During the actual workflow run, this logic is implemented at the target level (macrometa-target-collection). The same logic must also be implemented at the source connector level for the `samples` and `schemas` APIs.
  Refer to this [PR](https://github.com/Macrometacorp/macrometa-source-postgres/).
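
The renaming rule for source connectors could look roughly like the sketch below. It is not the code from the referenced connectors: the helper name `escape_reserved_keys` and its `primary_key` parameter are hypothetical, and "append" is read literally as adding `_` to the end of the key.

```python
from typing import Any, Dict, Optional

# Reserved keys of a Macrometa collection, as listed above.
RESERVED_KEYS = ("_key", "_id", "_rev")


def escape_reserved_keys(
    record: Dict[str, Any], primary_key: Optional[str] = None
) -> Dict[str, Any]:
    """Rename reserved keys found in a source record (hypothetical helper)."""
    taken = set(record)  # keys that a newly generated name must not collide with
    renamed: Dict[str, Any] = {}
    for key, value in record.items():
        new_key = key
        # `_key` is left alone when it is the source data's own primary key.
        keep_as_is = key == "_key" and primary_key == "_key"
        if key in RESERVED_KEYS and not keep_as_is:
            new_key = key + "_"
            # Keep appending `_` until the generated key is unused.
            while new_key in taken:
                new_key += "_"
            taken.add(new_key)
        renamed[new_key] = value
    return renamed


row = {"id": 1, "_id": 7, "_id_": 8}
print(escape_reserved_keys(row, primary_key="id"))
# {'id': 1, '_id__': 7, '_id_': 8}
```

Here `_id` becomes `_id__` because `_id_` is already present in the source row.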


- For Target connectors:
  As with source connectors writing to macrometa-target-collection, the same reserved key conflict can arise for target connectors: the external database might have fixed reserved keys, such as its primary key or autogenerated or internal keys. If such reserved keys also exist in the source collection, their values from the source collection would be lost in the target data.
  
  In such cases, first specify all the reserved keys as a list of strings in the `reserved_keys` field of the target connector. If there is a fixed primary key, it must always be the first element of the list. If there is no fixed primary key but there are other reserved keys, the first element should be an empty string followed by the reserved keys, for example `["", "reservedkey1", "reservedkey2"]`. If no reserved keys exist, return an empty list `[]`. Refer to this [PR](https://github.com/Macrometacorp/macrometa-target-collection/pull/9).

  In addition, implement the logic of appending `_` to the reserved keys (only if they exist in the source collection) before writing the data to the external DB at the target connector level. If `_key` is also the reserved primary key of the target external DB, there is no need to append `_` to this reserved primary key. Also check that the newly generated key does not already exist in the source collection; if it does, keep appending `_`. Refer to this [PR](https://github.com/Macrometacorp/macrometa-target-collection/pull/10).

  > **_NOTE:_** This is applicable only when there are certain keys reserved in the external database.
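
As a rough illustration of the `reserved_keys` list layout and the target-side renaming, consider the sketch below. Both helper names (`build_reserved_keys`, `escape_for_target`) and the sample key `ctid` are hypothetical; how the list is actually exposed by a connector should be taken from the referenced PRs.

```python
from typing import Any, Dict, List, Optional


def build_reserved_keys(primary_key: Optional[str], other_keys: List[str]) -> List[str]:
    """Lay out the reserved keys list as described above (hypothetical helper)."""
    if primary_key:
        return [primary_key, *other_keys]  # fixed primary key always comes first
    if other_keys:
        return ["", *other_keys]  # empty string marks "no fixed primary key"
    return []


def escape_for_target(record: Dict[str, Any], reserved_keys: List[str]) -> Dict[str, Any]:
    """Append `_` to keys in a source collection record that the target reserves."""
    # `_key` is exempt when it is also the target's reserved primary key.
    exempt = "_key" if reserved_keys and reserved_keys[0] == "_key" else None
    conflicts = {k for k in reserved_keys if k and k != exempt}
    taken = set(record)
    out: Dict[str, Any] = {}
    for key, value in record.items():
        new_key = key
        if key in conflicts:
            new_key = key + "_"
            while new_key in taken:  # keep appending until the name is unused
                new_key += "_"
            taken.add(new_key)
        out[new_key] = value
    return out


keys = build_reserved_keys(None, ["reservedkey1", "reservedkey2"])
print(keys)  # ['', 'reservedkey1', 'reservedkey2']
print(escape_for_target({"_key": "1", "reservedkey1": "a"}, keys))
# {'_key': '1', 'reservedkey1_': 'a'}
```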


## Adding metrics to the connectors
- For Source connectors:
  We support the following ingest metrics for source connectors:
    `ingested_bytes`, `ingested_documents`, `ingest_errors`, `ingest_lag`
  Of these four metrics, the source connector only needs to increment `ingest_errors` whenever there is an error, and start the Prometheus client HTTP server on port 8000.
  The rest are calculated at the Macrometa target collection level.
  However, to calculate the `ingest_lag` metric, the source connector needs to send the current timestamp in the `time_extracted` property of the Singer Record Message.
  You can refer to other source connectors, for example:
  https://github.com/Macrometacorp/macrometa-source-postgres/pull/14/files
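
To make the `time_extracted` requirement concrete, here is a minimal sketch that builds a Singer `RECORD` message by hand with the standard library only. In a real connector the message would normally be produced through `pipelinewise-singer-python`; the helper name `record_message` and its `now` parameter are assumptions made for illustration, and the `ingest_errors` counter and Prometheus server are out of scope here.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def record_message(
    stream: str, record: Dict[str, Any], now: Optional[datetime] = None
) -> str:
    """Serialize a Singer RECORD message carrying a UTC `time_extracted`."""
    # `now` exists only to make the example deterministic; by default the
    # current UTC time is used, as the guideline above requires.
    extracted = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    message = {
        "type": "RECORD",
        "stream": stream,
        "record": record,
        "time_extracted": extracted.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    }
    return json.dumps(message)


fixed = datetime(2023, 8, 22, 6, 3, 30, tzinfo=timezone.utc)
print(record_message("users", {"id": 1}, now=fixed))
```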

- For Target connectors:
  We support the following export metrics for target connectors:
    `exported_bytes`, `exported_documents`, `export_errors`, `export_lag`
  Of these four metrics, `exported_documents` and `exported_bytes` are calculated at the Macrometa source collection level.
  The target connector needs to increment `export_errors` whenever there is an error, start the Prometheus client HTTP server on port 8001, and calculate `export_lag`.
  `export_lag` is the time difference in seconds between the `time_extracted` property in the Singer Record Message, which is sent by the Macrometa source collection connector, and the current timestamp (UTC timezone).
  You can refer to other target connectors, for example:
  https://github.com/Macrometacorp/macrometa-target-postgres/pull/31/files
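
The `export_lag` calculation described above can be sketched as follows. The helper name `export_lag_seconds` is hypothetical, and the sketch assumes `time_extracted` arrives as an ISO 8601 string in UTC (optionally ending in `Z`); publishing the value as a metric is left out.

```python
from datetime import datetime, timezone
from typing import Optional


def export_lag_seconds(time_extracted: str, now: Optional[datetime] = None) -> float:
    """Seconds elapsed between `time_extracted` and the current UTC time."""
    # datetime.fromisoformat() on older Python versions rejects a trailing "Z".
    extracted = datetime.fromisoformat(time_extracted.replace("Z", "+00:00"))
    if extracted.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        extracted = extracted.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return (current - extracted).total_seconds()


later = datetime(2023, 8, 22, 6, 3, 35, tzinfo=timezone.utc)
print(export_lag_seconds("2023-08-22T06:03:30.000000Z", now=later))  # 5.0
```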

## State management for connectors
State management is supported for connectors. Refer to [this guide](/connector_creation_cookbook/state_management_guidelines.md) for comprehensive guidance on managing state in connectors.

## Sample Connectors
- Postgres Source Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-source-postgres)
- Oracle Source Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-source-oracle)
- C8 Collections Target Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-target-collection)


            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "c8connector",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8.1,<3.11",
    "maintainer_email": "",
    "keywords": "ELT,Connectors,Workflows,Macrometa",
    "author": "Macrometa",
    "author_email": "info@macrometa.com",
    "download_url": "https://files.pythonhosted.org/packages/02/ae/ece314f9645c08b8d54b5e6fca62144bbf35520c9fa5200caf1cc7b710f1/c8connector-0.0.32.tar.gz",
    "platform": null,
    "description": "# Implementing C8 Connectors.\n\nUsers can extend `C8Connector` interface and develop 3 types of connectors.\n\n1. Source Connectors (Connectors that ingest data)\n2. Target Connectors (Connectors that export data)\n3. Integration Connectors (Generic integrations for other services)\n\nWhen developing these connectors, developers must adhere to a few guidelines mentioned below.\n\n## Naming the Connector\n\n- Package name of the connector must be in the `macrometa-{type}-{connector}` format (i.e `macrometa-source-postgres`).\n- Module name of the connector must be in the `macrometa_{type}_{connector}` format (i.e `macrometa_source_postgres`).\n\n## Project structure (package names and structure)\n\n- Project source code must follow the below structure.\n```text\n.\n\u251c\u2500\u2500 LICENSE\n\u251c\u2500\u2500 README.md\n\u251c\u2500\u2500 GETTING_STARTED.md\n\u251c\u2500\u2500 macrometa_{type}_{connector}\n\u2502        \u251c\u2500\u2500 __init__.py\n\u2502        \u2514\u2500\u2500 main.py\n\u2502        \u2514\u2500\u2500 {other source files or modules}\n\u251c\u2500\u2500 pyproject.toml\n\u2514\u2500\u2500 setup.cfg\n```\n- Within the `/macrometa_{type}_{connector}/__init__.py` there must be a class which implements `C8Connector` interface.\n\n## Dependencies/Libraries and their versions to use.\n\n- Connectors must only use following dependencies/libraries and mentioned versions' when developing.\n```text\npython = \">=3.7\"\nc8connector = \"latest\"\npipelinewise-singer-python = \"1.2.0\"\n```\n- Developers must not use `singer-sdk` or any other singer sdk variants other than `pipelinewise-singer-python`.\n\n## Connector specific documentation\n\n- Every connector project should have a GETTING_STARTED.md file, documenting the connector configuration and all other requirements for the connector.\n  It should be formatted like a User-Facing document and it should also provide the necessary instructions for the end user to be able to use 
the connector.\n  \n  Developers can follow the Generic Template available [here](https://github.com/Macrometacorp/c8connector/blob/main/GETTING_STARTED.md) and apply any necessary changes required on top of it for the specific connector.\n\n## Resolving reserved key conflicts between macrometa collection and external DB\n\n- For Source connectors:\n  Macrometa collection(document) has the following reserved keys, `_key`, `_id` and `_rev`. `_key` is the primary key and hence, it will have the value of the primary key of source data and `_id` and `_rev` are always autogenerated. So if `_key`, `_id`, `_rev` also exists in source data (assuming _key is not the primary key of source data) then these values from source data would be lost.\n  Hence we need to append an additional `_` to these reserved keys if they are present in source data. If `_key` is the primary key of the source data itself then no need to append `_` to `_key`. We should also check that the new key generated doesn't exist in the source data, If it exists then keep appending `_`.\n  \n  During the actual workflow run, this logic is implemented at the target level (macrometa-target-collection). But we also need the same to be implemented at source connector levels for `samples` and `schemas` API.\n  Refer [PR] (https://github.com/Macrometacorp/macrometa-source-postgres/)\n\n\n- For Target connectors:\n  As seen for source connector with target as (macrometa-target-collection), the same reserved keys conflict can arise in case of target connectors too, where External database might have some fixed reserved keys which might be their primary key, autogenerated or internal key. So if such reserved keys also exist in source collection then these values from source collection will be lost in the target data.\n  \n  In such cases we should first specify all the reserved keys as a list of string in the `reserved_keys` field of target connector. 
If there is a fixed primary key it should always be specified as the first element of the list, else if there isn't a fixed primary key but there are other reserved keys then the first element should be an empty string followed by the list of reserved keys, Example: [\"\", \"reservedkey1\", \"reservedkey2\"]. If no reserved keys exist return an empty list []. Refer [PR] (https://github.com/Macrometacorp/macrometa-target-collection/pull/9)\n\n  In addition to this, we also need to implement the logic of appending `_` to the reserved keys (only if they exist in source collection) before writing the data in the external DB at the target connector level. If `_key` is also the reserved primary key of the target external DB then no need to append `_` to this reserved primary key. We should also check that the new key generated doesn't exist in the source collection, If it exists then keep appending `_`. Refer [PR] (https://github.com/Macrometacorp/macrometa-target-collection/pull/10)\n\n  > **_NOTE:_** This is applicable only when there are certain keys reserved in the external database.\n\n\n## Adding metrics to the connectors\n- For Source connectors:\n  We support the following ingest metrics for source connectors:\n    ingested_bytes, ingested_documents, ingest_errors, ingest_lag\n  Out of the above 4 metrics, we only need to increment ingest_errors at the source connector level whenever there is an error and start the prometheus client http server at port 8000.\n  The rest of them are calculated at macrometa target collection level.\n  But to calculate ingest_lag metrics the source connector needs to send the current timestamp in `time_extracted` property of the Singer Record Message.\n  You can refer to other source connectors, for example:\n  https://github.com/Macrometacorp/macrometa-source-postgres/pull/14/files\n\n- For Target connectors:\n  We support the following ingest metrics for target connectors:\n    exported_bytes, exported_documents, export_errors, 
export_lag\n  Out of the above 4 metrics, exported_documents and exported_bytes are calculated at Macrometa source collection level.\n  We need to increment export_errors at the target connector level whenever there is an error and start the prometheus client http server at port 8001 and we also need to calculate export_lag.\n  `export_lag` is nothing but the time difference in seconds between the `time_extracted` property in Singer Record Message, which is sent by macrometa source collection connector, and the current timestamp (UTC timezone).\n  You can refer to other target connectors, for example:\n  https://github.com/Macrometacorp/macrometa-target-postgres/pull/31/files\n\n## State management for connectors\n  We do support state management for connectors. Please refer [here](/connector_creation_cookbook/state_management_guidelines.md) for a comprehensive guide on managing states using connectors.\n\n## Sample Connectors\n- Postgres Source Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-source-postgres)\n- Oracle Source Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-source-oracle)\n- C8 Collections target Connector: [Git Repository](https://github.com/Macrometacorp/macrometa-target-collection)\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "C8 Connector Interface",
    "version": "0.0.32",
    "project_urls": {
        "Bug Tracker": "https://github.com/Macrometacorp/c8connector/issues",
        "Homepage": "https://www.macrometa.com/"
    },
    "split_keywords": [
        "elt",
        "connectors",
        "workflows",
        "macrometa"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c292c473b5d369e705224ad0818b0546bf305aa2b4ee25f04647f9257e6a19b",
                "md5": "fdb6a0131f71def42837d24a7293bdbe",
                "sha256": "f377b53bc3eb7ec145847a944bfcb4a230aef33d200ef5912e376e4ec3525746"
            },
            "downloads": -1,
            "filename": "c8connector-0.0.32-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fdb6a0131f71def42837d24a7293bdbe",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.1,<3.11",
            "size": 12776,
            "upload_time": "2023-08-22T06:03:29",
            "upload_time_iso_8601": "2023-08-22T06:03:29.227505Z",
            "url": "https://files.pythonhosted.org/packages/0c/29/2c473b5d369e705224ad0818b0546bf305aa2b4ee25f04647f9257e6a19b/c8connector-0.0.32-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "02aeece314f9645c08b8d54b5e6fca62144bbf35520c9fa5200caf1cc7b710f1",
                "md5": "2aaffc9d68ce628fbef04a6b71600730",
                "sha256": "cbcc55702f3e3596f3deb6f558fbcbbde210c1d54dc886bc48d56bf4101201ad"
            },
            "downloads": -1,
            "filename": "c8connector-0.0.32.tar.gz",
            "has_sig": false,
            "md5_digest": "2aaffc9d68ce628fbef04a6b71600730",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.1,<3.11",
            "size": 12622,
            "upload_time": "2023-08-22T06:03:30",
            "upload_time_iso_8601": "2023-08-22T06:03:30.348313Z",
            "url": "https://files.pythonhosted.org/packages/02/ae/ece314f9645c08b8d54b5e6fca62144bbf35520c9fa5200caf1cc7b710f1/c8connector-0.0.32.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-08-22 06:03:30",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Macrometacorp",
    "github_project": "c8connector",
    "github_not_found": true,
    "lcname": "c8connector"
}
        