odd-collector-sdk


* Name: odd-collector-sdk
* Version: 0.3.58
* Home page: https://github.com/opendatadiscovery/odd-collector-sdk
* Summary: ODD Collector
* Upload time: 2024-04-23 18:19:26
* Author: Open Data Discovery
* Requires Python: <4.0,>=3.9
* License: Apache-2.0
* Keywords: odd-collector-sdk, odd_collector_sdk, opendatadiscovery

[![PyPI version](https://badge.fury.io/py/odd-collector-sdk.svg)](https://badge.fury.io/py/odd-collector-sdk)

# ODD Collector SDK
Root project for ODD collectors

### Domain
* `CollectorConfig`

    _Main config file for collector_
    ``` python
    class CollectorConfig(pydantic.BaseSettings):
        default_pulling_interval: int # pulling interval in minutes
        token: str                    # token for requests to odd-platform
        plugins: Any
        platform_host_url: str
    ```

* `Collector`

    Args:

    `config_path`: str - path to collector_config.yaml (e.g. `'/collector_config.yaml'`)

    `root_package`: str - root package from which adapters are loaded (e.g. `'my_collector.adapters'`)

    `plugins_union_type` - union type of all available plugin models, used as the pydantic model for plugins (e.g. `AvailablePlugin` in the plugins example below)

* `Plugin`

  Configuration for an adapter:
  ```python
  class Plugin(pydantic.BaseSettings):
    name: str
    description: Optional[str] = None
    namespace: Optional[str] = None
  ```

  The `Plugin` class inherits from pydantic's `BaseSettings`, so any field omitted from `collector_config.yaml` can be read from environment variables.

  The field `type: Literal["custom_adapter"]` is mandatory for every plugin. By convention, the literal **MUST** match the name of the adapter package.

  Plugins example:
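  ```python
  # plugins.py
  from typing import List, Literal, Optional, Union
  from typing_extensions import Annotated

  import pydantic
  from odd_collector_sdk.domain.plugin import Plugin

  # Shared AWS credentials for all AWS-based plugins
  class AwsPlugin(Plugin):
      aws_secret_access_key: str
      aws_access_key_id: str
      aws_region: str

  class S3Plugin(AwsPlugin):
      type: Literal["s3"]
      buckets: Optional[List[str]] = []

  class GluePlugin(AwsPlugin):
      type: Literal["glue"]

  # For Collector's plugins_union_type argument
  AvailablePlugin = Annotated[
      Union[
          GluePlugin,
          S3Plugin,
      ],
      pydantic.Field(discriminator="type"),
  ]
  ```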
* `AbstractAdapter`

    Abstract adapter which **MUST** be implemented by every adapter.
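
The role of the `type` field can be illustrated without pydantic. The following is a minimal, standard-library-only sketch, not the SDK's implementation: the dataclasses are simplified stand-ins for the pydantic models above, and `PLUGIN_TYPES` and `parse_plugin` are hypothetical names introduced only for this example. In the SDK itself, the selection is done by pydantic's discriminated union.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Plugin:
    # Simplified stand-in for the SDK's pydantic-based Plugin model.
    name: str
    description: Optional[str] = None
    namespace: Optional[str] = None


@dataclass
class S3Plugin(Plugin):
    type: str = "s3"
    buckets: List[str] = field(default_factory=list)


@dataclass
class GluePlugin(Plugin):
    type: str = "glue"


# Hypothetical registry: discriminator value -> plugin class.
PLUGIN_TYPES = {"s3": S3Plugin, "glue": GluePlugin}


def parse_plugin(raw: Dict[str, Any]) -> Plugin:
    """Pick the plugin class from the `type` field, as a discriminated union would."""
    plugin_type = raw.get("type")
    if plugin_type not in PLUGIN_TYPES:
        raise ValueError(f"Unknown plugin type: {plugin_type!r}")
    return PLUGIN_TYPES[plugin_type](**raw)


plugin = parse_plugin({"type": "s3", "name": "my_s3", "buckets": ["raw"]})
print(type(plugin).__name__, plugin.buckets)
```

An entry whose `type` is not registered is rejected, which mirrors why every plugin entry in `collector_config.yaml` needs a `type` value that matches one of the available plugin models.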

## Collector example

### Requirements
Use the package manager [poetry](https://python-poetry.org/) to add odd-collector-sdk to your project. `asyncio` is part of the Python standard library and does not need to be installed.
```bash
poetry add odd-collector-sdk
```

### A typical top-level directory layout for a collector (using a poetry project as an example)

    .
    ├── my_collector            
    │   ├── adapters            # Adapters
    │   │   ├── custom_adapter  # Some adapter package
    │   │   │   ├── adapter.py  # Entry file for adapter
    │   │   │   └── __init__.py
    │   │   ├── other_custom_adapter
    │   │   ├── ...             # Other adapters
    │   │   └── __init__.py
    │   ├── domain              # Domain models
    │   │   ├── ...
    │   │   ├── plugins.py      # Models for available plugins
    │   │   └── __init__.py
    │   ├── __init__.py         
    │   └── __main__.py         # Entry file for collector
    ├── ...
    ├── collector_config.yaml
    ├── pyproject.toml
    ├── LICENSE
    └── README.md
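
To try the layout quickly, the placeholder files can be created with a few lines of Python. This is only a convenience sketch: the `scaffold` helper and the `LAYOUT` list are hypothetical and not part of the SDK, and the demo writes into a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path

# Files from the layout above, relative to the project root.
LAYOUT = [
    "my_collector/__init__.py",
    "my_collector/__main__.py",
    "my_collector/adapters/__init__.py",
    "my_collector/adapters/custom_adapter/__init__.py",
    "my_collector/adapters/custom_adapter/adapter.py",
    "my_collector/domain/__init__.py",
    "my_collector/domain/plugins.py",
    "collector_config.yaml",
]


def scaffold(root: Path) -> None:
    """Create empty placeholder files for the collector layout under `root`."""
    for relative in LAYOUT:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()


with tempfile.TemporaryDirectory() as tmp:
    scaffold(Path(tmp))
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
    print("\n".join(created))
```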



### Adapters folder
Each adapter inside the adapters folder must have an `adapter.py` file with an `Adapter` class implementing `AbstractAdapter`.
```python
# custom_adapter/adapter.py example
from typing import Any

from odd_collector_sdk.domain.adapter import AbstractAdapter
from odd_models.models import DataEntityList


class Adapter(AbstractAdapter):
    def __init__(self, config: Any) -> None:
        # `config` is the plugin config for this adapter
        super().__init__()

    def get_data_entity_list(self) -> DataEntityList:
        # Return the data entities collected by this adapter
        return DataEntityList(data_source_oddrn="test")

    def get_data_source_oddrn(self) -> str:
        # Return the ODDRN of the data source
        return "oddrn"
```
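
The naming convention (adapter package named after the plugin `type`, entry file `adapter.py`, class `Adapter`) suggests how an adapter can be located from a plugin entry. The sketch below shows that lookup with `importlib`. It is an assumption about the mechanism, not the SDK's actual loader: `load_adapter_class` and `_register_fake_adapter` are hypothetical helpers, and the fake in-memory package exists only so the example runs on its own.

```python
import importlib
import sys
import types


def load_adapter_class(root_package: str, plugin_type: str):
    """Import `<root_package>.<plugin_type>.adapter` and return its `Adapter` class."""
    module = importlib.import_module(f"{root_package}.{plugin_type}.adapter")
    return module.Adapter


def _register_fake_adapter(root_package: str, plugin_type: str) -> None:
    """Demo only: pre-populate sys.modules with an in-memory adapter module."""

    class Adapter:
        def __init__(self, config):
            self.config = config

        def get_data_source_oddrn(self) -> str:
            return "oddrn"

    parts = f"{root_package}.{plugin_type}.adapter".split(".")
    # Register every package level so the dotted name resolves without files on disk.
    for i in range(1, len(parts) + 1):
        module_name = ".".join(parts[:i])
        sys.modules.setdefault(module_name, types.ModuleType(module_name))
    sys.modules[".".join(parts)].Adapter = Adapter


_register_fake_adapter("my_collector.adapters", "custom_adapter")
adapter_cls = load_adapter_class("my_collector.adapters", "custom_adapter")
adapter = adapter_cls({"type": "custom_adapter", "name": "custom_adapter_name"})
print(adapter.get_data_source_oddrn())
```

In a real collector the package would exist on disk, so only the `load_adapter_class` part would be relevant.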

### Plugins
Each plugin must inherit from the SDK's `Plugin` class.
```python
# domain/plugins.py
from typing import Literal, Union
from typing_extensions import Annotated

import pydantic
from odd_collector_sdk.domain.plugin import Plugin


class CustomPlugin(Plugin):
    type: Literal["custom_adapter"]


class OtherCustomPlugin(Plugin):
    type: Literal["other_custom_adapter"]


# This annotated union is passed to Collector as plugins_union_type
AvailablePlugins = Annotated[
    Union[CustomPlugin, OtherCustomPlugin],
    pydantic.Field(discriminator="type"),
]
```

### collector_config.yaml

```yaml
default_pulling_interval: 10 
token: "" 
platform_host_url: "http://localhost:8080" 
plugins:
  - type: custom_adapter
    name: custom_adapter_name
  - type: other_custom_adapter
    name: other_custom_adapter_name

```
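
Because `Plugin` derives from pydantic's `BaseSettings`, fields left out of this file can come from environment variables. The sketch below imitates that fallback with the standard library only. It is a simplified illustration, not pydantic's implementation: `fill_from_env` is a hypothetical helper, the mapping of a field to an upper-case variable name is an assumption, and the credentials are made-up example values.

```python
import os
from typing import Any, Dict, Iterable, Mapping


def fill_from_env(
    raw: Dict[str, Any],
    field_names: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Return a copy of `raw` where missing fields are taken from environment variables."""
    result = dict(raw)
    for name in field_names:
        # Assumption for this sketch: the variable is the field name in upper case.
        if name not in result and name.upper() in environ:
            result[name] = environ[name.upper()]
    return result


# A plugin entry as it might look after parsing collector_config.yaml.
plugin_from_yaml = {"type": "s3", "name": "my_s3", "aws_region": "us-east-1"}

# A fake environment is passed in so the demo does not depend on the real one.
fake_env = {
    "AWS_ACCESS_KEY_ID": "example-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_REGION": "eu-west-1",
}

fields = ["aws_access_key_id", "aws_secret_access_key", "aws_region"]
settings = fill_from_env(plugin_from_yaml, fields, fake_env)
print(settings["aws_region"], settings["aws_access_key_id"])
```

In this sketch a value present in the file wins over the environment, which keeps secrets such as access keys out of `collector_config.yaml` while still letting the file override anything explicitly.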

## Usage
```python
# __main__.py

import asyncio
import logging
from os import path


from odd_collector_sdk.collector import Collector

# Union type of available plugins
from my_collector.domain.plugins import AvailablePlugins

logging.basicConfig(
    level=logging.INFO, format="[%(asctime)s] %(levelname)s in %(module)s: %(message)s"
)

# Create the loop before the try block so it is always defined in the except handler
loop = asyncio.get_event_loop()

try:
    cur_dirname = path.dirname(path.realpath(__file__))
    config_path = path.join(cur_dirname, "../collector_config.yaml")
    root_package = "my_collector.adapters"

    collector = Collector(config_path, root_package, AvailablePlugins)

    loop.run_until_complete(collector.register_data_sources())

    collector.start_polling()
    loop.run_forever()
except Exception as e:
    logging.error(e, exc_info=True)
    loop.stop()
```
```

Then run the collector:
```bash
poetry run python -m my_collector
```
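
Conceptually, polling means running a collection job, waiting for the pulling interval, and repeating. The following standard-library sketch shows that loop in isolation. It is not the SDK's scheduler: `poll` and `collect_once` are hypothetical names, and the `max_runs` parameter exists only so the demo terminates.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def poll(
    job: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_runs: Optional[int] = None,
) -> int:
    """Run `job` repeatedly, sleeping `interval_seconds` after each run."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception as exc:
            # Keep polling even if a single run fails.
            print(f"job failed: {exc}")
        runs += 1
        await asyncio.sleep(interval_seconds)
    return runs


collected: List[str] = []


async def collect_once() -> None:
    # Placeholder for collecting data entities from the adapters.
    collected.append("batch")


# default_pulling_interval is given in minutes in collector_config.yaml;
# a tiny interval is used here so the demo finishes quickly.
runs = asyncio.run(poll(collect_once, interval_seconds=0.01, max_runs=3))
print(runs, len(collected))
```

With `default_pulling_interval: 10` from the config above, the equivalent interval would be `10 * 60` seconds.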



            

        