data-prep-connector

PyPI metadata:

- Name: data-prep-connector
- Version: 0.2.3
- Summary: Scalable and Compliant Web Crawler
- Upload time: 2024-11-22 16:39:59
- Requires Python: >=3.10, <3.13
- License: Apache-2.0
- Keywords: data, data acquisition, crawler, web crawler, llm, generative, ai, fine-tuning, llmapps, 0b74b5a
# DPK Connector

DPK Connector is a scalable and compliant web crawler developed for data acquisition towards LLM development. It is built on [Scrapy](https://scrapy.org/).
For more details read [the documentation](doc/overview.md).
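Since the package is published on PyPI (version 0.2.3 in the metadata above), it can also be installed directly with pip rather than built from source; the version pin shown here is optional:

```shell
# Install from PyPI; Python >=3.10,<3.13 is required per the package metadata
pip install "data-prep-connector==0.2.3"
```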

## Virtual Environment

The project uses `pyproject.toml` and a Makefile for operations.
For development, first create the virtual environment:
```shell
make venv
```
and then either activate it:
```shell
source venv/bin/activate
```
or configure your IDE to use the `venv` directory when developing in this project.

## Library Artifact Build and Publish

To test, build, and publish the library:
```shell
make test build publish
```

To bump the version number, edit the Makefile to change `VERSION` and rerun the above. This requires committing both the `Makefile` and the automatically updated `pyproject.toml` file.
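As a sketch of that edit (the `VERSION` variable name is from the text above; the exact Makefile layout may differ):

```makefile
# Bump this value, then rerun: make test build publish
VERSION := 0.2.4
```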

## How to use

See [the overview](doc/overview.md).
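As a minimal usage sketch, assuming the `dpk_connector` module name and the `crawl`/`shutdown` entry points described in the linked overview (verify the exact signatures against your installed version):

```python
from dpk_connector import crawl, shutdown


def on_downloaded(url: str, body: bytes, headers: dict) -> None:
    # Callback invoked for each page the crawler downloads
    print(f"url: {url}, content length: {len(body)}")


# crawl() blocks until the crawl finishes; a descriptive, contactable
# user agent is good practice for a compliant crawler
crawl(
    ["https://crawler-test.com/"],  # seed URLs
    on_downloaded,
    user_agent="dpk-connector-example/0.1 (contact@example.com)",
)

# Release crawler resources at the end of the process
shutdown()
```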
