pydtc


Name: pydtc
Version: 0.7.0
Home page: https://github.com/cctester/pydtc
Summary: tools collection for data engineer
Upload time: 2024-09-12 20:07:43
Maintainer: None
Docs URL: None
Author: cctester
Requires Python: >=3.5
License: None
Keywords: pandas, multiprocessing, database, restapi, requests
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

This package provides various tools for working with data in an easy and efficient manner; more
modules may be added to the collection as development continues.

1. A universal way to connect to most database software via JDBC (including Kerberos authentication for Hive), using
Fast/Batch load technology to speed up temporary table creation and queries, plus functions to convert a CLOB
into a string or save a BLOB to a specified file.

2. Multiprocessing capability for pandas DataFrames, for CPU-intensive
operations on large volumes of data.

3. A form-based authentication module for the requests package.

4. A REST API client built on the aiohttp package, with a retry function.
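
The retry behaviour mentioned in item 4 can be pictured with a small standard-library sketch. This is not pydtc's implementation: `retry_async`, its parameters, and the `FlakyEndpoint` stand-in are hypothetical names introduced only for illustration, and the exponential backoff policy is an assumption, since the description does not say how the package spaces its retries.

```python
import asyncio


async def retry_async(make_call, attempts=3, base_delay=0.1, retry_on=(Exception,)):
    """Await make_call() until it succeeds or `attempts` tries are used up.

    Hypothetical helper for illustration; not part of pydtc.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyEndpoint:
    """Stand-in for an HTTP request: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated transient failure")
        return {"id": 1, "title": "foo"}


endpoint = FlakyEndpoint(failures=2)
result = asyncio.run(retry_async(endpoint, attempts=3, base_delay=0.01))
print(result, endpoint.calls)
```

In a real client the `make_call` argument would wrap an aiohttp request, and `retry_on` would usually be narrowed to the transient error types worth retrying.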

sample usage:

    # connect to MySQL
    import pandas as pd
    import pydtc

    conn = pydtc.connect('mysql', '127.0.0.1', 'user', 'pass')
    pydtc.read_sql('select * from demo.sample', conn)
    conn.close()

    # or use a with clause to close the connection automatically
    with pydtc.connect('mysql', '127.0.0.1', 'user', 'pass') as conn:
        conn.read_sql('select * from demo.sample')
        # or: pydtc.read_sql('select * from demo.sample', conn)

    # DBAPI 2.0 connection, passed to pandas directly
    with pydtc.connect_dbapi('mysql', '127.0.0.1', 'user', 'pass') as conn:
        pd.read_sql('select * from demo.sample', conn)

    # pandas multiprocessing: groupby then apply
    # (df is an existing DataFrame with 'group_key' and 'other_key' columns)
    def func(df, key, value):
        dd = {key: value}
        dd['some_key'] = [len(df.other_key)]

        return pd.DataFrame(dd)

    new_df = pydtc.p_groupby_apply(func, df, 'group_key')

    # access a web page on a site that uses form-based authentication
    from pydtc import HttpFormAuth
    import requests

    r = requests.get('http://www.example.com/private_webpage.html', auth=HttpFormAuth('user', 'password'))

    # REST API get and update
    # Fake Online REST API for Testing and Prototyping
    # https://jsonplaceholder.typicode.com/
    from pydtc import api_get, api_update

    api_get('https://jsonplaceholder.typicode.com/todos/1')
    # or
    api_update('https://jsonplaceholder.typicode.com/todos/1', data={'title': 'foo'}, method='patch')
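
The groupby-then-apply sample above follows a split, apply, combine pattern with the apply step farmed out to workers. The sketch below shows that general shape with the standard library only; it is not how `p_groupby_apply` is implemented. `parallel_groupby_apply` and `count_rows` are hypothetical names, rows are plain dicts rather than a DataFrame, and the executor is passed in so the same code can run on threads (as here) or on a `concurrent.futures.ProcessPoolExecutor` for CPU-bound work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def parallel_groupby_apply(func, rows, key, executor):
    """Group `rows` by `key`, run func(group, key, value) per group on `executor`.

    Hypothetical helper for illustration; not part of pydtc.
    """
    # Split: collect rows per key value, keeping first-seen group order.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Apply: one task per group.
    futures = [
        executor.submit(func, group, key, value)
        for value, group in groups.items()
    ]

    # Combine: concatenate the per-group results in submission order.
    results = []
    for future in futures:
        results.extend(future.result())
    return results


def count_rows(group, key, value):
    """Per-group function mirroring `func` in the sample: one summary row per group."""
    return [{key: value, "n_rows": len(group)}]


rows = [
    {"group_key": "a", "other_key": 1},
    {"group_key": "b", "other_key": 2},
    {"group_key": "a", "other_key": 3},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    summary = parallel_groupby_apply(count_rows, rows, "group_key", pool)
print(summary)
```

When a process pool is used instead, the per-group function has to be defined at module level so it can be pickled, and on platforms that spawn worker processes the calling code belongs under an `if __name__ == "__main__":` guard.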
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/cctester/pydtc",
    "name": "pydtc",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.5",
    "maintainer_email": null,
    "keywords": "pandas, multiprocessing, database, restapi, requests",
    "author": "cctester",
    "author_email": "cctester2001@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/62/9c/ddd1baf1e9172447d1b3650e49e5d337816fe98e523ba9363a5ccde316f9/pydtc-0.7.0.tar.gz",
    "platform": null,
    "description": "This pacakge provides various tools to perform task on data, in easy and efficient manner; more\nmodules could be added into the tools collection with development.\n\n1. universal way to connect most database softwares via JDBC (include kerberos auth for Hive), using Fast/Batch load\ntechnology to speed up the temporary table creation and query; as well as functions to convert clob \ninto string or save the blob into specified file. \n\n2. add multiprocessing capablity to pandas dataframe when dealing with cpu intensive\noperation on large volume data.\n\n3. form based authentication module for requests package.\n\n4. restapi client using aiohttp package with retry function.\n\nsample usage:\n\n    ## connect to mysql\n        import pydtc\n\n        conn = pydtc.connect('mysql', '127.0.0.1', 'user', 'pass')\n        pydtc.read_sql('select * from demo.sample', conn)\n        conn.close()\n    \n    ### or use with clause for auto close\n        with pydtc.connect('mysql', '127.0.0.1', 'user', 'pass') as conn:\n            conn.read_sql('select * from demo.sample')\n            # pydtc.read_sql('select * from demo.sample', conn)\n\n        ## DBAPI 2.0    \n        with pydtc.connect_dbapi('mysql', '127.0.0.1', 'user', 'pass') as conn:\n            pd.read_sql('select * from demo.sample', conn)\n\n    ## pandas multiprocessing groupby then apply\n        def func(df, key, value):\n            dd = {key : value}\n            dd['some_key'] = [len(df.other_key)]\n\n            return pd.DataFrame(dd)\n\n        new_df = pydtc.p_groupby_apply(func, df, 'group_key')\n\n    ## access web page in website with form based authenticaion\n        from pydtc import HttpFormAuth\n        import requests\n\n        r = requests.get('http://www.example.com/private_webpage.html', auth=HttpFormAuth('user', 'password'))\n\n    ## restapi get and update\n    # Fake Online REST API for Testing and Prototyping\n    # https://jsonplaceholder.typicode.com/\n        from pydtc import api_get, api_update\n\n        api_get('https://jsonplaceholder.typicode.com/todos/1')\n        # or\n        api_update('https://jsonplaceholder.typicode.com/todos/1', data={'title': 'foo'}, method='patch')",
    "bugtrack_url": null,
    "license": null,
    "summary": "tools collection for data engineer",
    "version": "0.7.0",
    "project_urls": {
        "Homepage": "https://github.com/cctester/pydtc"
    },
    "split_keywords": [
        "pandas",
        " multiprocessing",
        " database",
        " restapi",
        " requests"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "629cddd1baf1e9172447d1b3650e49e5d337816fe98e523ba9363a5ccde316f9",
                "md5": "440727811329932e86791bc2817d4753",
                "sha256": "6f9c86e1713fc6ad7cf49a5c9aef6680bbd22752df575d92ea86c39bf5bea844"
            },
            "downloads": -1,
            "filename": "pydtc-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "440727811329932e86791bc2817d4753",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.5",
            "size": 10332,
            "upload_time": "2024-09-12T20:07:43",
            "upload_time_iso_8601": "2024-09-12T20:07:43.774224Z",
            "url": "https://files.pythonhosted.org/packages/62/9c/ddd1baf1e9172447d1b3650e49e5d337816fe98e523ba9363a5ccde316f9/pydtc-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-12 20:07:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cctester",
    "github_project": "pydtc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "pydtc"
}
        