peakina

- Name: peakina
- Version: 0.15.0
- Home page: https://github.com/ToucanToco/peakina
- Summary: pandas readers on steroids (remote files, glob patterns, cache, etc.)
- Upload time: 2024-11-22 15:01:43
- Author: Toucan Toco
- Requires Python: <4.0,>=3.10
- License: BSD-3-Clause
[![Pypi-v](https://img.shields.io/pypi/v/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-pyversions](https://img.shields.io/pypi/pyversions/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-l](https://img.shields.io/pypi/l/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![Pypi-wheel](https://img.shields.io/pypi/wheel/peakina.svg)](https://pypi.python.org/pypi/peakina)
[![GitHub Actions](https://github.com/ToucanToco/peakina/workflows/CI/badge.svg)](https://github.com/ToucanToco/peakina/actions?query=workflow%3ACI)
[![codecov](https://codecov.io/gh/ToucanToco/peakina/branch/main/graph/badge.svg)](https://codecov.io/gh/ToucanToco/peakina)

# Pea Kina _aka 'Giant Panda'_

A wrapper around the `pandas` library that detects the separator, encoding
and type of a file. It can also read a group of files whose names match a pattern (Python regex or glob).
It can read both local and remote files (HTTP/HTTPS, FTP/FTPS/SFTP or S3/S3N/S3A).
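This README does not say how the encoding is detected. Here is a minimal sketch that assumes a simple trial-decoding strategy, which is not necessarily what peakina does: try candidate encodings in order and keep the first one that decodes the raw bytes. `detect_encoding` is a hypothetical helper.

```python
def detect_encoding(raw: bytes, candidates: tuple[str, ...] = ("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper for illustration, not part of peakina's API.
    """
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # bytes are invalid in this encoding: try the next one
        return encoding
    raise ValueError("none of the candidate encodings could decode the data")


print(detect_encoding("a;b\nété;1\n".encode("utf-8")))    # utf-8
print(detect_encoding("a;b\nété;1\n".encode("latin-1")))  # latin-1
```

`latin-1` maps every possible byte to a character, so it works as a last-resort fallback: it never fails, but it may decode text from another encoding incorrectly.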

The supported file types are `csv`, `excel`, `json`, `parquet` and `xml`.

:information_source: If the file type you need is not supported yet, feel free to open an issue or submit a PR with the code!

Please read the [documentation](https://doc-peakina.toucantoco.com) for more information.

# Installation

`pip install peakina`

# Usage
Consider a file `file.csv`:
```
a;b
0;0
0;1
```

Just type
```python
>>> import peakina as pk
>>> pk.read_pandas('file.csv')
   a  b
0  0  0
1  0  1
```
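No `sep=';'` was passed here, so the separator was inferred. This README does not document how. As a sketch that assumes a sniffing approach, Python's standard `csv.Sniffer` makes the same guess on this sample. `detect_separator` is a hypothetical helper.

```python
import csv


def detect_separator(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the column separator of a CSV sample (hypothetical helper)."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide (e.g. single column): use pandas' default
        return ","


print(detect_separator("a;b\n0;0\n0;1\n"))  # ;
```

The result could then be passed as `sep=` to `pandas.read_csv`. Restricting `candidates` keeps letters or digits that repeat on every line from being mistaken for a separator.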

Or consider files on an FTPS server:
- my_data_2015.csv
- my_data_2016.csv
- my_data_2017.csv
- my_data_2018.csv

You can just type

```python
>>> pk.read_pandas('ftps://<path>/my_data_\\d{4}\\.csv$', match='regex', dtype={'a': 'str'})
    a   b     __filename__
0  '0'  0  'my_data_2015.csv'
1  '0'  1  'my_data_2015.csv'
2  '1'  0  'my_data_2016.csv'
3  '1'  1  'my_data_2016.csv'
4  '3'  0  'my_data_2017.csv'
5  '3'  1  'my_data_2017.csv'
6  '4'  0  'my_data_2018.csv'
7  '4'  1  'my_data_2018.csv'
```
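The `match` argument controls how the pattern is interpreted. Here is a minimal standard-library sketch of the idea. It assumes three things: the pattern is tested against each file name (regex via `re.search`, glob via `fnmatch.fnmatchcase`), matched files are stacked in name order, and each row is tagged with a `__filename__` column. `match_names` and `read_all` are hypothetical and work on in-memory CSV text instead of a remote server. The `'glob'` mode name is also an assumption; only `match='regex'` appears above.

```python
import csv
import fnmatch
import io
import re


def match_names(names: list[str], pattern: str, match: str) -> list[str]:
    """Keep the file names matching `pattern`, read as a regex or a glob."""
    if match == "regex":
        return sorted(n for n in names if re.search(pattern, n))
    if match == "glob":
        return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
    raise ValueError(f"unknown match mode: {match!r}")


def read_all(
    files: dict[str, str], pattern: str, match: str, sep: str = ";"
) -> list[dict[str, str]]:
    """Concatenate the rows of every matching file, tagging each with its source."""
    rows = []
    for name in match_names(list(files), pattern, match):
        for row in csv.DictReader(io.StringIO(files[name]), delimiter=sep):
            row["__filename__"] = name  # remember which file the row came from
            rows.append(row)
    return rows


# Illustrative in-memory "files"
files = {
    "my_data_2015.csv": "a;b\n0;0\n0;1\n",
    "my_data_2016.csv": "a;b\n1;0\n1;1\n",
    "my_data_old.csv": "a;b\n9;9\n",
}
print(match_names(list(files), r"my_data_\d{4}\.csv$", "regex"))
# ['my_data_2015.csv', 'my_data_2016.csv']
print(match_names(list(files), "my_data_*.csv", "glob"))
# ['my_data_2015.csv', 'my_data_2016.csv', 'my_data_old.csv']
print(read_all(files, r"my_data_\d{4}\.csv$", "regex")[2])
# {'a': '1', 'b': '0', '__filename__': 'my_data_2016.csv'}
```

The `$` anchor in the regex is what excludes names such as `my_data_2015.csv.bak`. A glob has no equivalent of `\d{4}`, which is why the glob pattern also picks up `my_data_old.csv`.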

## Using cache

You may want to cache the last result to avoid downloading and extracting the file again if it hasn't changed:

```python
>>> from peakina.cache import Cache
>>> cache = Cache.get_cache('memory')  # in-memory cache
>>> df = pk.read_pandas('file.csv', expire=3600, cache=cache)
```

In this example, the resulting DataFrame is fetched from the cache unless the modification time of `file.csv` has changed on disk or the cached entry is older than one hour (`expire=3600` seconds).
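This invalidation rule can be sketched as a small in-memory cache: an entry is stale if the source's modification time changed or if it is older than `expire` seconds. This is not peakina's `Cache` class. `ExpiringCache` and its methods are hypothetical, and the clock is injectable only to make the sketch testable.

```python
import time
from typing import Any, Callable


class ExpiringCache:
    """In-memory cache keyed by URI, invalidated by mtime change or age."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # uri -> (source mtime, time the entry was stored, cached value)
        self._entries: dict[str, tuple[float, float, Any]] = {}

    def get(self, uri: str, mtime: float, expire: float) -> Any:
        """Return the cached value, or None if it is missing or stale."""
        entry = self._entries.get(uri)
        if entry is None:
            return None
        cached_mtime, stored_at, value = entry
        if cached_mtime != mtime or self._clock() - stored_at > expire:
            del self._entries[uri]  # stale: drop it so the caller reloads
            return None
        return value

    def set(self, uri: str, mtime: float, value: Any) -> None:
        """Store `value` along with the source mtime and the current time."""
        self._entries[uri] = (mtime, self._clock(), value)


now = [0.0]  # fake clock for the demo
cache = ExpiringCache(clock=lambda: now[0])
cache.set("file.csv", mtime=100.0, value="dataframe")
print(cache.get("file.csv", mtime=100.0, expire=3600))  # dataframe
now[0] = 3601.0
print(cache.get("file.csv", mtime=100.0, expire=3600))  # None (older than 1 hour)
```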

For persistent caching, use: `cache = Cache.get_cache('hdf', cache_dir='/tmp')`


## Using only the download feature

If you just want to download a file without converting it to a pandas DataFrame:

```python
>>> uri = 'https://i.imgur.com/V9x88.jpg'
>>> f = pk.fetch(uri)
>>> f.get_str_mtime()
'2012-11-04T17:27:14Z'
>>> with f.open() as stream:
...     print('Image size:', len(stream.read()), 'bytes')
...
Image size: 60284 bytes
```

            
