scify-file-reader


Name: scify-file-reader
Version: 0.0.2
Home page: https://github.com/Jeferson-Peter/scify-file-reader
Summary: A class to handle and process multiple files with identical structures within a directory.
Upload time: 2023-06-18 16:39:37
Author: Jeferson-Peter (Jeferson Peter), jeferson.peter@pm.me
Keywords: python, file reading, multiple file handler
Requirements: no requirements were recorded

# scify-file-reader

The scify-file-reader package provides a convenient class for handling multiple files with the same structure in a directory. It offers functionality to read and process data from various file types, including CSV, XLSX, Parquet, and JSON.
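Supporting several formats behind one call usually comes down to choosing a reader from the file extension. The sketch below illustrates that idea under stated assumptions; it is not the package's code. The package reads into pandas DataFrames, while this stdlib-only version uses `csv` and `json` as stand-ins and covers only CSV and JSON. `LOADERS` and `load_file` are hypothetical names.

```python
import csv
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def _load_csv(path: Path, **kwargs: Any) -> List[dict]:
    # Stand-in for a pandas CSV reader: one dict per row, keyed by the header.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, **kwargs))


def _load_json(path: Path, **kwargs: Any) -> Any:
    # Stand-in for a pandas JSON reader: the parsed JSON document.
    with path.open(encoding="utf-8") as fh:
        return json.load(fh, **kwargs)


# Hypothetical dispatch table: lowercase file suffix -> loader function.
LOADERS: Dict[str, Callable[..., Any]] = {
    ".csv": _load_csv,
    ".json": _load_json,
}


def load_file(path, **kwargs: Any) -> Any:
    """Pick a loader from the file suffix and forward any keyword arguments to it."""
    path = Path(path)
    try:
        loader = LOADERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix!r}") from None
    return loader(path, **kwargs)
```

Forwarding `**kwargs` lets the caller tune the underlying reader (for example a different `delimiter`) without the dispatcher needing to know every option.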



## Installation



You can install scify-file-reader using pip:



```shell

pip install scify-file-reader

```



## Usage



To use scify-file-reader, follow these steps:



1. Import the `FileReader` class:



```python
from scify_file_reader import FileReader
```



2. Create an instance of the `FileReader` class, passing the content you want to read. The content can be a string containing the directory path, a `pathlib.Path` object, or a `zipfile.ZipFile` object:

```python
content = 'path/to/directory'
reader = FileReader(content)
```



3. Read the files with the `read_files` method:

```python
data = reader.read_files()
```



The `read_files` method returns a dictionary where the keys are the filenames (without the extension) and the values are pandas DataFrames containing the file data.
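As a rough mental model of the steps above, the following stdlib-only sketch accepts the same three kinds of input (a path string, a `Path`, or a `ZipFile`) and returns a dictionary keyed by file name without the extension, following the description above. It is a hypothetical approximation, not the package's implementation: `read_csv_files` is an illustrative name, it handles only CSV, and it returns lists of row dictionaries where the package returns pandas DataFrames.

```python
import csv
import io
import zipfile
from pathlib import Path, PurePosixPath
from typing import Dict, List, Union

Source = Union[str, Path, zipfile.ZipFile]


def read_csv_files(source: Source) -> Dict[str, List[dict]]:
    """Read every CSV in a directory or ZIP archive into {file stem: rows}."""
    result: Dict[str, List[dict]] = {}
    if isinstance(source, zipfile.ZipFile):
        for name in source.namelist():
            member = PurePosixPath(name)  # ZIP member names always use "/"
            if member.suffix.lower() != ".csv":
                continue
            text = source.read(name).decode("utf-8")
            result[member.stem] = list(csv.DictReader(io.StringIO(text)))
    else:
        # str and Path inputs are both treated as a directory to scan.
        for path in sorted(Path(source).iterdir()):
            if path.is_file() and path.suffix.lower() == ".csv":
                with path.open(newline="", encoding="utf-8") as fh:
                    result[path.stem] = list(csv.DictReader(fh))
    return result
```

Branching on `zipfile.ZipFile` first and converting everything else with `Path(...)` keeps the string and `Path` cases on a single code path.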



**For more details on the available methods and parameters, refer to the package documentation.**





## Examples

The following examples demonstrate how to use scify-file-reader.



### Normal Output

```python
import os

from scify_file_reader import FileReader

PATH = '/path/to/directory'

# Suppose the directory contains these files:
print(os.listdir(PATH))
# ['file_1.csv', 'log_2.csv', 'test_3.csv',
#  'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
#  'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']

# Example: reading files from a directory
reader = FileReader(PATH)
data = reader.read_files()  # read_files accepts keyword arguments for the pandas read_* functions

print(data)
# {
#     'file_1.csv': <pd.DataFrame>,
#     'log_2.csv': <pd.DataFrame>,
#     'test_3.csv': <pd.DataFrame>,
#     'file_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
#     'log_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
#     'test_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
#     'file_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
#     'log_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
#     'test_%Y%m%d_%H%M%S.csv': <pd.DataFrame>
# }
```
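The `%Y%m%d%H%M%S` and `%Y%m%d_%H%M%S` parts of the file names above look like `strftime` format codes standing in for a timestamp. Assuming that reading, this snippet shows what concrete names would look like; `make_name` is a hypothetical helper, not part of the package.

```python
from datetime import datetime


def make_name(prefix: str, fmt: str, when: datetime) -> str:
    """Build a file name such as 'log_<timestamp>.csv' from a strftime format."""
    return f"{prefix}_{when.strftime(fmt)}.csv"


when = datetime(2023, 6, 18, 16, 39, 37)
print(make_name("file", "%Y%m%d%H%M%S", when))   # file_20230618163937.csv
print(make_name("log", "%Y%m%d_%H%M%S", when))   # log_20230618_163937.csv
```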



### Concatenating files with common prefixes

Use this option when you need to concatenate multiple files whose names follow the same pattern into a single consolidated DataFrame.



**E.g.** The previous example used a directory containing 9 files whose names start with one of three common prefixes: 'file', 'log', and 'test'. Joining the files that share a prefix consolidates their data and makes it easier to analyze. The following example shows how they are joined.



```python
import os

from scify_file_reader import FileReader

PATH = '/path/to/directory'

# Suppose the directory contains these files:
print(os.listdir(PATH))
# ['file_1.csv', 'log_2.csv', 'test_3.csv',
#  'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
#  'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']

# Example: joining the files that share a prefix
reader = FileReader(PATH)
data = reader.read_files(join_prefixes=True)

print(data)
# {
#     'file': <pd.DataFrame>,
#     'log': <pd.DataFrame>,
#     'test': <pd.DataFrame>,
# }
```
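Conceptually, joining by prefix means grouping the per-file results by the start of their names and concatenating each group. A minimal sketch of that idea, assuming the prefix is everything before the first underscore and using lists of rows in place of DataFrames; `join_by_prefix` is a hypothetical name, and the package's actual grouping rule is the regular expression described in the next section.

```python
from collections import defaultdict
from typing import Dict, List


def join_by_prefix(tables: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Concatenate the rows of every table whose name shares a prefix."""
    joined: Dict[str, List[dict]] = defaultdict(list)
    for name in sorted(tables):  # sorted so the concatenation order is deterministic
        # Assumed rule: the prefix is everything before the first underscore.
        prefix = name.split("_", 1)[0]
        joined[prefix].extend(tables[name])
    return dict(joined)


tables = {
    "file_1.csv": [{"v": 1}],
    "file_20230618163937.csv": [{"v": 2}],
    "log_2.csv": [{"v": 3}],
}
print(join_by_prefix(tables))
# {'file': [{'v': 1}, {'v': 2}], 'log': [{'v': 3}]}
```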



### Using a specific regular expression



In the example above, all files sharing a prefix, such as `file_1.csv`, `file_%Y%m%d%H%M%S.csv`, and `file_%Y%m%d_%H%M%S.csv`, were joined together under the `file` key in the output.



If you want to use a specific regular expression to match the file prefixes, set it with `set_prefix_file_pattern_regex` before calling `read_files`:



```python
from scify_file_reader import FileReader

PATH = '/path/to/directory'

# Example: joining files using a custom prefix pattern
reader = FileReader(PATH)

regex = '<some_regex>'  # replace with your own pattern
reader.set_prefix_file_pattern_regex(regex)

data = reader.read_files(join_prefixes=True)
```



By default the regular expression is `^([A-Z]+)_\d+`.
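A quick way to see which names a prefix pattern groups is to try it with Python's `re` module. Note that `[A-Z]` matches only uppercase letters, so lowercase names such as `file_1.csv` match only with a case-insensitive flag; whether the package applies `re.IGNORECASE` is not stated here, so the default flag below is an assumption. `prefix_of` is a hypothetical helper.

```python
import re
from typing import Optional

DEFAULT_PATTERN = r"^([A-Z]+)_\d+"


def prefix_of(name: str, pattern: str = DEFAULT_PATTERN,
              flags: int = re.IGNORECASE) -> Optional[str]:
    """Return the first capture group of `pattern` for `name`, or None."""
    match = re.match(pattern, name, flags)
    return match.group(1) if match else None


print(prefix_of("file_1.csv"))               # file
print(prefix_of("log_20230618_163937.csv"))  # log
print(prefix_of("file_1.csv", flags=0))      # None (case-sensitive match fails)
print(prefix_of("notes.csv"))                # None (no "_<digits>" after the prefix)
```

A custom pattern only needs one capture group for the prefix, for example `r"^([a-z-]+)_\d+"` to allow hyphenated prefixes.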



### Specific prefixes instead of regular expressions



If you prefer to use specific prefixes instead of regular expressions, use the `join_custom_prefixes` argument. This argument accepts a tuple of the prefixes whose files you want to join.



```python
import os

from scify_file_reader import FileReader

PATH = '/path/to/directory'

# Suppose the directory contains these files:
print(os.listdir(PATH))
# ['file_1.csv', 'log_2.csv', 'test_3.csv',
#  'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
#  'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']

# Example: joining files by an explicit tuple of prefixes
reader = FileReader(PATH)

specific_prefixes = ('file', 'log', 'test')

data = reader.read_files(join_custom_prefixes=specific_prefixes)

print(data)
# {
#     'file': <pd.DataFrame>,
#     'log': <pd.DataFrame>,
#     'test': <pd.DataFrame>,
# }
```
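Grouping by an explicit tuple of prefixes can be sketched with plain string matching. This assumes a name belongs to a prefix when it starts with that prefix followed by an underscore, and that names matching none of the prefixes are left out; both rules are assumptions rather than documented package behavior. `join_by_custom_prefixes` is a hypothetical name, and lists of rows stand in for DataFrames.

```python
from typing import Dict, List, Tuple


def join_by_custom_prefixes(
    tables: Dict[str, List[dict]], prefixes: Tuple[str, ...]
) -> Dict[str, List[dict]]:
    """Concatenate the rows of all tables whose names start with '<prefix>_'."""
    joined: Dict[str, List[dict]] = {prefix: [] for prefix in prefixes}
    for name in sorted(tables):
        for prefix in prefixes:
            # Requiring the underscore keeps 'file' from matching 'filesystem_1.csv'.
            if name.startswith(prefix + "_"):
                joined[prefix].extend(tables[name])
                break  # assign each table to the first matching prefix only
    # Drop requested prefixes that matched no table.
    return {prefix: rows for prefix, rows in joined.items() if rows}


tables = {
    "file_1.csv": [{"v": 1}],
    "log_2.csv": [{"v": 2}],
    "file_20230618_163937.csv": [{"v": 3}],
    "other.csv": [{"v": 4}],
}
print(join_by_custom_prefixes(tables, ("file", "log", "test")))
# {'file': [{'v': 1}, {'v': 3}], 'log': [{'v': 2}]}
```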



## Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request on the [scify-file-reader](https://github.com/Jeferson-Peter/scify-file-reader) repository.


            
