# scify-file-reader
The scify-file-reader package provides a convenient class for handling multiple files with the same structure in a directory. It offers functionality to read and process data from various file types, including CSV, XLSX, Parquet, and JSON.
## Installation
You can install scify-file-reader using pip:
```shell
pip install scify-file-reader
```
## Usage
To use scify-file-reader, follow these steps:
1. Import the `FileReader` class:
```python
from scify_file_reader import FileReader
```
2. Create an instance of the `FileReader` class, providing the content you want to read. The content can be a string containing a path, a `Path` object, or a `zipfile.ZipFile` object:
```python
content = 'path/to/directory'
reader = FileReader(content)
```
3. Read the files using the `read_files` method:
```python
data = reader.read_files()
```
The `read_files` method returns a dictionary where the keys are the filenames (without the extension) and the values are pandas DataFrames containing the file data.
**For more details on the available methods and parameters, refer to the package documentation.**
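Step 2 lists three accepted input types. As a rough sketch of how each could be prepared with the standard library alone, the snippet below builds a `Path` and an in-memory `zipfile.ZipFile`. The file names and CSV contents are made up for illustration, and the `FileReader` calls are left as comments because they require the package to be installed.

```python
import io
import zipfile
from pathlib import Path

# 1) A plain string and 2) a Path object pointing at the same (placeholder) directory.
directory_str = 'path/to/directory'
directory_path = Path(directory_str)

# 3) A ZipFile object. Here it is built in memory with two illustrative CSV files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('file_1.csv', 'id,value\n1,10\n')
    archive.writestr('log_2.csv', 'id,value\n2,20\n')

# Reopen the archive for reading before handing it over.
zip_content = zipfile.ZipFile(buffer)
print(zip_content.namelist())

# With scify-file-reader installed, any of these could be passed to FileReader:
# reader = FileReader(directory_str)
# reader = FileReader(directory_path)
# reader = FileReader(zip_content)
```

Whichever form is used, the reading step that follows stays the same.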
## Examples:
The examples below demonstrate how to use scify-file-reader:
### Normal Output
```python
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
data = reader.read_files()  # read_files accepts kwargs from the pandas read_* methods
"""
OUTPUT: print(data)
{
'file_1.csv': <pd.DataFrame>,
'log_2.csv': <pd.DataFrame>,
'test_3.csv': <pd.DataFrame>,
'file_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'log_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'test_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'file_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
'log_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
'test_%Y%m%d_%H%M%S.csv': <pd.DataFrame>
}
"""
```
### Concatenating patterns:
Use this method when you need to concatenate multiple files with similar patterns into a single consolidated file.
**E.g.** The previous example used a directory containing 9 files that follow three common naming patterns: 'file', 'log', and 'test'. Joining the files that share a pattern consolidates their data so it can be analyzed together. The example below shows how they are joined.
```python
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
data = reader.read_files(join_prefixes=True)
"""
OUTPUT: print(data)
{
'file': <pd.DataFrame>,
'log': <pd.DataFrame>,
'test': <pd.DataFrame>,
}
"""
```
### Using a specific regular expression
In the example above, all files with common prefixes, such as `file_1.csv`, `file_%Y%m%d%H%M%S.csv`, and `file_%Y%m%d_%H%M%S.csv`, were joined together under the `file` key in the output.
If you want to use a specific regular expression for filtering your files, you can follow these steps:
```python
from scify_file_reader import FileReader
PATH = '/path/to/directory'
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
regex = '<some_regex>'
reader.set_prefix_file_pattern_regex(regex)
data = reader.read_files(join_prefixes=True)
```
By default the regular expression is `^([A-Z]+)_\d+`.
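To get a feel for how a prefix pattern turns filenames into dictionary keys, here is a minimal sketch using only Python's `re` module. It does not reproduce the package's internals: the helper `group_by_prefix` and the lowercase pattern `^([a-z]+)_` are hypothetical, chosen only to match the sample filenames used in this README, and the timestamps are invented.

```python
import re
from collections import defaultdict


def group_by_prefix(filenames, pattern):
    """Group filenames by the first capture group of `pattern` (hypothetical helper)."""
    compiled = re.compile(pattern)
    groups = defaultdict(list)
    for name in filenames:
        match = compiled.match(name)
        if match:  # names that do not match the pattern are skipped
            groups[match.group(1)].append(name)
    return dict(groups)


# Illustrative filenames with made-up timestamps.
filenames = [
    'file_1.csv', 'log_2.csv', 'test_3.csv',
    'file_20230618163936.csv', 'log_20230618_163936.csv',
]

groups = group_by_prefix(filenames, r'^([a-z]+)_')
print(groups)
```

Testing a candidate pattern against a list of filenames this way, before passing it to `set_prefix_file_pattern_regex`, can help confirm that the capture group yields the keys you expect.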
### Specific prefixes instead of regular expressions
If you prefer to use specific prefixes instead of regular expressions, you can utilize the `join_custom_prefixes` argument. This argument accepts a tuple of prefixes that you want to join together.
```python
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
specific_prefixes = ('file', 'log', 'test')
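# Pass the tuple through the join_custom_prefixes argument described above
data = reader.read_files(join_prefixes=True, join_custom_prefixes=specific_prefixes)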
"""
OUTPUT: print(data)
{
'file': <pd.DataFrame>,
'log': <pd.DataFrame>,
'test': <pd.DataFrame>,
}
"""
```
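The idea of grouping by explicit prefixes can be sketched with plain string matching. This is an illustration under assumptions, not the package's implementation: `group_by_custom_prefixes` is a hypothetical helper, and this README does not say how scify-file-reader treats files that match none of the given prefixes (here they are simply left out).

```python
def group_by_custom_prefixes(filenames, prefixes):
    """Assign each filename to the first prefix it starts with (hypothetical helper)."""
    groups = {prefix: [] for prefix in prefixes}
    for name in filenames:
        for prefix in prefixes:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break  # stop at the first matching prefix
    return groups


# 'other_4.csv' is an extra illustrative file that matches no prefix.
filenames = ['file_1.csv', 'log_2.csv', 'test_3.csv', 'other_4.csv']

custom_groups = group_by_custom_prefixes(filenames, ('file', 'log'))
print(custom_groups)
```

Compared with a regular expression, a tuple of prefixes is easier to read and review, at the cost of having to list every group by hand.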
## Contributing
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request on the [scify-file-reader](https://github.com/Jeferson-Peter/scify-file-reader) repository.