# mft to parquet (pyarrow dtypes)
## Tested against Windows 10 / Python 3.11 / Anaconda
### pip install mft2parquet
```python
Reads HDD (Hard Disk Drive) information from a specified drive and returns it as a pandas DataFrame.

Args:
    drive (str, optional): The drive path to read from. Default is "c:\\".
    outputfile (str, optional): If provided, the DataFrame will be saved as a Parquet file at this path.
        Default is None.

Returns:
    pd.DataFrame: A DataFrame with pyarrow dtypes containing HDD information with the specified columns.

Raises:
    subprocess.CalledProcessError: If the external command fails to execute.

Note:
    - This function uses an external command-line utility (https://github.com/githubrobbi/Ultra-Fast-File-Search) to retrieve HDD information.
    - The DataFrame will have the following columns:
        - aa_path
        - aa_name
        - aa_path_only
        - aa_size
        - aa_size_on_disk
        - aa_created
        - aa_last_written
        - aa_last_accessed
        - aa_descendents
        - aa_read-only
        - aa_archive
        - aa_system
        - aa_hidden
        - aa_offline
        - aa_not_content_indexed_file
        - aa_no_scrub_file
        - aa_integrity
        - aa_pinned
        - aa_unpinned
        - aa_directory_flag
        - aa_compressed
        - aa_encrypted
        - aa_sparse
        - aa_reparse
        - aa_attributes

Example:
    df = read_hdd(drive="d:\\", outputfile="hdd_info.parquet")
    # Reads HDD information from the 'D:' drive and saves it as 'hdd_info.parquet'.
```
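Because the docstring says `read_hdd` raises `subprocess.CalledProcessError` when the external utility fails, callers may want to handle that case explicitly. The following is a minimal sketch, not part of the package: `read_drive_or_none` is a hypothetical helper, and it takes the reader as an argument so it makes no assumption about how `read_hdd` is imported.

```python
import subprocess
from typing import Any, Callable, Optional


def read_drive_or_none(
    reader: Callable[..., Any],
    drive: str = "c:\\",
    outputfile: Optional[str] = None,
) -> Optional[Any]:
    """Call `reader` and return its result, or None if the external command failed.

    `reader` is expected to accept the `drive` and `outputfile` keyword
    arguments described in the docstring above (hypothetical wrapper).
    """
    try:
        return reader(drive=drive, outputfile=outputfile)
    except subprocess.CalledProcessError as exc:
        # The docstring documents this exception for a failing external command.
        print(f"Reading {drive!r} failed with exit code {exc.returncode}")
        return None


# Usage (assumption: the function is importable as shown; the source does not
# state the import path):
# from mft2parquet import read_hdd
# df = read_drive_or_none(read_hdd, drive="d:\\", outputfile="hdd_info.parquet")
```

Returning `None` keeps the sketch short; re-raising with extra context would be an equally reasonable choice in a larger script.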
Raw data
{
"_id": null,
"home_page": "https://github.com/hansalemaos/mft2parquet",
"name": "mft2parquet",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "mft,file,search",
"author": "Johannes Fischer",
"author_email": "aulasparticularesdealemaosp@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/89/80/3ad5d0ae0df238fcc0108cb5531012047741a1f0a6aec23862f6989e16f7/mft2parquet-0.11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "mft to parquet (pyarrow dtypes)",
"version": "0.11",
"project_urls": {
"Homepage": "https://github.com/hansalemaos/mft2parquet"
},
"split_keywords": [
"mft",
"file",
"search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5b611ce7c4389039c4ff723171dc32e0c0c996f34fa3a4a0a30dfd8717308516",
"md5": "c4ba995b68b1a2dc9bb87c9691f33a4e",
"sha256": "b7bcad862194241311b99cc0e571a03ebebdf0493d164cbef0358096ce718ecb"
},
"downloads": -1,
"filename": "mft2parquet-0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c4ba995b68b1a2dc9bb87c9691f33a4e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 681541,
"upload_time": "2023-09-14T03:08:11",
"upload_time_iso_8601": "2023-09-14T03:08:11.131957Z",
"url": "https://files.pythonhosted.org/packages/5b/61/1ce7c4389039c4ff723171dc32e0c0c996f34fa3a4a0a30dfd8717308516/mft2parquet-0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "89803ad5d0ae0df238fcc0108cb5531012047741a1f0a6aec23862f6989e16f7",
"md5": "c235d79266314ab7661d0449f653e9c5",
"sha256": "38eb1b4fea7f7397809d3cf1dba04afee50a6151bff7b5e9c3d3e02222147761"
},
"downloads": -1,
"filename": "mft2parquet-0.11.tar.gz",
"has_sig": false,
"md5_digest": "c235d79266314ab7661d0449f653e9c5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 679240,
"upload_time": "2023-09-14T03:08:15",
"upload_time_iso_8601": "2023-09-14T03:08:15.015042Z",
"url": "https://files.pythonhosted.org/packages/89/80/3ad5d0ae0df238fcc0108cb5531012047741a1f0a6aec23862f6989e16f7/mft2parquet-0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-14 03:08:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "hansalemaos",
"github_project": "mft2parquet",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "mft2parquet"
}
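The `digests` entries in the raw data above can be used to check a downloaded archive before installing it. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the file path in the usage comment assumes the wheel was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's SHA-256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, with the sha256 value listed for the wheel in the raw data:
# ok = matches_published_digest(
#     "mft2parquet-0.11-py3-none-any.whl",
#     "b7bcad862194241311b99cc0e571a03ebebdf0493d164cbef0358096ce718ecb",
# )
```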