fsspec-encrypted
================
``fsspec-encrypted`` is a package that provides an encrypted filesystem
for use with Python. It is built on
`fsspec <https://filesystem-spec.readthedocs.io/en/latest/>`__, which makes it
compatible with cloud services such as S3, GCS, and Azure Blob Storage / Data
Lake. It also brings encryption to pandas DataFrames.

It allows users to transparently encrypt and decrypt files while
maintaining compatibility with any underlying ``fsspec``-compatible
filesystem (e.g., local, S3, GCS).
- `fsspec-encrypted <#fsspec-encrypted>`__

  - `Note <#note>`__
  - `Keys <#keys>`__
  - `Features <#features>`__
  - `Application <#application>`__
  - `Installation <#installation>`__
  - `Usage <#usage>`__

    - `Local Filesystem Example <#local-filesystem-example>`__
    - `Pandas compatibility <#pandas-compatibility>`__
    - `S3 Filesystem Example <#s3-filesystem-example>`__
    - `Other Filesystems <#other-filesystems>`__

  - `CLI <#cli>`__

    - `Generate an Encryption Key <#generate-an-encryption-key>`__
    - `What is a Salt? <#what-is-a-salt>`__
    - `Encrypt data from stdin and write it to a
      file <#encrypt-data-from-stdin-and-write-it-to-a-file>`__

  - `Development <#development>`__

    - `Setting Up for Development <#setting-up-for-development>`__
    - `Running Tests <#running-tests>`__
Note
----
This package supersedes
`fs-encrypted <https://github.com/thevgergroup/fs-encrypted>`__, because
pyfilesystem2 appears to be no longer maintained. We are therefore switching to
`fsspec <https://github.com/fsspec/filesystem_spec/>`__, which is
widely adopted.

``fsspec-encrypted`` is an AES-256 CBC encrypted driver for ``fsspec``.
With the pandas ``to_*`` methods, the entire file is buffered in memory
before it is written to disk. This reduces the time spent decrypting and
re-encrypting chunk by chunk.

Our roadmap is to switch to AES-CTR to allow streaming
encryption, which will reduce the memory footprint.
Keys
----
Encryption relies on keys, so make sure you store them securely. A lost key
means lost data!

Keys are natively bytes and should be base64 encoded / decoded for storing,
transmitting, and especially copying and pasting. Use the helper methods
``EncryptedFS.key_to_str`` and ``EncryptedFS.str_to_key`` for this. They are
named this way because I couldn’t remember whether I should encode or decode,
so write once and forget.

For example:

.. code:: python

   from fsspec_encrypted.fs_enc_cli import generate_key
   from fsspec_encrypted.fs_enc import EncryptedFS

   # Your encryption key
   encryption_key = generate_key(passphrase="my_secret_passphrase", salt=b"12345432")
   print("Encryption key:", EncryptedFS.key_to_str(encryption_key))
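As a rough illustration of why such helpers are convenient, the following
standard-library sketch round-trips raw key bytes through a base64 string.
The functions ``key_to_text`` and ``text_to_key`` are hypothetical stand-ins
written for this example. They are not the package's
``EncryptedFS.key_to_str`` / ``EncryptedFS.str_to_key`` implementations, and
the choice of URL-safe base64 is an assumption.

.. code:: python

   import base64
   import os


   def key_to_text(key: bytes) -> str:
       """Encode raw key bytes as a URL-safe base64 string (hypothetical helper)."""
       return base64.urlsafe_b64encode(key).decode("ascii")


   def text_to_key(text: str) -> bytes:
       """Decode a base64 string back into raw key bytes (hypothetical helper)."""
       return base64.urlsafe_b64decode(text.encode("ascii"))


   raw_key = os.urandom(32)        # 32 random bytes, i.e. 256 bits
   as_text = key_to_text(raw_key)  # printable, safe to copy and paste

   # Decoding the text form gives back exactly the original bytes
   assert text_to_key(as_text) == raw_key

The text form contains only printable ASCII characters, which is what makes
it safer to paste into configuration files or a secrets manager than raw
bytes.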
Features
--------
- **Encryption on top of any filesystem**: Works with any
  ``fsspec``-supported filesystem (e.g., local, S3, GCS, FTP, Azure).
- **Automatic encryption and decryption**: Data is automatically
  encrypted during writes and decrypted during reads.
- **CLI**: Provides for easy scripting and key generation.
- **Simple and flexible**: Minimal setup required with flexible file
  system options.
Application
-----------
Applications that may need to store sensitive data should use an
encrypted file system. By providing a layer of abstraction on top of the
encryption, we hope to make storing this data safer. Examples include:

- PII / PHI
- Print billing systems
- Insurance services / identity cards
- Data transfer
- Secure distributed configuration
Installation
------------
You can install ``fsspec-encrypted`` via pip from PyPI:

.. code:: bash

   pip install fsspec-encrypted
Usage
-----
Here’s a simple example of using ``fsspec-encrypted`` to create an
encrypted filesystem layer on top of a local filesystem (default) and
perform basic read and write operations.
Local Filesystem Example
~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python

   import fsspec
   from fsspec_encrypted.fs_enc_cli import generate_key

   # Generate an encryption key
   encryption_key = generate_key(passphrase="my_secret_passphrase", salt=b"12345432")

   # Create an EncryptedFS instance (local filesystem is the default)
   enc_fs = fsspec.filesystem('enc', encryption_key=encryption_key)

   # Write some encrypted data to a file
   enc_fs.writetext('./encfs/example.txt', 'This is some encrypted text.')

   # Read the encrypted data back from the file
   print(enc_fs.readtext('./encfs/example.txt'))
Pandas compatibility
~~~~~~~~~~~~~~~~~~~~
pandas uses ``fsspec`` under the hood, which lets you use its ``read_*`` /
``to_*`` methods to read and write encrypted data. Note that this example
calls ``generate_key`` with a passphrase and salt so that the same key can be
regenerated later.

.. code:: python

   import pandas as pd
   from fsspec_encrypted.fs_enc_cli import generate_key

   # Your encryption key
   encryption_key = generate_key(passphrase="my_secret_passphrase", salt=b"12345432")

   # Create a sample DataFrame
   data = {
       'name': ['Alice', 'Bob', 'Charlie'],
       'age': [25, 30, 35]
   }
   df = pd.DataFrame(data)

   # This encrypts the file to disk
   df.to_csv('enc://./encfs/encrypted-file.csv', index=False, storage_options={"encryption_key": encryption_key})

   print("Data written to encrypted file with key:", encryption_key.decode())

   # Read and decrypt the file
   df2 = pd.read_csv('enc://./encfs/encrypted-file.csv', storage_options={"encryption_key": encryption_key})

   print(df2)
S3 Filesystem Example
~~~~~~~~~~~~~~~~~~~~~
This example shows encryption layered on top of another filesystem:
we wrap S3 and encrypt or decrypt as required.

.. code:: python

   import fsspec
   import pandas as pd
   from cryptography.fernet import Fernet

   # Generate an encryption key
   encryption_key = Fernet.generate_key()

   # Use the encrypted filesystem on top of an S3 filesystem
   enc_fs = fsspec.filesystem('enc', encryption_key=encryption_key)

   # Write some encrypted data to S3
   enc_fs.writetext('s3://your-bucket/example.txt', 'This is some encrypted text.')

   # Read the encrypted data back from S3
   print(enc_fs.readtext('s3://your-bucket/example.txt'))

   # This can also be done by wrapping the filesystem in the URL
   bucket = "some-bucket"
   df = pd.read_csv(f'enc://s3://{bucket}/encrypted-file.csv', storage_options={"encryption_key": encryption_key})
Other Filesystems
~~~~~~~~~~~~~~~~~
``fsspec-encrypted`` automatically determines the filesystem type based
on the file path.

For example, if the path starts with ``s3://``, it will use S3; otherwise,
it defaults to the local filesystem. It supports any ``fsspec``-compatible
filesystem (e.g., GCS, FTP).

To wrap another filesystem, use ``enc://<other-file-system>://``.
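The path handling described above can be pictured with a small
standard-library sketch. ``split_wrapped_path`` is a hypothetical helper
written for this illustration, not the package's own logic. It only shows how
an optional ``enc://`` prefix and an inner protocol such as ``s3`` might be
separated, with ``file`` assumed as the name of the local default.

.. code:: python

   def split_wrapped_path(path: str, default: str = "file") -> tuple[str, str]:
       """Return (inner_protocol, inner_path) for a possibly wrapped path."""
       prefix = "enc://"
       # Drop the wrapper prefix if present, e.g. enc://s3://bucket/key
       if path.startswith(prefix):
           path = path[len(prefix):]
       # An inner protocol looks like "<name>://"; otherwise assume local
       if "://" in path:
           protocol, _, rest = path.partition("://")
           return protocol, rest
       return default, path


   print(split_wrapped_path("enc://s3://some-bucket/encrypted-file.csv"))
   print(split_wrapped_path("enc://./encfs/encrypted-file.csv"))
   print(split_wrapped_path("gcs://bucket/data.bin"))
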
CLI
---
``fsspec-encrypted`` also includes a command-line interface (CLI) for
encrypting and decrypting files.

This makes it easy to encrypt and decrypt files without writing any code.

|asciicast|
Generate an Encryption Key
~~~~~~~~~~~~~~~~~~~~~~~~~~
Store your keys appropriately; a secrets manager is an ideal solution!

.. code:: bash

   # Generate a random key
   # CRITICAL: store the key somewhere secure
   key=$(fs-enc gen-key)
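When a secrets manager hands the key to a Python process, one common pattern
is to read it from an environment variable. The sketch below uses only the
standard library. The variable name ``FS_ENC_KEY`` and the helper
``load_key_from_env`` are assumptions made for this example and are not part
of the package.

.. code:: python

   import os


   def load_key_from_env(var_name: str = "FS_ENC_KEY") -> str:
       """Return the key string stored in an environment variable.

       Raises a clear error instead of silently continuing without a key.
       """
       value = os.environ.get(var_name)
       if not value:
           raise RuntimeError(f"Environment variable {var_name} is not set")
       # Strip stray whitespace or newlines picked up when exporting the value
       return value.strip()

The returned string would then be converted back to bytes, for example with
``EncryptedFS.str_to_key`` as described in the Keys section.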
To generate a key from a passphrase and salt instead:

.. code:: bash

   fs-enc gen-key --passphrase 'hello world' --salt 12345432
What is a Salt?
~~~~~~~~~~~~~~~
A salt is a random value, typically 16 bytes, used during the key derivation
process to ensure that even if two people use the same passphrase, the derived
encryption keys will be different. The salt is not a secret, but it
should be unique and random for each encryption.

When encrypting data, the salt is usually stored alongside the encrypted
data so that it can be used again during decryption to derive the same
encryption key from the passphrase.
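To make the role of the salt concrete, here is a minimal sketch using
PBKDF2-HMAC-SHA256 from Python's standard library. The key derivation
function, the iteration count, and the helper name ``derive_key`` are
assumptions chosen for illustration. The package's ``generate_key`` may
derive keys differently.

.. code:: python

   import hashlib
   import os


   def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
       """Derive a 32-byte key from a passphrase and salt (illustrative only)."""
       return hashlib.pbkdf2_hmac(
           "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
       )


   salt_a = os.urandom(16)  # random 16-byte salt
   salt_b = os.urandom(16)  # a different random salt

   key_a = derive_key("my_secret_passphrase", salt_a)
   key_b = derive_key("my_secret_passphrase", salt_b)

   # Same passphrase, different salts: the derived keys differ
   assert key_a != key_b
   # Same passphrase and same salt: the key is reproducible
   assert derive_key("my_secret_passphrase", salt_a) == key_a

This is why the salt has to be kept with the data: without it, the same
passphrase cannot reproduce the key needed for decryption.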
Encrypt data from stdin and write it to a file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash

   # Encrypt and store locally
   echo "This is sensitive data" | fs-enc encrypt --key "$key" --file ./encfs/encrypted-file.txt
   # Decrypt
   fs-enc decrypt --key "$key" --file ./encfs/encrypted-file.txt

To write encrypted data to a cloud store, the appropriate driver (``s3fs``
in this case) must be installed and the AWS environment variables
configured:

.. code:: bash

   export AWS_PROFILE=xxxxxx
   pip install -U s3fs
   echo "This is sensitive data" | fs-enc encrypt --key "$key" --file s3://<some-bucket>/encrypted-file.txt
   fs-enc decrypt --key "$key" --file s3://<some-bucket>/encrypted-file.txt
Development
-----------
If you’d like to contribute or modify the code, you can set up the
project for development using Poetry.
Setting Up for Development
~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Clone the repository:

   .. code:: bash

      git clone https://github.com/thevgergroup/fsspec-encrypted.git
      cd fsspec-encrypted

2. Install the dependencies using Poetry:

   .. code:: bash

      poetry install

3. After installation, any changes you make to the code will be
   automatically reflected when running the project.
Running Tests
~~~~~~~~~~~~~
The project uses ``pytest`` for testing. To run the test suite, simply
use:

.. code:: bash

   poetry run pytest
.. |asciicast| image:: https://asciinema.org/a/hwpcCH1r1CM7ezNU4fM6wgKiY.svg
   :target: https://asciinema.org/a/hwpcCH1r1CM7ezNU4fM6wgKiY
Raw data
{
"_id": null,
"home_page": "https://github.com/thevgergroup/fsspec-encrypted",
"name": "fsspec-encrypted",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "filesystem, encryption, fsspec",
"author": "patrick o'leary",
"author_email": "pjaol@pjaol.com",
"download_url": "https://files.pythonhosted.org/packages/d3/e8/46f068e3421e8d5849e08f0dfbf6cc464af839a93ab7035592e1e9bcb9f7/fsspec_encrypted-0.8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A fsspec filesystem that encrypts files, compatible with pandas",
"version": "0.8",
"project_urls": {
"Homepage": "https://github.com/thevgergroup/fsspec-encrypted",
"Repository": "https://github.com/thevgergroup/fsspec-encrypted.git"
},
"split_keywords": [
"filesystem",
" encryption",
" fsspec"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "71df7303a15383a1a400b16d1d65c2e37700d50b5cf2970914a7dbd00ee6c7a0",
"md5": "94a972f72d405a4043f95f125f2cf53e",
"sha256": "a804ad77c0eb25df47847a6cd859108a351c5e6b8ea28d1c8cd49ad0c03c8cb3"
},
"downloads": -1,
"filename": "fsspec_encrypted-0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "94a972f72d405a4043f95f125f2cf53e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 10481,
"upload_time": "2024-09-17T16:36:17",
"upload_time_iso_8601": "2024-09-17T16:36:17.188941Z",
"url": "https://files.pythonhosted.org/packages/71/df/7303a15383a1a400b16d1d65c2e37700d50b5cf2970914a7dbd00ee6c7a0/fsspec_encrypted-0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d3e846f068e3421e8d5849e08f0dfbf6cc464af839a93ab7035592e1e9bcb9f7",
"md5": "91f03745301cea5ea58fb73f2c2d1140",
"sha256": "a2ef21977c26ac16759a71d75832c4fd27a834417de6edd73df6458109174b90"
},
"downloads": -1,
"filename": "fsspec_encrypted-0.8.tar.gz",
"has_sig": false,
"md5_digest": "91f03745301cea5ea58fb73f2c2d1140",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 12042,
"upload_time": "2024-09-17T16:36:18",
"upload_time_iso_8601": "2024-09-17T16:36:18.850312Z",
"url": "https://files.pythonhosted.org/packages/d3/e8/46f068e3421e8d5849e08f0dfbf6cc464af839a93ab7035592e1e9bcb9f7/fsspec_encrypted-0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-17 16:36:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thevgergroup",
"github_project": "fsspec-encrypted",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "fsspec-encrypted"
}