Name | git-fastcdc |
Version | 0.4.0 |
home_page | None |
Summary | FastCDC for large git files |
upload_time | 2024-04-10 10:43:55 |
maintainer | None |
docs_url | None |
author | Jean-Louis Fuchs |
requires_python | <4.0,>=3.9 |
license | AGPL-3.0-or-later |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
# git-fastcdc
Split certain files using content-defined chunking for faster deduplication. It
has a similar use case to git-lfs, but the blobs stay in the repository. git-fastcdc
mitigates some of the speed penalties of keeping large files in git. For most use
cases you are probably better off with git-lfs. If your focus is on archival and
deduplication, git-fastcdc might be right for you.
## Enable
```bash
git fastcdc install
```
## Config
Edit .gitattributes:
```
*.wav binary filter=git_fastcdc
/.gitattributes text -binary -filter
/.gitignore text -binary -filter
```
By default, git-fastcdc runs in memory. To switch to on-disk operation:
```bash
git config --local fastcdc.ondisk true
```
If you have a pure git-fastcdc repository, you probably want to disable delta-compression
to benefit from the speedups through fastcdc.
```bash
git fastcdc delta disable
```
This sets `core.bigFileThreshold` to `200k`, which isn't an exact science. It
means most of the history and metadata is still delta-compressed, while most of
the CDC blobs aren't.
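To make the `200k` figure concrete: git reads size suffixes as powers of 1024, and blobs larger than `core.bigFileThreshold` are stored without attempting delta compression. The sketch below works through that arithmetic. The helper names and the example blob sizes are hypothetical and only illustrate the cut-off; they are not taken from git-fastcdc.

```python
def parse_git_size(value: str) -> int:
    """Parse a git config size such as '200k' (k/m/g scale by powers of 1024)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def skips_delta(blob_size: int, threshold: int) -> bool:
    """True if git would store a blob of this size without trying delta compression."""
    return blob_size > threshold


threshold = parse_git_size("200k")  # 200 * 1024 bytes

# Hypothetical blob sizes in bytes, for illustration only.
examples = [("small metadata blob", 4_000), ("large chunk blob", 1_000_000)]
for name, size in examples:
    verdict = "delta skipped" if skips_delta(size, threshold) else "delta attempted"
    print(f"{name} ({size} bytes): {verdict}")
```

Anything at or below the threshold still goes through the normal delta search, which is why small objects stay delta-compressed.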
## Results
For my repository, an 800GB music collection:
- Without git-fastcdc, delta-compression took over 5 hours (actually it took all
night)
- With git-fastcdc, delta-compression takes about 2 minutes
- With git-fastcdc, the repository got slightly smaller: about 1%
So the repack is much faster, with the same delta-compression.
Methodology: I took one state of my repository from 2 years ago and one state
from today. A lot of metadata has changed between those two states, because I am
constantly fixing it using beaTunes. In both tests I created two commits
and ran `git repack -a -d -f` at the end.
## How
Files are split by the filter when you add them. The resulting chunks go into
the `git-fastcdc` branch. You need to push this branch to remotes too!
In the working copy you will see the actual data in the files (`*.wav` in the
example above), but the blobs stored for these files are just lists of chunks.
Raw data
{
"_id": null,
"home_page": null,
"name": "git-fastcdc",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Jean-Louis Fuchs",
"author_email": "jean-louis.fuchs@adfinis.com",
"download_url": "https://files.pythonhosted.org/packages/84/76/52b94e137a6e63304fe59497aff1ddbc96dd434efadb3e4ff8ddd6e12425/git_fastcdc-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "AGPL-3.0-or-later",
"summary": "FastCDC for large git files",
"version": "0.4.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "aa084cbe4d0ca61b96734ab440abe54d5c8fb0e15ae2de3de82c000f96556c41",
"md5": "83f46cf414e689d6298bbd3dd5cf0b5a",
"sha256": "f50fb5a5c27f54632830260641a506e04f87b6e0a8864780318b544e868c46d3"
},
"downloads": -1,
"filename": "git_fastcdc-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "83f46cf414e689d6298bbd3dd5cf0b5a",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 18368,
"upload_time": "2024-04-10T10:43:53",
"upload_time_iso_8601": "2024-04-10T10:43:53.479346Z",
"url": "https://files.pythonhosted.org/packages/aa/08/4cbe4d0ca61b96734ab440abe54d5c8fb0e15ae2de3de82c000f96556c41/git_fastcdc-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "847652b94e137a6e63304fe59497aff1ddbc96dd434efadb3e4ff8ddd6e12425",
"md5": "5c20018d35ebc4822773fb324671baa5",
"sha256": "ea7da4c9369fbb95bee65f7423064744ba34ae9fb951f7d1fa32ea6a830476e7"
},
"downloads": -1,
"filename": "git_fastcdc-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "5c20018d35ebc4822773fb324671baa5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 17757,
"upload_time": "2024-04-10T10:43:55",
"upload_time_iso_8601": "2024-04-10T10:43:55.610038Z",
"url": "https://files.pythonhosted.org/packages/84/76/52b94e137a6e63304fe59497aff1ddbc96dd434efadb3e4ff8ddd6e12425/git_fastcdc-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-10 10:43:55",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "git-fastcdc"
}