# Orgecc File Matcher
A Python library and CLI tool for Git-compatible file matching and directory traversal.
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Versions](https://img.shields.io/pypi/pyversions/orgecc-file-matcher.svg)](https://pypi.org/project/orgecc-file-matcher/)
[![CI](https://github.com/elifarley/file-matcher-python/actions/workflows/test.yml/badge.svg)](https://github.com/elifarley/file-matcher-python/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/orgecc-file-matcher)](https://pypi.org/project/orgecc-file-matcher/)
[![PyPI version](https://badge.fury.io/py/orgecc-file-matcher.svg)](https://pypi.org/project/orgecc-file-matcher/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
A versatile CLI tool, Python library, and toolkit for `.gitignore`-style file matching, designed to meet four key goals:
1. **Pure Python Matcher**: Provide a pure Python implementation that precisely matches Git's behavior.
2. **File Walker**: Traverse directories while respecting `.gitignore` rules at all levels.
3. **Unit Testing**: Verify the correctness of any `.gitignore` matching library or command.
4. **Benchmarking**: Compare the performance of different `.gitignore` matching implementations.
## Features
- **Git-Compatible Matching**: The pure Python implementation passes every case in the test corpus, matching Git's behavior in all covered scenarios.
- **Multiple Implementations** (see available options in [MatcherImplementation](src/orgecc/filematcher/__init__.py)):
  - **Pure Python**: No external dependencies. Aims for 100% Git compatibility.
  - **Native Git Integration**: Internally calls `git check-ignore -v`. The unit test expectations were adjusted to match this implementation.
  - **External Libraries**: Supports [gitignorefile](https://github.com/excitoon/gitignorefile) and [pathspec](https://github.com/cpburnz/python-path-specification).
- **[Comprehensive Test Suite](#unit-testing)**: Includes a [test corpus](tests/corpus) for validating `.gitignore` matching behavior.
- **Tree-Sitter-Inspired Testing**: The corpus files follow the style of [Tree-Sitter](https://tree-sitter.github.io/tree-sitter/)'s corpus tests: each block pairs a `.gitignore` with the expected result for a list of paths.
- **Efficient Directory Traversal**: A file walker that skips ignored files and directories.
- **Cross-Platform**: Works on Windows, macOS, and Linux.
## Installation
Install via **pip**:
```bash
pip install orgecc-file-matcher
```
## Usage
### Pure Python Matcher
Use the Git-compatible pure Python matcher (the default):
```python
from orgecc.filematcher import get_factory, MatcherImplementation
from orgecc.filematcher.patterns import new_deny_pattern_source
factory = get_factory(MatcherImplementation.PURE_PYTHON)
patterns = new_deny_pattern_source(["*.pyc", "build/"])
matcher = factory.pattern2matcher(patterns)
result = matcher.match("path/to/file.pyc")
print(result.matches) # True or False, matching Git's behavior
```
### File Walker
Traverse directories while respecting `.gitignore` rules:
#### CLI Tool for _macOS_, _Linux_ and _Windows_
The `file-walker` command lists the entries under `PATH` that are not excluded by `.gitignore` rules. Run it with `--help` to see all options:
```shell
file-walker --help
```
```
Usage: file-walker [OPTIONS] PATH

  List files and directories while respecting gitignore patterns.

Options:
  -t, --type [all|f|d]            Type of entries to show
  -f, --format [absolute|relative|name]
                                  Output format for paths
  -X, --exclude-from FILE         Base gitignore file to apply before others
  -x, --exclude TEXT              Base patterns to ignore (applied before
                                  others)
  -0, --null                      Use null character as separator (useful for
                                  xargs)
  --suppress-errors               Suppress error messages
  -q, --quiet                     Don't show summary, be quiet
  --help                          Show this message and exit.
```
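A minimal sketch of consuming the CLI from Python, assuming `file-walker` is on `PATH` and that with `-0` and `-q` it writes only NUL-separated paths to standard output (an assumption; check the actual output of your installed version). `list_files` and `split_null` are hypothetical helper names, not part of the package.

```python
import os
import subprocess


def split_null(output: bytes) -> list[str]:
    """Split NUL-separated output into paths, dropping empty chunks."""
    return [os.fsdecode(chunk) for chunk in output.split(b"\0") if chunk]


def list_files(root: str) -> list[str]:
    """Return the files under root that file-walker does not ignore."""
    # -t f: files only; -0: NUL separators; -q: no summary (assumed to keep stdout clean)
    result = subprocess.run(
        ["file-walker", "-t", "f", "-0", "-q", root],
        check=True,
        capture_output=True,
    )
    return split_null(result.stdout)
```

NUL separators keep file names containing spaces or newlines intact, which is why `-0` is the safer choice for scripting and for `xargs -0`.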
#### Python Class: _DirectoryWalker_
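```python
from orgecc.filematcher.walker import DirectoryWalker

walker = DirectoryWalker()
# Yields only entries that are not excluded by .gitignore rules
for file in walker.walk("path/to/directory"):
    print(file)
print(walker.yielded_count)  # entries yielded by walk()
print(walker.ignored_count)  # entries skipped because they are ignored
```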
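One way a walker can skip ignored directories cheaply is to prune them before descending, so nothing inside them is ever listed. The sketch below shows that idea with the standard library's `os.walk`; it is not how `DirectoryWalker` is implemented, and `walk_pruned` and its `is_ignored(rel_path, is_dir)` predicate are hypothetical stand-ins for a real `.gitignore` matcher.

```python
import os
from collections.abc import Callable, Iterator


def walk_pruned(root: str, is_ignored: Callable[[str, bool], bool]) -> Iterator[str]:
    """Yield non-ignored file paths under root, relative and '/'-separated.

    is_ignored(rel_path, is_dir) is a hypothetical predicate supplied by the caller.
    """
    for dirpath, dirnames, filenames in os.walk(root):  # top-down by default
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        # Prune in place: os.walk will not descend into directories removed here.
        dirnames[:] = [d for d in dirnames if not is_ignored(prefix + d, True)]
        for name in filenames:
            rel_path = prefix + name
            if not is_ignored(rel_path, False):
                yield rel_path
```

Pruning `dirnames` in place only works with top-down traversal, which is `os.walk`'s default; with `topdown=False` the subdirectories have already been visited.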
### Unit Testing
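Use the included [test corpus](tests/corpus) to validate any `.gitignore` matching implementation. In the corpus, a `T:` line lists a path expected to match (be ignored) and an `F:` line a path expected not to match.

For example, below is block #7 of [negation.txt](tests/corpus/negation.txt), followed by the failures the `gitignorefile` implementation produces for it (reported as an expected failure, `XFAIL`):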
<details>
<summary>Test file: negation.txt [block #7]</summary>

```
<.gitignore>
# ======================
# Advanced Negation & Anchored Patterns
# Demonstrates anchored patterns, directories, and multiple negation layers.
# We test directory handling, anchored patterns, and negation layering:
# ======================

# ignore top-level "build" directory
/build
# unignore a specific file inside that directory
!/build/allow.log

!/dist/allow.log
/dist

# ignore all .tmp files
*.tmp
# unignore a specific top-level file
!/global.tmp

# ignore all .log
*.log
# unignore only *.critical.log
!*.critical.log
</.gitignore>
T: 'build' # is a directory matching /build => ignored
T: 'build/allow.log' unignored, but was first ignored by dir, so still matches
T: 'build/subdir/file.txt' # inside build => ignored
T: 'dist'
T: 'dist/allow.log'
F: 'global.tmp' # unignored by !/global.tmp
T: 'random.tmp' # ignored by '*.tmp'
T: 'some/dir/random.tmp' # also ignored by '*.tmp'
T: 'system.log' # ignored by '*.log'
F: 'kernel.critical.log' # unignored by !*.critical.log
F: 'really.critical.log' # unignored by !*.critical.log
F: 'nested/dir/another.critical.log' # unignored by !*.critical.log
T: 'nested/dir/another.debug.log' # still ignored by '*.log'
```
</details>
<details>
<summary>Test Failure: gitignorefile[negation-#7-block34]</summary>

```
XFAIL tests/filematcher_corpus_test.py::test_corpus_extlib_gitignorefile[negation-#7-block34] - reason:
<.gitignore>
/build
!/build/allow.log
!/dist/allow.log
/dist
*.tmp
!/global.tmp
*.log
!*.critical.log
</.gitignore>

== Failures: 9 (negation-#7) ==

1. T->F 'build' is a directory matching /build => ignored
   Rule: ext-lib: gitignorefile
2. T->F 'build/allow.log' unignored, but was first ignored by dir, so still matches
   Rule: ext-lib: gitignorefile
3. T->F 'build/subdir/file.txt' inside build => ignored
   Rule: ext-lib: gitignorefile
4. T->F 'dist'
   Rule: ext-lib: gitignorefile
5. T->F 'dist/allow.log'
   Rule: ext-lib: gitignorefile
6. T->F 'random.tmp' ignored by '*.tmp'
   Rule: ext-lib: gitignorefile
7. T->F 'some/dir/random.tmp' also ignored by '*.tmp'
   Rule: ext-lib: gitignorefile
8. T->F 'system.log' ignored by '*.log'
   Rule: ext-lib: gitignorefile
9. T->F 'nested/dir/another.debug.log' still ignored by '*.log'
   Rule: ext-lib: gitignorefile
```
</details>
### Benchmarking
Every implementation is obtained through `get_factory`, so comparing them only requires passing a different `MatcherImplementation` member:
```python
from orgecc.filematcher import get_factory, MatcherImplementation

# Pure Python implementation (Git-compatible default)
pure_python_factory = get_factory(MatcherImplementation.PURE_PYTHON)

# External library implementation (gitignorefile)
gitignorefile_factory = get_factory(MatcherImplementation.EXTLIB_GITIGNOREFILE)
```
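The snippet above only selects implementations. A minimal timing harness could look like the sketch below, assuming each matcher is wrapped as a plain `path -> bool` callable; `benchmark` is a hypothetical helper, not part of the package.

```python
import timeit
from collections.abc import Callable, Iterable


def benchmark(
    matchers: dict[str, Callable[[str], bool]],
    paths: Iterable[str],
    repeat: int = 5,
) -> dict[str, float]:
    """Return the best time (in seconds) each matcher needs to check all paths."""
    path_list = list(paths)
    best: dict[str, float] = {}
    for name, is_match in matchers.items():

        def run(is_match: Callable[[str], bool] = is_match) -> None:
            # One pass over every path with the current matcher.
            for path in path_list:
                is_match(path)

        # The minimum of several runs is the one least disturbed by other processes.
        best[name] = min(timeit.repeat(run, number=1, repeat=repeat))
    return best
```

With this library, a wrapper could be `lambda p: matcher.match(p).matches`, where `matcher` comes from `factory.pattern2matcher(...)` as in the Pure Python Matcher example, built once per factory so that pattern compilation is not included in the timings.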
## License
This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.