orgecc-file-matcher


- Name: orgecc-file-matcher
- Version: 0.0.1
- Summary: CLI and library for efficient file path filtering using gitignore rules
- Upload time: 2025-01-14 22:35:53
- Requires Python: >=3.12
- License: Apache-2.0
- Keywords: gitignore, git, file-filter, path-matcher, wildmatch

# Orgecc File Matcher

A Python library and CLI tool for Git-compatible file matching and directory traversal.

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Versions](https://img.shields.io/pypi/pyversions/orgecc-file-matcher.svg)](https://pypi.org/project/orgecc-file-matcher/)
[![CI](https://github.com/elifarley/file-matcher-python/actions/workflows/test.yml/badge.svg)](https://github.com/elifarley/file-matcher-python/actions/workflows/test.yml)
[![PyPI](https://img.shields.io/pypi/v/orgecc-file-matcher)](https://pypi.org/project/orgecc-file-matcher/)
[![PyPI version](https://badge.fury.io/py/orgecc-file-matcher.svg)](https://pypi.org/project/orgecc-file-matcher/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A command-line tool, Python library, and testing toolkit for `.gitignore`-style file matching, designed to meet four key goals:

1. **Pure Python Matcher**: Provide a pure Python implementation that precisely matches Git's behavior.
2. **File Walker**: Traverse directories while respecting `.gitignore` rules at all levels.
3. **Unit Testing**: Verify the correctness of any `.gitignore` matching library or command.
4. **Benchmarking**: Compare the performance of different `.gitignore` matching implementations.

## Features

- **Git-Compatible Matching**: The pure Python implementation passes every case in the test corpus, whose expected results are checked against Git's own behavior.
- **Multiple Implementations** (see available options in [MatcherImplementation](src/orgecc/filematcher/__init__.py)):
  - **Pure Python**: No external dependencies. Aims for 100% Git compatibility.
  - **Native Git Integration**: Internally calls `git check-ignore -v`. The expected results in the unit tests are aligned with this implementation.
  - **External Libraries**: Supports [gitignorefile](https://github.com/excitoon/gitignorefile) and [pathspec](https://github.com/cpburnz/python-path-specification).
- **[Comprehensive Test Suite](#unit-testing)**: Includes a [test corpus](tests/corpus) for validating `.gitignore` matching behavior.
- **Tree-Sitter-Inspired Testing**: The corpus files follow the style of [Tree-Sitter](https://tree-sitter.github.io/tree-sitter/) corpus tests: each block pairs an input (a `.gitignore`) with the expected result for a list of paths.
- **Efficient Directory Traversal**: A file walker that skips ignored files and directories.
- **Cross-Platform**: Works seamlessly on Windows, macOS, and Linux.

## Installation

Install via **pip**:

```bash
pip install orgecc-file-matcher
```

## Usage

### Pure Python Matcher

Use the Git-compatible pure Python matcher (the default):

```python
from orgecc.filematcher import get_factory, MatcherImplementation
from orgecc.filematcher.patterns import new_deny_pattern_source

factory = get_factory(MatcherImplementation.PURE_PYTHON)
patterns = new_deny_pattern_source(["*.pyc", "build/"])
matcher = factory.pattern2matcher(patterns)
result = matcher.match("path/to/file.pyc")
print(result.matches)  # True or False, matching Git's behavior
```

### File Walker

Traverse directories while respecting `.gitignore` rules, either from the command line or from Python.

#### CLI Tool for _macOS_, _Linux_ and _Windows_

The `file-walker` command lists files and directories, skipping anything excluded by `.gitignore` rules:

```shell
file-walker --help
```
```
Usage: file-walker [OPTIONS] PATH

  List files and directories while respecting gitignore patterns.

Options:
  -t, --type [all|f|d]            Type of entries to show
  -f, --format [absolute|relative|name]
                                  Output format for paths
  -X, --exclude-from FILE         Base gitignore file to apply before others
  -x, --exclude TEXT              Base patterns to ignore (applied before
                                  others)
  -0, --null                      Use null character as separator (useful for
                                  xargs)
  --suppress-errors               Suppress error messages
  -q, --quiet                     Don't show summary, be quiet
  --help                          Show this message and exit.

```

#### Python Class: _DirectoryWalker_

```python
from orgecc.filematcher.walker import DirectoryWalker

walker = DirectoryWalker()
for file in walker.walk("path/to/directory"):
    print(file)
print(walker.yielded_count)  # counter of entries yielded by the walk
print(walker.ignored_count)  # counter of entries skipped as ignored
```
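
To illustrate why skipping ignored directories makes traversal cheaper, here is a minimal standard-library sketch. It is not the `DirectoryWalker` implementation: the `walk_pruned` helper is hypothetical, and it matches plain names with `fnmatch` from a single pattern list instead of applying full `.gitignore` rules at every level.

```python
import os
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterator


def walk_pruned(root: Path, ignore_names: list[str]) -> Iterator[Path]:
    """Yield files under root, skipping names that match any ignore pattern."""

    def ignored(name: str) -> bool:
        return any(fnmatchcase(name, pattern) for pattern in ignore_names)

    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] prunes the walk in place, so os.walk
        # never descends into an ignored directory.
        dirnames[:] = sorted(d for d in dirnames if not ignored(d))
        for name in sorted(filenames):
            if not ignored(name):
                yield Path(dirpath, name)


# Build a throwaway tree and walk it with two simplified ignore patterns.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("build/out.o", "src/main.py", "src/main.pyc", "README.md"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    kept = [p.relative_to(root).as_posix()
            for p in walk_pruned(root, ["build", "*.pyc"])]

print(kept)  # ['README.md', 'src/main.py']
```

Because `build` is removed from `dirnames` before the walk continues, nothing inside it is ever listed or tested.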

### Unit Testing

Use the included [test corpus](tests/corpus) to validate your `.gitignore` matching implementation.

The example below shows block #7 of the [negation.txt](tests/corpus/negation.txt) test file, followed by the failures reported when that block is run against the `gitignorefile` library:

<details>
<summary>Test file: negation.txt [block #7]</summary>


```
<.gitignore>
# ======================
# Advanced Negation & Anchored Patterns
# Demonstrates anchored patterns, directories, and multiple negation layers.
# We test directory handling, anchored patterns, and negation layering:
# ======================

# ignore top-level "build" directory
/build
# unignore a specific file inside that directory
!/build/allow.log

!/dist/allow.log
/dist

# ignore all .tmp files
*.tmp
# unignore a specific top-level file
!/global.tmp

# ignore all .log
*.log
# unignore only *.critical.log
!*.critical.log
</.gitignore>
T: 'build' # is a directory matching /build => ignored
T: 'build/allow.log' unignored, but was first ignored by dir, so still matches
T: 'build/subdir/file.txt' # inside build => ignored
T: 'dist'
T: 'dist/allow.log'
F: 'global.tmp' # unignored by !/global.tmp
T: 'random.tmp' # ignored by '*.tmp'
T: 'some/dir/random.tmp' # also ignored by '*.tmp'
T: 'system.log' # ignored by '*.log'
F: 'kernel.critical.log' # unignored by !*.critical.log
F: 'really.critical.log' # unignored by !*.critical.log
F: 'nested/dir/another.critical.log' # unignored by !*.critical.log
T: 'nested/dir/another.debug.log' # still ignored by '*.log'
```
</details>
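
The layering tested by this block can be modelled roughly as "the last matching pattern wins, unless a parent directory is already ignored". The sketch below is a simplified model of that rule, not the library's matcher: the helper names are hypothetical, and it only handles the two pattern shapes used in the block (patterns anchored with a leading `/` and slash-free patterns), using `fnmatch` in place of Git's wildmatch.

```python
from fnmatch import fnmatchcase

RULES = ["/build", "!/build/allow.log", "!/dist/allow.log", "/dist",
         "*.tmp", "!/global.tmp", "*.log", "!*.critical.log"]


def rule_matches(pattern: str, path: str) -> bool:
    """Simplified matching for the two pattern shapes used in this block."""
    if pattern.startswith("/"):
        # Anchored: compare against the path relative to the .gitignore.
        return fnmatchcase(path, pattern[1:])
    # No slash: compare against the last path component at any depth.
    return fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def last_match_wins(rules: list[str], path: str) -> bool:
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        if rule_matches(rule.removeprefix("!"), path):
            ignored = not negated  # later rules override earlier ones
    return ignored


def is_ignored(rules: list[str], path: str) -> bool:
    parts = path.split("/")
    # An ignored ancestor directory settles the question: files inside it
    # cannot be re-included by a later negation.
    for depth in range(1, len(parts)):
        if last_match_wins(rules, "/".join(parts[:depth])):
            return True
    return last_match_wins(rules, path)


for path in ("build/allow.log", "global.tmp", "kernel.critical.log"):
    print(path, is_ignored(RULES, path))
```

Under this model `build/allow.log` stays ignored because `build` itself is ignored, while `global.tmp` and `kernel.critical.log` are re-included by the later negations.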

<details>
<summary>Test Failure: gitignorefile[negation-#7-block34]</summary>

```
XFAIL tests/filematcher_corpus_test.py::test_corpus_extlib_gitignorefile[negation-#7-block34] - reason:
<.gitignore>
/build
!/build/allow.log
!/dist/allow.log
/dist
*.tmp
!/global.tmp
*.log
!*.critical.log
</.gitignore>


== Failures: 9 (negation-#7) ==

1. T->F 'build' is a directory matching /build => ignored
  Rule: ext-lib: gitignorefile
2. T->F 'build/allow.log' unignored, but was first ignored by dir, so still matches
  Rule: ext-lib: gitignorefile
3. T->F 'build/subdir/file.txt' inside build => ignored
  Rule: ext-lib: gitignorefile
4. T->F 'dist'
  Rule: ext-lib: gitignorefile
5. T->F 'dist/allow.log'
  Rule: ext-lib: gitignorefile
6. T->F 'random.tmp' ignored by '*.tmp'
  Rule: ext-lib: gitignorefile
7. T->F 'some/dir/random.tmp' also ignored by '*.tmp'
  Rule: ext-lib: gitignorefile
8. T->F 'system.log' ignored by '*.log'
  Rule: ext-lib: gitignorefile
9. T->F 'nested/dir/another.debug.log' still ignored by '*.log'
  Rule: ext-lib: gitignorefile

```
</details>
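
To run the corpus against a matcher of your own, the block format shown above can be parsed with a few lines of Python. This is a sketch based only on the example block (a `<.gitignore>` section followed by `T:`/`F:` lines); the real corpus parser may accept more syntax, and `parse_block` and `report_failures` are hypothetical names. Failures are reported as `expected->actual`, which appears to be what `T->F` means in the output above.

```python
import re
from typing import Callable, NamedTuple


class Expectation(NamedTuple):
    path: str
    ignored: bool


# Matches lines such as: T: 'system.log' # ignored by '*.log'
_CASE = re.compile(r"^([TF]):\s*'([^']*)'")


def parse_block(text: str) -> tuple[list[str], list[Expectation]]:
    """Split a corpus block into gitignore rules and expected results."""
    rules: list[str] = []
    cases: list[Expectation] = []
    in_rules = False
    for line in text.splitlines():
        if line.strip() == "<.gitignore>":
            in_rules = True
        elif line.strip() == "</.gitignore>":
            in_rules = False
        elif in_rules:
            # Keep pattern lines, drop blanks and comments.
            if line.strip() and not line.startswith("#"):
                rules.append(line)
        elif (found := _CASE.match(line.strip())):
            cases.append(Expectation(found.group(2), found.group(1) == "T"))
    return rules, cases


def report_failures(cases: list[Expectation],
                    is_ignored: Callable[[str], bool]) -> list[str]:
    """Return one 'expected->actual' line per mismatching path."""
    flag = {True: "T", False: "F"}
    return [f"{flag[case.ignored]}->{flag[actual]} {case.path!r}"
            for case in cases
            if (actual := is_ignored(case.path)) != case.ignored]


BLOCK = """\
<.gitignore>
# ignore all .log
*.log
!*.critical.log
</.gitignore>
T: 'system.log' # ignored by '*.log'
F: 'kernel.critical.log' # unignored by !*.critical.log
"""

rules, cases = parse_block(BLOCK)
# A matcher that never ignores anything fails every 'T' expectation.
print(report_failures(cases, lambda path: False))  # ["T->F 'system.log'"]
```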

### Benchmarking

Every implementation is selected through the same factory API, so the same patterns and paths can be timed against each one:

```python
from orgecc.filematcher import get_factory, MatcherImplementation

# Factory for the pure Python implementation
pure_python_factory = get_factory(MatcherImplementation.PURE_PYTHON)

# Factory for the external gitignorefile library
gitignorefile_factory = get_factory(MatcherImplementation.EXTLIB_GITIGNOREFILE)
```
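
A timing harness for such a comparison might look like the sketch below. It uses only the standard library: the `benchmark` helper and the two stand-in matchers are hypothetical, chosen so the example runs without any matching library installed. In practice each callable would wrap a matcher built from one of the factories above, along the lines of `lambda p: matcher.match(p).matches`.

```python
import timeit
from fnmatch import fnmatchcase
from typing import Callable

Matcher = Callable[[str], bool]


def benchmark(matchers: dict[str, Matcher], paths: list[str],
              repeat: int = 3, number: int = 200) -> dict[str, float]:
    """Return the best observed time per match call, in seconds."""
    results: dict[str, float] = {}
    for name, match in matchers.items():
        def run(match: Matcher = match) -> None:
            for path in paths:
                match(path)
        # Take the minimum of several repeats to reduce scheduling noise.
        best = min(timeit.repeat(run, repeat=repeat, number=number))
        results[name] = best / (number * len(paths))
    return results


# Hypothetical stand-in matchers; replace with real implementations.
PATTERNS = ["*.pyc", "*.log", "*.tmp"]
matchers: dict[str, Matcher] = {
    "fnmatch-any": lambda p: any(fnmatchcase(p, pat) for pat in PATTERNS),
    "suffix-check": lambda p: p.endswith((".pyc", ".log", ".tmp")),
}
paths = [f"src/pkg{i}/module{i}.py" for i in range(50)] + ["a.pyc", "b/c.log"]

timings = benchmark(matchers, paths)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e9:.0f} ns per match")
```

Feeding every implementation the same path list keeps the comparison fair; absolute numbers will vary by machine.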

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "orgecc-file-matcher",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "gitignore, git, file-filter, path-matcher, wildmatch",
    "author": null,
    "author_email": "Elifarley <elifarley@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/4b/84/d98910750b067c921ea3af6b6a3f3e1a574c88e874400ddc0c768f00ac2c/orgecc_file_matcher-0.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "CLI and library for efficient file path filtering using gitignore rules",
    "version": "0.0.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/elifarley/file-matcher-python/issues",
        "Documentation": "https://github.com/elifarley/file-matcher-python#readme",
        "Homepage": "https://github.com/elifarley/file-matcher-python",
        "Repository": "https://github.com/elifarley/file-matcher-python.git"
    },
    "split_keywords": [
        "gitignore",
        " git",
        " file-filter",
        " path-matcher",
        " wildmatch"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "488778a46c9e90ed5bd49007634765b910de7af5f51fac671955ec4c18f03a8e",
                "md5": "609062f57344e59d991ec564080868bf",
                "sha256": "40db3e4e966a80b2347ea4f7a6457e7f266186a867dd3cef860e34c0777abde4"
            },
            "downloads": -1,
            "filename": "orgecc_file_matcher-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "609062f57344e59d991ec564080868bf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 28159,
            "upload_time": "2025-01-14T22:35:52",
            "upload_time_iso_8601": "2025-01-14T22:35:52.458459Z",
            "url": "https://files.pythonhosted.org/packages/48/87/78a46c9e90ed5bd49007634765b910de7af5f51fac671955ec4c18f03a8e/orgecc_file_matcher-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4b84d98910750b067c921ea3af6b6a3f3e1a574c88e874400ddc0c768f00ac2c",
                "md5": "3ae48add7c9120559c419f99233a82d3",
                "sha256": "a468c750f4cb0dd51d5bc946adba01df1b9dc921d48f0422d07820550d18b751"
            },
            "downloads": -1,
            "filename": "orgecc_file_matcher-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "3ae48add7c9120559c419f99233a82d3",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 37024,
            "upload_time": "2025-01-14T22:35:53",
            "upload_time_iso_8601": "2025-01-14T22:35:53.818278Z",
            "url": "https://files.pythonhosted.org/packages/4b/84/d98910750b067c921ea3af6b6a3f3e1a574c88e874400ddc0c768f00ac2c/orgecc_file_matcher-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-14 22:35:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "elifarley",
    "github_project": "file-matcher-python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "orgecc-file-matcher"
}
        