py-walk


| Field | Value |
|---|---|
| Name | py-walk |
| Version | 0.3.3 |
| home_page | None |
| Summary | Filter filesystem paths based on gitignore-like patterns |
| upload_time | 2024-10-26 14:30:39 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

            <p align="center">
    <img src="https://raw.githubusercontent.com/pacha/py-walk/main/docs/logo-header.png" alt="logo">
</p>

py-walk
=======

![Tests](https://github.com/pacha/py-walk/actions/workflows/tests.yaml/badge.svg)
![Type checks](https://github.com/pacha/py-walk/actions/workflows/type-checks.yaml/badge.svg)
![Code formatting](https://github.com/pacha/py-walk/actions/workflows/code-formatting.yaml/badge.svg)
![Supported Python versions](https://img.shields.io/pypi/pyversions/py-walk.svg)

_Python library to filter filesystem paths based on gitignore-like patterns._

Example:
```python
from py_walk import walk
from py_walk import get_parser_from_text

patterns = """
    **/data/*.bin
    !**/data/foo.bin

    # python files
    __pycache__/
    *.py[cod]
"""

# you can get the filtered paths from a directory
for path in walk("some/directory", ignore=patterns):
    do_something(path)

# ...or check paths against the patterns manually
parser = get_parser_from_text(patterns, base_dir="some/directory")
if parser.match("file.txt"):
    do_something_else()
```
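
The `!` prefix in the example above re-includes a path that an earlier pattern excluded: in gitignore syntax, the last pattern that matches a path decides the outcome. The following is a minimal standard-library sketch of that rule only. It assumes plain file names and uses `fnmatch`, which does not implement `**` or the rest of Git's wildmatch syntax. `is_ignored` is a hypothetical helper written for this illustration and is not part of py-walk.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_ignored(name: str, patterns: Sequence[str]) -> bool:
    """Apply patterns in order; the last one that matches `name` wins."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]
        if fnmatchcase(name, pattern):
            # A plain pattern ignores the name, a negated one re-includes it
            ignored = not negated
    return ignored


patterns = ["*.bin", "!foo.bin"]
print(is_ignored("bar.bin", patterns))    # True
print(is_ignored("foo.bin", patterns))    # False
print(is_ignored("notes.txt", patterns))  # False
```

Because order matters, swapping the two patterns would make `foo.bin` ignored again.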

**py-walk** can be useful for applications or tools that work with paths and aim to
offer a `.gitignore` type file to their users. It's also handy for users working
in interactive sessions who need to quickly retrieve sets of paths that must
meet relatively complex constraints.

> py-walk tries to achieve 100% compatibility with Git's gitignore (wildmatch)
> pattern syntax. Currently, it includes more than 500 tests, which incorporate
> all the original tests from the Git codebase. These tests are executed against
> `git check-ignore` to ensure as much compatibility as possible. If you find
> any divergence, please don't hesitate to open an issue or PR.

## Installation

To install py-walk, simply use `pip`:
```shell
$ pip install py-walk
```

## Usage

py-walk offers two ways of working. You can pass individual paths to the
library to check whether they match a set of gitignore-based patterns, or you
can traverse the contents of a directory directly and retrieve only the paths
that meet a set of conditions.

### walk

To walk through all the contents of a directory, don't provide any constraints:
```python
from py_walk import walk

for path in walk("/some/directory/"):
    print(path)
```
`walk` accepts the directory to traverse either as a string or as a `Path`
object from `pathlib`. It yields `Path` objects.

> `walk` returns a generator. If you prefer to get the results as a list or
> tuple, wrap the call in the corresponding constructor
> (e.g. `list(walk("some-dir"))`).

To ignore certain paths, pass the patterns either as a block of text or as a list:
```python
ignore = """
    # these patterns use gitignore syntax
    foo.txt
    /bar/**/*.dat
"""

for path in walk("/some/directory", ignore=ignore):
    ...
```
or
```python
ignore = ["foo.txt", "/bar/**/*.dat"]
for path in walk("/some/directory", ignore=ignore):
    ...
```
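
The text and list forms above describe the same patterns. As a rough illustration of how a text blob can be reduced to the equivalent list, the sketch below strips each line and drops blank lines and `#` comments. `text_to_patterns` is a hypothetical stand-in written for this example. It is not py-walk's own code, and it ignores gitignore details such as escaped `\#` or escaped trailing spaces.

```python
from typing import List


def text_to_patterns(text: str) -> List[str]:
    """Return the non-empty, non-comment lines of `text`, stripped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        patterns.append(line)
    return patterns


ignore_text = """
    # these patterns use gitignore syntax
    foo.txt
    /bar/**/*.dat
"""
print(text_to_patterns(ignore_text))  # ['foo.txt', '/bar/**/*.dat']
```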

To only retrieve paths that match a set of patterns, use the `match` parameter
(again, passing a text blob or a list of patterns):
```python
for path in walk("/some/directory", ignore=["data/"], match=["*.css", "*.js"]):
    ...
```
> Note that the `ignore` parameter has precedence: once a path is ignored it
> can't be reincluded using the `match` parameter due to performance reasons.
> That includes children of ignored directories. For example, if you ignore
> a directory `/foo/`, `/foo/bar/file.txt` will be ignored even if `match`
> includes the `*.txt` pattern.

In addition, you can retrieve either only files or only directories using the
`mode` parameter:
```python
for path in walk("/some/directory", ignore=["static/"], mode="only-files"):
    ...
```
```python
for path in walk("/some/directory", ignore=["static/"], mode="only-dirs"):
    ...
```

You can combine `ignore`, `match` and `mode` to get the exact list of files
that you need. However, always remember that `ignore` takes precedence over the
other two.
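
One plausible reason for the precedence of `ignore` over `match` is directory pruning: a walker that never descends into an ignored directory cannot report anything inside it, whatever `match` says. The sketch below shows that idea with `os.walk` and `fnmatch` from the standard library. `pruned_walk` and its parameters are hypothetical names for this illustration, and this is not a description of how py-walk is implemented.

```python
import os
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterator, Sequence


def pruned_walk(
    root: Path, ignore_dirs: Sequence[str], match: Sequence[str]
) -> Iterator[Path]:
    """Yield files whose names match `match`, never entering ignored dirs."""
    for current, dirs, files in os.walk(root):
        # Editing `dirs` in place stops os.walk from descending into them
        dirs[:] = [
            d for d in dirs if not any(fnmatchcase(d, p) for p in ignore_dirs)
        ]
        for name in files:
            if any(fnmatchcase(name, p) for p in match):
                yield Path(current) / name


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "foo" / "bar").mkdir(parents=True)
    (root / "foo" / "bar" / "file.txt").write_text("inside an ignored directory")
    (root / "keep.txt").write_text("kept")
    found = sorted(
        p.relative_to(root).as_posix()
        for p in pruned_walk(root, ignore_dirs=["foo"], match=["*.txt"])
    )
    print(found)  # ['keep.txt']
```

`foo/bar/file.txt` matches `*.txt`, but it is never visited because `foo` was pruned first.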

> Note: you can convert any text containing gitignore-based patterns into a list using
> the `py_walk.pattern_text_to_pattern_list` function:
> ```python
> from py_walk import pattern_text_to_pattern_list
>
> pattern_list = pattern_text_to_pattern_list("""
>     # some patterns
>     **/foo.txt
>     dir[A-Z]/
> """)
> ```

### get_parser_from_*

You can also create a parser from a gitignore-type text, from a list of
patterns, or from a `.gitignore`-type file (passed by path in the first example
of this section). You can then evaluate paths directly with the parser's
`match` method.

```python
from py_walk import get_parser_from_file

parser = get_parser_from_file("path/to/gitignore-type-file")
if parser.match("file.txt"):
    print("file.txt matches!")
```

```python
from py_walk import get_parser_from_text

patterns = """
# some comment
*.txt
**/bar/*.dat
"""

parser = get_parser_from_text(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    print("file.txt matches!")
```

```python
from py_walk import get_parser_from_list

patterns = [
    "*.txt",
    "**/bar/*.dat",
]

parser = get_parser_from_list(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    ...
```

#### base_dir

The `base_dir` denotes the directory where files are stored. When you use
`get_parser_from_file`, the `base_dir` is determined by the location of the
gitignore-type file passed as a parameter. Specifically, it's set to the parent
directory, which mirrors the functionality of Git and a `.gitignore` file.

When using `get_parser_from_text` or `get_parser_from_list`, you can either set
`base_dir` explicitly or leave it out. If it is omitted, most matches still
work, because the provided path is simply compared to the patterns textually.
In some cases, however, the package needs to access the actual filesystem to
resolve a match. For instance, with a pattern like `foo/bar/` and the path
`foo/bar`, a match only occurs if `bar` is a directory. If `base_dir` is
defined, the package checks that `bar` exists and is a directory, and returns
`True` in that case. If `bar` is not a directory, or `base_dir` is not defined,
the result is `False`.

So while it's entirely possible to match patterns without a `base_dir`, be
mindful of the potential differences in results. This behavior is copied
directly from Git to maintain as much compatibility with it as possible.
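
The directory check described above can be pictured with a small standard-library sketch. It handles only a literal directory-only pattern (one ending in `/`, with no wildcards) and returns `False` whenever there is no `base_dir` to confirm that the path is a directory. `matches_dir_pattern` is a hypothetical helper for illustration, not py-walk's implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def matches_dir_pattern(
    pattern: str, path: str, base_dir: Optional[Path] = None
) -> bool:
    """Match a literal directory-only pattern such as 'foo/bar/'."""
    if not pattern.endswith("/"):
        raise ValueError("expected a directory-only pattern ending in '/'")
    if path != pattern.rstrip("/"):
        return False  # the textual comparison already fails
    if base_dir is None:
        return False  # nothing to check against, so no match
    # The filesystem decides: only an existing directory matches
    return (base_dir / path).is_dir()


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "foo" / "bar").mkdir(parents=True)
    (base / "foo" / "baz").write_text("a regular file")
    results = [
        matches_dir_pattern("foo/bar/", "foo/bar", base),  # directory exists
        matches_dir_pattern("foo/bar/", "foo/bar"),        # no base_dir
        matches_dir_pattern("foo/baz/", "foo/baz", base),  # regular file
    ]
    print(results)  # [True, False, False]
```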

## License

py-walk is available under the MIT license.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "py-walk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Andr\u00e9s Sope\u00f1a P\u00e9rez <code@ehmm.org>",
    "download_url": "https://files.pythonhosted.org/packages/b3/b5/e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb/py_walk-0.3.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Filter filesystem paths based on gitignore-like patterns",
    "version": "0.3.3",
    "project_urls": {
        "Bug Tracker": "https://github.com/pacha/py-walk/issues",
        "Homepage": "https://github.com/pacha/py-walk"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "273856b67abdbf6797475dfe2f62d391b4a6ead851c76acbaf07e118e53651b6",
                "md5": "afdda6c1fd6831b865b4e449a1d4d60c",
                "sha256": "238fc018165138021ce0bfd9c351cdc473d3120ccc5534df35611b92608c94d5"
            },
            "downloads": -1,
            "filename": "py_walk-0.3.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "afdda6c1fd6831b865b4e449a1d4d60c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 14537,
            "upload_time": "2024-10-26T14:30:38",
            "upload_time_iso_8601": "2024-10-26T14:30:38.060211Z",
            "url": "https://files.pythonhosted.org/packages/27/38/56b67abdbf6797475dfe2f62d391b4a6ead851c76acbaf07e118e53651b6/py_walk-0.3.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b3b5e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb",
                "md5": "a743ac333464f86eda47be9633b4e915",
                "sha256": "a1b28d6079f27203fa3098b69a98572675b3ff5bd02286c43e6dacd66615f879"
            },
            "downloads": -1,
            "filename": "py_walk-0.3.3.tar.gz",
            "has_sig": false,
            "md5_digest": "a743ac333464f86eda47be9633b4e915",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 1815727,
            "upload_time": "2024-10-26T14:30:39",
            "upload_time_iso_8601": "2024-10-26T14:30:39.421013Z",
            "url": "https://files.pythonhosted.org/packages/b3/b5/e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb/py_walk-0.3.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-26 14:30:39",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pacha",
    "github_project": "py-walk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "py-walk"
}
        