| Name | py-walk |
| Version | 0.3.3 |
| home_page | None |
| Summary | Filter filesystem paths based on gitignore-like patterns |
| upload_time | 2024-10-26 14:30:39 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
<p align="center">
<img src="https://raw.githubusercontent.com/pacha/py-walk/main/docs/logo-header.png" alt="logo">
</p>
py-walk
=======




_Python library to filter filesystem paths based on gitignore-like patterns._
Example:
```python
from py_walk import walk
from py_walk import get_parser_from_text
patterns = """
**/data/*.bin
!**/data/foo.bin
# python files
__pycache__/
*.py[cod]
"""
# you can get the filtered paths from a directory
for path in walk("some/directory", ignore=patterns):
    do_something(path)
# ...or check paths against the patterns manually
parser = get_parser_from_text(patterns, base_dir="some/directory")
if parser.match("file.txt"):
    do_something_else()
```
**py-walk** can be useful for applications or tools that work with paths and aim to
offer a `.gitignore` type file to their users. It's also handy for users working
in interactive sessions who need to quickly retrieve sets of paths that must
meet relatively complex constraints.
> py-walk tries to achieve 100% compatibility with Git's gitignore (wildmatch)
> pattern syntax. Currently, it includes more than 500 tests, which incorporate
> all the original tests from the Git codebase. These tests are executed against
> `git check-ignore` to ensure as much compatibility as possible. If you find
> any divergence, please don't hesitate to open an issue or PR.
## Installation
To install py-walk, simply use `pip`:
```shell
$ pip install py-walk
```
## Usage
py-walk lets you check whether individual paths match a set of gitignore-based
patterns. Alternatively, you can traverse the contents of a directory directly,
retrieving only the paths that meet a set of conditions.
### walk
To walk through all the contents of a directory, don't provide any constraints:
```python
from py_walk import walk
for path in walk("/some/directory/"):
    print(path)
```
`walk` accepts the directory to traverse as a string or as a `Path` object from
`pathlib`. It yields `Path` objects.
> `walk` returns a generator. If you prefer to get the results as a list or
> tuple, wrap the call with the desired data type constructor
> (e.g. `list(walk("some-dir"))`).
To ignore certain paths, you can pass the patterns as a block of text or as a list:
```python
ignore = """
# these patterns use gitignore syntax
foo.txt
/bar/**/*.dat
"""
for path in walk("/some/directory", ignore=ignore):
    ...
```
or
```python
ignore = ["foo.txt", "/bar/**/*.dat"]
for path in walk("/some/directory", ignore=ignore):
    ...
```
To only retrieve paths that match a set of patterns, use the `match` parameter
(again, passing a text blob or a list of patterns):
```python
for path in walk("/some/directory", ignore=["data/"], match=["*.css", "*.js"]):
    ...
```
> Note that the `ignore` parameter has precedence: for performance reasons,
> once a path is ignored it can't be reincluded using the `match` parameter.
> That includes children of ignored directories. For example, if you ignore
> a directory `/foo/`, `/foo/bar/file.txt` will be ignored even if `match`
> includes the `*.txt` pattern.
In addition, you can retrieve either only files or only directories using the
`mode` parameter:
```python
for path in walk("/some/directory", ignore=["static/"], mode="only-files"):
    ...
```
```python
for path in walk("/some/directory", ignore=["static/"], mode="only-dirs"):
    ...
```
You can combine `ignore`, `match` and `mode` to get the exact list of files
that you need. However, always remember that `ignore` takes precedence over the
other two.
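
The precedence rule is easier to picture with a small standard-library sketch. This is not py-walk's implementation and it does not use gitignore matching: `toy_walk`, `ignore_dirs` and the plain `fnmatch` filter are hypothetical stand-ins, chosen only to show why a directory that is pruned during traversal can never contribute matches later on.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def toy_walk(root, ignore_dirs, match):
    """Yield files under root whose name matches `match`, skipping ignored dirs."""
    for current, dirs, files in os.walk(root):
        # Prune ignored directories in place: os.walk never descends into them,
        # so nothing below them can be re-included by the match filter.
        dirs[:] = sorted(d for d in dirs if d not in ignore_dirs)
        for name in sorted(files):
            if fnmatch.fnmatch(name, match):
                yield Path(current, name).relative_to(root)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "foo" / "bar").mkdir(parents=True)
    (root / "foo" / "bar" / "file.txt").write_text("hidden")
    (root / "top.txt").write_text("visible")
    found = [p.as_posix() for p in toy_walk(root, {"foo"}, "*.txt")]

print(found)  # ['top.txt']
```

Because `foo` is dropped before the traversal descends into it, `foo/bar/file.txt` is never even compared against `*.txt`. Skipping whole subtrees this way is what makes ignoring cheap.
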
> Note: you can convert any text containing gitignore-based patterns into a list using
> the `py_walk.pattern_text_to_pattern_list` function:
> ```python
> from py_walk import pattern_text_to_pattern_list
>
> pattern_list = pattern_text_to_pattern_list("""
> # some patterns
> **/foo.txt
> dir[A-Z]/
> """)
> ```
### get_parser_from_*
You can also create a parser from gitignore-type text, from a list of patterns
or from a `.gitignore`-type file. Using the `match` method of the
parser, you can evaluate paths directly.
```python
from py_walk import get_parser_from_file
parser = get_parser_from_file("path/to/gitignore-type-file")
if parser.match("file.txt"):
    print("file.txt matches!")
```
```python
from py_walk import get_parser_from_text
patterns = """
# some comment
*.txt
**/bar/*.dat
"""
parser = get_parser_from_text(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    print("file.txt matches!")
```
```python
from py_walk import get_parser_from_list
patterns = [
    "*.txt",
    "**/bar/*.dat",
]
parser = get_parser_from_list(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    ...
```
#### base_dir
`base_dir` denotes the directory where the files are stored. When you use
`get_parser_from_file`, `base_dir` is determined by the location of the
gitignore-type file passed as a parameter. Specifically, it is set to that
file's parent directory, which mirrors how Git treats a `.gitignore` file.

When using `get_parser_from_text` or `get_parser_from_list`, you can either
set `base_dir` explicitly or leave it out. If omitted, most matches still work,
because the provided path is simply compared to the patterns textually. In some
cases, however, the package needs to access the actual filesystem to resolve a
match. For instance, with a pattern like `foo/bar/` and the path `foo/bar`, a
match only occurs if `bar` is a directory. If `base_dir` is defined, the
package checks that `bar` exists and is a directory, and returns `True` in that
case. If `bar` is not a directory or `base_dir` is not defined, the result is
`False`. So while matching without a `base_dir` is entirely possible, be
mindful of the potential differences in results. This behavior is copied
directly from Git to stay as compatible with it as possible.
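
The `foo/bar/` case can be illustrated with a minimal standard-library sketch. It only compares paths literally and is not how py-walk resolves patterns: `toy_match` and its signature are hypothetical, written solely to show why a directory-only pattern cannot be confirmed without a `base_dir` to inspect.

```python
import tempfile
from pathlib import Path
from typing import Optional


def toy_match(pattern: str, path: str, base_dir: Optional[Path] = None) -> bool:
    """Compare literally; a trailing '/' in the pattern also demands a directory."""
    if pattern.rstrip("/") != path.rstrip("/"):
        return False
    if not pattern.endswith("/"):
        return True  # purely textual match, no filesystem access needed
    # Directory-only pattern: without base_dir there is nothing to inspect.
    return base_dir is not None and (base_dir / path).is_dir()


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "foo" / "bar").mkdir(parents=True)
    (base / "foo" / "baz").write_text("a regular file")

    with_dir = toy_match("foo/bar/", "foo/bar", base_dir=base)
    with_file = toy_match("foo/baz/", "foo/baz", base_dir=base)
    no_base = toy_match("foo/bar/", "foo/bar")
    textual = toy_match("foo/bar", "foo/bar")

print(with_dir, with_file, no_base, textual)  # True False False True
```

Only the first call can confirm that `bar` is a directory. The same pattern and path give `False` once `base_dir` is left out, while a pattern without the trailing slash matches on text alone.
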
## License
py-walk is available under the MIT license.
Raw data
{
"_id": null,
"home_page": null,
"name": "py-walk",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": "Andr\u00e9s Sope\u00f1a P\u00e9rez <code@ehmm.org>",
"download_url": "https://files.pythonhosted.org/packages/b3/b5/e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb/py_walk-0.3.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Filter filesystem paths based on gitignore-like patterns",
"version": "0.3.3",
"project_urls": {
"Bug Tracker": "https://github.com/pacha/py-walk/issues",
"Homepage": "https://github.com/pacha/py-walk"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "273856b67abdbf6797475dfe2f62d391b4a6ead851c76acbaf07e118e53651b6",
"md5": "afdda6c1fd6831b865b4e449a1d4d60c",
"sha256": "238fc018165138021ce0bfd9c351cdc473d3120ccc5534df35611b92608c94d5"
},
"downloads": -1,
"filename": "py_walk-0.3.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "afdda6c1fd6831b865b4e449a1d4d60c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 14537,
"upload_time": "2024-10-26T14:30:38",
"upload_time_iso_8601": "2024-10-26T14:30:38.060211Z",
"url": "https://files.pythonhosted.org/packages/27/38/56b67abdbf6797475dfe2f62d391b4a6ead851c76acbaf07e118e53651b6/py_walk-0.3.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b3b5e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb",
"md5": "a743ac333464f86eda47be9633b4e915",
"sha256": "a1b28d6079f27203fa3098b69a98572675b3ff5bd02286c43e6dacd66615f879"
},
"downloads": -1,
"filename": "py_walk-0.3.3.tar.gz",
"has_sig": false,
"md5_digest": "a743ac333464f86eda47be9633b4e915",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 1815727,
"upload_time": "2024-10-26T14:30:39",
"upload_time_iso_8601": "2024-10-26T14:30:39.421013Z",
"url": "https://files.pythonhosted.org/packages/b3/b5/e2f3fab1e11d4089b1c3dfd72175fdb2408ff8028e01bdb0d308923609bb/py_walk-0.3.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-26 14:30:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "pacha",
"github_project": "py-walk",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "py-walk"
}