| Field | Value |
| --- | --- |
| Name | pathql |
| Version | 0.0.8 |
| Summary | PathQL: A Python project for declarative file system querying. |
| Upload time | 2025-10-29 20:19:42 |
| Requires Python | >=3.10 |
| License | MIT |
| Keywords | pathlib, query, filesystem |
| Requirements | No requirements were recorded. |

## PathQL: Declarative Filesystem Query Language for Python
PathQL is a declarative, composable, and efficient query language for filesystem operations in Python. It enables expressive, readable, and powerful queries over files and directories, inspired by `pathlib` and modern query languages. PathQL is designed for performance (stat caching), extensibility, and testability, with robust operator overloading and a familiar, Pythonic API that requires few or no conversions between `datetime` objects, timestamps, and the vagaries of the `os.stat_result` object.
## Features
- **Declarative Query Syntax**: Compose filters using Python operators (`&`, `|`, `~`, etc.)
- **Pathlib-like Naming**: Filters and queries mimic `pathlib` conventions (e.g., `Stem`, `Suffix`, `FileType`)
- **Stat Caching**: Each file is stat-ed at most once, even if many filters check "stat-able" attributes of that file.
- **Threaded Filesystem Search**: Query engine uses a producer-consumer model with a dedicated thread for filesystem crawling and a main thread for filtering, improving responsiveness and throughput for large directory trees.
- **Extensible Filters**: Easily add new filters for custom logic
- **Comprehensive Testing**: Robust, parameterized pytest suite with coverage
## Basic Concepts
PathQL lets you write queries against the file system in a composable form:
```python
from pathql import AgeYears, Suffix, Query
for f in Query(where_expr=(AgeYears() > 1) & Suffix(".bak"), from_paths="c:/logs").files():
    print(f"Files to delete - {f.resolve()}")
```
```python
from pathql import AgeDays, Size, FileType, Suffix, Query
for f in Query(where_expr=(AgeDays() == 0) & (Size() > "10 mb") & Suffix("log"), from_paths="C:/logs", threaded=True).files():
    print(f"Files to zip - {f.resolve()}")
```
```python
from pathql import AgeDays, Size, Suffix, Query, ResultField
# Count, largest file size, and oldest file in the result set
query = Query(
    where_expr=(AgeDays() == 0) & (Size() > "10 mb") & Suffix("log"),
    from_paths="C:/logs",
    threaded=True,
)
result_set = query.select()
print(f"Number of files to zip: {result_set.count_()}")
print(f"Largest file size: {result_set.max(ResultField.SIZE)} bytes")
print(f"Oldest file: {result_set.min(ResultField.MTIME)}")
```
These examples show how `PathQL` hides the guts of the pathlib module and the os module from you with readable filter expressions. The real power comes with actions that allow you to apply a function to all of the files that match your query, in parallel.
### Composition and Filters
Filters are composable objects that match files based on attributes exposed by the `pathlib` package, such as size, age, suffix, stem, or type. Filters can be combined in arbitrarily complex ways using Python's overloadable `&`, `|`, and `~` operators and the methods that operate on queries.
#### Boolean Filter Functions (`All`/`Any`) and Operators: `&`, `|`, `~`
PathQL allows you to compose complex queries using the `&` (AND), `|` (OR), and `~` (NOT) operators. These operators work analogously to Python's built-in `and`, `or`, and `not`, but are implemented using operator overloading to allow chaining and composition of filter objects.
- `&` (AND): Combines two filters so that both must match for a file to be included.
- `|` (OR): Combines two filters so that either filter matching will include the file.
- `~` (NOT): Inverts the result of a filter, matching files that do **not** satisfy the filter.
- **Any(q1, q2, ...)** and **All(q1, q2, ...)**: composable queries that allow short-circuited sub-query composition.
**Precedence**
The query language implemented by `PathQL` should be thought of as a mathematical system built on Python's bitwise operators, so precedence follows the rules for `&`, `|`, and `~` rather than those for `and`/`or`/`not` (which cannot be overloaded). In particular, `&` and `|` bind more tightly than comparison operators such as `<` and `>`, so `AgeDays() < 2 & Suffix("txt")` is parsed as `AgeDays() < (2 & Suffix("txt"))`.
You must write
`(AgeDays() < 2) & (Suffix("txt"))`
for precedence to work as you expect, or use the `All`/`Any` methods.
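Why the parentheses matter can be checked with Python's own parser. The sketch below uses only the standard `ast` module, with no PathQL import; `a` and `b` are placeholder names, not PathQL objects. It reports the outermost operation Python builds for each expression:

```python
import ast


def top_level_node(expr: str) -> str:
    """Name the outermost AST node Python builds for an expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# `&` binds tighter than `<`, so this is a comparison: a < (2 & b)
unparenthesized = top_level_node("a < 2 & b")

# With parentheses the outermost operation is the `&`: (a < 2) & b
parenthesized = top_level_node("(a < 2) & b")

print(unparenthesized, parenthesized)
```

The first expression is a comparison whose right-hand side is `2 & b`, which is rarely what a query author intends.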
**Short-Circuiting Behavior**
PathQL's filter composition is designed to short-circuit, just like Python's boolean operators:
- For `A & B`, if filter `A` does not match, filter `B` is **not** evaluated.
- For `A | B`, if filter `A` matches, filter `B` is **not** evaluated.
- For chained filters (e.g., `A & B & C`), evaluation stops at the first filter that fails for AND, or the first that succeeds for OR.
- For `All(A,B,C)` or `Any(A,B,C)`, evaluation short-circuits as soon as `All` sees a `False` or `Any` sees a `True`.
Place expensive filters in the rightmost position of an expression so that cheaper filters can short-circuit them.
**Example 1**
```python
import pathlib
import time

from pathql.filters.base import Filter


class ExpensiveFilter(Filter):
    def match(self, path, now=None, stat_result=None):
        time.sleep(1)  # simulate a slow operation
        return True


class TrueFilter(Filter):
    def match(self, path, now=None, stat_result=None):
        return True


class FalseFilter(Filter):
    def match(self, path, now=None, stat_result=None):
        return False


# ExpensiveFilter is NOT called because FalseFilter fails first
combined = FalseFilter() & ExpensiveFilter()
result = combined.match(pathlib.Path("somefile.txt"))  # fast, no delay

# ExpensiveFilter is NOT called because TrueFilter succeeds first
combined = TrueFilter() | ExpensiveFilter()
result = combined.match(pathlib.Path("somefile.txt"))  # fast, no delay
```
**Example 2**
```python
# Assumes All and Any are imported from pathql, and reuses the filters defined in Example 1.
# ExpensiveFilter is NOT called because FalseFilter fails first. All mimics 'and'
combined = All(FalseFilter(), ExpensiveFilter())
result = combined.match(pathlib.Path("somefile.txt"))  # fast, no delay
# ExpensiveFilter is NOT called because TrueFilter succeeds first. Any mimics 'or'
combined = Any(TrueFilter(), ExpensiveFilter())
result = combined.match(pathlib.Path("somefile.txt"))  # fast, no delay
```
**Note:**
Filters should be pure functions without side effects, as short-circuiting means some filters may not be executed depending on the logic. While most filters that come with `pathql` run fairly quickly, real world use cases involve opening files and scanning contents for state information. Eliminating these checks can be VERY valuable.
**Summary:**
- Use `&`, `|`, and `~` to build complex queries.
- Parenthesize comparisons, e.g. `(Size() > "1mb")`, because `&` and `|` bind more tightly than comparison operators.
- Short-circuiting ensures efficient evaluation and skips unnecessary work.
- Place expensive filters downstream in your logic chains. Fast: Ext, Name, FileDate. Slow: Size, Type, Date, Age.
#### Example Filters
- `Size() <= 1_000_000` — files up to 1MB
- `Suffix({".png", ".jpg"})` — files with .png or .jpg extension
- `Stem("report_*")` — files whose stem matches a glob pattern (e.g., starts with "report_")
- `FileType().file` — match by entry type (`file`, `dir`, or `link`)
- `AgeMinutes() < 10` — modified in the last 10 minutes
- `Between(Size(), 1000, 2000)` — files with size >= 1000 and < 2000 bytes (inclusive lower, exclusive upper)
- `YearFilter(2024)`
- `MonthFilter("May")`
- `DayFilter(23)`
- `HourFilter(12)`
- `MinuteFilter(59)`
- `SecondFilter(40)`
- `FileDate().modified > dt.datetime(2025,1,1)`
### Using the `Between` Filter
The `Between` filter matches files whose attribute (such as size or age) falls within a specified range: inclusive on the lower bound, exclusive on the upper bound. This filter works with any attribute that can be reduced to comparable values. At this time, `Size` and `FileDate` work with `Between`.
**Example:** Find files with size >= 1KB and < 2KB:
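```python
import datetime as dt
from pathql.filters import Between, FileDate, Size  # FileDate location is assumed
from pathql.query import Query

in_january = Between(FileDate(), dt.datetime(2024, 1, 1), dt.datetime(2024, 2, 1))
found = False
for path in Query("/some/dir", Between(Size(), 1000, 2000) & in_january):
    found = True
    print(path)
if not found:  # a for/else would print this message even when files matched
    print("No files were found")
```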
## Query (engine)
The `Query` class is the driver for PathQL. It walks the filesystem, gathers stat metadata, and applies filter expressions to decide which files to yield. It exposes a small API and supports both single-threaded and producer/consumer (threaded) modes for flexible performance characteristics.
### Overview
- Purpose: efficiently find filesystem entries that match a filter expression.
- Model: the filesystem walk (producer) is separated from filtering/consumption (consumer) so IO-bound directory traversal can proceed concurrently with CPU-bound filter evaluation.
- Key features: stat caching, optional threading, recursion control, and eager or lazy collection via `select` and `files` respectively.
### Public API
- `Query(filter_expr)` — construct a Query with a filter expression (any `Filter`).
- `Query.match(path, now=None, stat_result=None)` — run the filter expression against a single `path`. Useful for programmatic checks or tests. `now` defaults to the current datetime; `stat_result` can be provided to avoid an additional `stat()` call.
- `Query.files(path, recursive=True, files_only=True, now=None, threaded=False)` — lazily yield matching `pathlib.Path` objects. Set `threaded=True` to enable the producer/consumer mode.
- `Query.select(path, ...)` — eagerly collect matches into a `ResultSet`.
### Threaded vs single-threaded
With `pathql` you get threading basically for free. Ask for it when you run a query and the query will use 5 threads by default, a number you can control. Threading helps with custom filters that are IO-bound; on a free-threaded build of Python it also helps with compute-bound filters.
- Single-threaded (`threaded=False`) walks the tree and filters each entry in a single loop. Simpler and predictable; good for small trees or when threading is not desired.
- Threaded (`threaded=True`) starts a producer thread that walks the filesystem and pushes `(path, stat_result)` tuples into a bounded `queue.Queue` (default maxsize: 10). The consumer (caller) reads from the queue and runs the filter expression. This can improve throughput when the filesystem is slow or when filter evaluation is relatively expensive.
- The producer places a `None` sentinel on the queue when traversal completes; the consumer stops after seeing the sentinel and joins the producer thread.
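The producer/consumer arrangement described above can be sketched with the standard library alone. This is an illustration of the pattern, not PathQL's actual implementation; `walk_producer` and `threaded_files` are hypothetical names, and the predicate is a plain callable rather than a PathQL filter:

```python
import pathlib
import queue
import tempfile
import threading


def walk_producer(root: pathlib.Path, out: queue.Queue) -> None:
    """Walk the tree, stat each entry once, and push (path, stat_result) tuples."""
    try:
        for path in root.rglob("*"):
            try:
                out.put((path, path.stat()))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    finally:
        out.put(None)  # sentinel: traversal is complete


def threaded_files(root: pathlib.Path, predicate, maxsize: int = 10):
    """Yield paths for which predicate(path, stat_result) is true."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)  # bounded, so the walker cannot run far ahead
    producer = threading.Thread(target=walk_producer, args=(root, q), daemon=True)
    producer.start()
    while (item := q.get()) is not None:
        path, stat_result = item
        if predicate(path, stat_result):
            yield path
    producer.join()


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "a.log").write_text("x" * 10)
    (root / "b.txt").write_text("y")
    matches = sorted(
        p.name for p in threaded_files(root, lambda p, st: p.suffix == ".log")
    )
    print(matches)
```

The bounded queue applies back-pressure: when the consumer's filters are slow, the producer blocks on `put` instead of buffering the whole tree in memory.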
### Stat caching and `stat_result`
- The producer (threaded mode) or the single-threaded walker performs `path.stat()` at most once per file and passes the `stat_result` along with the path to the filters. Filters should accept and use the provided `stat_result` to avoid extra system calls.
- `stat_result` is only generated when it is actually used; if other filter expressions require stat data, they use the cached version.
- All filters in PathQL accept optional `now` and `stat_result` parameters so that a single `now` timestamp can be used for consistent comparisons and so callers can reuse stat results.
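A minimal sketch of the "stat lazily, at most once" idea follows. `LazyStat` is a hypothetical helper written for this illustration; how PathQL stores its cached result internally is not specified here.

```python
import os
import pathlib
import tempfile
from typing import Optional


class LazyStat:
    """Hypothetical helper: call path.stat() on first use and reuse the result."""

    def __init__(self, path: pathlib.Path):
        self.path = path
        self.calls = 0  # number of real stat() calls, kept for demonstration
        self._result: Optional[os.stat_result] = None

    def get(self) -> os.stat_result:
        if self._result is None:
            self._result = self.path.stat()
            self.calls += 1
        return self._result


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "data.bin"
    target.write_bytes(b"\0" * 2048)
    cached = LazyStat(target)
    big_enough = cached.get().st_size > 1000  # first check: stats the file
    has_mtime = cached.get().st_mtime > 0     # second check: reuses the cache
    stat_calls = cached.calls
    print(big_enough, has_mtime, stat_calls)
```

A query that only checks names or suffixes never calls `get()`, so it pays for no `stat()` call at all.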
### Timing
In large directory trees with hundreds or thousands of files, the clock advances between the beginning and end of a query, sometimes by many seconds or more. This can lead to inconsistent results when ages are calculated. The design decision is to read the current time once at the start of a query (or use a user-provided time) and use it for all datetime calculations. This is imperfect: for example, a file that existed when the run started may have been deleted by the time it finishes. This problem could be solved in many ways, with varying degrees of "quality".
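The effect of fixing `now` once can be shown with plain `datetime` arithmetic. The helper name `age_in_days` is an assumption made for this sketch and is not part of PathQL:

```python
import datetime as dt


def age_in_days(mtime: float, now: dt.datetime) -> float:
    """Age of a POSIX timestamp relative to one fixed reference time."""
    return (now - dt.datetime.fromtimestamp(mtime)).total_seconds() / 86400


# One reference time is captured up front and reused for every file
now = dt.datetime(2025, 1, 10, 12, 0, 0)
mtimes = [
    dt.datetime(2025, 1, 9, 12, 0, 0).timestamp(),
    dt.datetime(2025, 1, 3, 12, 0, 0).timestamp(),
]
ages = [age_in_days(m, now) for m in mtimes]
print(ages)
```

Because every age is measured against the same `now`, a file near a threshold is classified the same way whether it is visited first or last in the walk.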
### Recursion, files vs directories
- `recursive=True` (default) uses `rglob("*")` to walk the tree; set `recursive=False` to restrict to the top-level directory via `glob("*")`.
- `files_only=True` (default) filters out non-files; set `files_only=False` to include directories in the results.
## ResultSet and ResultField
PathQL’s `ResultSet` is a materialized list of `pathlib.Path` objects returned by `Query.select()`. It provides aggregation, sorting, and statistical methods for analyzing query results. You can use `ResultField` to specify which file attribute to operate on.
### ResultSet Operations
| Operation | Description |
| ------------------------- | -------------------------------------- |
| `count_()` | Number of items in the result set |
| `sort_(field, ascending)` | Sort by a field (ascending/descending) |
| `top_n(field, n)` | Top N items by field |
| `bot_n(field, n)` | Bottom N items by field |
| `max(field)` | Maximum value for a field |
| `min(field)` | Minimum value for a field |
| `average(field)` | Average value for a field |
| `median(field)` | Median value for a field |
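For intuition, the aggregations in the table can be approximated with the standard library over a plain list of paths. This is a sketch of the idea for the size field only; `top_n_by_size` is a hypothetical helper, not a `ResultSet` method:

```python
import pathlib
import statistics
import tempfile


def top_n_by_size(paths, n):
    """Return the n largest paths by st_size, largest first."""
    return sorted(paths, key=lambda p: p.stat().st_size, reverse=True)[:n]


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name, size in [("a.log", 10), ("b.log", 300), ("c.log", 50)]:
        (root / name).write_bytes(b"x" * size)
    paths = list(root.glob("*.log"))
    sizes = [p.stat().st_size for p in paths]
    largest_two = [p.name for p in top_n_by_size(paths, 2)]
    median_size = statistics.median(sizes)
    max_size = max(sizes)
    print(largest_two, median_size, max_size)
```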
### ResultField Options
| Field | Description |
| --------------------- | ---------------------------------- |
| `SIZE` | File size in bytes |
| `MTIME` | Modification time (timestamp) |
| `CTIME` | Creation time (timestamp) |
| `ATIME` | Access time (timestamp) |
| `MTIME_DT` | Modification time (datetime) |
| `CTIME_DT` | Creation time (datetime) |
| `ATIME_DT` | Access time (datetime) |
| `NAME` | File name |
| `SUFFIX` | File extension |
| `STEM` | File stem (name without extension) |
| `PATH` | Full path as string |
| `PARENT` | Parent directory |
| `PARENTS` | All parent directories (tuple) |
| `PARENTS_STEM_SUFFIX` | (Parents, stem, suffix) tuple |
### Examples
```python
from pathql.filters import Suffix
from pathql.query import Query
from pathql.result_fields import ResultField
# Get a result set of all .log files
result_set = Query(where_expr=Suffix(".log"), from_paths="./logs").select()
# Count files
print(result_set.count_())
# Get top 3 largest files
top_3 = result_set.top_n(ResultField.SIZE, 3)
# Sort by creation time (ascending)
sorted_by_ctime = result_set.sort_(ResultField.CTIME, ascending=True)
# Get the latest modification time
latest = result_set.max(ResultField.MTIME)
```
## File and Suffix Filter Semantics
### File Filter
The `File` filter matches the full filename using shell-style glob patterns (via `fnmatch`). This means you can use wildcards like `*`, `?`, and character sets, but **curly-brace expansion** (e.g., `foo.{jpg,png}`) is **not supported**. Matching is case-insensitive.
**Examples:**
```python
File("*.jpg") # matches "photo.jpg", "archive.backup.jpg"
File("foo.*") # matches "foo.jpg", "foo.png", "foo.bmp"
File("*.tar.gz") # matches "archive.tar.gz", "foo.bar.tar.gz"
File("foo.*.back") # matches "foo.tif.back", "foo.txt.back"
File("*report*") # matches any file with "report" in the name
```
**Note:** Patterns like `foo.{jpg,png}` will NOT expand to match multiple extensions. Use wildcards or run multiple queries if you need to match several extensions.
### Suffix Filter
The `Suffix` filter matches file extensions by checking whether the filename ends with the full extension, including the dot. Patterns like 'bmp' and '.bmp' are both accepted and are normalized internally to '.bmp'.
- A leading dot is added to the extension pattern if it is missing, so 'bmp' and '.bmp' are treated identically.
- Matching is case-insensitive.
- The filter matches if the filename ends with the normalized pattern (e.g., '.bmp'), regardless of how many dots precede it.
- Wildcards (e.g., `*`, `?`) are supported in Suffix patterns via `fnmatch`.
**Examples:**
```python
Suffix(".jpg") # matches "photo.jpg", "archive.backup.jpg"
Suffix("jpg") # also matches "photo.jpg", "archive.backup.jpg"
Suffix(".tar.gz") # matches "archive.tar.gz", "foo.bar.tar.gz"
Suffix(".tif.back") # matches "image.tif.back", "foo.tif.back"
Suffix(".back") # matches "foo.back", "image.tif.back", "foo.txt.back"
Suffix(".foo") # matches "bar.foo", "baz.bar.foo"
Suffix("*.gz") # matches any file ending in ".gz"
```
Note on curly-brace expansion: `Suffix` supports simple brace expansion inside a single string. For example, `Suffix("{tif,jpg}")` expands to the two patterns ".tif" and ".jpg" (you can also include a base, e.g. ".{tif,jpg}"). This is a convenience for common multi-extension cases. By contrast, the `File` filter does NOT perform curly-brace expansion (it uses `fnmatch` on the entire filename); use multiple `Suffix` filters or explicit patterns for complex filename matching.
Multi-part extensions like '.tar.gz' or '.tif.back' are supported and will match files ending with those exact strings. Wildcards are supported for flexible matching.
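The rules above (dot normalization, case-insensitive matching, wildcards, and simple brace expansion) can be sketched in a few lines of standard-library code. This is one possible reading of the described behavior, not PathQL's source; `expand_braces`, `normalize`, and `suffix_matches` are hypothetical names:

```python
import fnmatch
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand one simple {a,b} group; patterns without braces pass through."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    return [head + alt + tail for alt in m.group(1).split(",")]


def normalize(pattern: str) -> str:
    """Lowercase the pattern and add a leading dot unless it starts with '.' or '*'."""
    pattern = pattern.lower()
    return pattern if pattern[:1] in {".", "*"} else "." + pattern


def suffix_matches(filename: str, pattern: str) -> bool:
    """True if the filename ends with any expansion of the suffix pattern."""
    name = filename.lower()
    return any(
        fnmatch.fnmatchcase(name, "*" + normalize(p))
        for p in expand_braces(pattern)
    )


print(suffix_matches("archive.backup.JPG", "jpg"))
print(suffix_matches("photo.png", "{tif,jpg}"))
```

Lowercasing both sides before calling `fnmatch.fnmatchcase` keeps the comparison case-insensitive on every platform, whereas plain `fnmatch.fnmatch` follows the operating system's case rules.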
For more advanced extension filtering, you can combine `Suffix` with other filters:
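```python
from pathql.filters import Suffix
from pathql.query import Query
# CameraMakeFilter stands for a user-defined custom filter (see Custom Filter Classes)
for path in Query("/photos", Suffix('.jpg') & CameraMakeFilter("Canon")):
    print(path)
```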
Example: brace expansion with `Suffix`:
```python
# matches files ending with .tif or .jpg
for p in Query("/images", Suffix("{tif,jpg}")):
    print(p)
```
### Custom Filter Classes
For more advanced logic, subclass `Filter` and implement the `match()` method. This gives you full control over how files are evaluated.
**Example: Custom Filter for minimum line count**
```python
import pathlib

from pathql.filters.base import Filter
from pathql.query import Query
from pathql.filters import Suffix


class MinLinesFilter(Filter):
    def __init__(self, min_lines: int):
        self.min_lines = min_lines

    def match(self, path: pathlib.Path, now=None, stat_result=None) -> bool:
        try:
            with path.open() as handle:  # close the file even if counting fails
                return sum(1 for _ in handle) >= self.min_lines
        except OSError:  # includes PermissionError and FileNotFoundError
            return False


# Place Suffix first for short-circuiting, then MinLinesFilter
for p in Query(where_expr=Suffix(".txt") & MinLinesFilter(100), from_paths="./docs"):
    print(p)
```
See `test/test_override.py` for more examples and patterns.
## Datetime Part Filters
PathQL provides powerful filters for matching files based on specific parts of their modification, creation, or access datetime. These filters allow precise queries for year, month, day, hour, minute, and second.
### Key Idea
You almost never need to interact with the raw `datetime` objects if you are dealing with time scales relative to today. The filter constructors and defaults are designed so that typical queries (by year, month, day, etc.) will just work. You specify the part you care about (e.g., `YearFilter(2025)`, `MonthFilter("may")`), and PathQL will filter data in that time period. If that time period is in the future (you ask for May and it is April)
### How `base`, `offset`, and `attr` Work
- **base**: The reference datetime for the filter. By default, this is the current date/time, but you can set it to any `datetime` object. This is useful for queries like "files from the same month last year" or "files from a specific day."
- **offset**: An integer that shifts the `base` by the specified number of years, months, days, etc., depending on the filter. For example, `offset=-1` with a `MonthFilter` and the default base will match last month.
- **attr**: Which file timestamp to use. Accepts user-friendly names (`"modified"`, `"created"`, `"accessed"`) or raw stat attribute names (`"st_mtime"`, etc.). This lets you filter by modification, creation, or access time as needed.
### Available Filters
- `YearFilter(year, base=None, offset=0, attr="modified")`
- `MonthFilter(month, base=None, offset=0, attr="created")`
- `DayFilter(day, base=None, offset=0, attr="accessed")`
- `HourFilter(hour, base=None, offset=0, attr="modified")`
- `MinuteFilter(minute, base=None, offset=0, attr="modified")`
- `SecondFilter(second, base=None, offset=0, attr="modified")`
### Arguments
- **value** (`int` or `str`): The part to match (e.g., year=2025, month="may" or 5, day=16, etc.)
- **base** (`datetime`, optional): Reference datetime for offset calculation. Defaults to now.
- **offset** (`int`, optional): Offset to apply to the base datetime (e.g., offset=1 for next year/month/day).
- **attr** (`str`, optional): Which stat attribute to use. Accepts "modified", "created", "accessed" or "st_mtime", "st_ctime", "st_atime".
### Matching Logic
- The filter extracts the relevant part(s) from the file's stat timestamp (using the specified attr).
- It compares the extracted part(s) to the filter's value(s).
- For `MonthFilter`, both string names ("may", "january") and integers (1-12) are supported.
- For `DayFilter`, `HourFilter`, `MinuteFilter`, `SecondFilter`, the filter matches only if all parts (year, month, day, etc.) match exactly.
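How `base` and `offset` could combine into a target period is sketched below with plain `datetime` arithmetic. The functions `shift_months`, `month_matches`, and `day_matches` are hypothetical helpers for illustration; PathQL's own matching code may differ in detail.

```python
import datetime as dt


def shift_months(base: dt.datetime, offset: int) -> tuple[int, int]:
    """Return (year, month) for base shifted by offset months."""
    index = base.year * 12 + (base.month - 1) + offset
    return index // 12, index % 12 + 1


def month_matches(timestamp: float, base: dt.datetime, offset: int = 0) -> bool:
    """True if the timestamp falls in the month selected by base and offset."""
    when = dt.datetime.fromtimestamp(timestamp)
    return (when.year, when.month) == shift_months(base, offset)


def day_matches(timestamp: float, base: dt.datetime, offset: int = 0) -> bool:
    """True if the timestamp falls on the calendar day selected by base and offset."""
    target = base + dt.timedelta(days=offset)
    when = dt.datetime.fromtimestamp(timestamp)
    return (when.year, when.month, when.day) == (target.year, target.month, target.day)


christmas_eve_file = dt.datetime(2022, 12, 24, 8, 30).timestamp()
print(day_matches(christmas_eve_file, dt.datetime(2022, 12, 25), offset=-1))
print(shift_months(dt.datetime(2023, 1, 15), -1))
```

Comparing the full `(year, month, day)` tuple rather than the day number alone is what keeps a day filter from matching the same day of every month.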
### Examples
| FilterName  | base                      | offset | attr     | Description                                                  |
| ----------- | ------------------------- | ------ | -------- | ------------------------------------------------------------ |
| HourFilter  | datetime(2022,10,28,11,0) | 2      | modified | Files will match if modified in the hour 13:00 on 2022-10-28 |
| HourFilter  | None (current time)       | 0      | created  | Files will match if created in the current hour              |
| DayFilter   | datetime(2022,1,1)        | 0      | accessed | Files will match if accessed on 2022-01-01                   |
| DayFilter   | None (current time)       | -1     | modified | Files will match if modified yesterday                       |
| MonthFilter | datetime(2023,6,1)        | -1     | created  | Files will match if created in May 2023                      |
| MonthFilter | None (current time)       | 0      | modified | Files will match if modified this month                      |
| YearFilter  | datetime(2020,1,1)        | 1      | modified | Files will match if modified in 2021                         |
| YearFilter  | None (current time)       | 0      | created  | Files will match if created this year                        |
**Example usage:**
```python
from datetime import datetime
# Keyword names follow the signatures listed under Available Filters
HourFilter(base=datetime(2022, 10, 28, 11, 0), offset=2, attr="modified")
HourFilter(base=None, offset=0, attr="created")
DayFilter(base=datetime(2022, 1, 1), offset=0, attr="accessed")
DayFilter(base=None, offset=-1, attr="modified")
MonthFilter(base=datetime(2023, 6, 1), offset=-1, attr="created")
MonthFilter(base=None, offset=0, attr="modified")
YearFilter(base=datetime(2020, 1, 1), offset=1, attr="modified")
YearFilter(base=None, offset=0, attr="created")
```
## Age Filters
PathQL provides convenient age-based filters that match files based on their modification age. These are higher-level than the datetime-part filters and are useful for queries like "files older than 30 days" or "files modified in the last 2 hours".
Supported filters:
- `AgeMinutes` — age in minutes
- `AgeHours` — age in hours
- `AgeDays` — age in days
- `AgeYears` — age in years
Comparison semantics:
- Only `<`, `<=`, `>`, and `>=` style comparisons are supported for age filters. The library treats `<` and `>` as inclusive (so they behave like `<=` and `>=` respectively) for convenience. Equality (`==`) and inequality (`!=`) comparisons are intentionally unsupported and will raise `TypeError`.
Examples:
```python
from pathql import AgeDays, AgeHours, AgeYears, Query
# Files older than 30 days
for p in Query("/var/log", AgeDays() > 30):
    print(p)
# Files modified in the last 2 hours
for p in Query("/tmp", AgeHours() < 2):
    print(p)
# Files older than one year
for p in Query("/archive", AgeYears() >= 1):
    print(p)
```
Note: age filters use the file modification time (`st_mtime`) and accept an optional `now` parameter when used programmatically (useful for reproducible tests). They also accept an optional `stat_result` to avoid extra `stat()` calls when you already have file metadata.
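The `attr` argument of the datetime part filters maps user-friendly names to stat attributes:
- "modified" → `st_mtime`
- "created" → `st_ctime`
- "accessed" → `st_atime`
- You can also use the raw stat attribute names directly.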
### Examples
```python
from pathql import YearFilter, MonthFilter, DayFilter, Query
import datetime as dt
# Files modified in 2025
query = YearFilter(2025)
for path in Query("/data", query):
    print(path)
# Files created in May
query = MonthFilter("may", attr="created")
for path in Query("/data", query):
    print(path)
# Files modified on October 16, 2025
query = DayFilter(base=dt.datetime(2025, 10, 16))
for path in Query("/data", query):
    print(path)
```
### More Examples: Yesterday, Last Month, and Relative Dates
```python
import datetime as dt
from pathql import DayFilter, MonthFilter, Query
# Files modified yesterday (-1 day from now)
query = DayFilter(offset=-1)
for path in Query("/data", query).files():
    print(path)
# Files modified last month (-1 month from this month)
query = MonthFilter(offset=-1)
for path in Query("/data", query).files():
    print(path)
# Files modified the day before Christmas 2022
christmas = dt.datetime(2022, 12, 25)
query = DayFilter(base=christmas, offset=-1)
for path in Query("/data", query).files():
    print(path)
```
Ideally, if you have a datetime value in your code, you will never need to deal with time spans, deltas, or complex date calculations.
## File Age Filters
PathQL supports filtering files by age using either filesystem timestamps or the date encoded in the filename. Filename dating offers a (possibly) more reliable way to target creation time in log files. Instead of using the operating system's timing data, it relies on a naming convention in which the file name starts with `YYYY-MM-DD-HH-`, `YYYY-MM-DD_`, `YY-MM_`, or `YY_`. This gets around creation times that are not reliable across operating systems, modification times that were changed by mistake, and system clocks that are not correct.
If a valid date is not found in the filename, the file simply does not match; this is not an error.
### Filename-based Age Filters
These filters use the date encoded in the filename (e.g. `YYYY-MM-DD_HH_{ArchiveName}.{EXT}`):
- `FilenameAgeMinutes`
- `FilenameAgeHours`
- `FilenameAgeDays`
- `FilenameAgeYears`
**Example usage:**
```python
import operator
import pathlib
from pathql.filters.fileage import FilenameAgeDays
# Find files whose filename date is less than 10 days old
age_filter = FilenameAgeDays(operator.lt, 10)
age_filter.match(pathlib.Path("2024-05-01_report.txt"))
```
Filename-based filters allow you to query files based on their logical age, independent of filesystem metadata.
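To make the idea concrete, here is a standard-library sketch of extracting a date prefix from a filename and turning it into an age. It handles only the full `YYYY-MM-DD` prefix with an optional hour, which is an assumption made for brevity; `filename_date` and `filename_age_days` are hypothetical names, not PathQL functions.

```python
import datetime as dt
import re
from typing import Optional

# Assumed layout: YYYY-MM-DD, an optional -HH or _HH, then a "-" or "_" separator
_DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})(?:[-_](\d{2}))?[-_]")


def filename_date(name: str) -> Optional[dt.datetime]:
    """Return the datetime encoded at the start of a filename, or None."""
    m = _DATE_PREFIX.match(name)
    if not m:
        return None
    year, month, day, hour = (int(g) if g else 0 for g in m.groups())
    try:
        return dt.datetime(year, month, day, hour)
    except ValueError:
        return None  # digits present but not a real date, e.g. month 13


def filename_age_days(name: str, now: dt.datetime) -> Optional[float]:
    """Age in days of the filename's date relative to now; None means no match."""
    stamp = filename_date(name)
    if stamp is None:
        return None
    return (now - stamp).total_seconds() / 86400


now = dt.datetime(2024, 5, 6)
print(filename_age_days("2024-05-01_report.txt", now))
print(filename_age_days("report.txt", now))
```

Returning `None` instead of raising mirrors the rule stated above: a file without a valid date in its name does not match, and that is not treated as an error.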
### File Actions
PathQL provides utilities for batch file operations on lists of `pathlib.Path` objects (or `ResultSet`). You can use a query to select files, then copy, move, delete or zip them:
```python
from pathlib import Path
from pathql.actions import copy_files, move_files, delete_files
from pathql.query import Query
from pathql.filters import Suffix
# Select all .txt files recursively from 'src'
src = Path("src")
files = Query(Suffix() == "txt").select(src, recursive=True)
destination = Path("dst")
# Each call below is an independent alternative; pick the one you need
copy_files(files, destination)
move_files(files, destination)
delete_files(files)
```
### Exception Handling in File Actions
All file actions (`copy_files`, `move_files`, `delete_files`) in PathQL are designed for robust batch processing. When you perform an action on a list of files:
- **Exceptions are caught per file**: If an error occurs (such as `IOError`, `PermissionError`, `OSError`, `FileNotFoundError`, or `NotADirectoryError`), it is caught for that specific file.
- **Result reporting**: The return value is a `FileActionResult` object, which contains:
  - `success`: a list of files that were processed successfully.
  - `failed`: a list of files that failed to process.
  - `errors`: a dictionary mapping each failed file to the exception that was raised.
  - `status`: True if the `failed` list is empty.
- **No silent failures**: By default, exceptions are caught and reported in the result object. If you set `ignore_access_exception=False`, the first exception encountered will be raised immediately.
- **Status property**: You can check `result.status` to see if all files were processed successfully (`True` if no failures).
**Example:**
```python
result = copy_files(files, destination)
if not result.status:
    print("Some files failed:")
    for path, exc in result.errors.items():
        print(f"{path}: {exc}")
```
This design ensures you always know which files succeeded, which failed, and why—making batch file operations safe and transparent.
> **Note:** If you perform a combined action (such as `zip_move_files`), the returned `FileActionResult` will reflect the status for both the zip and the move actions. The `success`, `failed`, and `errors` fields will include results from all underlying operations.
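
The per-file error handling described in this section follows a common batch-processing pattern, sketched here with the standard library. `ActionResultSketch` and `copy_files_sketch` are hypothetical stand-ins written for this illustration; they are not PathQL's `FileActionResult` or `copy_files`, and the `raise_on_error` flag is an assumed name.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ActionResultSketch:
    """Hypothetical result object with the fields described above."""
    success: list[Path] = field(default_factory=list)
    failed: list[Path] = field(default_factory=list)
    errors: dict[Path, Exception] = field(default_factory=dict)

    @property
    def status(self) -> bool:
        return not self.failed  # True only when nothing failed


def copy_files_sketch(files, destination: Path, raise_on_error: bool = False):
    """Copy each file, recording failures per file instead of aborting the batch."""
    result = ActionResultSketch()
    destination.mkdir(parents=True, exist_ok=True)
    for path in files:
        try:
            shutil.copy2(path, destination / path.name)
            result.success.append(path)
        except OSError as exc:  # covers PermissionError, FileNotFoundError, ...
            if raise_on_error:
                raise
            result.failed.append(path)
            result.errors[path] = exc
    return result


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    good = root / "good.txt"
    good.write_text("ok")
    missing = root / "missing.txt"  # never created, so the copy fails
    result = copy_files_sketch([good, missing], root / "dst")
    copied = [p.name for p in result.success]
    failed = [p.name for p in result.failed]
    all_ok = result.status
    print(copied, failed, all_ok)
```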
## Zip Actions
You can create zip archives from lists of files, with options to preserve directory structure and chain actions (zip+delete, zip+move, zip+copy):
```python
from pathlib import Path
from pathql.actions import zip_files, zip_delete_files
from pathql.query import Query
from pathql.filters import Suffix
# Select all .txt files recursively from 'src'
root = Path("src")
files = Query(Suffix() == "txt").select(root, recursive=True)
target_zip = Path("archive.zip")
# Zip files into the target archive, keeping paths relative to root
zip_files(files, root, target_zip, preserve_dir_structure=True)
# Alternatively, zip files into the target archive and delete the originals
zip_delete_files(files, root, target_zip)
```
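What "preserve directory structure" means in practice can be shown with the standard `zipfile` module. The function `zip_paths` below is a hypothetical sketch of the concept, not PathQL's `zip_files`:

```python
import tempfile
import zipfile
from pathlib import Path


def zip_paths(files, root: Path, target_zip: Path, preserve_dir_structure: bool = True) -> None:
    """Write files into target_zip, storing paths relative to root when requested."""
    with zipfile.ZipFile(target_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(root) if preserve_dir_structure else Path(path.name)
            zf.write(path, arcname=arcname.as_posix())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "src"
    (root / "sub").mkdir(parents=True)
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    files = sorted(root.rglob("*.txt"))
    target = Path(tmp) / "archive.zip"
    zip_paths(files, root, target)
    with zipfile.ZipFile(target) as zf:
        names = sorted(zf.namelist())
    print(names)
```

With `preserve_dir_structure=False`, both entries would be stored by bare filename, so files with the same name in different folders would collide inside the archive.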
## UV Example
### 1. Find JPEG images by color mode
`PathQL` has a small CLI app called `pql.py` that can be run from the command line using `uv`. It is a basic app that pulls in `pathql` and `PIL`. It has a custom filter object that looks into image file metadata and extracts the color depth, allowing you to filter files by color depth. The demo enables filtering by color depth, age, and size.
```bash
cli % uv run --script pql.py "*.jpg" RGB
black_1kb.jpg
```
This command will:
- Build a virtual environment with python 3.14 and install pathql and PIL
- Search for files matching the pattern `*.jpg` in the current directory
- Filter for images with color mode `RGB`
You can add more options, such as file size and age:
```bash
cli % uv run --script pql.py "*.jpg" RGB --size-min 10kb --size-max 2mb --min-age 0 --max-age 365
```
This will:
- Build a virtual environment with python 3.14 and install pathql and PIL
- Find JPEG images with RGB color mode
- Only include files between 10KB and 2MB
- Only include files created between 0 and 365 days ago
- Return nothing, because no file satisfies the size constraint.
### Arguments
- **pattern** (positional): Glob pattern for files (e.g., `"*.jpg"`)
- **col_mode** (positional): Comma-separated color modes (e.g., `"RGB,RGBA,L"`)
- **--root**: Folder to search (default: current directory)
- **--size-min**: Minimum file size in bytes (default: 0)
- **--size-max**: Maximum file size in bytes (default: 1TB)
- **--min-age**: Minimum file creation age in days (default: 0)
- **--max-age**: Maximum file creation age in days (default: 100000)
## Note on Types: Operator Overloads in DSLs
PathQL filter classes (like `Size`, `AgeDays`, etc.) intentionally override Python's comparison and boolean operator methods (`__eq__`, `__lt__`, etc.) to return filter objects, not `bool`. This enables expressive, composable query logic:
```python
Size() > "10MB" # returns a filter, not a bool
```
This pattern is common in query builders and DSLs, but it conflicts with Python's type system and expectations for these dunder methods (which are supposed to return `bool`). As a result, type checkers (like mypy, Pyright) and some linters will report errors or warnings about return types and operator overloads.
**Why?**
- Python expects `__eq__`, `__lt__`, etc. to return `bool` for built-in types.
- DSLs return filter objects so you can build up complex queries using operators.
- This is deliberate and necessary for PathQL's design, but it is not type-safe according to Python's current type system.
**What to do?**
- Ignore these type errors for filter classes, or use `# type: ignore` comments.
- Document this behavior for users and contributors.
- If you need strict `bool` results, use explicit methods or evaluate filters directly.
**Community Note:**
This is a known limitation in Python's type system. A future PEP may address DSL-friendly operator overloads, but for now, this pattern will always cause typechecker friction.
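The pattern discussed in this note can be reproduced in miniature. The toy classes below (`Predicate` and `SizeSketch`, both hypothetical and unrelated to PathQL's real classes) show a comparison operator returning a composable object instead of a `bool`, along with the `# type: ignore` a type checker would typically require:

```python
import pathlib
import tempfile


class Predicate:
    """A composable test on a path; operators build new predicates."""

    def __init__(self, func):
        self.func = func

    def match(self, path: pathlib.Path) -> bool:
        return self.func(path)

    def __and__(self, other: "Predicate") -> "Predicate":
        # Python's `and` short-circuits, so `other` runs only if `self` matched
        return Predicate(lambda p: self.match(p) and other.match(p))

    def __or__(self, other: "Predicate") -> "Predicate":
        return Predicate(lambda p: self.match(p) or other.match(p))

    def __invert__(self) -> "Predicate":
        return Predicate(lambda p: not self.match(p))


class SizeSketch:
    """Toy size attribute whose comparison yields a Predicate, not a bool."""

    def __gt__(self, limit: int) -> Predicate:  # type: ignore[override]
        return Predicate(lambda p: p.stat().st_size > limit)


with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.bin"
    sample.write_bytes(b"\0" * 2048)
    expr = (SizeSketch() > 1000) & ~(SizeSketch() > 5000)
    is_predicate = isinstance(expr, Predicate)
    matched = expr.match(sample)
    print(is_predicate, matched)
```

Nothing is evaluated when `expr` is built; the file is only examined when `match` is called, which is what lets a query engine apply one expression to many paths.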
---
## Developer & Release Conventions
Project-level conventions, contributor guidance, and release steps are maintained in `AI_CONTEXT.md` in the repository root. Please consult that file for the latest instructions on coding style, testing, and release procedures.
---
## Release Summary
### Version: **v0.0.8** | Date: 2025-10-29
#### Highlights
- **README.md refactored:** Fixed an issue in the `README.md` file. The AI-generated changes were not double-checked.
Raw data
{
"_id": null,
"home_page": null,
"name": "pathql",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "pathlib, query, filesystem",
"author": null,
"author_email": "Chuck Bass <chuck@acrocad.net>",
"download_url": "https://files.pythonhosted.org/packages/07/15/f7805fc254566f7c1c8dab2d7735c8686aef73e944b524f0ca1b375ea97f/pathql-0.0.8.tar.gz",
"platform": null,
"description": "\n  \n\n\n## PathQL: Declarative Filesystem Query Language for Python\n\nPathQL is a declarative, composable, and efficient query language for filesystem operations in Python. It enables expressive, readable, and powerful queries over files and directories, inspired by `pathlib` and modern query languages. PathQL is designed for performance (stat caching), extensibility, and testability, with robust operator overloading and a familiar, Pythonic API requiring no (or VERY few) interactions and conversions with `datetimes`, `timestaps` and vaguaries of the `os.stat_result` object.\n\n## Features\n\n- **Declarative Query Syntax**: Compose filters using Python operators (`&`, `|`, `~`, etc.)\n- **Pathlib-like Naming**: Filters and queries mimic `pathlib` conventions (e.g., `Stem`, `Suffix`, `FileType`)\n- **Stat Caching**: Each file is stat-ed at-most, once, even if there are many checks against \"stat-able\" parts of a file.\n- **Threaded Filesystem Search**: Query engine uses a producer-consumer model with a dedicated thread for filesystem crawling and a main thread for filtering, improving responsiveness and throughput for large directory trees.\n- **Extensible Filters**: Easily add new filters for custom logic\n- **Comprehensive Testing**: Robust, parameterized pytest suite with coverage\n\n## Basic Concepts\n\nPathQL lets you write queries against the file system in a composable form:\n\n```python\nfrom pathql import AgeYears, Suffix, Query\nfor f in Query(where_expr=(AgeYears() > 1) & Suffix(\".bak\"), from_paths=\"c:/logs\").files():\n print(f\"Files to delete - {f.resolve()}\")\n```\n\n```python\nfrom pathql import AgeDays, Size, FileType, Suffix, Query\nfor f in Query(where_expr=(AgeDays() == 0) & (Size() > \"10 mb\") & Suffix(\"log\"), from_paths=\"C:/logs\", threaded=True).files():\n print(f\"Files to zip - {f.resolve()}\")\n```\n\n```python\nfrom pathql import AgeDays, Size, Suffix, Query,ResultField\n\n# Count, largest file size, and oldest 
file in the result set\nquery = Query(\n where_expr=(AgeDays() == 0) & (Size() > \"10 mb\") & Suffix(\"log\"),\n from_paths=\"C:/logs\",\n threaded=True\n)\nresult_set = query.select()\n\nprint(f\"Number of files to zip: {result_set.count_()}\")\nprint(f\"Largest file size: {result_set.max(ResultField.SIZE)} bytes\")\nprint(f\"Oldest file: {result_set.min(ResultField.MTIME)}\")\n```\n\nThese examples show how `PathQL` hides the guts of the pathlib module and the os module from you with readable filter expressions. The real power comes with actions that allow you to apply a function to all of the files that match your query, in parallel.\n\n\n\n### Composition and Filters\n\nFilters are composable objects that match files based on many of the attributes exposed by the `pathlib` Package. Attributes such as size, age, suffix, stem, or type. Each filter can be combined using Python's over-loadable boolean operators. Those filters can be composed in arbitrarily complex ways using boolean operators and methods that operate on queries.\n\n# Boolean Filter Functions(all/any) and Operators: `&`, `|`, `~`\n\nPathQL allows you to compose complex queries using the `&` (AND), `|` (OR), and `~` (NOT) operators. 
These operators work analogously to Python's built-in `and`, `or`, and `not`, but are implemented using operator overloading to allow chaining and composition of filter objects.\n\n- `&` (AND): Combines two filters so that both must match for a file to be included.\n- `|` (OR): Combines two filters so that either filter matching will include the file.\n- `~` (NOT): Inverts the result of a filter, matching files that do **not** satisfy the filter.\n- **Any(q1,q2,..)** and **All(q1,q2,...** composable queries allow for short circuited sub query composition.\n\n**Precedence**\n\nThe query language implemented by `PathQL` should be thought of as a mathematical system that uses pythons mathematical operators, as such the rules of precedence follow those for '&', `|` and `^`, which have different precedence than the `and`/`or` operators (which cannot be overloaded).\n\nYou must say\n\n`(AgeDays() < 2) & (Suffix(\"txt))`\n\nfor precedence to work as you expect, or use the `All`/`Any` methods.\n\n**Short-Circuiting Behavior**\n\nPathQL's filter composition is designed to short-circuit, just like Python's boolean operators:\n\n- For `A & B`, if filter `A` does not match, filter `B` is **not** evaluated.\n- For `A | B`, if filter `A` matches, filter `B` is **not** evaluated.\n- For chained filters (e.g., `A & B & C`), evaluation stops at the first filter that fails for AND, or the first that succeeds for OR.\n- For `All(A,B,C)` or `Any(A,B,C)`, the operation will short circuit once a `All` sees a `False` or `Any` sees a `True`.\n\nPlace expensive filters at right most position in expressions so they can be short circuited.\n\n**Example 1**\n\n```python\nfrom pathql.filters.base import Filter, AndFilter, OrFilter\n\nclass ExpensiveFilter(Filter):\n def match(self, path, now=None, stat_result=None):\n # Simulate a slow operation\n import time\n time.sleep(1)\n return True\n\nclass TrueFilter(Filter):\n def match(self, path, now=None, stat_result=None):\n return True\n\nclass 
FalseFilter(Filter):\n def match(self, path, now=None, stat_result=None):\n return False\n\n# ExpensiveFilter will NOT be called because CheapFilter fails\ncombined = FalseFilter() & ExpensiveFilter()\nresult = combined.match(pathlib.Path(\"somefile.txt\")) # Fast, no delay\n\n# ExpensiveFilter will NOT be called because CheapFilter succeeds\ncombined = TrueFilter() | ExpensiveFilter()\nresult = combined.match(pathlib.Path(\"somefile.txt\")) # Fast, no delay\n```\n\n**Example 2**\n```python\n# ExpensiveFilter will NOT be called because CheapFilter fails. All Function mimics 'and'\ncombined = All(FalseFilter(), ExpensiveFilter())\nresult = combined.match(pathlib.Path(\"somefile.txt\")) # Fast, no delay\n\n# ExpensiveFilter will NOT be called because CheapFilter succeeds. Any function mimics 'or'\ncombined = Any(TrueFilter(), ExpensiveFilter())\nresult = combined.match(pathlib.Path(\"somefile.txt\")) # Fast, no delay\n```\n\n**Note:**\nFilters should be pure functions without side effects, as short-circuiting means some filters may not be executed depending on the logic. While most filters that come with `pathql` run fairly quickly, real world use cases involve opening files and scanning contents for state information. 
Eliminating these checks can be VERY valuable.\n\n**Summary:**\n\n- Use `&`, `|`, and `~` to build complex queries.\n- Follow precedence rules (Size()>\"1mb\")\n- Short-circuiting ensures efficient evaluation and skips unnecessary work.\n- Place expensive filters down stream in your logic chains Fast=(Ext,Name,FileDate) slow = (Size,Type,Date,Age)\n\n#### Example Filters\n\n- `Size() <= 1_000_000` \u2014 files up to 1MB\n- `Suffix({\".png\", \".jpg\"})` \u2014 files with .png or .jpg extension\n- `Stem(\"report_*\")` \u2014 files whose stem matches a glob pattern (e.g., starts with \"report_\")\n- `FileType().file` \u2014 'file, 'dir', 'link'\n- `AgeMinutes() < 10` \u2014 modified in the last 10 minutes\n- `Between(Size(), 1000, 2000)` \u2014 files with size >= 1000 and < 2000 bytes (inclusive lower, exclusive upper)\n- `YearFIlter(2024)`\n- `MonthFIlter(\"May\")`\n- `DayFilter(23)`\n- `HourFilter(12)`\n- `MinuteFilter(59)`\n- `SecondFilter(40)`\n- `FileDate().modified > dt.datetime(2025,1,1)`\n\n### Using the `Between` Filter\n\nThe `Between` filter matches files whose attribute (such as size or age) falls within a specified range: inclusive on the lower bound, exclusive on the upper bound. This filter works with values that can be reduced to comparable values. At this time `Size` and `FileDate` work with between.\n\n**Example:** Find files with size >= 1KB and < 2KB:\n\n```python\nfrom pathql.filters import Between, Size\nfrom pathql.query import Query\n\nfor path in Query(\"/some/dir\", Between(Size(), 1000, 2000) & Between(FileDate(),dt.datetime(2024,1,1),dt.datetime(2024,2,1))):\n print(path)\nelse:\n print(\"No files were found\")\n```\n\n## Query (engine)\n\nThe `Query` class is the driver for PathQL. It walks the filesystem, gathers stat metadata, and applies filter expressions to decide which files to yield. 
It exposes a small API and supports both single-threaded and producer/consumer (threaded) modes for flexible performance characteristics.\n\n### Overview\n\n- Purpose: efficiently find filesystem entries that match a filter expression. - Model: the filesystem walk (producer) is separated from filtering/consumption (consumer) so IO-bound directory traversal can proceed concurrently with CPU bound filter evaluation.\n- Key features: stat caching, optional threading, recursion control, and\n eager or lazy collection via `select` and `files` respectively.\n\n### Public API\n\n- `Query(filter_expr)` \u2014 construct a Query with a filter expression (any `Filter`).\n- `Query.match(path, now=None, stat_result=None)` \u2014 run the filter expression against a single `path`. Useful for programmatic checks or tests. `now` defaults to the current datetime `stat_result` can be provided to avoid an additional `stat()` call.\n- `Query.files(path, recursive=True, files_only=True, now=None, threaded=False)` \u2014 lazily yield matching `pathlib.Path` objects. Set `threaded=True` to enable the producer/consumer mode.\n- `Query.select(path, ...)` \u2014 eagerly collect matches into a `ResultSet`.\n\n### Threaded vs single-threaded\n\nWith `pathql` you get threading basically for free. Just ask for it when you run a query and your query will use 5 threads, though you have control over this. For custom filters that are IO bound threading will help, if you are running a free-threaded version of python it will also help on compute bound problems.\n\n- Single-threaded (`threaded=False`) walks the tree and filters each entry in a single loop. Simpler and predictable; good for small trees or when threading\n is not desired.\n- Threaded (`threaded=True`) starts a producer thread that walks the filesystem and pushes `(path, stat_result)` tuples into a bounded `queue.Queue` (default maxsize: 10). The consumer (caller) reads from the queue and runs the filter expression. 
This can improve throughput when the filesystem is slow or when filter evaluation is relatively expensive.\n- The producer places a `None` sentinel on the queue when traversal completes; the consumer stops after seeing the sentinel and joins the producer thread.\n\n### Stat caching and `stat_result`\n\n- The producer (threaded mode) or the single-threaded walker performs `path.stat()` once per file (at most) and passes the `stat_result` along with the path to the filters. Filters should accept and refer the provided `stat_result` to avoid extra system calls.\n- `stat-result` is only generated when it is actually used and if other filter expressions require stat, they use a cached version.\n- All filters in PathQL accept optional `now` and `stat_result` parameters so that a single `now` timestamp can be used for consistent comparisons and\n so callers can reuse stat results.\n\n### Timing ###\n\nIN large directory trees with 100's or 1000's of files the time changes between the beginning and end of a query, sometimes by many seconds or more. This can lead to race conditions when ages are calculated. The design decision that was made is that at the start of a query the current time is read (or a user provided one is used) for all datetime calculations. This is imperfect because there are many possible issues. For example, a file existed when you started the run but was deleted when you finished. There are many ways that this problem could be solved with varying degrees of \"quality\".\n\n### Recursion, files vs directories\n\n- `recursive=True` (default) uses `rglob(\"*\")` to walk the tree; set `recursive=False` to restrict to the top-level directory via `glob(\"*\")`.\n- `files_only=True` (default) filters out non-files; set `files=False` to include directories in the results.\n\n## ResultSet and ResultField\n\nPathQL\u2019s `ResultSet` is a materialized list of `pathlib.Path` objects returned by `Query.select()`. 
It provides aggregation, sorting, and statistical methods for analyzing query results. You can use `ResultField` to specify which file attribute to operate on.\n\n### ResultSet Operations\n\n| Operation | Description |\n| ------------------------- | -------------------------------------- |\n| `count_()` | Number of items in the result set |\n| `sort_(field, ascending)` | Sort by a field (ascending/descending) |\n| `top_n(field, n)` | Top N items by field |\n| `bot_n(field, n)` | Bottom N items by field |\n| `max(field)` | Maximum value for a field |\n| `min(field)` | Minimum value for a field |\n| `average(field)` | Average value for a field |\n| `median(field)` | Median value for a field |\n\n### ResultField Options\n\n| Field | Description |\n| --------------------- | ---------------------------------- |\n| `SIZE` | File size in bytes |\n| `MTIME` | Modification time (timestamp) |\n| `CTIME` | Creation time (timestamp) |\n| `ATIME` | Access time (timestamp) |\n| `MTIME_DT` | Modification time (datetime) |\n| `CTIME_DT` | Creation time (datetime) |\n| `ATIME_DT` | Access time (datetime) |\n| `NAME` | File name |\n| `SUFFIX` | File extension |\n| `STEM` | File stem (name without extension) |\n| `PATH` | Full path as string |\n| `PARENT` | Parent directory |\n| `PARENTS` | All parent directories (tuple) |\n| `PARENTS_STEM_SUFFIX` | (Parents, stem, suffix) tuple |\n\n### Examples\n\n```python\nfrom pathql.query import Query\nfrom pathql.result_fields import ResultField\n\n# Get a result set of all .log files\nresult_set = Query(where_expr=Suffix(\".log\"), from_paths=\"./logs\").select()\n\n# Count files\nprint(result_set.count_())\n\n# Get top 3 largest files\ntop_3 = result_set.top_n(ResultField.SIZE, 3)\n\n# Sort by creation time (ascending)\nsorted_by_ctime = result_set.sort_(ResultField.CTIME, ascending=True)\n\n# Get file with latest modification time\nlatest = result_set.max(ResultField.MTIME)\n```\n\n\n## File and Suffix Filter Semantics\n\n### File 
Filter\n\nThe `File` filter matches the full filename using shell-style glob patterns (via `fnmatch`). This means you can use wildcards like `*`, `?`, and character sets, but **curly-brace expansion** (e.g., `foo.{jpg,png}`) is **not supported**. Matching is case-insensitive.\n\n**Examples:**\n\n```python\nFile(\"*.jpg\") # matches \"photo.jpg\", \"archive.backup.jpg\"\nFile(\"foo.*\") # matches \"foo.jpg\", \"foo.png\", \"foo.bmp\"\nFile(\"*.tar.gz\") # matches \"archive.tar.gz\", \"foo.bar.tar.gz\"\nFile(\"foo.*.back\") # matches \"foo.tif.back\", \"foo.txt.back\"\nFile(\"*report*\") # matches any file with \"report\" in the name\n```\n\n**Note:** Patterns like `foo.{jpg,png}` will NOT expand to match multiple extensions. Use wildcards or run multiple queries if you need to match several extensions.\n\n### Suffix Filter\n\nThe `Suffix` filter match file extensions by checking if the filename ends with the full extension, including the dot. Patterns like 'bmp' or '.bmp' are both accepted, but are normalized internally to '.bmp'.\n\n- Any leading dot in the extension pattern is added if missing, so 'bmp' and '.bmp' are treated identically.\n- Matching is case-insensitive.\n- The filter matches if the filename ends with the normalized pattern (e.g., '.bmp'), regardless of how many dots precede it.\n- Wildcards (e.g., `*`, `?`) are supported in Suffix patterns via `fnmatch`.\n\n**Examples:**\n\n```python\nSuffix(\".jpg\") # matches \"photo.jpg\", \"archive.backup.jpg\"\nSuffix(\"jpg\") # also matches \"photo.jpg\", \"archive.backup.jpg\"\nSuffix(\".tar.gz\") # matches \"archive.tar.gz\", \"foo.bar.tar.gz\"\nSuffix(\".tif.back\") # matches \"image.tif.back\", \"foo.tif.back\"\nSuffix(\".back\") # matches \"foo.back\", \"image.tif.back\", \"foo.txt.back\"\nSuffix(\".foo\") # matches \"bar.foo\", \"baz.bar.foo\"\nSuffix(\"*.gz\") # matches any file ending in \".gz\"\n```\n\nNote on curly-brace expansion: `Suffix` supports simple brace expansion inside a single string. 
For example, `Suffix(\"{tif,jpg}\")` expands to the two patterns\n\".tif\" and \".jpg\" (you can also include a base, e.g. \".{tif,jpg}\"). This is a convenience for common multi-extension cases. By contrast, the\n`File` filter does NOT perform curly-brace expansion (it uses `fnmatch` on the entire filename) \u2014 use multiple `Suffix` filters or explicit patterns for complex filename matching.\n\nMulti-part extensions like '.tar.gz' or '.tif.back' are supported and will match files ending with those exact strings. Wildcards are supported for flexible matching.\n\nFor more advanced extension filtering, you can combine `Suffix` with other filters:\n\n```python\nfrom pathql.query import Query\nfor path in Query(\"/photos\", Suffix('.jpg') & CameraMakeFilter(\"Canon\")):\n print(path)\n```\n\nExample: brace expansion with `Suffix`:\n\n```python\n# matches files ending with .tif or .jpg\nfor p in Query(\"/images\", Suffix(\"{tif,jpg}\")):\n print(p)\n```\n\n\n\n### Custom Filter Classes\n\nFor more advanced logic, subclass `Filter` and implement the `match()` method. 
This gives you full control over how files are evaluated.\n\n**Example: Custom Filter for minimum line count**\n```python\nfrom pathql.filters.base import Filter\nfrom pathql.query import Query\nfrom pathql.filters import Suffix\n\nclass MinLinesFilter(Filter):\n def __init__(self, min_lines: int):\n self.min_lines = min_lines\n\n def match(self, path: pathlib.Path, now=None, stat_result=None) -> bool:\n try:\n return sum(1 for _ in path.open()) >= self.min_lines\n except (IOError,PermissionError,FileNotFoundError):\n return False\n\n# Place Suffix first for short-circuiting, then MinLinesFilter\nfor p in Query(where_expr=Suffix(\".txt\") & MinLinesFilter(100), from_paths=\"./docs\"):\n print(p)\n```\n\nSee `test/test_override.py` for more examples and patterns.\n\n## Datetime Part Filters\n\nPathQL provides powerful filters for matching files based on specific parts of their modification, creation, or access datetime. These filters allow precise queries for year, month, day, hour, minute, and second.\n\n### Key Idea\n\nYou almost never need to interact with the raw `datetime` objects if you are dealing with time scales relative to today. The filter constructors and defaults are designed so that typical queries (by year, month, day, etc.) will just work. You specify the part you care about (e.g., `YearFilter(2025)`, `MonthFilter(\"may\")`), and PathQL will filter data in that time period. If that time period is in the future (you ask for May and it is April)\n\n### How `base`, `offset`, and `attr` Work\n\n- **base**: The reference datetime for the filter. By default, this is the current date/time, but you can set it to any `datetime` object. This is useful for queries like \"files from the same month last year\" or \"files from a specific day.\"\n- **offset**: An integer that shifts the `base` by the specified number of years, months, days, etc., depending on the filter. 
For example, `offset=-1` with a `MonthFilter` and the default base will match last month.\n- **attr**: Which file timestamp to use. Accepts user-friendly names (`\"modified\"`, `\"created\"`, `\"accessed\"`) or raw stat attribute names (`\"st_mtime\"`, etc.). This lets you filter by modification, creation, or access time as needed.\n\n### Available Filters\n\n- `YearFilter(year, base=None, offset=0, attr=\"modified\")`\n- `MonthFilter(month, base=None, offset=0, attr=\"created\")`\n- `DayFilter(day, base=None, offset=0, attr=\"accessed\")`\n- `HourFilter(hour, base=None, offset=0, attr=\"modified\")`\n- `MinuteFilter(minute, base=None, offset=0, attr=\"modified\")`\n- `SecondFilter(second, base=None, offset=0, attr=\"modified\")`\n\n### Arguments\n\n- **value** (`int` or `str`): The part to match (e.g., year=2025, month=\"may\" or 5, day=16, etc.)\n- **base** (`datetime`, optional): Reference datetime for offset calculation. Defaults to now.\n- **offset** (`int`, optional): Offset to apply to the base datetime (e.g., offset=1 for next year/month/day).\n- **attr** (`str`, optional): Which stat attribute to use. Accepts \"modified\", \"created\", \"accessed\" or \"st_mtime\", \"st_ctime\", \"st_atime\".\n\n### Matching Logic\n\n- The filter extracts the relevant part(s) from the file's stat timestamp (using the specified attr).\n- It compares the extracted part(s) to the filter's value(s).\n- For `MonthFilter`, both string names (\"may\", \"january\") and integers (1-12) are supported.\n- For `DayFilter`, `HourFilter`, `MinuteFilter`, `SecondFilter`, the filter matches only if all parts (year, month, day, etc.) 
match exactly.\n\n\n### Examples\n\n| FilterName | base_time | offset | attr | Description |\n| ----------- | ------------------------- | ------ | ----- | ------------------------------------------------------------ |\n| HourFilter | datetime(2022,10,28,11,0) | 2 | mtime | Files will match if modified in the hour 13:00 on 2022-10-28 |\n| HourFilter | None (current_time) | 0 | ctime | Files will match if created in the current hour |\n| DayFilter | datetime(2022,1,1) | 0 | atime | Files will match if accessed on 2022-01-01 |\n| DayFilter | None (current_time) | -1 | mtime | Files will match if modified yesterday |\n| MonthFilter | datetime(2023,6,1) | -1 | ctime | Files will match if created in May 2023 |\n| MonthFilter | None (current_time) | 0 | mtime | Files will match if modified this month |\n| YearFilter | datetime(2020,1,1) | 1 | mtime | Files will match if modified in 2021 |\n| YearFilter | None (current_time) | 0 | ctime | Files will match if created this year |\n\n**Example usage:**\n```python\nHourFilter(base_time=datetime(2022,10,28,11,0), offset=2, attr=\"mtime\")\nHourFilter(base_time=None, offset=0, attr=\"ctime\")\nDayFilter(base_time=datetime(2022,1,1), offset=0, attr=\"atime\")\nDayFilter(base_time=None, offset=-1, attr=\"mtime\")\nMonthFilter(base_time=datetime(2023,6,1), offset=-1, attr=\"ctime\")\nMonthFilter(base_time=None, offset=0, attr=\"mtime\")\nYearFilter(base_time=datetime(2020,1,1), offset=1, attr=\"mtime\")\nYearFilter(base_time=None, offset=0, attr=\"ctime\")\n```\n\n## Age Filters\n\nPathQL provides convenient age-based filters that match files based on their modification age. 
These are higher-level than the datetime-part filters and are useful for queries like \"files older than 30 days\" or \"files modified in the last 2 hours\".\n\nSupported filters:\n\n- `AgeMinutes` \u2014 age in minutes\n- `AgeHours` \u2014 age in hours\n- `AgeDays` \u2014 age in days\n- `AgeYears` \u2014 age in years\n\nComparison semantics:\n\n- Only `<`, `<=`, `>`, and `>=` style comparisons are supported for age filters. The library treats `<` and `>` as inclusive (so they behave like `<=` and `>=` respectively) for convenience. Equality (`==`) and inequality (`!=`) comparisons are intentionally unsupported and will raise `TypeError`.\n\nExamples:\n\n```py\nfrom pathql import AgeDays, AgeHours, AgeYears,Query\n\n# Files older than 30 days\nfor p in Query(\"/var/log\", AgeDays() > 30):\n print(p)\n\n# Files modified in the last 2 hours\nfor p in Query(\"/tmp\", AgeHours() < 2):\n print(p)\n\n# Files older than one year\nfor p in Query(\"/archive\", AgeYears() >= 1):\n print(p)\n\n\n```\n\nNote: age filters use the file modification time (`st_mtime`) and accept an optional `now` parameter when used programmatically (useful for reproducible tests). 
They also accept an optional `stat_result` to avoid extra `stat()` calls when you already have file metadata.\n\n- \"created\" \u2192 `st_ctime`\n- \"accessed\" \u2192 `st_atime`\n- You can also use the raw stat attribute names directly.\n\n### Examples\n\n```python\nfrom pathql import YearFilter, MonthFilter, DayFilter, Query\nimport datetime as dt\n\n# Files modified in 2025\nquery = YearFilter(2025)\nfor path in Query(\"/data\", query):\n print(path)\n\n# Files created in May\nquery = MonthFilter(\"may\", attr=\"created\")\nfor path in Query(\"/data\", query):\n print(path)\n\n# Files modified on October 16, 2025\nquery = DayFilter(base=dt.datetime(2025, 10, 16))\nfor path in Query(\"/data\", query):\n print(path)\n```\n\n### More Examples: Yesterday, Last Month, and Relative Dates\n\n```python\nimport datetime as dt\nfrom pathql import DayFilter, MonthFilter,Query\n\n# Files modified yesterday (-1day from now)\nquery = DayFilter(offset=-1)\nfor path in Query(\"/data\", query).files():\n print(path)\n\n# Files modified last month (-1 month from this month)\nquery = MonthFilter(offset=-1)\nfor path in Query(\"/data\" query).files():\n print(path)\n\n# Files modified the day before Christmas 2023\nchristmas = dt.datetime(2022, 12, 25)\nquery = DayFilter(base=christmas, offset=-1)\nfor path in Query(\"/data\", query).files():\n print(path)\n```\n\nIdeally, if you have a datetime value in your code you will never need to deal with time spans or deltas or complex data calculations.\n\n## File Age Filters\n\nPathQL supports filtering files by age using either filesystem timestamps or the date encoded in the filename. Using filename dating allows for a (possibly) more reliable way to target creation time in log files. 
Instead of using the operating system timing data we use the file naming convention where the file name starts with YYYY-MM-DD-HH-, YYYY-MM-DD_, YY-MM_, YY_ Using this time can get around issues with creation tme not being reliable across operating systems, modification times being changed by mistake. System clocks not be correct.\n\nFor the purpose of matching if a valid file date is not found the match does not occur, it is not an error.\n\n### Filename-based Age Filters\n\nThese filters use the date encoded in the filename (e.g. `YYYY-MM-DD_HH_{ArchiveName}.{EXT}`):\n\n- `FilenameAgeMinutes`\n- `FilenameAgeHours`\n- `FilenameAgeDays`\n- `FilenameAgeYears`\n\n**Example usage:**\n\n```python\nfrom pathql.filters.fileage import FilenameAgeDays\n\n# Find files whose filename date is less than 10 days old\nfilter = FilenameAgeDays(operator.lt, 10)\nfilter.match(\"2024-05-01_report.txt\")\n```\n\nFilename-based filters allow you to query files based on their logical age, independent of filesystem metadata.\n\n### File Actions\n\nPathQL provides utilities for batch file operations on lists of `pathlib.Path` objects (or `ResultSet`). You can use a query to select files, then copy, move, delete or zip them:\n\n```python\nfrom pathql.actions import copy_files, move_files, delete_files\nfrom pathql.query import Query\nfrom pathql.filters import Suffix\n\n# Select all .txt files recursively from 'src'\nfiles = Query(Suffix() == \"txt\").select(src, recursive=True)\ndestination = Path(\"dst\")\n\ncopy_files(files, destination)\nmove_files(files, destination)\ndelete_files(files)\n```\n\n### Exception Handling in File Actions\n\nAll file actions (`copy_files`, `move_files`, `delete_files`) in PathQL are designed for robust batch processing. 
When you perform an action on a list of files:\n\n- **Exceptions are caught per file**: If an error occurs (such as `IOError`, `PermissionError`, `OSError`, `FileNotFoundError`, or `NotADirectoryError`), it is caught for that specific file.\n- **Result reporting**: The return value is a `FileActionResult` object, which contains:\n - `success`: a list of files that were processed successfully.\n - `failed`: a list of files that failed to process.\n - `errors`: a dictionary mapping each failed file to the exception that was raised.\n - `status`: True if `failed` list is empty.\n- **No silent failures**: By default, exceptions are caught and reported in the result object. If you set `ignore_access_exception=False`, the first exception encountered will be raised immediately.\n- **Status property**: You can check `result.status` to see if all files were processed successfully (`True` if no failures).\n\n**Example:**\n\n```python\nresult = copy_files(files, destination)\nif not result.status:\n print(\"Some files failed:\")\n for path, exc in result.errors.items():\n print(f\"{path}: {exc}\")\n```\n\nThis design ensures you always know which files succeeded, which failed, and why\u2014making batch file operations safe and transparent.\n\n> **Note:** If you perform a combined action (such as `zip_move_files`), the returned `FileActionResult` will reflect the status for both the zip and the move actions. 
The `success`, `failed`, and `errors` fields will include results from all underlying operations.\n\n## Zip Actions\n\nYou can create zip archives from lists of files, with options to preserve directory structure and chain actions (zip+delete, zip+move, zip+copy):\n\n```python\nfrom pathql.actions import zip_files, zip_delete_files\nfrom pathql.query import Query\nfrom pathql.filters import Suffix\n\n# Select all .txt files recursively from 'src'\nfiles = Query(Suffix() == \"txt\").select(src, recursive=True)\nroot = Path(\"src\")\ntarget_zip = Path(\"archive.zip\")\n\n# Zip files to a target folder\nzip_files(files, root, target_zip, preserve_dir_structure=True)\n\n# Zip files to target folder and delete files left behind.\nzip_delete_files(files, root, target_zip)\n```\n\n## UV Example\n\n### 1. Find all PNG or BMP images under 1MB, modified in the last 10 minutes\n\n`PathQL` has a small CLI app called `pql.py1` that can be run from the command line using `uv`. It is a basic app that pulls in `pathql` and `PIL` and runs from the command line. It has a custom filter object that looks into image file metadata and extracts the color depth allowing you filter files by color depth. 
The demo enables color depth age and size filtering.\n\n\n\n```bash\ncli % uv run --script pql.py \"*.jpg\" RGB\nblack_1kb.jpg\n\n```\n\nThis command will:\n- Build a virtual environment with python 3.14 and install pathql and PIL\n- Search for files matching the pattern `*.jpg` in the current directory\n- Filter for images with color mode `RGB`\n\nYou can add more options, such as file size and age:\n\n```bash\ncli % uv run --script pql.py \"*.jpg\" RGB --size-min 10kb --size-max 2mb --min-age 0 --max-age 365\n```\n\nThis will:\n- Build a virtual environment with python 3.14 and install pathql and PIL\n- Find JPEG images with RGB color mode\n- Only include files between 10KB and 2MB\n- Only include files created between 0 and 365 days ago\n- Return nothing because the file size constraint missed.\n\n### Arguments\n\n- **pattern** (positional): Glob pattern for files (e.g., `\"*.jpg\"`)\n- **col_mode** (positional): Comma-separated color modes (e.g., `\"RGB,RGBA,L\"`)\n- **--root**: Folder to search (default: current directory)\n- **--size-min**: Minimum file size in bytes (default: 0)\n- **--size-max**: Maximum file size in bytes (default: 1TB)\n- **--min-age**: Minimum file creation age in days (default: 0)\n- **--max-age**: Maximum file creation age in days (default: 100000)\n\n\n## Note on Types: Operator Overloads in DSLs\n\nPathQL filter classes (like `Size`, `AgeDays`, etc.) intentionally override Python's comparison and boolean operator methods (`__eq__`, `__lt__`, etc.) to return filter objects, not `bool`. This enables expressive, composable query logic:\n\n```python\nSize() > \"10MB\" # returns a filter, not a bool\n```\n\nThis pattern is common in query builders and DSLs, but it conflicts with Python's type system and expectations for these dunder methods (which are supposed to return `bool`). 
As a result, type checkers (like mypy, Pyright) and some linters will report errors or warnings about return types and operator overloads.\n\n**Why?**\n- Python expects `__eq__`, `__lt__`, etc. to return `bool` for built-in types.\n- DSLs return filter objects so you can build up complex queries using operators.\n- This is deliberate and necessary for PathQL's design, but it is not type-safe according to Python's current type system.\n\n**What to do?**\n- Ignore these type errors for filter classes, or use `# type: ignore` comments.\n- Document this behavior for users and contributors.\n- If you need strict `bool` results, use explicit methods or evaluate filters directly.\n\n**Community Note:**\nThis is a known limitation in Python's type system. A future PEP may address DSL-friendly operator overloads, but for now, this pattern will continue to cause type-checker friction.\n\n---\n\n## Developer & Release Conventions\n\nProject-level conventions, contributor guidance, and release steps are maintained in `AI_CONTEXT.md` in the repository root. Please consult that file for the latest instructions on coding style, testing, and release procedures.\n\n---\n\n## Release Summary\n\n### Version: **v0.0.8** | Date: 2025-10-29\n\n#### Highlights\n\n- **README.md refactored:** Fixed issue in `README.md` file. Did not double check AI.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "PathQL: A Python project for declarative file system querying.",
"version": "0.0.8",
"project_urls": null,
"split_keywords": [
"pathlib",
" query",
" filesystem"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "eb193a905be6217b086e9456360ca27067a320806245a92340d980e571f589ec",
"md5": "4dc2d9230d09ff3f1d72dad1f1b12065",
"sha256": "657368614de88bf1db9312199efa9b02c382b98bfbeeead4062ebd7c295c9cba"
},
"downloads": -1,
"filename": "pathql-0.0.8-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4dc2d9230d09ff3f1d72dad1f1b12065",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 48536,
"upload_time": "2025-10-29T20:19:41",
"upload_time_iso_8601": "2025-10-29T20:19:41.853706Z",
"url": "https://files.pythonhosted.org/packages/eb/19/3a905be6217b086e9456360ca27067a320806245a92340d980e571f589ec/pathql-0.0.8-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0715f7805fc254566f7c1c8dab2d7735c8686aef73e944b524f0ca1b375ea97f",
"md5": "6d8e13e0551f036229b8cc4297c96881",
"sha256": "a9aaded1f30b8f0e548786865bdfd1a7c640da97a7ba7d65890acc164695bb22"
},
"downloads": -1,
"filename": "pathql-0.0.8.tar.gz",
"has_sig": false,
"md5_digest": "6d8e13e0551f036229b8cc4297c96881",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 92816,
"upload_time": "2025-10-29T20:19:42",
"upload_time_iso_8601": "2025-10-29T20:19:42.889497Z",
"url": "https://files.pythonhosted.org/packages/07/15/f7805fc254566f7c1c8dab2d7735c8686aef73e944b524f0ca1b375ea97f/pathql-0.0.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-29 20:19:42",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pathql"
}