My grab bag of convenience functions for files and filenames/pathnames.
*Latest release 20241007.1*:
Bugfix for @atomic_filename.
## <a name="abspath_from_file"></a>`abspath_from_file(path, from_file)`
Return the absolute path of `path` with respect to `from_file`,
as one might do for an include file.
## <a name="atomic_filename"></a>`atomic_filename(filename, exists_ok=False, placeholder=False, dir=None, prefix=None, suffix=None, rename_func=<built-in function rename>, **tempfile_kw)`
A context manager to create `filename` atomically on completion.
This yields a `NamedTemporaryFile` to use to create the file contents.
On completion the temporary file is renamed to the target name `filename`.
If the caller decides to _not_ create the target they may remove the
temporary file. This is not considered an error.
Parameters:
* `filename`: the file name to create
* `exists_ok`: default `False`;
if true it is not an error if `filename` already exists
* `placeholder`: create a placeholder file at `filename`
while the real contents are written to the temporary file
* `dir`: passed to `NamedTemporaryFile`, specifies the directory
to hold the temporary file; the default is `dirname(filename)`
to ensure the rename is atomic
* `prefix`: passed to `NamedTemporaryFile`, specifies a prefix
for the temporary file; the default is a dot (`'.'`) plus the prefix
from `splitext(basename(filename))`
* `suffix`: passed to `NamedTemporaryFile`, specifies a suffix
for the temporary file; the default is the extension obtained
from `splitext(basename(filename))`
* `rename_func`: a callable accepting `(tempname,filename)`
used to rename the temporary file to the final name; the
default is `os.rename` and this parameter exists to accept
something such as `FSTags.move`
Other keyword arguments are passed to the `NamedTemporaryFile` constructor.
Example:
    >>> import os
    >>> from os.path import exists as existspath
    >>> fn = 'test_atomic_filename'
    >>> with atomic_filename(fn, mode='w') as f:
    ...     assert not existspath(fn)
    ...     print('foo', file=f)
    ...     assert not existspath(fn)
    ...
    >>> assert existspath(fn)
    >>> assert open(fn).read() == 'foo\n'
    >>> os.remove(fn)
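
The underlying technique can be sketched with the standard library alone. This is an illustration, not the module's implementation: `atomic_write_sketch` is a hypothetical name, and it covers only the same-directory temporary file, the `exists_ok` check and the final `os.rename`.

```python
import os
import tempfile
from os.path import basename, dirname, exists, splitext


def atomic_write_sketch(filename, text, exists_ok=False):
    """Write `text` to `filename` via a temporary file, then rename it.

    Hypothetical helper illustrating the technique; not part of cs.fileutils.
    """
    if not exists_ok and exists(filename):
        raise FileExistsError(filename)
    base, ext = splitext(basename(filename))
    # Create the temporary file beside the target so that the rename
    # does not cross filesystems.
    tmp = tempfile.NamedTemporaryFile(
        mode='w',
        dir=dirname(filename) or '.',
        prefix='.' + base,
        suffix=ext,
        delete=False,
    )
    try:
        with tmp:
            tmp.write(text)
        os.rename(tmp.name, filename)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        if exists(tmp.name):
            os.remove(tmp.name)
        raise
```

The target name only appears once the contents are complete, which is the property the context manager provides.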
## <a name="BackedFile"></a>Class `BackedFile(ReadMixin)`
A RawIOBase duck type
which uses a backing file for initial data
and writes new data to a front scratch file.
*`BackedFile.__init__(self, back_file, dirpath=None)`*:
Initialise the BackedFile using `back_file` for the backing data.
*`BackedFile.__enter__(self)`*:
BackedFile instances offer a context manager that takes the lock,
allowing synchronous use of the file
without implementing a suite of special methods like pread/pwrite.
*`BackedFile.close(self)`*:
Close the BackedFile.
Flush contents. Close the front_file if necessary.
*`BackedFile.datafrom(self, offset)`*:
Generator yielding natural chunks from the file commencing at offset.
*`BackedFile.seek(self, pos, whence=0)`*:
Adjust the current file pointer offset.
*`BackedFile.switch_back_file(self, new_back_file)`*:
Switch out one back file for another. Return the old back file.
*`BackedFile.tell(self)`*:
Report the current file pointer offset.
*`BackedFile.write(self, b)`*:
Write data to the front_file.
## <a name="BackedFile_TestMethods"></a>Class `BackedFile_TestMethods`
Mixin for testing subclasses of BackedFile.
Tests self.backed_fp.
*`BackedFile_TestMethods.test_BackedFile(self)`*:
Test function for a BackedFile to use in unit test suites.
## <a name="byteses_as_fd"></a>`byteses_as_fd(bss, **kw)`
Deliver the iterable of bytes `bss` as a readable file descriptor.
Return the file descriptor.
Any keyword arguments are passed to `CornuCopyBuffer.as_fd`.
Example:
    # present a passphrase for use as an input file descriptor
    # for a subprocess
    rfd = byteses_as_fd([(passphrase + '\n').encode()])
## <a name="common_path_prefix"></a>`common_path_prefix(*paths)`
Return the common path prefix of the `paths`.
Note that the common prefix of `'/a/b/c1'` and `'/a/b/c2'`
is `'/a/b/'`, _not_ `'/a/b/c'`.
Callers may find it useful to preadjust the supplied paths
with `normpath`, `abspath` or `realpath` from `os.path`;
see the `os.path` documentation for the various caveats
which go with those functions.
Examples:
    >>> # the obvious
    >>> common_path_prefix('', '')
    ''
    >>> common_path_prefix('/', '/')
    '/'
    >>> common_path_prefix('a', 'a')
    'a'
    >>> common_path_prefix('a', 'b')
    ''
    >>> # nonempty directory path prefixes end in os.sep
    >>> common_path_prefix('/', '/a')
    '/'
    >>> # identical paths include the final basename
    >>> common_path_prefix('p/a', 'p/a')
    'p/a'
    >>> # the comparison does not normalise paths
    >>> common_path_prefix('p//a', 'p//a')
    'p//a'
    >>> common_path_prefix('p//a', 'p//b')
    'p//'
    >>> common_path_prefix('p//a', 'p/a')
    'p/'
    >>> common_path_prefix('p/a', 'p/b')
    'p/'
    >>> # the comparison strips complete unequal path components
    >>> common_path_prefix('p/a1', 'p/a2')
    'p/'
    >>> common_path_prefix('p/a/b1', 'p/a/b2')
    'p/a/'
    >>> # contrast with cs.lex.common_prefix
    >>> common_prefix('abc/def', 'abc/def1')
    'abc/def'
    >>> common_path_prefix('abc/def', 'abc/def1')
    'abc/'
    >>> common_prefix('abc/def', 'abc/def1', 'abc/def2')
    'abc/def'
    >>> common_path_prefix('abc/def', 'abc/def1', 'abc/def2')
    'abc/'
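
One way to obtain this component-wise behaviour is sketched below. It is an assumption about how such a comparison could work, not the module's code; `common_path_prefix_sketch` and its `sep` parameter are hypothetical.

```python
import os


def common_path_prefix_sketch(*paths, sep=os.sep):
    """Return the longest prefix made of whole path components.

    Hypothetical sketch; paths are compared as given, without normalisation.
    """
    if not paths:
        return ''
    # Identical paths keep their final basename.
    if all(path == paths[0] for path in paths):
        return paths[0]
    common = []
    # Compare the paths one component at a time.
    for parts in zip(*(path.split(sep) for path in paths)):
        if all(part == parts[0] for part in parts):
            common.append(parts[0])
        else:
            break
    if not common:
        return ''
    # A nonempty directory prefix ends in the separator.
    return sep.join(common) + sep
```

Splitting on the separator is what distinguishes this from a plain character-wise common prefix: `'abc/def'` and `'abc/def1'` share only the component `'abc'`.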
## <a name="compare"></a>`compare(f1, f2, mode='rb')`
Compare the contents of two file-like objects `f1` and `f2` for equality.
If `f1` or `f2` is a string, open the named file using `mode`
(default: `"rb"`).
## <a name="copy_data"></a>`copy_data(fpin, fpout, nbytes, rsize=None)`
Copy `nbytes` of data from `fpin` to `fpout`,
return the number of bytes copied.
Parameters:
* `nbytes`: number of bytes to copy.
If `None`, copy until EOF.
* `rsize`: read size, default `DEFAULT_READSIZE`.
## <a name="crop_name"></a>`crop_name(name, ext=None, name_max=255)`
Crop a file basename so as not to exceed `name_max` in length.
Return the original `name` if it is already short enough.
Otherwise crop `name` before the file extension
to make it short enough.
Parameters:
* `name`: the file basename to crop
* `ext`: optional file extension;
the default is to infer the extension with `os.path.splitext`.
* `name_max`: optional maximum length, default: `255`
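
The cropping rule can be sketched as follows, assuming the extension is always inferred with `os.path.splitext`. `crop_name_sketch` is a hypothetical name, and raising `ValueError` when the extension alone fills `name_max` is an assumption.

```python
from os.path import splitext


def crop_name_sketch(name, name_max=255):
    """Crop `name` before its extension to fit within `name_max`.

    Hypothetical sketch; the real function also accepts an explicit `ext`.
    """
    if len(name) <= name_max:
        return name
    base, ext = splitext(name)
    keep = name_max - len(ext)
    if keep < 1:
        # Assumption: there must be room for at least one character of the base.
        raise ValueError(
            "cannot crop %r to %d characters and keep %r" % (name, name_max, ext)
        )
    return base[:keep] + ext
```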
## <a name="datafrom"></a>`datafrom(f, offset=None, readsize=None, maxlength=None)`
General purpose reader for files yielding data from `offset`.
*WARNING*: this function might move the file pointer.
Parameters:
* `f`: the file from which to read data;
if a string, the file is opened with mode="rb";
if an int, treated as an OS file descriptor;
otherwise presumed to be a file-like object.
If that object has a `.fileno()` method, treat that as an
OS file descriptor and use it.
* `offset`: starting offset for the data
* `maxlength`: optional maximum amount of data to yield
* `readsize`: read size, default DEFAULT_READSIZE.
For file-like objects, the read1 method is used in preference
to read if available. The file pointer is briefly moved during
fetches.
## <a name="datafrom_fd"></a>`datafrom_fd(fd, offset=None, readsize=None, aligned=True, maxlength=None)`
General purpose reader for file descriptors yielding data from `offset`.
**Note**: This does not move the file descriptor position
**if** the file is seekable.
Parameters:
* `fd`: the file descriptor from which to read.
* `offset`: the offset from which to read.
If omitted, use the current file descriptor position.
* `readsize`: the read size, default: `DEFAULT_READSIZE`
* `aligned`: if true (the default), the first read is sized
to align the new offset with a multiple of `readsize`.
* `maxlength`: if specified yield no more than this many bytes of data.
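
The aligned read and the untouched file position can be illustrated with `os.pread`. This is a sketch for seekable descriptors on platforms providing `os.pread`; `datafrom_fd_sketch` is a hypothetical name and the small default `readsize` is only for illustration.

```python
import os


def datafrom_fd_sketch(fd, offset=0, readsize=8, aligned=True, maxlength=None):
    """Yield chunks from `fd` starting at `offset` without moving its position.

    Hypothetical sketch using os.pread, so it only suits seekable descriptors.
    """
    remaining = maxlength
    first = True
    while remaining is None or remaining > 0:
        size = readsize
        if first and aligned:
            # Shorten the first read so later reads start on a readsize multiple.
            size = readsize - offset % readsize
        first = False
        if remaining is not None:
            size = min(size, remaining)
        chunk = os.pread(fd, size, offset)
        if not chunk:
            break  # EOF
        yield chunk
        offset += len(chunk)
        if remaining is not None:
            remaining -= len(chunk)
```

With `readsize=8` and `offset=3` the first read asks for 5 bytes, after which every read begins at a multiple of 8.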
## <a name="file_based"></a>`file_based(*da, **dkw)`
A decorator which caches a value obtained from a file.
In addition to all the keyword arguments for `@cs.deco.cachedmethod`,
this decorator also accepts the following arguments:
* `attr_name`: the name for the associated attribute, used as
the basis for the internal cache value attribute
* `filename`: the filename to monitor.
Default from the `._{attr_name}__filename` attribute.
This value will be passed to the method as the `filename` keyword
parameter.
* `poll_delay`: delay between file polls, default `DEFAULT_POLL_INTERVAL`.
* `sig_func`: signature function used to encapsulate the relevant
information about the file; default
cs.filestate.FileState({filename}).
If the decorated function raises OSError with errno == ENOENT,
this returns None. Other exceptions are reraised.
## <a name="file_data"></a>`file_data(fp, nbytes=None, rsize=None)`
Read `nbytes` of data from `fp` and yield the chunks as read.
Parameters:
* `nbytes`: number of bytes to read; if None read until EOF.
* `rsize`: read size, default DEFAULT_READSIZE.
## <a name="file_property"></a>`file_property(*da, **dkw)`
A property whose value reloads if a file changes.
## <a name="files_property"></a>`files_property(func)`
A property whose value reloads if any of a list of files changes.
Note: this is just the default mode for `make_files_property`.
`func` accepts the file path and returns the new value.
The underlying attribute name is `'_'+func.__name__`,
the default from `make_files_property()`.
The attribute *{attr_name}*`_lock` is a mutex controlling access to the property.
The attributes *{attr_name}*`_filestates` and *{attr_name}*`_paths` track the
associated file states.
The attribute *{attr_name}*`_lastpoll` tracks the last poll time.
The decorated function is passed the current list of files
and returns the new list of files and the associated value.
One example use would be a configuration file with recursive
include operations; the inner function would parse the first
file in the list, and the parse would accumulate this filename
and those of any included files so that they can be monitored,
triggering a fresh parse if one changes.
Example:
    class C(object):
        def __init__(self):
            self._foo_path = '.foorc'
        @files_property
        def foo(self,paths):
            new_paths, result = parse(paths[0])
            return new_paths, result
The load function is called on the first access and on every
access thereafter where an associated file's `FileState` has
changed and the time since the last successful load exceeds
the poll_rate (1s). An attempt at avoiding races is made by
ignoring reloads that raise exceptions and ignoring reloads
where files that were stat()ed during the change check have
changed state after the load.
## <a name="find"></a>`find(path, select=None, sort_names=True)`
Walk a directory tree `path`
yielding selected paths.
Note: not selecting a directory prunes all its descendants.
## <a name="findup"></a>`findup(path, test, first=False)`
Test the pathname `abspath(path)` and each of its ancestors
against the callable `test`,
yielding paths satisfying the test.
If `first` is true (default `False`)
this function always yields exactly one value,
either the first path satisfying the test or `None`.
This mode supports a use such as:
    matched_path = next(findup(path, test, first=True))
    # post condition: matched_path will be `None` on no match
    # otherwise the first matching path
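
The upward walk can be sketched as a generator over `dirname()` steps. `findup_sketch` is a hypothetical stand-in written from the description above, not the module's code.

```python
from os.path import abspath, dirname


def findup_sketch(path, test, first=False):
    """Yield `abspath(path)` and each ancestor for which `test(path)` is true.

    Hypothetical sketch; with `first` true, yield exactly one value.
    """
    path = abspath(path)
    while True:
        if test(path):
            yield path
            if first:
                return
        parent = dirname(path)
        if parent == path:
            break  # reached the filesystem root
        path = parent
    if first:
        # No match: the single promised value is None.
        yield None
```

A typical use would be locating a project root, for example with `test=lambda p: os.path.isdir(os.path.join(p, '.git'))`.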
## <a name="gzifopen"></a>`gzifopen(path, mode='r', *a, **kw)`
Context manager to open a file which may be a plain file or a gzipped file.
If `path` ends with `'.gz'` then the filesystem paths attempted
are `path` and `path` without the extension, otherwise the
filesystem paths attempted are `path+'.gz'` and `path`. In
this way a path ending in `'.gz'` indicates a preference for
a gzipped file; otherwise an uncompressed file is preferred.
However, if exactly one of the paths exists already then only
that path will be used.
Note that the single character modes `'r'`, `'a'`, `'w'` and `'x'`
are text mode for both uncompressed and gzipped opens,
like the builtin `open` and *unlike* `gzip.open`.
This is to ensure equivalent behaviour.
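
The mode handling can be illustrated in isolation: `gzip.open` treats `'r'` as binary, so a single character mode needs a `'t'` appended to match the builtin `open`. The sketch below shows only that step and omits the choice between the two candidate paths; `open_text_or_gzip_sketch` is a hypothetical name.

```python
import gzip


def open_text_or_gzip_sketch(path, mode='r', **kw):
    """Open `path`, using gzip for '.gz' names, with text mode by default.

    Hypothetical sketch of the mode handling only.
    """
    if path.endswith('.gz'):
        if mode in ('r', 'a', 'w', 'x'):
            # gzip.open defaults to binary; request text explicitly.
            mode += 't'
        return gzip.open(path, mode, **kw)
    return open(path, mode, **kw)
```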
## <a name="iter_fd"></a>`iter_fd(fd, **kw)`
Iterate over data from the file descriptor `fd`.
## <a name="iter_file"></a>`iter_file(f, **kw)`
Iterate over data from the file `f`.
## <a name="lines_of"></a>`lines_of(fp, partials=None)`
Generator yielding lines from a file until EOF.
Intended for file-like objects that lack a line iteration API.
## <a name="lockfile"></a>`lockfile(path, **lock_kw)`
A context manager which takes and holds a lock file.
An open file descriptor is kept for the lock file as well
to aid locating the process holding the lock file using eg `lsof`.
This is just a context manager shim for `makelockfile`
and all arguments are plumbed through.
## <a name="make_files_property"></a>`make_files_property(attr_name=None, unset_object=None, poll_rate=1.0)`
Construct a decorator that watches multiple associated files.
Parameters:
* `attr_name`: the underlying attribute, default: `'_'+func.__name__`
* `unset_object`: the sentinel value for "uninitialised", default: `None`
* `poll_rate`: how often in seconds to poll the file for changes,
default from `DEFAULT_POLL_INTERVAL`: `1.0`
The attribute *attr_name*`_lock` controls access to the property.
The attributes *attr_name*`_filestates` and *attr_name*`_paths` track the
associated files' state.
The attribute *attr_name*`_lastpoll` tracks the last poll time.
The decorated function is passed the current list of files
and returns the new list of files and the associated value.
One example use would be a configuration file with recursive
include operations; the inner function would parse the first
file in the list, and the parse would accumulate this filename
and those of any included files so that they can be monitored,
triggering a fresh parse if one changes.
Example:
    class C(object):
        def __init__(self):
            self._foo_path = '.foorc'
        @files_property
        def foo(self,paths):
            new_paths, result = parse(paths[0])
            return new_paths, result
The load function is called on the first access and on every
access thereafter where an associated file's `FileState` has
changed and the time since the last successful load exceeds
the `poll_rate`.
An attempt at avoiding races is made by
ignoring reloads that raise exceptions and ignoring reloads
where files that were `os.stat()`ed during the change check have
changed state after the load.
## <a name="makelockfile"></a>`makelockfile(path, *, ext=None, poll_interval=None, timeout=None, runstate: Optional[cs.resources.RunState] = <function uses_runstate.<locals>.<lambda> at 0x109192ca0>, keepopen=False, max_interval=37)`
Create a lockfile and return its path.
The lockfile can be removed with `os.remove`.
This is the core functionality supporting the `lockfile()`
context manager.
Parameters:
* `path`: the base associated with the lock file,
often the filesystem object whose access is being managed.
* `ext`: the extension to the base used to construct the lockfile name.
Default: ".lock"
* `timeout`: maximum time to wait before failing.
Default: `None` (wait forever).
Note that zero is an accepted value
and requires the lock to succeed on the first attempt.
* `poll_interval`: polling frequency when timeout is not 0.
* `runstate`: optional `RunState` duck instance supporting cancellation.
Note that if a cancelled `RunState` is provided
no attempt will be made to make the lockfile.
* `keepopen`: optional flag, default `False`:
if true, do not close the lockfile and return `(lockpath,lockfd)`
being the lock file path and the open file descriptor
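
A common way to implement such a lock file is an exclusive create retried until a timeout. The sketch below assumes that approach and a doubling retry delay capped at `max_interval`; neither is confirmed to be what `makelockfile` does, and `make_lockfile_sketch` is a hypothetical name.

```python
import os
import time


def make_lockfile_sketch(path, ext='.lock', timeout=None,
                         poll_interval=0.1, max_interval=37):
    """Create `path + ext` exclusively and return the lock file path.

    Hypothetical sketch: O_EXCL makes the create fail if the file exists.
    """
    lockpath = path + ext
    start = time.monotonic()
    interval = poll_interval
    while True:
        try:
            fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
        except FileExistsError:
            # timeout=0 therefore fails after the first attempt.
            if timeout is not None and time.monotonic() - start >= timeout:
                raise TimeoutError("timed out waiting for " + lockpath)
            time.sleep(interval)
            # Assumption: back off between retries, up to max_interval.
            interval = min(interval * 2, max_interval)
        else:
            os.close(fd)
            return lockpath
```

As with the real function, the caller releases the lock with `os.remove(lockpath)`.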
## <a name="max_suffix"></a>`max_suffix(dirpath, prefix)`
Compute the highest existing numeric suffix
for names starting with `prefix`.
This is generally used as a starting point for picking
a new numeric suffix.
## <a name="mkdirn"></a>`mkdirn(path, sep='')`
Create a new directory named `path+sep+n`,
where `n` exceeds any name already present.
Parameters:
* `path`: the basic directory path.
* `sep`: a separator between `path` and `n`.
Default: `''`
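
The relationship between `max_suffix` and `mkdirn` can be sketched like this. Both functions below are hypothetical stand-ins; starting at `1` when no numbered name exists and retrying on `FileExistsError` are assumptions.

```python
import os


def max_suffix_sketch(dirpath, prefix):
    """Return the highest numeric suffix among names starting with `prefix`."""
    best = None
    for name in os.listdir(dirpath):
        tail = name[len(prefix):]
        if name.startswith(prefix) and tail.isdecimal():
            n = int(tail)
            if best is None or n > best:
                best = n
    return best


def mkdirn_sketch(path, sep=''):
    """Make and return a new directory named `path + sep + str(n)`."""
    dirpath, base = os.path.split(path)
    # Assumption: numbering starts at 1 when nothing matches yet.
    n = (max_suffix_sketch(dirpath or '.', base + sep) or 0) + 1
    while True:
        newpath = path + sep + str(n)
        try:
            os.mkdir(newpath)
        except FileExistsError:
            # Another process took this number first; try the next one.
            n += 1
        else:
            return newpath
```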
## <a name="NamedTemporaryCopy"></a>`NamedTemporaryCopy(f, progress=False, progress_label=None, **kw)`
A context manager yielding a temporary copy of `filename`
as returned by `NamedTemporaryFile(**kw)`.
Parameters:
* `f`: the name of the file to copy, or an open binary file,
or a `CornuCopyBuffer`
* `progress`: an optional progress indicator, default `False`;
if a `bool`, show a progress bar for the copy phase if true;
if an `int`, show a progress bar for the copy phase
if the file size equals or exceeds the value;
otherwise it should be a `cs.progress.Progress` instance
* `progress_label`: optional progress bar label,
only used if a progress bar is made
Other keyword parameters are passed to `tempfile.NamedTemporaryFile`.
## <a name="NullFile"></a>Class `NullFile`
Writable file that discards its input.
Note that this is _not_ an open of `os.devnull`;
it just discards writes and is not the underlying file descriptor.
*`NullFile.__init__(self)`*:
Initialise the file offset to 0.
*`NullFile.flush(self)`*:
Flush buffered data to the subsystem.
*`NullFile.write(self, data)`*:
Discard data, advance file offset by length of data.
## <a name="Pathname"></a>Class `Pathname(builtins.str)`
Subclass of str presenting convenience properties useful for
format strings related to file paths.
*`Pathname.__format__(self, fmt_spec)`*:
Calling format(<Pathname>, fmt_spec) treats `fmt_spec` as a new style
formatting string with a single positional parameter of `self`.
*`Pathname.abs`*:
The absolute form of this Pathname.
*`Pathname.basename`*:
The basename of this Pathname.
*`Pathname.dirname`*:
The dirname of the Pathname.
*`Pathname.isabs`*:
Whether this Pathname is an absolute Pathname.
*`Pathname.short`*:
The shortened form of this Pathname.
*`Pathname.shorten(self, prefixes=None)`*:
Shorten a Pathname using ~ and ~user.
## <a name="poll_file"></a>`poll_file(path, old_state, reload_file, missing_ok=False)`
Watch a file for modification by polling its state as obtained
by `FileState()`.
Call `reload_file(path)` if the state changes.
Return `(new_state,reload_file(path))` if the file was modified
and was unchanged (stable state) before and after the reload_file().
Otherwise return `(None,None)`.
This may raise an `OSError` if the `path` cannot be `os.stat()`ed
and of course for any exceptions that occur calling `reload_file`.
If `missing_ok` is true then a failure to `os.stat()` which
raises `OSError` with `ENOENT` will just return `(None,None)`.
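
The stable-state check can be sketched with a plain `os.stat()` signature in place of `FileState`. The names below are hypothetical, the choice of stat fields is an assumption, and the `missing_ok` handling is omitted.

```python
import os


def file_state_sketch(path):
    """Return a comparable signature for `path` (a stand-in for FileState)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def poll_file_sketch(path, old_state, reload_file):
    """Reload `path` if its state changed and stayed stable across the reload."""
    new_state = file_state_sketch(path)
    if new_state == old_state:
        return None, None
    value = reload_file(path)
    # Discard the result if the file changed again while it was being loaded.
    if file_state_sketch(path) != new_state:
        return None, None
    return new_state, value
```

A caller keeps the returned state and passes it back as `old_state` on the next poll.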
## <a name="read_data"></a>`read_data(fp, nbytes, rsize=None)`
Read `nbytes` of data from `fp`, return the data.
Parameters:
* `nbytes`: number of bytes to copy.
If `None`, copy until EOF.
* `rsize`: read size, default `DEFAULT_READSIZE`.
## <a name="read_from"></a>`read_from(fp, rsize=None, tail_mode=False, tail_delay=None)`
Generator to present text or data from an open file until EOF.
Parameters:
* `rsize`: read size, default: DEFAULT_READSIZE
* `tail_mode`: if true, yield an empty chunk at EOF, allowing resumption
if the file grows.
## <a name="ReadMixin"></a>Class `ReadMixin`
Useful read methods to accommodate modes not necessarily available in a class.
Note that this mixin presumes that the attribute `self._lock`
is a threading.RLock like context manager.
Classes using this mixin should consider overriding the default
.datafrom method with something more efficient or direct.
*`ReadMixin.bufferfrom(self, offset)`*:
Return a CornuCopyBuffer from the specified `offset`.
*`ReadMixin.datafrom(self, offset, readsize=None)`*:
Yield data from the specified `offset` onward in some
approximation of the "natural" chunk size.
*NOTE*: UNLIKE the global datafrom() function, this method
MUST NOT move the logical file position. Implementors may need
to save and restore the file pointer within a lock around
the I/O if they do not use a direct access method like
os.pread.
The aspiration here is to read data with only a single call
to the underlying storage, and to return the chunks in
natural sizes instead of some default read size.
Classes using this mixin must implement this method.
*`ReadMixin.read(self, size=-1, offset=None, longread=False)`*:
Read up to `size` bytes, honouring the "single system call"
spirit unless `longread` is true.
Parameters:
* `size`: the number of bytes requested. A size of -1 requests
all bytes to the end of the file.
* `offset`: the starting point of the read; if None, use the
current file position; if not None, seek to this position
before reading, even if `size` == 0.
* `longread`: switch from "single system call" to "as many
as required to obtain `size` bytes"; short data will still
be returned if the file is too short.
*`ReadMixin.read_n(self, n)`*:
Read `n` bytes of data and return them.
Unlike traditional file.read(), RawIOBase.read() may return short
data, thus this workalike, which may only return short data if it
hits EOF.
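
The loop behind such a workalike can be sketched as a free function over any object with a `read(size)` method. `read_n_sketch` is a hypothetical name, and it treats an empty or `None` result as EOF.

```python
def read_n_sketch(fp, n):
    """Read exactly `n` bytes from `fp`, or fewer only if EOF is reached.

    Hypothetical sketch; `fp.read(size)` may return short data.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = fp.read(remaining)
        if not chunk:
            break  # EOF
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)
```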
*`ReadMixin.readinto(self, barray)`*:
Read data into a bytearray.
## <a name="rewrite"></a>`rewrite(filepath, srcf, mode='w', backup_ext=None, do_rename=False, do_diff=None, empty_ok=False, overwrite_anyway=False)`
Rewrite the file `filepath` with data from the file object `srcf`.
Return `True` if the content was changed, `False` if unchanged.
Parameters:
* `filepath`: the name of the file to rewrite.
* `srcf`: the source file containing the new content.
* `mode`: the write-mode for the file, default `'w'` (for text);
use `'wb'` for binary data.
* `empty_ok`: if true (default `False`),
do not raise `ValueError` if the new data are empty.
* `overwrite_anyway`: if true (default `False`),
skip the content check and overwrite unconditionally.
* `backup_ext`: if a nonempty string,
take a backup of the original at `filepath + backup_ext`.
* `do_diff`: if not `None`, call `do_diff(filepath,tempfile)`.
* `do_rename`: if true (default `False`),
rename the temp file to `filepath`
after copying the permission bits.
Otherwise (default), copy the tempfile to `filepath`;
this preserves the file's inode and permissions etc.
## <a name="rewrite_cmgr"></a>`rewrite_cmgr(filepath, mode='w', **kw)`
Rewrite a file, presented as a context manager.
Parameters:
* `mode`: file write mode, defaulting to "w" for text.
Other keyword parameters are passed to `rewrite()`.
Example:
    with rewrite_cmgr(pathname, do_rename=True) as f:
        ... write new content to f ...
## <a name="RWFileBlockCache"></a>Class `RWFileBlockCache`
A scratch file for storing data.
*`RWFileBlockCache.__init__(self, pathname=None, dirpath=None, suffix=None, lock=None)`*:
Initialise the file.
Parameters:
* `pathname`: path of file. If None, create a new file with
tempfile.mkstemp using dir=`dirpath` and unlink that file once
opened.
* `dirpath`: location for the file if made by mkstemp as above.
* `lock`: an object to use as a mutex, allowing sharing with
some outer system. A Lock will be allocated if omitted.
*`RWFileBlockCache.close(self)`*:
Close the file descriptors.
*`RWFileBlockCache.closed`*:
Test whether the file descriptor has been closed.
*`RWFileBlockCache.get(self, offset, length)`*:
Get data from `offset` of length `length`.
*`RWFileBlockCache.put(self, data)`*:
Store `data`, return offset.
## <a name="saferename"></a>`saferename(oldpath, newpath)`
Rename a path using `os.rename()`,
but raise an exception if the target path already exists.
Note: slightly racy.
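
The check-then-rename pattern, and the boolean wrapper built on it, can be sketched as below. The names are hypothetical and the choice of `FileExistsError` is an assumption; the source only says an exception is raised.

```python
import os


def saferename_sketch(oldpath, newpath):
    """Rename `oldpath` to `newpath`, refusing to replace an existing path."""
    # The race: another process may create newpath between this check
    # and the rename below.
    if os.path.lexists(newpath):
        raise FileExistsError(newpath)
    os.rename(oldpath, newpath)


def trysaferename_sketch(oldpath, newpath):
    """Like saferename_sketch, but return True on success, False on failure."""
    try:
        saferename_sketch(oldpath, newpath)
    except OSError:
        return False
    return True
```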
## <a name="seekable"></a>`seekable(fp)`
Try to test whether a filelike object is seekable.
First try the `IOBase.seekable` method, otherwise try getting a file
descriptor from `fp.fileno` and `os.stat()`ing that,
otherwise return `False`.
## <a name="Tee"></a>Class `Tee`
An object with .write, .flush and .close methods
which copies data to multiple output files.
*`Tee.__init__(self, *fps)`*:
Initialise the Tee; any arguments are taken to be output file objects.
*`Tee.add(self, output)`*:
Add a new output.
*`Tee.close(self)`*:
Close all the outputs and close the Tee.
*`Tee.flush(self)`*:
Flush all the outputs.
*`Tee.write(self, data)`*:
Write the data to all the outputs.
Note: does not detect or accommodate short writes.
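
The fan-out behaviour can be sketched in a few lines. `TeeSketch` is a hypothetical stand-in written from the method descriptions above, and like the original it ignores short writes.

```python
class TeeSketch:
    """Copy writes to several output files (hypothetical sketch)."""

    def __init__(self, *fps):
        self._fps = list(fps)

    def add(self, output):
        """Add a new output file."""
        self._fps.append(output)

    def write(self, data):
        """Write `data` to every output; return values are ignored."""
        for fp in self._fps:
            fp.write(data)

    def flush(self):
        """Flush every output."""
        for fp in self._fps:
            fp.flush()

    def close(self):
        """Close every output and forget them."""
        for fp in self._fps:
            fp.close()
        self._fps = []
```

An output added later receives only the data written after it was added.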
## <a name="tee"></a>`tee(fp, fp2)`
Context manager duplicating `.write` and `.flush` from `fp` to `fp2`.
## <a name="tmpdir"></a>`tmpdir()`
Return the pathname of the default temporary directory for scratch data,
the environment variable `$TMPDIR` or `'/tmp'`.
## <a name="tmpdirn"></a>`tmpdirn(tmp=None)`
Make a new temporary directory with a numeric suffix.
## <a name="trysaferename"></a>`trysaferename(oldpath, newpath)`
A `saferename()` that returns `True` on success,
`False` on failure.
# Release Log
*Release 20241007.1*:
Bugfix for @atomic_filename.
*Release 20241007*:
atomic_filename: new feature - the caller may remove the temporary file to indicate that the target file should not be made.
*Release 20240723*:
lockfile: now a purer shim for makelockfile.
*Release 20240709*:
rewrite: return True if the content is modified, False otherwise.
*Release 20240630*:
makelockfile: cap the retry poll interval at 37s, just issue a warning if the lock file is already gone on exit (eg manual removal).
*Release 20240316*:
Fixed release upload artifacts.
*Release 20240201*:
* makelockfile: new optional keepopen parameter - if true return the lock path and an open file descriptor.
* lockfile(): keep the lock file open to aid debugging with eg lsof.
*Release 20231129*:
* atomic_filename: accept optional rename_func to use instead of os.rename, supports using FSTags.move.
* atomic_filename: clean up the temp file.
*Release 20230421*:
atomic_filename: raise FileExistsError instead of ValueError if not exists_ok and existspath(filename).
*Release 20230401*:
Replaced a lot of runstate plumbing with @uses_runstate.
*Release 20221118*:
atomic_filename: use shutil.copystat instead of shutil.copymode, bugfix the associated logic.
*Release 20220429*:
Move longpath and shortpath to cs.fs, leave legacy names behind.
*Release 20211208*:
* Move NDJSON stuff to separate cs.ndjson module.
* New gzifopen() function to open either a gzipped file or an uncompressed file.
*Release 20210906*:
Additional release because I'm unsure @atomic_filename made it into the previous release.
*Release 20210731*:
New atomic_filename context manager wrapping NamedTemporaryFile for presenting a file after its contents are prepared.
*Release 20210717*:
Updates for recent cs.mappings-20210717 release.
*Release 20210420*:
* Forensic prefix for NamedTemporaryCopy.
* UUIDNDJSONMapping: provide an empty .scan_errors on instantiation, avoids AttributeError if a scan never occurs.
*Release 20210306*:
* datafrom_fd: fix use-before-set of is_seekable.
* RWFileBlockCache.put: remove assert(len(data)>0), adjust logic.
*Release 20210131*:
crop_name: put ext before name_max, more likely to be specified, I think.
*Release 20201227.1*:
Docstring tweak.
*Release 20201227*:
scan_ndjson: optional errors_list to accrue errors during the scan.
*Release 20201108*:
Bugfix rewrite_cmgr, failed to flush a file before copying its contents.
*Release 20201102*:
* Newline delimited JSON (ndjson) support.
* New UUIDNDJSONMapping implementing a singleton cs.mappings.LoadableMappingMixin of cs.mappings.UUIDedDict subclass instances backed by an NDJSON file.
* New scan_ndjson() function to yield newline delimited JSON records.
* New write_ndjson() function to write newline delimited JSON records.
* New append_ndjson() function to append a single newline delimited JSON record to a file.
* New NamedTemporaryCopy for creating a temporary copy of a file with an optional progress bar.
* rewrite_cmgr: turn into a simple wrapper for rewrite.
* datafrom: make the offset parameter optional, tweak the @strable open function.
* datafrom_fd: support nonseekable file descriptors, document that for these the file position is moved (no pread support).
* New iter_fd and iter_file to return iterators of a file's data by utilising a CornuCopyBuffer.
* New byteses_as_fd to return a readable file descriptor receiving an iterable of bytes via a CornuCopyBuffer.
*Release 20200914*:
New common_path_prefix to compare pathnames.
*Release 20200517*:
* New crop_name() function to crop a file basename to fit within a specific length.
* New find() function complementing findup (UNTESTED).
*Release 20200318*:
New findup(path,test) generator to walk up a file tree.
*Release 20191006*:
Adjust import of cs.deco.cachedmethod.
*Release 20190729*:
datafrom_fd: make `offset` optional, defaulting to fd position at call.
*Release 20190617*:
@file_based: adjust use of @cached from cached(wrap0, **dkw) to cached(**dkw)(wrap0).
*Release 20190101*:
datafrom: add maxlength keyword arg, bugfix fd and f.fileno cases.
*Release 20181109*:
* Various bugfixes for BackedFile.
* Use a file's .read1 method if available in some scenarios.
* makelockfile: accept an optional RunState control parameter, improve some behaviour.
* datafrom_fd: new optional maxlength parameter limiting the amount of data returned.
* datafrom_fd: by default, perform an initial read to align all subsequent reads with the readsize.
* drop fdreader, add datafrom(f, offset, readsize) accepting a file or a file descriptor, expose datafrom_fd.
* ReadMixin.datafrom now mandatory. Add ReadMixin.bufferfrom.
* Assorted other improvements, minor bugfixes, documentation improvements.
*Release 20171231.1*:
Trite DISTINFO fix, no semantic changes.
*Release 20171231*:
Update imports, bump DEFAULT_READSIZE from 8KiB to 128KiB.
*Release 20170608*:
* Move lockfile and the SharedAppend* classes to cs.sharedfile.
* BackedFile internal changes.
*Release 20160918*:
* BackedFile: redo implementation of .front_file to fix resource leak; add .__len__; add methods .spans, .front_spans and .back_spans to return information about front vs back data.
* seek: bugfix: seek should return the new file offset.
* BackedFile does not subclass RawIOBase, it just works like one.
*Release 20160828*:
* Use "install_requires" instead of "requires" in DISTINFO.
* Rename maxFilenameSuffix to max_suffix.
* Pull in OpenSocket file-like socket wrapper from cs.venti.tcp.
* Update for cs.asynchron changes.
* ... then move cs.fileutils.OpenSocket into new module cs.socketutils.
* New Tee class, for copying output to multiple files.
* NullFile class which discards writes (==> no-op for Tee).
* New class SavingFile to accrue output and move to specified pathname when complete.
* Memory usage improvements.
* Polyfill non-threadsafe implementation of pread if os.pread does not exist.
* New function seekable() to probe a file for seekability.
* SharedAppendFile: provide new .open(filemode) context manager for allowing direct file output for external users.
* New function makelockfile() presenting the logic to create a lock file separately from the lockfile context manager.
* Assorted bugfixes and improvements.
*Release 20150116*:
Initial PyPI release.
Raw data
{
"_id": null,
"home_page": null,
"name": "cs.fileutils",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "python2, python3",
"author": null,
"author_email": "Cameron Simpson <cs@cskk.id.au>",
"download_url": "https://files.pythonhosted.org/packages/1c/67/cfd9e5f3f3b0c0d519ad6c26feb4d114601e6409838afeb27b359f3ec251/cs_fileutils-20241007.1.tar.gz",
"platform": null,
"description": "My grab bag of convenience functions for files and filenames/pathnames.\n\n*Latest release 20241007.1*:\nBugfix for @atomic_filename.\n\n## <a name=\"abspath_from_file\"></a>`abspath_from_file(path, from_file)`\n\nReturn the absolute path of `path` with respect to `from_file`,\nas one might do for an include file.\n\n## <a name=\"atomic_filename\"></a>`atomic_filename(filename, exists_ok=False, placeholder=False, dir=None, prefix=None, suffix=None, rename_func=<built-in function rename>, **tempfile_kw)`\n\nA context manager to create `filename` atomicly on completion.\nThis yields a `NamedTemporaryFile` to use to create the file contents.\nOn completion the temporary file is renamed to the target name `filename`.\n\nIf the caller decides to _not_ create the target they may remove the\ntemporary file. This is not considered an error.\n\nParameters:\n* `filename`: the file name to create\n* `exists_ok`: default `False`;\n if true it not an error if `filename` already exists\n* `placeholder`: create a placeholder file at `filename`\n while the real contents are written to the temporary file\n* `dir`: passed to `NamedTemporaryFile`, specifies the directory\n to hold the temporary file; the default is `dirname(filename)`\n to ensure the rename is atomic\n* `prefix`: passed to `NamedTemporaryFile`, specifies a prefix\n for the temporary file; the default is a dot (`'.'`) plus the prefix\n from `splitext(basename(filename))`\n* `suffix`: passed to `NamedTemporaryFile`, specifies a suffix\n for the temporary file; the default is the extension obtained\n from `splitext(basename(filename))`\n* `rename_func`: a callable accepting `(tempname,filename)`\n used to rename the temporary file to the final name; the\n default is `os.rename` and this parametr exists to accept\n something such as `FSTags.move`\nOther keyword arguments are passed to the `NamedTemporaryFile` constructor.\n\nExample:\n\n >>> import os\n >>> from os.path import exists as existspath\n >>> 
fn = 'test_atomic_filename'\n >>> with atomic_filename(fn, mode='w') as f:\n ... assert not existspath(fn)\n ... print('foo', file=f)\n ... assert not existspath(fn)\n ...\n >>> assert existspath(fn)\n >>> assert open(fn).read() == 'foo\\n'\n >>> os.remove(fn)\n\n## <a name=\"BackedFile\"></a>Class `BackedFile(ReadMixin)`\n\nA RawIOBase duck type\nwhich uses a backing file for initial data\nand writes new data to a front scratch file.\n\n*`BackedFile.__init__(self, back_file, dirpath=None)`*:\nInitialise the BackedFile using `back_file` for the backing data.\n\n*`BackedFile.__enter__(self)`*:\nBackedFile instances offer a context manager that takes the lock,\nallowing synchronous use of the file\nwithout implementing a suite of special methods like pread/pwrite.\n\n*`BackedFile.close(self)`*:\nClose the BackedFile.\nFlush contents. Close the front_file if necessary.\n\n*`BackedFile.datafrom(self, offset)`*:\nGenerator yielding natural chunks from the file commencing at offset.\n\n*`BackedFile.seek(self, pos, whence=0)`*:\nAdjust the current file pointer offset.\n\n*`BackedFile.switch_back_file(self, new_back_file)`*:\nSwitch out one back file for another. 
Return the old back file.\n\n*`BackedFile.tell(self)`*:\nReport the current file pointer offset.\n\n*`BackedFile.write(self, b)`*:\nWrite data to the front_file.\n\n## <a name=\"BackedFile_TestMethods\"></a>Class `BackedFile_TestMethods`\n\nMixin for testing subclasses of BackedFile.\nTests self.backed_fp.\n\n*`BackedFile_TestMethods.test_BackedFile(self)`*:\nTest function for a BackedFile to use in unit test suites.\n\n## <a name=\"byteses_as_fd\"></a>`byteses_as_fd(bss, **kw)`\n\nDeliver the iterable of bytes `bss` as a readable file descriptor.\n Return the file descriptor.\n Any keyword arguments are passed to `CornuCopyBuffer.as_fd`.\n\n Example:\n\n # present a passphrase for use as an input file descriptor\n # for a subprocess\n rfd = byteses_as_fd([(passphrase + '\\n').encode()])\n\n## <a name=\"common_path_prefix\"></a>`common_path_prefix(*paths)`\n\nReturn the common path prefix of the `paths`.\n\nNote that the common prefix of `'/a/b/c1'` and `'/a/b/c2'`\nis `'/a/b/'`, _not_ `'/a/b/c'`.\n\nCallers may find it useful to preadjust the supplied paths\nwith `normpath`, `abspath` or `realpath` from `os.path`;\nsee the `os.path` documentation for the various caveats\nwhich go with those functions.\n\nExamples:\n\n >>> # the obvious\n >>> common_path_prefix('', '')\n ''\n >>> common_path_prefix('/', '/')\n '/'\n >>> common_path_prefix('a', 'a')\n 'a'\n >>> common_path_prefix('a', 'b')\n ''\n >>> # nonempty directory path prefixes end in os.sep\n >>> common_path_prefix('/', '/a')\n '/'\n >>> # identical paths include the final basename\n >>> common_path_prefix('p/a', 'p/a')\n 'p/a'\n >>> # the comparison does not normalise paths\n >>> common_path_prefix('p//a', 'p//a')\n 'p//a'\n >>> common_path_prefix('p//a', 'p//b')\n 'p//'\n >>> common_path_prefix('p//a', 'p/a')\n 'p/'\n >>> common_path_prefix('p/a', 'p/b')\n 'p/'\n >>> # the comparison strips complete unequal path components\n >>> common_path_prefix('p/a1', 'p/a2')\n 'p/'\n >>> common_path_prefix('p/a/b1', 
'p/a/b2')\n 'p/a/'\n >>> # contrast with cs.lex.common_prefix\n >>> common_prefix('abc/def', 'abc/def1')\n 'abc/def'\n >>> common_path_prefix('abc/def', 'abc/def1')\n 'abc/'\n >>> common_prefix('abc/def', 'abc/def1', 'abc/def2')\n 'abc/def'\n >>> common_path_prefix('abc/def', 'abc/def1', 'abc/def2')\n 'abc/'\n\n## <a name=\"compare\"></a>`compare(f1, f2, mode='rb')`\n\nCompare the contents of two file-like objects `f1` and `f2` for equality.\n\nIf `f1` or `f2` is a string, open the named file using `mode`\n(default: `\"rb\"`).\n\n## <a name=\"copy_data\"></a>`copy_data(fpin, fpout, nbytes, rsize=None)`\n\nCopy `nbytes` of data from `fpin` to `fpout`,\nreturn the number of bytes copied.\n\nParameters:\n* `nbytes`: number of bytes to copy.\n If `None`, copy until EOF.\n* `rsize`: read size, default `DEFAULT_READSIZE`.\n\n## <a name=\"crop_name\"></a>`crop_name(name, ext=None, name_max=255)`\n\nCrop a file basename so as not to exceed `name_max` in length.\nReturn the original `name` if it is already short enough.\nOtherwise crop `name` before the file extension\nto make it short enough.\n\nParameters:\n* `name`: the file basename to crop\n* `ext`: optional file extension;\n the default is to infer the extension with `os.path.splitext`.\n* `name_max`: optional maximum length, default: `255`\n\n## <a name=\"datafrom\"></a>`datafrom(f, offset=None, readsize=None, maxlength=None)`\n\nGeneral purpose reader for files yielding data from `offset`.\n\n*WARNING*: this function might move the file pointer.\n\nParameters:\n* `f`: the file from which to read data;\n if a string, the file is opened with mode=\"rb\";\n if an int, treated as an OS file descriptor;\n otherwise presumed to be a file-like object.\n If that object has a `.fileno()` method, treat that as an\n OS file descriptor and use it.\n* `offset`: starting offset for the data\n* `maxlength`: optional maximum amount of data to yield\n* `readsize`: read size, default DEFAULT_READSIZE.\n\nFor file-like objects, the read1 
method is used in preference\nto read if available. The file pointer is briefly moved during\nfetches.\n\n## <a name=\"datafrom_fd\"></a>`datafrom_fd(fd, offset=None, readsize=None, aligned=True, maxlength=None)`\n\nGeneral purpose reader for file descriptors yielding data from `offset`.\n**Note**: This does not move the file descriptor position\n**if** the file is seekable.\n\nParameters:\n* `fd`: the file descriptor from which to read.\n* `offset`: the offset from which to read.\n If omitted, use the current file descriptor position.\n* `readsize`: the read size, default: `DEFAULT_READSIZE`\n* `aligned`: if true (the default), the first read is sized\n to align the new offset with a multiple of `readsize`.\n* `maxlength`: if specified yield no more than this many bytes of data.\n\n## <a name=\"file_based\"></a>`file_based(*da, **dkw)`\n\nA decorator which caches a value obtained from a file.\n\nIn addition to all the keyword arguments for `@cs.deco.cachedmethod`,\nthis decorator also accepts the following arguments:\n* `attr_name`: the name for the associated attribute, used as\n the basis for the internal cache value attribute\n* `filename`: the filename to monitor.\n Default from the `._{attr_name}__filename` attribute.\n This value will be passed to the method as the `filename` keyword\n parameter.\n* `poll_delay`: delay between file polls, default `DEFAULT_POLL_INTERVAL`.\n* `sig_func`: signature function used to encapsulate the relevant\n information about the file; default\n cs.filestate.FileState({filename}).\n\nIf the decorated function raises OSError with errno == ENOENT,\nthis returns None. 
Other exceptions are reraised.\n\n## <a name=\"file_data\"></a>`file_data(fp, nbytes=None, rsize=None)`\n\nRead `nbytes` of data from `fp` and yield the chunks as read.\n\nParameters:\n* `nbytes`: number of bytes to read; if None read until EOF.\n* `rsize`: read size, default DEFAULT_READSIZE.\n\n## <a name=\"file_property\"></a>`file_property(*da, **dkw)`\n\nA property whose value reloads if a file changes.\n\n## <a name=\"files_property\"></a>`files_property(func)`\n\nA property whose value reloads if any of a list of files changes.\n\nNote: this is just the default mode for `make_files_property`.\n\n`func` accepts the file path and returns the new value.\nThe underlying attribute name is `'_'+func.__name__`,\nthe default from `make_files_property()`.\nThe attribute *{attr_name}*`_lock` is a mutex controlling access to the property.\nThe attributes *{attr_name}*`_filestates` and *{attr_name}*`_paths` track the\nassociated file states.\nThe attribute *{attr_name}*`_lastpoll` tracks the last poll time.\n\nThe decorated function is passed the current list of files\nand returns the new list of files and the associated value.\n\nOne example use would be a configuration file with recursive\ninclude operations; the inner function would parse the first\nfile in the list, and the parse would accumulate this filename\nand those of any included files so that they can be monitored,\ntriggering a fresh parse if one changes.\n\nExample:\n\n class C(object):\n def __init__(self):\n self._foo_path = '.foorc'\n @files_property\n def foo(self,paths):\n new_paths, result = parse(paths[0])\n return new_paths, result\n\nThe load function is called on the first access and on every\naccess thereafter where an associated file's `FileState` has\nchanged and the time since the last successful load exceeds\nthe poll_rate (1s). 
An attempt at avoiding races is made by\nignoring reloads that raise exceptions and ignoring reloads\nwhere files that were stat()ed during the change check have\nchanged state after the load.\n\n## <a name=\"find\"></a>`find(path, select=None, sort_names=True)`\n\nWalk a directory tree `path`\nyielding selected paths.\n\nNote: not selecting a directory prunes all its descendants.\n\n## <a name=\"findup\"></a>`findup(path, test, first=False)`\n\nTest the pathname `abspath(path)` and each of its ancestors\nagainst the callable `test`,\nyielding paths satisfying the test.\n\nIf `first` is true (default `False`)\nthis function always yields exactly one value,\neither the first path satisfying the test or `None`.\nThis mode supports a use such as:\n\n matched_path = next(findup(path, test, first=True))\n # post condition: matched_path will be `None` on no match\n # otherwise the first matching path\n\n## <a name=\"gzifopen\"></a>`gzifopen(path, mode='r', *a, **kw)`\n\nContext manager to open a file which may be a plain file or a gzipped file.\n\nIf `path` ends with `'.gz'` then the filesystem paths attempted\nare `path` and `path` without the extension, otherwise the\nfilesystem paths attempted are `path+'.gz'` and `path`. 
In\nthis way a path ending in `'.gz'` indicates a preference for\na gzipped file otherwise an uncompressed file.\n\nHowever, if exactly one of the paths exists already then only\nthat path will be used.\n\nNote that the single character modes `'r'`, `'a'`, `'w'` and `'x'`\nare text mode for both uncompressed and gzipped opens,\nlike the builtin `open` and *unlike* `gzip.open`.\nThis is to ensure equivalent behaviour.\n\n## <a name=\"iter_fd\"></a>`iter_fd(fd, **kw)`\n\nIterate over data from the file descriptor `fd`.\n\n## <a name=\"iter_file\"></a>`iter_file(f, **kw)`\n\nIterate over data from the file `f`.\n\n## <a name=\"lines_of\"></a>`lines_of(fp, partials=None)`\n\nGenerator yielding lines from a file until EOF.\nIntended for file-like objects that lack a line iteration API.\n\n## <a name=\"lockfile\"></a>`lockfile(path, **lock_kw)`\n\nA context manager which takes and holds a lock file.\nAn open file descriptor is kept for the lock file as well\nto aid locating the process holding the lock file using eg `lsof`.\nThis is just a context manager shim for `makelockfile`\nand all arguments are plumbed through.\n\n## <a name=\"make_files_property\"></a>`make_files_property(attr_name=None, unset_object=None, poll_rate=1.0)`\n\nConstruct a decorator that watches multiple associated files.\n\nParameters:\n* `attr_name`: the underlying attribute, default: `'_'+func.__name__`\n* `unset_object`: the sentinel value for \"uninitialised\", default: `None`\n* `poll_rate`: how often in seconds to poll the file for changes,\n default from `DEFAULT_POLL_INTERVAL`: `1.0`\n\nThe attribute *attr_name*`_lock` controls access to the property.\nThe attributes *attr_name*`_filestates` and *attr_name*`_paths` track the\nassociated files' state.\nThe attribute *attr_name*`_lastpoll` tracks the last poll time.\n\nThe decorated function is passed the current list of files\nand returns the new list of files and the associated value.\n\nOne example use would be a configuration file with 
recursive\ninclude operations; the inner function would parse the first\nfile in the list, and the parse would accumulate this filename\nand those of any included files so that they can be monitored,\ntriggering a fresh parse if one changes.\n\nExample:\n\n class C(object):\n def __init__(self):\n self._foo_path = '.foorc'\n @files_property\n def foo(self,paths):\n new_paths, result = parse(paths[0])\n return new_paths, result\n\nThe load function is called on the first access and on every\naccess thereafter where an associated file's `FileState` has\nchanged and the time since the last successful load exceeds\nthe `poll_rate`.\n\nAn attempt at avoiding races is made by\nignoring reloads that raise exceptions and ignoring reloads\nwhere files that were `os.stat()`ed during the change check have\nchanged state after the load.\n\n## <a name=\"makelockfile\"></a>`makelockfile(path, *, ext=None, poll_interval=None, timeout=None, runstate: Optional[cs.resources.RunState] = <function uses_runstate.<locals>.<lambda> at 0x109192ca0>, keepopen=False, max_interval=37)`\n\nCreate a lockfile and return its path.\n\nThe lockfile can be removed with `os.remove`.\nThis is the core functionality supporting the `lockfile()`\ncontext manager.\n\nParameters:\n* `path`: the base associated with the lock file,\n often the filesystem object whose access is being managed.\n* `ext`: the extension to the base used to construct the lockfile name.\n Default: \".lock\"\n* `timeout`: maximum time to wait before failing.\n Default: `None` (wait forever).\n Note that zero is an accepted value\n and requires the lock to succeed on the first attempt.\n* `poll_interval`: polling frequency when timeout is not 0.\n* `runstate`: optional `RunState` duck instance supporting cancellation.\n Note that if a cancelled `RunState` is provided\n no attempt will be made to make the lockfile.\n* `keepopen`: optional flag, default `False`:\n if true, do not close the lockfile and return `(lockpath,lockfd)`\n 
being the lock file path and the open file descriptor\n\n## <a name=\"max_suffix\"></a>`max_suffix(dirpath, prefix)`\n\nCompute the highest existing numeric suffix\nfor names starting with `prefix`.\n\nThis is generally used as a starting point for picking\na new numeric suffix.\n\n## <a name=\"mkdirn\"></a>`mkdirn(path, sep='')`\n\nCreate a new directory named `path+sep+n`,\nwhere `n` exceeds any name already present.\n\nParameters:\n* `path`: the basic directory path.\n* `sep`: a separator between `path` and `n`.\n Default: `''`\n\n## <a name=\"NamedTemporaryCopy\"></a>`NamedTemporaryCopy(f, progress=False, progress_label=None, **kw)`\n\nA context manager yielding a temporary copy of `f`\nas returned by `NamedTemporaryFile(**kw)`.\n\nParameters:\n* `f`: the name of the file to copy, or an open binary file,\n or a `CornuCopyBuffer`\n* `progress`: an optional progress indicator, default `False`;\n if a `bool`, show a progress bar for the copy phase if true;\n if an `int`, show a progress bar for the copy phase\n if the file size equals or exceeds the value;\n otherwise it should be a `cs.progress.Progress` instance\n* `progress_label`: optional progress bar label,\n only used if a progress bar is made\n\nOther keyword parameters are passed to `tempfile.NamedTemporaryFile`.\n\n## <a name=\"NullFile\"></a>Class `NullFile`\n\nWritable file that discards its input.\n\nNote that this is _not_ an open of `os.devnull`;\nit just discards writes and is not the underlying file descriptor.\n\n*`NullFile.__init__(self)`*:\nInitialise the file offset to 0.\n\n*`NullFile.flush(self)`*:\nFlush buffered data to the subsystem.\n\n*`NullFile.write(self, data)`*:\nDiscard data, advance file offset by length of data.\n\n## <a name=\"Pathname\"></a>Class `Pathname(builtins.str)`\n\nSubclass of str presenting convenience properties useful for\nformat strings related to file paths.\n\n*`Pathname.__format__(self, fmt_spec)`*:\nCalling format(<Pathname>, fmt_spec) treats `fmt_spec` as a 
new style\nformatting string with a single positional parameter of `self`.\n\n*`Pathname.abs`*:\nThe absolute form of this Pathname.\n\n*`Pathname.basename`*:\nThe basename of this Pathname.\n\n*`Pathname.dirname`*:\nThe dirname of the Pathname.\n\n*`Pathname.isabs`*:\nWhether this Pathname is an absolute Pathname.\n\n*`Pathname.short`*:\nThe shortened form of this Pathname.\n\n*`Pathname.shorten(self, prefixes=None)`*:\nShorten a Pathname using ~ and ~user.\n\n## <a name=\"poll_file\"></a>`poll_file(path, old_state, reload_file, missing_ok=False)`\n\nWatch a file for modification by polling its state as obtained\nby `FileState()`.\nCall `reload_file(path)` if the state changes.\nReturn `(new_state,reload_file(path))` if the file was modified\nand was unchanged (stable state) before and after the reload_file().\nOtherwise return `(None,None)`.\n\nThis may raise an `OSError` if the `path` cannot be `os.stat()`ed\nand of course for any exceptions that occur calling `reload_file`.\n\nIf `missing_ok` is true then a failure to `os.stat()` which\nraises `OSError` with `ENOENT` will just return `(None,None)`.\n\n## <a name=\"read_data\"></a>`read_data(fp, nbytes, rsize=None)`\n\nRead `nbytes` of data from `fp`, return the data.\n\nParameters:\n* `nbytes`: number of bytes to read.\n If `None`, read until EOF.\n* `rsize`: read size, default `DEFAULT_READSIZE`.\n\n## <a name=\"read_from\"></a>`read_from(fp, rsize=None, tail_mode=False, tail_delay=None)`\n\nGenerator to present text or data from an open file until EOF.\n\nParameters:\n* `rsize`: read size, default: DEFAULT_READSIZE\n* `tail_mode`: if true, yield an empty chunk at EOF, allowing resumption\n if the file grows.\n\n## <a name=\"ReadMixin\"></a>Class `ReadMixin`\n\nUseful read methods to accommodate modes not necessarily available in a class.\n\nNote that this mixin presumes that the attribute `self._lock`\nis a threading.RLock like context manager.\n\nClasses using this mixin should consider overriding the 
default\n.datafrom method with something more efficient or direct.\n\n*`ReadMixin.bufferfrom(self, offset)`*:\nReturn a CornuCopyBuffer from the specified `offset`.\n\n*`ReadMixin.datafrom(self, offset, readsize=None)`*:\nYield data from the specified `offset` onward in some\napproximation of the \"natural\" chunk size.\n\n*NOTE*: UNLIKE the global datafrom() function, this method\nMUST NOT move the logical file position. Implementors may need\nto save and restore the file pointer within a lock around\nthe I/O if they do not use a direct access method like\nos.pread.\n\nThe aspiration here is to read data with only a single call\nto the underlying storage, and to return the chunks in\nnatural sizes instead of some default read size.\n\nClasses using this mixin must implement this method.\n\n*`ReadMixin.read(self, size=-1, offset=None, longread=False)`*:\nRead up to `size` bytes, honouring the \"single system call\"\nspirit unless `longread` is true.\n\nParameters:\n* `size`: the number of bytes requested. 
A size of -1 requests\n all bytes to the end of the file.\n* `offset`: the starting point of the read; if None, use the\n current file position; if not None, seek to this position\n before reading, even if `size` == 0.\n* `longread`: switch from \"single system call\" to \"as many\n as required to obtain `size` bytes\"; short data will still\n be returned if the file is too short.\n\n*`ReadMixin.read_n(self, n)`*:\nRead `n` bytes of data and return them.\n\nUnlike traditional file.read(), RawIOBase.read() may return short\ndata, thus this workalike, which may only return short data if it\nhits EOF.\n\n*`ReadMixin.readinto(self, barray)`*:\nRead data into a bytearray.\n\n## <a name=\"rewrite\"></a>`rewrite(filepath, srcf, mode='w', backup_ext=None, do_rename=False, do_diff=None, empty_ok=False, overwrite_anyway=False)`\n\nRewrite the file `filepath` with data from the file object `srcf`.\nReturn `True` if the content was changed, `False` if unchanged.\n\nParameters:\n* `filepath`: the name of the file to rewrite.\n* `srcf`: the source file containing the new content.\n* `mode`: the write-mode for the file, default `'w'` (for text);\n use `'wb'` for binary data.\n* `empty_ok`: if true (default `False`),\n do not raise `ValueError` if the new data are empty.\n* `overwrite_anyway`: if true (default `False`),\n skip the content check and overwrite unconditionally.\n* `backup_ext`: if a nonempty string,\n take a backup of the original at `filepath + backup_ext`.\n* `do_diff`: if not `None`, call `do_diff(filepath,tempfile)`.\n* `do_rename`: if true (default `False`),\n rename the temp file to `filepath`\n after copying the permission bits.\n Otherwise (default), copy the tempfile to `filepath`;\n this preserves the file's inode and permissions etc.\n\n## <a name=\"rewrite_cmgr\"></a>`rewrite_cmgr(filepath, mode='w', **kw)`\n\nRewrite a file, presented as a context manager.\n\nParameters:\n* `mode`: file write mode, defaulting to \"w\" for text.\n\nOther keyword 
parameters are passed to `rewrite()`.\n\nExample:\n\n with rewrite_cmgr(pathname, do_rename=True) as f:\n ... write new content to f ...\n\n## <a name=\"RWFileBlockCache\"></a>Class `RWFileBlockCache`\n\nA scratch file for storing data.\n\n*`RWFileBlockCache.__init__(self, pathname=None, dirpath=None, suffix=None, lock=None)`*:\nInitialise the file.\n\nParameters:\n* `pathname`: path of file. If None, create a new file with\n tempfile.mkstemp using dir=`dirpath` and unlink that file once\n opened.\n* `dirpath`: location for the file if made by mkstemp as above.\n* `lock`: an object to use as a mutex, allowing sharing with\n some outer system. A Lock will be allocated if omitted.\n\n*`RWFileBlockCache.close(self)`*:\nClose the file descriptors.\n\n*`RWFileBlockCache.closed`*:\nTest whether the file descriptor has been closed.\n\n*`RWFileBlockCache.get(self, offset, length)`*:\nGet data from `offset` of length `length`.\n\n*`RWFileBlockCache.put(self, data)`*:\nStore `data`, return offset.\n\n## <a name=\"saferename\"></a>`saferename(oldpath, newpath)`\n\nRename a path using `os.rename()`,\nbut raise an exception if the target path already exists.\nNote: slightly racy.\n\n## <a name=\"seekable\"></a>`seekable(fp)`\n\nTry to test whether a filelike object is seekable.\n\nFirst try the `IOBase.seekable` method, otherwise try getting a file\ndescriptor from `fp.fileno` and `os.stat()`ing that,\notherwise return `False`.\n\n## <a name=\"Tee\"></a>Class `Tee`\n\nAn object with .write, .flush and .close methods\nwhich copies data to multiple output files.\n\n*`Tee.__init__(self, *fps)`*:\nInitialise the Tee; any arguments are taken to be output file objects.\n\n*`Tee.add(self, output)`*:\nAdd a new output.\n\n*`Tee.close(self)`*:\nClose all the outputs and close the Tee.\n\n*`Tee.flush(self)`*:\nFlush all the outputs.\n\n*`Tee.write(self, data)`*:\nWrite the data to all the outputs.\nNote: does not detect or accommodate short writes.\n\n## <a name=\"tee\"></a>`tee(fp, 
fp2)`\n\nContext manager duplicating `.write` and `.flush` from `fp` to `fp2`.\n\n## <a name=\"tmpdir\"></a>`tmpdir()`\n\nReturn the pathname of the default temporary directory for scratch data,\nthe environment variable `$TMPDIR` or `'/tmp'`.\n\n## <a name=\"tmpdirn\"></a>`tmpdirn(tmp=None)`\n\nMake a new temporary directory with a numeric suffix.\n\n## <a name=\"trysaferename\"></a>`trysaferename(oldpath, newpath)`\n\nA `saferename()` that returns `True` on success,\n`False` on failure.\n\n# Release Log\n\n\n\n*Release 20241007.1*:\nBugfix for @atomic_filename.\n\n*Release 20241007*:\natomic_filename: new feature - the caller may remove the temporary file to indicate that the target file should not be made.\n\n*Release 20240723*:\nlockfile: now a purer shim for makelockfile.\n\n*Release 20240709*:\nrewrite: return True if the content is modified, False otherwise.\n\n*Release 20240630*:\nmakelockfile: cap the retry poll interval at 37s, just issue a warning if the lock file is already gone on exit (eg manual removal).\n\n*Release 20240316*:\nFixed release upload artifacts.\n\n*Release 20240201*:\n* makelockfile: new optional keepopen parameter - if true return the lock path and an open file descriptor.\n* lockfile(): keep the lock file open to aid debugging with eg lsof.\n\n*Release 20231129*:\n* atomic_filename: accept optional rename_func to use instead of os.rename, supports using FSTags.move.\n* atomic_filename: clean up the temp file.\n\n*Release 20230421*:\natomic_filename: raise FileExistsError instead of ValueError if not exists_ok and existspath(filename).\n\n*Release 20230401*:\nReplaced a lot of runstate plumbing with @uses_runstate.\n\n*Release 20221118*:\natomic_filename: use shutil.copystat instead of shutil.copymode, bugfix the associated logic.\n\n*Release 20220429*:\nMove longpath and shortpath to cs.fs, leave legacy names behind.\n\n*Release 20211208*:\n* Move NDJSON stuff to separate cs.ndjson module.\n* New gzifopen() function to open either a 
gzipped file or an uncompressed file.\n\n*Release 20210906*:\nAdditional release because I'm unsure @atomic_filename made it into the previous release.\n\n*Release 20210731*:\nNew atomic_filename context manager wrapping NamedTemporaryFile for presenting a file after its contents are prepared.\n\n*Release 20210717*:\nUpdates for recent cs.mappings-20210717 release.\n\n*Release 20210420*:\n* Forensic prefix for NamedTemporaryCopy.\n* UUIDNDJSONMapping: provide an empty .scan_errors on instantiation, avoids AttributeError if a scan never occurs.\n\n*Release 20210306*:\n* datafrom_fd: fix use-before-set of is_seekable.\n* RWFileBlockCache.put: remove assert(len(data)>0), adjust logic.\n\n*Release 20210131*:\ncrop_name: put ext before name_max, more likely to be specified, I think.\n\n*Release 20201227.1*:\nDocstring tweak.\n\n*Release 20201227*:\nscan_ndjson: optional errors_list to accrue errors during the scan.\n\n*Release 20201108*:\nBugfix rewrite_cmgr, failed to flush a file before copying its contents.\n\n*Release 20201102*:\n* Newline delimited JSON (ndjson) support.\n* New UUIDNDJSONMapping implementing a singleton cs.mappings.LoadableMappingMixin of cs.mappings.UUIDedDict subclass instances backed by an NDJSON file.\n* New scan_ndjson() function to yield newline delimited JSON records.\n* New write_ndjson() function to write newline delimited JSON records.\n* New append_ndjson() function to append a single newline delimited JSON record to a file.\n* New NamedTemporaryCopy for creating a temporary copy of a file with an optional progress bar.\n* rewrite_cmgr: turn into a simple wrapper for rewrite.\n* datafrom: make the offset parameter optional, tweak the @strable open function.\n* datafrom_fd: support nonseekable file descriptors, document that for these the file position is moved (no pread support).\n* New iter_fd and iter_file to return iterators of a file's data by utilising a CornuCopyBuffer.\n* New byteses_as_fd to return a readable file descriptor 
receiving an iterable of bytes via a CornuCopyBuffer.\n\n*Release 20200914*:\nNew common_path_prefix to compare pathnames.\n\n*Release 20200517*:\n* New crop_name() function to crop a file basename to fit within a specific length.\n* New find() function complementing findup (UNTESTED).\n\n*Release 20200318*:\nNew findup(path,test) generator to walk up a file tree.\n\n*Release 20191006*:\nAdjust import of cs.deco.cachedmethod.\n\n*Release 20190729*:\ndatafrom_fd: make `offset` optional, defaulting to fd position at call.\n\n*Release 20190617*:\n@file_based: adjust use of @cached from cached(wrap0, **dkw) to cached(**dkw)(wrap0).\n\n*Release 20190101*:\ndatafrom: add maxlength keyword arg, bugfix fd and f.fileno cases.\n\n*Release 20181109*:\n* Various bugfixes for BackedFile.\n* Use a file's .read1 method if available in some scenarios.\n* makelockfile: accept an optional RunState control parameter, improve some behaviour.\n* datafrom_fd: new optional maxlength parameter limiting the amount of data returned.\n* datafrom_fd: by default, perform an initial read to align all subsequent reads with the readsize.\n* drop fdreader, add datafrom(f, offset, readsize) accepting a file or a file descriptor, expose datafrom_fd.\n* ReadMixin.datafrom now mandatory. 
Add ReadMixin.bufferfrom.\n* Assorted other improvements, minor bugfixes, documentation improvements.\n\n*Release 20171231.1*:\nTrite DISTINFO fix, no semantic changes.\n\n*Release 20171231*:\nUpdate imports, bump DEFAULT_READSIZE from 8KiB to 128KiB.\n\n*Release 20170608*:\n* Move lockfile and the SharedAppend* classes to cs.sharedfile.\n* BackedFile internal changes.\n\n*Release 20160918*:\n* BackedFile: redo implementation of .front_file to fix resource leak; add .__len__; add methods .spans, .front_spans and .back_spans to return information about front vs back data.\n* seek: bugfix: seek should return the new file offset.\n* BackedFile does not subclass RawIOBase, it just works like one.\n\n*Release 20160828*:\n* Use \"install_requires\" instead of \"requires\" in DISTINFO.\n* Rename maxFilenameSuffix to max_suffix.\n* Pull in OpenSocket file-like socket wrapper from cs.venti.tcp.\n* Update for cs.asynchron changes.\n* ... then move cs.fileutils.OpenSocket into new module cs.socketutils.\n* New Tee class, for copying output to multiple files.\n* NullFile class which discards writes (==> no-op for Tee).\n* New class SavingFile to accrue output and move to specified pathname when complete.\n* Memory usage improvements.\n* Polyfill non-threadsafe implementation of pread if os.pread does not exist.\n* New function seekable() to probe a file for seekability.\n* SharedAppendFile: provide new .open(filemode) context manager for allowing direct file output for external users.\n* New function makelockfile() presenting the logic to create a lock file separately from the lockfile context manager.\n* Assorted bugfixes and improvements.\n\n*Release 20150116*:\nInitial PyPI release.\n",
"bugtrack_url": null,
"license": "GNU General Public License v3 or later (GPLv3+)",
"summary": "My grab bag of convenience functions for files and filenames/pathnames.",
"version": "20241007.1",
"project_urls": {
"MonoRepo Commits": "https://bitbucket.org/cameron_simpson/css/commits/branch/main",
"Monorepo Git Mirror": "https://github.com/cameron-simpson/css",
"Monorepo Hg/Mercurial Mirror": "https://hg.sr.ht/~cameron-simpson/css",
"Source": "https://github.com/cameron-simpson/css/blob/main/lib/python/cs/fileutils.py"
},
"split_keywords": [
"python2",
" python3"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8616eb93219519760c461146a20f0c6bb1a55b50448d967b8a1eb483d712eb0d",
"md5": "f8b157f093e1c17f2d9810783814ac0d",
"sha256": "7f36f3a5a3107cdffdeb491767202ba2617542e245e88c96020c4ec572ee3da2"
},
"downloads": -1,
"filename": "cs.fileutils-20241007.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f8b157f093e1c17f2d9810783814ac0d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 29304,
"upload_time": "2024-10-07T06:44:43",
"upload_time_iso_8601": "2024-10-07T06:44:43.999269Z",
"url": "https://files.pythonhosted.org/packages/86/16/eb93219519760c461146a20f0c6bb1a55b50448d967b8a1eb483d712eb0d/cs.fileutils-20241007.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1c67cfd9e5f3f3b0c0d519ad6c26feb4d114601e6409838afeb27b359f3ec251",
"md5": "161fd1a4292f91c01372357bf94e0423",
"sha256": "8b10567a84fb5d0878edb70aca87a30654d0d2851b9175d8c01925ae16421dbc"
},
"downloads": -1,
"filename": "cs_fileutils-20241007.1.tar.gz",
"has_sig": false,
"md5_digest": "161fd1a4292f91c01372357bf94e0423",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 56630,
"upload_time": "2024-10-07T06:44:46",
"upload_time_iso_8601": "2024-10-07T06:44:46.150808Z",
"url": "https://files.pythonhosted.org/packages/1c/67/cfd9e5f3f3b0c0d519ad6c26feb4d114601e6409838afeb27b359f3ec251/cs_fileutils-20241007.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-07 06:44:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "cameron-simpson",
"github_project": "css",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "cs.fileutils"
}