cs-lex

Name	cs-lex JSON
Version	20250724 JSON
	download
home_page	None
Summary	Lexical analysis functions, tokenisers, transcribers: an arbitrary assortment of lexical and tokenisation functions useful for writing recursive descent parsers, of which I have several. There are also some transcription functions for producing text from various objects, such as `hexify` and `unctrl`.
upload_time	2025-07-24 02:22:46
maintainer	None
docs_url	None
author	None
requires_python	None
license	None
keywords	python2 python3
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            Lexical analysis functions, tokenisers, transcribers:
an arbitrary assortment of lexical and tokenisation functions useful
for writing recursive descent parsers, of which I have several.
There are also some transcription functions for producing text
from various objects, such as `hexify` and `unctrl`.

*Latest release 20250724*:
tabulate: new optional ppcls argument to supply a custom PrettyPrinter class for formatting, use a date and datetime aware class by default.

Generally the get_* functions accept a source string and an offset
(usually optional, default `0`) and return a token and the new offset,
raising `ValueError` on failed tokenisation.

Short summary:
* `as_lines`: Generator yielding complete lines from arbitrary pieces of text from the iterable of `str` `chunks`.
* `BaseToken`: A mixin for token dataclasses.
* `camelcase`: Convert a snake cased string `snakecased` into camel case.
* `common_prefix`: Return the common prefix of the strings `strs`.
* `common_suffix`: Return the common suffix of the strings `strs`.
* `CoreTokens`: A mixin for token dataclasses whose subclasses include `Identifier`, 'NumericValue` and `QuotedString`.
* `cropped`: If the length of `s` exceeds `max_length` (default `32`), replace enough of the tail with `ellipsis` and the last `roffset` (default `1`) characters of `s` to fit in `max_length` characters.
* `cropped_repr`: Compute a cropped `repr()` of `o`.
* `cutprefix`: Remove `prefix` from the front of `s` if present. Return the suffix if `s.startswith(prefix)`, else `s`. As with `str.startswith`, `prefix` may be a `str` or a `tuple` of `str`. If a tuple, the first matching prefix from the tuple will be removed.
* `cutsuffix`: Remove `suffix` from the end of `s` if present. Return the prefix if `s.endswith(suffix)`, else `s`. As with `str.endswith`, `suffix` may be a `str` or a `tuple` of `str`. If a tuple, the first matching suffix from the tuple will be removed.
* `FFloat`: Formattable `float`.
* `FInt`: Formattable `int`.
* `FNumericMixin`: A `FormatableMixin` subclass.
* `format_as`: Format the string `format_s` using `Formatter.vformat`, return the formatted result. This is a wrapper for `str.format_map` which raises a more informative `FormatAsError` exception on failure.
* `format_attribute`: A decorator to mark a method as available as a format method. Requires the enclosing class to be decorated with `@has_format_attributes`.
* `format_escape`: Escape `{}` characters in a string to protect them from `str.format`.
* `format_recover`: Decorator for `__format__` methods which replaces failed formats with `{self:format_spec}`.
* `FormatableFormatter`: A `string.Formatter` subclass interacting with objects which inherit from `FormatableMixin`.
* `FormatableMixin`: A subclass of `FormatableFormatter` which  provides 2 features: - a `__format__` method which parses the `format_spec` string into multiple colon separated terms whose results chain - a `format_as` method which formats a format string using `str.format_map` with a suitable mapping derived from the instance via its `format_kwargs` method (whose default is to return the instance itself).
* `FormatAsError`: Subclass of `LookupError` for use by `format_as`.
* `FStr`: A `str` subclass with the `FormatableMixin` methods, particularly its `__format__` method which uses `str` method names as valid formats.
* `get_chars`: Scan the string `s` for characters in `gochars` starting at `offset`. Return `(match,new_offset)`.
* `get_decimal`: Scan the string `s` for decimal characters starting at `offset` (default `0`). Return `(dec_string,new_offset)`.
* `get_decimal_or_float_value`: Fetch a decimal or basic float (nnn.nnn) value from the str `s` at `offset` (default `0`). Return `(value,new_offset)`.
* `get_decimal_value`: Scan the string `s` for a decimal value starting at `offset` (default `0`). Return `(value,new_offset)`.
* `get_delimited`: Collect text from the string `s` from position `offset` up to the first occurence of delimiter `delim`; return the text excluding the delimiter and the offset after the delimiter.
* `get_dotted_identifier`: Scan the string `s` for a dotted identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) with optional trailing dot and another dotted identifier, starting at `offset` (default `0`). Return `(match,new_offset)`.
* `get_envvar`: Parse a simple environment variable reference to $varname or $x where "x" is a special character.
* `get_hexadecimal`: Scan the string `s` for hexadecimal characters starting at `offset` (default `0`). Return `(hex_string,new_offset)`.
* `get_hexadecimal_value`: Scan the string `s` for a hexadecimal value starting at `offset` (default `0`). Return `(value,new_offset)`.
* `get_identifier`: Scan the string `s` for an identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) starting at `offset` (default 0). Return `(match,new_offset)`.
* `get_ini_clause_entryname`: Parse a `[`*clausename*`]`*entryname* string from `s` at `offset` (default `0`). Return `(clausename,entryname,new_offset)`.
* `get_ini_clausename`: Parse a `[`*clausename*`]` string from `s` at `offset` (default `0`). Return `(clausename,new_offset)`.
* `get_nonwhite`: Scan the string `s` for characters not in `string.whitespace` starting at `offset` (default `0`). Return `(match,new_offset)`.
* `get_other_chars`: Scan the string `s` for characters not in `stopchars` starting at `offset` (default `0`). Return `(match,new_offset)`.
* `get_prefix_n`: Strip a leading `prefix` and numeric value `n` from the string `s` starting at `offset` (default `0`). Return the matched prefix, the numeric value and the new offset. Returns `(None,None,offset)` on no match.
* `get_qstr`: Get quoted text with slosh escapes and optional environment substitution.
* `get_qstr_or_identifier`: Parse a double quoted string or an identifier.
* `get_sloshed_text`: Collect slosh escaped text from the string `s` from position `offset` (default `0`) and return the decoded unicode string and the offset of the completed parse.
* `get_suffix_part`: Strip a trailing "part N" suffix from the string `s`. Return the matched suffix and the number part number. Retrn `(None,None)` on no match.
* `get_tokens`: Parse the string `s` from position `offset` using the supplied tokeniser functions `getters`. Return the list of tokens matched and the final offset.
* `get_uc_identifier`: Scan the string `s` for an identifier as for `get_identifier`, but require the letters to be uppercase.
* `get_white`: Scan the string `s` for characters in `string.whitespace` starting at `offset` (default `0`). Return `(match,new_offset)`.
* `has_format_attributes`: Class decorator to walk this class for direct methods marked as for use in format strings and to include them in `cls.format_attributes()`.
* `hexify`: A flavour of `binascii.hexlify` returning a `str`.
* `htmlify`: Convert a string for safe transcription in HTML.
* `htmlquote`: Quote a string for use in HTML.
* `Identifier`: A dotted identifier.
* `indent`: Return the `paragraph` indented by `line_indent` (default `"  "`).
* `is_dotted_identifier`: Test if the string `s` is an identifier from position `offset` onward.
* `is_identifier`: Test if the string `s` is an identifier from position `offset` (default `0`) onward.
* `is_uc_identifier`: Test if the string `s` is an uppercase identifier from position `offset` (default `0`) onward.
* `isUC_`: Check that a string matches the regular expression `^[A-Z][A-Z_0-9]*$`.
* `jsquote`: Quote a string for use in JavaScript.
* `lc_`: Return `value.lower()` with `'-'` translated into `'_'` and `' '` translated into `'-'`.
* `match_tokens`: Wrapper for `get_tokens` which catches `ValueError` exceptions and returns `(None,offset)`.
* `NumericValue`: An `int` or `float` literal.
* `parseUC_sAttr`: Take an attribute name `attr` and return `(key,is_plural)`.
* `phpquote`: Quote a string for use in PHP code.
* `printt`: A wrapper for `tabulate()` to print the results. Each positional argument is a table row.
* `QuotedString`: A double quoted string.
* `r`: Like `typed_str` but using `repr` instead of `str`. This is available as both `typed_repr` and `r`.
* `s`: Return "type(o).__name__:str(o)" for some object `o`. This is available as both `typed_str` and `s`.
* `skipwhite`: Convenience routine for skipping past whitespace; returns the offset of the next nonwhitespace character.
* `slosh_mapper`: Return a string to replace backslash-`c`, or `None`.
* `slosh_quote`: Quote a string `raw_s` with quote character `q`.
* `snakecase`: Convert a camel cased string `camelcased` into snake case.
* `split_remote_path`: OBSOLETE version of split_remote_path, suggestion: cs.fs.RemotePath.from_str.
* `stripped_dedent`: Slightly smarter dedent which ignores a string's opening indent.
* `strlist`: Convert an iterable to strings and join with `sep` (default `', '`).
* `tabpadding`: Compute some spaces to use a tab padding at an offfset.
* `tabulate`: A generator yielding lines of values from `rows` aligned in columns.
* `texthexify`: Transcribe the bytes `bs` to text using compact text runs for some common text values.
* `titleify_lc`: Translate `'-'` into `' '` and `'_'` translated into `'-'`, then titlecased.
* `typed_repr`: Like `typed_str` but using `repr` instead of `str`. This is available as both `typed_repr` and `r`.
* `typed_str`: Return "type(o).__name__:str(o)" for some object `o`. This is available as both `typed_str` and `s`.
* `unctrl`: Return the string `s` with `TAB`s expanded and control characters replaced with printable representations.
* `untexthexify`: Decode a textual representation of binary data into binary data.

Module contents:
- <a name="as_lines"></a>`as_lines(chunks, partials=None)`: Generator yielding complete lines from arbitrary pieces of text from
  the iterable of `str` `chunks`.

  After completion, any remaining newline-free chunks remain
  in the partials list; they will be unavailable to the caller
  unless the list is presupplied.
- <a name="BaseToken"></a>`class BaseToken(cs.deco.Promotable)`: A mixin for token dataclasses.

  Presently I use this in `cs.app.tagger.rules` and `cs.app.pilfer.parse`.

*`BaseToken.from_str(text: str) -> 'BaseToken'`*:
Parse `test` as a token of type `cls`, return the token.
Raises `SyntaxError` on a parse failure.
This is a wrapper for the `parse` class method.

*`BaseToken.matched_text`*:
The text from `self.source_text` which matches this token.

*`BaseToken.parse(text: str, offset: int = 0, *, skip=False) -> Tuple[ForwardRef('BaseToken'), int]`*:
Parse a token from `test` at `offset` (default `0`).
Return a `BaseToken` subclass instance.
Raise `SyntaxError` if no subclass parses it.
Raise `EOFError` if at the end of the `text`,
checked after any whitespace if `skip` is true.
The returned token's `.end_offset` is the next parse point.

This base class method attempts the `.parse` method of all
the public subclasses.

Parameters:
* `text`: the text being parsed
* `offset`: the offset within the `text` of the the parse cursor
* `skip`: if true (default `False`), skip any leading
  whitespace before matching

*`BaseToken.scan(text: str, offset: int = 0, *, skip=True) -> Iterable[ForwardRef('BaseToken')]`*:
Scan `text`, parsing tokens using `BaseToken.parse` and yielding them.
Parameters are as for `BaseToken.parse` except as follows:
- encountering end of text end the iteration instead of raising `EOFError`
- `skip` defaults to `True` to allow whitespace between tokens

*`BaseToken.token_classes()`*:
Return the `baseToken` subclasses to consider when parsing a token stream.
- <a name="camelcase"></a>`camelcase(snakecased, first_letter_only=False)`: Convert a snake cased string `snakecased` into camel case.

  Parameters:
  * `snakecased`: the snake case string to convert
  * `first_letter_only`: optional flag (default `False`);
    if true then just ensure that the first character of a word
    is uppercased, otherwise use `str.title`

  Example:

      >>> camelcase('abc_def')
      'abcDef'
      >>> camelcase('ABc_def')
      'abcDef'
      >>> camelcase('abc_dEf')
      'abcDef'
      >>> camelcase('abc_dEf', first_letter_only=True)
      'abcDEf'
- <a name="common_prefix"></a>`common_prefix(*strs)`: Return the common prefix of the strings `strs`.

  Examples:

      >>> common_prefix('abc', 'def')
      ''
      >>> common_prefix('abc', 'abd')
      'ab'
      >>> common_prefix('abc', 'abcdef')
      'abc'
      >>> common_prefix('abc', 'abcdef', 'abz')
      'ab'
      >>> # contrast with cs.fileutils.common_path_prefix
      >>> common_prefix('abc/def', 'abc/def1', 'abc/def2')
      'abc/def'
- <a name="common_suffix"></a>`common_suffix(*strs)`: Return the common suffix of the strings `strs`.
- <a name="CoreTokens"></a>`class CoreTokens(BaseToken)`: A mixin for token dataclasses whose subclasses include `Identifier`,
  'NumericValue` and `QuotedString`.
- <a name="cropped"></a>`cropped(s: str, max_length: int = 32, roffset: int = 1, ellipsis: str = '...')`: If the length of `s` exceeds `max_length` (default `32`),
  replace enough of the tail with `ellipsis`
  and the last `roffset` (default `1`) characters of `s`
  to fit in `max_length` characters.
- <a name="cropped_repr"></a>`cropped_repr(o, roffset=1, max_length=32, inner_max_length=None)`: Compute a cropped `repr()` of `o`.

  Parameters:
  * `o`: the object to represent
  * `max_length`: the maximum length of the representation, default `32`
  * `inner_max_length`: the maximum length of the representations
    of members of `o`, default `max_length//2`
  * `roffset`: the number of trailing characters to preserve, default `1`
- <a name="cutprefix"></a>`cutprefix(s, prefix)`: Remove `prefix` from the front of `s` if present.
  Return the suffix if `s.startswith(prefix)`, else `s`.
  As with `str.startswith`, `prefix` may be a `str` or a `tuple` of `str`.
  If a tuple, the first matching prefix from the tuple will be removed.

  Example:

      >>> abc_def = 'abc.def'
      >>> cutprefix(abc_def, 'abc.')
      'def'
      >>> cutprefix(abc_def, 'zzz.')
      'abc.def'
      >>> cutprefix(abc_def, '.zzz') is abc_def
      True
      >>> cutprefix('this_that', ('this', 'thusly'))
      '_that'
      >>> cutprefix('thusly_that', ('this', 'thusly'))
      '_that'
- <a name="cutsuffix"></a>`cutsuffix(s, suffix)`: Remove `suffix` from the end of `s` if present.
  Return the prefix if `s.endswith(suffix)`, else `s`.
  As with `str.endswith`, `suffix` may be a `str` or a `tuple` of `str`.
  If a tuple, the first matching suffix from the tuple will be removed.

  Example:

      >>> abc_def = 'abc.def'
      >>> cutsuffix(abc_def, '.def')
      'abc'
      >>> cutsuffix(abc_def, '.zzz')
      'abc.def'
      >>> cutsuffix(abc_def, '.zzz') is abc_def
      True
      >>> cutsuffix('this_that', ('that', 'tother'))
      'this_'
      >>> cutsuffix('this_tother', ('that', 'tother'))
      'this_'
- <a name="FFloat"></a>`class FFloat(FNumericMixin, builtins.float)`: Formattable `float`.
- <a name="FInt"></a>`class FInt(FNumericMixin, builtins.int)`: Formattable `int`.
- <a name="FNumericMixin"></a>`class FNumericMixin(FormatableMixin)`: A `FormatableMixin` subclass.

*`FNumericMixin.localtime(self)`*:
Treat this as a UNIX timestamp and return a localtime `datetime`.

*`FNumericMixin.utctime(self)`*:
Treat this as a UNIX timestamp and return a UTC `datetime`.
- <a name="format_as"></a>`format_as(format_s: str, format_mapping, formatter=None, error_sep=None, strict=None)`: Format the string `format_s` using `Formatter.vformat`,
  return the formatted result.
  This is a wrapper for `str.format_map`
  which raises a more informative `FormatAsError` exception on failure.

  Parameters:
  * `format_s`: the format string to use as the template
  * `format_mapping`: the mapping of available replacement fields
  * `formatter`: an optional `string.Formatter`-like instance
    with a `.vformat(format_string,args,kwargs)` method,
    usually a subclass of `string.Formatter`;
    if not specified then `FormatableFormatter` is used
  * `error_sep`: optional separator for the multipart error message,
    default from `FormatAsError.DEFAULT_SEPARATOR`:
    `'; '`
  * `strict`: optional flag (default `False`)
    indicating that an unresolveable field should raise a
    `KeyError` instead of inserting a placeholder
- <a name="format_attribute"></a>`format_attribute(method)`: A decorator to mark a method as available as a format method.
  Requires the enclosing class to be decorated with `@has_format_attributes`.

  For example,
  the `FormatableMixin.json` method is defined like this:

      @format_attribute
      def json(self):
          return self.FORMAT_JSON_ENCODER.encode(self)

  which allows a `FormatableMixin` subclass instance
  to be used in a format string like this:

      {instance:json}

  to insert a JSON transcription of the instance.

  It is recommended that methods marked with `@format_attribute`
  have no side effects and do not modify state,
  as they are intended for use in ad hoc format strings
  supplied by an end user.
- <a name="format_escape"></a>`format_escape(s)`: Escape `{}` characters in a string to protect them from `str.format`.
- <a name="format_recover"></a>`format_recover(*da, **dkw)`: Decorator for `__format__` methods which replaces failed formats
  with `{self:format_spec}`.
- <a name="FormatableFormatter"></a>`class FormatableFormatter(string.Formatter)`: A `string.Formatter` subclass interacting with objects
  which inherit from `FormatableMixin`.

*`FormatableFormatter.format_field(value, format_spec: str)`*:
Format a value using `value.format_format_field`,
returning an `FStr`
(a `str` subclass with additional `format_spec` features).

We actually recognise colon separated chains of formats
and apply each format to the previously converted value.
The final result is promoted to an `FStr` before return.

*`FormatableFormatter.format_mode`*:
Thread local state object.

Attributes:
* `strict`: initially `False`; raise a `KeyError` for
  unresolveable field names

*`FormatableFormatter.get_arg_name(field_name)`*:
Default initial arg_name is an identifier.

Returns `(prefix,offset)`, and `('',0)` if there is no arg_name.

*`FormatableFormatter.get_field(self, field_name, args, kwargs)`*:
Get the object referenced by the field text `field_name`.
Raises `KeyError` for an unknown `field_name`.

*`FormatableFormatter.get_format_subspecs(format_spec)`*:
Parse a `format_spec` as a sequence of colon separated components,
return a list of the components.

*`FormatableFormatter.get_subfield(value, subfield_text: str)`*:
Resolve `value` against `subfield_text`,
the remaining field text after the term which resolved to `value`.

For example, a format `{name.blah[0]}`
has the field text `name.blah[0]`.
A `get_field` implementation might initially
resolve `name` to some value,
leaving `.blah[0]` as the `subfield_text`.
This method supports taking that value
and resolving it against the remaining text `.blah[0]`.

For generality, if `subfield_text` is the empty string
`value` is returned unchanged.

*`FormatableFormatter.get_value(self, arg_name, args, kwargs)`*:
Get the object with index `arg_name`.

This default implementation returns `(kwargs[arg_name],arg_name)`.
- <a name="FormatableMixin"></a>`class FormatableMixin(FormatableFormatter)`: A subclass of `FormatableFormatter` which  provides 2 features:
  - a `__format__` method which parses the `format_spec` string
    into multiple colon separated terms whose results chain
  - a `format_as` method which formats a format string using `str.format_map`
    with a suitable mapping derived from the instance
    via its `format_kwargs` method
    (whose default is to return the instance itself)

  The `format_as` method is like an inside out `str.format` or
  `object.__format__` method.

  The `str.format` method is designed for formatting a string
  from a variety of other objects supplied in the keyword arguments.

  The `object.__format__` method is for filling out a single `str.format`
  replacement field from a single object.

  By contrast, `format_as` is designed to fill out an entire format
  string from the current object.

  For example, the `cs.tagset.TagSetMixin` class
  uses `FormatableMixin` to provide a `format_as` method
  whose replacement fields are derived from the tags in the tag set.

  Subclasses wanting to provide additional `format_spec` terms
  should:
  - override `FormatableFormatter.format_field1` to implement
    terms with no colons, letting `format_field` do the split into terms
  - override `FormatableFormatter.get_format_subspecs` to implement
    the parse of `format_spec` into a sequence of terms.
    This might recognise a special additional syntax
    and quietly fall back to `super().get_format_subspecs`
    if that is not present.

*`FormatableMixin.__format__(self, format_spec)`*:
Format `self` according to `format_spec`.

This implementation calls `self.format_field`.
As such, a `format_spec` is considered
a sequence of colon separated terms.

Classes wanting to implement additional format string syntaxes
should either:
- override `FormatableFormatter.format_field1` to implement
  terms with no colons, letting `format_field1` do the split into terms
- override `FormatableFormatter.get_format_subspecs` to implement
  the term parse.

The default implementation of `__format1__` just calls `super().__format__`.
Implementations providing specialised formats
should implement them in `__format1__`
with fallback to `super().__format1__`.

*`FormatableMixin.convert_field(self, value, conversion)`*:
The default converter for fields calls `Formatter.convert_field`.

*`FormatableMixin.convert_via_method_or_attr(self, value, format_spec)`*:
Apply a method or attribute name based conversion to `value`
where `format_spec` starts with a method name
applicable to `value`.
Return `(converted,offset)`
being the converted value and the offset after the method name.

Note that if there is not a leading identifier on `format_spec`
then `value` is returned unchanged with `offset=0`.

The methods/attributes are looked up in the mapping
returned by `.format_attributes()` which represents allowed methods
(broadly, one should not allow methods which modify any state).

If this returns a callable, it is called to obtain the converted value
otherwise it is used as is.

As a final tweak,
if `value.get_format_attribute()` raises an `AttributeError`
(the attribute is not an allowed attribute)
or calling the attribute raises a `TypeError`
(the `value` isn't suitable)
and the `value` is not an instance of `FStr`,
convert it to an `FStr` and try again.
This provides the common utility methods on other types.

The motivating example was a `PurePosixPath`,
which does not JSON transcribe;
this tweak supports both
`posixpath:basename` via the pathlib stuff
and `posixpath:json` via `FStr`
even though a `PurePosixPath` does not subclass `FStr`.

*`FormatableMixin.format_as(self, format_s, error_sep=None, strict=None, **control_kw)`*:
Return the string `format_s` formatted using the mapping
returned by `self.format_kwargs(**control_kw)`.

If a class using the mixin has no `format_kwargs(**control_kw)` method
to provide a mapping for `str.format_map`
then the instance itself is used as the mapping.

*`FormatableMixin.get_format_attribute(self, attr)`*:
Return a mapping of permitted methods to functions of an instance.
This is used to whitelist allowed `:`*name* method formats
to prevent scenarios like little Bobby Tables calling `delete()`.

*`FormatableMixin.get_format_attributes()`*:
Return the mapping of format attributes.

*`FormatableMixin.json(self)`*:
The value transcribed as compact JSON.
- <a name="FormatAsError"></a>`class FormatAsError(builtins.LookupError)`: Subclass of `LookupError` for use by `format_as`.
- <a name="FStr"></a>`class FStr(FormatableMixin, builtins.str)`: A `str` subclass with the `FormatableMixin` methods,
  particularly its `__format__` method
  which uses `str` method names as valid formats.

  It also has a bunch of utility methods which are available
  as `:`*method* in format strings.

*`FStr.basename(self)`*:
Treat as a filesystem path and return the basename.

*`FStr.dirname(self)`*:
Treat as a filesystem path and return the dirname.

*`FStr.f(self)`*:
Parse `self` as a `float`.

*`FStr.i(self, base=10)`*:
Parse `self` as an `int`.

*`FStr.lc(self)`*:
Lowercase using `lc_()`.

*`FStr.path(self)`*:
Convert to a native filesystem `pathlib.Path`.

*`FStr.posix_path(self)`*:
Convert to a Posix filesystem `pathlib.Path`.

*`FStr.windows_path(self)`*:
Convert to a Windows filesystem `pathlib.Path`.
- <a name="get_chars"></a>`get_chars(s, offset, gochars)`: Scan the string `s` for characters in `gochars` starting at `offset`.
  Return `(match,new_offset)`.

  `gochars` may also be a callable, in which case a character
  `ch` is accepted if `gochars(ch)` is true.
- <a name="get_decimal"></a>`get_decimal(s, offset=0)`: Scan the string `s` for decimal characters starting at `offset` (default `0`).
  Return `(dec_string,new_offset)`.
- <a name="get_decimal_or_float_value"></a>`get_decimal_or_float_value(s, offset=0)`: Fetch a decimal or basic float (nnn.nnn) value
  from the str `s` at `offset` (default `0`).
  Return `(value,new_offset)`.
- <a name="get_decimal_value"></a>`get_decimal_value(s, offset=0)`: Scan the string `s` for a decimal value starting at `offset` (default `0`).
  Return `(value,new_offset)`.
- <a name="get_delimited"></a>`get_delimited(s, offset, delim)`: Collect text from the string `s` from position `offset` up
  to the first occurence of delimiter `delim`; return the text
  excluding the delimiter and the offset after the delimiter.
- <a name="get_dotted_identifier"></a>`get_dotted_identifier(s, offset=0, **kw)`: Scan the string `s` for a dotted identifier (by default an
  ASCII letter or underscore followed by letters, digits or
  underscores) with optional trailing dot and another dotted
  identifier, starting at `offset` (default `0`).
  Return `(match,new_offset)`.

  Note: the empty string and an unchanged offset will be returned if
  there is no leading letter/underscore.

  Keyword arguments are passed to `get_identifier`
  (used for each component of the dotted identifier).
- <a name="get_envvar"></a>`get_envvar(s, offset=0, environ=None, default=None, specials=None)`: Parse a simple environment variable reference to $varname or
  $x where "x" is a special character.

  Parameters:
  * `s`: the string with the variable reference
  * `offset`: the starting point for the reference
  * `default`: default value for missing environment variables;
     if `None` (the default) a `ValueError` is raised
  * `environ`: the environment mapping, default `os.environ`
  * `specials`: the mapping of special single character variables
- <a name="get_hexadecimal"></a>`get_hexadecimal(s, offset=0)`: Scan the string `s` for hexadecimal characters starting at `offset` (default `0`).
  Return `(hex_string,new_offset)`.
- <a name="get_hexadecimal_value"></a>`get_hexadecimal_value(s, offset=0)`: Scan the string `s` for a hexadecimal value starting at `offset` (default `0`).
  Return `(value,new_offset)`.
- <a name="get_identifier"></a>`get_identifier(s, offset=0, alpha='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', number='0123456789', extras='_')`: Scan the string `s` for an identifier (by default an ASCII
  letter or underscore followed by letters, digits or underscores)
  starting at `offset` (default 0).
  Return `(match,new_offset)`.

  *Note*: the empty string and an unchanged offset will be returned if
  there is no leading letter/underscore.

  Parameters:
  * `s`: the string to scan
  * `offset`: the starting offset, default `0`.
  * `alpha`: the characters considered alphabetic,
    default `string.ascii_letters`.
  * `number`: the characters considered numeric,
    default `string.digits`.
  * `extras`: extra characters considered part of an identifier,
    default `'_'`.
- <a name="get_ini_clause_entryname"></a>`get_ini_clause_entryname(s, offset=0)`: Parse a `[`*clausename*`]`*entryname* string
  from `s` at `offset` (default `0`).
  Return `(clausename,entryname,new_offset)`.
- <a name="get_ini_clausename"></a>`get_ini_clausename(s, offset=0)`: Parse a `[`*clausename*`]` string from `s` at `offset` (default `0`).
  Return `(clausename,new_offset)`.
- <a name="get_nonwhite"></a>`get_nonwhite(s, offset=0)`: Scan the string `s` for characters not in `string.whitespace`
  starting at `offset` (default `0`).
  Return `(match,new_offset)`.
- <a name="get_other_chars"></a>`get_other_chars(s, offset=0, stopchars=None)`: Scan the string `s` for characters not in `stopchars` starting
  at `offset` (default `0`).
  Return `(match,new_offset)`.
- <a name="get_prefix_n"></a>`get_prefix_n(s, prefix, n=None, *, offset=0)`: Strip a leading `prefix` and numeric value `n` from the string `s`
  starting at `offset` (default `0`).
  Return the matched prefix, the numeric value and the new offset.
  Returns `(None,None,offset)` on no match.

  Parameters:
  * `s`: the string to parse
  * `prefix`: the prefix string which must appear at `offset`
    or an object with a `match(str,offset)` method
    such as an `re.Pattern` regexp instance
  * `n`: optional integer value;
    if omitted any value will be accepted, otherwise the numeric
    part must match `n`

  If `prefix` is a `str`, the "matched prefix" return value is `prefix`.
  Otherwise the "matched prefix" return value is the result of
  the `prefix.match(s,offset)` call. The result must also support
  a `.end()` method returning the offset in `s` beyond the match,
  used to locate the following numeric portion.

  Examples:

     >>> import re
     >>> get_prefix_n('s03e01--', 's')
     ('s', 3, 3)
     >>> get_prefix_n('s03e01--', 's', 3)
     ('s', 3, 3)
     >>> get_prefix_n('s03e01--', 's', 4)
     (None, None, 0)
     >>> get_prefix_n('s03e01--', re.compile('[es]',re.I))
     (<re.Match object; span=(0, 1), match='s'>, 3, 3)
     >>> get_prefix_n('s03e01--', re.compile('[es]',re.I), offset=3)
     (<re.Match object; span=(3, 4), match='e'>, 1, 6)
- <a name="get_qstr"></a>`get_qstr(s, offset=0, q='"', environ=None, default=None, env_specials=None)`: Get quoted text with slosh escapes and optional environment substitution.

  Parameters:
  * `s`: the string containg the quoted text.
  * `offset`: the starting point, default `0`.
  * `q`: the quote character, default `'"'`. If `q` is `None`,
    do not expect the string to be delimited by quote marks.
  * `environ`: if not `None`, also parse and expand `$`*envvar* references.
  * `default`: passed to `get_envvar`
- <a name="get_qstr_or_identifier"></a>`get_qstr_or_identifier(s, offset)`: Parse a double quoted string or an identifier.
- <a name="get_sloshed_text"></a>`get_sloshed_text(s, delim, offset=0, slosh='\\', mapper=<function slosh_mapper at 0x1028d9ee0>, specials=None)`: Collect slosh escaped text from the string `s` from position
  `offset` (default `0`) and return the decoded unicode string and
  the offset of the completed parse.

  Parameters:
  * `delim`: end of string delimiter, such as a single or double quote.
  * `offset`: starting offset within `s`, default `0`.
  * `slosh`: escape character, default a slosh ('\').
  * `mapper`: a mapping function which accepts a single character
    and returns a replacement string or `None`; this is used the
    replace things such as '\t' or '\n'. The default is the
    `slosh_mapper` function, whose default mapping is `SLOSH_CHARMAP`.
  * `specials`: a mapping of other special character sequences and parse
    functions for gathering them up. When one of the special
    character sequences is found in the string, the parse
    function is called to parse at that point.
    The parse functions accept
    `s` and the offset of the special character. They return
    the decoded string and the offset past the parse.

  The escape character `slosh` introduces an encoding of some
  replacement text whose value depends on the following character.
  If the following character is:
  * the escape character `slosh`, insert the escape character.
  * the string delimiter `delim`, insert the delimiter.
  * the character 'x', insert the character with code from the following
    2 hexadecimal digits.
  * the character 'u', insert the character with code from the following
    4 hexadecimal digits.
  * the character 'U', insert the character with code from the following
    8 hexadecimal digits.
  * a character from the keys of `mapper`
- <a name="get_suffix_part"></a>`get_suffix_part(s, *, keywords=('part',), numeral_map=None)`: Strip a trailing "part N" suffix from the string `s`.
  Return the matched suffix and the number part number.
  Retrn `(None,None)` on no match.

  Parameters:
  * `s`: the string
  * `keywords`: an iterable of `str` to match, or a single `str`;
    default `'part'`
  * `numeral_map`: an optional mapping of numeral names to numeric values;
    default `NUMERAL_NAMES['en']`, the English numerals

  Exanmple:

      >>> get_suffix_part('s09e10 - A New World: Part One')
      (': Part One', 1)
- <a name="get_tokens"></a>`get_tokens(s, offset, getters)`: Parse the string `s` from position `offset` using the supplied
  tokeniser functions `getters`.
  Return the list of tokens matched and the final offset.

  Parameters:
  * `s`: the string to parse.
  * `offset`: the starting position for the parse.
  * `getters`: an iterable of tokeniser specifications.

  Each tokeniser specification `getter` is either:
  * a callable expecting `(s,offset)` and returning `(token,new_offset)`
  * a literal string, to be matched exactly
  * a `tuple` or `list` with values `(func,args,kwargs)`;
    call `func(s,offset,*args,**kwargs)`
  * an object with a `.match` method such as a regex;
    call `getter.match(s,offset)` and return a match object with
    a `.end()` method returning the offset of the end of the match
- <a name="get_uc_identifier"></a>`get_uc_identifier(s, offset=0, number='0123456789', extras='_')`: Scan the string `s` for an identifier as for `get_identifier`,
  but require the letters to be uppercase.
- <a name="get_white"></a>`get_white(s, offset=0)`: Scan the string `s` for characters in `string.whitespace`
  starting at `offset` (default `0`).
  Return `(match,new_offset)`.
- <a name="has_format_attributes"></a>`has_format_attributes(*da, **dkw)`: Class decorator to walk this class for direct methods
  marked as for use in format strings
  and to include them in `cls.format_attributes()`.

  Methods are normally marked with the `@format_attribute` decorator.

  If `inherit` is true the base format attributes will be
  obtained from other classes:
  * `inherit` is `True`: use `cls.__mro__`
  * `inherit` is a class: use that class
  * otherwise assume `inherit` is an iterable of classes
  For each class `otherclass`, update the initial attribute
  mapping from `otherclass.get_format_attributes()`.
- <a name="hexify"></a>`hexify(bs)`: A flavour of `binascii.hexlify` returning a `str`.
- <a name="htmlify"></a>`htmlify(s, nbsp=False)`: Convert a string for safe transcription in HTML.

  Parameters:
  * `s`: the string
  * `nbsp`: replaces spaces with `"&nbsp;"` to prevent word folding,
    default `False`.
- <a name="htmlquote"></a>`htmlquote(s)`: Quote a string for use in HTML.
- <a name="Identifier"></a>`class Identifier(CoreTokens)`: A dotted identifier.

*`Identifier.parse(text: str, offset: int = 0, *, skip=False) -> Tuple[str, cs.lex.CoreTokens, int]`*:
Parse a dotted identifier from `test`.
- <a name="indent"></a>`indent(paragraph, line_indent='  ')`: Return the `paragraph` indented by `line_indent` (default `"  "`).
- <a name="is_dotted_identifier"></a>`is_dotted_identifier(s, offset=0, **kw)`: Test if the string `s` is an identifier from position `offset` onward.
- <a name="is_identifier"></a>`is_identifier(s, offset=0, **kw)`: Test if the string `s` is an identifier
  from position `offset` (default `0`) onward.
- <a name="is_uc_identifier"></a>`is_uc_identifier(s, offset=0, **kw)`: Test if the string `s` is an uppercase identifier
  from position `offset` (default `0`) onward.
- <a name="isUC_"></a>`isUC_(s)`: Check that a string matches the regular expression `^[A-Z][A-Z_0-9]*$`.
- <a name="jsquote"></a>`jsquote(s)`: Quote a string for use in JavaScript.
- <a name="lc_"></a>`lc_(value)`: Return `value.lower()`
  with `'-'` translated into `'_'` and `' '` translated into `'-'`.

  I use this to construct lowercase filenames containing a
  readable transcription of a title string.

  See also `titleify_lc()`, an imperfect reversal of this.
- <a name="match_tokens"></a>`match_tokens(s, offset, getters)`: Wrapper for `get_tokens` which catches `ValueError` exceptions
  and returns `(None,offset)`.
- <a name="NumericValue"></a>`class NumericValue(_LiteralValue)`: An `int` or `float` literal.

*`NumericValue.parse(text: str, offset: int = 0, *, skip=False) -> 'NumericValue'`*:
Parse a Python style `int` or `float`.
- <a name="parseUC_sAttr"></a>`parseUC_sAttr(attr)`: Take an attribute name `attr` and return `(key,is_plural)`.

  Examples:
  * `'FOO'` returns `('FOO',False)`.
  * `'FOOs'` or `'FOOes'` returns `('FOO',True)`.
  Otherwise return `(None,False)`.
- <a name="phpquote"></a>`phpquote(s)`: Quote a string for use in PHP code.
- <a name="printt"></a>`printt(*table, file=None, flush=False, indent='', print_func=None, **tabulate_kw)`: A wrapper for `tabulate()` to print the results.
  Each positional argument is a table row.

  Parameters:
  * `file`: optional output file, passed to `print_func`
  * `flush`: optional flush flag, passed to `print_func`
  * `indent`: optional leading indent for the output lines
  * `print_func`: optional `print()` function, default `builtins.print`
  Other keyword arguments are passed to `tabulate()`.
- <a name="QuotedString"></a>`class QuotedString(_LiteralValue)`: A double quoted string.

*`QuotedString.parse(text: str, offset: int = 0, *, skip=False) -> 'QuotedString'`*:
Parse a double quoted string from `text`.
- <a name="r"></a>`r(o, max_length=None, *, use_cls=False)`: Like `typed_str` but using `repr` instead of `str`.
  This is available as both `typed_repr` and `r`.
- <a name="s"></a>`s(o, use_cls=False, use_repr=False, max_length=32)`: Return "type(o).__name__:str(o)" for some object `o`.
  This is available as both `typed_str` and `s`.

  Parameters:
  * `use_cls`: default `False`;
    if true, use `str(type(o))` instead of `type(o).__name__`
  * `use_repr`: default `False`;
    if true, use `repr(o)` instead of `str(o)`

  I use this a lot when debugging. Example:

      from cs.lex import typed_str as s
      ......
      X("foo = %s", s(foo))
- <a name="skipwhite"></a>`skipwhite(s, offset=0)`: Convenience routine for skipping past whitespace;
  returns the offset of the next nonwhitespace character.
- <a name="slosh_mapper"></a>`slosh_mapper(c, charmap=None)`: Return a string to replace backslash-`c`, or `None`.
- <a name="slosh_quote"></a>`slosh_quote(raw_s: str, q: str)`: Quote a string `raw_s` with quote character `q`.
- <a name="snakecase"></a>`snakecase(camelcased)`: Convert a camel cased string `camelcased` into snake case.

  Parameters:
  * `cameelcased`: the cameel case string to convert
  * `first_letter_only`: optional flag (default `False`);
    if true then just ensure that the first character of a word
    is uppercased, otherwise use `str.title`

  Example:

      >>> snakecase('abcDef')
      'abc_def'
      >>> snakecase('abcDEf')
      'abc_def'
      >>> snakecase('AbcDef')
      'abc_def'
- <a name="split_remote_path"></a>`split_remote_path(remotepath: str) -> Tuple[Optional[str], str]`: OBSOLETE version of split_remote_path, suggestion: cs.fs.RemotePath.from_str

  Split a path with an optional leading `[user@]rhost:` prefix
  into the prefix and the remaining path.
  `None` is returned for the prefix is there is none.
  This is useful for things like `rsync` targets etc.

  OBSOLETE, use `cs.fs.RemotePath.from_str` instead.
- <a name="stripped_dedent"></a>`stripped_dedent(s, post_indent='', sub_indent='')`: Slightly smarter dedent which ignores a string's opening indent.

  Algorithm:
  strip the supplied string `s`, pull off the leading line,
  dedent the rest, put back the leading line.

  This is a lot like the `inspect.cleandoc()` function.

  This supports my preferred docstring layout, where the opening
  line of text is on the same line as the opening quote.

  The optional `post_indent` parameter may be used to indent
  the dedented text before return.

  The optional `sub_indent` parameter may be used to indent
  the second and following lines if the dedented text before return.

  Examples:

      >>> def func(s):
      ...   """ Slightly smarter dedent which ignores a string's opening indent.
      ...       Strip the supplied string `s`. Pull off the leading line.
      ...       Dedent the rest. Put back the leading line.
      ...   """
      ...   pass
      ...
      >>> from cs.lex import stripped_dedent
      >>> print(stripped_dedent(func.__doc__))
      Slightly smarter dedent which ignores a string's opening indent.
      Strip the supplied string `s`. Pull off the leading line.
      Dedent the rest. Put back the leading line.
      >>> print(stripped_dedent(func.__doc__, sub_indent='  '))
      Slightly smarter dedent which ignores a string's opening indent.
        Strip the supplied string `s`. Pull off the leading line.
        Dedent the rest. Put back the leading line.
      >>> print(stripped_dedent(func.__doc__, post_indent='  '))
        Slightly smarter dedent which ignores a string's opening indent.
        Strip the supplied string `s`. Pull off the leading line.
        Dedent the rest. Put back the leading line.
      >>> print(stripped_dedent(func.__doc__, post_indent='  ', sub_indent='| '))
        Slightly smarter dedent which ignores a string's opening indent.
        | Strip the supplied string `s`. Pull off the leading line.
        | Dedent the rest. Put back the leading line.
- <a name="strlist"></a>`strlist(ary, sep=', ')`: Convert an iterable to strings and join with `sep` (default `', '`).
- <a name="tabpadding"></a>`tabpadding(padlen, tabsize=8, offset=0)`: Compute some spaces to use a tab padding at an offfset.
- <a name="tabulate"></a>`tabulate(*rows, sep='  ', ppcls=None)`: A generator yielding lines of values from `rows` aligned in columns.

  Each row in rows is a list of strings. Non-`str` objects are
  promoted to `str` via `pprint.pformat`. If the strings contain
  newlines they will be split into subrows.

  Example:

      >>> for row in tabulate(
      ...     ['one col'],
      ...     ['three', 'column', 'row'],
      ...     ['row3', 'multi\nline\ntext', 'goes\nhere', 'and\nhere'],
      ...     ['two', 'cols'],
      ... ):
      ...     print(row)
      ...
      one col
      three    column  row
      row3     multi   goes  and
               line    here  here
               text
      two      cols
      >>>
- <a name="texthexify"></a>`texthexify(bs, shiftin='[', shiftout=']', whitelist=None)`: Transcribe the bytes `bs` to text using compact text runs for
  some common text values.

  This can be reversed with the `untexthexify` function.

  This is an ad doc format devised to be compact but also to
  expose "text" embedded within to the eye. The original use
  case was transcribing a binary directory entry format, where
  the filename parts would be somewhat visible in the transcription.

  The output is a string of hexadecimal digits for the encoded
  bytes except for runs of values from the whitelist, which are
  enclosed in the shiftin and shiftout markers and transcribed
  as is. The default whitelist is values of the ASCII letters,
  the decimal digits and the punctuation characters '_-+.,'.
  The default shiftin and shiftout markers are '[' and ']'.

  String objects converted with either `hexify` and `texthexify`
  output strings may be freely concatenated and decoded with
  `untexthexify`.

  Example:

      >>> texthexify(b'&^%&^%abcdefghi)(*)(*')
      '265e25265e25[abcdefghi]29282a29282a'

  Parameters:
  * `bs`: the bytes to transcribe
  * `shiftin`: Optional. The marker string used to indicate a shift to
    direct textual transcription of the bytes, default: `'['`.
  * `shiftout`: Optional. The marker string used to indicate a
    shift from text mode back into hexadecimal transcription,
    default `']'`.
  * `whitelist`: an optional bytes or string object indicating byte
    values which may be represented directly in text;
    the default value is the ASCII letters, the decimal digits
    and the punctuation characters `'_-+.,'`.
- <a name="titleify_lc"></a>`titleify_lc(value_lc)`: Translate `'-'` into `' '` and `'_'` translated into `'-'`,
  then titlecased.

  See also `lc_()`, which this reverses imperfectly.
- <a name="typed_repr"></a>`typed_repr(o, max_length=None, *, use_cls=False)`: Like `typed_str` but using `repr` instead of `str`.
  This is available as both `typed_repr` and `r`.
- <a name="typed_str"></a>`typed_str(o, use_cls=False, use_repr=False, max_length=32)`: Return "type(o).__name__:str(o)" for some object `o`.
  This is available as both `typed_str` and `s`.

  Parameters:
  * `use_cls`: default `False`;
    if true, use `str(type(o))` instead of `type(o).__name__`
  * `use_repr`: default `False`;
    if true, use `repr(o)` instead of `str(o)`

  I use this a lot when debugging. Example:

      from cs.lex import typed_str as s
      ......
      X("foo = %s", s(foo))
- <a name="unctrl"></a>`unctrl(s, tabsize=8)`: Return the string `s` with `TAB`s expanded and control characters
  replaced with printable representations.
- <a name="untexthexify"></a>`untexthexify(s, shiftin='[', shiftout=']')`: Decode a textual representation of binary data into binary data.

  This is the reverse of the `texthexify` function.

  Outside of the `shiftin`/`shiftout` markers the binary data
  are represented as hexadecimal. Within the markers the bytes
  have the values of the ordinals of the characters.

  Example:

      >>> untexthexify('265e25265e25[abcdefghi]29282a29282a')
      b'&^%&^%abcdefghi)(*)(*'

  Parameters:
  * `s`: the string containing the text representation.
  * `shiftin`: Optional. The marker string commencing a sequence
    of direct text transcription, default `'['`.
  * `shiftout`: Optional. The marker string ending a sequence
    of direct text transcription, default `']'`.

# Release Log



*Release 20250724*:
tabulate: new optional ppcls argument to supply a custom PrettyPrinter class for formatting, use a date and datetime aware class by default.

*Release 20250428*:
* cutprefix,cutsuffix: also accept a tuple of str like str.startswith and str.endswith.
* typed_str: use cropped_repr() instead of repr().

*Release 20250414*:
Obsolete split_remote_path(), supplanted by cs.fs.RemotePath.from_str().

*Release 20250323*:
* tabulate: format nonstr using pformat.
* New printt() wrapper for tabulate() which prints the table.

*Release 20250103*:
* Move Identifier, NumericValue, QuotedString in from cs.app.tagger.rules.
* BaseToken: expose the parsing subclass selection as a `.token_classes class method.

*Release 20241207*:
tabulate: split cells containing newlines over multiple output rows.

*Release 20241122*:
tabulate: make the default separator two spaces instead of one, immediate return if no rows (avoids max() of empty list).

*Release 20241119.1*:
Add PyPI classifier, in part to test an updated release script.

*Release 20241119*:
stripped_dedent: new optional sub_indent parameter for indenting the second and following lines, handy for usage messages.

*Release 20241109*:
* stripped_dedent: new optional post_indent parameter to indent the dedented text.
* New tabulate(*rows) generator function yielding lines of padded columns.

*Release 20240630*:
New indent(paragraph,line_indent="  ") function.

*Release 20240519*:
New get_suffix_part() to extract things line ": Part One" from something such as a TV episode name.

*Release 20240316*:
Fixed release upload artifacts.

*Release 20240211*:
New split_remote_path() function to recognise [[user@]host]:path.

*Release 20231018*:
New is_uc_identifier function.

*Release 20230401*:
Import update.

*Release 20230217.1*:
Fix package requirements.

*Release 20230217*:
* New get_prefix_n function to parse a numeric value preceeded by a prefix.
* Drop strip_prefix_n, get_prefix_n is more general and I had not got around to using strip_prefix_n yet - when I did, I ended up writing get_prefix_n.

*Release 20230210*:
* @has_format_attributes: new optional inherit parameter to inherit superclass (or other) format attributes, default False.
* New FNumericMixin, FFloat, FInt FormatableMixin subclasses like FStr - they add .localtime and .utctime formattable attributes.

*Release 20220918*:
typed_str(): crop the value part, default max_length=32, bugfix message cropping.

*Release 20220626*:
* Remove dependency on cs.py3, we've been Python 2 incompatible for a while.
* FormatableFormatter.format_field: promote None to FStr(None).

*Release 20220227*:
* typed_str,typed_repr: make max_length the first optional positional parameter, make other parameters keyword only.
* New camelcase() and snakecase() functions.

*Release 20211208*:
Docstring updates.

*Release 20210913*:
* FormatableFormatter.FORMAT_RE_ARG_NAME_s: strings commencing with digits now match \d+(\.\d+)[a-z]+, eg "02d".
* Alias typed_str as s and typed_repr as r.
* FormatableFormatter: new .format_mode thread local state object initially with strict=False, used to control whether unknown fields leave a placeholder or raise KeyError.
* FormatableFormatter.format_field: assorted fixes.

*Release 20210906*:
New strip_prefix_n() function to strip a leading `prefix` and numeric value `n` from the start of a string.

*Release 20210717*:
* Many many changes to FormatableMixin, FormatableFormatter and friends around supporting {foo|conv1|con2|...} instead of {foo!conv}. Still in flux.
* New typed_repr like typed_str but using repr.

*Release 20210306*:
* New cropped() function to crop strings.
* Rework cropped_repr() to do the repr() itself, and to crop the interiors of tuples and lists.
* cropped_repr: new inner_max_length for cropping the members of collections.
* cropped_repr: special case for length=1 tuples.
* New typed_str(o) object returning type(o).__name__:str(o) in the default case, useful for debugging.

*Release 20201228*:
Minor doc updates.

*Release 20200914*:
* Hide terribly special purpose lastlinelen() in cs.hier under a private name.
* New common_prefix and common_suffix function to compare strings.

*Release 20200718*:
get_chars: accept a callable for gochars, indicating a per character test function.

*Release 20200613*:
cropped_repr: replace hardwired 29 with computed length

*Release 20200517*:
* New get_ini_clausename to parse "[clausename]".
* New get_ini_clause_entryname parsing "[clausename]entryname".
* New cropped_repr for returning a shortened repr()+"..." if the length exceeds a threshold.
* New format_escape function to double {} characters to survive str.format.

*Release 20200318*:
* New lc_() function to lowercase and dash a string, new titleify_lc() to mostly reverse lc_().
* New format_as function, FormatableMixin and related FormatAsError.

*Release 20200229*:
New cutprefix and cutsuffix functions.

*Release 20190812*:
Fix bad slosh escapes in strings.

*Release 20190220*:
New function get_qstr_or_identifier.

*Release 20181108*:
new function get_decimal_or_float_value to read a decimal or basic float

*Release 20180815*:
No semantic changes; update some docstrings and clean some lint, fix a unit test.

*Release 20180810*:
* New get_decimal_value and get_hexadecimal_value functions.
* New stripped_dedent function, a slightly smarter textwrap.dedent.

*Release 20171231*:
New function get_decimal. Drop unused function dict2js.

*Release 20170904*:
Python 2/3 ports, move rfc2047 into new cs.rfc2047 module.

*Release 20160828*:
* Use "install_requires" instead of "requires" in DISTINFO.
* Discard str1(), pointless optimisation.
* unrfc2047: map _ to SPACE, improve exception handling.
* Add phpquote: quote a string for use in PHP code; add docstring to jsquote.
* Add is_identifier test.
* Add get_dotted_identifier.
* Add is_dotted_identifier.
* Add get_hexadecimal.
* Add skipwhite, convenince wrapper for get_white returning just the next offset.
* Assorted bugfixes and improvements.

*Release 20150120*:
cs.lex: texthexify: backport to python 2 using cs.py3 bytes type

*Release 20150118*:
metadata updates

*Release 20150116*:
PyPI metadata and slight code cleanup.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "cs-lex",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "python2, python3",
    "author": null,
    "author_email": "Cameron Simpson <cs@cskk.id.au>",
    "download_url": "https://files.pythonhosted.org/packages/7f/c6/6bc66ddb17f870b62b94f506eaabfbc1543b3e6c99bb93c701bc6b078a93/cs_lex-20250724.tar.gz",
    "platform": null,
    "description": "Lexical analysis functions, tokenisers, transcribers:\nan arbitrary assortment of lexical and tokenisation functions useful\nfor writing recursive descent parsers, of which I have several.\nThere are also some transcription functions for producing text\nfrom various objects, such as `hexify` and `unctrl`.\n\n*Latest release 20250724*:\ntabulate: new optional ppcls argument to supply a custom PrettyPrinter class for formatting, use a date and datetime aware class by default.\n\nGenerally the get_* functions accept a source string and an offset\n(usually optional, default `0`) and return a token and the new offset,\nraising `ValueError` on failed tokenisation.\n\nShort summary:\n* `as_lines`: Generator yielding complete lines from arbitrary pieces of text from the iterable of `str` `chunks`.\n* `BaseToken`: A mixin for token dataclasses.\n* `camelcase`: Convert a snake cased string `snakecased` into camel case.\n* `common_prefix`: Return the common prefix of the strings `strs`.\n* `common_suffix`: Return the common suffix of the strings `strs`.\n* `CoreTokens`: A mixin for token dataclasses whose subclasses include `Identifier`, 'NumericValue` and `QuotedString`.\n* `cropped`: If the length of `s` exceeds `max_length` (default `32`), replace enough of the tail with `ellipsis` and the last `roffset` (default `1`) characters of `s` to fit in `max_length` characters.\n* `cropped_repr`: Compute a cropped `repr()` of `o`.\n* `cutprefix`: Remove `prefix` from the front of `s` if present. Return the suffix if `s.startswith(prefix)`, else `s`. As with `str.startswith`, `prefix` may be a `str` or a `tuple` of `str`. If a tuple, the first matching prefix from the tuple will be removed.\n* `cutsuffix`: Remove `suffix` from the end of `s` if present. Return the prefix if `s.endswith(suffix)`, else `s`. As with `str.endswith`, `suffix` may be a `str` or a `tuple` of `str`. If a tuple, the first matching suffix from the tuple will be removed.\n* `FFloat`: Formattable `float`.\n* `FInt`: Formattable `int`.\n* `FNumericMixin`: A `FormatableMixin` subclass.\n* `format_as`: Format the string `format_s` using `Formatter.vformat`, return the formatted result. This is a wrapper for `str.format_map` which raises a more informative `FormatAsError` exception on failure.\n* `format_attribute`: A decorator to mark a method as available as a format method. Requires the enclosing class to be decorated with `@has_format_attributes`.\n* `format_escape`: Escape `{}` characters in a string to protect them from `str.format`.\n* `format_recover`: Decorator for `__format__` methods which replaces failed formats with `{self:format_spec}`.\n* `FormatableFormatter`: A `string.Formatter` subclass interacting with objects which inherit from `FormatableMixin`.\n* `FormatableMixin`: A subclass of `FormatableFormatter` which  provides 2 features: - a `__format__` method which parses the `format_spec` string into multiple colon separated terms whose results chain - a `format_as` method which formats a format string using `str.format_map` with a suitable mapping derived from the instance via its `format_kwargs` method (whose default is to return the instance itself).\n* `FormatAsError`: Subclass of `LookupError` for use by `format_as`.\n* `FStr`: A `str` subclass with the `FormatableMixin` methods, particularly its `__format__` method which uses `str` method names as valid formats.\n* `get_chars`: Scan the string `s` for characters in `gochars` starting at `offset`. Return `(match,new_offset)`.\n* `get_decimal`: Scan the string `s` for decimal characters starting at `offset` (default `0`). Return `(dec_string,new_offset)`.\n* `get_decimal_or_float_value`: Fetch a decimal or basic float (nnn.nnn) value from the str `s` at `offset` (default `0`). Return `(value,new_offset)`.\n* `get_decimal_value`: Scan the string `s` for a decimal value starting at `offset` (default `0`). Return `(value,new_offset)`.\n* `get_delimited`: Collect text from the string `s` from position `offset` up to the first occurence of delimiter `delim`; return the text excluding the delimiter and the offset after the delimiter.\n* `get_dotted_identifier`: Scan the string `s` for a dotted identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) with optional trailing dot and another dotted identifier, starting at `offset` (default `0`). Return `(match,new_offset)`.\n* `get_envvar`: Parse a simple environment variable reference to $varname or $x where \"x\" is a special character.\n* `get_hexadecimal`: Scan the string `s` for hexadecimal characters starting at `offset` (default `0`). Return `(hex_string,new_offset)`.\n* `get_hexadecimal_value`: Scan the string `s` for a hexadecimal value starting at `offset` (default `0`). Return `(value,new_offset)`.\n* `get_identifier`: Scan the string `s` for an identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) starting at `offset` (default 0). Return `(match,new_offset)`.\n* `get_ini_clause_entryname`: Parse a `[`*clausename*`]`*entryname* string from `s` at `offset` (default `0`). Return `(clausename,entryname,new_offset)`.\n* `get_ini_clausename`: Parse a `[`*clausename*`]` string from `s` at `offset` (default `0`). Return `(clausename,new_offset)`.\n* `get_nonwhite`: Scan the string `s` for characters not in `string.whitespace` starting at `offset` (default `0`). Return `(match,new_offset)`.\n* `get_other_chars`: Scan the string `s` for characters not in `stopchars` starting at `offset` (default `0`). Return `(match,new_offset)`.\n* `get_prefix_n`: Strip a leading `prefix` and numeric value `n` from the string `s` starting at `offset` (default `0`). Return the matched prefix, the numeric value and the new offset. Returns `(None,None,offset)` on no match.\n* `get_qstr`: Get quoted text with slosh escapes and optional environment substitution.\n* `get_qstr_or_identifier`: Parse a double quoted string or an identifier.\n* `get_sloshed_text`: Collect slosh escaped text from the string `s` from position `offset` (default `0`) and return the decoded unicode string and the offset of the completed parse.\n* `get_suffix_part`: Strip a trailing \"part N\" suffix from the string `s`. Return the matched suffix and the number part number. Retrn `(None,None)` on no match.\n* `get_tokens`: Parse the string `s` from position `offset` using the supplied tokeniser functions `getters`. Return the list of tokens matched and the final offset.\n* `get_uc_identifier`: Scan the string `s` for an identifier as for `get_identifier`, but require the letters to be uppercase.\n* `get_white`: Scan the string `s` for characters in `string.whitespace` starting at `offset` (default `0`). Return `(match,new_offset)`.\n* `has_format_attributes`: Class decorator to walk this class for direct methods marked as for use in format strings and to include them in `cls.format_attributes()`.\n* `hexify`: A flavour of `binascii.hexlify` returning a `str`.\n* `htmlify`: Convert a string for safe transcription in HTML.\n* `htmlquote`: Quote a string for use in HTML.\n* `Identifier`: A dotted identifier.\n* `indent`: Return the `paragraph` indented by `line_indent` (default `\"  \"`).\n* `is_dotted_identifier`: Test if the string `s` is an identifier from position `offset` onward.\n* `is_identifier`: Test if the string `s` is an identifier from position `offset` (default `0`) onward.\n* `is_uc_identifier`: Test if the string `s` is an uppercase identifier from position `offset` (default `0`) onward.\n* `isUC_`: Check that a string matches the regular expression `^[A-Z][A-Z_0-9]*$`.\n* `jsquote`: Quote a string for use in JavaScript.\n* `lc_`: Return `value.lower()` with `'-'` translated into `'_'` and `' '` translated into `'-'`.\n* `match_tokens`: Wrapper for `get_tokens` which catches `ValueError` exceptions and returns `(None,offset)`.\n* `NumericValue`: An `int` or `float` literal.\n* `parseUC_sAttr`: Take an attribute name `attr` and return `(key,is_plural)`.\n* `phpquote`: Quote a string for use in PHP code.\n* `printt`: A wrapper for `tabulate()` to print the results. Each positional argument is a table row.\n* `QuotedString`: A double quoted string.\n* `r`: Like `typed_str` but using `repr` instead of `str`. This is available as both `typed_repr` and `r`.\n* `s`: Return \"type(o).__name__:str(o)\" for some object `o`. This is available as both `typed_str` and `s`.\n* `skipwhite`: Convenience routine for skipping past whitespace; returns the offset of the next nonwhitespace character.\n* `slosh_mapper`: Return a string to replace backslash-`c`, or `None`.\n* `slosh_quote`: Quote a string `raw_s` with quote character `q`.\n* `snakecase`: Convert a camel cased string `camelcased` into snake case.\n* `split_remote_path`: OBSOLETE version of split_remote_path, suggestion: cs.fs.RemotePath.from_str.\n* `stripped_dedent`: Slightly smarter dedent which ignores a string's opening indent.\n* `strlist`: Convert an iterable to strings and join with `sep` (default `', '`).\n* `tabpadding`: Compute some spaces to use a tab padding at an offfset.\n* `tabulate`: A generator yielding lines of values from `rows` aligned in columns.\n* `texthexify`: Transcribe the bytes `bs` to text using compact text runs for some common text values.\n* `titleify_lc`: Translate `'-'` into `' '` and `'_'` translated into `'-'`, then titlecased.\n* `typed_repr`: Like `typed_str` but using `repr` instead of `str`. This is available as both `typed_repr` and `r`.\n* `typed_str`: Return \"type(o).__name__:str(o)\" for some object `o`. This is available as both `typed_str` and `s`.\n* `unctrl`: Return the string `s` with `TAB`s expanded and control characters replaced with printable representations.\n* `untexthexify`: Decode a textual representation of binary data into binary data.\n\nModule contents:\n- <a name=\"as_lines\"></a>`as_lines(chunks, partials=None)`: Generator yielding complete lines from arbitrary pieces of text from\n  the iterable of `str` `chunks`.\n\n  After completion, any remaining newline-free chunks remain\n  in the partials list; they will be unavailable to the caller\n  unless the list is presupplied.\n- <a name=\"BaseToken\"></a>`class BaseToken(cs.deco.Promotable)`: A mixin for token dataclasses.\n\n  Presently I use this in `cs.app.tagger.rules` and `cs.app.pilfer.parse`.\n\n*`BaseToken.from_str(text: str) -> 'BaseToken'`*:\nParse `test` as a token of type `cls`, return the token.\nRaises `SyntaxError` on a parse failure.\nThis is a wrapper for the `parse` class method.\n\n*`BaseToken.matched_text`*:\nThe text from `self.source_text` which matches this token.\n\n*`BaseToken.parse(text: str, offset: int = 0, *, skip=False) -> Tuple[ForwardRef('BaseToken'), int]`*:\nParse a token from `test` at `offset` (default `0`).\nReturn a `BaseToken` subclass instance.\nRaise `SyntaxError` if no subclass parses it.\nRaise `EOFError` if at the end of the `text`,\nchecked after any whitespace if `skip` is true.\nThe returned token's `.end_offset` is the next parse point.\n\nThis base class method attempts the `.parse` method of all\nthe public subclasses.\n\nParameters:\n* `text`: the text being parsed\n* `offset`: the offset within the `text` of the the parse cursor\n* `skip`: if true (default `False`), skip any leading\n  whitespace before matching\n\n*`BaseToken.scan(text: str, offset: int = 0, *, skip=True) -> Iterable[ForwardRef('BaseToken')]`*:\nScan `text`, parsing tokens using `BaseToken.parse` and yielding them.\nParameters are as for `BaseToken.parse` except as follows:\n- encountering end of text end the iteration instead of raising `EOFError`\n- `skip` defaults to `True` to allow whitespace between tokens\n\n*`BaseToken.token_classes()`*:\nReturn the `baseToken` subclasses to consider when parsing a token stream.\n- <a name=\"camelcase\"></a>`camelcase(snakecased, first_letter_only=False)`: Convert a snake cased string `snakecased` into camel case.\n\n  Parameters:\n  * `snakecased`: the snake case string to convert\n  * `first_letter_only`: optional flag (default `False`);\n    if true then just ensure that the first character of a word\n    is uppercased, otherwise use `str.title`\n\n  Example:\n\n      >>> camelcase('abc_def')\n      'abcDef'\n      >>> camelcase('ABc_def')\n      'abcDef'\n      >>> camelcase('abc_dEf')\n      'abcDef'\n      >>> camelcase('abc_dEf', first_letter_only=True)\n      'abcDEf'\n- <a name=\"common_prefix\"></a>`common_prefix(*strs)`: Return the common prefix of the strings `strs`.\n\n  Examples:\n\n      >>> common_prefix('abc', 'def')\n      ''\n      >>> common_prefix('abc', 'abd')\n      'ab'\n      >>> common_prefix('abc', 'abcdef')\n      'abc'\n      >>> common_prefix('abc', 'abcdef', 'abz')\n      'ab'\n      >>> # contrast with cs.fileutils.common_path_prefix\n      >>> common_prefix('abc/def', 'abc/def1', 'abc/def2')\n      'abc/def'\n- <a name=\"common_suffix\"></a>`common_suffix(*strs)`: Return the common suffix of the strings `strs`.\n- <a name=\"CoreTokens\"></a>`class CoreTokens(BaseToken)`: A mixin for token dataclasses whose subclasses include `Identifier`,\n  'NumericValue` and `QuotedString`.\n- <a name=\"cropped\"></a>`cropped(s: str, max_length: int = 32, roffset: int = 1, ellipsis: str = '...')`: If the length of `s` exceeds `max_length` (default `32`),\n  replace enough of the tail with `ellipsis`\n  and the last `roffset` (default `1`) characters of `s`\n  to fit in `max_length` characters.\n- <a name=\"cropped_repr\"></a>`cropped_repr(o, roffset=1, max_length=32, inner_max_length=None)`: Compute a cropped `repr()` of `o`.\n\n  Parameters:\n  * `o`: the object to represent\n  * `max_length`: the maximum length of the representation, default `32`\n  * `inner_max_length`: the maximum length of the representations\n    of members of `o`, default `max_length//2`\n  * `roffset`: the number of trailing characters to preserve, default `1`\n- <a name=\"cutprefix\"></a>`cutprefix(s, prefix)`: Remove `prefix` from the front of `s` if present.\n  Return the suffix if `s.startswith(prefix)`, else `s`.\n  As with `str.startswith`, `prefix` may be a `str` or a `tuple` of `str`.\n  If a tuple, the first matching prefix from the tuple will be removed.\n\n  Example:\n\n      >>> abc_def = 'abc.def'\n      >>> cutprefix(abc_def, 'abc.')\n      'def'\n      >>> cutprefix(abc_def, 'zzz.')\n      'abc.def'\n      >>> cutprefix(abc_def, '.zzz') is abc_def\n      True\n      >>> cutprefix('this_that', ('this', 'thusly'))\n      '_that'\n      >>> cutprefix('thusly_that', ('this', 'thusly'))\n      '_that'\n- <a name=\"cutsuffix\"></a>`cutsuffix(s, suffix)`: Remove `suffix` from the end of `s` if present.\n  Return the prefix if `s.endswith(suffix)`, else `s`.\n  As with `str.endswith`, `suffix` may be a `str` or a `tuple` of `str`.\n  If a tuple, the first matching suffix from the tuple will be removed.\n\n  Example:\n\n      >>> abc_def = 'abc.def'\n      >>> cutsuffix(abc_def, '.def')\n      'abc'\n      >>> cutsuffix(abc_def, '.zzz')\n      'abc.def'\n      >>> cutsuffix(abc_def, '.zzz') is abc_def\n      True\n      >>> cutsuffix('this_that', ('that', 'tother'))\n      'this_'\n      >>> cutsuffix('this_tother', ('that', 'tother'))\n      'this_'\n- <a name=\"FFloat\"></a>`class FFloat(FNumericMixin, builtins.float)`: Formattable `float`.\n- <a name=\"FInt\"></a>`class FInt(FNumericMixin, builtins.int)`: Formattable `int`.\n- <a name=\"FNumericMixin\"></a>`class FNumericMixin(FormatableMixin)`: A `FormatableMixin` subclass.\n\n*`FNumericMixin.localtime(self)`*:\nTreat this as a UNIX timestamp and return a localtime `datetime`.\n\n*`FNumericMixin.utctime(self)`*:\nTreat this as a UNIX timestamp and return a UTC `datetime`.\n- <a name=\"format_as\"></a>`format_as(format_s: str, format_mapping, formatter=None, error_sep=None, strict=None)`: Format the string `format_s` using `Formatter.vformat`,\n  return the formatted result.\n  This is a wrapper for `str.format_map`\n  which raises a more informative `FormatAsError` exception on failure.\n\n  Parameters:\n  * `format_s`: the format string to use as the template\n  * `format_mapping`: the mapping of available replacement fields\n  * `formatter`: an optional `string.Formatter`-like instance\n    with a `.vformat(format_string,args,kwargs)` method,\n    usually a subclass of `string.Formatter`;\n    if not specified then `FormatableFormatter` is used\n  * `error_sep`: optional separator for the multipart error message,\n    default from `FormatAsError.DEFAULT_SEPARATOR`:\n    `'; '`\n  * `strict`: optional flag (default `False`)\n    indicating that an unresolveable field should raise a\n    `KeyError` instead of inserting a placeholder\n- <a name=\"format_attribute\"></a>`format_attribute(method)`: A decorator to mark a method as available as a format method.\n  Requires the enclosing class to be decorated with `@has_format_attributes`.\n\n  For example,\n  the `FormatableMixin.json` method is defined like this:\n\n      @format_attribute\n      def json(self):\n          return self.FORMAT_JSON_ENCODER.encode(self)\n\n  which allows a `FormatableMixin` subclass instance\n  to be used in a format string like this:\n\n      {instance:json}\n\n  to insert a JSON transcription of the instance.\n\n  It is recommended that methods marked with `@format_attribute`\n  have no side effects and do not modify state,\n  as they are intended for use in ad hoc format strings\n  supplied by an end user.\n- <a name=\"format_escape\"></a>`format_escape(s)`: Escape `{}` characters in a string to protect them from `str.format`.\n- <a name=\"format_recover\"></a>`format_recover(*da, **dkw)`: Decorator for `__format__` methods which replaces failed formats\n  with `{self:format_spec}`.\n- <a name=\"FormatableFormatter\"></a>`class FormatableFormatter(string.Formatter)`: A `string.Formatter` subclass interacting with objects\n  which inherit from `FormatableMixin`.\n\n*`FormatableFormatter.format_field(value, format_spec: str)`*:\nFormat a value using `value.format_format_field`,\nreturning an `FStr`\n(a `str` subclass with additional `format_spec` features).\n\nWe actually recognise colon separated chains of formats\nand apply each format to the previously converted value.\nThe final result is promoted to an `FStr` before return.\n\n*`FormatableFormatter.format_mode`*:\nThread local state object.\n\nAttributes:\n* `strict`: initially `False`; raise a `KeyError` for\n  unresolveable field names\n\n*`FormatableFormatter.get_arg_name(field_name)`*:\nDefault initial arg_name is an identifier.\n\nReturns `(prefix,offset)`, and `('',0)` if there is no arg_name.\n\n*`FormatableFormatter.get_field(self, field_name, args, kwargs)`*:\nGet the object referenced by the field text `field_name`.\nRaises `KeyError` for an unknown `field_name`.\n\n*`FormatableFormatter.get_format_subspecs(format_spec)`*:\nParse a `format_spec` as a sequence of colon separated components,\nreturn a list of the components.\n\n*`FormatableFormatter.get_subfield(value, subfield_text: str)`*:\nResolve `value` against `subfield_text`,\nthe remaining field text after the term which resolved to `value`.\n\nFor example, a format `{name.blah[0]}`\nhas the field text `name.blah[0]`.\nA `get_field` implementation might initially\nresolve `name` to some value,\nleaving `.blah[0]` as the `subfield_text`.\nThis method supports taking that value\nand resolving it against the remaining text `.blah[0]`.\n\nFor generality, if `subfield_text` is the empty string\n`value` is returned unchanged.\n\n*`FormatableFormatter.get_value(self, arg_name, args, kwargs)`*:\nGet the object with index `arg_name`.\n\nThis default implementation returns `(kwargs[arg_name],arg_name)`.\n- <a name=\"FormatableMixin\"></a>`class FormatableMixin(FormatableFormatter)`: A subclass of `FormatableFormatter` which  provides 2 features:\n  - a `__format__` method which parses the `format_spec` string\n    into multiple colon separated terms whose results chain\n  - a `format_as` method which formats a format string using `str.format_map`\n    with a suitable mapping derived from the instance\n    via its `format_kwargs` method\n    (whose default is to return the instance itself)\n\n  The `format_as` method is like an inside out `str.format` or\n  `object.__format__` method.\n\n  The `str.format` method is designed for formatting a string\n  from a variety of other objects supplied in the keyword arguments.\n\n  The `object.__format__` method is for filling out a single `str.format`\n  replacement field from a single object.\n\n  By contrast, `format_as` is designed to fill out an entire format\n  string from the current object.\n\n  For example, the `cs.tagset.TagSetMixin` class\n  uses `FormatableMixin` to provide a `format_as` method\n  whose replacement fields are derived from the tags in the tag set.\n\n  Subclasses wanting to provide additional `format_spec` terms\n  should:\n  - override `FormatableFormatter.format_field1` to implement\n    terms with no colons, letting `format_field` do the split into terms\n  - override `FormatableFormatter.get_format_subspecs` to implement\n    the parse of `format_spec` into a sequence of terms.\n    This might recognise a special additional syntax\n    and quietly fall back to `super().get_format_subspecs`\n    if that is not present.\n\n*`FormatableMixin.__format__(self, format_spec)`*:\nFormat `self` according to `format_spec`.\n\nThis implementation calls `self.format_field`.\nAs such, a `format_spec` is considered\na sequence of colon separated terms.\n\nClasses wanting to implement additional format string syntaxes\nshould either:\n- override `FormatableFormatter.format_field1` to implement\n  terms with no colons, letting `format_field1` do the split into terms\n- override `FormatableFormatter.get_format_subspecs` to implement\n  the term parse.\n\nThe default implementation of `__format1__` just calls `super().__format__`.\nImplementations providing specialised formats\nshould implement them in `__format1__`\nwith fallback to `super().__format1__`.\n\n*`FormatableMixin.convert_field(self, value, conversion)`*:\nThe default converter for fields calls `Formatter.convert_field`.\n\n*`FormatableMixin.convert_via_method_or_attr(self, value, format_spec)`*:\nApply a method or attribute name based conversion to `value`\nwhere `format_spec` starts with a method name\napplicable to `value`.\nReturn `(converted,offset)`\nbeing the converted value and the offset after the method name.\n\nNote that if there is not a leading identifier on `format_spec`\nthen `value` is returned unchanged with `offset=0`.\n\nThe methods/attributes are looked up in the mapping\nreturned by `.format_attributes()` which represents allowed methods\n(broadly, one should not allow methods which modify any state).\n\nIf this returns a callable, it is called to obtain the converted value\notherwise it is used as is.\n\nAs a final tweak,\nif `value.get_format_attribute()` raises an `AttributeError`\n(the attribute is not an allowed attribute)\nor calling the attribute raises a `TypeError`\n(the `value` isn't suitable)\nand the `value` is not an instance of `FStr`,\nconvert it to an `FStr` and try again.\nThis provides the common utility methods on other types.\n\nThe motivating example was a `PurePosixPath`,\nwhich does not JSON transcribe;\nthis tweak supports both\n`posixpath:basename` via the pathlib stuff\nand `posixpath:json` via `FStr`\neven though a `PurePosixPath` does not subclass `FStr`.\n\n*`FormatableMixin.format_as(self, format_s, error_sep=None, strict=None, **control_kw)`*:\nReturn the string `format_s` formatted using the mapping\nreturned by `self.format_kwargs(**control_kw)`.\n\nIf a class using the mixin has no `format_kwargs(**control_kw)` method\nto provide a mapping for `str.format_map`\nthen the instance itself is used as the mapping.\n\n*`FormatableMixin.get_format_attribute(self, attr)`*:\nReturn a mapping of permitted methods to functions of an instance.\nThis is used to whitelist allowed `:`*name* method formats\nto prevent scenarios like little Bobby Tables calling `delete()`.\n\n*`FormatableMixin.get_format_attributes()`*:\nReturn the mapping of format attributes.\n\n*`FormatableMixin.json(self)`*:\nThe value transcribed as compact JSON.\n- <a name=\"FormatAsError\"></a>`class FormatAsError(builtins.LookupError)`: Subclass of `LookupError` for use by `format_as`.\n- <a name=\"FStr\"></a>`class FStr(FormatableMixin, builtins.str)`: A `str` subclass with the `FormatableMixin` methods,\n  particularly its `__format__` method\n  which uses `str` method names as valid formats.\n\n  It also has a bunch of utility methods which are available\n  as `:`*method* in format strings.\n\n*`FStr.basename(self)`*:\nTreat as a filesystem path and return the basename.\n\n*`FStr.dirname(self)`*:\nTreat as a filesystem path and return the dirname.\n\n*`FStr.f(self)`*:\nParse `self` as a `float`.\n\n*`FStr.i(self, base=10)`*:\nParse `self` as an `int`.\n\n*`FStr.lc(self)`*:\nLowercase using `lc_()`.\n\n*`FStr.path(self)`*:\nConvert to a native filesystem `pathlib.Path`.\n\n*`FStr.posix_path(self)`*:\nConvert to a Posix filesystem `pathlib.Path`.\n\n*`FStr.windows_path(self)`*:\nConvert to a Windows filesystem `pathlib.Path`.\n- <a name=\"get_chars\"></a>`get_chars(s, offset, gochars)`: Scan the string `s` for characters in `gochars` starting at `offset`.\n  Return `(match,new_offset)`.\n\n  `gochars` may also be a callable, in which case a character\n  `ch` is accepted if `gochars(ch)` is true.\n- <a name=\"get_decimal\"></a>`get_decimal(s, offset=0)`: Scan the string `s` for decimal characters starting at `offset` (default `0`).\n  Return `(dec_string,new_offset)`.\n- <a name=\"get_decimal_or_float_value\"></a>`get_decimal_or_float_value(s, offset=0)`: Fetch a decimal or basic float (nnn.nnn) value\n  from the str `s` at `offset` (default `0`).\n  Return `(value,new_offset)`.\n- <a name=\"get_decimal_value\"></a>`get_decimal_value(s, offset=0)`: Scan the string `s` for a decimal value starting at `offset` (default `0`).\n  Return `(value,new_offset)`.\n- <a name=\"get_delimited\"></a>`get_delimited(s, offset, delim)`: Collect text from the string `s` from position `offset` up\n  to the first occurence of delimiter `delim`; return the text\n  excluding the delimiter and the offset after the delimiter.\n- <a name=\"get_dotted_identifier\"></a>`get_dotted_identifier(s, offset=0, **kw)`: Scan the string `s` for a dotted identifier (by default an\n  ASCII letter or underscore followed by letters, digits or\n  underscores) with optional trailing dot and another dotted\n  identifier, starting at `offset` (default `0`).\n  Return `(match,new_offset)`.\n\n  Note: the empty string and an unchanged offset will be returned if\n  there is no leading letter/underscore.\n\n  Keyword arguments are passed to `get_identifier`\n  (used for each component of the dotted identifier).\n- <a name=\"get_envvar\"></a>`get_envvar(s, offset=0, environ=None, default=None, specials=None)`: Parse a simple environment variable reference to $varname or\n  $x where \"x\" is a special character.\n\n  Parameters:\n  * `s`: the string with the variable reference\n  * `offset`: the starting point for the reference\n  * `default`: default value for missing environment variables;\n     if `None` (the default) a `ValueError` is raised\n  * `environ`: the environment mapping, default `os.environ`\n  * `specials`: the mapping of special single character variables\n- <a name=\"get_hexadecimal\"></a>`get_hexadecimal(s, offset=0)`: Scan the string `s` for hexadecimal characters starting at `offset` (default `0`).\n  Return `(hex_string,new_offset)`.\n- <a name=\"get_hexadecimal_value\"></a>`get_hexadecimal_value(s, offset=0)`: Scan the string `s` for a hexadecimal value starting at `offset` (default `0`).\n  Return `(value,new_offset)`.\n- <a name=\"get_identifier\"></a>`get_identifier(s, offset=0, alpha='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', number='0123456789', extras='_')`: Scan the string `s` for an identifier (by default an ASCII\n  letter or underscore followed by letters, digits or underscores)\n  starting at `offset` (default 0).\n  Return `(match,new_offset)`.\n\n  *Note*: the empty string and an unchanged offset will be returned if\n  there is no leading letter/underscore.\n\n  Parameters:\n  * `s`: the string to scan\n  * `offset`: the starting offset, default `0`.\n  * `alpha`: the characters considered alphabetic,\n    default `string.ascii_letters`.\n  * `number`: the characters considered numeric,\n    default `string.digits`.\n  * `extras`: extra characters considered part of an identifier,\n    default `'_'`.\n- <a name=\"get_ini_clause_entryname\"></a>`get_ini_clause_entryname(s, offset=0)`: Parse a `[`*clausename*`]`*entryname* string\n  from `s` at `offset` (default `0`).\n  Return `(clausename,entryname,new_offset)`.\n- <a name=\"get_ini_clausename\"></a>`get_ini_clausename(s, offset=0)`: Parse a `[`*clausename*`]` string from `s` at `offset` (default `0`).\n  Return `(clausename,new_offset)`.\n- <a name=\"get_nonwhite\"></a>`get_nonwhite(s, offset=0)`: Scan the string `s` for characters not in `string.whitespace`\n  starting at `offset` (default `0`).\n  Return `(match,new_offset)`.\n- <a name=\"get_other_chars\"></a>`get_other_chars(s, offset=0, stopchars=None)`: Scan the string `s` for characters not in `stopchars` starting\n  at `offset` (default `0`).\n  Return `(match,new_offset)`.\n- <a name=\"get_prefix_n\"></a>`get_prefix_n(s, prefix, n=None, *, offset=0)`: Strip a leading `prefix` and numeric value `n` from the string `s`\n  starting at `offset` (default `0`).\n  Return the matched prefix, the numeric value and the new offset.\n  Returns `(None,None,offset)` on no match.\n\n  Parameters:\n  * `s`: the string to parse\n  * `prefix`: the prefix string which must appear at `offset`\n    or an object with a `match(str,offset)` method\n    such as an `re.Pattern` regexp instance\n  * `n`: optional integer value;\n    if omitted any value will be accepted, otherwise the numeric\n    part must match `n`\n\n  If `prefix` is a `str`, the \"matched prefix\" return value is `prefix`.\n  Otherwise the \"matched prefix\" return value is the result of\n  the `prefix.match(s,offset)` call. The result must also support\n  a `.end()` method returning the offset in `s` beyond the match,\n  used to locate the following numeric portion.\n\n  Examples:\n\n     >>> import re\n     >>> get_prefix_n('s03e01--', 's')\n     ('s', 3, 3)\n     >>> get_prefix_n('s03e01--', 's', 3)\n     ('s', 3, 3)\n     >>> get_prefix_n('s03e01--', 's', 4)\n     (None, None, 0)\n     >>> get_prefix_n('s03e01--', re.compile('[es]',re.I))\n     (<re.Match object; span=(0, 1), match='s'>, 3, 3)\n     >>> get_prefix_n('s03e01--', re.compile('[es]',re.I), offset=3)\n     (<re.Match object; span=(3, 4), match='e'>, 1, 6)\n- <a name=\"get_qstr\"></a>`get_qstr(s, offset=0, q='\"', environ=None, default=None, env_specials=None)`: Get quoted text with slosh escapes and optional environment substitution.\n\n  Parameters:\n  * `s`: the string containg the quoted text.\n  * `offset`: the starting point, default `0`.\n  * `q`: the quote character, default `'\"'`. If `q` is `None`,\n    do not expect the string to be delimited by quote marks.\n  * `environ`: if not `None`, also parse and expand `$`*envvar* references.\n  * `default`: passed to `get_envvar`\n- <a name=\"get_qstr_or_identifier\"></a>`get_qstr_or_identifier(s, offset)`: Parse a double quoted string or an identifier.\n- <a name=\"get_sloshed_text\"></a>`get_sloshed_text(s, delim, offset=0, slosh='\\\\', mapper=<function slosh_mapper at 0x1028d9ee0>, specials=None)`: Collect slosh escaped text from the string `s` from position\n  `offset` (default `0`) and return the decoded unicode string and\n  the offset of the completed parse.\n\n  Parameters:\n  * `delim`: end of string delimiter, such as a single or double quote.\n  * `offset`: starting offset within `s`, default `0`.\n  * `slosh`: escape character, default a slosh ('\\').\n  * `mapper`: a mapping function which accepts a single character\n    and returns a replacement string or `None`; this is used the\n    replace things such as '\\t' or '\\n'. The default is the\n    `slosh_mapper` function, whose default mapping is `SLOSH_CHARMAP`.\n  * `specials`: a mapping of other special character sequences and parse\n    functions for gathering them up. When one of the special\n    character sequences is found in the string, the parse\n    function is called to parse at that point.\n    The parse functions accept\n    `s` and the offset of the special character. They return\n    the decoded string and the offset past the parse.\n\n  The escape character `slosh` introduces an encoding of some\n  replacement text whose value depends on the following character.\n  If the following character is:\n  * the escape character `slosh`, insert the escape character.\n  * the string delimiter `delim`, insert the delimiter.\n  * the character 'x', insert the character with code from the following\n    2 hexadecimal digits.\n  * the character 'u', insert the character with code from the following\n    4 hexadecimal digits.\n  * the character 'U', insert the character with code from the following\n    8 hexadecimal digits.\n  * a character from the keys of `mapper`\n- <a name=\"get_suffix_part\"></a>`get_suffix_part(s, *, keywords=('part',), numeral_map=None)`: Strip a trailing \"part N\" suffix from the string `s`.\n  Return the matched suffix and the number part number.\n  Retrn `(None,None)` on no match.\n\n  Parameters:\n  * `s`: the string\n  * `keywords`: an iterable of `str` to match, or a single `str`;\n    default `'part'`\n  * `numeral_map`: an optional mapping of numeral names to numeric values;\n    default `NUMERAL_NAMES['en']`, the English numerals\n\n  Exanmple:\n\n      >>> get_suffix_part('s09e10 - A New World: Part One')\n      (': Part One', 1)\n- <a name=\"get_tokens\"></a>`get_tokens(s, offset, getters)`: Parse the string `s` from position `offset` using the supplied\n  tokeniser functions `getters`.\n  Return the list of tokens matched and the final offset.\n\n  Parameters:\n  * `s`: the string to parse.\n  * `offset`: the starting position for the parse.\n  * `getters`: an iterable of tokeniser specifications.\n\n  Each tokeniser specification `getter` is either:\n  * a callable expecting `(s,offset)` and returning `(token,new_offset)`\n  * a literal string, to be matched exactly\n  * a `tuple` or `list` with values `(func,args,kwargs)`;\n    call `func(s,offset,*args,**kwargs)`\n  * an object with a `.match` method such as a regex;\n    call `getter.match(s,offset)` and return a match object with\n    a `.end()` method returning the offset of the end of the match\n- <a name=\"get_uc_identifier\"></a>`get_uc_identifier(s, offset=0, number='0123456789', extras='_')`: Scan the string `s` for an identifier as for `get_identifier`,\n  but require the letters to be uppercase.\n- <a name=\"get_white\"></a>`get_white(s, offset=0)`: Scan the string `s` for characters in `string.whitespace`\n  starting at `offset` (default `0`).\n  Return `(match,new_offset)`.\n- <a name=\"has_format_attributes\"></a>`has_format_attributes(*da, **dkw)`: Class decorator to walk this class for direct methods\n  marked as for use in format strings\n  and to include them in `cls.format_attributes()`.\n\n  Methods are normally marked with the `@format_attribute` decorator.\n\n  If `inherit` is true the base format attributes will be\n  obtained from other classes:\n  * `inherit` is `True`: use `cls.__mro__`\n  * `inherit` is a class: use that class\n  * otherwise assume `inherit` is an iterable of classes\n  For each class `otherclass`, update the initial attribute\n  mapping from `otherclass.get_format_attributes()`.\n- <a name=\"hexify\"></a>`hexify(bs)`: A flavour of `binascii.hexlify` returning a `str`.\n- <a name=\"htmlify\"></a>`htmlify(s, nbsp=False)`: Convert a string for safe transcription in HTML.\n\n  Parameters:\n  * `s`: the string\n  * `nbsp`: replaces spaces with `\"&nbsp;\"` to prevent word folding,\n    default `False`.\n- <a name=\"htmlquote\"></a>`htmlquote(s)`: Quote a string for use in HTML.\n- <a name=\"Identifier\"></a>`class Identifier(CoreTokens)`: A dotted identifier.\n\n*`Identifier.parse(text: str, offset: int = 0, *, skip=False) -> Tuple[str, cs.lex.CoreTokens, int]`*:\nParse a dotted identifier from `test`.\n- <a name=\"indent\"></a>`indent(paragraph, line_indent='  ')`: Return the `paragraph` indented by `line_indent` (default `\"  \"`).\n- <a name=\"is_dotted_identifier\"></a>`is_dotted_identifier(s, offset=0, **kw)`: Test if the string `s` is an identifier from position `offset` onward.\n- <a name=\"is_identifier\"></a>`is_identifier(s, offset=0, **kw)`: Test if the string `s` is an identifier\n  from position `offset` (default `0`) onward.\n- <a name=\"is_uc_identifier\"></a>`is_uc_identifier(s, offset=0, **kw)`: Test if the string `s` is an uppercase identifier\n  from position `offset` (default `0`) onward.\n- <a name=\"isUC_\"></a>`isUC_(s)`: Check that a string matches the regular expression `^[A-Z][A-Z_0-9]*$`.\n- <a name=\"jsquote\"></a>`jsquote(s)`: Quote a string for use in JavaScript.\n- <a name=\"lc_\"></a>`lc_(value)`: Return `value.lower()`\n  with `'-'` translated into `'_'` and `' '` translated into `'-'`.\n\n  I use this to construct lowercase filenames containing a\n  readable transcription of a title string.\n\n  See also `titleify_lc()`, an imperfect reversal of this.\n- <a name=\"match_tokens\"></a>`match_tokens(s, offset, getters)`: Wrapper for `get_tokens` which catches `ValueError` exceptions\n  and returns `(None,offset)`.\n- <a name=\"NumericValue\"></a>`class NumericValue(_LiteralValue)`: An `int` or `float` literal.\n\n*`NumericValue.parse(text: str, offset: int = 0, *, skip=False) -> 'NumericValue'`*:\nParse a Python style `int` or `float`.\n- <a name=\"parseUC_sAttr\"></a>`parseUC_sAttr(attr)`: Take an attribute name `attr` and return `(key,is_plural)`.\n\n  Examples:\n  * `'FOO'` returns `('FOO',False)`.\n  * `'FOOs'` or `'FOOes'` returns `('FOO',True)`.\n  Otherwise return `(None,False)`.\n- <a name=\"phpquote\"></a>`phpquote(s)`: Quote a string for use in PHP code.\n- <a name=\"printt\"></a>`printt(*table, file=None, flush=False, indent='', print_func=None, **tabulate_kw)`: A wrapper for `tabulate()` to print the results.\n  Each positional argument is a table row.\n\n  Parameters:\n  * `file`: optional output file, passed to `print_func`\n  * `flush`: optional flush flag, passed to `print_func`\n  * `indent`: optional leading indent for the output lines\n  * `print_func`: optional `print()` function, default `builtins.print`\n  Other keyword arguments are passed to `tabulate()`.\n- <a name=\"QuotedString\"></a>`class QuotedString(_LiteralValue)`: A double quoted string.\n\n*`QuotedString.parse(text: str, offset: int = 0, *, skip=False) -> 'QuotedString'`*:\nParse a double quoted string from `text`.\n- <a name=\"r\"></a>`r(o, max_length=None, *, use_cls=False)`: Like `typed_str` but using `repr` instead of `str`.\n  This is available as both `typed_repr` and `r`.\n- <a name=\"s\"></a>`s(o, use_cls=False, use_repr=False, max_length=32)`: Return \"type(o).__name__:str(o)\" for some object `o`.\n  This is available as both `typed_str` and `s`.\n\n  Parameters:\n  * `use_cls`: default `False`;\n    if true, use `str(type(o))` instead of `type(o).__name__`\n  * `use_repr`: default `False`;\n    if true, use `repr(o)` instead of `str(o)`\n\n  I use this a lot when debugging. Example:\n\n      from cs.lex import typed_str as s\n      ......\n      X(\"foo = %s\", s(foo))\n- <a name=\"skipwhite\"></a>`skipwhite(s, offset=0)`: Convenience routine for skipping past whitespace;\n  returns the offset of the next nonwhitespace character.\n- <a name=\"slosh_mapper\"></a>`slosh_mapper(c, charmap=None)`: Return a string to replace backslash-`c`, or `None`.\n- <a name=\"slosh_quote\"></a>`slosh_quote(raw_s: str, q: str)`: Quote a string `raw_s` with quote character `q`.\n- <a name=\"snakecase\"></a>`snakecase(camelcased)`: Convert a camel cased string `camelcased` into snake case.\n\n  Parameters:\n  * `cameelcased`: the cameel case string to convert\n  * `first_letter_only`: optional flag (default `False`);\n    if true then just ensure that the first character of a word\n    is uppercased, otherwise use `str.title`\n\n  Example:\n\n      >>> snakecase('abcDef')\n      'abc_def'\n      >>> snakecase('abcDEf')\n      'abc_def'\n      >>> snakecase('AbcDef')\n      'abc_def'\n- <a name=\"split_remote_path\"></a>`split_remote_path(remotepath: str) -> Tuple[Optional[str], str]`: OBSOLETE version of split_remote_path, suggestion: cs.fs.RemotePath.from_str\n\n  Split a path with an optional leading `[user@]rhost:` prefix\n  into the prefix and the remaining path.\n  `None` is returned for the prefix is there is none.\n  This is useful for things like `rsync` targets etc.\n\n  OBSOLETE, use `cs.fs.RemotePath.from_str` instead.\n- <a name=\"stripped_dedent\"></a>`stripped_dedent(s, post_indent='', sub_indent='')`: Slightly smarter dedent which ignores a string's opening indent.\n\n  Algorithm:\n  strip the supplied string `s`, pull off the leading line,\n  dedent the rest, put back the leading line.\n\n  This is a lot like the `inspect.cleandoc()` function.\n\n  This supports my preferred docstring layout, where the opening\n  line of text is on the same line as the opening quote.\n\n  The optional `post_indent` parameter may be used to indent\n  the dedented text before return.\n\n  The optional `sub_indent` parameter may be used to indent\n  the second and following lines if the dedented text before return.\n\n  Examples:\n\n      >>> def func(s):\n      ...   \"\"\" Slightly smarter dedent which ignores a string's opening indent.\n      ...       Strip the supplied string `s`. Pull off the leading line.\n      ...       Dedent the rest. Put back the leading line.\n      ...   \"\"\"\n      ...   pass\n      ...\n      >>> from cs.lex import stripped_dedent\n      >>> print(stripped_dedent(func.__doc__))\n      Slightly smarter dedent which ignores a string's opening indent.\n      Strip the supplied string `s`. Pull off the leading line.\n      Dedent the rest. Put back the leading line.\n      >>> print(stripped_dedent(func.__doc__, sub_indent='  '))\n      Slightly smarter dedent which ignores a string's opening indent.\n        Strip the supplied string `s`. Pull off the leading line.\n        Dedent the rest. Put back the leading line.\n      >>> print(stripped_dedent(func.__doc__, post_indent='  '))\n        Slightly smarter dedent which ignores a string's opening indent.\n        Strip the supplied string `s`. Pull off the leading line.\n        Dedent the rest. Put back the leading line.\n      >>> print(stripped_dedent(func.__doc__, post_indent='  ', sub_indent='| '))\n        Slightly smarter dedent which ignores a string's opening indent.\n        | Strip the supplied string `s`. Pull off the leading line.\n        | Dedent the rest. Put back the leading line.\n- <a name=\"strlist\"></a>`strlist(ary, sep=', ')`: Convert an iterable to strings and join with `sep` (default `', '`).\n- <a name=\"tabpadding\"></a>`tabpadding(padlen, tabsize=8, offset=0)`: Compute some spaces to use a tab padding at an offfset.\n- <a name=\"tabulate\"></a>`tabulate(*rows, sep='  ', ppcls=None)`: A generator yielding lines of values from `rows` aligned in columns.\n\n  Each row in rows is a list of strings. Non-`str` objects are\n  promoted to `str` via `pprint.pformat`. If the strings contain\n  newlines they will be split into subrows.\n\n  Example:\n\n      >>> for row in tabulate(\n      ...     ['one col'],\n      ...     ['three', 'column', 'row'],\n      ...     ['row3', 'multi\\nline\\ntext', 'goes\\nhere', 'and\\nhere'],\n      ...     ['two', 'cols'],\n      ... ):\n      ...     print(row)\n      ...\n      one col\n      three    column  row\n      row3     multi   goes  and\n               line    here  here\n               text\n      two      cols\n      >>>\n- <a name=\"texthexify\"></a>`texthexify(bs, shiftin='[', shiftout=']', whitelist=None)`: Transcribe the bytes `bs` to text using compact text runs for\n  some common text values.\n\n  This can be reversed with the `untexthexify` function.\n\n  This is an ad doc format devised to be compact but also to\n  expose \"text\" embedded within to the eye. The original use\n  case was transcribing a binary directory entry format, where\n  the filename parts would be somewhat visible in the transcription.\n\n  The output is a string of hexadecimal digits for the encoded\n  bytes except for runs of values from the whitelist, which are\n  enclosed in the shiftin and shiftout markers and transcribed\n  as is. The default whitelist is values of the ASCII letters,\n  the decimal digits and the punctuation characters '_-+.,'.\n  The default shiftin and shiftout markers are '[' and ']'.\n\n  String objects converted with either `hexify` and `texthexify`\n  output strings may be freely concatenated and decoded with\n  `untexthexify`.\n\n  Example:\n\n      >>> texthexify(b'&^%&^%abcdefghi)(*)(*')\n      '265e25265e25[abcdefghi]29282a29282a'\n\n  Parameters:\n  * `bs`: the bytes to transcribe\n  * `shiftin`: Optional. The marker string used to indicate a shift to\n    direct textual transcription of the bytes, default: `'['`.\n  * `shiftout`: Optional. The marker string used to indicate a\n    shift from text mode back into hexadecimal transcription,\n    default `']'`.\n  * `whitelist`: an optional bytes or string object indicating byte\n    values which may be represented directly in text;\n    the default value is the ASCII letters, the decimal digits\n    and the punctuation characters `'_-+.,'`.\n- <a name=\"titleify_lc\"></a>`titleify_lc(value_lc)`: Translate `'-'` into `' '` and `'_'` translated into `'-'`,\n  then titlecased.\n\n  See also `lc_()`, which this reverses imperfectly.\n- <a name=\"typed_repr\"></a>`typed_repr(o, max_length=None, *, use_cls=False)`: Like `typed_str` but using `repr` instead of `str`.\n  This is available as both `typed_repr` and `r`.\n- <a name=\"typed_str\"></a>`typed_str(o, use_cls=False, use_repr=False, max_length=32)`: Return \"type(o).__name__:str(o)\" for some object `o`.\n  This is available as both `typed_str` and `s`.\n\n  Parameters:\n  * `use_cls`: default `False`;\n    if true, use `str(type(o))` instead of `type(o).__name__`\n  * `use_repr`: default `False`;\n    if true, use `repr(o)` instead of `str(o)`\n\n  I use this a lot when debugging. Example:\n\n      from cs.lex import typed_str as s\n      ......\n      X(\"foo = %s\", s(foo))\n- <a name=\"unctrl\"></a>`unctrl(s, tabsize=8)`: Return the string `s` with `TAB`s expanded and control characters\n  replaced with printable representations.\n- <a name=\"untexthexify\"></a>`untexthexify(s, shiftin='[', shiftout=']')`: Decode a textual representation of binary data into binary data.\n\n  This is the reverse of the `texthexify` function.\n\n  Outside of the `shiftin`/`shiftout` markers the binary data\n  are represented as hexadecimal. Within the markers the bytes\n  have the values of the ordinals of the characters.\n\n  Example:\n\n      >>> untexthexify('265e25265e25[abcdefghi]29282a29282a')\n      b'&^%&^%abcdefghi)(*)(*'\n\n  Parameters:\n  * `s`: the string containing the text representation.\n  * `shiftin`: Optional. The marker string commencing a sequence\n    of direct text transcription, default `'['`.\n  * `shiftout`: Optional. The marker string ending a sequence\n    of direct text transcription, default `']'`.\n\n# Release Log\n\n\n\n*Release 20250724*:\ntabulate: new optional ppcls argument to supply a custom PrettyPrinter class for formatting, use a date and datetime aware class by default.\n\n*Release 20250428*:\n* cutprefix,cutsuffix: also accept a tuple of str like str.startswith and str.endswith.\n* typed_str: use cropped_repr() instead of repr().\n\n*Release 20250414*:\nObsolete split_remote_path(), supplanted by cs.fs.RemotePath.from_str().\n\n*Release 20250323*:\n* tabulate: format nonstr using pformat.\n* New printt() wrapper for tabulate() which prints the table.\n\n*Release 20250103*:\n* Move Identifier, NumericValue, QuotedString in from cs.app.tagger.rules.\n* BaseToken: expose the parsing subclass selection as a `.token_classes class method.\n\n*Release 20241207*:\ntabulate: split cells containing newlines over multiple output rows.\n\n*Release 20241122*:\ntabulate: make the default separator two spaces instead of one, immediate return if no rows (avoids max() of empty list).\n\n*Release 20241119.1*:\nAdd PyPI classifier, in part to test an updated release script.\n\n*Release 20241119*:\nstripped_dedent: new optional sub_indent parameter for indenting the second and following lines, handy for usage messages.\n\n*Release 20241109*:\n* stripped_dedent: new optional post_indent parameter to indent the dedented text.\n* New tabulate(*rows) generator function yielding lines of padded columns.\n\n*Release 20240630*:\nNew indent(paragraph,line_indent=\"  \") function.\n\n*Release 20240519*:\nNew get_suffix_part() to extract things line \": Part One\" from something such as a TV episode name.\n\n*Release 20240316*:\nFixed release upload artifacts.\n\n*Release 20240211*:\nNew split_remote_path() function to recognise [[user@]host]:path.\n\n*Release 20231018*:\nNew is_uc_identifier function.\n\n*Release 20230401*:\nImport update.\n\n*Release 20230217.1*:\nFix package requirements.\n\n*Release 20230217*:\n* New get_prefix_n function to parse a numeric value preceeded by a prefix.\n* Drop strip_prefix_n, get_prefix_n is more general and I had not got around to using strip_prefix_n yet - when I did, I ended up writing get_prefix_n.\n\n*Release 20230210*:\n* @has_format_attributes: new optional inherit parameter to inherit superclass (or other) format attributes, default False.\n* New FNumericMixin, FFloat, FInt FormatableMixin subclasses like FStr - they add .localtime and .utctime formattable attributes.\n\n*Release 20220918*:\ntyped_str(): crop the value part, default max_length=32, bugfix message cropping.\n\n*Release 20220626*:\n* Remove dependency on cs.py3, we've been Python 2 incompatible for a while.\n* FormatableFormatter.format_field: promote None to FStr(None).\n\n*Release 20220227*:\n* typed_str,typed_repr: make max_length the first optional positional parameter, make other parameters keyword only.\n* New camelcase() and snakecase() functions.\n\n*Release 20211208*:\nDocstring updates.\n\n*Release 20210913*:\n* FormatableFormatter.FORMAT_RE_ARG_NAME_s: strings commencing with digits now match \\d+(\\.\\d+)[a-z]+, eg \"02d\".\n* Alias typed_str as s and typed_repr as r.\n* FormatableFormatter: new .format_mode thread local state object initially with strict=False, used to control whether unknown fields leave a placeholder or raise KeyError.\n* FormatableFormatter.format_field: assorted fixes.\n\n*Release 20210906*:\nNew strip_prefix_n() function to strip a leading `prefix` and numeric value `n` from the start of a string.\n\n*Release 20210717*:\n* Many many changes to FormatableMixin, FormatableFormatter and friends around supporting {foo|conv1|con2|...} instead of {foo!conv}. Still in flux.\n* New typed_repr like typed_str but using repr.\n\n*Release 20210306*:\n* New cropped() function to crop strings.\n* Rework cropped_repr() to do the repr() itself, and to crop the interiors of tuples and lists.\n* cropped_repr: new inner_max_length for cropping the members of collections.\n* cropped_repr: special case for length=1 tuples.\n* New typed_str(o) object returning type(o).__name__:str(o) in the default case, useful for debugging.\n\n*Release 20201228*:\nMinor doc updates.\n\n*Release 20200914*:\n* Hide terribly special purpose lastlinelen() in cs.hier under a private name.\n* New common_prefix and common_suffix function to compare strings.\n\n*Release 20200718*:\nget_chars: accept a callable for gochars, indicating a per character test function.\n\n*Release 20200613*:\ncropped_repr: replace hardwired 29 with computed length\n\n*Release 20200517*:\n* New get_ini_clausename to parse \"[clausename]\".\n* New get_ini_clause_entryname parsing \"[clausename]entryname\".\n* New cropped_repr for returning a shortened repr()+\"...\" if the length exceeds a threshold.\n* New format_escape function to double {} characters to survive str.format.\n\n*Release 20200318*:\n* New lc_() function to lowercase and dash a string, new titleify_lc() to mostly reverse lc_().\n* New format_as function, FormatableMixin and related FormatAsError.\n\n*Release 20200229*:\nNew cutprefix and cutsuffix functions.\n\n*Release 20190812*:\nFix bad slosh escapes in strings.\n\n*Release 20190220*:\nNew function get_qstr_or_identifier.\n\n*Release 20181108*:\nnew function get_decimal_or_float_value to read a decimal or basic float\n\n*Release 20180815*:\nNo semantic changes; update some docstrings and clean some lint, fix a unit test.\n\n*Release 20180810*:\n* New get_decimal_value and get_hexadecimal_value functions.\n* New stripped_dedent function, a slightly smarter textwrap.dedent.\n\n*Release 20171231*:\nNew function get_decimal. Drop unused function dict2js.\n\n*Release 20170904*:\nPython 2/3 ports, move rfc2047 into new cs.rfc2047 module.\n\n*Release 20160828*:\n* Use \"install_requires\" instead of \"requires\" in DISTINFO.\n* Discard str1(), pointless optimisation.\n* unrfc2047: map _ to SPACE, improve exception handling.\n* Add phpquote: quote a string for use in PHP code; add docstring to jsquote.\n* Add is_identifier test.\n* Add get_dotted_identifier.\n* Add is_dotted_identifier.\n* Add get_hexadecimal.\n* Add skipwhite, convenince wrapper for get_white returning just the next offset.\n* Assorted bugfixes and improvements.\n\n*Release 20150120*:\ncs.lex: texthexify: backport to python 2 using cs.py3 bytes type\n\n*Release 20150118*:\nmetadata updates\n\n*Release 20150116*:\nPyPI metadata and slight code cleanup.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Lexical analysis functions, tokenisers, transcribers: an arbitrary assortment of lexical and tokenisation functions useful for writing recursive descent parsers, of which I have several. There are also some transcription functions for producing text from various objects, such as `hexify` and `unctrl`.",
    "version": "20250724",
    "project_urls": {
        "MonoRepo Commits": "https://bitbucket.org/cameron_simpson/css/commits/branch/main",
        "Monorepo Git Mirror": "https://github.com/cameron-simpson/css",
        "Monorepo Hg/Mercurial Mirror": "https://hg.sr.ht/~cameron-simpson/css",
        "Source": "https://github.com/cameron-simpson/css/blob/main/lib/python/cs/lex.py"
    },
    "split_keywords": [
        "python2",
        " python3"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cfbc758f07bbee6046fdb4f82126cdcff4629ac4dcbf99a54b24e1d3a632888b",
                "md5": "93d9f5162d2dc2e4becdf3608453b9e7",
                "sha256": "ac2078a75aa61c60f42a93477f8a55d3ed578d0c416a97b6afbcb426c7082811"
            },
            "downloads": -1,
            "filename": "cs_lex-20250724-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "93d9f5162d2dc2e4becdf3608453b9e7",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 36453,
            "upload_time": "2025-07-24T02:22:44",
            "upload_time_iso_8601": "2025-07-24T02:22:44.710231Z",
            "url": "https://files.pythonhosted.org/packages/cf/bc/758f07bbee6046fdb4f82126cdcff4629ac4dcbf99a54b24e1d3a632888b/cs_lex-20250724-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7fc66bc66ddb17f870b62b94f506eaabfbc1543b3e6c99bb93c701bc6b078a93",
                "md5": "248598eec4eebdef42f9349d31dd148b",
                "sha256": "ed82e5292c8aff8a857f8372664eaabaa37f2b1f2691963e01ee98debdcb9924"
            },
            "downloads": -1,
            "filename": "cs_lex-20250724.tar.gz",
            "has_sig": false,
            "md5_digest": "248598eec4eebdef42f9349d31dd148b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 47760,
            "upload_time": "2025-07-24T02:22:46",
            "upload_time_iso_8601": "2025-07-24T02:22:46.480580Z",
            "url": "https://files.pythonhosted.org/packages/7f/c6/6bc66ddb17f870b62b94f506eaabfbc1543b3e6c99bb93c701bc6b078a93/cs_lex-20250724.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-24 02:22:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "cameron-simpson",
    "github_project": "css",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "cs-lex"
}

None