blocksets

Name: blocksets
Version: 0.0.20
Summary: A Python package for efficiently computing blocks of discrete space in any dimension.
Author email: Dave Budd <maildavebudd@gmail.com>
Homepage: https://github.com/daveisagit/blocksets
Upload time: 2024-07-05 12:03:51
Requires Python: >=3.11
Keywords: intervals, lattice, sets, pixels, points, dimension, ranges, orthogonal, space, grid
Requirements: none recorded
# Project Description

![PyPI - Version](https://img.shields.io/pypi/v/blocksets)
![Codecov](https://img.shields.io/codecov/c/github/daveisagit/blocksets)

Python library for efficiently modelling blocks of discrete space.

Discrete space is made up of integer amounts of unit space, i.e. a unit segment
in 1D or a pixel in 2D (or a _voxel_ in 3D - a new term for me).

Largely inspired by Advent of Code puzzles that involve solving problems on
integer grids in 2 or 3 dimensions, where modelling the data as sets of
tuples is not practical because of the volume (i.e. lots of points over a large
amount of space).

## For Example

Say we would like to model a set of pixels in a 2D grid as a union of several rectangles.

<img
src="https://raw.githubusercontent.com/daveisagit/blocksets/main/example_2d.png"
width="300" height="250" alt="2D example">

Sure, at this scale we can simply model it as a set of point tuples to represent
each pixel, but as the resolution increases with respect to the rectangle sizes
we start to encounter computational limits on memory and search times.

The aim of **blocksets** is to group members (i.e. the discrete space) into lines
/ rectangles / cuboids etc. to reduce memory & search times whilst still
allowing the usual set operations to be performed.

We define a _block_ to be a discrete space that can be defined by opposite
ends/corners, such as a line segment / rectangle / cuboid etc., and we aim to
model any layout as a set of _blocks_ instead of _tuples_.

## Notions, Goals, Ideals & Constraints

- Aim for a multidimensional solution from the start, rather than building
  separate solutions/models.
- Allow for the expression of _Open_ intervals (to infinity).
- Mirror methods (and operators) of built-in Python **sets**.
- Set members are always blocks of the same dimension and member blocks are
  always disjoint from each other.
- Aim to be able to compare 2 sets. This may be too hard if we don't have a
  consistent way to divide a subset up, i.e. there are multiple ways to partition
  the same 2D space into rectangles.
- Iteration over a block set yields blocks.
- Iteration over a finite block yields tuples.

_Let the exploration begin ..._

## Installation

`pip install blocksets`

There are no package dependencies.

## Example Use

It's worth reviewing and running the `example_use.py` module via
`python -m blocksets.example_use`.
Here's a taste snippet for reading purposes:

```python
from blocksets import Block, BlockSet

big_rubik = Block((0, 0, 0), (99999, 99999, 99999))
assert big_rubik.measure == 999970000299999
centre_cube = Block((49999, 49999, 49999))
assert centre_cube.measure == 1

# Creates a large 3 dimensional cube with the centre missing
bs = BlockSet(3)  
bs.add(big_rubik)
bs.remove(centre_cube)

assert bs.point_count == 999970000299998
assert bs.block_count == 6

sorted_blocks = sorted(bs.blocks(), key=lambda x: x.norm)

for blk in sorted_blocks:
    print(f"{blk:50} {blk.measure}")
```

## Classes

To begin with I have decided to model the block and the set of blocks as classes
**Block** and **BlockSet**.

Because of

- the multi-dimensional ambition
- multiple ways to represent a block
- option of _Open_ limits

it seems sensible to keep these concerns separate from the main task at hand and
allow us to make valid assumptions about how the blocks are expressed.

### Block

A **Block** is an orthogonal clump (_a line segment, rectangle, cuboid, hyper...
you get the idea_) of discrete space defined by opposite end/corner points $A,B$.

- Decimal precision can always be achieved using some desired scaling, so
  coordinates of the space are always specified as `int` _(unless specifying an
  open interval, see below)_
- The only exception being that the `float("inf")` value can be used to
  represent no limit (i.e. an open interval to infinity be it +/- in any of the
  component dimensions)
- Internally the block will always be **normalised** such that $A \lt B$ in all
  component dimensions.

#### Normalised Form

Opposite corners will always be normalised internally so that ordinates $a_i \lt
b_i$ i.e. the vector $\vec{AB}$ is always positive in every component dimension.

The **Block** constructor handles various argument formats and resolves them to
this normalised representation after passing the following validations:

- Arguments must be either `int`, `+/- float("inf")` or a tuple of these.
- The Dimension of the 2 end/corner points must match.
- Ordinates in corresponding dimensions cannot be the same value _(as that
  would imply a block of zero space)_.

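The validations and normalisation above can be sketched in plain Python (a hypothetical helper illustrating the stated rules, not the library's actual implementation):

```python
def normalise(a: tuple, b: tuple) -> tuple:
    """Validate two opposite corners and return them normalised so that
    A < B in every component dimension (a sketch of the rules above)."""
    if len(a) != len(b):
        raise ValueError("corner dimensions must match")
    if any(ai == bi for ai, bi in zip(a, b)):
        raise ValueError("equal ordinates would imply a block of zero space")
    lower = tuple(min(ai, bi) for ai, bi in zip(a, b))
    upper = tuple(max(ai, bi) for ai, bi in zip(a, b))
    return lower, upper
```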
#### Single Argument to the Constructor

If the second argument (for the opposite end/corner) is not supplied then it is
defaulted as follows for each ordinate in turn:

- For finite values we add +1, thus modelling a unit extent
- For infinite values we assume the opposite end (i.e. the sign is flipped, so
  `float("inf")` defaults to `float("-inf")`)

_So for example if all values are finite then it is a single point (or unit
block)._

Expressing them in a normalised form like this means we can easily _hash_ and
compare for equality using just the pair of coordinate tuples.
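As a sketch, the defaulting rule for a missing second corner might look like this (`default_opposite` is a hypothetical helper illustrating the rule, not part of the package API):

```python
INF = float("inf")

def default_opposite(a: tuple) -> tuple:
    """Default the missing second corner, ordinate by ordinate:
    finite values get +1 (a unit extent), infinite values flip
    to the opposite signed infinity."""
    return tuple(-ai if ai in (INF, -INF) else ai + 1 for ai in a)
```

So `default_opposite((49999, 49999, 49999))` yields `(50000, 50000, 50000)`, matching the unit `centre_cube` block from the earlier example.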

#### Block Operations

Finding the overlapping intersection of 2 blocks as `c = a & b` seems reasonable.
Union & minus make no sense, as there is no guarantee you would end up with a
single block as a result.

We can easily support subset and superset operations too

- `a <= b`
- `a >= b`
- `in`

Generally, we will assume any arguments being supplied to a **BlockSet** method
are attempting to express a **Block** (casting inline for ease of use).

For the `in` operator however, we will make an exception and assume any `tuple`
arguments are expressing a _point_ as opposed to a dereferenced list of
arguments for the Block() constructor.

> This exception may just be a matter of personal taste, but is in keeping with
> the strict notion of membership (∈) as different to subset (⊆).
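Given the measure arithmetic in the earlier example (the block from `(0, 0, 0)` to `(99999, 99999, 99999)` has measure `99999**3`), the corners appear to act as half-open bounds. Under that assumption, point membership could be sketched as:

```python
def contains_point(a: tuple, b: tuple, p: tuple) -> bool:
    """True if point p lies in the normalised block with corners a and b,
    treating each interval as half-open: a_i <= p_i < b_i.
    (A sketch assuming half-open bounds, not the library's actual code.)"""
    return all(ai <= pi < bi for ai, pi, bi in zip(a, p, b))
```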

### BlockSet

A **BlockSet** is an attempt to mirror Python sets where members are strictly
**Block**s.

However, unlike a Python set we will constrain and validate the type of member,
in that we only want **Block**s as members and they must be consistent with each
other in dimension.

In order to write nice code using methods of a **BlockSet** we will convert
non-**Block** arguments into **Block** types inline via a parsing method in the
**Block** class.

The **BlockSet** class itself offers methods and behaviour similar to that of a set.

However, the actual construction of the set using **Block**s happens via an
operation stack which gets resolved during a _normalisation_ process in which
the stacked operations `add` `remove` `toggle` are resolved to purely add
operations of disjoint spaces.

The normalisation process resolves any overlapping and redundancy such that any 2
sets of equal content (i.e. the same set of points) will have the same
representation in terms of the **Block**s used to represent the space.
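As a mental model, the stacked operations can be replayed against a plain set of point tuples (the same reference model the Testing Strategy uses); this hypothetical sketch shows the intended semantics of the three operations:

```python
def resolve(operations):
    """Replay an operation stack against a plain set of point tuples:
    add is union, remove is difference, toggle is symmetric difference."""
    result = set()
    for op, points in operations:
        if op == "add":
            result |= points
        elif op == "remove":
            result -= points
        elif op == "toggle":
            result ^= points
        else:
            raise ValueError(f"unknown operation: {op}")
    return result
```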

Methods and operators mirror those of the native set class:

- Content modifiers (add, remove, toggle)
- Set comparisons (equality, subset, superset, disjoint)
- Set operations (intersection, union, difference, symmetric_difference)
- In place set operations (intersection_update, update, difference_update, symmetric_difference_update)
- Use of the `in` operator will apply to a single **Block** object

#### Normalisation

This is the main concern of the class: taking the operation stack and resolving it
to a resulting set of disjoint blocks in a consistent manner that also removes
redundancy (this being where 2 adjacent blocks could have been expressed as 1 in
the same consistent manner).

Normalisation is required and important (for accurate comparisons) but also
costly. We only want to perform it when it's absolutely necessary, so clients
are advised to group modification calls together as much as possible in order to
minimise the amount of normalising required, especially if performance is a
significant concern.

## Design Approach

Since _normalisation_ is more efficient when batched, it makes sense to stack the
blocks being added or removed as layers, so we only normalise the set when required.

This also gives us options for potentially giving blocks a value and performing
some other arithmetic (other than boolean) with them.

The main idea is to create a grid specific to the set based on only the relevant
ordinates and then recursively (by dimension) look for changes in cross-sections
at those relevant points.

Each recursive call returns a normalised set of blocks for the lower cross-section
dimension. This allows for the detection of changes and removal of redundant blocks
in a consistent fashion.
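A sketch of the first step, collecting the relevant grid markers per dimension and mapping an ordinate to its marker by binary search (`grid_markers` and `marker_index` are hypothetical helpers illustrating the idea):

```python
from bisect import bisect_left

def grid_markers(blocks):
    """Collect the relevant ordinates (the block end points) in each
    dimension, sorted and de-duplicated."""
    dims = len(blocks[0][0])
    return [
        sorted({corner[d] for block in blocks for corner in block})
        for d in range(dims)
    ]

def marker_index(markers_1d, ordinate):
    """Map an ordinate to its marker position via binary search."""
    return bisect_left(markers_1d, ordinate)
```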

### Local Grid System

The universe is made up of galaxies, and changes within one galaxy do not affect
the others.

If we consider connected Blocks as a local system then we can optimise the
computation when Blocks are added and removed, in that we only need to consider
refreshing the representation of the local system.

> This concept may introduce an unnecessary overhead when identifying the local
> systems, so we will not consider this for v1. We may look at it again if there
> are meaningful use cases for it.

### Optimisation

Clearly this approach is only useful if there is a significant saving on
modelling the granular space. For example if all the normalised disjoint blocks
are unit blocks then we would be better off using a set of tuples regardless.

#### Complexity

N = number of block operation layers to normalise, d = the dimension

- Establish the grid markers in each dimension and map their ordinates to
  markers using a binary search: O(N log N)
- Create a cross-section for each marker in each dimension: O((2N^2)^d) at worst

#### Memory

Storing block data as a Block object has an overhead of 472 bytes per block
object (for garbage collection). Although it is unlikely to have an impact, we
may be inclined to prefer a simple tuple to a Block when it comes to modelling
any transient structures.
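A rough way to see the object-versus-tuple overhead (exact byte counts vary by Python version and platform, and `PlainBlock` is just a stand-in for illustration, not the real Block class):

```python
import sys

class PlainBlock:
    """Minimal stand-in object to illustrate per-instance overhead."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

blk = PlainBlock((0, 0, 0), (1, 1, 1))
tup = ((0, 0, 0), (1, 1, 1))

obj_bytes = sys.getsizeof(blk) + sys.getsizeof(blk.__dict__)
tup_bytes = sys.getsizeof(tup)
print(obj_bytes, tup_bytes)  # the bare tuple is noticeably smaller
```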

## Testing Strategy

- Have a small number of Block fixtures covering the various possibilities
- 3 block operations seems like a good balance (between cognition and coverage)
- Apply all possible combinations of order and operation for a group of 3 using
  some base set
- Duplicate the test cases using sets of tuple points. Since we are confident
  they produce the correct result we can compare them to the results obtained by
  the BlockSet methods.
- Testing cases with _Open_ intervals will require manually crafted tests as we
  won't be able to generate an equivalent set of tuple points

### Running Tests

Install `pytest` if not already installed.

For a coverage report

`pytest --cov=blocksets tests/ --cov-report term-missing`

## Ideas

Currently, we are effectively modelling boolean values at specific points. We
could extend the notion of a block operation (i.e. a layer) by, say, adding a
_value_, which may introduce more arithmetic functions such as `.sub()` (and
`.remove()` meaning zeroise) etc.

To suggest further ideas please create a GitHub issue as per the contribution
guidelines.

## Contribution

- At the moment it is early days, so whilst the foundations are forming I am only
  inviting comments, which can be given via [GitHub issues](https://github.com/daveisagit/blocksets/issues)

## TODO - Reminders

- [ ] Test cases for normalisation using _Open_ intervals
- [ ] Test cases for set operations using _Open_ intervals
- [ ] Test cases for set operations in 3D
- [ ] More systematic test case coverage for set operations
- [ ] Some performance tests on large operation stacks
- [ ] Maybe a [Manim](https://docs.manim.community/en/stable/index.html) style
  video explainer?

            
