================
Relation Catalog
================
.. contents::
Overview
========
The relation catalog can be used to optimize intransitive and transitive
searches for N-ary relations of finite, preset dimensions.
For example, you can index simple two-way relations, like employee to
supervisor; RDF-style triples of subject-predicate-object; and more complex
relations such as subject-predicate-object with context and state. These
can be searched with variable definitions of transitive behavior.
The catalog can be used in the ZODB or standalone. It is a generic, relatively
policy-free tool.
It is usually expected to be used as an engine for more specialized and
constrained tools and APIs. Three such tools are zc.relationship containers,
plone.relations containers, and zc.vault. The documents in the package,
including this one, describe other possible uses.
History
=======
This is a refactoring of the ZODB-only parts of the zc.relationship package.
Specifically, the zc.relation catalog is largely equivalent to the
zc.relationship index. The index in the zc.relationship 2.x line is an
almost-completely backwards-compatible wrapper of the zc.relation catalog.
zc.relationship will continue to be maintained, though active development is
expected to go into zc.relation.
Many of the ideas come from discussions with and code from Casey Duncan, Tres
Seaver, Ken Manheimer, and more.
Setting Up a Relation Catalog
=============================
In this section, we will be introducing the following ideas.
- Relations are objects with indexed values.
- You add value indexes to relation catalogs to be able to search. Values
can be identified to the catalog with callables or interface elements. The
indexed value must be specified to the catalog as a single value or a
collection.
- Relations and their values are stored in the catalog as tokens: unique
identifiers that you can resolve back to the original value. Integers are the
most efficient tokens, but others can work fine too.
- Token type determines the BTree module needed.
- You must define your own functions for tokenizing and resolving tokens. These
functions are registered with the catalog for the relations and for each of
their value indexes.
- Relations are indexed with ``index``.
We will use a simple two-way relation as our example here. A brief introduction
to a more complex RDF-style subject-predicate-object set up can be found later
in the document.
Creating the Catalog
--------------------
Imagine a two-way relation from one value to another. Let's say that we
are modeling a relation of people to their supervisors: an employee may
have a single supervisor. For this first example, the relation between
employee and supervisor will be intrinsic: the employee has a pointer to
the supervisor, and the employee object itself represents the relation.
Let's say further, for simplicity, that employee names are unique and
can be used to represent employees. We can use names as our "tokens".
Tokens are similar to the primary key in a relational database. A token is a
way to identify an object. It must sort reliably and you must be able to write
a callable that reliably resolves to the object given the right context. In
Zope 3, intids (zope.app.intid) and keyreferences (zope.app.keyreference) are
good examples of reasonable tokens.
As we'll see below, you provide a way to convert objects to tokens, and resolve
tokens to objects, for the relations, and for each value index individually.
They can all be the same functions or completely different, depending on
your needs.
For speed, integers make the best tokens; followed by other
immutables like strings; followed by non-persistent objects; followed by
persistent objects. The choice also determines a choice of BTree module, as
we'll see below.
Here is our toy ``Employee`` example class. Again, we will use the employee
name as the tokens.
>>> employees = {} # we'll use this to resolve the "name" tokens
>>> from functools import total_ordering
>>> @total_ordering
... class Employee(object):
...     def __init__(self, name, supervisor=None):
...         if name in employees:
...             raise ValueError('employee with same name already exists')
...         self.name = name # expect this to be readonly
...         self.supervisor = supervisor
...         employees[name] = self
...     # the next parts just make the tests prettier
...     def __repr__(self):
...         return '<Employee instance "' + self.name + '">'
...     def __lt__(self, other):
...         return self.name < other.name
...     def __eq__(self, other):
...         return self is other
...     def __hash__(self):
...         ''' Dummy method needed because we defined __eq__
...         '''
...         return 1
...
So, we need to define how to turn employees into their tokens. We call the
tokenization a "dump" function. Conversely, the function to resolve tokens into
objects is called a "load".
Functions to dump relations and values get several arguments. The first
argument is the object to be tokenized. Next, because it helps sometimes to
provide context, is the catalog. The last argument is a dictionary that will be
shared for a given search. The dictionary can be ignored, or used as a cache
for optimizations (for instance, to stash a utility that you looked up).
For this example, our function is trivial: we said the token would be
the employee's name.
>>> def dumpEmployees(emp, catalog, cache):
...     return emp.name
...
If you store the relation catalog persistently (e.g., in the ZODB) be aware
that the callables you provide must be picklable--a module-level function,
for instance.
We also need a way to turn tokens into employees, or "load".
The "load" functions get the token to be resolved; the catalog, for
context; and a dict cache, for optimizations of subsequent calls.
You might have noticed in our ``Employee.__init__`` that we keep a mapping
of name to object in the ``employees`` global dict (defined right above
the class definition). We'll use that for resolving the tokens.
>>> def loadEmployees(token, catalog, cache):
...     return employees[token]
...
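If names were not unique, integers would be the faster choice mentioned
earlier. The following is a minimal sketch, not part of zc.relation, of a
dump/load pair that hands out integer tokens. ``IntTokenRegistry``,
``dump_int``, ``load_int`` and the ``'registry'`` cache key are hypothetical
names; only the ``(object, catalog, cache)`` argument order comes from the
description above.

```python
import itertools


class IntTokenRegistry:
    """Hypothetical helper: assigns a stable integer token to each object."""

    def __init__(self):
        self._next = itertools.count(1)
        self._token_by_id = {}       # id(obj) -> token
        self._object_by_token = {}   # token -> obj (this also keeps obj alive)

    def register(self, obj):
        key = id(obj)
        if key not in self._token_by_id:
            token = next(self._next)
            self._token_by_id[key] = token
            self._object_by_token[token] = obj
        return self._token_by_id[key]

    def resolve(self, token):
        return self._object_by_token[token]


registry = IntTokenRegistry()


def dump_int(obj, catalog, cache):
    # Same (object, catalog, cache) signature as dumpEmployees.
    # The cache dict is shared within one search, so the lookup is stashed there.
    reg = cache.setdefault('registry', registry)
    return reg.register(obj)


def load_int(token, catalog, cache):
    # Same (token, catalog, cache) signature as loadEmployees.
    reg = cache.setdefault('registry', registry)
    return reg.resolve(token)
```

Both functions live at module level, which matches the advice above about
callables needing to be picklable when the catalog is stored persistently.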
Now we know enough to get started with a catalog. We'll instantiate it
by specifying how to tokenize relations, and what kind of BTree modules
should be used to hold the tokens.
How do you pick BTree modules?
- If the tokens are 32-bit ints, choose ``BTrees.family32.II``,
``BTrees.family32.IF`` or ``BTrees.family32.IO``.
- If the tokens are 64-bit ints, choose ``BTrees.family64.II``,
``BTrees.family64.IF`` or ``BTrees.family64.IO``.
- If they are anything else, choose ``BTrees.family32.OI``,
``BTrees.family64.OI``, or ``BTrees.family32.OO`` (or
``BTrees.family64.OO``--they are the same).
Within these rules, the choice is somewhat arbitrary unless you plan to merge
these results with that of another source that is using a particular BTree
module. BTree set operations only work within the same module, so you must
match module to module. The catalog defaults to IF trees, because that's what
standard zope catalogs use. That's as reasonable a choice as any, and will
potentially come in handy if your tokens are in fact the same as those used by
the zope catalog and you want to do some set operations.
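The selection rules above can be restated as a small helper. This is only an
illustrative sketch: ``suggest_btree_modules`` is a hypothetical function that
returns module *names* as strings and does not import ``BTrees``. It also
assumes you can supply a representative sample of your tokens.

```python
def suggest_btree_modules(tokens):
    """Return candidate BTree module names for a sample of tokens.

    Hypothetical helper that mirrors the three rules listed above.
    """
    tokens = list(tokens)
    all_ints = bool(tokens) and all(
        isinstance(t, int) and not isinstance(t, bool) for t in tokens)
    if all_ints:
        low, high = min(tokens), max(tokens)
        if -2**31 <= low and high < 2**31:
            # every token fits in a signed 32-bit integer
            return ['BTrees.family32.II', 'BTrees.family32.IF',
                    'BTrees.family32.IO']
        if -2**63 <= low and high < 2**63:
            # every token fits in a signed 64-bit integer
            return ['BTrees.family64.II', 'BTrees.family64.IF',
                    'BTrees.family64.IO']
    # anything else (strings, objects, very large ints) needs object keys
    return ['BTrees.family32.OI', 'BTrees.family64.OI', 'BTrees.family32.OO']
```

For the employee names used in this document, the helper would point to the
object-keyed modules, which is the family chosen below.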
In this example, our tokens are strings, so we want OO or an OI variant. We'll
choose BTrees.family32.OI, arbitrarily.
>>> import zc.relation.catalog
>>> import BTrees
>>> catalog = zc.relation.catalog.Catalog(dumpEmployees, loadEmployees,
... btree=BTrees.family32.OI)
[#verifyObjectICatalog]_
.. [#verifyObjectICatalog] The catalog provides ICatalog.
>>> from zope.interface.verify import verifyObject
>>> import zc.relation.interfaces
>>> verifyObject(zc.relation.interfaces.ICatalog, catalog)
True
[#legacy]_
.. [#legacy] Old instances of zc.relationship indexes, which in the newest
version subclass a zc.relation Catalog, used to have a dict in an
internal data structure. We specify that here so that the code that
converts the dict to an OOBTree can have a chance to run.
>>> catalog._attrs = dict(catalog._attrs)
Look! A relation catalog! We can't do very
much searching with it so far though, because the catalog doesn't have any
indexes.
In this example, the relation itself represents the employee, so we won't need
to index that separately.
But we do need a way to tell the catalog how to find the other end of the
relation, the supervisor. You can specify this to the catalog with an attribute
or method defined on a ``zope.interface`` ``Interface``, or with a callable.
We'll use a callable for now. The callable will receive the indexed relation
and the catalog for context.
>>> def supervisor(emp, catalog):
...     return emp.supervisor # None or another employee
...
We'll also need to specify how to tokenize (dump and load) those values. In
this case, we're able to use the same functions as the relations themselves.
However, do note that we can specify a completely different way to dump and
load for each "value index," or relation element.
We could also specify the name to call the index, but it will default to the
``__name__`` of the function (or interface element), which will work just fine
for us now.
Now we can add the "supervisor" value index.
>>> catalog.addValueIndex(supervisor, dumpEmployees, loadEmployees,
... btree=BTrees.family32.OI)
Now we have an index [#addValueIndexExceptions]_.
.. [#addValueIndexExceptions] Adding a value index can generate several
exceptions.
You must supply both ``dump`` and ``load``, or neither.
>>> catalog.addValueIndex(supervisor, dumpEmployees, None,
... btree=BTrees.family32.OI, name='supervisor2')
Traceback (most recent call last):
...
ValueError: either both of 'dump' and 'load' must be None, or neither
In this example, even if we fix it, we'll get an error, because we have
already indexed the supervisor function.
>>> catalog.addValueIndex(supervisor, dumpEmployees, loadEmployees,
... btree=BTrees.family32.OI, name='supervisor2')
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: ('element already indexed', <function supervisor at ...>)
You also can't add a different function under the same name.
>>> def supervisor2(emp, catalog):
...     return emp.supervisor # None or another employee
...
>>> catalog.addValueIndex(supervisor2, dumpEmployees, loadEmployees,
... btree=BTrees.family32.OI, name='supervisor')
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: ('name already used', 'supervisor')
Finally, if your function does not have a ``__name__`` and you do not
provide one, you may not add an index.
>>> class Supervisor3(object):
...     __name__ = None
...     def __call__(klass, emp, catalog):
...         return emp.supervisor
...
>>> supervisor3 = Supervisor3()
>>> supervisor3.__name__
>>> catalog.addValueIndex(supervisor3, dumpEmployees, loadEmployees,
... btree=BTrees.family32.OI)
... # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: no name specified
>>> [info['name'] for info in catalog.iterValueIndexInfo()]
['supervisor']
Adding Relations
----------------
Now let's create a few employees. All but one will have supervisors.
If you recall our toy ``Employee`` class, the first argument to the
constructor is the employee name (and therefore the token), and the
optional second argument is the supervisor.
>>> a = Employee('Alice')
>>> b = Employee('Betty', a)
>>> c = Employee('Chuck', a)
>>> d = Employee('Diane', b)
>>> e = Employee('Edgar', b)
>>> f = Employee('Frank', c)
>>> g = Employee('Galyn', c)
>>> h = Employee('Howie', d)
Here is a diagram of the hierarchy.
::

                Alice
             __/     \__
        Betty           Chuck
        /   \           /   \
    Diane   Edgar   Frank   Galyn
      |
    Howie

Let's tell the catalog about the relations, using the ``index`` method.
>>> for emp in (a,b,c,d,e,f,g,h):
...     catalog.index(emp)
...
We've now created the relation catalog and added relations to it. We're ready
to search!
Searching
=========
In this section, we will introduce the following ideas.
- Queries to the relation catalog are formed with dicts.
- Query keys are the names of the indexes you want to search, or, for the
special case of precise relations, the ``zc.relation.RELATION`` constant.
- Query values are the tokens of the results you want to match; or ``None``,
indicating relations that have ``None`` as a value (or an empty collection,
if it is a multiple). Search values can use
``zc.relation.catalog.any(args)`` or ``zc.relation.catalog.Any(args)`` to
specify multiple (non-``None``) results to match for a given key.
- The catalog has a variety of methods to help you work with tokens.
``tokenizeQuery`` is typically the most used, though others are available.
- To find relations that match a query, use ``findRelations`` or
``findRelationTokens``.
- To find values that match a query, use ``findValues`` or ``findValueTokens``.
- You search transitively by using a query factory. The
``zc.relation.queryfactory.TransposingTransitive`` is a good common case
factory that lets you walk up and down a hierarchy. A query factory can be
passed in as an argument to search methods as a ``queryFactory``, or
installed as a default behavior using ``addDefaultQueryFactory``.
- To find how a query is related, use ``findRelationChains`` or
``findRelationTokenChains``.
- To find out if a query is related, use ``canFind``.
- Circular transitive relations are handled to prevent infinite loops. They
are identified in ``findRelationChains`` and ``findRelationTokenChains`` with
a ``zc.relation.interfaces.ICircularRelationPath`` marker interface.
- Search methods share the following arguments:
* ``maxDepth``, limiting the transitive depth for searches;
* ``filter``, allowing code to filter transitive paths;
* ``targetQuery``, allowing a query to filter transitive paths on the basis
of the endpoint;
* ``targetFilter``, allowing code to filter transitive paths on the basis of
the endpoint; and
* ``queryFactory``, mentioned above.
- You can set up search indexes to speed up specific transitive searches.
Queries, ``findRelations``, and special query values
----------------------------------------------------
So who works for Alice? That means we want to get the relations--the
employees--with a ``supervisor`` of Alice.
The heart of a question to the catalog is a query. A query is spelled
as a dictionary. The main idea is simply that keys in a dictionary
specify index names, and the values specify the constraints.
The values in a query are always expressed with tokens. The catalog has
several helpers to make this less onerous, but for now let's take
advantage of the fact that our tokens are easily comprehensible.
>>> sorted(catalog.findRelations({'supervisor': 'Alice'}))
[<Employee instance "Betty">, <Employee instance "Chuck">]
Alice is the direct (intransitive) boss of Betty and Chuck.
What if you want to ask "who doesn't report to anyone?" Then you want to
ask for a relation in which the supervisor is None.
>>> list(catalog.findRelations({'supervisor': None}))
[<Employee instance "Alice">]
Alice is the only employee who doesn't report to anyone.
What if you want to ask "who reports to Diane or Chuck?" Then you use the
zc.relation ``Any`` class or ``any`` function to pass the multiple values.
>>> sorted(catalog.findRelations(
... {'supervisor': zc.relation.catalog.any('Diane', 'Chuck')}))
... # doctest: +NORMALIZE_WHITESPACE
[<Employee instance "Frank">, <Employee instance "Galyn">,
<Employee instance "Howie">]
Frank, Galyn, and Howie each report to either Diane or Chuck. [#any]_
.. [#any] ``Any`` can be compared.
>>> zc.relation.catalog.any('foo', 'bar', 'baz')
<zc.relation.catalog.Any instance ('bar', 'baz', 'foo')>
>>> (zc.relation.catalog.any('foo', 'bar', 'baz') ==
... zc.relation.catalog.any('bar', 'foo', 'baz'))
True
>>> (zc.relation.catalog.any('foo', 'bar', 'baz') !=
... zc.relation.catalog.any('bar', 'foo', 'baz'))
False
>>> (zc.relation.catalog.any('foo', 'bar', 'baz') ==
... zc.relation.catalog.any('foo', 'baz'))
False
>>> (zc.relation.catalog.any('foo', 'bar', 'baz') !=
... zc.relation.catalog.any('foo', 'baz'))
True
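To make the query semantics concrete, here is a minimal pure-Python sketch of
how a dict query could be answered from inverted indexes by set intersection.
It is an illustration under stated assumptions, not the catalog's
implementation: ``AnyOf``, ``build_index`` and ``find_relation_tokens`` are
hypothetical names, and plain dicts and sets stand in for BTrees.

```python
from collections import defaultdict


class AnyOf:
    """Hypothetical stand-in for zc.relation.catalog.Any."""

    def __init__(self, *tokens):
        self.tokens = frozenset(tokens)


# relation token -> supervisor token (None means "no supervisor")
SUPERVISOR = {
    'Alice': None, 'Betty': 'Alice', 'Chuck': 'Alice', 'Diane': 'Betty',
    'Edgar': 'Betty', 'Frank': 'Chuck', 'Galyn': 'Chuck', 'Howie': 'Diane',
}


def build_index(value_by_relation):
    """Invert the mapping: value token -> set of relation tokens."""
    index = defaultdict(set)
    for relation, value in value_by_relation.items():
        index[value].add(relation)
    return index


def find_relation_tokens(indexes, query):
    """Intersect the matches for every (index name, wanted value) pair."""
    result = None
    for name, wanted in query.items():
        index = indexes[name]
        if isinstance(wanted, AnyOf):
            matches = set()
            for token in wanted.tokens:
                matches |= index.get(token, set())
        else:
            # a plain token, or None for "no value"
            matches = set(index.get(wanted, set()))
        result = matches if result is None else result & matches
    return sorted(result or ())


indexes = {'supervisor': build_index(SUPERVISOR)}
```

With this toy data, ``find_relation_tokens(indexes, {'supervisor': 'Alice'})``
mirrors the first search above, and passing ``None`` or an ``AnyOf`` value
mirrors the other two.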
``findValues`` and the ``RELATION`` query key
---------------------------------------------
So how do we find who an employee's supervisor is? Well, in this case,
look at the attribute on the employee! If you can use an attribute, that
will usually be a win in the ZODB.
>>> h.supervisor
<Employee instance "Diane">
Again, as we mentioned at the start of this first example, the knowledge
of a supervisor is "intrinsic" to the employee instance. It is
possible, and even easy, to ask the catalog this kind of question, but
the catalog syntax is more geared to "extrinsic" relations, such as the
one from the supervisor to the employee: the connection between a
supervisor object and its employees is extrinsic to the supervisor, so
you actually might want a catalog to find it!
However, we will explore the syntax very briefly, because it introduces an
important pair of search methods, and because it is a stepping stone
to our first transitive search.
So, o relation catalog, who is Howie's supervisor?
To ask this question we want to get the indexed values off of the relations:
``findValues``. In its simplest form, the arguments are the index name of the
values you want, and a query to find the relations that have the desired
values.
What about the query? Above, we noted that the keys in a query are the names of
the indexes to search. However, in this case, we don't want to search one or
more indexes for matching relations, as usual, but actually specify a relation:
Howie.
We do not have a value index name: we are looking for a relation. The query
key, then, should be the constant ``zc.relation.RELATION``. For our current
example, that would mean the query is ``{zc.relation.RELATION: 'Howie'}``.
>>> import zc.relation
>>> list(catalog.findValues(
... 'supervisor', {zc.relation.RELATION: 'Howie'}))[0]
<Employee instance "Diane">
Congratulations, you just found an obfuscated and comparatively
inefficient way to write ``howie.supervisor``! [#intrinsic_search]_
.. [#intrinsic_search] Here's the same with token results.
>>> list(catalog.findValueTokens('supervisor',
... {zc.relation.RELATION: 'Howie'}))
['Diane']
While we're down here in the footnotes, I'll mention that you can
search for relations that haven't been indexed.
>>> list(catalog.findRelationTokens({zc.relation.RELATION: 'Ygritte'}))
[]
>>> list(catalog.findRelations({zc.relation.RELATION: 'Ygritte'}))
[]
[#findValuesExceptions]_
.. [#findValuesExceptions] If you use ``findValues`` or ``findValueTokens`` and
try to specify a value name that is not indexed, you get a ValueError.
>>> catalog.findValues('foo')
Traceback (most recent call last):
...
ValueError: ('name not indexed', 'foo')
Slightly more usefully, you can use other query keys along with
zc.relation.RELATION. This asks, "Of Betty, Alice, and Frank, who are
supervised by Alice?"
>>> sorted(catalog.findRelations(
... {zc.relation.RELATION: zc.relation.catalog.any(
... 'Betty', 'Alice', 'Frank'),
... 'supervisor': 'Alice'}))
[<Employee instance "Betty">]
Only Betty is.
Tokens
------
As mentioned above, the catalog provides several helpers to work with tokens.
The most frequently used is ``tokenizeQuery``, which takes a query with object
values and converts them to tokens using the "dump" functions registered for
the relations and indexed values. Here are alternate spellings of some of the
queries we've encountered above.
>>> catalog.tokenizeQuery({'supervisor': a})
{'supervisor': 'Alice'}
>>> catalog.tokenizeQuery({'supervisor': None})
{'supervisor': None}
>>> import pprint
>>> result = catalog.tokenizeQuery(
... {zc.relation.RELATION: zc.relation.catalog.any(a, b, f),
... 'supervisor': a}) # doctest: +NORMALIZE_WHITESPACE
>>> pprint.pprint(result)
{None: <zc.relation.catalog.Any instance ('Alice', 'Betty', 'Frank')>,
'supervisor': 'Alice'}
(If you are wondering about that ``None`` in the last result, yes,
``zc.relation.RELATION`` is just readability sugar for ``None``.)
So, here's a real search using ``tokenizeQuery``. We'll make an alias for
``catalog.tokenizeQuery`` just to shorten things up a bit.
>>> query = catalog.tokenizeQuery
>>> sorted(catalog.findRelations(query(
... {zc.relation.RELATION: zc.relation.catalog.any(a, b, f),
... 'supervisor': a})))
[<Employee instance "Betty">]
The catalog always has parallel search methods, one for finding objects, as
seen above, and one for finding tokens (the only exception is ``canFind``,
described below). Finding tokens can be much more efficient, especially if the
result from the relation catalog is just one step along the path of finding
your desired result. But finding objects is simpler for some common cases.
Here's a quick example of some queries above, getting tokens rather than
objects.
You can also spell a query in ``tokenizeQuery`` with keyword arguments. This
won't work if your key is ``zc.relation.RELATION``, but otherwise it can
improve readability. We'll see some examples of this below as well.
>>> sorted(catalog.findRelationTokens(query(supervisor=a)))
['Betty', 'Chuck']
>>> sorted(catalog.findRelationTokens({'supervisor': None}))
['Alice']
>>> sorted(catalog.findRelationTokens(
... query(supervisor=zc.relation.catalog.any(c, d))))
['Frank', 'Galyn', 'Howie']
>>> sorted(catalog.findRelationTokens(
... query({zc.relation.RELATION: zc.relation.catalog.any(a, b, f),
... 'supervisor': a})))
['Betty']
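The idea behind ``tokenizeQuery`` can be sketched in a few lines. This is a
simplified illustration, not the catalog's code: ``tokenize_query``,
``Person`` and ``dumps`` are hypothetical names, the dump callables take a
single argument instead of ``(object, catalog, cache)``, and ``Any`` values
are not handled.

```python
class Person:
    """Hypothetical relation object with a unique name."""

    def __init__(self, name):
        self.name = name


def tokenize_query(dump_by_name, query=(), **kwargs):
    """Convert a query with object values into a query with token values.

    ``query`` may be a dict; keyword arguments are merged into it, which
    is why a non-string key cannot be spelled as a keyword.
    """
    merged = dict(query, **kwargs)
    tokenized = {}
    for name, value in merged.items():
        if value is None:
            tokenized[name] = None   # None passes through untouched
        else:
            tokenized[name] = dump_by_name[name](value)
    return tokenized


# one dump callable per index name (assumption: the name is the token)
dumps = {'supervisor': lambda person: person.name}
alice = Person('Alice')
```

Calling ``tokenize_query(dumps, {'supervisor': alice})`` and
``tokenize_query(dumps, supervisor=alice)`` produce the same token query.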
The catalog provides several other methods just for working with tokens.
- ``resolveQuery``: the inverse of ``tokenizeQuery``, converting a
tokenized query to a query with objects.
- ``tokenizeValues``: returns an iterable of tokens for the values of the given
index name.
- ``resolveValueTokens``: returns an iterable of values for the tokens of the
given index name.
- ``tokenizeRelation``: returns a token for the given relation.
- ``resolveRelationToken``: returns a relation for the given token.
- ``tokenizeRelations``: returns an iterable of tokens for the relations given.
- ``resolveRelationTokens``: returns an iterable of relations for the tokens
given.
These methods are used less often, and are described in more technical
documents in this package.
Transitive Searching, Query Factories, and ``maxDepth``
-------------------------------------------------------
So, we've seen a lot of one-level, intransitive searching. What about
transitive searching? Well, you need to tell the catalog how to walk the tree.
In simple (and very common) cases like this, the
``zc.relation.queryfactory.TransposingTransitive`` will do the trick.
A transitive query factory is just a callable that the catalog uses to
ask "I got this query, and here are the results I found. I'm supposed to
walk another step transitively, so what query should I search for next?"
Writing a factory is more complex than we want to talk about right now,
but using the ``TransposingTransitive`` factory is easy. You just tell
it the two query names it should transpose for walking in either
direction.
For instance, here we just want to tell the factory to transpose the two keys
we've used, ``zc.relation.RELATION`` and 'supervisor'. Let's make a factory,
use it in a query for a couple of transitive searches, and then, if you want,
you can read through a footnote to talk through what is happening.
Here's the factory.
>>> import zc.relation.queryfactory
>>> factory = zc.relation.queryfactory.TransposingTransitive(
... zc.relation.RELATION, 'supervisor')
Now ``factory`` is just a callable. Let's let it help answer a couple of
questions.
Who are all of Howie's supervisors transitively (this looks up in the
diagram)?
>>> list(catalog.findValues('supervisor', {zc.relation.RELATION: 'Howie'},
... queryFactory=factory))
... # doctest: +NORMALIZE_WHITESPACE
[<Employee instance "Diane">, <Employee instance "Betty">,
<Employee instance "Alice">]
Who are all of the people Betty supervises transitively, breadth first (this
looks down in the diagram)?
>>> people = list(catalog.findRelations(
... {'supervisor': 'Betty'}, queryFactory=factory))
>>> sorted(people[:2])
[<Employee instance "Diane">, <Employee instance "Edgar">]
>>> people[2]
<Employee instance "Howie">
Yup, that looks right. So how did that work? If you care, read this
footnote. [#I_care]_
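The transposing idea can also be sketched without the catalog: each result of
one step becomes the query value of the next step. The code below is a
pure-Python illustration under that assumption, with hypothetical helper names
and a plain dict in place of the indexes.

```python
from collections import deque

# relation token -> supervisor token, same hierarchy as the diagram
SUPERVISOR = {
    'Alice': None, 'Betty': 'Alice', 'Chuck': 'Alice', 'Diane': 'Betty',
    'Edgar': 'Betty', 'Frank': 'Chuck', 'Galyn': 'Chuck', 'Howie': 'Diane',
}


def direct_reports(supervisor_token):
    """One intransitive step down: who has this supervisor?"""
    return sorted(e for e, s in SUPERVISOR.items() if s == supervisor_token)


def find_reports_transitively(supervisor_token):
    """Walk down breadth first, reusing each found employee as the next
    'supervisor' value."""
    results = []
    seen = set()
    queue = deque([supervisor_token])
    while queue:
        token = queue.popleft()
        for employee in direct_reports(token):
            if employee in seen:
                continue   # guard against circular data
            seen.add(employee)
            results.append(employee)
            queue.append(employee)
    return results


def find_supervisors_transitively(employee_token):
    """Walk up: each supervisor found becomes the next relation to look at."""
    results = []
    boss = SUPERVISOR[employee_token]
    while boss is not None and boss not in results:
        results.append(boss)
        boss = SUPERVISOR[boss]
    return results
```

On this data, walking down from Betty visits Diane and Edgar before Howie, and
walking up from Howie visits Diane, Betty, and Alice in that order.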
This transitive factory is really the only transitive factory you would
want for this particular catalog, so it probably is safe to wire it in
as a default. You can add multiple query factories to match different
queries using ``addDefaultQueryFactory``.
>>> catalog.addDefaultQueryFactory(factory)
Now all searches are transitive by default.
>>> list(catalog.findValues('supervisor', {zc.relation.RELATION: 'Howie'}))
... # doctest: +NORMALIZE_WHITESPACE
[<Employee instance "Diane">, <Employee instance "Betty">,
<Employee instance "Alice">]
>>> people = list(catalog.findRelations({'supervisor': 'Betty'}))
>>> sorted(people[:2])
[<Employee instance "Diane">, <Employee instance "Edgar">]
>>> people[2]
<Employee instance "Howie">
We can force a non-transitive search, or a specific search depth, with
``maxDepth`` [#needs_a_transitive_queries_factory]_.
.. [#needs_a_transitive_queries_factory] A search with a ``maxDepth`` > 1 but
no ``queryFactory`` raises an error.
>>> catalog.removeDefaultQueryFactory(factory)
>>> catalog.findRelationTokens({'supervisor': 'Diane'}, maxDepth=3)
Traceback (most recent call last):
...
ValueError: if maxDepth not in (None, 1), queryFactory must be available
>>> catalog.addDefaultQueryFactory(factory)
>>> list(catalog.findValues(
... 'supervisor', {zc.relation.RELATION: 'Howie'}, maxDepth=1))
[<Employee instance "Diane">]
>>> sorted(catalog.findRelations({'supervisor': 'Betty'}, maxDepth=1))
[<Employee instance "Diane">, <Employee instance "Edgar">]
[#maxDepthExceptions]_
.. [#maxDepthExceptions] ``maxDepth`` must be None or a positive integer, or
else you'll get a ``ValueError``.
>>> catalog.findRelations({'supervisor': 'Betty'}, maxDepth=0)
Traceback (most recent call last):
...
ValueError: maxDepth must be None or a positive integer
>>> catalog.findRelations({'supervisor': 'Betty'}, maxDepth=-1)
Traceback (most recent call last):
...
ValueError: maxDepth must be None or a positive integer
We'll introduce some other available search
arguments later in this document and in other documents. It's important
to note that *all search methods share the same arguments as
``findRelations``*. ``findValues`` and ``findValueTokens`` only add the
initial argument of specifying the desired value.
We've looked at two search methods so far: the ``findValues`` and
``findRelations`` methods help you ask what is related. But what if you
want to know *how* things are transitively related?
``findRelationChains`` and ``targetQuery``
------------------------------------------
Another search method, ``findRelationChains``, helps you discover how
things are transitively related.
The method name says "find relation chains". But what is a "relation
chain"? In this API, it is a transitive path of relations. For
instance, what's the chain of command above Howie? ``findRelationChains``
will return each unique path.
>>> list(catalog.findRelationChains({zc.relation.RELATION: 'Howie'}))
... # doctest: +NORMALIZE_WHITESPACE
[(<Employee instance "Howie">,),
(<Employee instance "Howie">, <Employee instance "Diane">),
(<Employee instance "Howie">, <Employee instance "Diane">,
<Employee instance "Betty">),
(<Employee instance "Howie">, <Employee instance "Diane">,
<Employee instance "Betty">, <Employee instance "Alice">)]
Look at that result carefully. Notice that the result is an iterable of
tuples. Each tuple is a unique chain, which may be a part of a
subsequent chain. In this case, the last chain is the longest and the
most comprehensive.
What if we wanted to see all the paths from Alice? That will be one
chain for each supervised employee, because it shows all possible paths.
>>> sorted(catalog.findRelationChains(
... {'supervisor': 'Alice'}))
... # doctest: +NORMALIZE_WHITESPACE
[(<Employee instance "Betty">,),
(<Employee instance "Betty">, <Employee instance "Diane">),
(<Employee instance "Betty">, <Employee instance "Diane">,
<Employee instance "Howie">),
(<Employee instance "Betty">, <Employee instance "Edgar">),
(<Employee instance "Chuck">,),
(<Employee instance "Chuck">, <Employee instance "Frank">),
(<Employee instance "Chuck">, <Employee instance "Galyn">)]
That's all the paths--all the chains--from Alice. We sorted the results,
but normally they would be breadth first.
But what if we wanted to just find the paths from one query result to
another query result--say, we wanted to know the chain of command from Alice
down to Howie? Then we can specify a ``targetQuery`` that specifies the
characteristics of our desired end point (or points).
>>> list(catalog.findRelationChains(
... {'supervisor': 'Alice'},
... targetQuery={zc.relation.RELATION: 'Howie'}))
... # doctest: +NORMALIZE_WHITESPACE
[(<Employee instance "Betty">, <Employee instance "Diane">,
<Employee instance "Howie">)]
So, Betty supervises Diane, who supervises Howie.
Note that ``targetQuery`` now joins ``maxDepth`` in our collection of shared
search arguments that we have introduced.
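As a rough mental model, chain finding can be pictured as a breadth-first
walk that records the path taken so far. The sketch below is an assumption
about the general technique, not the catalog's code; ``find_relation_chains``
here is a hypothetical free function, ``target`` is a single token standing in
for a ``targetQuery``, and ``max_depth`` stands in for ``maxDepth``.

```python
from collections import deque

SUPERVISOR = {
    'Alice': None, 'Betty': 'Alice', 'Chuck': 'Alice', 'Diane': 'Betty',
    'Edgar': 'Betty', 'Frank': 'Chuck', 'Galyn': 'Chuck', 'Howie': 'Diane',
}


def direct_reports(supervisor_token):
    return sorted(e for e, s in SUPERVISOR.items() if s == supervisor_token)


def find_relation_chains(supervisor_token, max_depth=None, target=None):
    """Return every path of employees below the supervisor, breadth first."""
    chains = []
    queue = deque((emp,) for emp in direct_reports(supervisor_token))
    while queue:
        chain = queue.popleft()
        # the target only decides what is reported, not where we keep walking
        if target is None or chain[-1] == target:
            chains.append(chain)
        if max_depth is None or len(chain) < max_depth:
            for emp in direct_reports(chain[-1]):
                if emp not in chain:   # do not follow circular paths
                    queue.append(chain + (emp,))
    return chains
```

Here ``find_relation_chains('Alice', target='Howie')`` keeps only the path
that ends at Howie, while ``max_depth=1`` limits the walk to direct reports.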
``filter`` and ``targetFilter``
-------------------------------
We can take a quick look now at the last two of the shared search arguments:
``filter`` and ``targetFilter``. These two are similar in that they both are
callables that can approve or reject given relations in a search based on
whatever logic you can code. They differ in that ``filter`` stops any further
transitive searches from the relation, while ``targetFilter`` merely omits the
given result but allows further search from it. Like ``targetQuery``, then,
``targetFilter`` is good when you want to specify the other end of a path.
As an example, let's say we only want to return female employees.
>>> female_employees = ('Alice', 'Betty', 'Diane', 'Galyn')
>>> def female_filter(relchain, query, catalog, cache):
...     return relchain[-1] in female_employees
...
Here are all the female employees supervised by Alice transitively, using
``targetFilter``.
>>> list(catalog.findRelations({'supervisor': 'Alice'},
... targetFilter=female_filter))
... # doctest: +NORMALIZE_WHITESPACE
[<Employee instance "Betty">, <Employee instance "Diane">,
<Employee instance "Galyn">]
Here are all the female employees supervised by Chuck.
>>> list(catalog.findRelations({'supervisor': 'Chuck'},
... targetFilter=female_filter))
[<Employee instance "Galyn">]
The same function used as a ``filter`` will only return females directly
supervised by other females--not Galyn, in this case.
>>> list(catalog.findRelations({'supervisor': 'Alice'},
... filter=female_filter))
[<Employee instance "Betty">, <Employee instance "Diane">]
These can be combined with one another, and with the other search
arguments [#filter]_.
.. [#filter] For instance:
>>> list(catalog.findRelationTokens(
... {'supervisor': 'Alice'}, targetFilter=female_filter,
... targetQuery={zc.relation.RELATION: 'Galyn'}))
['Galyn']
>>> list(catalog.findRelationTokens(
... {'supervisor': 'Alice'}, targetFilter=female_filter,
... targetQuery={zc.relation.RELATION: 'Not known'}))
[]
>>> arbitrary = ['Alice', 'Chuck', 'Betty', 'Galyn']
>>> def arbitrary_filter(relchain, query, catalog, cache):
...     return relchain[-1] in arbitrary
>>> list(catalog.findRelationTokens({'supervisor': 'Alice'},
... filter=arbitrary_filter,
... targetFilter=female_filter))
['Betty', 'Galyn']
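The difference between the two arguments is easiest to see in a small
breadth-first walk. This is a sketch of the behavior described above, not the
catalog's implementation: ``walk``, ``path_filter`` and ``target_filter`` are
hypothetical names, and the filters take only the relation chain rather than
the full ``(relchain, query, catalog, cache)`` signature.

```python
from collections import deque

SUPERVISOR = {
    'Alice': None, 'Betty': 'Alice', 'Chuck': 'Alice', 'Diane': 'Betty',
    'Edgar': 'Betty', 'Frank': 'Chuck', 'Galyn': 'Chuck', 'Howie': 'Diane',
}
FEMALE = ('Alice', 'Betty', 'Diane', 'Galyn')
ARBITRARY = ('Alice', 'Chuck', 'Betty', 'Galyn')


def direct_reports(supervisor_token):
    return sorted(e for e, s in SUPERVISOR.items() if s == supervisor_token)


def walk(supervisor_token, path_filter=None, target_filter=None):
    """Breadth-first walk down the hierarchy, returning relation tokens."""
    results = []
    queue = deque((emp,) for emp in direct_reports(supervisor_token))
    while queue:
        chain = queue.popleft()
        if path_filter is not None and not path_filter(chain):
            continue   # rejected: not reported, and nothing below is searched
        if target_filter is None or target_filter(chain):
            results.append(chain[-1])
        # a target_filter rejection does not stop the walk from continuing
        for emp in direct_reports(chain[-1]):
            queue.append(chain + (emp,))
    return results


def female(chain):
    return chain[-1] in FEMALE


def arbitrary(chain):
    return chain[-1] in ARBITRARY
```

With ``target_filter=female`` the walk still passes through Chuck to reach
Galyn; with ``path_filter=female`` the walk stops at Chuck, so Galyn is never
seen.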
Search indexes
--------------
Without setting up any additional indexes, the transitive behavior of
the ``findRelations`` and ``findValues`` methods essentially relies on the
brute force searches of ``findRelationChains``. Results are iterables
that are gradually computed. For instance, let's repeat the question
"Whom does Betty supervise?". Notice that ``res`` first populates a list
with three members, but then does not populate a second list. The
iterator has been exhausted.
>>> res = catalog.findRelationTokens({'supervisor': 'Betty'})
>>> unindexed = sorted(res)
>>> len(unindexed)
3
>>> len(list(res)) # iterator is exhausted
0
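The exhaustion seen above is ordinary Python iterator behavior. A tiny
stand-alone sketch, with a hypothetical ``lazy_tokens`` generator in place of
a catalog search:

```python
def lazy_tokens():
    """Hypothetical stand-in for a lazily computed search result."""
    for token in ('Diane', 'Edgar', 'Howie'):
        yield token


res = lazy_tokens()
first_pass = sorted(res)   # consumes the generator
second_pass = list(res)    # nothing left to yield
```

If you need the results more than once, store the first pass in a list or set.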
The brute force of this approach can be sufficient in many cases, but
sometimes speed for these searches is critical. In these cases, you can
add a "search index". A search index speeds up the result of one or
more precise searches by indexing the results. Search indexes can
affect the results of searches with a ``queryFactory`` in ``findRelations``,
``findValues``, and the soon-to-be-introduced ``canFind``, but they do not
affect ``findRelationChains``.
The zc.relation package currently includes two kinds of search indexes: one for
indexing transitive membership searches in a hierarchy, and one for intransitive
searches. The latter, explored in tokens.rst in this package, can optimize
frequent searches on complex queries or can effectively change the meaning of
an intransitive search. Other search index implementations and approaches may
be added in the future.
Here's a very brief example of adding a search index for the transitive
searches seen above that specify a 'supervisor'.
>>> import zc.relation.searchindex
>>> catalog.addSearchIndex(
... zc.relation.searchindex.TransposingTransitiveMembership(
... 'supervisor', zc.relation.RELATION))
The ``zc.relation.RELATION`` argument describes how to walk back up the chain. Search
indexes are explained in reasonable detail in searchindex.rst.
Now that we have added the index, we can search again. The result this
time is already computed, so, at least when you ask for tokens, it
is repeatable.
>>> res = catalog.findRelationTokens({'supervisor': 'Betty'})
>>> len(list(res))
3
>>> len(list(res))
3
>>> sorted(res) == unindexed
True
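The idea of a precomputed transitive membership index can be sketched without
the catalog. This is only an illustration of the concept, not how
``TransposingTransitiveMembership`` is implemented; ``REPORTS``,
``all_reports`` and ``MEMBERSHIP`` are hypothetical names.

```python
# Hypothetical stand-in data mirroring the example hierarchy.
REPORTS = {
    'Alice': ['Betty', 'Chuck'],
    'Betty': ['Diane', 'Edgar'],
    'Chuck': ['Frank', 'Galyn'],
    'Diane': ['Howie'],
}


def all_reports(boss):
    """Compute the full set of people below ``boss`` once, eagerly."""
    found = set()
    stack = list(REPORTS.get(boss, ()))
    while stack:
        name = stack.pop()
        if name not in found:
            found.add(name)
            stack.extend(REPORTS.get(name, ()))
    return frozenset(found)


# The "index": every answer is stored ahead of time.
MEMBERSHIP = {boss: all_reports(boss) for boss in REPORTS}

res = MEMBERSHIP['Betty']
first_len = len(list(res))
second_len = len(list(res))  # a stored set can be iterated repeatedly
```

The trade-off in a design like this is that the stored sets must be kept up
to date whenever a relation changes, and a set carries no traversal order.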
Note that the breadth-first sorting is lost when an index is used [#updates]_.
.. [#updates] The scenario we are looking at in this document shows a case
in which special logic in the search index needs to address updates.
For example, if we move Howie from Diane
::
Alice
__/ \__
Betty Chuck
/ \ / \
Diane Edgar Frank Galyn
|
Howie
to Galyn
::
Alice
__/ \__
Betty Chuck
/ \ / \
Diane Edgar Frank Galyn
|
Howie
then the search index is correct both for the new location and the old.
>>> h.supervisor = g
>>> catalog.index(h)
>>> list(catalog.findRelationTokens({'supervisor': 'Diane'}))
[]
>>> list(catalog.findRelationTokens({'supervisor': 'Betty'}))
['Diane', 'Edgar']
>>> list(catalog.findRelationTokens({'supervisor': 'Chuck'}))
['Frank', 'Galyn', 'Howie']
>>> list(catalog.findRelationTokens({'supervisor': 'Galyn'}))
['Howie']
>>> h.supervisor = d
>>> catalog.index(h) # move him back
>>> list(catalog.findRelationTokens({'supervisor': 'Galyn'}))
[]
>>> list(catalog.findRelationTokens({'supervisor': 'Diane'}))
['Howie']
Transitive cycles (and updating and removing relations)
-------------------------------------------------------
The transitive searches and the provided search indexes can handle
cycles. Cycles are less likely in the current example than some others,
but we can stretch the case a bit: imagine a "king in disguise", in
which someone at the top works lower in the hierarchy. Perhaps Alice
works for Zane, who works for Betty, who works for Alice. Artificial,
but easy enough to draw::
______
/ \
/ Zane
/ |
/ Alice
/ __/ \__
/ Betty__ Chuck
\-/ / \ / \
Diane Edgar Frank Galyn
|
Howie
Easy to create too.
>>> z = Employee('Zane', b)
>>> a.supervisor = z
Now we have a cycle. Of course, we have not yet told the catalog about it.
``index`` can be used both to reindex Alice and index Zane.
>>> catalog.index(a)
>>> catalog.index(z)
Now, if we ask who works for Betty, we get the entire tree. (We'll ask
for tokens, just so that the result is smaller to look at.) [#same_set]_
.. [#same_set] The results of the queries for Betty, Alice, and Zane are all
    the same.
>>> res1 = catalog.findRelationTokens({'supervisor': 'Betty'})
>>> res2 = catalog.findRelationTokens({'supervisor': 'Alice'})
>>> res3 = catalog.findRelationTokens({'supervisor': 'Zane'})
>>> list(res1) == list(res2) == list(res3)
True
The cycle doesn't pollute the index outside of the cycle.
>>> res = catalog.findRelationTokens({'supervisor': 'Diane'})
>>> list(res)
['Howie']
>>> list(res) # it isn't lazy, it is precalculated
['Howie']
>>> sorted(catalog.findRelationTokens({'supervisor': 'Betty'}))
... # doctest: +NORMALIZE_WHITESPACE
['Alice', 'Betty', 'Chuck', 'Diane', 'Edgar', 'Frank', 'Galyn', 'Howie',
'Zane']
If we ask for the supervisors of Frank, it will include Betty.
>>> list(catalog.findValueTokens(
... 'supervisor', {zc.relation.RELATION: 'Frank'}))
['Chuck', 'Alice', 'Zane', 'Betty']
Paths returned by ``findRelationChains`` are marked with special interfaces,
and special metadata, to show the chain.
>>> res = list(catalog.findRelationChains({zc.relation.RELATION: 'Frank'}))
>>> len(res)
5
>>> import zc.relation.interfaces
>>> [zc.relation.interfaces.ICircularRelationPath.providedBy(r)
... for r in res]
[False, False, False, False, True]
Here's the last chain:
>>> res[-1] # doctest: +NORMALIZE_WHITESPACE
cycle(<Employee instance "Frank">, <Employee instance "Chuck">,
<Employee instance "Alice">, <Employee instance "Zane">,
<Employee instance "Betty">)
The chain's 'cycled' attribute has a list of queries that create a cycle.
If you run the query, or queries, you see where the cycle would
restart--where the path would have started to overlap. Sometimes the query
results will include multiple cycles, and some paths that are not cycles.
In this case, there's only a single cycled query, which results in a single
cycled relation.
>>> len(res[4].cycled)
1
>>> list(catalog.findRelations(res[4].cycled[0], maxDepth=1))
[<Employee instance "Alice">]
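As a rough illustration of how a chain can be flagged as circular, here is a
standalone sketch in plain Python. It is not the catalog's algorithm;
``SUPERVISOR`` and ``chains_up`` are hypothetical names, and the chains hold
names rather than relation objects. A chain is treated as a cycle when the
next step would revisit someone already in the chain.

```python
# Hypothetical stand-in for the indexed 'supervisor' values, cycle included.
SUPERVISOR = {
    'Zane': 'Betty', 'Alice': 'Zane', 'Betty': 'Alice', 'Chuck': 'Alice',
    'Diane': 'Betty', 'Edgar': 'Betty', 'Frank': 'Chuck', 'Galyn': 'Chuck',
    'Howie': 'Diane',
}


def chains_up(start):
    """Yield ``(chain, is_cycle)`` pairs while walking up from ``start``."""
    chain = [start]
    while True:
        boss = SUPERVISOR.get(chain[-1])
        # The chain cycles if the next step is already part of it.
        is_cycle = boss in chain
        yield tuple(chain), is_cycle
        if boss is None or is_cycle:
            return
        chain.append(boss)


chains = list(chains_up('Frank'))
flags = [is_cycle for _chain, is_cycle in chains]
last_chain = chains[-1][0]
restart = SUPERVISOR[last_chain[-1]]  # where the path would start to overlap
```

Stopping at the first repeated member is what keeps the walk finite even
though the underlying data loops forever.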
To remove this craziness [#reverse_lookup]_, we can unindex Zane, and change
and reindex Alice.
.. [#reverse_lookup] If you want to, look at what happens when you go the
    other way:
>>> res = list(catalog.findRelationChains({'supervisor': 'Zane'}))
>>> def sortEqualLenByName(one):
... return len(one), one
...
>>> res.sort(key=sortEqualLenByName) # normalizes for test stability
>>> from __future__ import print_function
>>> print(res) # doctest: +NORMALIZE_WHITESPACE
[(<Employee instance "Alice">,),
(<Employee instance "Alice">, <Employee instance "Betty">),
(<Employee instance "Alice">, <Employee instance "Chuck">),
(<Employee instance "Alice">, <Employee instance "Betty">,
<Employee instance "Diane">),
(<Employee instance "Alice">, <Employee instance "Betty">,
<Employee instance "Edgar">),
cycle(<Employee instance "Alice">, <Employee instance "Betty">,
<Employee instance "Zane">),
(<Employee instance "Alice">, <Employee instance "Chuck">,
<Employee instance "Frank">),
(<Employee instance "Alice">, <Employee instance "Chuck">,
<Employee instance "Galyn">),
(<Employee instance "Alice">, <Employee instance "Betty">,
<Employee instance "Diane">, <Employee instance "Howie">)]
>>> [zc.relation.interfaces.ICircularRelationPath.providedBy(r)
... for r in res]
[False, False, False, False, False, True, False, False, False]
>>> len(res[5].cycled)
1
>>> list(catalog.findRelations(res[5].cycled[0], maxDepth=1))
[<Employee instance "Alice">]
>>> a.supervisor = None
>>> catalog.index(a)
>>> list(catalog.findValueTokens(
... 'supervisor', {zc.relation.RELATION: 'Frank'}))
['Chuck', 'Alice']
>>> catalog.unindex(z)
>>> sorted(catalog.findRelationTokens({'supervisor': 'Betty'}))
['Diane', 'Edgar', 'Howie']
``canFind``
-----------
We've reached the last search method: ``canFind``. We've gotten values and
relations, but what if you simply want to know if there is any
connection at all? For instance, is Alice a supervisor of Howie? Is
Chuck? To answer these questions, you can use the ``canFind`` method
combined with the ``targetQuery`` search argument.
The ``canFind`` method takes the same arguments as ``findRelations``. However,
it simply returns a boolean indicating whether the search has any results.
This is a convenience that also allows some extra optimizations.
Does Betty supervise anyone?
>>> catalog.canFind({'supervisor': 'Betty'})
True
What about Howie?
>>> catalog.canFind({'supervisor': 'Howie'})
False
What about...Zane (no longer an employee)?
>>> catalog.canFind({'supervisor': 'Zane'})
False
If we want to know if Alice or Chuck supervises Howie, then we want to specify
characteristics of two points on a path. To ask a question about the other
end of a path, use ``targetQuery``.
Is Alice a supervisor of Howie?
>>> catalog.canFind({'supervisor': 'Alice'},
... targetQuery={zc.relation.RELATION: 'Howie'})
True
Is Chuck a supervisor of Howie?
>>> catalog.canFind({'supervisor': 'Chuck'},
... targetQuery={zc.relation.RELATION: 'Howie'})
False
Is Howie Alice's employee?
>>> catalog.canFind({zc.relation.RELATION: 'Howie'},
... targetQuery={'supervisor': 'Alice'})
True
Is Howie Chuck's employee?
>>> catalog.canFind({zc.relation.RELATION: 'Howie'},
... targetQuery={'supervisor': 'Chuck'})
False
(Note that, if your relations describe a hierarchy, searching up a hierarchy is
usually more efficient than searching down, so the second pair of questions is
generally preferable to the first in that case.)
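The intuition behind that note can be sketched in plain Python. This is an
illustration only, not the catalog's implementation; ``SUPERVISOR``,
``REPORTS``, ``is_supervised_by`` and ``supervises`` are hypothetical names.
When each person has a single supervisor, walking up follows one chain, while
walking down may have to fan out over a whole subtree. Both functions stop as
soon as the answer is known, which is the kind of shortcut a boolean result
permits.

```python
# Hypothetical stand-in for the indexed 'supervisor' values.
SUPERVISOR = {
    'Betty': 'Alice', 'Chuck': 'Alice', 'Diane': 'Betty', 'Edgar': 'Betty',
    'Frank': 'Chuck', 'Galyn': 'Chuck', 'Howie': 'Diane',
}

# The same data turned around: supervisor -> direct reports.
REPORTS = {}
for employee, boss in SUPERVISOR.items():
    REPORTS.setdefault(boss, []).append(employee)


def is_supervised_by(employee, boss):
    """Search up: follow the single chain of supervisors."""
    seen = set()
    current = SUPERVISOR.get(employee)
    while current is not None and current not in seen:
        if current == boss:
            return True  # stop at the first hit
        seen.add(current)
        current = SUPERVISOR.get(current)
    return False


def supervises(boss, employee):
    """Search down: may need to visit the whole subtree below ``boss``."""
    seen = set()
    stack = list(REPORTS.get(boss, ()))
    while stack:
        name = stack.pop()
        if name == employee:
            return True  # stop at the first hit
        if name not in seen:
            seen.add(name)
            stack.extend(REPORTS.get(name, ()))
    return False
```

For example, ``is_supervised_by('Howie', 'Alice')`` inspects at most three
people, while ``supervises('Alice', 'Howie')`` may inspect most of the tree.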
Working with More Complex Relations
===================================
So far, our examples have used a simple relation, in which the indexed object
is one end of the relation, and the indexed value on the object is the other.
This example has let us look at all of the basic zc.relation catalog
functionality.
As mentioned in the introduction, though, the catalog supports, and was
designed for, more complex relations. This section will quickly examine a
few examples of other uses.
In this section, we will see several examples of ideas mentioned above but not
yet demonstrated.
- We can use interface attributes (values or callables) to define value
indexes.
- Using interface attributes will cause an attempt to adapt the relation if it
does not already provide the interface.
- We can use the ``multiple`` argument when defining a value index to indicate
that the indexed value is a collection.
- We can use the ``name`` argument when defining a value index to specify the
name to be used in queries, rather than relying on the name of the interface
attribute or callable.
- The ``family`` argument in instantiating the catalog lets you change the
default btree family for relations and value indexes from
``BTrees.family32.IF`` to ``BTrees.family64.IF``.
Extrinsic Two-Way Relations
---------------------------
A simple variation of our current story is this: what if the indexed relation
were between two other objects--that is, what if the relation were extrinsic to
both participants?
Let's imagine we have relations that show biological parentage. We'll want a
"Person" and a "Parentage" relation. We'll define an interface for
``IParentage`` so we can see how using an interface to define a value index
works.
>>> class Person(object):
... def __init__(self, name):
... self.name = name
... def __repr__(self):
... return '<Person %r>' % (self.name,)
...
>>> import zope.interface
>>> class IParentage(zope.interface.Interface):
... child = zope.interface.Attribute('the child')
... parents = zope.interface.Attribute('the parents')
...
>>> @zope.interface.implementer(IParentage)
... class Parentage(object):
...
... def __init__(self, child, parent1, parent2):
... self.child = child
... self.parents = (parent1, parent2)
...
Now we'll define the dumpers and loaders and then the catalog. Notice that
we are relying on a pattern: the dump must be called before the load.
>>> _people = {}
>>> _relations = {}
>>> def dumpPeople(obj, catalog, cache):
... if _people.setdefault(obj.name, obj) is not obj:
... raise ValueError('we are assuming names are unique')
... return obj.name
...
>>> def loadPeople(token, catalog, cache):
... return _people[token]
...
>>> def dumpRelations(obj, catalog, cache):
... if _relations.setdefault(id(obj), obj) is not obj:
... raise ValueError('huh?')
... return id(obj)
...
>>> def loadRelations(token, catalog, cache):
... return _relations[token]
...
>>> catalog = zc.relation.catalog.Catalog(dumpRelations, loadRelations, family=BTrees.family64)
>>> catalog.addValueIndex(IParentage['child'], dumpPeople, loadPeople,
... btree=BTrees.family32.OO)
>>> catalog.addValueIndex(IParentage['parents'], dumpPeople, loadPeople,
... btree=BTrees.family32.OO, multiple=True,
... name='parent')
>>> catalog.addDefaultQueryFactory(
... zc.relation.queryfactory.TransposingTransitive(
... 'child', 'parent'))
Now we have a catalog fully set up. Let's add some relations.
>>> a = Person('Alice')
>>> b = Person('Betty')
>>> c = Person('Charles')
>>> d = Person('Donald')
>>> e = Person('Eugenia')
>>> f = Person('Fred')
>>> g = Person('Gertrude')
>>> h = Person('Harry')
>>> i = Person('Iphigenia')
>>> j = Person('Jacob')
>>> k = Person('Karyn')
>>> l = Person('Lee')
>>> r1 = Parentage(child=j, parent1=k, parent2=l)
>>> r2 = Parentage(child=g, parent1=i, parent2=j)
>>> r3 = Parentage(child=f, parent1=g, parent2=h)
>>> r4 = Parentage(child=e, parent1=g, parent2=h)
>>> r5 = Parentage(child=b, parent1=e, parent2=d)
>>> r6 = Parentage(child=a, parent1=e, parent2=c)
Here's that in one of our hierarchy diagrams.
::
Karyn Lee
\ /
Jacob Iphigenia
\ /
Gertrude Harry
\ /
/-------\
Fred Eugenia
Donald / \ Charles
\ / \ /
Betty Alice
Now we can index the relations, and ask some questions.
>>> for r in (r1, r2, r3, r4, r5, r6):
... catalog.index(r)
>>> query = catalog.tokenizeQuery
>>> sorted(catalog.findValueTokens(
... 'parent', query(child=a), maxDepth=1))
['Charles', 'Eugenia']
>>> sorted(catalog.findValueTokens('parent', query(child=g)))
['Iphigenia', 'Jacob', 'Karyn', 'Lee']
>>> sorted(catalog.findValueTokens(
... 'child', query(parent=h), maxDepth=1))
['Eugenia', 'Fred']
>>> sorted(catalog.findValueTokens('child', query(parent=h)))
['Alice', 'Betty', 'Eugenia', 'Fred']
>>> catalog.canFind(query(parent=h), targetQuery=query(child=d))
False
>>> catalog.canFind(query(parent=l), targetQuery=query(child=b))
True
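To see what these transitive queries compute, here is a standalone sketch in
plain Python that reproduces the same answers from the same parentage data.
It is not how the catalog works internally; ``PARENTAGE``, ``parents_of``,
``children_of`` and ``closure`` are hypothetical names, and ``max_depth`` only
loosely mirrors the ``maxDepth`` argument.

```python
# (child, parents) pairs copied from the relations r1..r6 above.
PARENTAGE = [
    ('Jacob', ('Karyn', 'Lee')),
    ('Gertrude', ('Iphigenia', 'Jacob')),
    ('Fred', ('Gertrude', 'Harry')),
    ('Eugenia', ('Gertrude', 'Harry')),
    ('Betty', ('Eugenia', 'Donald')),
    ('Alice', ('Eugenia', 'Charles')),
]

# Two lookups, one per direction, because 'parent' is a multi-valued field.
parents_of = {}
children_of = {}
for child, parents in PARENTAGE:
    for parent in parents:
        parents_of.setdefault(child, set()).add(parent)
        children_of.setdefault(parent, set()).add(child)


def closure(start, index, max_depth=None):
    """Return, sorted, everyone reachable from ``start`` through ``index``."""
    found = set()
    frontier = {start}
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        reached = set()
        for name in frontier:
            reached |= index.get(name, set())
        frontier = reached - found  # only follow people not seen before
        found |= frontier
        depth += 1
    return sorted(found)


alice_parents = closure('Alice', parents_of, max_depth=1)
gertrude_ancestors = closure('Gertrude', parents_of)
harry_descendants = closure('Harry', children_of)
```

A ``canFind``-style question then becomes a membership test, such as
``'Betty' in closure('Lee', children_of)``.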
Multi-Way Relations
-------------------
The previous example quickly showed how to set the catalog up for a completely
extrinsic two-way relation. The same pattern can be extended for N-way
relations. For example, consider a four-way relation in the form of
SUBJECTS PREDICATE OBJECTS [in CONTEXT]. For instance, we might
want to say "(joe,) SELLS (doughnuts, coffee) in corner_store", where "(joe,)"
is the collection of subjects, "SELLS" is the predicate, "(doughnuts, coffee)"
is the collection of objects, and "corner_store" is the optional context.
For this last example, we'll integrate two components we haven't seen examples
of here before: the ZODB and adaptation.
Our example ZODB approach uses OIDs as the tokens. This might be OK in some
cases, if you will never support multiple databases and you don't need an
abstraction layer so that a different object can have the same identifier.
>>> import persistent
>>> import struct
>>> class Demo(persistent.Persistent):
... def __init__(self, name):
... self.name = name
... def __repr__(self):
... return '<Demo instance %r>' % (self.name,)
...
>>> class IRelation(zope.interface.Interface):
... subjects = zope.interface.Attribute('subjects')
... predicate = zope.interface.Attribute('predicate')
... objects = zope.interface.Attribute('objects')
...
>>> class IContextual(zope.interface.Interface):
... def getContext():
... 'return context'
... def setContext(value):
... 'set context'
...
>>> @zope.interface.implementer(IContextual)
... class Contextual(object):
...
... _context = None
... def getContext(self):
... return self._context
... def setContext(self, value):
... self._context = value
...
>>> @zope.interface.implementer(IRelation)
... class Relation(persistent.Persistent):
...
... def __init__(self, subjects, predicate, objects):
... self.subjects = subjects
... self.predicate = predicate
... self.objects = objects
... self._contextual = Contextual()
...
... def __conform__(self, iface):
... if iface is IContextual:
... return self._contextual
...
(When using zope.component, the ``__conform__`` would normally be unnecessary;
however, this package does not depend on zope.component.)
>>> def dumpPersistent(obj, catalog, cache):
... if obj._p_jar is None:
... catalog._p_jar.add(obj) # assumes something else places it
... return struct.unpack('<q', obj._p_oid)[0]
...
>>> def loadPersistent(token, catalog, cache):
... return catalog._p_jar.get(struct.pack('<q', token))
...
>>> from ZODB.tests.util import DB
>>> db = DB()
>>> conn = db.open()
>>> root = conn.root()
>>> catalog = root['catalog'] = zc.relation.catalog.Catalog(
... dumpPersistent, loadPersistent, family=BTrees.family64)
>>> catalog.addValueIndex(IRelation['subjects'],
... dumpPersistent, loadPersistent, multiple=True, name='subject')
>>> catalog.addValueIndex(IRelation['objects'],
... dumpPersistent, loadPersistent, multiple=True, name='object')
>>> catalog.addValueIndex(IRelation['predicate'], btree=BTrees.family32.OO)
>>> catalog.addValueIndex(IContextual['getContext'],
... dumpPersistent, loadPersistent, name='context')
>>> import transaction
>>> transaction.commit()
The ``dumpPersistent`` and ``loadPersistent`` functions are a bit of a toy, as
warned above. Also, while our predicate will be stored as a string, some
programmers may prefer to have a dump in such a case verify that the string
has been explicitly registered in some way, to prevent typos. Obviously, we
are not bothering with this for our example.
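The integer conversion at the heart of the dump and load functions can be
tried on its own with the standard ``struct`` module. In this sketch the OID
bytes are a made-up value, and ``oid_to_token`` and ``token_to_oid`` are
hypothetical helper names; only the ``'<q'`` format comes from the example
above.

```python
import struct


def oid_to_token(oid):
    """Read 8 OID bytes as a little-endian signed 64-bit integer."""
    return struct.unpack('<q', oid)[0]


def token_to_oid(token):
    """Turn the integer token back into the original 8 bytes."""
    return struct.pack('<q', token)


oid = b'\x00\x00\x00\x00\x00\x00\x00\x01'  # made-up OID bytes
token = oid_to_token(oid)
round_trip = token_to_oid(token)

# The same bytes read with the opposite byte order give a different number.
big_endian = struct.unpack('>q', oid)[0]

# Tokens like this do not fit in 32 bits.
fits_in_32_bits = -2 ** 31 <= token < 2 ** 31
```

The byte order only has to agree between dump and load for the round trip to
work, but the resulting integers can be far too large for a 32-bit BTree,
which is one reason the catalog above is created with ``BTrees.family64``.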
We make some objects, and then we make some relations with those objects and
index them.
>>> joe = root['joe'] = Demo('joe')
>>> sara = root['sara'] = Demo('sara')
>>> jack = root['jack'] = Demo('jack')
>>> ann = root['ann'] = Demo('ann')
>>> doughnuts = root['doughnuts'] = Demo('doughnuts')
>>> coffee = root['coffee'] = Demo('coffee')
>>> muffins = root['muffins'] = Demo('muffins')
>>> cookies = root['cookies'] = Demo('cookies')
>>> newspaper = root['newspaper'] = Demo('newspaper')
>>> corner_store = root['corner_store'] = Demo('corner_store')
>>> bistro = root['bistro'] = Demo('bistro')
>>> bakery = root['bakery'] = Demo('bakery')
>>> SELLS = 'SELLS'
>>> BUYS = 'BUYS'
>>> OBSERVES = 'OBSERVES'
>>> rel1 = root['rel1'] = Relation((joe,), SELLS, (doughnuts, coffee))
>>> IContextual(rel1).setContext(corner_store)
>>> rel2 = root['rel2'] = Relation((sara, jack), SELLS,
... (muffins, doughnuts, cookies))
>>> IContextual(rel2).setContext(bakery)
>>> rel3 = root['rel3'] = Relation((ann,), BUYS, (doughnuts,))
>>> rel4 = root['rel4'] = Relation((sara,), BUYS, (bistro,))
>>> for r in (rel1, rel2, rel3, rel4):
... catalog.index(r)
...
Now we can ask a simple question. Where do they sell doughnuts?
>>> query = catalog.tokenizeQuery
>>> sorted(catalog.findValues(
... 'context',
... (query(predicate=SELLS, object=doughnuts))),
... key=lambda ob: ob.name)
[<Demo instance 'bakery'>, <Demo instance 'corner_store'>]
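Conceptually, a query with several keys can be answered by intersecting the
sets of relations that match each key. The following standalone sketch shows
that idea with plain dicts and sets. It is not the catalog's implementation;
``RELATIONS``, ``build_indexes``, ``find_relation_tokens`` and
``find_values`` are hypothetical names, and strings stand in for the
persistent objects.

```python
# Hypothetical stand-ins for rel1..rel4; every value is a set of tokens.
RELATIONS = {
    'rel1': {'subject': {'joe'}, 'predicate': {'SELLS'},
             'object': {'doughnuts', 'coffee'}, 'context': {'corner_store'}},
    'rel2': {'subject': {'sara', 'jack'}, 'predicate': {'SELLS'},
             'object': {'muffins', 'doughnuts', 'cookies'},
             'context': {'bakery'}},
    'rel3': {'subject': {'ann'}, 'predicate': {'BUYS'},
             'object': {'doughnuts'}, 'context': set()},
    'rel4': {'subject': {'sara'}, 'predicate': {'BUYS'},
             'object': {'bistro'}, 'context': set()},
}


def build_indexes(relations):
    """Map index name -> value token -> set of relation tokens."""
    indexes = {}
    for token, values in relations.items():
        for name, items in values.items():
            for item in items:
                indexes.setdefault(name, {}).setdefault(item, set()).add(token)
    return indexes


def find_relation_tokens(indexes, query):
    """Intersect the matches for every key of a non-empty ``query``."""
    results = None
    for name, value in query.items():
        matches = indexes.get(name, {}).get(value, set())
        results = set(matches) if results is None else results & matches
    # An empty query is not modeled in this sketch.
    return results or set()


def find_values(relations, indexes, name, query):
    """Collect the ``name`` values of every relation matching ``query``."""
    found = set()
    for token in find_relation_tokens(indexes, query):
        found |= relations[token][name]
    return sorted(found)


INDEXES = build_indexes(RELATIONS)
contexts = find_values(
    RELATIONS, INDEXES, 'context',
    {'predicate': 'SELLS', 'object': 'doughnuts'})
```

Ann's purchase also mentions doughnuts, but it drops out of the intersection
because its predicate is ``BUYS``.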
Hopefully these examples give you further ideas on how you can use this tool.
Additional Functionality
========================
This section introduces peripheral functionality. We will learn the following.
- Listeners can be registered in the catalog. They are alerted when a relation
is added, modified, or removed; and when the catalog is cleared and copied
(see below).
- The ``clear`` method clears the relations in the catalog.
- The ``copy`` method makes a copy of the current catalog by copying internal
data structures, rather than reindexing the relations, which can be a
significant optimization opportunity. This copies value indexes and search
indexes; and gives listeners an opportunity to specify what, if anything,
should be included in the new copy.
- The ``ignoreSearchIndex`` argument to the five pertinent search methods
causes the search to ignore search indexes, even if there is an appropriate
one.
- ``findRelationTokens()`` (without arguments) returns the BTree set of all
relation tokens in the catalog.
- ``findValueTokens(INDEX_NAME)`` (where "INDEX_NAME" should be replaced with
an index name) returns the BTree set of all value tokens in the catalog for
the given index name.
Listeners
---------
A variety of potential clients may want to be alerted when the catalog changes.
zc.relation does not depend on zope.event, so listeners may be registered for
various changes. Let's make a quick demo listener. The ``additions`` and
``removals`` arguments are dictionaries of {value name: iterable of added or
removed value tokens}.
>>> def pchange(d):
... pprint.pprint(dict(
... (k, v is not None and sorted(set(v)) or v) for k, v in d.items()))
>>> @zope.interface.implementer(zc.relation.interfaces.IListener)
... class DemoListener(persistent.Persistent):
...
... def relationAdded(self, token, catalog, additions):
... print('a relation (token %r) was added to %r '
... 'with these values:' % (token, catalog))
... pchange(additions)
... def relationModified(self, token, catalog, additions, removals):
... print('a relation (token %r) in %r was modified '
... 'with these additions:' % (token, catalog))
... pchange(additions)
... print('and these removals:')
... pchange(removals)
... def relationRemoved(self, token, catalog, removals):
... print('a relation (token %r) was removed from %r '
... 'with these values:' % (token, catalog))
... pchange(removals)
... def sourceCleared(self, catalog):
... print('catalog %r had all relations unindexed' % (catalog,))
... def sourceAdded(self, catalog):
... print('now listening to catalog %r' % (catalog,))
... def sourceRemoved(self, catalog):
... print('no longer listening to catalog %r' % (catalog,))
... def sourceCopied(self, original, copy):
... print('catalog %r made a copy %r' % (original, copy))
... copy.addListener(self)
...
Listeners can be installed multiple times.
Listeners can be added as persistent weak references, so that, if they are
deleted elsewhere, a ZODB pack will not consider the reference in the catalog
to be something preventing garbage collection.
We'll install one of these demo listeners into our new catalog as a
normal reference, the default behavior. Then we'll show some example messages
sent to the demo listener.
>>> listener = DemoListener()
>>> catalog.addListener(listener) # doctest: +ELLIPSIS
now listening to catalog <zc.relation.catalog.Catalog object at ...>
>>> rel5 = root['rel5'] = Relation((ann,), OBSERVES, (newspaper,))
>>> catalog.index(rel5) # doctest: +ELLIPSIS
a relation (token ...) was added to <...Catalog...> with these values:
{'context': None,
'object': [...],
'predicate': ['OBSERVES'],
'subject': [...]}
>>> rel5.subjects = (jack,)
>>> IContextual(rel5).setContext(bistro)
>>> catalog.index(rel5) # doctest: +ELLIPSIS
a relation (token ...) in ...Catalog... was modified with these additions:
{'context': [...], 'subject': [...]}
and these removals:
{'subject': [...]}
>>> catalog.unindex(rel5) # doctest: +ELLIPSIS
a relation (token ...) was removed from <...Catalog...> with these values:
{'context': [...],
'object': [...],
'predicate': ['OBSERVES'],
'subject': [...]}
>>> catalog.removeListener(listener) # doctest: +ELLIPSIS
no longer listening to catalog <...Catalog...>
>>> catalog.index(rel5) # doctest: +ELLIPSIS
The only two methods not shown by those examples are ``sourceCleared`` and
``sourceCopied``. We'll get to those very soon below.
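The ``additions`` and ``removals`` dictionaries can be thought of as a
per-index difference between the previously indexed values and the new ones.
Here is a standalone sketch of that bookkeeping. ``MiniCatalog`` and
``RecordingListener`` are hypothetical classes written for this illustration;
they borrow the listener method names from above but otherwise make no claim
about how the real catalog computes its notifications (for instance, the
sketch uses empty sets where the real output shows ``None``).

```python
class RecordingListener:
    """Collects notifications instead of printing them."""

    def __init__(self):
        self.events = []

    def relationAdded(self, token, catalog, additions):
        self.events.append(('added', token, additions))

    def relationModified(self, token, catalog, additions, removals):
        self.events.append(('modified', token, additions, removals))

    def relationRemoved(self, token, catalog, removals):
        self.events.append(('removed', token, removals))


class MiniCatalog:
    """Toy catalog: remembers values per relation and reports differences."""

    def __init__(self):
        self._values = {}
        self._listeners = []

    def addListener(self, listener):
        self._listeners.append(listener)

    def index(self, token, values):
        new = {name: set(items) for name, items in values.items()}
        old = self._values.get(token)
        self._values[token] = new
        if old is None:
            for listener in self._listeners:
                listener.relationAdded(token, self, new)
            return
        additions, removals = {}, {}
        for name in set(old) | set(new):
            added = new.get(name, set()) - old.get(name, set())
            removed = old.get(name, set()) - new.get(name, set())
            if added:
                additions[name] = added
            if removed:
                removals[name] = removed
        if additions or removals:
            for listener in self._listeners:
                listener.relationModified(token, self, additions, removals)

    def unindex(self, token):
        old = self._values.pop(token, None)
        if old is not None:
            for listener in self._listeners:
                listener.relationRemoved(token, self, old)


catalog = MiniCatalog()
listener = RecordingListener()
catalog.addListener(listener)
catalog.index('rel5', {'subject': ['ann'], 'predicate': ['OBSERVES'],
                       'object': ['newspaper'], 'context': []})
catalog.index('rel5', {'subject': ['jack'], 'predicate': ['OBSERVES'],
                       'object': ['newspaper'], 'context': ['bistro']})
catalog.unindex('rel5')
```

The second ``index`` call reports only what changed: a new subject and
context as additions, and the old subject as a removal.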
The ``clear`` Method
--------------------
The ``clear`` method simply unindexes all relations from a catalog. Installed
listeners have ``sourceCleared`` called.
>>> len(catalog)
5
>>> catalog.addListener(listener) # doctest: +ELLIPSIS
now listening to catalog <zc.relation.catalog.Catalog object at ...>
>>> catalog.clear() # doctest: +ELLIPSIS
catalog <...Catalog...> had all relations unindexed
>>> len(catalog)
0
>>> sorted(catalog.findValues(
... 'context',
... (query(predicate=SELLS, object=doughnuts))),
... key=lambda ob: ob.name)
[]
The ``copy`` Method
-------------------
Sometimes you may want to copy a relation catalog. One way of doing this is
to create a new catalog, set it up like the current one, and then reindex
all the same relations. This is unnecessarily slow for programmer and
computer. The ``copy`` method makes a new catalog with the same corpus of
indexed relations by copying internal data structures.
Search indexes are requested to make new copies of themselves for the new
catalog; and listeners are given an opportunity to react as desired to the new
copy, including installing themselves, and/or another object of their choosing
as a listener.
Let's make a copy of a populated catalog with a search index and a listener.
Notice in our listener that ``sourceCopied`` adds itself as a listener to the
new copy. This is done at the very end of the ``copy`` process.
>>> for r in (rel1, rel2, rel3, rel4, rel5):
... catalog.index(r)
... # doctest: +ELLIPSIS
a relation ... was added...
a relation ... was added...
a relation ... was added...
a relation ... was added...
a relation ... was added...
>>> BEGAT = 'BEGAT'
>>> rel6 = root['rel6'] = Relation((jack, ann), BEGAT, (sara,))
>>> henry = root['henry'] = Demo('henry')
>>> rel7 = root['rel7'] = Relation((sara, joe), BEGAT, (henry,))
>>> catalog.index(rel6) # doctest: +ELLIPSIS
a relation (token ...) was added to <...Catalog...> with these values:
{'context': None,
'object': [...],
'predicate': ['BEGAT'],
'subject': [..., ...]}
>>> catalog.index(rel7) # doctest: +ELLIPSIS
a relation (token ...) was added to <...Catalog...> with these values:
{'context': None,
'object': [...],
'predicate': ['BEGAT'],
'subject': [..., ...]}
>>> catalog.addDefaultQueryFactory(
... zc.relation.queryfactory.TransposingTransitive(
... 'subject', 'object', {'predicate': BEGAT}))
...
>>> list(catalog.findValues(
... 'object', query(subject=jack, predicate=BEGAT)))
[<Demo instance 'sara'>, <Demo instance 'henry'>]
>>> catalog.addSearchIndex(
... zc.relation.searchindex.TransposingTransitiveMembership(
... 'subject', 'object', static={'predicate': BEGAT}))
>>> sorted(
... catalog.findValues(
... 'object', query(subject=jack, predicate=BEGAT)),
... key=lambda o: o.name)
[<Demo instance 'henry'>, <Demo instance 'sara'>]
>>> newcat = root['newcat'] = catalog.copy() # doctest: +ELLIPSIS
catalog <...Catalog...> made a copy <...Catalog...>
now listening to catalog <...Catalog...>
>>> transaction.commit()
Now the copy has its own copies of internal data structures and of the
searchindex. For example, let's modify the relations and add a new one to the
copy.
>>> mary = root['mary'] = Demo('mary')
>>> buffy = root['buffy'] = Demo('buffy')
>>> zack = root['zack'] = Demo('zack')
>>> rel7.objects += (mary,)
>>> rel8 = root['rel8'] = Relation((henry, buffy), BEGAT, (zack,))
>>> newcat.index(rel7) # doctest: +ELLIPSIS
a relation (token ...) in ...Catalog... was modified with these additions:
{'object': [...]}
and these removals:
{}
>>> newcat.index(rel8) # doctest: +ELLIPSIS
a relation (token ...) was added to ...Catalog... with these values:
{'context': None,
'object': [...],
'predicate': ['BEGAT'],
'subject': [..., ...]}
>>> len(newcat)
8
>>> sorted(
... newcat.findValues(
... 'object', query(subject=jack, predicate=BEGAT)),
... key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE
[<Demo instance 'henry'>, <Demo instance 'mary'>, <Demo instance 'sara'>,
<Demo instance 'zack'>]
>>> sorted(
... newcat.findValues(
... 'object', query(subject=sara)),
... key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE
[<Demo instance 'bistro'>, <Demo instance 'cookies'>,
<Demo instance 'doughnuts'>, <Demo instance 'henry'>,
<Demo instance 'mary'>, <Demo instance 'muffins'>]
The original catalog is not modified.
>>> len(catalog)
7
>>> sorted(
... catalog.findValues(
... 'object', query(subject=jack, predicate=BEGAT)),
... key=lambda o: o.name)
[<Demo instance 'henry'>, <Demo instance 'sara'>]
>>> sorted(
... catalog.findValues(
... 'object', query(subject=sara)),
... key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE
[<Demo instance 'bistro'>, <Demo instance 'cookies'>,
<Demo instance 'doughnuts'>, <Demo instance 'henry'>,
<Demo instance 'muffins'>]
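The independence of the copy follows from duplicating the data structures
themselves rather than sharing them. A minimal standalone sketch of that
idea, with a hypothetical ``MiniCatalog`` class that indexes only subjects:

```python
class MiniCatalog:
    """Toy catalog holding a token set and one inverted index."""

    def __init__(self):
        self.tokens = set()
        self.by_subject = {}

    def index(self, token, subjects):
        self.tokens.add(token)
        for subject in subjects:
            self.by_subject.setdefault(subject, set()).add(token)

    def copy(self):
        # Duplicate the containers directly instead of reindexing relations.
        duplicate = MiniCatalog()
        duplicate.tokens = set(self.tokens)
        duplicate.by_subject = {
            subject: set(tokens)
            for subject, tokens in self.by_subject.items()}
        return duplicate


original = MiniCatalog()
original.index('rel6', ['jack', 'ann'])
original.index('rel7', ['sara', 'joe'])

clone = original.copy()
clone.index('rel8', ['henry', 'buffy'])  # only the copy sees this
```

Copying the inner sets matters here: sharing them between the two catalogs
would let a change in one leak into the other.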
The ``ignoreSearchIndex`` argument
----------------------------------
The five methods that can use search indexes, ``findValues``,
``findValueTokens``, ``findRelations``, ``findRelationTokens``, and
``canFind``, can be explicitly requested to ignore any pertinent search index
using the ``ignoreSearchIndex`` argument.
We can see this easily with the token-related methods: the search index result
will be a BTree set, while without the search index the result will be a
generator.
>>> res1 = newcat.findValueTokens(
... 'object', query(subject=jack, predicate=BEGAT))
>>> res1 # doctest: +ELLIPSIS
LFSet([..., ..., ..., ...])
>>> res2 = newcat.findValueTokens(
... 'object', query(subject=jack, predicate=BEGAT),
... ignoreSearchIndex=True)
>>> res2 # doctest: +ELLIPSIS
<generator object ... at 0x...>
>>> sorted(res2) == list(res1)
True
>>> res1 = newcat.findRelationTokens(
... query(subject=jack, predicate=BEGAT))
>>> res1 # doctest: +ELLIPSIS
LFSet([..., ..., ...])
>>> res2 = newcat.findRelationTokens(
... query(subject=jack, predicate=BEGAT), ignoreSearchIndex=True)
>>> res2 # doctest: +ELLIPSIS
<generator object ... at 0x...>
>>> sorted(res2) == list(res1)
True
We can see that the other methods take the argument, but the results look the
same as usual.
>>> res = newcat.findValues(
... 'object', query(subject=jack, predicate=BEGAT),
... ignoreSearchIndex=True)
>>> res # doctest: +ELLIPSIS
<generator object ... at 0x...>
>>> list(res) == list(newcat.resolveValueTokens(newcat.findValueTokens(
... 'object', query(subject=jack, predicate=BEGAT),
... ignoreSearchIndex=True), 'object'))
True
>>> res = newcat.findRelations(
... query(subject=jack, predicate=BEGAT),
... ignoreSearchIndex=True)
>>> res # doctest: +ELLIPSIS
<generator object ... at 0x...>
>>> list(res) == list(newcat.resolveRelationTokens(
... newcat.findRelationTokens(
... query(subject=jack, predicate=BEGAT),
... ignoreSearchIndex=True)))
True
>>> newcat.canFind(
... query(subject=jack, predicate=BEGAT), ignoreSearchIndex=True)
True
``findRelationTokens()``
------------------------
If you call ``findRelationTokens`` without any arguments, you will get the
BTree set of all relation tokens in the catalog. This can be handy for tests
and for advanced uses of the catalog.
>>> newcat.findRelationTokens() # doctest: +ELLIPSIS
<BTrees.LFBTree.LFTreeSet object at ...>
>>> len(newcat.findRelationTokens())
8
>>> set(newcat.resolveRelationTokens(newcat.findRelationTokens())) == set(
... (rel1, rel2, rel3, rel4, rel5, rel6, rel7, rel8))
True
``findValueTokens(INDEX_NAME)``
-------------------------------
If you call ``findValueTokens`` with only an index name, you will get the BTree
structure of all tokens for that value in the index. This can be handy for
tests and for advanced uses of the catalog.
>>> newcat.findValueTokens('predicate') # doctest: +ELLIPSIS
<BTrees.OOBTree.OOBTree object at ...>
>>> list(newcat.findValueTokens('predicate'))
['BEGAT', 'BUYS', 'OBSERVES', 'SELLS']
Conclusion
==========
Review
------
That brings us to the end of our introductory examples. Let's review, and
then look at where you can go from here.
* Relations are objects with indexed values.
* The relation catalog indexes relations. The relations can be one-way,
two-way, three-way, or N-way, as long as you tell the catalog to index the
different values.
* Creating a catalog:
- Relations and their values are stored in the catalog as tokens: unique
identifiers that you can resolve back to the original value. Integers are
the most efficient tokens, but others can work fine too.
- Token type determines the BTree module needed.
- If the tokens are 32-bit ints, choose ``BTrees.family32.II``,
``BTrees.family32.IF`` or ``BTrees.family32.IO``.
- If the tokens are 64-bit ints, choose ``BTrees.family64.II``,
``BTrees.family64.IF`` or ``BTrees.family64.IO``.
- If they are anything else, choose ``BTrees.family32.OI``,
``BTrees.family64.OI``, or ``BTrees.family32.OO`` (or
``BTrees.family64.OO``--they are the same).
Within these rules, the choice is somewhat arbitrary unless you plan to
merge these results with that of another source that is using a
particular BTree module. BTree set operations only work within the same
module, so you must match module to module.
- The ``family`` argument in instantiating the catalog lets you change the
default BTree family for relations and value indexes from
``BTrees.family32.IF`` to ``BTrees.family64.IF``.
- You must define your own functions for tokenizing and resolving tokens.
These functions are registered with the catalog for the relations and for
each of their value indexes.
- You add value indexes to relation catalogs to be able to search. Values
can be identified to the catalog with callables or interface elements.
- Using interface attributes will cause an attempt to adapt the
relation if it does not already provide the interface.
- We can use the ``multiple`` argument when defining a value index to
indicate that the indexed value is a collection. This defaults to
False.
- We can use the ``name`` argument when defining a value index to
specify the name to be used in queries, rather than relying on the
name of the interface attribute or callable.
- You can set up search indexes to speed up specific searches, usually
transitive.
- Listeners can be registered in the catalog. They are alerted when a
relation is added, modified, or removed; and when the catalog is cleared
or copied.
* Catalog Management:
- Relations are indexed with ``index(relation)``, and removed from the
catalog with ``unindex(relation)``. ``index_doc(relation_token,
relation)`` and ``unindex_doc(relation_token)`` also work.
- The ``clear`` method clears the relations in the catalog.
- The ``copy`` method makes a copy of the current catalog by copying
internal data structures, rather than reindexing the relations, which can
be a significant optimization opportunity. This copies value indexes and
search indexes; and gives listeners an opportunity to specify what, if
anything, should be included in the new copy.
* Searching a catalog:
- Queries to the relation catalog are formed with dicts.
- Query keys are the names of the indexes you want to search, or, for the
special case of precise relations, the ``zc.relation.RELATION`` constant.
- Query values are the tokens of the results you want to match; or
``None``, indicating relations that have ``None`` as a value (or an empty
collection, if it is a multiple). Search values can use
``zc.relation.catalog.any(args)`` or ``zc.relation.catalog.Any(args)`` to
specify multiple (non-``None``) results to match for a given key.
- The index has a variety of methods to help you work with tokens.
``tokenizeQuery`` is typically the most used, though others are
available.
- To find relations that match a query, use ``findRelations`` or
``findRelationTokens``. Calling ``findRelationTokens`` without any
arguments returns the BTree set of all relation tokens in the catalog.
- To find values that match a query, use ``findValues`` or
``findValueTokens``. Calling ``findValueTokens`` with only the name
of a value index returns the BTree set of all tokens in the catalog for
that value index.
- You search transitively by using a query factory. The
``zc.relation.queryfactory.TransposingTransitive`` is a good common case
factory that lets you walk up and down a hierarchy. A query factory can
be passed in as an argument to search methods as a ``queryFactory``, or
installed as a default behavior using ``addDefaultQueryFactory``.
- To find how a query is related, use ``findRelationChains`` or
``findRelationTokenChains``.
- To find out if a query is related, use ``canFind``.
- Circular transitive relations are handled to prevent infinite loops. They
are identified in ``findRelationChains`` and ``findRelationTokenChains``
with a ``zc.relation.interfaces.ICircularRelationPath`` marker interface.
- Search methods share the following arguments:
* ``maxDepth``, limiting the transitive depth for searches;
* ``filter``, allowing code to filter transitive paths;
* ``targetQuery``, allowing a query to filter transitive paths on the
basis of the endpoint;
* ``targetFilter``, allowing code to filter transitive paths on the basis
of the endpoint; and
* ``queryFactory``, mentioned above.
In addition, the ``ignoreSearchIndex`` argument to ``findRelations``,
``findRelationTokens``, ``findValues``, ``findValueTokens``, and
``canFind`` causes the search to ignore search indexes, even if there is
an appropriate one.
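To make the shape of queries in the review above a bit more concrete, here is a
minimal sketch that uses only the standard library. It is a toy model, not the
zc.relation implementation: ``ToyAny``, ``toy_any``,
``toy_find_relation_tokens`` and the dict-of-sets index layout are hypothetical
names invented for this illustration, and the sample data is made up.

```python
# Toy model only: none of these names are part of zc.relation.

class ToyAny(tuple):
    """Marks a query value that matches any one of several tokens."""


def toy_any(*tokens):
    return ToyAny(tokens)


def toy_find_relation_tokens(indexes, all_tokens, query=None):
    """Return the set of relation tokens that match every key in the query.

    indexes maps index name -> {value token: set of relation tokens}.
    An empty or missing query returns every relation token.
    """
    result = set(all_tokens)
    for name, wanted in (query or {}).items():
        values = wanted if isinstance(wanted, ToyAny) else (wanted,)
        matched = set()
        for value in values:
            matched |= indexes[name].get(value, set())
        result &= matched  # every query key must match
    return result


# Relations 1..4, indexed by 'subject' and 'predicate' (invented data).
indexes = {
    'subject': {'alice': {1, 2}, 'bob': {3}, 'carol': {4}},
    'predicate': {'BUYS': {1, 3}, 'SELLS': {2, 4}},
}
all_tokens = {1, 2, 3, 4}

everything = toy_find_relation_tokens(indexes, all_tokens)
alice_buys = toy_find_relation_tokens(
    indexes, all_tokens, {'subject': 'alice', 'predicate': 'BUYS'})
alice_or_bob = toy_find_relation_tokens(
    indexes, all_tokens, {'subject': toy_any('alice', 'bob')})
```

The point of the sketch is only that a query is a dict of index names to
tokens, that all keys must match, and that an "any" value widens a single key.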
Next Steps
----------
If you want to read more, next steps depend on how you like to learn. Here
are some of the other documents in the zc.relation package.
:optimization.rst:
Best practices for optimizing your use of the relation catalog.
:searchindex.rst:
Query factories and search indexes: from basics to nitty gritty details.
:tokens.rst:
This document explores the details of tokens. All God's chillun
love tokens, at least if God's chillun are writing non-toy apps
using zc.relation. It includes discussion of the token helpers that
the catalog provides, how to use zope.app.intid-like registries with
zc.relation, how to use tokens to "join" query results reasonably
efficiently, and how to index joins. It also is unnecessarily
mind-blowing because of the examples used.
:interfaces.py:
The contract, for nuts and bolts.
Finally, the truly die-hard might also be interested in the timeit
directory, which holds scripts used to test assumptions and learn.
.. ......... ..
.. FOOTNOTES ..
.. ......... ..
.. [#I_care] OK, you care about how that query factory worked, so
we will look into it a bit. Let's talk through two steps of the
transitive search in the second question. The catalog first
performs the intransitive search requested: find relations
for which Betty is the supervisor. That's Diane and Edgar.
Now, for each of the results, the catalog asks the query factory for
next steps. Let's take Diane. The catalog says to the factory,
"Given this query for relations where Betty is supervisor, I got
this result of Diane. Do you have any other queries I should try to
look further?". The factory also gets the catalog instance so it
can use it to answer the question if it needs to.
OK, the next part is where your brain hurts. Hang on.
In our case, the factory sees that the query was for supervisor. Its
other key, the one it transposes with, is ``zc.relation.RELATION``. *The
factory gets the transposing key's result for the current token.* So, for
us, a key of ``zc.relation.RELATION`` is actually a no-op: the result *is*
the current token, Diane. Then, the factory has its answer: replace the old
value of supervisor in the query, Betty, with the result, Diane. The next
transitive query should be ``{'supervisor': 'Diane'}``. Ta-da.
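If it helps, the step just described can be sketched with plain Python. This is
a toy under stated assumptions, not the code in ``zc.relation.queryfactory``:
``toy_transposing_factory``, ``get_values`` and the ``supervisors`` data
(including Alice as Betty's supervisor) are hypothetical and exist only for
this illustration.

```python
# Toy model only: these names are illustrative, not zc.relation's API.
RELATION = None  # stands in for zc.relation.RELATION


def toy_transposing_factory(key_a, key_b):
    """Build a function that proposes the next query in a transitive walk."""
    def next_queries(query, relation_token, get_values):
        if len(query) != 1:
            return
        (static,) = query  # the single key used in the query
        if static == key_a:
            dynamic = key_b
        elif static == key_b:
            dynamic = key_a
        else:
            return
        if dynamic is RELATION:
            # Transposing with RELATION is a no-op: the value is the token.
            values = [relation_token]
        else:
            values = get_values(dynamic, relation_token)
        for value in values:
            # Replace the old value in the query with the transposed one.
            yield {static: value}
    return next_queries


# Each relation token is an employee; its 'supervisor' value is the boss.
supervisors = {'Betty': 'Alice', 'Diane': 'Betty', 'Edgar': 'Betty'}


def get_values(name, relation_token):
    assert name == 'supervisor'
    return [supervisors[relation_token]]


step = toy_transposing_factory('supervisor', RELATION)
# Walking down: the query was for Betty's reports and Diane was a result.
down = list(step({'supervisor': 'Betty'}, 'Diane', get_values))
# Walking up: the query was for the Diane relation itself.
up = list(step({RELATION: 'Diane'}, 'Diane', get_values))
```

Here ``down`` holds the follow-up query for Diane's own reports, and ``up``
holds the follow-up query for Diane's supervisor.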
======================================================
Tokens and Joins: zc.relation Catalog Extended Example
======================================================
Introduction and Set Up
=======================
This document assumes you have read the introductory README.rst and want
to learn a bit more by example. In it, we will explore a more
complicated set of relations that demonstrates most of the aspects of
working with tokens. In particular, we will look at joins, which will
also give us a chance to look more in depth at query factories and
search indexes, and introduce the idea of listeners. It will not explain
the basics that the README already addressed.
Imagine we are indexing security assertions in a system. In this
system, users may have roles within an organization. Each organization
may have multiple child organizations and may have a single parent
organization. A user with a role in a parent organization will have the
same role in all transitively connected child organizations.
We have two kinds of relations, then. One kind of relation will model
the hierarchy of organizations. We'll do it with an intrinsic relation
of organizations to their children: that reflects the fact that parent
organizations choose and are comprised of their children; children do
not choose their parents.
The other relation will model the (multiple) roles a (single) user has
in a (single) organization. This relation will be entirely extrinsic.
We could create two catalogs, one for each type. Or we could put them
both in the same catalog. Initially, we'll go with the single-catalog
approach for our examples. This single catalog, then, will be indexing
a heterogeneous collection of relations.
Let's define the two relations with interfaces. We'll include one
accessor, getOrganization, largely to show how to handle methods.
>>> import zope.interface
>>> class IOrganization(zope.interface.Interface):
... title = zope.interface.Attribute('the title')
... parts = zope.interface.Attribute(
... 'the organizations that make up this one')
...
>>> class IRoles(zope.interface.Interface):
... def getOrganization():
... 'return the organization in which this relation operates'
... principal_id = zope.interface.Attribute(
... 'the principal id whose roles this relation lists')
... role_ids = zope.interface.Attribute(
... 'the role ids that the principal explicitly has in the '
... 'organization. The principal may have other roles via '
... 'roles in parent organizations.')
...
Now we can create some classes. In the README example, the setup was a bit
of a toy. This time we will be just a bit more practical. We'll also expect
to be operating within the ZODB, with a root and transactions. [#ZODB]_
.. [#ZODB] Here we will set up a ZODB instance for us to use.
>>> from ZODB.tests.util import DB
>>> db = DB()
>>> conn = db.open()
>>> root = conn.root()
Here's how we will dump and load our relations: use a "registry"
object, similar to an intid utility. [#faux_intid]_
.. [#faux_intid] Here's a simple persistent keyreference. Notice that it is
not persistent itself: this is important for conflict resolution to be
able to work (which we don't show here, but we're trying to lean more
towards real usage for this example).
>>> from functools import total_ordering
>>> @total_ordering
... class Reference(object): # see zope.app.keyreference
... def __init__(self, obj):
... self.object = obj
... def _get_sorting_key(self):
... # this doesn't work during conflict resolution. See
... # zope.app.keyreference.persistent, 3.5 release, for current
... # best practice.
... if self.object._p_jar is None:
... raise ValueError(
... 'can only compare when both objects have connections')
... return self.object._p_oid or ''
... def __lt__(self, other):
... # this doesn't work during conflict resolution. See
... # zope.app.keyreference.persistent, 3.5 release, for current
... # best practice.
... if not isinstance(other, Reference):
... raise ValueError('can only compare with Reference objects')
... return self._get_sorting_key() < other._get_sorting_key()
... def __eq__(self, other):
... # this doesn't work during conflict resolution. See
... # zope.app.keyreference.persistent, 3.5 release, for current
... # best practice.
... if not isinstance(other, Reference):
... raise ValueError('can only compare with Reference objects')
... return self._get_sorting_key() == other._get_sorting_key()
Here's a simple integer identifier tool.
>>> import persistent
>>> import BTrees
>>> class Registry(persistent.Persistent): # see zope.app.intid
... def __init__(self, family=BTrees.family32):
... self.family = family
... self.ids = self.family.IO.BTree()
... self.refs = self.family.OI.BTree()
... def getId(self, obj):
... if not isinstance(obj, persistent.Persistent):
... raise ValueError('not a persistent object', obj)
... if obj._p_jar is None:
... self._p_jar.add(obj)
... ref = Reference(obj)
... id = self.refs.get(ref)
... if id is None:
... # naive for conflict resolution; see zope.app.intid
... if self.ids:
... id = self.ids.maxKey() + 1
... else:
... id = self.family.minint
... self.ids[id] = ref
... self.refs[ref] = id
... return id
... def __contains__(self, obj):
... if (not isinstance(obj, persistent.Persistent) or
... obj._p_oid is None):
... return False
... return Reference(obj) in self.refs
... def getObject(self, id, default=None):
... res = self.ids.get(id, None)
... if res is None:
... return default
... else:
... return res.object
... def remove(self, r):
... if isinstance(r, int):
... self.refs.pop(self.ids.pop(r))
... elif (not isinstance(r, persistent.Persistent) or
... r._p_oid is None):
... raise LookupError(r)
... else:
... self.ids.pop(self.refs.pop(Reference(r)))
...
>>> registry = root['registry'] = Registry()
>>> import transaction
>>> transaction.commit()
In this implementation of the "dump" method, we use the cache just to
show you how you might use it. It probably is overkill for this job,
and maybe even a speed loss, but you can see the idea.
>>> def dump(obj, catalog, cache):
... reg = cache.get('registry')
... if reg is None:
... reg = cache['registry'] = catalog._p_jar.root()['registry']
... return reg.getId(obj)
...
>>> def load(token, catalog, cache):
... reg = cache.get('registry')
... if reg is None:
... reg = cache['registry'] = catalog._p_jar.root()['registry']
... return reg.getObject(token)
...
Now we can create a relation catalog to hold these items.
>>> import zc.relation.catalog
>>> catalog = root['catalog'] = zc.relation.catalog.Catalog(dump, load)
>>> transaction.commit()
Now we set up our indexes. We'll start with just the organizations, and
set up the catalog with them. This part will be similar to the example
in README.rst, but will introduce more discussions of optimizations and
tokens. Then we'll add in the part about roles, and explore queries and
token-based "joins".
Organizations
=============
The organization will hold a set of organizations. This is actually not
inherently easy in the ZODB because this means that we need to compare
or hash persistent objects, which does not work reliably over time and
across machines out-of-the-box. To side-step the issue for this example,
and still do something a bit interesting and real-world, we'll use the
registry tokens introduced above. This will also give us a chance to
talk a bit more about optimizations and tokens. (If you would like
to sanely and transparently hold a set of persistent objects, try the
zc.set package XXX not yet.)
>>> import BTrees
>>> import persistent
>>> @zope.interface.implementer(IOrganization)
... @total_ordering
... class Organization(persistent.Persistent):
...
... def __init__(self, title):
... self.title = title
... self.parts = BTrees.family32.IF.TreeSet()
... # the next parts just make the tests prettier
... def __repr__(self):
... return '<Organization instance "' + self.title + '">'
... def __lt__(self, other):
... # raises AttributeError if other doesn't have a title
... return self.title < other.title
... def __eq__(self, other):
... return self is other
... def __hash__(self):
... return 1 # dummy
...
OK, now we know how organizations will work. Now we can add the `parts`
index to the catalog. This will do a few new things from how we added
indexes in the README.
>>> catalog.addValueIndex(IOrganization['parts'], multiple=True,
... name="part")
So, what's different from the README examples?
First, we are using an interface element to define the value to be indexed.
It provides an interface to which objects will be adapted, a default name
for the index, and information as to whether the attribute should be used
directly or called.
Second, we are not specifying a dump or load. They are None. This
means that the indexed value can already be treated as a token. This
can allow a very significant optimization for reindexing if the indexed
value is a large collection using the same BTree family as the
index--which leads us to the next difference.
Third, we are specifying that `multiple=True`. This means that the value
on a given relation that provides or can be adapted to IOrganization will
have a collection of `parts`. These will always be regarded as a set,
whether the actual collection is a BTrees set or the keys of a BTree.
Last, we are specifying a name to be used for queries. I find that queries
read more easily when the query keys are singular, so I often rename plurals.
As in the README, we can add another simple transposing transitive query
factory, switching between 'part' and `None`.
>>> import zc.relation.queryfactory
>>> factory1 = zc.relation.queryfactory.TransposingTransitive(
... 'part', None)
>>> catalog.addDefaultQueryFactory(factory1)
Let's add a couple of search indexes in too, of the hierarchy looking up...
>>> import zc.relation.searchindex
>>> catalog.addSearchIndex(
... zc.relation.searchindex.TransposingTransitiveMembership(
... 'part', None))
...and down.
>>> catalog.addSearchIndex(
... zc.relation.searchindex.TransposingTransitiveMembership(
... None, 'part'))
PLEASE NOTE: the search index looking up is not a good idea practically. The
index is designed for looking down [#verifyObjectTransitive]_.
.. [#verifyObjectTransitive] The TransposingTransitiveMembership indexes
provide ISearchIndex.
>>> from zope.interface.verify import verifyObject
>>> import zc.relation.interfaces
>>> index = list(catalog.iterSearchIndexes())[0]
>>> verifyObject(zc.relation.interfaces.ISearchIndex, index)
True
Let's create and add a few organizations.
We'll make a structure like this [#silliness]_::
Ynod Corp Management Zookd Corp Management
/ | \ / | \
Ynod Devs Ynod SAs Ynod Admins Zookd Admins Zookd SAs Zookd Devs
/ \ \ / / \
Y3L4 Proj Bet Proj Ynod Zookd Task Force Zookd hOgnmd Zookd Nbd
Here's the Python.
>>> orgs = root['organizations'] = BTrees.family32.OO.BTree()
>>> for nm, parts in (
... ('Y3L4 Proj', ()),
... ('Bet Proj', ()),
... ('Ynod Zookd Task Force', ()),
... ('Zookd hOgnmd', ()),
... ('Zookd Nbd', ()),
... ('Ynod Devs', ('Y3L4 Proj', 'Bet Proj')),
... ('Ynod SAs', ()),
... ('Ynod Admins', ('Ynod Zookd Task Force',)),
... ('Zookd Admins', ('Ynod Zookd Task Force',)),
... ('Zookd SAs', ()),
... ('Zookd Devs', ('Zookd hOgnmd', 'Zookd Nbd')),
... ('Ynod Corp Management', ('Ynod Devs', 'Ynod SAs', 'Ynod Admins')),
... ('Zookd Corp Management', ('Zookd Devs', 'Zookd SAs',
... 'Zookd Admins'))):
... org = Organization(nm)
... for part in parts:
... ignore = org.parts.insert(registry.getId(orgs[part]))
... orgs[nm] = org
... catalog.index(org)
...
Now the catalog knows about the relations.
>>> len(catalog)
13
>>> root['dummy'] = Organization('Foo')
>>> root['dummy'] in catalog
False
>>> orgs['Y3L4 Proj'] in catalog
True
Also, now we can search. To do this, we can use some of the token methods that
the catalog provides. The most commonly used is `tokenizeQuery`. It takes a
query with values that are not tokenized and converts them to values that are
tokenized.
>>> Ynod_SAs_id = registry.getId(orgs['Ynod SAs'])
>>> catalog.tokenizeQuery({None: orgs['Ynod SAs']}) == {
... None: Ynod_SAs_id}
True
>>> Zookd_SAs_id = registry.getId(orgs['Zookd SAs'])
>>> Zookd_Devs_id = registry.getId(orgs['Zookd Devs'])
>>> catalog.tokenizeQuery(
... {None: zc.relation.catalog.any(
... orgs['Zookd SAs'], orgs['Zookd Devs'])}) == {
... None: zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}
True
Of course, right now doing this with 'part' alone is kind of silly, since it
does not change within the relation catalog (because we said that dump and
load were `None`, as discussed above).
>>> catalog.tokenizeQuery({'part': Ynod_SAs_id}) == {
... 'part': Ynod_SAs_id}
True
>>> catalog.tokenizeQuery(
... {'part': zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}
... ) == {'part': zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}
True
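For readers who want a mental model of what ``tokenizeQuery`` is doing, here is
a small standard-library sketch. It is only an approximation under stated
assumptions: ``ToyAny``, ``toy_tokenize_query``, the ``dumps`` mapping and the
integer ids are all hypothetical, and the real catalog handles more cases (such
as ``None`` values) than this does.

```python
# Toy model only: hypothetical names, not zc.relation's implementation.

class ToyAny(tuple):
    """Query value matching any of several values."""


def toy_tokenize_query(query, dumps):
    """Convert each query value to a token with that key's dump function.

    dumps maps a query key to a dump callable, or to None when the
    values for that key are already tokens.
    """
    tokenized = {}
    for key, value in query.items():
        dump = dumps[key]
        if dump is None:
            tokenized[key] = value  # already a token: pass through
        elif isinstance(value, ToyAny):
            tokenized[key] = ToyAny(dump(v) for v in value)
        else:
            tokenized[key] = dump(value)
    return tokenized


# Invented ids; in the document they come from the registry.
ids = {'Ynod SAs': 7, 'Zookd SAs': 10, 'Zookd Devs': 11}
# The relation key (None) has a dump function; 'part' has none.
dumps = {None: ids.__getitem__, 'part': None}

single = toy_tokenize_query({None: 'Ynod SAs'}, dumps)
several = toy_tokenize_query(
    {None: ToyAny(('Zookd SAs', 'Zookd Devs'))}, dumps)
passthrough = toy_tokenize_query({'part': 7}, dumps)
```

The ``passthrough`` case mirrors the "kind of silly" situation above: with no
dump function registered, the value comes back unchanged.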
The `tokenizeQuery` method is so common that we're going to assign it to
a variable in our example. Then we'll do a search or two.
So...find the organizations that make up Ynod Devs, along with Ynod Devs itself.
>>> t = catalog.tokenizeQuery
>>> res = list(catalog.findRelationTokens(t({None: orgs['Ynod Devs']})))
OK...we used `findRelationTokens`, as opposed to `findRelations`, so res
is a couple of numbers now. How do we convert them back?
`resolveRelationTokens` will do the trick.
>>> len(res)
3
>>> sorted(catalog.resolveRelationTokens(res))
... # doctest: +NORMALIZE_WHITESPACE
[<Organization instance "Bet Proj">, <Organization instance "Y3L4 Proj">,
<Organization instance "Ynod Devs">]
`resolveQuery` is the mirror image of `tokenizeQuery`: it converts
tokenized queries to queries with "loaded" values.
>>> original = {'part': zc.relation.catalog.any(
... Zookd_SAs_id, Zookd_Devs_id),
... None: orgs['Zookd Devs']}
>>> tokenized = catalog.tokenizeQuery(original)
>>> original == catalog.resolveQuery(tokenized)
True
>>> original = {None: zc.relation.catalog.any(
... orgs['Zookd SAs'], orgs['Zookd Devs']),
... 'part': Zookd_Devs_id}
>>> tokenized = catalog.tokenizeQuery(original)
>>> original == catalog.resolveQuery(tokenized)
True
Likewise, `tokenizeRelations` is the mirror image of `resolveRelationTokens`.
>>> sorted(catalog.tokenizeRelations(
... [orgs["Bet Proj"], orgs["Y3L4 Proj"]])) == sorted(
... registry.getId(o) for o in
... [orgs["Bet Proj"], orgs["Y3L4 Proj"]])
True
The other token-related methods are as follows
[#show_remaining_token_methods]_:
.. [#show_remaining_token_methods] For what it's worth, here are some small
examples of the remaining token-related methods.
These two are the singular versions of `tokenizeRelations` and
`resolveRelationTokens`.
`tokenizeRelation` returns a token for the given relation.
>>> catalog.tokenizeRelation(orgs['Zookd Corp Management']) == (
... registry.getId(orgs['Zookd Corp Management']))
True
`resolveRelationToken` returns a relation for the given token.
>>> catalog.resolveRelationToken(registry.getId(
... orgs['Zookd Corp Management'])) is orgs['Zookd Corp Management']
True
The "values" ones are a bit lame to show now, since the only value
we have right now is not tokenized but used straight up. But here
goes, showing some fascinating no-ops.
`tokenizeValues` returns an iterable of tokens for the values of
the given index name.
>>> list(catalog.tokenizeValues((1,2,3), 'part'))
[1, 2, 3]
`resolveValueTokens` returns an iterable of values for the tokens of
the given index name.
>>> list(catalog.resolveValueTokens((1,2,3), 'part'))
[1, 2, 3]
- `tokenizeValues`, which returns an iterable of tokens for the values
of the given index name;
- `resolveValueTokens`, which returns an iterable of values for the tokens of
the given index name;
- `tokenizeRelation`, which returns a token for the given relation; and
- `resolveRelationToken`, which returns a relation for the given token.
Why do we bother with these tokens, instead of hiding them away and
making the API prettier? By exposing them, we enable efficient joining,
and efficient use in other contexts. For instance, if you use the same
intid utility to tokenize in other catalogs, our results can be merged
with the results of other catalogs. Similarly, you can use the results
of queries to other catalogs--or even "joins" from earlier results of
querying this catalog--as query values here. We'll explore this in the
next section.
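As a rough picture of why shared tokens matter, consider this sketch. It uses
plain Python sets in place of BTree sets, and everything in it
(``ToyRegistry``, the document names, the two "catalogs") is hypothetical and
invented for the illustration.

```python
# Toy model only: plain sets stand in for BTree sets of integer tokens.

class ToyRegistry:
    """Hands out one stable integer token per object."""

    def __init__(self):
        self._ids = {}
        self._objects = {}

    def get_id(self, obj):
        if obj not in self._ids:
            token = len(self._ids) + 1
            self._ids[obj] = token
            self._objects[token] = obj
        return self._ids[obj]

    def get_object(self, token):
        return self._objects[token]


registry = ToyRegistry()
docs = ['intro', 'howto', 'faq', 'changelog']
tokens = {name: registry.get_id(name) for name in docs}

# Two independent "catalogs" that tokenize with the same registry.
tagged_python = {tokens['intro'], tokens['howto'], tokens['faq']}
modified_today = {tokens['howto'], tokens['changelog']}

# Because the tokens agree, the merge is a cheap set intersection; only
# the final answer needs to be resolved back to objects.
both = tagged_python & modified_today
resolved = sorted(registry.get_object(t) for t in both)
```

If the two sources had used different tokens for the same object, the
intersection would be meaningless, which is why the tokens are exposed rather
than hidden.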
Roles
=====
We have set up the Organization relations. Now let's set up the roles, and
actually be able to answer the questions that we described at the beginning
of the document.
In our Roles object, roles and principals will simply be strings--ids, if
this were a real system. The organization will be a direct object reference.
>>> @zope.interface.implementer(IRoles)
... @total_ordering
... class Roles(persistent.Persistent):
...
... def __init__(self, principal_id, role_ids, organization):
... self.principal_id = principal_id
... self.role_ids = BTrees.family32.OI.TreeSet(role_ids)
... self._organization = organization
... def getOrganization(self):
... return self._organization
... # the rest is for prettier/easier tests
... def __repr__(self):
... return "<Roles instance (%s has %s in %s)>" % (
... self.principal_id, ', '.join(self.role_ids),
... self._organization.title)
... def __lt__(self, other):
... _self = (
... self.principal_id,
... tuple(self.role_ids),
... self._organization.title,
... )
... _other = (
... other.principal_id,
... tuple(other.role_ids),
... other._organization.title,
... )
... return _self < _other
... def __eq__(self, other):
... return self is other
... def __hash__(self):
... return 1 # dummy
...
Now let's add the value indexes to the relation catalog.
>>> catalog.addValueIndex(IRoles['principal_id'], btree=BTrees.family32.OI)
>>> catalog.addValueIndex(IRoles['role_ids'], btree=BTrees.family32.OI,
... multiple=True, name='role_id')
>>> catalog.addValueIndex(IRoles['getOrganization'], dump, load,
... name='organization')
Those are some slightly new variations of what we've seen in `addValueIndex`
before, but all mixing and matching on the same ingredients.
As a reminder, here is our organization structure::
Ynod Corp Management Zookd Corp Management
/ | \ / | \
Ynod Devs Ynod SAs Ynod Admins Zookd Admins Zookd SAs Zookd Devs
/ \ \ / / \
Y3L4 Proj Bet Proj Ynod Zookd Task Force Zookd hOgnmd Zookd Nbd
Now let's create and add some roles.
>>> principal_ids = [
... 'abe', 'bran', 'cathy', 'david', 'edgar', 'frank', 'gertrude',
... 'harriet', 'ignas', 'jacob', 'karyn', 'lettie', 'molly', 'nancy',
... 'ophelia', 'pat']
>>> role_ids = ['user manager', 'writer', 'reviewer', 'publisher']
>>> get_role = dict((v[0], v) for v in role_ids).__getitem__
>>> roles = root['roles'] = BTrees.family32.IO.BTree()
>>> next = 0
>>> for prin, org, role_ids in (
... ('abe', orgs['Zookd Corp Management'], 'uwrp'),
... ('bran', orgs['Ynod Corp Management'], 'uwrp'),
... ('cathy', orgs['Ynod Devs'], 'w'),
... ('cathy', orgs['Y3L4 Proj'], 'r'),
... ('david', orgs['Bet Proj'], 'wrp'),
... ('edgar', orgs['Ynod Devs'], 'up'),
... ('frank', orgs['Ynod SAs'], 'uwrp'),
... ('frank', orgs['Ynod Admins'], 'w'),
... ('gertrude', orgs['Ynod Zookd Task Force'], 'uwrp'),
... ('harriet', orgs['Ynod Zookd Task Force'], 'w'),
... ('harriet', orgs['Ynod Admins'], 'r'),
... ('ignas', orgs['Zookd Admins'], 'r'),
... ('ignas', orgs['Zookd Corp Management'], 'w'),
... ('karyn', orgs['Zookd Corp Management'], 'uwrp'),
... ('karyn', orgs['Ynod Corp Management'], 'uwrp'),
... ('lettie', orgs['Zookd Corp Management'], 'u'),
... ('lettie', orgs['Ynod Zookd Task Force'], 'w'),
... ('lettie', orgs['Zookd SAs'], 'w'),
... ('molly', orgs['Zookd SAs'], 'uwrp'),
... ('nancy', orgs['Zookd Devs'], 'wrp'),
... ('nancy', orgs['Zookd hOgnmd'], 'u'),
... ('ophelia', orgs['Zookd Corp Management'], 'w'),
... ('ophelia', orgs['Zookd Devs'], 'r'),
... ('ophelia', orgs['Zookd Nbd'], 'p'),
... ('pat', orgs['Zookd Nbd'], 'wrp')):
... assert prin in principal_ids
... role_ids = [get_role(l) for l in role_ids]
... role = roles[next] = Roles(prin, role_ids, org)
... role.key = next
... next += 1
... catalog.index(role)
...
Now we can begin to do searches [#real_value_tokens]_.
.. [#real_value_tokens] We can also show the value token methods more
sanely now.
>>> original = sorted((orgs['Zookd Devs'], orgs['Ynod SAs']))
>>> tokens = list(catalog.tokenizeValues(original, 'organization'))
>>> original == sorted(catalog.resolveValueTokens(tokens, 'organization'))
True
What are all the role settings for ophelia?
>>> sorted(catalog.findRelations({'principal_id': 'ophelia'}))
... # doctest: +NORMALIZE_WHITESPACE
[<Roles instance (ophelia has publisher in Zookd Nbd)>,
<Roles instance (ophelia has reviewer in Zookd Devs)>,
<Roles instance (ophelia has writer in Zookd Corp Management)>]
That answer does not need to be transitive: we're done.
Next question. Where does ophelia have the 'writer' role?
>>> list(catalog.findValues(
... 'organization', {'principal_id': 'ophelia',
... 'role_id': 'writer'}))
[<Organization instance "Zookd Corp Management">]
Well, that's correct intransitively. Do we need a transitive query
factory? No! This is a great chance to look at the token join we talked
about in the previous section. This should actually be a two-step
operation: find all of the organizations in which ophelia has writer,
and then find all of the transitive parts to that organization.
>>> sorted(catalog.findRelations({None: zc.relation.catalog.Any(
... catalog.findValueTokens('organization',
... {'principal_id': 'ophelia',
... 'role_id': 'writer'}))}))
... # doctest: +NORMALIZE_WHITESPACE
[<Organization instance "Ynod Zookd Task Force">,
<Organization instance "Zookd Admins">,
<Organization instance "Zookd Corp Management">,
<Organization instance "Zookd Devs">,
<Organization instance "Zookd Nbd">,
<Organization instance "Zookd SAs">,
<Organization instance "Zookd hOgnmd">]
That's more like it.
Next question. What users have roles in the 'Zookd Devs' organization?
Intransitively, that's pretty easy.
>>> sorted(catalog.findValueTokens(
... 'principal_id', t({'organization': orgs['Zookd Devs']})))
['nancy', 'ophelia']
Transitively, we should do another join.
>>> org_id = registry.getId(orgs['Zookd Devs'])
>>> sorted(catalog.findValueTokens(
... 'principal_id', {
... 'organization': zc.relation.catalog.any(
... org_id, *catalog.findRelationTokens({'part': org_id}))}))
['abe', 'ignas', 'karyn', 'lettie', 'nancy', 'ophelia']
That's a little awkward, but it does the trick.
Last question, and the kind of question that started the entire example.
What roles does ophelia have in the "Zookd Nbd" organization?
>>> list(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'})))
['publisher']
Intransitively, that's correct. But, transitively, ophelia also has
reviewer and writer, and that's the answer we want to be able to get quickly.
We could ask the question a different way, then, again leveraging a join.
We'll set it up as a function, because we will want to use it a little later
without repeating the code.
>>> def getRolesInOrganization(principal_id, org):
... org_id = registry.getId(org)
... return sorted(catalog.findValueTokens(
... 'role_id', {
... 'organization': zc.relation.catalog.any(
... org_id,
... *catalog.findRelationTokens({'part': org_id})),
... 'principal_id': principal_id}))
...
>>> getRolesInOrganization('ophelia', orgs['Zookd Nbd'])
['publisher', 'reviewer', 'writer']
As you can see, then, working with tokens makes interesting joins possible,
as long as the tokens are the same across the two queries.
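The two-step shape of such a join can be sketched without the catalog at all.
The following is a toy under stated assumptions: ``parts``,
``explicit_roles``, ``containing_orgs`` and ``roles_in_organization`` are
hypothetical names, the integer tokens are invented, and plain dicts and sets
stand in for the catalog's indexes.

```python
# Toy model only: dicts and sets stand in for the catalog's indexes.

# Organization token -> tokens of its direct parts.
parts = {1: {2, 3}, 2: {4}, 3: set(), 4: set()}

# (principal, organization token) -> roles explicitly granted there.
explicit_roles = {
    ('ophelia', 1): {'writer'},
    ('ophelia', 2): {'reviewer'},
    ('ophelia', 4): {'publisher'},
    ('abe', 1): {'publisher', 'writer'},
}


def containing_orgs(org):
    """Step one: tokens of every organization that transitively contains org."""
    found = set()
    pending = [org]
    while pending:
        current = pending.pop()
        for parent, children in parts.items():
            if current in children and parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def roles_in_organization(principal, org):
    """Step two: use step one's tokens as the 'any of these' query value."""
    orgs = {org} | containing_orgs(org)
    roles = set()
    for token in orgs:
        roles |= explicit_roles.get((principal, token), set())
    return sorted(roles)


ophelia = roles_in_organization('ophelia', 4)
abe = roles_in_organization('abe', 4)
```

The join works only because step one returns the same organization tokens that
step two's index is keyed on.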
We have examined token methods and token techniques like joins. The example
story we have told can let us get into a few more advanced topics, such as
query factory joins and search indexes that can increase their read speed.
Query Factory Joins
===================
We can build a query factory that makes the join automatic. A query
factory is a callable that takes two arguments: a query (the one that
starts the search) and the catalog. The factory either returns None,
indicating that the query factory cannot be used for this query, or it
returns another callable that takes a chain of relations. The last
token in the relation chain is the most recent. The output of this
inner callable is expected to be an iterable of
BTrees.family32.OO.Bucket queries to search further from the given chain
of relations.
Here's a flawed approach to this problem.
>>> def flawed_factory(query, catalog):
... if (len(query) == 2 and
... 'organization' in query and
... 'principal_id' in query):
... def getQueries(relchain):
... if not relchain:
... yield query
... return
... current = catalog.getValueTokens(
... 'organization', relchain[-1])
... if current:
... organizations = catalog.getRelationTokens(
... {'part': zc.relation.catalog.Any(current)})
... if organizations:
... res = BTrees.family32.OO.Bucket(query)
... res['organization'] = zc.relation.catalog.Any(
... organizations)
... yield res
... return getQueries
...
That works for our current example.
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'}),
... queryFactory=flawed_factory))
['publisher', 'reviewer', 'writer']
However, it won't work for other similar queries.
>>> getRolesInOrganization('abe', orgs['Zookd Nbd'])
['publisher', 'reviewer', 'user manager', 'writer']
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'}),
... queryFactory=flawed_factory))
[]
Oops.
The flawed_factory is actually a useful pattern for more typical relation
traversal. It goes from relation to relation to relation, and ophelia has
connected relations all the way to the top. However, abe only has them at
the top, so nothing is traversed.
Instead, we can make a query factory that modifies the initial query.
>>> def factory2(query, catalog):
...     if (len(query) == 2 and
...         'organization' in query and
...         'principal_id' in query):
...         def getQueries(relchain):
...             if not relchain:
...                 res = BTrees.family32.OO.Bucket(query)
...                 org_id = query['organization']
...                 if org_id is not None:
...                     res['organization'] = zc.relation.catalog.any(
...                         org_id,
...                         *catalog.findRelationTokens({'part': org_id}))
...                 yield res
...         return getQueries
...
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'}),
... queryFactory=factory2))
['publisher', 'reviewer', 'writer']
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'}),
... queryFactory=factory2))
['publisher', 'reviewer', 'user manager', 'writer']
A difference between this and the other approach is that it is essentially
intransitive: this query factory modifies the initial query, and then does
not give further queries. The catalog currently always stops calling the
query factory if the queries do not return any results, so an approach like
the flawed_factory simply won't work for this kind of problem.
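To see why the traversal-style factory finds nothing for abe, here is a
deliberately simplified pure-Python model of how a search might drive a query
factory. It is not zc.relation's implementation: `RELATIONS`, `match`,
`search`, and both factories are hypothetical names, relations are plain
dicts, the factories take only the query (not the catalog), and the widening
factory looks at direct containers only. The one behavior it models is the
one described above: a chain is not followed further once its queries match
nothing.

```python
# Hypothetical toy data keyed by relation token. Organizations have "part"
# values; role relations have "organization", "principal_id" and "role_id".
RELATIONS = {
    1: {"part": {2}},        # organization 1 contains organization 2
    2: {"part": set()},      # organization 2 is a leaf
    10: {"organization": {1}, "principal_id": {"abe"}, "role_id": {"writer"}},
}


def match(query):
    """Return tokens of relations that share a value with every query term."""
    found = []
    for token, values in sorted(RELATIONS.items()):
        if all(name in values and values[name] & wanted
               for name, wanted in query.items()):
            found.append(token)
    return found


def search(query, factory):
    """Breadth-first search driven by the inner callable of a factory."""
    get_queries = factory(query)
    results, chains = [], [()]
    while chains:
        next_chains = []
        for chain in chains:
            for q in get_queries(chain):
                for token in match(q):  # no match: this chain ends here
                    if token not in results:
                        results.append(token)
                        next_chains.append(chain + (token,))
        chains = next_chains
    return results


def traversing_factory(query):
    """Like flawed_factory: start from the query exactly as given."""
    def get_queries(chain):
        if not chain:
            yield query
        # Later steps would need a matched relation to continue from.
    return get_queries


def widening_factory(query):
    """Like factory2: widen the initial query once, then stop."""
    def get_queries(chain):
        if not chain:
            wanted = set(query["organization"])
            orgs = set(wanted)
            for token, values in RELATIONS.items():
                if values.get("part", set()) & wanted:
                    orgs.add(token)  # a direct container of the organization
            yield dict(query, organization=orgs)
    return get_queries


query = {"organization": {2}, "principal_id": {"abe"}}
print(search(query, traversing_factory))  # → []
print(search(query, widening_factory))    # → [10]
```

In the toy data abe only has a role in the containing organization, so the
unmodified initial query matches nothing and the traversal never starts,
while the widened initial query finds the role relation in a single step.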
We could add this query factory as another default.
>>> catalog.addDefaultQueryFactory(factory2)
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'})))
['publisher', 'reviewer', 'writer']
>>> sorted(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'})))
['publisher', 'reviewer', 'user manager', 'writer']
The previously installed query factory is still available.
>>> list(catalog.iterDefaultQueryFactories()) == [factory1, factory2]
True
>>> list(catalog.findRelations(
... {'part': registry.getId(orgs['Y3L4 Proj'])}))
... # doctest: +NORMALIZE_WHITESPACE
[<Organization instance "Ynod Devs">,
<Organization instance "Ynod Corp Management">]
>>> sorted(catalog.findRelations(
... {None: registry.getId(orgs['Ynod Corp Management'])}))
... # doctest: +NORMALIZE_WHITESPACE
[<Organization instance "Bet Proj">, <Organization instance "Y3L4 Proj">,
<Organization instance "Ynod Admins">,
<Organization instance "Ynod Corp Management">,
<Organization instance "Ynod Devs">, <Organization instance "Ynod SAs">,
<Organization instance "Ynod Zookd Task Force">]
Search Index for Query Factory Joins
====================================
Now that we have written a query factory that encapsulates the join, we can
use a search index that speeds it up. We've only used transitive search
indexes so far. Now we will add an intransitive search index.
The intransitive search index generally just needs the search value
names it should be indexing, optionally the result name (defaulting to
relations), and optionally the query factory to be used.
We need to use two additional options because of the odd join trick we're
doing. We need to specify what organization and principal_id values need
to be changed when an object is indexed, and we need to indicate that we
should update when organization, principal_id, *or* parts changes.
`getValueTokens` specifies the values that need to be indexed. It receives
the index, the name of the tokens desired, the token, the catalog that
generated the token change (which may not be the same as the index's
catalog), a source dictionary of the values that will be used for tokens
if you do not override them, a dict of the added values for this token
(keys are value names), a dict of the removed values for this token, and
whether the token has been removed. The function can return None, which
leaves the index to its default behavior (this should work if no query
factory is used), or an iterable of values.
>>> def getValueTokens(index, name, token, catalog, source,
...                    additions, removals, removed):
...     if name == 'organization':
...         orgs = source.get('organization')
...         if not removed or not orgs:
...             orgs = index.catalog.getValueTokens(
...                 'organization', token)
...             if not orgs:
...                 orgs = [token]
...                 orgs.extend(removals.get('part', ()))
...         orgs = set(orgs)
...         orgs.update(index.catalog.findValueTokens(
...             'part',
...             {None: zc.relation.catalog.Any(
...                 t for t in orgs if t is not None)}))
...         return orgs
...     elif name == 'principal_id':
...         # we only want custom behavior if this is an organization
...         if 'principal_id' in source or index.catalog.getValueTokens(
...                 'principal_id', token):
...             return ''
...         orgs = set((token,))
...         orgs.update(index.catalog.findRelationTokens(
...             {'part': token}))
...         return set(index.catalog.findValueTokens(
...             'principal_id', {
...                 'organization': zc.relation.catalog.Any(orgs)}))
...
>>> index = zc.relation.searchindex.Intransitive(
... ('organization', 'principal_id'), 'role_id', factory2,
... getValueTokens,
... ('organization', 'principal_id', 'part', 'role_id'),
... unlimitedDepth=True)
>>> catalog.addSearchIndex(index)
>>> res = catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'}))
>>> list(res)
['publisher', 'reviewer', 'writer']
>>> list(res)
['publisher', 'reviewer', 'writer']
>>> res = catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'}))
>>> list(res)
['publisher', 'reviewer', 'user manager', 'writer']
>>> list(res)
['publisher', 'reviewer', 'user manager', 'writer']
[#verifyObjectIntransitive]_
.. [#verifyObjectIntransitive] The Intransitive search index provides
ISearchIndex and IListener.
>>> from zope.interface.verify import verifyObject
>>> import zc.relation.interfaces
>>> verifyObject(zc.relation.interfaces.ISearchIndex, index)
True
>>> verifyObject(zc.relation.interfaces.IListener, index)
True
Now we can change and remove relations--both organizations and roles--and
have the index maintain correct state. Given the current state of
organizations--
::

            Ynod Corp Management                  Zookd Corp Management
             /       |        \                    /        |        \
       Ynod Devs  Ynod SAs  Ynod Admins  Zookd Admins   Zookd SAs  Zookd Devs
         /      \                 \         /                       /        \
     Y3L4 Proj  Bet Proj     Ynod Zookd Task Force       Zookd hOgnmd    Zookd Nbd

--first we will move Ynod Devs to beneath Zookd Devs, and back out. This will
briefly give abe full privileges to Y3L4 Proj., among others.
>>> list(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'})))
[]
>>> orgs['Zookd Devs'].parts.insert(registry.getId(orgs['Ynod Devs']))
1
>>> catalog.index(orgs['Zookd Devs'])
>>> res = catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'}))
>>> list(res)
['publisher', 'reviewer', 'user manager', 'writer']
>>> list(res)
['publisher', 'reviewer', 'user manager', 'writer']
>>> orgs['Zookd Devs'].parts.remove(registry.getId(orgs['Ynod Devs']))
>>> catalog.index(orgs['Zookd Devs'])
>>> list(catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'})))
[]
As another example, we will change the roles abe has, and see that it is
propagated down to Zookd Nbd.
>>> rels = list(catalog.findRelations(t(
... {'principal_id': 'abe',
... 'organization': orgs['Zookd Corp Management']})))
>>> len(rels)
1
>>> rels[0].role_ids.remove('reviewer')
>>> catalog.index(rels[0])
>>> res = catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'}))
>>> list(res)
['publisher', 'user manager', 'writer']
>>> list(res)
['publisher', 'user manager', 'writer']
Note that search index order matters. In our case, our intransitive search
index is relying on our transitive index, so the transitive index needs to
come first. You want transitive relation indexes before name. Right now,
you are in charge of this order: it will be difficult to come up with a
reliable algorithm for guessing this.
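The ordering concern can be illustrated without the catalog. In this
hypothetical sketch, two update functions stand in for the two search
indexes: the second one reads data that the first one maintains, so running
them in the wrong order leaves the dependent data stale. All names here
(`parents`, `roles`, `update_transitive`, `update_dependent`, `reindex`) are
invented for the illustration.

```python
parents = {"leaf": "root"}      # child -> parent; hypothetical source data
roles = {"root": {"writer"}}    # organization -> roles granted there

ancestors = {}                  # maintained by the "transitive" updater
inherited = {}                  # maintained by the dependent updater


def update_transitive(org):
    """Record every ancestor of org, nearest first."""
    chain, current = [], org
    while current in parents:
        current = parents[current]
        chain.append(current)
    ancestors[org] = chain


def update_dependent(org):
    """Collect roles from org and its ancestors, as recorded in `ancestors`."""
    found = set(roles.get(org, ()))
    for ancestor in ancestors.get(org, ()):  # relies on the first updater
        found |= roles.get(ancestor, set())
    inherited[org] = found


def reindex(org, updaters):
    """Run the updaters in the given order and return the dependent result."""
    for update in updaters:
        update(org)
    return inherited[org]


good = reindex("leaf", [update_transitive, update_dependent])

ancestors.clear()
inherited.clear()
stale = reindex("leaf", [update_dependent, update_transitive])

print(sorted(good), sorted(stale))  # → ['writer'] []
```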
Listeners, Catalog Administration, and Joining Across Relation Catalogs
=======================================================================
We've done all of our examples so far with a single catalog that indexes
both kinds of relations. What if we want to have two catalogs with
homogeneous collections of relations? That can feel cleaner, but it also
introduces some new wrinkles.
Let's use our current catalog for organizations, removing the extra
information; and create a new one for roles.
>>> role_catalog = root['role_catalog'] = catalog.copy()
>>> transaction.commit()
>>> org_catalog = catalog
>>> del catalog
We'll need a slightly different query factory and a slightly different
search index `getValueTokens` function. We'll write those, then modify the
configuration of our two catalogs for the new world.
The transitive factory we write here is for the role catalog. It needs
access to the organization catalog. We could do this in a variety of
ways--relying on a utility, or finding the catalog from context. We will
make the role_catalog have a .org_catalog attribute, and rely on that.
>>> role_catalog.org_catalog = org_catalog
>>> def factory3(query, catalog):
...     if (len(query) == 2 and
...         'organization' in query and
...         'principal_id' in query):
...         def getQueries(relchain):
...             if not relchain:
...                 res = BTrees.family32.OO.Bucket(query)
...                 org_id = query['organization']
...                 if org_id is not None:
...                     res['organization'] = zc.relation.catalog.any(
...                         org_id,
...                         *catalog.org_catalog.findRelationTokens(
...                             {'part': org_id}))
...                 yield res
...         return getQueries
...
>>> def getValueTokens2(index, name, token, catalog, source,
...                     additions, removals, removed):
...     is_role_catalog = catalog is index.catalog # role_catalog
...     if name == 'organization':
...         if is_role_catalog:
...             orgs = set(source.get('organization') or
...                        index.catalog.getValueTokens(
...                            'organization', token) or ())
...         else:
...             orgs = set((token,))
...             orgs.update(removals.get('part', ()))
...         orgs.update(index.catalog.org_catalog.findValueTokens(
...             'part',
...             {None: zc.relation.catalog.Any(
...                 t for t in orgs if t is not None)}))
...         return orgs
...     elif name == 'principal_id':
...         # we only want custom behavior if this is an organization
...         if not is_role_catalog:
...             orgs = set((token,))
...             orgs.update(index.catalog.org_catalog.findRelationTokens(
...                 {'part': token}))
...             return set(index.catalog.findValueTokens(
...                 'principal_id', {
...                     'organization': zc.relation.catalog.Any(orgs)}))
...     return ''
If you are following along in the code and comparing to the originals, you may
see that this approach is a bit cleaner than the one when the relations were
in the same catalog.
Now we will fix up the organization catalog [#compare_copy]_.
.. [#compare_copy] Before we modify them, let's look at the copy we made.
The copy should currently behave identically to the original.
>>> len(org_catalog)
38
>>> len(role_catalog)
38
>>> indexed = list(org_catalog)
>>> len(indexed)
38
>>> orgs['Zookd Devs'] in indexed
True
>>> for r in indexed:
...     if r not in role_catalog:
...         print('bad')
...         break
... else:
...     print('good')
...
good
>>> org_names = set(dir(org_catalog))
>>> role_names = set(dir(role_catalog))
>>> sorted(org_names - role_names)
[]
>>> sorted(role_names - org_names)
['org_catalog']
>>> def checkYnodDevsParts(catalog):
...     res = sorted(catalog.findRelations(t({None: orgs['Ynod Devs']})))
...     if res != [
...             orgs["Bet Proj"], orgs["Y3L4 Proj"], orgs["Ynod Devs"]]:
...         print("bad", res)
...
>>> checkYnodDevsParts(org_catalog)
>>> checkYnodDevsParts(role_catalog)
>>> def checkOpheliaRoles(catalog):
...     res = sorted(catalog.findRelations({'principal_id': 'ophelia'}))
...     if repr(res) != (
...             "[<Roles instance (ophelia has publisher in Zookd Nbd)>, " +
...             "<Roles instance (ophelia has reviewer in Zookd Devs)>, " +
...             "<Roles instance (ophelia has writer in " +
...             "Zookd Corp Management)>]"):
...         print("bad", res)
...
>>> checkOpheliaRoles(org_catalog)
>>> checkOpheliaRoles(role_catalog)
>>> def checkOpheliaWriterOrganizations(catalog):
...     res = sorted(catalog.findRelations({None: zc.relation.catalog.Any(
...         catalog.findValueTokens(
...             'organization', {'principal_id': 'ophelia',
...                              'role_id': 'writer'}))}))
...     if repr(res) != (
...             '[<Organization instance "Ynod Zookd Task Force">, ' +
...             '<Organization instance "Zookd Admins">, ' +
...             '<Organization instance "Zookd Corp Management">, ' +
...             '<Organization instance "Zookd Devs">, ' +
...             '<Organization instance "Zookd Nbd">, ' +
...             '<Organization instance "Zookd SAs">, ' +
...             '<Organization instance "Zookd hOgnmd">]'):
...         print("bad", res)
...
>>> checkOpheliaWriterOrganizations(org_catalog)
>>> checkOpheliaWriterOrganizations(role_catalog)
>>> def checkPrincipalsWithRolesInZookdDevs(catalog):
...     org_id = registry.getId(orgs['Zookd Devs'])
...     res = sorted(catalog.findValueTokens(
...         'principal_id',
...         {'organization': zc.relation.catalog.any(
...             org_id, *catalog.findRelationTokens({'part': org_id}))}))
...     if res != ['abe', 'ignas', 'karyn', 'lettie', 'nancy', 'ophelia']:
...         print("bad", res)
...
>>> checkPrincipalsWithRolesInZookdDevs(org_catalog)
>>> checkPrincipalsWithRolesInZookdDevs(role_catalog)
>>> def checkOpheliaRolesInZookdNbd(catalog):
...     res = sorted(catalog.findValueTokens(
...         'role_id', {
...             'organization': registry.getId(orgs['Zookd Nbd']),
...             'principal_id': 'ophelia'}))
...     if res != ['publisher', 'reviewer', 'writer']:
...         print("bad", res)
...
>>> checkOpheliaRolesInZookdNbd(org_catalog)
>>> checkOpheliaRolesInZookdNbd(role_catalog)
>>> def checkAbeRolesInZookdNbd(catalog):
...     res = sorted(catalog.findValueTokens(
...         'role_id', {
...             'organization': registry.getId(orgs['Zookd Nbd']),
...             'principal_id': 'abe'}))
...     if res != ['publisher', 'user manager', 'writer']:
...         print("bad", res)
...
>>> checkAbeRolesInZookdNbd(org_catalog)
>>> checkAbeRolesInZookdNbd(role_catalog)
>>> org_catalog.removeDefaultQueryFactory(None) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
LookupError: ('factory not found', None)
>>> org_catalog.removeValueIndex('organization')
>>> org_catalog.removeValueIndex('role_id')
>>> org_catalog.removeValueIndex('principal_id')
>>> org_catalog.removeDefaultQueryFactory(factory2)
>>> org_catalog.removeSearchIndex(index)
>>> org_catalog.clear()
>>> len(org_catalog)
0
>>> for v in orgs.values():
...     org_catalog.index(v)
This also shows using the `removeDefaultQueryFactory` and `removeSearchIndex`
methods [#removeDefaultQueryFactoryExceptions]_.
.. [#removeDefaultQueryFactoryExceptions] You get errors by removing query
factories that are not registered.
>>> org_catalog.removeDefaultQueryFactory(factory2) # doctest: +ELLIPSIS
Traceback (most recent call last):
...
LookupError: ('factory not found', <function factory2 at ...>)
Now we will set up the role catalog [#copy_unchanged]_.
.. [#copy_unchanged] Changes to one copy should not affect the other. That
means the role_catalog should still work as before.
>>> len(org_catalog)
13
>>> len(list(org_catalog))
13
>>> len(role_catalog)
38
>>> indexed = list(role_catalog)
>>> len(indexed)
38
>>> orgs['Zookd Devs'] in indexed
True
>>> orgs['Zookd Devs'] in role_catalog
True
>>> checkYnodDevsParts(role_catalog)
>>> checkOpheliaRoles(role_catalog)
>>> checkOpheliaWriterOrganizations(role_catalog)
>>> checkPrincipalsWithRolesInZookdDevs(role_catalog)
>>> checkOpheliaRolesInZookdNbd(role_catalog)
>>> checkAbeRolesInZookdNbd(role_catalog)
>>> role_catalog.removeValueIndex('part')
>>> for ix in list(role_catalog.iterSearchIndexes()):
...     role_catalog.removeSearchIndex(ix)
...
>>> role_catalog.removeDefaultQueryFactory(factory1)
>>> role_catalog.removeDefaultQueryFactory(factory2)
>>> role_catalog.addDefaultQueryFactory(factory3)
>>> root['index2'] = index2 = zc.relation.searchindex.Intransitive(
... ('organization', 'principal_id'), 'role_id', factory3,
... getValueTokens2,
... ('organization', 'principal_id', 'part', 'role_id'),
... unlimitedDepth=True)
>>> role_catalog.addSearchIndex(index2)
The new role_catalog index needs to be updated from the org_catalog.
We'll set that up using listeners, a new concept.
>>> org_catalog.addListener(index2)
>>> list(org_catalog.iterListeners()) == [index2]
True
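Conceptually, a listener is an object that a catalog notifies whenever a
relation is indexed or changed, so that data kept elsewhere can stay in sync.
The sketch below models only that idea in plain Python. `ToyCatalog`,
`RecordingListener`, and the `added` and `modified` method names are
hypothetical; the actual contract is whatever
`zc.relation.interfaces.IListener` specifies.

```python
class ToyCatalog:
    """Hypothetical stand-in for a catalog that notifies listeners."""

    def __init__(self):
        self._values = {}     # relation token -> set of indexed values
        self._listeners = []  # notified in registration order

    def addListener(self, listener):
        self._listeners.append(listener)

    def index(self, token, values):
        """Store the values and tell every listener what changed."""
        new = set(values)
        old = self._values.get(token)
        self._values[token] = new
        for listener in self._listeners:
            if old is None:
                listener.added(token, self, new)
            else:
                listener.modified(token, self, new - old, old - new)


class RecordingListener:
    """Hypothetical listener that records the messages it receives."""

    def __init__(self):
        self.log = []

    def added(self, token, catalog, additions):
        self.log.append(("added", token, sorted(additions)))

    def modified(self, token, catalog, additions, removals):
        self.log.append(
            ("modified", token, sorted(additions), sorted(removals)))


toy_org_catalog = ToyCatalog()
listener = RecordingListener()
toy_org_catalog.addListener(listener)

toy_org_catalog.index(1, {2, 3})  # first time: reported as an addition
toy_org_catalog.index(1, {3, 4})  # reindex: reported as a modification

print(listener.log)
# → [('added', 1, [2, 3]), ('modified', 1, [4], [2])]
```

A search index registered as a listener on another catalog uses messages of
this kind to decide which of its own stored results need to be recalculated.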
Now the role_catalog should be able to answer the same questions as the old
single catalog approach.
>>> t = role_catalog.tokenizeQuery
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'})))
['publisher', 'user manager', 'writer']
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'ophelia'})))
['publisher', 'reviewer', 'writer']
We can also make changes to both catalogs and the search indexes are
maintained.
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'})))
[]
>>> orgs['Zookd Devs'].parts.insert(registry.getId(orgs['Ynod Devs']))
1
>>> org_catalog.index(orgs['Zookd Devs'])
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'})))
['publisher', 'user manager', 'writer']
>>> orgs['Zookd Devs'].parts.remove(registry.getId(orgs['Ynod Devs']))
>>> org_catalog.index(orgs['Zookd Devs'])
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Y3L4 Proj'],
... 'principal_id': 'abe'})))
[]
>>> rels = list(role_catalog.findRelations(t(
... {'principal_id': 'abe',
... 'organization': orgs['Zookd Corp Management']})))
>>> len(rels)
1
>>> rels[0].role_ids.insert('reviewer')
1
>>> role_catalog.index(rels[0])
>>> res = role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd Nbd'],
... 'principal_id': 'abe'}))
>>> list(res)
['publisher', 'reviewer', 'user manager', 'writer']
Here we add a new organization.
>>> orgs['Zookd hOnc'] = org = Organization('Zookd hOnc')
>>> orgs['Zookd Devs'].parts.insert(registry.getId(org))
1
>>> org_catalog.index(orgs['Zookd hOnc'])
>>> org_catalog.index(orgs['Zookd Devs'])
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd hOnc'],
... 'principal_id': 'abe'})))
['publisher', 'reviewer', 'user manager', 'writer']
>>> list(role_catalog.findValueTokens(
... 'role_id', t({'organization': orgs['Zookd hOnc'],
... 'principal_id': 'ophelia'})))
['reviewer', 'writer']
Now we'll remove it.
>>> orgs['Zookd Devs'].parts.remove(registry.getId(org))
>>> org_catalog.index(orgs['Zookd Devs'])
>>> org_catalog.unindex(orgs['Zookd hOnc'])
TODO make sure that intransitive copy looks the way we expect
[#administrivia]_
.. [#administrivia]
You can add listeners multiple times.
>>> org_catalog.addListener(index2)
>>> list(org_catalog.iterListeners()) == [index2, index2]
True
Now we will remove the listeners, to show we can.
>>> org_catalog.removeListener(index2)
>>> org_catalog.removeListener(index2)
>>> org_catalog.removeListener(index2)
... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
Traceback (most recent call last):
...
LookupError: ('listener not found',
<zc.relation.searchindex.Intransitive object at ...>)
>>> org_catalog.removeListener(None)
... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
Traceback (most recent call last):
...
LookupError: ('listener not found', None)
Here's the same for removing a search index we don't have.
>>> org_catalog.removeSearchIndex(index2)
... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
Traceback (most recent call last):
...
LookupError: ('index not found',
<zc.relation.searchindex.Intransitive object at ...>)
.. ......... ..
.. Footnotes ..
.. ......... ..
.. [#silliness] In "2001: A Space Odyssey", many people believe the name HAL
was chosen because it was ROT25 of IBM.... I cheat a bit sometimes and
use ROT1 because the result sounds better.
=================================================================
Working with Search Indexes: zc.relation Catalog Extended Example
=================================================================
Introduction
============
This document assumes you have read the README.rst document, and want to learn
a bit more by example. In it, we will explore a set of relations that
demonstrates most of the aspects of working with search indexes and listeners.
It will not explain the topics that the other documents already addressed. It
also describes an advanced use case.
As we have seen in the other documents, the relation catalog supports
search indexes. These can return the results of any search, as desired.
Of course, the intent is that you supply an index that optimizes the
particular searches it claims.
The searchindex module supplies a few search indexes, optimizing
specified transitive and intransitive searches. We have seen them working
in other documents. We will examine them more in depth in this document.
Search indexes update themselves by receiving messages via a "listener"
interface. We will also look at how this works.
The example described in this file examines a use case similar to that in
the zc.revision or zc.vault packages: a relation describes a graph of
other objects. Therefore, this is our first concrete example of purely
extrinsic relations.
Let's build the example story a bit. Imagine we have a graph, often a
hierarchy, of tokens--integers. Relations specify that a given integer
token relates to other integer tokens, with a containment denotation or
other meaning.
The integers may also have relations that specify that they represent an
object or objects.
This allows us to have a graph of objects in which changing one part of the
graph does not require changing the rest. zc.revision and zc.vault thus
are able to model graphs that can have multiple revisions efficiently and
with quite a bit of metadata to support merges.
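As a rough illustration of that idea, the following pure-Python sketch keeps
the graph of integer tokens separate from the objects the tokens stand for.
The names `children`, `represents`, and `walk` are made up for this sketch
and are not part of zc.relation, zc.revision, or zc.vault.

```python
# Hypothetical data: the graph only knows about integer tokens.
children = {0: (1, 2), 1: (3,)}

# A separate mapping says which object each token currently represents.
represents = {0: "root", 1: "docs", 2: "src", 3: "README v1"}


def walk(token):
    """Yield the objects for a token and everything beneath it, depth first."""
    yield represents[token]
    for child in children.get(token, ()):
        for obj in walk(child):
            yield obj


before = list(walk(0))

# Replacing the object behind token 3 leaves the graph itself untouched.
graph_snapshot = dict(children)
represents[3] = "README v2"
after = list(walk(0))

print(before)                      # → ['root', 'docs', 'README v1', 'src']
print(after)                       # → ['root', 'docs', 'README v2', 'src']
print(children == graph_snapshot)  # → True
```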
Let's imagine a simple hierarchy. The relation has a `token` attribute
and a `children` attribute; children point to tokens. Relations will
identify themselves with ids.
>>> import BTrees
>>> relations = BTrees.family64.IO.BTree()
>>> relations[99] = None # just to give us a start
>>> class Relation(object):
...     def __init__(self, token, children=()):
...         self.token = token
...         self.children = BTrees.family64.IF.TreeSet(children)
...         self.id = relations.maxKey() + 1
...         relations[self.id] = self
...
>>> def token(rel, self):
...     return rel.token
...
>>> def children(rel, self):
...     return rel.children
...
>>> def dumpRelation(obj, index, cache):
...     return obj.id
...
>>> def loadRelation(token, index, cache):
...     return relations[token]
...
The standard TransposingTransitive query factory will be able to handle this
quite well, so we'll use that for our index.
>>> import zc.relation.queryfactory
>>> factory = zc.relation.queryfactory.TransposingTransitive(
... 'token', 'children')
>>> import zc.relation.catalog
>>> catalog = zc.relation.catalog.Catalog(
... dumpRelation, loadRelation, BTrees.family64.IO, BTrees.family64)
>>> catalog.addValueIndex(token)
>>> catalog.addValueIndex(children, multiple=True)
>>> catalog.addDefaultQueryFactory(factory)
Now let's quickly create a hierarchy and index it.
>>> for token, children in (
...         (0, (1, 2)), (1, (3, 4)), (2, (10, 11, 12)), (3, (5, 6)),
...         (4, (13, 14)), (5, (7, 8, 9)), (6, (15, 16)), (7, (17, 18, 19)),
...         (8, (20, 21, 22)), (9, (23, 24)), (10, (25, 26)),
...         (11, (27, 28, 29, 30, 31, 32))):
...     catalog.index(Relation(token, children))
...
[#queryFactory]_ That hierarchy is arbitrary. Here's what we have, in terms of tokens
pointing to children::

                                  _____________0_____________
                                 /                           \
                          ________1_______           ______2____________
                         /                \         /            |        \
                   ______3_____         _4_        10        ____11_____    12
                  /            \       /   \      /  \      / /  | \  \ \
          _______5_______       6    13    14    25  26   27 28 29 30 31 32
         /       |       \     / \
      _7_       _8_       9   15 16
     / | \     / | \     / \
    17 18 19  20 21 22  23 24

Twelve relations, with tokens 0 through 11, and a total of 33 tokens,
including children. The ids for the 12 relations are 100 through 111,
corresponding with the tokens of 0 through 11.
Without a transitive search index, we can get all transitive results.
The results are iterators.
>>> res = catalog.findRelationTokens({'token': 0})
>>> getattr(res, '__next__') is None
False
>>> getattr(res, '__len__', None) is None
True
>>> sorted(res)
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
>>> list(res)
[]
>>> res = catalog.findValueTokens('children', {'token': 0})
>>> sorted(res) == list(range(1, 33))
True
>>> list(res)
[]
[#findValuesUnindexed]_ `canFind` also can work transitively, and will
use transitive search indexes, as we'll see below.
>>> catalog.canFind({'token': 1}, targetQuery={'children': 23})
True
>>> catalog.canFind({'token': 2}, targetQuery={'children': 23})
False
>>> catalog.canFind({'children': 23}, targetQuery={'token': 1})
True
>>> catalog.canFind({'children': 23}, targetQuery={'token': 2})
False
`findRelationTokenChains` won't change, but we'll include it in the
discussion and examples to show that.
>>> res = catalog.findRelationTokenChains({'token': 2})
>>> chains = list(res)
>>> len(chains)
3
>>> len(list(res))
0
Transitive Search Indexes
=========================
Now we can add a transitive search index. We'll talk about these indexes a
bit first.
There is currently one variety of transitive index, which indexes
relation and value searches for the transposing transitive query
factory.
The index can only be used under certain conditions.
- The search is not a request for a relation chain.
- It does not specify a maximum depth.
- Filters are not used.
If it is a value search, then specific value indexes cannot be used if a
target filter or target query are used, but the basic relation index can
still be used in that case.
The usage of the search indexes is largely transparent: set them up, and
the relation catalog will use them for the same API calls that used more
brute force previously. The only externally visible difference is that
results that use an index will usually be a BTree structure, rather than
an iterator.
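The practical consequence of that difference is easy to demonstrate without
the catalog. In this sketch a generator stands in for an unindexed search
result and a `frozenset` stands in for the stored BTree structure; both
stand-ins, and the name `find_tokens`, are assumptions made for illustration.

```python
def find_tokens():
    """Stand-in for an unindexed search: results are produced lazily."""
    for token in (100, 101, 102):
        yield token


res = find_tokens()
first = sorted(res)   # consumes the generator
second = list(res)    # nothing is left the second time
print(first, second)  # → [100, 101, 102] []

# Stand-in for an indexed result: a stored set can be read repeatedly.
indexed = frozenset((100, 101, 102))
print(sorted(indexed) == sorted(indexed))  # → True
```

Code that needs to read a result more than once should therefore copy it
into a list or set first, rather than assuming an index is installed.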
When you add a transitive index for a relation, you must specify the
transitive name (or names) of the query, and the same for the reverse.
That's all we'll do now.
>>> import zc.relation.searchindex
>>> catalog.addSearchIndex(
... zc.relation.searchindex.TransposingTransitiveMembership(
... 'token', 'children', names=('children',)))
Now we should have a search index installed.
Notice that we went from parent (token) to child: this index is primarily
designed for helping transitive membership searches in a hierarchy. Using it to
index parents would incur a lot of write expense for not much win.
There's just a bit more you can specify here: static fields for a query
to do a bit of filtering. We don't need any of that for this example.
Now how does the catalog use this index for searches? Three basic ways,
depending on the kind of search: relations, values, or `canFind`.
Before we start looking into the internals, let's verify that we're getting
what we expect: correct answers, and not iterators, but BTree structures.
>>> res = catalog.findRelationTokens({'token': 0})
>>> list(res)
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
>>> list(res)
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]
>>> res = catalog.findValueTokens('children', {'token': 0})
>>> list(res) == list(range(1, 33))
True
>>> list(res) == list(range(1, 33))
True
>>> catalog.canFind({'token': 1}, targetQuery={'children': 23})
True
>>> catalog.canFind({'token': 2}, targetQuery={'children': 23})
False
[#findValuesIndexed]_ Note that the last two `canFind` examples from
when we went through these examples without an index do not use the
index, so we don't show them here: they look the wrong direction for
this index.
So how do these results happen?
The first, `findRelationTokens`, and the last, `canFind`, are the most
straightforward. The index finds all relations that match the given
query, intransitively. Then for each relation, it looks up the indexed
transitive results for that token. The end result is the union of all
indexed results found from the intransitive search. `canFind` simply
casts the result into a boolean.
`findValueTokens` is the same story as above with only one more step. After
the union of relations is calculated, the method returns the union of the
sets of the requested value for all found relations.
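The steps above can be sketched in plain Python using the hierarchy from
this document. This is a model of the described behavior only, not the
search index's code: `CHILDREN`, `build_index`, `find_relation_tokens`,
`find_value_tokens`, and `can_find` are hypothetical names, plain sets
replace BTrees, and the `canFind` target query is modeled as a membership
test on the value results.

```python
# token -> children; the relation for token n has id 100 + n.
CHILDREN = {
    0: (1, 2), 1: (3, 4), 2: (10, 11, 12), 3: (5, 6), 4: (13, 14),
    5: (7, 8, 9), 6: (15, 16), 7: (17, 18, 19), 8: (20, 21, 22),
    9: (23, 24), 10: (25, 26), 11: (27, 28, 29, 30, 31, 32),
}


def rel_id(token):
    return 100 + token


def build_index():
    """Map each relation id to the ids of all relations beneath it."""
    index = {}

    def descendants(token):
        rid = rel_id(token)
        if rid not in index:
            found = set()
            for child in CHILDREN[token]:
                if child in CHILDREN:  # the child has its own relation
                    found.add(rel_id(child))
                    found |= descendants(child)
            index[rid] = found
        return index[rid]

    for token in CHILDREN:
        descendants(token)
    return index


INDEX = build_index()


def find_relation_tokens(token):
    """Intransitive match first, then union in the stored results."""
    matches = {rel_id(token)} if token in CHILDREN else set()
    result = set(matches)
    for rid in matches:
        result |= INDEX[rid]
    return result


def find_value_tokens(token):
    """One more step: union the children of every relation found."""
    values = set()
    for rid in find_relation_tokens(token):
        values.update(CHILDREN[rid - 100])
    return values


def can_find(token, child):
    """Cast the (modeled) result to a boolean."""
    return child in find_value_tokens(token)


print(sorted(find_relation_tokens(0)) == list(range(100, 112)))  # → True
print(sorted(find_value_tokens(0)) == list(range(1, 33)))        # → True
print(can_find(1, 23), can_find(2, 23))                          # → True False
```

The cost of this design is paid at write time: whenever a relation changes,
the stored results for it and for everything above it must be recalculated.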
The index will maintain itself when relations are reindexed.
>>> rel = list(catalog.findRelations({'token': 11}))[0]
>>> for t in (27, 28, 29, 30, 31):
...     rel.children.remove(t)
...
>>> catalog.index(rel)
>>> catalog.findValueTokens('children', {'token': 0})
... # doctest: +NORMALIZE_WHITESPACE
LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 32])
>>> catalog.findValueTokens('children', {'token': 2})
LFSet([10, 11, 12, 25, 26, 32])
>>> catalog.findValueTokens('children', {'token': 11})
LFSet([32])
>>> rel.children.remove(32)
>>> catalog.index(rel)
>>> catalog.findValueTokens('children', {'token': 0})
... # doctest: +NORMALIZE_WHITESPACE
LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26])
>>> catalog.findValueTokens('children', {'token': 2})
LFSet([10, 11, 12, 25, 26])
>>> catalog.findValueTokens('children', {'token': 11})
LFSet([])
>>> rel.children.insert(27)
1
>>> catalog.index(rel)
>>> catalog.findValueTokens('children', {'token': 0})
... # doctest: +NORMALIZE_WHITESPACE
LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27])
>>> catalog.findValueTokens('children', {'token': 2})
LFSet([10, 11, 12, 25, 26, 27])
>>> catalog.findValueTokens('children', {'token': 11})
LFSet([27])
When the catalog is copied, the search index is copied.
>>> new = catalog.copy()
>>> res = list(new.iterSearchIndexes())
>>> len(res)
1
>>> new_index = res[0]
>>> res = list(catalog.iterSearchIndexes())
>>> len(res)
1
>>> old_index = res[0]
>>> new_index is old_index
False
>>> old_index.index is new_index.index
False
>>> list(old_index.index.keys()) == list(new_index.index.keys())
True
>>> from __future__ import print_function
>>> for key, value in old_index.index.items():
...     v = new_index.index[key]
...     if v is value or list(v) != list(value):
...         print('oops', key, value, v)
...         break
... else:
...     print('good')
...
good
>>> old_index.names is not new_index.names
True
>>> list(old_index.names) == list(new_index.names)
True
>>> for name, old_ix in old_index.names.items():
...     new_ix = new_index.names[name]
...     if new_ix is old_ix or list(new_ix.keys()) != list(old_ix.keys()):
...         print('oops')
...         break
...     for key, value in old_ix.items():
...         v = new_ix[key]
...         if v is value or list(v) != list(value):
...             print('oops', name, key, value, v)
...             break
...     else:
...         continue
...     break
... else:
...     print('good')
...
good
Helpers
=======
When writing search indexes and query factories, you often want complete
access to relation catalog data. We've seen a number of these tools already:
- `getRelationModuleTools` gets a dictionary of the BTree tools needed to
work with relations.
>>> sorted(catalog.getRelationModuleTools().keys())
... # doctest: +NORMALIZE_WHITESPACE
['BTree', 'Bucket', 'Set', 'TreeSet', 'difference', 'dump',
'intersection', 'load', 'multiunion', 'union']
'multiunion' is only there if the BTree is an I* or L* module.
Use the zc.relation.catalog.multiunion helper function to do the
best union you can for a given set of tools.
- `getValueModuleTools` does the same for indexed values.
>>> tools = set(('BTree', 'Bucket', 'Set', 'TreeSet', 'difference',
... 'dump', 'intersection', 'load', 'multiunion', 'union'))
>>> tools.difference(catalog.getValueModuleTools('children').keys()) == set()
True
>>> tools.difference(catalog.getValueModuleTools('token').keys()) == set()
True
- `getRelationTokens` can return all of the tokens in the catalog.
>>> len(catalog.getRelationTokens()) == len(catalog)
True
This also happens to be equivalent to `findRelationTokens` with an empty
query.
>>> catalog.getRelationTokens() is catalog.findRelationTokens({})
True
It also can return all the tokens that match a given query, or None if
there are no matches.
>>> catalog.getRelationTokens({'token': 0}) # doctest: +ELLIPSIS
<BTrees.LOBTree.LOTreeSet object at ...>
>>> list(catalog.getRelationTokens({'token': 0}))
[100]
This also happens to be equivalent to `findRelationTokens` with a query,
a maxDepth of 1, and no other arguments.
>>> catalog.findRelationTokens({'token': 0}, maxDepth=1) is (
... catalog.getRelationTokens({'token': 0}))
True
Except that if there are no matches, `findRelationTokens` returns an empty
set (so it *always* returns an iterable).
>>> catalog.findRelationTokens({'token': 50}, maxDepth=1)
LOSet([])
>>> print(catalog.getRelationTokens({'token': 50}))
None
- `getValueTokens` can return all of the tokens for a given value name in
the catalog.
>>> list(catalog.getValueTokens('token')) == list(range(12))
True
This is identical to catalog.findValueTokens with a name only (or with
an empty query, and a maxDepth of 1).
>>> list(catalog.findValueTokens('token')) == list(range(12))
True
>>> catalog.findValueTokens('token') is catalog.getValueTokens('token')
True
It can also return the values for a given token.
>>> list(catalog.getValueTokens('children', 100))
[1, 2]
This is identical to catalog.findValueTokens with a name and a query of
{None: token}.
>>> list(catalog.findValueTokens('children', {None: 100}))
[1, 2]
>>> catalog.getValueTokens('children', 100) is (
... catalog.findValueTokens('children', {None: 100}))
True
Except that if there are no matches, `findValueTokens` returns an empty
set (so it *always* returns an iterable); while getValueTokens will
return None if the relation has no values (or the relation is unknown).
>>> catalog.findValueTokens('children', {None: 50}, maxDepth=1)
LFSet([])
>>> print(catalog.getValueTokens('children', 50))
None
>>> rel.children.remove(27)
>>> catalog.index(rel)
>>> catalog.findValueTokens('children', {None: rel.id}, maxDepth=1)
LFSet([])
>>> print(catalog.getValueTokens('children', rel.id))
None
- `yieldRelationTokenChains` is a search workhorse for searches that use a
query factory. TODO: describe.
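The difference between the ``get*`` and ``find*`` helpers above matters mostly when you iterate over results: ``getRelationTokens`` and ``getValueTokens`` may return ``None``, while the ``find*`` methods always return an iterable. Here is a minimal sketch of a guard for code that calls the ``get*`` methods directly. The helper name ``as_iterable`` is hypothetical (it is not part of zc.relation), and plain Python sets stand in for BTree sets.

```python
def as_iterable(tokens):
    """Return ``tokens`` unchanged, or an empty tuple if it is None.

    Hypothetical helper (not part of zc.relation): it lets callers loop
    over the result of a ``get*`` call without checking for None first.
    """
    return () if tokens is None else tokens


# Stand-ins for what a ``get*`` call might hand back: a set of tokens
# when there are matches, or None when there are none.
matches = {100}
no_matches = None

found = sorted(as_iterable(matches))
missing = list(as_iterable(no_matches))
print(found)    # [100]
print(missing)  # []
```

If you do not need the raw BTree object, calling the corresponding ``find*`` method with ``maxDepth=1`` avoids the guard entirely.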
.. ......... ..
.. Footnotes ..
.. ......... ..
.. [#queryFactory] The query factory knows when it is not needed--not only
when neither of its names are used, but also when both of its names are
used.
>>> list(catalog.findRelationTokens({'token': 0, 'children': 1}))
[100]
.. [#findValuesUnindexed] When values are the same as their tokens,
`findValues` returns the same result as `findValueTokens`. Here
we see this without indexes.
>>> list(catalog.findValueTokens('children', {'token': 0})) == list(
... catalog.findValues('children', {'token': 0}))
True
.. [#findValuesIndexed] Again, when values are the same as their tokens,
`findValues` returns the same result as `findValueTokens`. Here
we see this with indexes.
>>> list(catalog.findValueTokens('children', {'token': 0})) == list(
... catalog.findValues('children', {'token': 0}))
True
Optimizing Relation Catalog Use
===============================
There are several best practices and optimization opportunities with regard to
the catalog.
- Use integer-keyed BTree sets when possible. They can use the BTrees'
`multiunion` for a speed boost. Integer comparison is reliable and implemented in C.
- Never use persistent objects as keys. They will cause a database load every
time you need to look at them, they take up memory and object caches, and
they (as of this writing) disable conflict resolution. Intids (or similar)
are your best bet for representing objects; other immutables such as
strings are the next-best bet; and zope.app.keyreferences (or similar) come
after that.
- Use multiple-token values in your queries when possible, especially in your
transitive query factories.
- Use the cache when you are loading and dumping tokens, and in your
transitive query factories.
- When possible, don't load or dump tokens (the values themselves may be used
as tokens). This is especially important when you have multiple tokens:
store them in a BTree structure in the same module as the zc.relation module
for the value.
For some operations, particularly with hundreds or thousands of members in a
single relation value, some of these optimizations can speed up some
common-case reindexing work by around 100 times.
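The first point in the list, about ``multiunion``, can be illustrated with a small sketch of "use the best union available". This is not the code of ``zc.relation.catalog.multiunion``; it only shows the idea, assuming a tools dictionary shaped like the result of ``getRelationModuleTools``. The function name ``best_union`` and the plain-Python stand-in tools are assumptions made for the example.

```python
from functools import reduce


def best_union(tools, sets):
    """Union ``sets`` with the fastest tool available in ``tools``.

    ``tools`` is a dict with a 'Set' factory, a two-argument 'union', and
    possibly a 'multiunion' that takes a sequence of sets.
    """
    sets = [s for s in sets if s]          # skip None and empty sets
    if not sets:
        return tools['Set']()
    multiunion = tools.get('multiunion')   # missing or None: not available
    if multiunion is not None:
        return multiunion(sets)            # a single call does all the work
    return reduce(tools['union'], sets)    # pairwise fallback


# Plain-Python stand-ins for BTree module tools (assumptions for the demo).
int_tools = {
    'Set': frozenset,
    'union': lambda a, b: a | b,
    'multiunion': lambda seq: frozenset().union(*seq),
}
object_tools = {'Set': frozenset, 'union': lambda a, b: a | b}

groups = [frozenset({1, 2}), None, frozenset({2, 3}), frozenset()]
with_multi = best_union(int_tools, groups)
without_multi = best_union(object_tools, groups)
print(sorted(with_multi), sorted(without_multi))
```

Both calls produce the same set; the difference with real BTree modules is that the ``multiunion`` path does the merge in one C-level call instead of one call per pair.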
The easiest (and perhaps least useful) optimization is that all dump
calls and all load calls generated by a single operation share a cache
dictionary per call type (dump/load), per indexed relation value.
Therefore, for instance, we could stash an intids utility, so that we
only had to do a utility lookup once, and thereafter it was only a
single dictionary lookup. This is what the default `generateToken` and
`resolveToken` functions in zc.relationship's index.py do: look at them
for an example.
A further optimization is to not load or dump tokens at all, but use values
that may be tokens. This will be particularly useful if the tokens have
__cmp__ (or equivalent) in C, such as built-in types like ints. To specify
this behavior, you create an index with the 'load' and 'dump' values for the
indexed attribute descriptions explicitly set to None.
>>> import zope.interface
>>> class IRelation(zope.interface.Interface):
... subjects = zope.interface.Attribute(
... 'The sources of the relation; the subject of the sentence')
... relationtype = zope.interface.Attribute(
... '''unicode: the single relation type of this relation;
... usually contains the verb of the sentence.''')
... objects = zope.interface.Attribute(
... '''the targets of the relation; usually a direct or
... indirect object in the sentence''')
...
>>> import BTrees
>>> relations = BTrees.family32.IO.BTree()
>>> relations[99] = None # just to give us a start
>>> @zope.interface.implementer(IRelation)
... class Relation(object):
...
... def __init__(self, subjects, relationtype, objects):
... self.subjects = subjects
... assert relationtype in relTypes
... self.relationtype = relationtype
... self.objects = objects
... self.id = relations.maxKey() + 1
... relations[self.id] = self
... def __repr__(self):
... return '<%r %s %r>' % (
... self.subjects, self.relationtype, self.objects)
>>> def token(rel, self):
... return rel.token
...
>>> def children(rel, self):
... return rel.children
...
>>> def dumpRelation(obj, index, cache):
... return obj.id
...
>>> def loadRelation(token, index, cache):
... return relations[token]
...
>>> relTypes = ['has the role of']
>>> def relTypeDump(obj, index, cache):
... assert obj in relTypes, 'unknown relationtype'
... return obj
...
>>> def relTypeLoad(token, index, cache):
... assert token in relTypes, 'unknown relationtype'
... return token
...
>>> import zc.relation.catalog
>>> catalog = zc.relation.catalog.Catalog(
... dumpRelation, loadRelation)
>>> catalog.addValueIndex(IRelation['subjects'], multiple=True)
>>> catalog.addValueIndex(
... IRelation['relationtype'], relTypeDump, relTypeLoad,
... BTrees.family32.OI, name='reltype')
>>> catalog.addValueIndex(IRelation['objects'], multiple=True)
>>> import zc.relation.queryfactory
>>> factory = zc.relation.queryfactory.TransposingTransitive(
... 'subjects', 'objects')
>>> catalog.addDefaultQueryFactory(factory)
>>> rel = Relation((1,), 'has the role of', (2,))
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 1}))
[2]
If you have single relations that relate hundreds or thousands of
objects, it can be a huge win if the value is a 'multiple' of the same
type as the stored BTree for the given attribute. The default BTree
family for attributes is IFBTree; IOBTree is also a good choice, and may
be preferable for some applications.
>>> catalog.unindex(rel)
>>> rel = Relation(
... BTrees.family32.IF.TreeSet((1,)), 'has the role of',
... BTrees.family32.IF.TreeSet())
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 1}))
[]
>>> list(catalog.findValueTokens('subjects', {'objects': None}))
[1]
Reindexing is where some of the big improvements can happen. The following
gyrations exercise the optimization code.
>>> rel.objects.insert(2)
1
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 1}))
[2]
>>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5))
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 3}))
[2]
>>> rel.subjects.insert(6)
1
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 6}))
[2]
>>> rel.subjects.update(range(100, 200))
100
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 100}))
[2]
>>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5,6))
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 3}))
[2]
>>> rel.subjects = BTrees.family32.IF.TreeSet(())
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 3}))
[]
>>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5))
>>> catalog.index(rel)
>>> list(catalog.findValueTokens('objects', {'subjects': 3}))
[2]
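One way to see why reindexing large multi-valued relations can be cheap is to compare the old and new value sets and touch only the difference. The sketch below illustrates that idea with plain Python sets and a dict. It is an assumption about the general technique, not a copy of the catalog's code, and the name ``reindex_multiple`` is hypothetical.

```python
def reindex_multiple(index, rel_token, old_values, new_values):
    """Update ``index`` (value token -> set of relation tokens) in place.

    Only value tokens that were added or removed are touched; unchanged
    ones are left alone.
    """
    added = new_values - old_values
    removed = old_values - new_values
    for value in added:
        index.setdefault(value, set()).add(rel_token)
    for value in removed:
        rels = index[value]
        rels.discard(rel_token)
        if not rels:
            del index[value]  # drop empty entries
    return added, removed


# Relation 100 currently has subjects 3, 4 and 5.
index = {3: {100}, 4: {100}, 5: {100}}

# Adding one subject touches one index entry, not four.
added, removed = reindex_multiple(index, 100, {3, 4, 5}, {3, 4, 5, 6})
print(sorted(added), sorted(removed))  # [6] []

# Shrinking back to a single subject removes only the dropped entries.
added2, removed2 = reindex_multiple(index, 100, {3, 4, 5, 6}, {3})
print(sorted(added2), sorted(removed2))  # [] [4, 5, 6]
```

With BTree sets of the same module on both sides, the two differences can be computed by the BTree ``difference`` function, which is plausibly why matching the stored BTree type matters.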
tokenizeValues and resolveValueTokens work correctly without loaders and
dumpers--that is, they do nothing.
>>> catalog.tokenizeValues((3,4,5), 'subjects')
(3, 4, 5)
>>> catalog.resolveValueTokens((3,4,5), 'subjects')
(3, 4, 5)
=======
Changes
=======
2.1 (2024-12-09)
================
- Add support for Python 3.12, 3.13.
- Drop support for Python 3.7.
2.0 (2023-04-05)
================
- Drop support for Python 2.7, 3.5, 3.6.
[ale-rt]
- Fix the dependency on the ZODB, we just need to depend on the BTrees package.
Refs. #11.
[ale-rt]
1.2 (2023-03-28)
================
- Adapt code for PEP-479 (Change StopIteration handling inside generators).
See: https://peps.python.org/pep-0479.
Fixes #11.
[ale-rt]
1.1.post2 (2018-06-18)
======================
- Another attempt to fix PyPI page by using correct expected metadata syntax.
1.1.post1 (2018-06-18)
======================
- Fix PyPI page by using correct ReST syntax.
1.1 (2018-06-15)
================
- Add support for Python 3.5 and 3.6.
1.0 (2008-04-23)
================
This is the initial release of the zc.relation package. However, it
represents a refactoring of another package, zc.relationship. This
package contains only a modified version of the relation(ship) index,
now called a catalog. The refactored version of zc.relationship index
relies on (subclasses) this catalog. zc.relationship also maintains a
backwards-compatible subclass.
This package only relies on the ZODB, zope.interface, and zope.testing
software, and can be used inside or outside of a standard ZODB database.
The software does have to be there, though (the package relies heavily
on the ZODB BTrees package).
If you would like to switch a legacy zc.relationship index to a
zc.relation catalog, try this trick in your generations script.
Assuming the old index is ``old``, the following line should create
a new zc.relation catalog with your legacy data:
>>> new = old.copy(zc.relation.catalog.Catalog)
Why is the same basic data structure called a catalog now? Because we
exposed the ability to mutate the data structure, and what you are really
adding and removing are indexes. It didn't make sense to put an index in
an index, but it does make sense to put an index in a catalog. Thus, a
name change was born.
The catalog in this package has several incompatibilities from the earlier
zc.relationship index, and many new features. The zc.relationship package
maintains a backwards-compatible subclass. The following discussion
compares the zc.relation catalog with the zc.relationship 1.x index.
Incompatibilities with zc.relationship 1.x index
------------------------------------------------
The two big changes are that method names now refer to ``Relation`` rather
than ``Relationship``; and the catalog is instantiated slightly differently
from the index. A few other changes are worth your attention. The
following list attempts to highlight all incompatibilities.
:Big incompatibilities:
- ``findRelationshipTokenSet`` and ``findValueTokenSet`` are renamed, with
some slightly different semantics, as ``getRelationTokens`` and
``getValueTokens``. The exact same result as
``findRelationTokenSet(query)`` can be obtained with
``findRelationTokens(query, 1)`` (where 1 is maxDepth). The same
result as ``findValueTokenSet(reltoken, name)`` can be obtained with
``findValueTokens(name, {zc.relation.RELATION: reltoken}, 1)``.
- ``findRelations`` replaces ``findRelationships``. The new method will use
the defaultTransitiveQueriesFactory if it is set and maxDepth is not 1.
It shares the call signature of ``findRelationChains``.
- ``isLinked`` is now ``canFind``.
- The catalog instantiation arguments have changed from the old index.
* ``load`` and ``dump`` (formerly ``loadRel`` and ``dumpRel``,
respectively) are now required arguments for instantiation.
* The only other optional arguments are ``btree`` (was ``relFamily``) and
``family``. You now specify what elements to index with
``addValueIndex``.
* Note also that ``addValueIndex`` defaults to no load and dump function,
unlike the old instantiation options.
- Query factories are different. See ``IQueryFactory`` in the interfaces.
* They first get (query, catalog, cache) and then return a getQueries
callable that gets relchains and yields queries; OR None if they
don't match.
* They must also handle an empty relchain. Typically this should
return the original query, but may also be used to mutate the
original query.
* They are no longer thought of as transitive query factories, but as
general query mutators.
:Medium:
- The catalog no longer inherits from
zope.app.container.contained.Contained.
- The index requires ZODB 3.8 or higher.
:Small:
- ``deactivateSets`` is no longer an instantiation option (it was broken
because of a ZODB bug anyway, as had been described in the
documentation).
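The query factory protocol described under the big incompatibilities can be sketched in plain Python. The example follows the call signature given in that description, ``(query, catalog, cache)``; check ``IQueryFactory`` for the authoritative signature. The factory name ``make_walk_down_factory`` is hypothetical, and the sketch assumes that relation tokens are valid values for the searched index, as in an employee/supervisor catalog where each employee is both a relation and a possible supervisor.

```python
def make_walk_down_factory(name):
    """Build a query factory that walks a hierarchy through one index.

    Hypothetical example, not part of zc.relation.
    """
    def factory(query, catalog, cache):
        if list(query) != [name]:
            return None  # this factory does not apply to the query

        def getQueries(relchain):
            if not relchain:
                # Empty chain: start from the original query.
                yield query
            else:
                # Next hop: search from the last relation in the chain.
                yield {name: relchain[-1]}
        return getQueries
    return factory


factory = make_walk_down_factory('supervisor')
get_queries = factory({'supervisor': 'Alice'}, None, {})

first = list(get_queries(()))            # the original query
second = list(get_queries(('Betty',)))   # who reports to Betty?
unmatched = factory({'other': 1}, None, {})
print(first, second, unmatched)
```

Returning ``None`` for a query the factory does not understand is what lets the catalog fall back to an intransitive search.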
Changes and new features
------------------------
- The catalog now offers the ability to index certain
searches. The indexes must be explicitly instantiated and registered for
the searches you want to optimize. This can be used when searching for values, when
searching for relations, or when determining if two objects are
linked. It cannot be used for relation chains. Requesting an index
has the usual trade-offs of greater storage space and slower write
speed for faster search speed. Registering a search index is done
after instantiation time; you can iterate over the current settings
used, and remove them. (The code path expects to support legacy
zc.relationship index instances for all of these APIs.)
- You can now specify new values after the catalog has been created, iterate
over the settings used, and remove values.
- The catalog has a copy method, to quickly make new copies without actually
having to reindex the relations.
- Query arguments can now specify multiple values for a given name by
using zc.relation.catalog.any(1, 2, 3, 4) or
zc.relation.catalog.Any((1, 2, 3, 4)).
- The catalog supports specifying indexed values by passing callables rather
than interface elements (which are also still supported).
- ``findRelations`` and new method ``findRelationTokens`` can find
relations transitively and intransitively. ``findRelationTokens``
when used intransitively repeats the legacy zc.relationship index
behavior of ``findRelationTokenSet``.
(``findRelationTokenSet`` remains in the API, not deprecated, a companion
to ``findValueTokenSet``.)
- In findValues and findValueTokens, the ``query`` argument is now optional. If
the query evaluates to False in a boolean context, all values, or value
tokens, are returned. Value tokens are explicitly returned using the
underlying BTree storage. This can then be used directly for other BTree
operations.
- Completely new docs. Unfortunately, still really not good enough.
- The package has drastically reduced direct dependencies from zc.relationship:
it is now more clearly a ZODB tool, with no other Zope dependencies than
zope.testing and zope.interface.
- Listeners allow objects to listen to messages from the catalog (which can
be used directly or, for instance, to fire off events).
- You can search for relations, using a key of zc.relation.RELATION...which is
really an alias for None. Sorry. But hey, use the constant! I think it is
more readable.
- tokenizeQuery (and resolveQuery) now accept keyword arguments as an
alternative to a normal dict query. This can make constructing the query
a bit more attractive (i.e., ``query = catalog.tokenizeQuery;
res = catalog.findValues('object', query(subject=joe, predicate=OWNS))``).
Raw data
{
"_id": null,
"home_page": "https://github.com/zopefoundation/zc.relation",
"name": "zc.relation",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "zope zope3 relation",
"author": "Gary Poster",
"author_email": "zope-dev@zope.dev",
"download_url": "https://files.pythonhosted.org/packages/0c/4c/aea699b82fcd6277cf930f86b1d5fcb3f915de0b0ef338c1b258fdf3943f/zc_relation-2.1.tar.gz",
"platform": null,
"description": "================\nRelation Catalog\n================\n\n.. contents::\n\nOverview\n========\n\nThe relation catalog can be used to optimize intransitive and transitive\nsearches for N-ary relations of finite, preset dimensions.\n\nFor example, you can index simple two-way relations, like employee to\nsupervisor; RDF-style triples of subject-predicate-object; and more complex\nrelations such as subject-predicate-object with context and state. These\ncan be searched with variable definitions of transitive behavior.\n\nThe catalog can be used in the ZODB or standalone. It is a generic, relatively\npolicy-free tool.\n\nIt is expected to be used usually as an engine for more specialized and\nconstrained tools and APIs. Three such tools are zc.relationship containers,\nplone.relations containers, and zc.vault. The documents in the package,\nincluding this one, describe other possible uses.\n\nHistory\n=======\n\nThis is a refactoring of the ZODB-only parts of the zc.relationship package.\nSpecifically, the zc.relation catalog is largely equivalent to the\nzc.relationship index. The index in the zc.relationship 2.x line is an\nalmost-completely backwards-compatible wrapper of the zc.relation catalog.\nzc.relationship will continue to be maintained, though active development is\nexpected to go into zc.relation.\n\nMany of the ideas come from discussions with and code from Casey Duncan, Tres\nSeaver, Ken Manheimer, and more.\n\nSetting Up a Relation Catalog\n=============================\n\nIn this section, we will be introducing the following ideas.\n\n- Relations are objects with indexed values.\n\n- You add value indexes to relation catalogs to be able to search. Values\n can be identified to the catalog with callables or interface elements. 
The\n indexed value must be specified to the catalog as a single value or a\n collection.\n\n- Relations and their values are stored in the catalog as tokens: unique\n identifiers that you can resolve back to the original value. Integers are the\n most efficient tokens, but others can work fine too.\n\n- Token type determines the BTree module needed.\n\n- You must define your own functions for tokenizing and resolving tokens. These\n functions are registered with the catalog for the relations and for each of\n their value indexes.\n\n- Relations are indexed with ``index``.\n\nWe will use a simple two way relation as our example here. A brief introduction\nto a more complex RDF-style subject-predicate-object set up can be found later\nin the document.\n\nCreating the Catalog\n--------------------\n\nImagine a two way relation from one value to another. Let's say that we\nare modeling a relation of people to their supervisors: an employee may\nhave a single supervisor. For this first example, the relation between\nemployee and supervisor will be intrinsic: the employee has a pointer to\nthe supervisor, and the employee object itself represents the relation.\n\nLet's say further, for simplicity, that employee names are unique and\ncan be used to represent employees. We can use names as our \"tokens\".\n\nTokens are similar to the primary key in a relational database. A token is a\nway to identify an object. It must sort reliably and you must be able to write\na callable that reliably resolves to the object given the right context. 
In\nZope 3, intids (zope.app.intid) and keyreferences (zope.app.keyreference) are\ngood examples of reasonable tokens.\n\nAs we'll see below, you provide a way to convert objects to tokens, and resolve\ntokens to objects, for the relations, and for each value index individually.\nThey can be the all the same functions or completely different, depending on\nyour needs.\n\nFor speed, integers make the best tokens; followed by other\nimmutables like strings; followed by non-persistent objects; followed by\npersistent objects. The choice also determines a choice of BTree module, as\nwe'll see below.\n\nHere is our toy ``Employee`` example class. Again, we will use the employee\nname as the tokens.\n\n >>> employees = {} # we'll use this to resolve the \"name\" tokens\n >>> from functools import total_ordering\n >>> @total_ordering\n ... class Employee(object):\n ... def __init__(self, name, supervisor=None):\n ... if name in employees:\n ... raise ValueError('employee with same name already exists')\n ... self.name = name # expect this to be readonly\n ... self.supervisor = supervisor\n ... employees[name] = self\n ... # the next parts just make the tests prettier\n ... def __repr__(self):\n ... return '<Employee instance \"' + self.name + '\">'\n ... def __lt__(self, other):\n ... return self.name < other.name\n ... def __eq__(self, other):\n ... return self is other\n ... def __hash__(self):\n ... ''' Dummy method needed because we defined __eq__\n ... '''\n ... return 1\n ...\n\nSo, we need to define how to turn employees into their tokens. We call the\ntokenization a \"dump\" function. Conversely, the function to resolve tokens into\nobjects is called a \"load\".\n\nFunctions to dump relations and values get several arguments. The first\nargument is the object to be tokenized. Next, because it helps sometimes to\nprovide context, is the catalog. The last argument is a dictionary that will be\nshared for a given search. 
The dictionary can be ignored, or used as a cache\nfor optimizations (for instance, to stash a utility that you looked up).\n\nFor this example, our function is trivial: we said the token would be\nthe employee's name.\n\n >>> def dumpEmployees(emp, catalog, cache):\n ... return emp.name\n ...\n\nIf you store the relation catalog persistently (e.g., in the ZODB) be aware\nthat the callables you provide must be picklable--a module-level function,\nfor instance.\n\nWe also need a way to turn tokens into employees, or \"load\".\n\nThe \"load\" functions get the token to be resolved; the catalog, for\ncontext; and a dict cache, for optimizations of subsequent calls.\n\nYou might have noticed in our ``Employee.__init__`` that we keep a mapping\nof name to object in the ``employees`` global dict (defined right above\nthe class definition). We'll use that for resolving the tokens.\n\n >>> def loadEmployees(token, catalog, cache):\n ... return employees[token]\n ...\n\nNow we know enough to get started with a catalog. We'll instantiate it\nby specifying how to tokenize relations, and what kind of BTree modules\nshould be used to hold the tokens.\n\nHow do you pick BTree modules?\n\n- If the tokens are 32-bit ints, choose ``BTrees.family32.II``,\n ``BTrees.family32.IF`` or ``BTrees.family32.IO``.\n\n- If the tokens are 64 bit ints, choose ``BTrees.family64.II``,\n ``BTrees.family64.IF`` or ``BTrees.family64.IO``.\n\n- If they are anything else, choose ``BTrees.family32.OI``,\n ``BTrees.family64.OI``, or ``BTrees.family32.OO`` (or\n ``BTrees.family64.OO``--they are the same).\n\nWithin these rules, the choice is somewhat arbitrary unless you plan to merge\nthese results with that of another source that is using a particular BTree\nmodule. BTree set operations only work within the same module, so you must\nmatch module to module. The catalog defaults to IF trees, because that's what\nstandard zope catalogs use. 
That's as reasonable a choice as any, and will\npotentially come in handy if your tokens are in fact the same as those used by\nthe zope catalog and you want to do some set operations.\n\nIn this example, our tokens are strings, so we want OO or an OI variant. We'll\nchoose BTrees.family32.OI, arbitrarily.\n\n >>> import zc.relation.catalog\n >>> import BTrees\n >>> catalog = zc.relation.catalog.Catalog(dumpEmployees, loadEmployees,\n ... btree=BTrees.family32.OI)\n\n[#verifyObjectICatalog]_\n\n.. [#verifyObjectICatalog] The catalog provides ICatalog.\n\n >>> from zope.interface.verify import verifyObject\n >>> import zc.relation.interfaces\n >>> verifyObject(zc.relation.interfaces.ICatalog, catalog)\n True\n\n[#legacy]_\n\n\n.. [#legacy] Old instances of zc.relationship indexes, which in the newest\n version subclass a zc.relation Catalog, used to have a dict in an\n internal data structure. We specify that here so that the code that\n converts the dict to an OOBTree can have a chance to run.\n\n >>> catalog._attrs = dict(catalog._attrs)\n\nLook! A relation catalog! We can't do very\nmuch searching with it so far though, because the catalog doesn't have any\nindexes.\n\nIn this example, the relation itself represents the employee, so we won't need\nto index that separately.\n\nBut we do need a way to tell the catalog how to find the other end of the\nrelation, the supervisor. You can specify this to the catalog with an attribute\nor method specified from ``zope.interface Interface``, or with a callable.\nWe'll use a callable for now. The callable will receive the indexed relation\nand the catalog for context.\n\n >>> def supervisor(emp, catalog):\n ... return emp.supervisor # None or another employee\n ...\n\nWe'll also need to specify how to tokenize (dump and load) those values. 
In\nthis case, we're able to use the same functions as the relations themselves.\nHowever, do note that we can specify a completely different way to dump and\nload for each \"value index,\" or relation element.\n\nWe could also specify the name to call the index, but it will default to the\n``__name__`` of the function (or interface element), which will work just fine\nfor us now.\n\nNow we can add the \"supervisor\" value index.\n\n >>> catalog.addValueIndex(supervisor, dumpEmployees, loadEmployees,\n ... btree=BTrees.family32.OI)\n\nNow we have an index [#addValueIndexExceptions]_.\n\n.. [#addValueIndexExceptions] Adding a value index can generate several\n exceptions.\n\n You must supply both of dump and load or neither.\n\n >>> catalog.addValueIndex(supervisor, dumpEmployees, None,\n ... btree=BTrees.family32.OI, name='supervisor2')\n Traceback (most recent call last):\n ...\n ValueError: either both of 'dump' and 'load' must be None, or neither\n\n In this example, even if we fix it, we'll get an error, because we have\n already indexed the supervisor function.\n\n >>> catalog.addValueIndex(supervisor, dumpEmployees, loadEmployees,\n ... btree=BTrees.family32.OI, name='supervisor2')\n ... # doctest: +ELLIPSIS\n Traceback (most recent call last):\n ...\n ValueError: ('element already indexed', <function supervisor at ...>)\n\n You also can't add a different function under the same name.\n\n >>> def supervisor2(emp, catalog):\n ... return emp.supervisor # None or another employee\n ...\n >>> catalog.addValueIndex(supervisor2, dumpEmployees, loadEmployees,\n ... btree=BTrees.family32.OI, name='supervisor')\n ... # doctest: +ELLIPSIS\n Traceback (most recent call last):\n ...\n ValueError: ('name already used', 'supervisor')\n\n Finally, if your function does not have a ``__name__`` and you do not\n provide one, you may not add an index.\n\n >>> class Supervisor3(object):\n ... __name__ = None\n ... def __call__(klass, emp, catalog):\n ... 
return emp.supervisor\n ...\n >>> supervisor3 = Supervisor3()\n >>> supervisor3.__name__\n >>> catalog.addValueIndex(supervisor3, dumpEmployees, loadEmployees,\n ... btree=BTrees.family32.OI)\n ... # doctest: +ELLIPSIS\n Traceback (most recent call last):\n ...\n ValueError: no name specified\n\n >>> [info['name'] for info in catalog.iterValueIndexInfo()]\n ['supervisor']\n\nAdding Relations\n----------------\n\nNow let's create a few employees. All but one will have supervisors.\nIf you recall our toy ``Employee`` class, the first argument to the\nconstructor is the employee name (and therefore the token), and the\noptional second argument is the supervisor.\n\n >>> a = Employee('Alice')\n >>> b = Employee('Betty', a)\n >>> c = Employee('Chuck', a)\n >>> d = Employee('Diane', b)\n >>> e = Employee('Edgar', b)\n >>> f = Employee('Frank', c)\n >>> g = Employee('Galyn', c)\n >>> h = Employee('Howie', d)\n\nHere is a diagram of the hierarchy.\n\n::\n\n Alice\n __/ \\__\n Betty Chuck\n / \\ / \\\n Diane Edgar Frank Galyn\n |\n Howie\n\nLet's tell the catalog about the relations, using the ``index`` method.\n\n >>> for emp in (a,b,c,d,e,f,g,h):\n ... catalog.index(emp)\n ...\n\nWe've now created the relation catalog and added relations to it. We're ready\nto search!\n\nSearching\n=========\n\nIn this section, we will introduce the following ideas.\n\n- Queries to the relation catalog are formed with dicts.\n\n- Query keys are the names of the indexes you want to search, or, for the\n special case of precise relations, the ``zc.relation.RELATION`` constant.\n\n- Query values are the tokens of the results you want to match; or ``None``,\n indicating relations that have ``None`` as a value (or an empty collection,\n if it is a multiple). 
Search values can use\n ``zc.relation.catalog.any(args)`` or ``zc.relation.catalog.Any(args)`` to\n specify multiple (non-``None``) results to match for a given key.\n\n- The index has a variety of methods to help you work with tokens.\n ``tokenizeQuery`` is typically the most used, though others are available.\n\n- To find relations that match a query, use ``findRelations`` or\n ``findRelationTokens``.\n\n- To find values that match a query, use ``findValues`` or ``findValueTokens``.\n\n- You search transitively by using a query factory. The\n ``zc.relation.queryfactory.TransposingTransitive`` is a good common case\n factory that lets you walk up and down a hierarchy. A query factory can be\n passed in as an argument to search methods as a ``queryFactory``, or\n installed as a default behavior using ``addDefaultQueryFactory``.\n\n- To find how a query is related, use ``findRelationChains`` or\n ``findRelationTokenChains``.\n\n- To find out if a query is related, use ``canFind``.\n\n- Circular transitive relations are handled to prevent infinite loops. They\n are identified in ``findRelationChains`` and ``findRelationTokenChains`` with\n a ``zc.relation.interfaces.ICircularRelationPath`` marker interface.\n\n- search methods share the following arguments:\n\n * ``maxDepth``, limiting the transitive depth for searches;\n\n * ``filter``, allowing code to filter transitive paths;\n\n * ``targetQuery``, allowing a query to filter transitive paths on the basis\n of the endpoint;\n\n * ``targetFilter``, allowing code to filter transitive paths on the basis of\n the endpoint; and\n\n * ``queryFactory``, mentioned above.\n\n- You can set up search indexes to speed up specific transitive searches.\n\nQueries, ``findRelations``, and special query values\n----------------------------------------------------\n\nSo who works for Alice? That means we want to get the relations--the\nemployees--with a ``supervisor`` of Alice.\n\nThe heart of a question to the catalog is a query. 
A query is spelled\nas a dictionary. The main idea is simply that keys in a dictionary\nspecify index names, and the values specify the constraints.\n\nThe values in a query are always expressed with tokens. The catalog has\nseveral helpers to make this less onerous, but for now let's take\nadvantage of the fact that our tokens are easily comprehensible.\n\n >>> sorted(catalog.findRelations({'supervisor': 'Alice'}))\n [<Employee instance \"Betty\">, <Employee instance \"Chuck\">]\n\nAlice is the direct (intransitive) boss of Betty and Chuck.\n\nWhat if you want to ask \"who doesn't report to anyone?\" Then you want to\nask for a relation in which the supervisor is None.\n\n >>> list(catalog.findRelations({'supervisor': None}))\n [<Employee instance \"Alice\">]\n\nAlice is the only employee who doesn't report to anyone.\n\nWhat if you want to ask \"who reports to Diane or Chuck?\" Then you use the\nzc.relation ``Any`` class or ``any`` function to pass the multiple values.\n\n >>> sorted(catalog.findRelations(\n ... {'supervisor': zc.relation.catalog.any('Diane', 'Chuck')}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Employee instance \"Frank\">, <Employee instance \"Galyn\">,\n <Employee instance \"Howie\">]\n\nFrank, Galyn, and Howie each report to either Diane or Chuck. [#any]_\n\n.. [#any] ``Any`` can be compared.\n\n >>> zc.relation.catalog.any('foo', 'bar', 'baz')\n <zc.relation.catalog.Any instance ('bar', 'baz', 'foo')>\n >>> (zc.relation.catalog.any('foo', 'bar', 'baz') ==\n ... zc.relation.catalog.any('bar', 'foo', 'baz'))\n True\n >>> (zc.relation.catalog.any('foo', 'bar', 'baz') !=\n ... zc.relation.catalog.any('bar', 'foo', 'baz'))\n False\n >>> (zc.relation.catalog.any('foo', 'bar', 'baz') ==\n ... zc.relation.catalog.any('foo', 'baz'))\n False\n >>> (zc.relation.catalog.any('foo', 'bar', 'baz') !=\n ... 
zc.relation.catalog.any('foo', 'baz'))\n True\n\n\n\n``findValues`` and the ``RELATION`` query key\n---------------------------------------------\n\nSo how do we find who an employee's supervisor is? Well, in this case,\nlook at the attribute on the employee! If you can use an attribute, that\nwill usually be a win in the ZODB.\n\n >>> h.supervisor\n <Employee instance \"Diane\">\n\nAgain, as we mentioned at the start of this first example, the knowledge\nof a supervisor is \"intrinsic\" to the employee instance. It is\npossible, and even easy, to ask the catalog this kind of question, but\nthe catalog syntax is more geared to \"extrinsic\" relations, such as the\none from the supervisor to the employee: the connection between a\nsupervisor object and its employees is extrinsic to the supervisor, so\nyou actually might want a catalog to find it!\n\nHowever, we will explore the syntax very briefly, because it introduces an\nimportant pair of search methods, and because it is a stepping stone\nto our first transitive search.\n\nSo, O relation catalog, who is Howie's supervisor?\n\nTo ask this question we want to get the indexed values off of the relations:\n``findValues``. In its simplest form, the arguments are the index name of the\nvalues you want, and a query to find the relations that have the desired\nvalues.\n\nWhat about the query? Above, we noted that the keys in a query are the names of\nthe indexes to search. However, in this case, we don't want to search one or\nmore indexes for matching relations, as usual, but actually specify a relation:\nHowie.\n\nWe do not have a value index name: we are looking for a relation. The query\nkey, then, should be the constant ``zc.relation.RELATION``. For our current\nexample, that would mean the query is ``{zc.relation.RELATION: 'Howie'}``.\n\n >>> import zc.relation\n >>> list(catalog.findValues(\n ... 
'supervisor', {zc.relation.RELATION: 'Howie'}))[0]\n <Employee instance \"Diane\">\n\nCongratulations, you just found an obfuscated and comparatively\ninefficient way to write ``howie.supervisor``! [#intrinsic_search]_\n\n.. [#intrinsic_search] Here's the same with token results.\n\n >>> list(catalog.findValueTokens('supervisor',\n ... {zc.relation.RELATION: 'Howie'}))\n ['Diane']\n\n While we're down here in the footnotes, I'll mention that you can\n search for relations that haven't been indexed.\n\n >>> list(catalog.findRelationTokens({zc.relation.RELATION: 'Ygritte'}))\n []\n >>> list(catalog.findRelations({zc.relation.RELATION: 'Ygritte'}))\n []\n\n[#findValuesExceptions]_\n\n\n.. [#findValuesExceptions] If you use ``findValues`` or ``findValueTokens`` and\n try to specify a value name that is not indexed, you get a ``ValueError``.\n\n >>> catalog.findValues('foo')\n Traceback (most recent call last):\n ...\n ValueError: ('name not indexed', 'foo')\n\n\nSlightly more usefully, you can use other query keys along with\n``zc.relation.RELATION``. This asks, \"Of Betty, Alice, and Frank, who are\nsupervised by Alice?\"\n\n >>> sorted(catalog.findRelations(\n ... {zc.relation.RELATION: zc.relation.catalog.any(\n ... 'Betty', 'Alice', 'Frank'),\n ... 'supervisor': 'Alice'}))\n [<Employee instance \"Betty\">]\n\nOnly Betty is.\n\nTokens\n------\n\nAs mentioned above, the catalog provides several helpers to work with tokens.\nThe most frequently used is ``tokenizeQuery``, which takes a query with object\nvalues and converts them to tokens using the \"dump\" functions registered for\nthe relations and indexed values. Here are alternate spellings of some of the\nqueries we've encountered above.\n\n >>> catalog.tokenizeQuery({'supervisor': a})\n {'supervisor': 'Alice'}\n >>> catalog.tokenizeQuery({'supervisor': None})\n {'supervisor': None}\n >>> import pprint\n >>> result = catalog.tokenizeQuery(\n ... {zc.relation.RELATION: zc.relation.catalog.any(a, b, f),\n ... 
'supervisor': a}) # doctest: +NORMALIZE_WHITESPACE\n >>> pprint.pprint(result)\n {None: <zc.relation.catalog.Any instance ('Alice', 'Betty', 'Frank')>,\n 'supervisor': 'Alice'}\n\n(If you are wondering about that ``None`` in the last result, yes,\n``zc.relation.RELATION`` is just readability sugar for ``None``.)\n\nSo, here's a real search using ``tokenizeQuery``. We'll make an alias for\n``catalog.tokenizeQuery`` just to shorten things up a bit.\n\n >>> query = catalog.tokenizeQuery\n >>> sorted(catalog.findRelations(query(\n ... {zc.relation.RELATION: zc.relation.catalog.any(a, b, f),\n ... 'supervisor': a})))\n [<Employee instance \"Betty\">]\n\nThe catalog always has parallel search methods, one for finding objects, as\nseen above, and one for finding tokens (the only exception is ``canFind``,\ndescribed below). Finding tokens can be much more efficient, especially if the\nresult from the relation catalog is just one step along the path of finding\nyour desired result. But finding objects is simpler for some common cases.\nHere's a quick example of some queries above, getting tokens rather than\nobjects.\n\nYou can also spell a query in ``tokenizeQuery`` with keyword arguments. This\nwon't work if your key is ``zc.relation.RELATION``, but otherwise it can\nimprove readability. We'll see some examples of this below as well.\n\n >>> sorted(catalog.findRelationTokens(query(supervisor=a)))\n ['Betty', 'Chuck']\n\n >>> sorted(catalog.findRelationTokens({'supervisor': None}))\n ['Alice']\n\n >>> sorted(catalog.findRelationTokens(\n ... query(supervisor=zc.relation.catalog.any(c, d))))\n ['Frank', 'Galyn', 'Howie']\n\n >>> sorted(catalog.findRelationTokens(\n ... query({zc.relation.RELATION: zc.relation.catalog.any(a, b, f),\n ... 
'supervisor': a})))\n ['Betty']\n\nThe catalog provides several other methods just for working with tokens.\n\n- ``resolveQuery``: the inverse of ``tokenizeQuery``, converting a\n tokenized query to a query with objects.\n\n- ``tokenizeValues``: returns an iterable of tokens for the values of the given\n index name.\n\n- ``resolveValueTokens``: returns an iterable of values for the tokens of the\n given index name.\n\n- ``tokenizeRelation``: returns a token for the given relation.\n\n- ``resolveRelationToken``: returns a relation for the given token.\n\n- ``tokenizeRelations``: returns an iterable of tokens for the relations given.\n\n- ``resolveRelationTokens``: returns an iterable of relations for the tokens\n given.\n\nThese methods are less frequently used, and are described in more technical\ndocuments in this package.\n\nTransitive Searching, Query Factories, and ``maxDepth``\n-------------------------------------------------------\n\nSo, we've seen a lot of one-level, intransitive searching. What about\ntransitive searching? Well, you need to tell the catalog how to walk the tree.\nIn simple (and very common) cases like this, the\n``zc.relation.queryfactory.TransposingTransitive`` will do the trick.\n\nA transitive query factory is just a callable that the catalog uses to\nask \"I got this query, and here are the results I found. I'm supposed to\nwalk another step transitively, so what query should I search for next?\"\nWriting a factory is more complex than we want to talk about right now,\nbut using the ``TransposingTransitive`` factory is easy. You just tell\nit the two query names it should transpose for walking in either\ndirection.\n\nFor instance, here we just want to tell the factory to transpose the two keys\nwe've used, ``zc.relation.RELATION`` and ``'supervisor'``. 
Let's make a factory,\nuse it in a query for a couple of transitive searches, and then, if you want,\nyou can read through a footnote to talk through what is happening.\n\nHere's the factory.\n\n >>> import zc.relation.queryfactory\n >>> factory = zc.relation.queryfactory.TransposingTransitive(\n ... zc.relation.RELATION, 'supervisor')\n\nNow ``factory`` is just a callable. Let's let it help answer a couple of\nquestions.\n\nWho are all of Howie's supervisors transitively (this looks up in the\ndiagram)?\n\n >>> list(catalog.findValues('supervisor', {zc.relation.RELATION: 'Howie'},\n ... queryFactory=factory))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Employee instance \"Diane\">, <Employee instance \"Betty\">,\n <Employee instance \"Alice\">]\n\nWho are all of the people Betty supervises transitively, breadth first (this\nlooks down in the diagram)?\n\n >>> people = list(catalog.findRelations(\n ... {'supervisor': 'Betty'}, queryFactory=factory))\n >>> sorted(people[:2])\n [<Employee instance \"Diane\">, <Employee instance \"Edgar\">]\n >>> people[2]\n <Employee instance \"Howie\">\n\nYup, that looks right. So how did that work? If you care, read this\nfootnote. [#I_care]_\n\nThis transitive factory is really the only transitive factory you would\nwant for this particular catalog, so it probably is safe to wire it in\nas a default. You can add multiple query factories to match different\nqueries using ``addDefaultQueryFactory``.\n\n >>> catalog.addDefaultQueryFactory(factory)\n\nNow all searches are transitive by default.\n\n >>> list(catalog.findValues('supervisor', {zc.relation.RELATION: 'Howie'}))\n ... 
# doctest: +NORMALIZE_WHITESPACE\n [<Employee instance \"Diane\">, <Employee instance \"Betty\">,\n <Employee instance \"Alice\">]\n >>> people = list(catalog.findRelations({'supervisor': 'Betty'}))\n >>> sorted(people[:2])\n [<Employee instance \"Diane\">, <Employee instance \"Edgar\">]\n >>> people[2]\n <Employee instance \"Howie\">\n\nWe can force a non-transitive search, or a specific search depth, with\n``maxDepth`` [#needs_a_transitive_queries_factory]_.\n\n\n.. [#needs_a_transitive_queries_factory] A search with a ``maxDepth`` > 1 but\n no ``queryFactory`` raises an error.\n\n >>> catalog.removeDefaultQueryFactory(factory)\n >>> catalog.findRelationTokens({'supervisor': 'Diane'}, maxDepth=3)\n Traceback (most recent call last):\n ...\n ValueError: if maxDepth not in (None, 1), queryFactory must be available\n\n >>> catalog.addDefaultQueryFactory(factory)\n\n >>> list(catalog.findValues(\n ... 'supervisor', {zc.relation.RELATION: 'Howie'}, maxDepth=1))\n [<Employee instance \"Diane\">]\n >>> sorted(catalog.findRelations({'supervisor': 'Betty'}, maxDepth=1))\n [<Employee instance \"Diane\">, <Employee instance \"Edgar\">]\n\n[#maxDepthExceptions]_\n\n\n.. [#maxDepthExceptions] ``maxDepth`` must be None or a positive integer, or\n else you'll get a value error.\n\n >>> catalog.findRelations({'supervisor': 'Betty'}, maxDepth=0)\n Traceback (most recent call last):\n ...\n ValueError: maxDepth must be None or a positive integer\n\n >>> catalog.findRelations({'supervisor': 'Betty'}, maxDepth=-1)\n Traceback (most recent call last):\n ...\n ValueError: maxDepth must be None or a positive integer\n\nWe'll introduce some other available search\narguments later in this document and in other documents. It's important\nto note that *all search methods share the same arguments as\n``findRelations``*. 
``findValues`` and ``findValueTokens`` only add the\ninitial argument of specifying the desired value.\n\nWe've looked at two search methods so far: the ``findValues`` and\n``findRelations`` methods help you ask what is related. But what if you\nwant to know *how* things are transitively related?\n\n``findRelationChains`` and ``targetQuery``\n------------------------------------------\n\nAnother search method, ``findRelationChains``, helps you discover how\nthings are transitively related.\n\nThe method name says \"find relation chains\". But what is a \"relation\nchain\"? In this API, it is a transitive path of relations. For\ninstance, what's the chain of command above Howie? ``findRelationChains``\nwill return each unique path.\n\n >>> list(catalog.findRelationChains({zc.relation.RELATION: 'Howie'}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [(<Employee instance \"Howie\">,),\n (<Employee instance \"Howie\">, <Employee instance \"Diane\">),\n (<Employee instance \"Howie\">, <Employee instance \"Diane\">,\n <Employee instance \"Betty\">),\n (<Employee instance \"Howie\">, <Employee instance \"Diane\">,\n <Employee instance \"Betty\">, <Employee instance \"Alice\">)]\n\nLook at that result carefully. Notice that the result is an iterable of\ntuples. Each tuple is a unique chain, which may be a part of a\nsubsequent chain. In this case, the last chain is the longest and the\nmost comprehensive.\n\nWhat if we wanted to see all the paths from Alice? That will be one\nchain for each supervised employee, because it shows all possible paths.\n\n >>> sorted(catalog.findRelationChains(\n ... {'supervisor': 'Alice'}))\n ... 
# doctest: +NORMALIZE_WHITESPACE\n [(<Employee instance \"Betty\">,),\n (<Employee instance \"Betty\">, <Employee instance \"Diane\">),\n (<Employee instance \"Betty\">, <Employee instance \"Diane\">,\n <Employee instance \"Howie\">),\n (<Employee instance \"Betty\">, <Employee instance \"Edgar\">),\n (<Employee instance \"Chuck\">,),\n (<Employee instance \"Chuck\">, <Employee instance \"Frank\">),\n (<Employee instance \"Chuck\">, <Employee instance \"Galyn\">)]\n\nThat's all the paths--all the chains--from Alice. We sorted the results,\nbut normally they would be breadth first.\n\nBut what if we wanted to just find the paths from one query result to\nanother query result--say, we wanted to know the chain of command from Alice\ndown to Howie? Then we can specify a ``targetQuery`` that specifies the\ncharacteristics of our desired end point (or points).\n\n >>> list(catalog.findRelationChains(\n ... {'supervisor': 'Alice'},\n ... targetQuery={zc.relation.RELATION: 'Howie'}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [(<Employee instance \"Betty\">, <Employee instance \"Diane\">,\n <Employee instance \"Howie\">)]\n\nSo, Betty supervises Diane, who supervises Howie.\n\nNote that ``targetQuery`` now joins ``maxDepth`` in the collection of shared\nsearch arguments that we have introduced so far.\n\n``filter`` and ``targetFilter``\n-------------------------------\n\nWe can take a quick look now at the last two of the shared search arguments:\n``filter`` and ``targetFilter``. These two are similar in that they both are\ncallables that can approve or reject given relations in a search based on\nwhatever logic you can code. They differ in that a rejection by ``filter``\nstops any further transitive searches from the relation, while a rejection by\n``targetFilter`` merely omits the given result but allows further search from it. 
Like ``targetQuery``, then,\n``targetFilter`` is good when you want to specify the other end of a path.\n\nAs an example, let's say we only want to return female employees.\n\n >>> female_employees = ('Alice', 'Betty', 'Diane', 'Galyn')\n >>> def female_filter(relchain, query, catalog, cache):\n ... return relchain[-1] in female_employees\n ...\n\nHere are all the female employees supervised by Alice transitively, using\n``targetFilter``.\n\n >>> list(catalog.findRelations({'supervisor': 'Alice'},\n ... targetFilter=female_filter))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Employee instance \"Betty\">, <Employee instance \"Diane\">,\n <Employee instance \"Galyn\">]\n\nHere are all the female employees supervised by Chuck.\n\n >>> list(catalog.findRelations({'supervisor': 'Chuck'},\n ... targetFilter=female_filter))\n [<Employee instance \"Galyn\">]\n\nThe same method used as a filter will only return females directly\nsupervised by other females--not Galyn, in this case.\n\n >>> list(catalog.findRelations({'supervisor': 'Alice'},\n ... filter=female_filter))\n [<Employee instance \"Betty\">, <Employee instance \"Diane\">]\n\nThese can be combined with one another, and with the other search\narguments [#filter]_.\n\n.. [#filter] For instance:\n\n >>> list(catalog.findRelationTokens(\n ... {'supervisor': 'Alice'}, targetFilter=female_filter,\n ... targetQuery={zc.relation.RELATION: 'Galyn'}))\n ['Galyn']\n >>> list(catalog.findRelationTokens(\n ... {'supervisor': 'Alice'}, targetFilter=female_filter,\n ... targetQuery={zc.relation.RELATION: 'Not known'}))\n []\n >>> arbitrary = ['Alice', 'Chuck', 'Betty', 'Galyn']\n >>> def arbitrary_filter(relchain, query, catalog, cache):\n ... return relchain[-1] in arbitrary\n >>> list(catalog.findRelationTokens({'supervisor': 'Alice'},\n ... filter=arbitrary_filter,\n ... 
targetFilter=female_filter))\n ['Betty', 'Galyn']\n\nSearch indexes\n--------------\n\nWithout setting up any additional indexes, the transitive behavior of\nthe ``findRelations`` and ``findValues`` methods essentially relies on the\nbrute force searches of ``findRelationChains``. Results are iterables\nthat are gradually computed. For instance, let's repeat the question\n\"Whom does Betty supervise?\". Notice that ``res`` first populates a list\nwith three members, but then does not populate a second list. The\niterator has been exhausted.\n\n >>> res = catalog.findRelationTokens({'supervisor': 'Betty'})\n >>> unindexed = sorted(res)\n >>> len(unindexed)\n 3\n >>> len(list(res)) # iterator is exhausted\n 0\n\nThe brute force of this approach can be sufficient in many cases, but\nsometimes speed for these searches is critical. In these cases, you can\nadd a \"search index\". A search index speeds up the result of one or\nmore precise searches by indexing the results. Search indexes can\naffect the results of searches with a ``queryFactory`` in ``findRelations``,\n``findValues``, and the soon-to-be-introduced ``canFind``, but they do not\naffect ``findRelationChains``.\n\nThe zc.relation package currently includes two kinds of search indexes, one for\nindexing transitive membership searches in a hierarchy and one for intransitive\nsearches explored in tokens.rst in this package, which can optimize frequent\nsearches on complex queries or can effectively change the meaning of an\nintransitive search. Other search index implementations and approaches may be\nadded in the future.\n\nHere's a very brief example of adding a search index for the transitive\nsearches seen above that specify a 'supervisor'.\n\n >>> import zc.relation.searchindex\n >>> catalog.addSearchIndex(\n ... zc.relation.searchindex.TransposingTransitiveMembership(\n ... 'supervisor', zc.relation.RELATION))\n\nThe ``zc.relation.RELATION`` describes how to walk back up the chain. 
Search\nindexes are explained in reasonable detail in searchindex.rst.\n\nNow that we have added the index, we can search again. The result this\ntime is already computed, so, at least when you ask for tokens, it\nis repeatable.\n\n >>> res = catalog.findRelationTokens({'supervisor': 'Betty'})\n >>> len(list(res))\n 3\n >>> len(list(res))\n 3\n >>> sorted(res) == unindexed\n True\n\nNote that the breadth-first sorting is lost when an index is used [#updates]_.\n\n.. [#updates] The scenario we are looking at in this document shows a case\n in which special logic in the search index needs to address updates.\n For example, if we move Howie from Diane\n\n ::\n\n Alice\n __/ \\__\n Betty Chuck\n / \\ / \\\n Diane Edgar Frank Galyn\n |\n Howie\n\n to Galyn\n\n ::\n\n Alice\n __/ \\__\n Betty Chuck\n / \\ / \\\n Diane Edgar Frank Galyn\n |\n Howie\n\n then the search index is correct both for the new location and the old.\n\n >>> h.supervisor = g\n >>> catalog.index(h)\n >>> list(catalog.findRelationTokens({'supervisor': 'Diane'}))\n []\n >>> list(catalog.findRelationTokens({'supervisor': 'Betty'}))\n ['Diane', 'Edgar']\n >>> list(catalog.findRelationTokens({'supervisor': 'Chuck'}))\n ['Frank', 'Galyn', 'Howie']\n >>> list(catalog.findRelationTokens({'supervisor': 'Galyn'}))\n ['Howie']\n >>> h.supervisor = d\n >>> catalog.index(h) # move him back\n >>> list(catalog.findRelationTokens({'supervisor': 'Galyn'}))\n []\n >>> list(catalog.findRelationTokens({'supervisor': 'Diane'}))\n ['Howie']\n\nTransitive cycles (and updating and removing relations)\n-------------------------------------------------------\n\nThe transitive searches and the provided search indexes can handle\ncycles. Cycles are less likely in the current example than some others,\nbut we can stretch the case a bit: imagine a \"king in disguise\", in\nwhich someone at the top works lower in the hierarchy. Perhaps Alice\nworks for Zane, who works for Betty, who works for Alice. 
Artificial,\nbut easy enough to draw::\n\n ______\n / \\\n / Zane\n / |\n / Alice\n / __/ \\__\n / Betty__ Chuck\n \\-/ / \\ / \\\n Diane Edgar Frank Galyn\n |\n Howie\n\nEasy to create too.\n\n >>> z = Employee('Zane', b)\n >>> a.supervisor = z\n\nNow we have a cycle. Of course, we have not yet told the catalog about it.\n``index`` can be used both to reindex Alice and index Zane.\n\n >>> catalog.index(a)\n >>> catalog.index(z)\n\nNow, if we ask who works for Betty, we get the entire tree. (We'll ask\nfor tokens, just so that the result is smaller to look at.) [#same_set]_\n\n.. [#same_set] The result of the query for Betty, Alice, and Zane are all the\n same.\n\n >>> res1 = catalog.findRelationTokens({'supervisor': 'Betty'})\n >>> res2 = catalog.findRelationTokens({'supervisor': 'Alice'})\n >>> res3 = catalog.findRelationTokens({'supervisor': 'Zane'})\n >>> list(res1) == list(res2) == list(res3)\n True\n\n The cycle doesn't pollute the index outside of the cycle.\n\n >>> res = catalog.findRelationTokens({'supervisor': 'Diane'})\n >>> list(res)\n ['Howie']\n >>> list(res) # it isn't lazy, it is precalculated\n ['Howie']\n\n >>> sorted(catalog.findRelationTokens({'supervisor': 'Betty'}))\n ... # doctest: +NORMALIZE_WHITESPACE\n ['Alice', 'Betty', 'Chuck', 'Diane', 'Edgar', 'Frank', 'Galyn', 'Howie',\n 'Zane']\n\nIf we ask for the supervisors of Frank, it will include Betty.\n\n >>> list(catalog.findValueTokens(\n ... 'supervisor', {zc.relation.RELATION: 'Frank'}))\n ['Chuck', 'Alice', 'Zane', 'Betty']\n\nPaths returned by ``findRelationChains`` are marked with special interfaces,\nand special metadata, to show the chain.\n\n >>> res = list(catalog.findRelationChains({zc.relation.RELATION: 'Frank'}))\n >>> len(res)\n 5\n >>> import zc.relation.interfaces\n >>> [zc.relation.interfaces.ICircularRelationPath.providedBy(r)\n ... 
for r in res]\n [False, False, False, False, True]\n\nHere's the last chain:\n\n >>> res[-1] # doctest: +NORMALIZE_WHITESPACE\n cycle(<Employee instance \"Frank\">, <Employee instance \"Chuck\">,\n <Employee instance \"Alice\">, <Employee instance \"Zane\">,\n <Employee instance \"Betty\">)\n\nThe chain's 'cycled' attribute has a list of queries that create a cycle.\nIf you run the query, or queries, you see where the cycle would\nrestart--where the path would have started to overlap. Sometimes the query\nresults will include multiple cycles, and some paths that are not cycles.\nIn this case, there's only a single cycled query, which results in a single\ncycled relation.\n\n >>> len(res[4].cycled)\n 1\n\n >>> list(catalog.findRelations(res[4].cycled[0], maxDepth=1))\n [<Employee instance \"Alice\">]\n\nTo remove this craziness [#reverse_lookup]_, we can unindex Zane, and change\nand reindex Alice.\n\n.. [#reverse_lookup] If you want to, look what happens when you go the\n other way:\n\n >>> res = list(catalog.findRelationChains({'supervisor': 'Zane'}))\n >>> def sortEqualLenByName(one):\n ... 
return len(one), one\n ...\n >>> res.sort(key=sortEqualLenByName) # normalizes for test stability\n >>> from __future__ import print_function\n >>> print(res) # doctest: +NORMALIZE_WHITESPACE\n [(<Employee instance \"Alice\">,),\n (<Employee instance \"Alice\">, <Employee instance \"Betty\">),\n (<Employee instance \"Alice\">, <Employee instance \"Chuck\">),\n (<Employee instance \"Alice\">, <Employee instance \"Betty\">,\n <Employee instance \"Diane\">),\n (<Employee instance \"Alice\">, <Employee instance \"Betty\">,\n <Employee instance \"Edgar\">),\n cycle(<Employee instance \"Alice\">, <Employee instance \"Betty\">,\n <Employee instance \"Zane\">),\n (<Employee instance \"Alice\">, <Employee instance \"Chuck\">,\n <Employee instance \"Frank\">),\n (<Employee instance \"Alice\">, <Employee instance \"Chuck\">,\n <Employee instance \"Galyn\">),\n (<Employee instance \"Alice\">, <Employee instance \"Betty\">,\n <Employee instance \"Diane\">, <Employee instance \"Howie\">)]\n\n >>> [zc.relation.interfaces.ICircularRelationPath.providedBy(r)\n ... for r in res]\n [False, False, False, False, False, True, False, False, False]\n >>> len(res[5].cycled)\n 1\n >>> list(catalog.findRelations(res[5].cycled[0], maxDepth=1))\n [<Employee instance \"Alice\">]\n\n >>> a.supervisor = None\n >>> catalog.index(a)\n\n >>> list(catalog.findValueTokens(\n ... 'supervisor', {zc.relation.RELATION: 'Frank'}))\n ['Chuck', 'Alice']\n\n >>> catalog.unindex(z)\n\n >>> sorted(catalog.findRelationTokens({'supervisor': 'Betty'}))\n ['Diane', 'Edgar', 'Howie']\n\n``canFind``\n-----------\n\nWe've reached the last search method: ``canFind``. We've gotten values and\nrelations, but what if you simply want to know if there is any\nconnection at all? For instance, is Alice a supervisor of Howie? Is\nChuck? To answer these questions, you can use the ``canFind`` method\ncombined with the ``targetQuery`` search argument.\n\nThe ``canFind`` method takes the same arguments as ``findRelations``. 
However,\nit simply returns a boolean about whether the search has any results. This\nis a convenience that also allows some extra optimizations.\n\nDoes Betty supervise anyone?\n\n >>> catalog.canFind({'supervisor': 'Betty'})\n True\n\nWhat about Howie?\n\n >>> catalog.canFind({'supervisor': 'Howie'})\n False\n\nWhat about...Zane (no longer an employee)?\n\n >>> catalog.canFind({'supervisor': 'Zane'})\n False\n\nIf we want to know if Alice or Chuck supervise Howie, then we want to specify\ncharacteristics of two points on a path. To ask a question about the other\nend of a path, use ``targetQuery``.\n\nIs Alice a supervisor of Howie?\n\n >>> catalog.canFind({'supervisor': 'Alice'},\n ... targetQuery={zc.relation.RELATION: 'Howie'})\n True\n\nIs Chuck a supervisor of Howie?\n\n >>> catalog.canFind({'supervisor': 'Chuck'},\n ... targetQuery={zc.relation.RELATION: 'Howie'})\n False\n\nIs Howie Alice's employee?\n\n >>> catalog.canFind({zc.relation.RELATION: 'Howie'},\n ... targetQuery={'supervisor': 'Alice'})\n True\n\nIs Howie Chuck's employee?\n\n >>> catalog.canFind({zc.relation.RELATION: 'Howie'},\n ... targetQuery={'supervisor': 'Chuck'})\n False\n\n(Note that, if your relations describe a hierarchy, searching up a hierarchy is\nusually more efficient than searching down, so the second pair of questions is\ngenerally preferable to the first in that case.)\n\nWorking with More Complex Relations\n===================================\n\nSo far, our examples have used a simple relation, in which the indexed object\nis one end of the relation, and the indexed value on the object is the other.\nThis example has let us look at all of the basic zc.relation catalog\nfunctionality.\n\nAs mentioned in the introduction, though, the catalog supports, and was\ndesigned for, more complex relations. 
This section will quickly examine a\nfew examples of other uses.\n\nIn this section, we will see several examples of ideas mentioned above but not\nyet demonstrated.\n\n- We can use interface attributes (values or callables) to define value\n indexes.\n\n- Using interface attributes will cause an attempt to adapt the relation if it\n does not already provide the interface.\n\n- We can use the ``multiple`` argument when defining a value index to indicate\n that the indexed value is a collection.\n\n- We can use the ``name`` argument when defining a value index to specify the\n name to be used in queries, rather than relying on the name of the interface\n attribute or callable.\n\n- The ``family`` argument in instantiating the catalog lets you change the\n default btree family for relations and value indexes from\n ``BTrees.family32.IF`` to ``BTrees.family64.IF``.\n\nExtrinsic Two-Way Relations\n---------------------------\n\nA simple variation of our current story is this: what if the indexed relation\nwere between two other objects--that is, what if the relation were extrinsic to\nboth participants?\n\nLet's imagine we have relations that show biological parentage. We'll want a\n\"Person\" and a \"Parentage\" relation. We'll define an interface for\n``IParentage`` so we can see how using an interface to define a value index\nworks.\n\n >>> class Person(object):\n ... def __init__(self, name):\n ... self.name = name\n ... def __repr__(self):\n ... return '<Person %r>' % (self.name,)\n ...\n >>> import zope.interface\n >>> class IParentage(zope.interface.Interface):\n ... child = zope.interface.Attribute('the child')\n ... parents = zope.interface.Attribute('the parents')\n ...\n >>> @zope.interface.implementer(IParentage)\n ... class Parentage(object):\n ...\n ... def __init__(self, child, parent1, parent2):\n ... self.child = child\n ... self.parents = (parent1, parent2)\n ...\n\nNow we'll define the dumpers and loaders and then the catalog. 
Notice that\nwe are relying on a pattern: the dump must be called before the load.\n\n >>> _people = {}\n >>> _relations = {}\n >>> def dumpPeople(obj, catalog, cache):\n ... if _people.setdefault(obj.name, obj) is not obj:\n ... raise ValueError('we are assuming names are unique')\n ... return obj.name\n ...\n >>> def loadPeople(token, catalog, cache):\n ... return _people[token]\n ...\n >>> def dumpRelations(obj, catalog, cache):\n ... if _relations.setdefault(id(obj), obj) is not obj:\n ... raise ValueError('huh?')\n ... return id(obj)\n ...\n >>> def loadRelations(token, catalog, cache):\n ... return _relations[token]\n ...\n >>> catalog = zc.relation.catalog.Catalog(dumpRelations, loadRelations, family=BTrees.family64)\n >>> catalog.addValueIndex(IParentage['child'], dumpPeople, loadPeople,\n ... btree=BTrees.family32.OO)\n >>> catalog.addValueIndex(IParentage['parents'], dumpPeople, loadPeople,\n ... btree=BTrees.family32.OO, multiple=True,\n ... name='parent')\n >>> catalog.addDefaultQueryFactory(\n ... zc.relation.queryfactory.TransposingTransitive(\n ... 'child', 'parent'))\n\nNow we have a catalog fully set up. 
Let's add some relations.\n\n >>> a = Person('Alice')\n >>> b = Person('Betty')\n >>> c = Person('Charles')\n >>> d = Person('Donald')\n >>> e = Person('Eugenia')\n >>> f = Person('Fred')\n >>> g = Person('Gertrude')\n >>> h = Person('Harry')\n >>> i = Person('Iphigenia')\n >>> j = Person('Jacob')\n >>> k = Person('Karyn')\n >>> l = Person('Lee')\n\n >>> r1 = Parentage(child=j, parent1=k, parent2=l)\n >>> r2 = Parentage(child=g, parent1=i, parent2=j)\n >>> r3 = Parentage(child=f, parent1=g, parent2=h)\n >>> r4 = Parentage(child=e, parent1=g, parent2=h)\n >>> r5 = Parentage(child=b, parent1=e, parent2=d)\n >>> r6 = Parentage(child=a, parent1=e, parent2=c)\n\nHere's that in one of our hierarchy diagrams.\n\n::\n\n Karyn Lee\n \\ /\n Jacob Iphigenia\n \\ /\n Gertrude Harry\n \\ /\n /-------\\\n Fred Eugenia\n Donald / \\ Charles\n \\ / \\ /\n Betty Alice\n\nNow we can index the relations, and ask some questions.\n\n >>> for r in (r1, r2, r3, r4, r5, r6):\n ... catalog.index(r)\n >>> query = catalog.tokenizeQuery\n >>> sorted(catalog.findValueTokens(\n ... 'parent', query(child=a), maxDepth=1))\n ['Charles', 'Eugenia']\n >>> sorted(catalog.findValueTokens('parent', query(child=g)))\n ['Iphigenia', 'Jacob', 'Karyn', 'Lee']\n >>> sorted(catalog.findValueTokens(\n ... 'child', query(parent=h), maxDepth=1))\n ['Eugenia', 'Fred']\n >>> sorted(catalog.findValueTokens('child', query(parent=h)))\n ['Alice', 'Betty', 'Eugenia', 'Fred']\n >>> catalog.canFind(query(parent=h), targetQuery=query(child=d))\n False\n >>> catalog.canFind(query(parent=l), targetQuery=query(child=b))\n True\n\nMulti-Way Relations\n-------------------\n\nThe previous example quickly showed how to set the catalog up for a completely\nextrinsic two-way relation. The same pattern can be extended for N-way\nrelations. For example, consider a four way relation in the form of\nSUBJECTS PREDICATE OBJECTS [in CONTEXT]. 
For instance, we might\nwant to say \"(joe,) SELLS (doughnuts, coffee) in corner_store\", where \"(joe,)\"\nis the collection of subjects, \"SELLS\" is the predicate, \"(doughnuts, coffee)\"\nis the collection of objects, and \"corner_store\" is the optional context.\n\nFor this last example, we'll integrate two components we haven't seen examples\nof here before: the ZODB and adaptation.\n\nOur example ZODB approach uses OIDs as the tokens. This might be OK in some\ncases, if you will never support multiple databases and you don't need an\nabstraction layer so that a different object can have the same identifier.\n\n >>> import persistent\n >>> import struct\n >>> class Demo(persistent.Persistent):\n ... def __init__(self, name):\n ... self.name = name\n ... def __repr__(self):\n ... return '<Demo instance %r>' % (self.name,)\n ...\n >>> class IRelation(zope.interface.Interface):\n ... subjects = zope.interface.Attribute('subjects')\n ... predicate = zope.interface.Attribute('predicate')\n ... objects = zope.interface.Attribute('objects')\n ...\n >>> class IContextual(zope.interface.Interface):\n ... def getContext():\n ... 'return context'\n ... def setContext(value):\n ... 'set context'\n ...\n >>> @zope.interface.implementer(IContextual)\n ... class Contextual(object):\n ...\n ... _context = None\n ... def getContext(self):\n ... return self._context\n ... def setContext(self, value):\n ... self._context = value\n ...\n >>> @zope.interface.implementer(IRelation)\n ... class Relation(persistent.Persistent):\n ...\n ... def __init__(self, subjects, predicate, objects):\n ... self.subjects = subjects\n ... self.predicate = predicate\n ... self.objects = objects\n ... self._contextual = Contextual()\n ...\n ... def __conform__(self, iface):\n ... if iface is IContextual:\n ... 
return self._contextual\n    ...\n\n(When using zope.component, the ``__conform__`` would normally be unnecessary;\nhowever, this package does not depend on zope.component.)\n\n    >>> def dumpPersistent(obj, catalog, cache):\n    ...     if obj._p_jar is None:\n    ...         catalog._p_jar.add(obj) # assumes something else places it\n    ...     return struct.unpack('<q', obj._p_oid)[0]\n    ...\n    >>> def loadPersistent(token, catalog, cache):\n    ...     return catalog._p_jar.get(struct.pack('<q', token))\n    ...\n\n    >>> from ZODB.tests.util import DB\n    >>> db = DB()\n    >>> conn = db.open()\n    >>> root = conn.root()\n    >>> catalog = root['catalog'] = zc.relation.catalog.Catalog(\n    ...     dumpPersistent, loadPersistent, family=BTrees.family64)\n    >>> catalog.addValueIndex(IRelation['subjects'],\n    ...     dumpPersistent, loadPersistent, multiple=True, name='subject')\n    >>> catalog.addValueIndex(IRelation['objects'],\n    ...     dumpPersistent, loadPersistent, multiple=True, name='object')\n    >>> catalog.addValueIndex(IRelation['predicate'], btree=BTrees.family32.OO)\n    >>> catalog.addValueIndex(IContextual['getContext'],\n    ...     dumpPersistent, loadPersistent, name='context')\n    >>> import transaction\n    >>> transaction.commit()\n\nThe ``dumpPersistent`` and ``loadPersistent`` functions are a bit of a toy, as\nwarned above. Also, while our predicate will be stored as a string, some\nprogrammers may prefer to have a dump in such a case verify that the string has\nbeen explicitly registered in some way, to prevent typos. 
Obviously, we are not\nbothering with this for our example.\n\nWe make some objects, and then we make some relations with those objects and\nindex them.\n\n >>> joe = root['joe'] = Demo('joe')\n >>> sara = root['sara'] = Demo('sara')\n >>> jack = root['jack'] = Demo('jack')\n >>> ann = root['ann'] = Demo('ann')\n >>> doughnuts = root['doughnuts'] = Demo('doughnuts')\n >>> coffee = root['coffee'] = Demo('coffee')\n >>> muffins = root['muffins'] = Demo('muffins')\n >>> cookies = root['cookies'] = Demo('cookies')\n >>> newspaper = root['newspaper'] = Demo('newspaper')\n >>> corner_store = root['corner_store'] = Demo('corner_store')\n >>> bistro = root['bistro'] = Demo('bistro')\n >>> bakery = root['bakery'] = Demo('bakery')\n\n >>> SELLS = 'SELLS'\n >>> BUYS = 'BUYS'\n >>> OBSERVES = 'OBSERVES'\n\n >>> rel1 = root['rel1'] = Relation((joe,), SELLS, (doughnuts, coffee))\n >>> IContextual(rel1).setContext(corner_store)\n >>> rel2 = root['rel2'] = Relation((sara, jack), SELLS,\n ... (muffins, doughnuts, cookies))\n >>> IContextual(rel2).setContext(bakery)\n >>> rel3 = root['rel3'] = Relation((ann,), BUYS, (doughnuts,))\n >>> rel4 = root['rel4'] = Relation((sara,), BUYS, (bistro,))\n\n >>> for r in (rel1, rel2, rel3, rel4):\n ... catalog.index(r)\n ...\n\nNow we can ask a simple question. Where do they sell doughnuts?\n\n >>> query = catalog.tokenizeQuery\n >>> sorted(catalog.findValues(\n ... 'context',\n ... (query(predicate=SELLS, object=doughnuts))),\n ... key=lambda ob: ob.name)\n [<Demo instance 'bakery'>, <Demo instance 'corner_store'>]\n\nHopefully these examples give you further ideas on how you can use this tool.\n\nAdditional Functionality\n========================\n\nThis section introduces peripheral functionality. We will learn the following.\n\n- Listeners can be registered in the catalog. 
They are alerted when a relation\n is added, modified, or removed; and when the catalog is cleared and copied\n (see below).\n\n- The ``clear`` method clears the relations in the catalog.\n\n- The ``copy`` method makes a copy of the current catalog by copying internal\n data structures, rather than reindexing the relations, which can be a\n significant optimization opportunity. This copies value indexes and search\n indexes; and gives listeners an opportunity to specify what, if anything,\n should be included in the new copy.\n\n- The ``ignoreSearchIndex`` argument to the five pertinent search methods\n causes the search to ignore search indexes, even if there is an appropriate\n one.\n\n- ``findRelationTokens()`` (without arguments) returns the BTree set of all\n relation tokens in the catalog.\n\n- ``findValueTokens(INDEX_NAME)`` (where \"INDEX_NAME\" should be replaced with\n an index name) returns the BTree set of all value tokens in the catalog for\n the given index name.\n\nListeners\n---------\n\nA variety of potential clients may want to be alerted when the catalog changes.\nzc.relation does not depend on zope.event, so listeners may be registered for\nvarious changes. Let's make a quick demo listener. The ``additions`` and\n``removals`` arguments are dictionaries of {value name: iterable of added or\nremoved value tokens}.\n\n >>> def pchange(d):\n ... pprint.pprint(dict(\n ... (k, v is not None and sorted(set(v)) or v) for k, v in d.items()))\n >>> @zope.interface.implementer(zc.relation.interfaces.IListener)\n ... class DemoListener(persistent.Persistent):\n ...\n ... def relationAdded(self, token, catalog, additions):\n ... print('a relation (token %r) was added to %r '\n ... 'with these values:' % (token, catalog))\n ... pchange(additions)\n ... def relationModified(self, token, catalog, additions, removals):\n ... print('a relation (token %r) in %r was modified '\n ... 'with these additions:' % (token, catalog))\n ... pchange(additions)\n ... 
print('and these removals:')\n    ...         pchange(removals)\n    ...     def relationRemoved(self, token, catalog, removals):\n    ...         print('a relation (token %r) was removed from %r '\n    ...               'with these values:' % (token, catalog))\n    ...         pchange(removals)\n    ...     def sourceCleared(self, catalog):\n    ...         print('catalog %r had all relations unindexed' % (catalog,))\n    ...     def sourceAdded(self, catalog):\n    ...         print('now listening to catalog %r' % (catalog,))\n    ...     def sourceRemoved(self, catalog):\n    ...         print('no longer listening to catalog %r' % (catalog,))\n    ...     def sourceCopied(self, original, copy):\n    ...         print('catalog %r made a copy %r' % (original, copy))\n    ...         copy.addListener(self)\n    ...\n\nListeners can be installed multiple times.\n\nListeners can be added as persistent weak references, so that, if they are\ndeleted elsewhere, a ZODB pack will not consider the reference in the catalog\nto be something preventing garbage collection.\n\nWe'll install one of these demo listeners into our new catalog as a\nnormal reference, the default behavior. Then we'll show some example messages\nsent to the demo listener.\n\n    >>> listener = DemoListener()\n    >>> catalog.addListener(listener) # doctest: +ELLIPSIS\n    now listening to catalog <zc.relation.catalog.Catalog object at ...>\n    >>> rel5 = root['rel5'] = Relation((ann,), OBSERVES, (newspaper,))\n    >>> catalog.index(rel5) # doctest: +ELLIPSIS\n    a relation (token ...) was added to <...Catalog...> with these values:\n    {'context': None,\n     'object': [...],\n     'predicate': ['OBSERVES'],\n     'subject': [...]}\n    >>> rel5.subjects = (jack,)\n    >>> IContextual(rel5).setContext(bistro)\n    >>> catalog.index(rel5) # doctest: +ELLIPSIS\n    a relation (token ...) in ...Catalog... was modified with these additions:\n    {'context': [...], 'subject': [...]}\n    and these removals:\n    {'subject': [...]}\n    >>> catalog.unindex(rel5) # doctest: +ELLIPSIS\n    a relation (token ...) 
was removed from <...Catalog...> with these values:\n    {'context': [...],\n     'object': [...],\n     'predicate': ['OBSERVES'],\n     'subject': [...]}\n\n    >>> catalog.removeListener(listener) # doctest: +ELLIPSIS\n    no longer listening to catalog <...Catalog...>\n    >>> catalog.index(rel5) # doctest: +ELLIPSIS\n\nThe only two methods not shown by those examples are ``sourceCleared`` and\n``sourceCopied``. We'll get to those very soon below.\n\nThe ``clear`` Method\n--------------------\n\nThe ``clear`` method simply unindexes all relations from a catalog. Installed\nlisteners have ``sourceCleared`` called.\n\n    >>> len(catalog)\n    5\n\n    >>> catalog.addListener(listener) # doctest: +ELLIPSIS\n    now listening to catalog <zc.relation.catalog.Catalog object at ...>\n\n    >>> catalog.clear() # doctest: +ELLIPSIS\n    catalog <...Catalog...> had all relations unindexed\n\n    >>> len(catalog)\n    0\n    >>> sorted(catalog.findValues(\n    ...     'context',\n    ...     (query(predicate=SELLS, object=doughnuts))),\n    ...     key=lambda ob: ob.name)\n    []\n\nThe ``copy`` Method\n-------------------\n\nSometimes you may want to copy a relation catalog. One way of doing this is\nto create a new catalog, set it up like the current one, and then reindex\nall the same relations. This is unnecessarily slow for programmer and\ncomputer. The ``copy`` method makes a new catalog with the same corpus of\nindexed relations by copying internal data structures.\n\nSearch indexes are requested to make new copies of themselves for the new\ncatalog; and listeners are given an opportunity to react as desired to the new\ncopy, including installing themselves, and/or another object of their choosing\nas a listener.\n\nLet's make a copy of a populated index with a search index and a listener.\nNotice in our listener that ``sourceCopied`` adds itself as a listener to the\nnew copy. This is done at the very end of the ``copy`` process.\n\n    >>> for r in (rel1, rel2, rel3, rel4, rel5):\n    ...     catalog.index(r)\n    ... 
# doctest: +ELLIPSIS\n a relation ... was added...\n a relation ... was added...\n a relation ... was added...\n a relation ... was added...\n a relation ... was added...\n >>> BEGAT = 'BEGAT'\n >>> rel6 = root['rel6'] = Relation((jack, ann), BEGAT, (sara,))\n >>> henry = root['henry'] = Demo('henry')\n >>> rel7 = root['rel7'] = Relation((sara, joe), BEGAT, (henry,))\n >>> catalog.index(rel6) # doctest: +ELLIPSIS\n a relation (token ...) was added to <...Catalog...> with these values:\n {'context': None,\n 'object': [...],\n 'predicate': ['BEGAT'],\n 'subject': [..., ...]}\n >>> catalog.index(rel7) # doctest: +ELLIPSIS\n a relation (token ...) was added to <...Catalog...> with these values:\n {'context': None,\n 'object': [...],\n 'predicate': ['BEGAT'],\n 'subject': [..., ...]}\n >>> catalog.addDefaultQueryFactory(\n ... zc.relation.queryfactory.TransposingTransitive(\n ... 'subject', 'object', {'predicate': BEGAT}))\n ...\n >>> list(catalog.findValues(\n ... 'object', query(subject=jack, predicate=BEGAT)))\n [<Demo instance 'sara'>, <Demo instance 'henry'>]\n >>> catalog.addSearchIndex(\n ... zc.relation.searchindex.TransposingTransitiveMembership(\n ... 'subject', 'object', static={'predicate': BEGAT}))\n >>> sorted(\n ... catalog.findValues(\n ... 'object', query(subject=jack, predicate=BEGAT)),\n ... key=lambda o: o.name)\n [<Demo instance 'henry'>, <Demo instance 'sara'>]\n\n >>> newcat = root['newcat'] = catalog.copy() # doctest: +ELLIPSIS\n catalog <...Catalog...> made a copy <...Catalog...>\n now listening to catalog <...Catalog...>\n >>> transaction.commit()\n\nNow the copy has its own copies of internal data structures and of the\nsearchindex. 
For example, let's modify the relations and add a new one to the\ncopy.\n\n >>> mary = root['mary'] = Demo('mary')\n >>> buffy = root['buffy'] = Demo('buffy')\n >>> zack = root['zack'] = Demo('zack')\n >>> rel7.objects += (mary,)\n >>> rel8 = root['rel8'] = Relation((henry, buffy), BEGAT, (zack,))\n >>> newcat.index(rel7) # doctest: +ELLIPSIS\n a relation (token ...) in ...Catalog... was modified with these additions:\n {'object': [...]}\n and these removals:\n {}\n >>> newcat.index(rel8) # doctest: +ELLIPSIS\n a relation (token ...) was added to ...Catalog... with these values:\n {'context': None,\n 'object': [...],\n 'predicate': ['BEGAT'],\n 'subject': [..., ...]}\n >>> len(newcat)\n 8\n >>> sorted(\n ... newcat.findValues(\n ... 'object', query(subject=jack, predicate=BEGAT)),\n ... key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE\n [<Demo instance 'henry'>, <Demo instance 'mary'>, <Demo instance 'sara'>,\n <Demo instance 'zack'>]\n >>> sorted(\n ... newcat.findValues(\n ... 'object', query(subject=sara)),\n ... key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE\n [<Demo instance 'bistro'>, <Demo instance 'cookies'>,\n <Demo instance 'doughnuts'>, <Demo instance 'henry'>,\n <Demo instance 'mary'>, <Demo instance 'muffins'>]\n\nThe original catalog is not modified.\n\n >>> len(catalog)\n 7\n >>> sorted(\n ... catalog.findValues(\n ... 'object', query(subject=jack, predicate=BEGAT)),\n ... key=lambda o: o.name)\n [<Demo instance 'henry'>, <Demo instance 'sara'>]\n >>> sorted(\n ... catalog.findValues(\n ... 'object', query(subject=sara)),\n ... 
key=lambda o: o.name) # doctest: +NORMALIZE_WHITESPACE\n [<Demo instance 'bistro'>, <Demo instance 'cookies'>,\n <Demo instance 'doughnuts'>, <Demo instance 'henry'>,\n <Demo instance 'muffins'>]\n\nThe ``ignoreSearchIndex`` argument\n----------------------------------\n\nThe five methods that can use search indexes, ``findValues``,\n``findValueTokens``, ``findRelations``, ``findRelationTokens``, and\n``canFind``, can be explicitly requested to ignore any pertinent search index\nusing the ``ignoreSearchIndex`` argument.\n\nWe can see this easily with the token-related methods: the search index result\nwill be a BTree set, while without the search index the result will be a\ngenerator.\n\n >>> res1 = newcat.findValueTokens(\n ... 'object', query(subject=jack, predicate=BEGAT))\n >>> res1 # doctest: +ELLIPSIS\n LFSet([..., ..., ..., ...])\n >>> res2 = newcat.findValueTokens(\n ... 'object', query(subject=jack, predicate=BEGAT),\n ... ignoreSearchIndex=True)\n >>> res2 # doctest: +ELLIPSIS\n <generator object ... at 0x...>\n >>> sorted(res2) == list(res1)\n True\n\n >>> res1 = newcat.findRelationTokens(\n ... query(subject=jack, predicate=BEGAT))\n >>> res1 # doctest: +ELLIPSIS\n LFSet([..., ..., ...])\n >>> res2 = newcat.findRelationTokens(\n ... query(subject=jack, predicate=BEGAT), ignoreSearchIndex=True)\n >>> res2 # doctest: +ELLIPSIS\n <generator object ... at 0x...>\n >>> sorted(res2) == list(res1)\n True\n\nWe can see that the other methods take the argument, but the results look the\nsame as usual.\n\n >>> res = newcat.findValues(\n ... 'object', query(subject=jack, predicate=BEGAT),\n ... ignoreSearchIndex=True)\n >>> res # doctest: +ELLIPSIS\n <generator object ... at 0x...>\n >>> list(res) == list(newcat.resolveValueTokens(newcat.findValueTokens(\n ... 'object', query(subject=jack, predicate=BEGAT),\n ... ignoreSearchIndex=True), 'object'))\n True\n\n >>> res = newcat.findRelations(\n ... query(subject=jack, predicate=BEGAT),\n ... 
ignoreSearchIndex=True)\n >>> res # doctest: +ELLIPSIS\n <generator object ... at 0x...>\n >>> list(res) == list(newcat.resolveRelationTokens(\n ... newcat.findRelationTokens(\n ... query(subject=jack, predicate=BEGAT),\n ... ignoreSearchIndex=True)))\n True\n\n >>> newcat.canFind(\n ... query(subject=jack, predicate=BEGAT), ignoreSearchIndex=True)\n True\n\n``findRelationTokens()``\n------------------------\n\nIf you call ``findRelationTokens`` without any arguments, you will get the\nBTree set of all relation tokens in the catalog. This can be handy for tests\nand for advanced uses of the catalog.\n\n >>> newcat.findRelationTokens() # doctest: +ELLIPSIS\n <BTrees.LFBTree.LFTreeSet object at ...>\n >>> len(newcat.findRelationTokens())\n 8\n >>> set(newcat.resolveRelationTokens(newcat.findRelationTokens())) == set(\n ... (rel1, rel2, rel3, rel4, rel5, rel6, rel7, rel8))\n True\n\n``findValueTokens(INDEX_NAME)``\n-------------------------------\n\nIf you call ``findValueTokens`` with only an index name, you will get the BTree\nstructure of all tokens for that value in the index. This can be handy for\ntests and for advanced uses of the catalog.\n\n >>> newcat.findValueTokens('predicate') # doctest: +ELLIPSIS\n <BTrees.OOBTree.OOBTree object at ...>\n >>> list(newcat.findValueTokens('predicate'))\n ['BEGAT', 'BUYS', 'OBSERVES', 'SELLS']\n\nConclusion\n==========\n\nReview\n------\n\nThat brings us to the end of our introductory examples. Let's review, and\nthen look at where you can go from here.\n\n* Relations are objects with indexed values.\n\n* The relation catalog indexes relations. The relations can be one-way,\n two-way, three-way, or N-way, as long as you tell the catalog to index the\n different values.\n\n* Creating a catalog:\n\n - Relations and their values are stored in the catalog as tokens: unique\n identifiers that you can resolve back to the original value. 
Integers are\n    the most efficient tokens, but others can work fine too.\n\n  - Token type determines the BTree module needed.\n\n    - If the tokens are 32-bit ints, choose ``BTrees.family32.II``,\n      ``BTrees.family32.IF`` or ``BTrees.family32.IO``.\n\n    - If the tokens are 64-bit ints, choose ``BTrees.family64.II``,\n      ``BTrees.family64.IF`` or ``BTrees.family64.IO``.\n\n    - If they are anything else, choose ``BTrees.family32.OI``,\n      ``BTrees.family64.OI``, or ``BTrees.family32.OO`` (or\n      ``BTrees.family64.OO``; they are the same).\n\n    Within these rules, the choice is somewhat arbitrary unless you plan to\n    merge these results with those of another source that is using a\n    particular BTree module. BTree set operations only work within the same\n    module, so you must match module to module.\n\n  - The ``family`` argument in instantiating the catalog lets you change the\n    default btree family for relations and value indexes from\n    ``BTrees.family32.IF`` to ``BTrees.family64.IF``.\n\n  - You must define your own functions for tokenizing and resolving tokens.\n    These functions are registered with the catalog for the relations and for\n    each of their value indexes.\n\n  - You add value indexes to relation catalogs to be able to search. Values\n    can be identified to the catalog with callables or interface elements.\n\n    - Using interface attributes will cause an attempt to adapt the\n      relation if it does not already provide the interface.\n\n    - We can use the ``multiple`` argument when defining a value index to\n      indicate that the indexed value is a collection. This defaults to\n      False.\n\n    - We can use the ``name`` argument when defining a value index to\n      specify the name to be used in queries, rather than relying on the\n      name of the interface attribute or callable.\n\n  - You can set up search indexes to speed up specific searches, usually\n    transitive.\n\n  - Listeners can be registered in the catalog. 
They are alerted when a\n relation is added, modified, or removed; and when the catalog is cleared\n and copied.\n\n* Catalog Management:\n\n - Relations are indexed with ``index(relation)``, and removed from the\n catalog with ``unindex(relation)``. ``index_doc(relation_token,\n relation)`` and ``unindex_doc(relation_token)`` also work.\n\n - The ``clear`` method clears the relations in the catalog.\n\n - The ``copy`` method makes a copy of the current catalog by copying\n internal data structures, rather than reindexing the relations, which can\n be a significant optimization opportunity. This copies value indexes and\n search indexes; and gives listeners an opportunity to specify what, if\n anything, should be included in the new copy.\n\n* Searching a catalog:\n\n - Queries to the relation catalog are formed with dicts.\n\n - Query keys are the names of the indexes you want to search, or, for the\n special case of precise relations, the ``zc.relation.RELATION`` constant.\n\n - Query values are the tokens of the results you want to match; or\n ``None``, indicating relations that have ``None`` as a value (or an empty\n collection, if it is a multiple). Search values can use\n ``zc.relation.catalog.any(args)`` or ``zc.relation.catalog.Any(args)`` to\n specify multiple (non-``None``) results to match for a given key.\n\n - The index has a variety of methods to help you work with tokens.\n ``tokenizeQuery`` is typically the most used, though others are\n available.\n\n - To find relations that match a query, use ``findRelations`` or\n ``findRelationTokens``. Calling ``findRelationTokens`` without any\n arguments returns the BTree set of all relation tokens in the catalog.\n\n - To find values that match a query, use ``findValues`` or\n ``findValueTokens``. Calling ``findValueTokens`` with only the name\n of a value index returns the BTree set of all tokens in the catalog for\n that value index.\n\n - You search transitively by using a query factory. 
The\n    ``zc.relation.queryfactory.TransposingTransitive`` is a good common case\n    factory that lets you walk up and down a hierarchy. A query factory can\n    be passed in as an argument to search methods as a ``queryFactory``, or\n    installed as a default behavior using ``addDefaultQueryFactory``.\n\n  - To find how a query is related, use ``findRelationChains`` or\n    ``findRelationTokenChains``.\n\n  - To find out if a query is related, use ``canFind``.\n\n  - Circular transitive relations are handled to prevent infinite loops. They\n    are identified in ``findRelationChains`` and ``findRelationTokenChains``\n    with a ``zc.relation.interfaces.ICircularRelationPath`` marker interface.\n\n  - Search methods share the following arguments:\n\n    * ``maxDepth``, limiting the transitive depth for searches;\n\n    * ``filter``, allowing code to filter transitive paths;\n\n    * ``targetQuery``, allowing a query to filter transitive paths on the\n      basis of the endpoint;\n\n    * ``targetFilter``, allowing code to filter transitive paths on the basis\n      of the endpoint; and\n\n    * ``queryFactory``, mentioned above.\n\n    In addition, the ``ignoreSearchIndex`` argument to ``findRelations``,\n    ``findRelationTokens``, ``findValues``, ``findValueTokens``, and\n    ``canFind`` causes the search to ignore search indexes, even if there is\n    an appropriate one.\n\nNext Steps\n----------\n\nIf you want to read more, next steps depend on how you like to learn. Here\nare some of the other documents in the zc.relation package.\n\n:optimization.rst:\n    Best practices for optimizing your use of the relation catalog.\n\n:searchindex.rst:\n    Query factories and search indexes: from basics to nitty gritty details.\n\n:tokens.rst:\n    This document explores the details of tokens. All God's chillun\n    love tokens, at least if God's chillun are writing non-toy apps\n    using zc.relation. 
It includes discussion of the token helpers that\n    the catalog provides, how to use zope.app.intid-like registries with\n    zc.relation, how to use tokens to \"join\" query results reasonably\n    efficiently, and how to index joins. It also is unnecessarily\n    mind-blowing because of the examples used.\n\n:interfaces.py:\n    The contract, for nuts and bolts.\n\nFinally, the truly die-hard might also be interested in the timeit\ndirectory, which holds scripts used to test assumptions and learn.\n\n.. ......... ..\n.. FOOTNOTES ..\n.. ......... ..\n\n.. [#I_care] OK, you care about how that query factory worked, so\n    we will look into it a bit. Let's talk through two steps of the\n    transitive search in the second question. The catalog first\n    performs the intransitive search requested: find relations\n    for which Betty is the supervisor. That's Diane and Edgar.\n\n    Now, for each of the results, the catalog asks the query factory for\n    next steps. Let's take Diane. The catalog says to the factory,\n    \"Given this query for relations where Betty is supervisor, I got\n    this result of Diane. Do you have any other queries I should try to\n    look further?\". The factory also gets the catalog instance so it\n    can use it to answer the question if it needs to.\n\n    OK, the next part is where your brain hurts. Hang on.\n\n    In our case, the factory sees that the query was for supervisor. Its\n    other key, the one it transposes with, is ``zc.relation.RELATION``. *The\n    factory gets the transposing key's result for the current token.* So, for\n    us, a key of ``zc.relation.RELATION`` is actually a no-op: the result *is*\n    the current token, Diane. Then, the factory has its answer: replace the old\n    value of supervisor in the query, Betty, with the result, Diane. The next\n    transitive query should be {'supervisor': 'Diane'}. 
Ta-da.\n\n\n======================================================\nTokens and Joins: zc.relation Catalog Extended Example\n======================================================\n\nIntroduction and Set Up\n=======================\n\nThis document assumes you have read the introductory README.rst and want\nto learn a bit more by example. In it, we will explore a more\ncomplicated set of relations that demonstrates most of the aspects of\nworking with tokens. In particular, we will look at joins, which will\nalso give us a chance to look more in depth at query factories and\nsearch indexes, and introduce the idea of listeners. It will not explain\nthe basics that the README already addressed.\n\nImagine we are indexing security assertions in a system. In this\nsystem, users may have roles within an organization. Each organization\nmay have multiple child organizations and may have a single parent\norganization. A user with a role in a parent organization will have the\nsame role in all transitively connected child relations.\n\nWe have two kinds of relations, then. One kind of relation will model\nthe hierarchy of organizations. We'll do it with an intrinsic relation\nof organizations to their children: that reflects the fact that parent\norganizations choose and are comprised of their children; children do\nnot choose their parents.\n\nThe other relation will model the (multiple) roles a (single) user has\nin a (single) organization. This relation will be entirely extrinsic.\n\nWe could create two catalogs, one for each type. Or we could put them\nboth in the same catalog. Initially, we'll go with the single-catalog\napproach for our examples. This single catalog, then, will be indexing\na heterogeneous collection of relations.\n\nLet's define the two relations with interfaces. We'll include one\naccessor, getOrganization, largely to show how to handle methods.\n\n >>> import zope.interface\n >>> class IOrganization(zope.interface.Interface):\n ... 
title = zope.interface.Attribute('the title')\n    ...     parts = zope.interface.Attribute(\n    ...         'the organizations that make up this one')\n    ...\n    >>> class IRoles(zope.interface.Interface):\n    ...     def getOrganization():\n    ...         'return the organization in which this relation operates'\n    ...     principal_id = zope.interface.Attribute(\n    ...         'the principal id whose roles this relation lists')\n    ...     role_ids = zope.interface.Attribute(\n    ...         'the role ids that the principal explicitly has in the '\n    ...         'organization. The principal may have other roles via '\n    ...         'roles in parent organizations.')\n    ...\n\nNow we can create some classes. In the README example, the setup was a bit\nof a toy. This time we will be just a bit more practical. We'll also expect\nto be operating within the ZODB, with a root and transactions. [#ZODB]_\n\n.. [#ZODB] Here we will set up a ZODB instance for us to use.\n\n    >>> from ZODB.tests.util import DB\n    >>> db = DB()\n    >>> conn = db.open()\n    >>> root = conn.root()\n\nHere's how we will dump and load our relations: use a \"registry\"\nobject, similar to an intid utility. [#faux_intid]_\n\n.. [#faux_intid] Here's a simple key reference for persistent objects. Notice\n    that it is not persistent itself: this is important for conflict resolution\n    to be able to work (which we don't show here, but we're trying to lean more\n    towards real usage for this example).\n\n    >>> from functools import total_ordering\n    >>> @total_ordering\n    ... class Reference(object): # see zope.app.keyreference\n    ...     def __init__(self, obj):\n    ...         self.object = obj\n    ...     def _get_sorting_key(self):\n    ...         # this doesn't work during conflict resolution. See\n    ...         # zope.app.keyreference.persistent, 3.5 release, for current\n    ...         # best practice.\n    ...         if self.object._p_jar is None:\n    ...             raise ValueError(\n    ...                 'can only compare when both objects have connections')\n    ...         return self.object._p_oid or ''\n    ...     def __lt__(self, other):\n    ...         # this doesn't work during conflict resolution. See\n    ... 
# zope.app.keyreference.persistent, 3.5 release, for current\n ... # best practice.\n ... if not isinstance(other, Reference):\n ... raise ValueError('can only compare with Reference objects')\n ... return self._get_sorting_key() < other._get_sorting_key()\n ... def __eq__(self, other):\n ... # this doesn't work during conflict resolution. See\n ... # zope.app.keyreference.persistent, 3.5 release, for current\n ... # best practice.\n ... if not isinstance(other, Reference):\n ... raise ValueError('can only compare with Reference objects')\n ... return self._get_sorting_key() == other._get_sorting_key()\n\n Here's a simple integer identifier tool.\n\n >>> import persistent\n >>> import BTrees\n >>> class Registry(persistent.Persistent): # see zope.app.intid\n ... def __init__(self, family=BTrees.family32):\n ... self.family = family\n ... self.ids = self.family.IO.BTree()\n ... self.refs = self.family.OI.BTree()\n ... def getId(self, obj):\n ... if not isinstance(obj, persistent.Persistent):\n ... raise ValueError('not a persistent object', obj)\n ... if obj._p_jar is None:\n ... self._p_jar.add(obj)\n ... ref = Reference(obj)\n ... id = self.refs.get(ref)\n ... if id is None:\n ... # naive for conflict resolution; see zope.app.intid\n ... if self.ids:\n ... id = self.ids.maxKey() + 1\n ... else:\n ... id = self.family.minint\n ... self.ids[id] = ref\n ... self.refs[ref] = id\n ... return id\n ... def __contains__(self, obj):\n ... if (not isinstance(obj, persistent.Persistent) or\n ... obj._p_oid is None):\n ... return False\n ... return Reference(obj) in self.refs\n ... def getObject(self, id, default=None):\n ... res = self.ids.get(id, None)\n ... if res is None:\n ... return default\n ... else:\n ... return res.object\n ... def remove(self, r):\n ... if isinstance(r, int):\n ... self.refs.pop(self.ids.pop(r))\n ... elif (not isinstance(r, persistent.Persistent) or\n ... r._p_oid is None):\n ... raise LookupError(r)\n ... else:\n ... 
self.ids.pop(self.refs.pop(Reference(r)))\n ...\n >>> registry = root['registry'] = Registry()\n\n >>> import transaction\n >>> transaction.commit()\n\nIn this implementation of the \"dump\" method, we use the cache just to\nshow you how you might use it. It probably is overkill for this job,\nand maybe even a speed loss, but you can see the idea.\n\n >>> def dump(obj, catalog, cache):\n ... reg = cache.get('registry')\n ... if reg is None:\n ... reg = cache['registry'] = catalog._p_jar.root()['registry']\n ... return reg.getId(obj)\n ...\n >>> def load(token, catalog, cache):\n ... reg = cache.get('registry')\n ... if reg is None:\n ... reg = cache['registry'] = catalog._p_jar.root()['registry']\n ... return reg.getObject(token)\n ...\n\nNow we can create a relation catalog to hold these items.\n\n >>> import zc.relation.catalog\n >>> catalog = root['catalog'] = zc.relation.catalog.Catalog(dump, load)\n >>> transaction.commit()\n\nNow we set up our indexes. We'll start with just the organizations, and\nset up the catalog with them. This part will be similar to the example\nin README.rst, but will introduce more discussions of optimizations and\ntokens. Then we'll add in the part about roles, and explore queries and\ntoken-based \"joins\".\n\nOrganizations\n=============\n\nThe organization will hold a set of organizations. This is actually not\ninherently easy in the ZODB because this means that we need to compare\nor hash persistent objects, which does not work reliably over time and\nacross machines out-of-the-box. To side-step the issue for this example,\nand still do something a bit interesting and real-world, we'll use the\nregistry tokens introduced above. This will also give us a chance to\ntalk a bit more about optimizations and tokens. (If you would like\nto sanely and transparently hold a set of persistent objects, try the\nzc.set package XXX not yet.)\n\n >>> import BTrees\n >>> import persistent\n >>> @zope.interface.implementer(IOrganization)\n ... 
@total_ordering\n    ... class Organization(persistent.Persistent):\n    ...\n    ...     def __init__(self, title):\n    ...         self.title = title\n    ...         self.parts = BTrees.family32.IF.TreeSet()\n    ...     # the next parts just make the tests prettier\n    ...     def __repr__(self):\n    ...         return '<Organization instance \"' + self.title + '\">'\n    ...     def __lt__(self, other):\n    ...         # raises AttributeError if other doesn't have a title\n    ...         return self.title < other.title\n    ...     def __eq__(self, other):\n    ...         return self is other\n    ...     def __hash__(self):\n    ...         return 1 # dummy\n    ...\n\nOK, now we know how organizations will work. Now we can add the `parts`\nindex to the catalog. This differs in a few ways from how we added\nindexes in the README.\n\n\n    >>> catalog.addValueIndex(IOrganization['parts'], multiple=True,\n    ...                       name=\"part\")\n\nSo, what's different from the README examples?\n\nFirst, we are using an interface element to define the value to be indexed.\nIt provides an interface to which objects will be adapted, a default name\nfor the index, and information as to whether the attribute should be used\ndirectly or called.\n\nSecond, we are not specifying a dump or load. They are None. This\nmeans that the indexed value can already be treated as a token. This\ncan allow a very significant optimization for reindexing if the indexed\nvalue is a large collection using the same BTree family as the\nindex, which leads us to the next difference.\n\nThird, we are specifying that `multiple=True`. This means that the value\non a given relation that provides or can be adapted to IOrganization will\nhave a collection of `parts`. These will always be regarded as a set,\nwhether the actual collection is a BTrees set or the keys of a BTree.\n\nLast, we are specifying a name to be used for queries. 
I find that queries\nread more easily when the query keys are singular, so I often rename plurals.\n\nAs in the README, we can add another simple transposing transitive query\nfactory, switching between 'part' and `None`.\n\n >>> import zc.relation.queryfactory\n >>> factory1 = zc.relation.queryfactory.TransposingTransitive(\n ... 'part', None)\n >>> catalog.addDefaultQueryFactory(factory1)\n\nLet's also add a couple of search indexes, of the hierarchy looking up...\n\n >>> import zc.relation.searchindex\n >>> catalog.addSearchIndex(\n ... zc.relation.searchindex.TransposingTransitiveMembership(\n ... 'part', None))\n\n...and down.\n\n >>> catalog.addSearchIndex(\n ... zc.relation.searchindex.TransposingTransitiveMembership(\n ... None, 'part'))\n\nPLEASE NOTE: the search index looking up is not a good idea in practice. The\nindex is designed for looking down [#verifyObjectTransitive]_.\n\n.. [#verifyObjectTransitive] The TransposingTransitiveMembership indexes\n provide ISearchIndex.\n\n >>> from zope.interface.verify import verifyObject\n >>> import zc.relation.interfaces\n >>> index = list(catalog.iterSearchIndexes())[0]\n >>> verifyObject(zc.relation.interfaces.ISearchIndex, index)\n True\n\nLet's create and add a few organizations.\n\nWe'll make a structure like this [#silliness]_::\n\n Ynod Corp Management Zookd Corp Management\n / | \\ / | \\\n Ynod Devs Ynod SAs Ynod Admins Zookd Admins Zookd SAs Zookd Devs\n / \\ \\ / / \\\n Y3L4 Proj Bet Proj Ynod Zookd Task Force Zookd hOgnmd Zookd Nbd\n\nHere's the Python.\n\n\n >>> orgs = root['organizations'] = BTrees.family32.OO.BTree()\n >>> for nm, parts in (\n ... ('Y3L4 Proj', ()),\n ... ('Bet Proj', ()),\n ... ('Ynod Zookd Task Force', ()),\n ... ('Zookd hOgnmd', ()),\n ... ('Zookd Nbd', ()),\n ... ('Ynod Devs', ('Y3L4 Proj', 'Bet Proj')),\n ... ('Ynod SAs', ()),\n ... ('Ynod Admins', ('Ynod Zookd Task Force',)),\n ... ('Zookd Admins', ('Ynod Zookd Task Force',)),\n ... ('Zookd SAs', ()),\n ... 
('Zookd Devs', ('Zookd hOgnmd', 'Zookd Nbd')),\n ... ('Ynod Corp Management', ('Ynod Devs', 'Ynod SAs', 'Ynod Admins')),\n ... ('Zookd Corp Management', ('Zookd Devs', 'Zookd SAs',\n ... 'Zookd Admins'))):\n ... org = Organization(nm)\n ... for part in parts:\n ... ignore = org.parts.insert(registry.getId(orgs[part]))\n ... orgs[nm] = org\n ... catalog.index(org)\n ...\n\nNow the catalog knows about the relations.\n\n >>> len(catalog)\n 13\n >>> root['dummy'] = Organization('Foo')\n >>> root['dummy'] in catalog\n False\n >>> orgs['Y3L4 Proj'] in catalog\n True\n\nAlso, now we can search. To do this, we can use some of the token methods that\nthe catalog provides. The most commonly used is `tokenizeQuery`. It takes a\nquery with values that are not tokenized and converts them to values that are\ntokenized.\n\n >>> Ynod_SAs_id = registry.getId(orgs['Ynod SAs'])\n >>> catalog.tokenizeQuery({None: orgs['Ynod SAs']}) == {\n ... None: Ynod_SAs_id}\n True\n >>> Zookd_SAs_id = registry.getId(orgs['Zookd SAs'])\n >>> Zookd_Devs_id = registry.getId(orgs['Zookd Devs'])\n >>> catalog.tokenizeQuery(\n ... {None: zc.relation.catalog.any(\n ... orgs['Zookd SAs'], orgs['Zookd Devs'])}) == {\n ... None: zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}\n True\n\nOf course, right now doing this with 'part' alone is kind of silly, since it\ndoes not change within the relation catalog (because we said that dump and\nload were `None`, as discussed above).\n\n >>> catalog.tokenizeQuery({'part': Ynod_SAs_id}) == {\n ... 'part': Ynod_SAs_id}\n True\n >>> catalog.tokenizeQuery(\n ... {'part': zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}\n ... ) == {'part': zc.relation.catalog.any(Zookd_SAs_id, Zookd_Devs_id)}\n True\n\nThe `tokenizeQuery` method is so common that we're going to assign it to\na variable in our example. 
Then we'll do a search or two.\n\nSo...find the relations that Ynod Devs supervise.\n\n >>> t = catalog.tokenizeQuery\n >>> res = list(catalog.findRelationTokens(t({None: orgs['Ynod Devs']})))\n\nOK...we used `findRelationTokens`, as opposed to `findRelations`, so res\nis a couple of numbers now. How do we convert them back?\n`resolveRelationTokens` will do the trick.\n\n >>> len(res)\n 3\n >>> sorted(catalog.resolveRelationTokens(res))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Organization instance \"Bet Proj\">, <Organization instance \"Y3L4 Proj\">,\n <Organization instance \"Ynod Devs\">]\n\n`resolveQuery` is the mirror image of `tokenizeQuery`: it converts\ntokenized queries to queries with \"loaded\" values.\n\n >>> original = {'part': zc.relation.catalog.any(\n ... Zookd_SAs_id, Zookd_Devs_id),\n ... None: orgs['Zookd Devs']}\n >>> tokenized = catalog.tokenizeQuery(original)\n >>> original == catalog.resolveQuery(tokenized)\n True\n\n >>> original = {None: zc.relation.catalog.any(\n ... orgs['Zookd SAs'], orgs['Zookd Devs']),\n ... 'part': Zookd_Devs_id}\n >>> tokenized = catalog.tokenizeQuery(original)\n >>> original == catalog.resolveQuery(tokenized)\n True\n\nLikewise, `tokenizeRelations` is the mirror image of `resolveRelationTokens`.\n\n >>> sorted(catalog.tokenizeRelations(\n ... [orgs[\"Bet Proj\"], orgs[\"Y3L4 Proj\"]])) == sorted(\n ... registry.getId(o) for o in\n ... [orgs[\"Bet Proj\"], orgs[\"Y3L4 Proj\"]])\n True\n\nThe other token-related methods are as follows\n[#show_remaining_token_methods]_:\n\n.. [#show_remaining_token_methods] For what it's worth, here are some small\n examples of the remaining token-related methods.\n\n These two are the singular versions of `tokenizeRelations` and\n `resolveRelationTokens`.\n\n `tokenizeRelation` returns a token for the given relation.\n\n >>> catalog.tokenizeRelation(orgs['Zookd Corp Management']) == (\n ... 
registry.getId(orgs['Zookd Corp Management']))\n True\n\n `resolveRelationToken` returns a relation for the given token.\n\n >>> catalog.resolveRelationToken(registry.getId(\n ... orgs['Zookd Corp Management'])) is orgs['Zookd Corp Management']\n True\n\n The \"values\" ones are a bit lame to show now, since the only value\n we have right now is not tokenized but used straight up. But here\n goes, showing some fascinating no-ops.\n\n `tokenizeValues`, returns an iterable of tokens for the values of\n the given index name.\n\n >>> list(catalog.tokenizeValues((1,2,3), 'part'))\n [1, 2, 3]\n\n `resolveValueTokens` returns an iterable of values for the tokens of\n the given index name.\n\n >>> list(catalog.resolveValueTokens((1,2,3), 'part'))\n [1, 2, 3]\n\n\n- `tokenizeValues`, which returns an iterable of tokens for the values\n of the given index name;\n- `resolveValueTokens`, which returns an iterable of values for the tokens of\n the given index name;\n- `tokenizeRelation`, which returns a token for the given relation; and\n- `resolveRelationToken`, which returns a relation for the given token.\n\nWhy do we bother with these tokens, instead of hiding them away and\nmaking the API prettier? By exposing them, we enable efficient joining,\nand efficient use in other contexts. For instance, if you use the same\nintid utility to tokenize in other catalogs, our results can be merged\nwith the results of other catalogs. Similarly, you can use the results\nof queries to other catalogs--or even \"joins\" from earlier results of\nquerying this catalog--as query values here. We'll explore this in the\nnext section.\n\nRoles\n=====\n\nWe have set up the Organization relations. Now let's set up the roles, and\nactually be able to answer the questions that we described at the beginning\nof the document.\n\nIn our Roles object, roles and principals will simply be strings--ids, if\nthis were a real system. 
The organization will be a direct object reference.\n\n >>> @zope.interface.implementer(IRoles)\n ... @total_ordering\n ... class Roles(persistent.Persistent):\n ...\n ... def __init__(self, principal_id, role_ids, organization):\n ... self.principal_id = principal_id\n ... self.role_ids = BTrees.family32.OI.TreeSet(role_ids)\n ... self._organization = organization\n ... def getOrganization(self):\n ... return self._organization\n ... # the rest is for prettier/easier tests\n ... def __repr__(self):\n ... return \"<Roles instance (%s has %s in %s)>\" % (\n ... self.principal_id, ', '.join(self.role_ids),\n ... self._organization.title)\n ... def __lt__(self, other):\n ... _self = (\n ... self.principal_id,\n ... tuple(self.role_ids),\n ... self._organization.title,\n ... )\n ... _other = (\n ... other.principal_id,\n ... tuple(other.role_ids),\n ... other._organization.title,\n ... )\n ... return _self < _other\n ... def __eq__(self, other):\n ... return self is other\n ... def __hash__(self):\n ... return 1 # dummy\n ...\n\nNow let's add the value indexes to the relation catalog.\n\n >>> catalog.addValueIndex(IRoles['principal_id'], btree=BTrees.family32.OI)\n >>> catalog.addValueIndex(IRoles['role_ids'], btree=BTrees.family32.OI,\n ... multiple=True, name='role_id')\n >>> catalog.addValueIndex(IRoles['getOrganization'], dump, load,\n ... name='organization')\n\nThose are some slightly new variations of what we've seen in `addValueIndex`\nbefore, but they all mix and match the same ingredients.\n\nAs a reminder, here is our organization structure::\n\n Ynod Corp Management Zookd Corp Management\n / | \\ / | \\\n Ynod Devs Ynod SAs Ynod Admins Zookd Admins Zookd SAs Zookd Devs\n / \\ \\ / / \\\n Y3L4 Proj Bet Proj Ynod Zookd Task Force Zookd hOgnmd Zookd Nbd\n\nNow let's create and add some roles.\n\n >>> principal_ids = [\n ... 'abe', 'bran', 'cathy', 'david', 'edgar', 'frank', 'gertrude',\n ... 
'harriet', 'ignas', 'jacob', 'karyn', 'lettie', 'molly', 'nancy',\n ... 'ophelia', 'pat']\n >>> role_ids = ['user manager', 'writer', 'reviewer', 'publisher']\n >>> get_role = dict((v[0], v) for v in role_ids).__getitem__\n >>> roles = root['roles'] = BTrees.family32.IO.BTree()\n >>> next = 0\n >>> for prin, org, role_ids in (\n ... ('abe', orgs['Zookd Corp Management'], 'uwrp'),\n ... ('bran', orgs['Ynod Corp Management'], 'uwrp'),\n ... ('cathy', orgs['Ynod Devs'], 'w'),\n ... ('cathy', orgs['Y3L4 Proj'], 'r'),\n ... ('david', orgs['Bet Proj'], 'wrp'),\n ... ('edgar', orgs['Ynod Devs'], 'up'),\n ... ('frank', orgs['Ynod SAs'], 'uwrp'),\n ... ('frank', orgs['Ynod Admins'], 'w'),\n ... ('gertrude', orgs['Ynod Zookd Task Force'], 'uwrp'),\n ... ('harriet', orgs['Ynod Zookd Task Force'], 'w'),\n ... ('harriet', orgs['Ynod Admins'], 'r'),\n ... ('ignas', orgs['Zookd Admins'], 'r'),\n ... ('ignas', orgs['Zookd Corp Management'], 'w'),\n ... ('karyn', orgs['Zookd Corp Management'], 'uwrp'),\n ... ('karyn', orgs['Ynod Corp Management'], 'uwrp'),\n ... ('lettie', orgs['Zookd Corp Management'], 'u'),\n ... ('lettie', orgs['Ynod Zookd Task Force'], 'w'),\n ... ('lettie', orgs['Zookd SAs'], 'w'),\n ... ('molly', orgs['Zookd SAs'], 'uwrp'),\n ... ('nancy', orgs['Zookd Devs'], 'wrp'),\n ... ('nancy', orgs['Zookd hOgnmd'], 'u'),\n ... ('ophelia', orgs['Zookd Corp Management'], 'w'),\n ... ('ophelia', orgs['Zookd Devs'], 'r'),\n ... ('ophelia', orgs['Zookd Nbd'], 'p'),\n ... ('pat', orgs['Zookd Nbd'], 'wrp')):\n ... assert prin in principal_ids\n ... role_ids = [get_role(l) for l in role_ids]\n ... role = roles[next] = Roles(prin, role_ids, org)\n ... role.key = next\n ... next += 1\n ... catalog.index(role)\n ...\n\nNow we can begin to do searches [#real_value_tokens]_.\n\n\n.. 
[#real_value_tokens] We can also show the value token methods more\n sanely now.\n\n >>> original = sorted((orgs['Zookd Devs'], orgs['Ynod SAs']))\n >>> tokens = list(catalog.tokenizeValues(original, 'organization'))\n >>> original == sorted(catalog.resolveValueTokens(tokens, 'organization'))\n True\n\nWhat are all the role settings for ophelia?\n\n >>> sorted(catalog.findRelations({'principal_id': 'ophelia'}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Roles instance (ophelia has publisher in Zookd Nbd)>,\n <Roles instance (ophelia has reviewer in Zookd Devs)>,\n <Roles instance (ophelia has writer in Zookd Corp Management)>]\n\nThat answer does not need to be transitive: we're done.\n\nNext question. Where does ophelia have the 'writer' role?\n\n >>> list(catalog.findValues(\n ... 'organization', {'principal_id': 'ophelia',\n ... 'role_id': 'writer'}))\n [<Organization instance \"Zookd Corp Management\">]\n\nWell, that's correct intransitively. Do we need a transitive query\nfactory? No! This is a great chance to look at the token join we talked\nabout in the previous section. This should actually be a two-step\noperation: find all of the organizations in which ophelia has the writer\nrole, and then find all of the transitive parts to that organization.\n\n >>> sorted(catalog.findRelations({None: zc.relation.catalog.Any(\n ... catalog.findValueTokens('organization',\n ... {'principal_id': 'ophelia',\n ... 'role_id': 'writer'}))}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Organization instance \"Ynod Zookd Task Force\">,\n <Organization instance \"Zookd Admins\">,\n <Organization instance \"Zookd Corp Management\">,\n <Organization instance \"Zookd Devs\">,\n <Organization instance \"Zookd Nbd\">,\n <Organization instance \"Zookd SAs\">,\n <Organization instance \"Zookd hOgnmd\">]\n\nThat's more like it.\n\nNext question. What users have roles in the 'Zookd Devs' organization?\nIntransitively, that's pretty easy.\n\n >>> sorted(catalog.findValueTokens(\n ... 
'principal_id', t({'organization': orgs['Zookd Devs']})))\n ['nancy', 'ophelia']\n\nTransitively, we should do another join.\n\n >>> org_id = registry.getId(orgs['Zookd Devs'])\n >>> sorted(catalog.findValueTokens(\n ... 'principal_id', {\n ... 'organization': zc.relation.catalog.any(\n ... org_id, *catalog.findRelationTokens({'part': org_id}))}))\n ['abe', 'ignas', 'karyn', 'lettie', 'nancy', 'ophelia']\n\nThat's a little awkward, but it does the trick.\n\nLast question, and the kind of question that started the entire example.\nWhat roles does ophelia have in the \"Zookd Nbd\" organization?\n\n >>> list(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'ophelia'})))\n ['publisher']\n\nIntransitively, that's correct. But, transitively, ophelia also has\nreviewer and writer, and that's the answer we want to be able to get quickly.\n\nWe could ask the question a different way, then, again leveraging a join.\nWe'll set it up as a function, because we will want to use it a little later\nwithout repeating the code.\n\n >>> def getRolesInOrganization(principal_id, org):\n ... org_id = registry.getId(org)\n ... return sorted(catalog.findValueTokens(\n ... 'role_id', {\n ... 'organization': zc.relation.catalog.any(\n ... org_id,\n ... *catalog.findRelationTokens({'part': org_id})),\n ... 'principal_id': principal_id}))\n ...\n >>> getRolesInOrganization('ophelia', orgs['Zookd Nbd'])\n ['publisher', 'reviewer', 'writer']\n\nAs you can see, then, working with tokens makes interesting joins possible,\nas long as the tokens are the same across the two queries.\n\nWe have examined token methods and token techniques like joins. The example\nstory we have told lets us get into a few more advanced topics, such as\nquery factory joins and search indexes that can increase their read speed.\n\nQuery Factory Joins\n===================\n\nWe can build a query factory that makes the join automatic. 
A query\nfactory is a callable that takes two arguments: a query (the one that\nstarts the search) and the catalog. The factory either returns None,\nindicating that the query factory cannot be used for this query, or it\nreturns another callable that takes a chain of relations. The last\ntoken in the relation chain is the most recent. The output of this\ninner callable is expected to be an iterable of\nBTrees.family32.OO.Bucket queries to search further from the given chain\nof relations.\n\nHere's a flawed approach to this problem.\n\n >>> def flawed_factory(query, catalog):\n ... if (len(query) == 2 and\n ... 'organization' in query and\n ... 'principal_id' in query):\n ... def getQueries(relchain):\n ... if not relchain:\n ... yield query\n ... return\n ... current = catalog.getValueTokens(\n ... 'organization', relchain[-1])\n ... if current:\n ... organizations = catalog.getRelationTokens(\n ... {'part': zc.relation.catalog.Any(current)})\n ... if organizations:\n ... res = BTrees.family32.OO.Bucket(query)\n ... res['organization'] = zc.relation.catalog.Any(\n ... organizations)\n ... yield res\n ... return getQueries\n ...\n\nThat works for our current example.\n\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'ophelia'}),\n ... queryFactory=flawed_factory))\n ['publisher', 'reviewer', 'writer']\n\nHowever, it won't work for other similar queries.\n\n >>> getRolesInOrganization('abe', orgs['Zookd Nbd'])\n ['publisher', 'reviewer', 'user manager', 'writer']\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'abe'}),\n ... queryFactory=flawed_factory))\n []\n\noops.\n\nThe flawed_factory is actually a useful pattern for more typical relation\ntraversal. It goes from relation to relation to relation, and ophelia has\nconnected relations all the way to the top. 
However, abe only has them at\nthe top, so nothing is traversed.\n\nInstead, we can make a query factory that modifies the initial query.\n\n >>> def factory2(query, catalog):\n ... if (len(query) == 2 and\n ... 'organization' in query and\n ... 'principal_id' in query):\n ... def getQueries(relchain):\n ... if not relchain:\n ... res = BTrees.family32.OO.Bucket(query)\n ... org_id = query['organization']\n ... if org_id is not None:\n ... res['organization'] = zc.relation.catalog.any(\n ... org_id,\n ... *catalog.findRelationTokens({'part': org_id}))\n ... yield res\n ... return getQueries\n ...\n\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'ophelia'}),\n ... queryFactory=factory2))\n ['publisher', 'reviewer', 'writer']\n\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'abe'}),\n ... queryFactory=factory2))\n ['publisher', 'reviewer', 'user manager', 'writer']\n\nA difference between this and the other approach is that it is essentially\nintransitive: this query factory modifies the initial query, and then does\nnot give further queries. The catalog currently always stops calling the\nquery factory if the queries do not return any results, so an approach like\nthe flawed_factory simply won't work for this kind of problem.\n\nWe could add this query factory as another default.\n\n >>> catalog.addDefaultQueryFactory(factory2)\n\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'ophelia'})))\n ['publisher', 'reviewer', 'writer']\n\n >>> sorted(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 
'principal_id': 'abe'})))\n ['publisher', 'reviewer', 'user manager', 'writer']\n\nThe previously installed query factory is still available.\n\n >>> list(catalog.iterDefaultQueryFactories()) == [factory1, factory2]\n True\n\n >>> list(catalog.findRelations(\n ... {'part': registry.getId(orgs['Y3L4 Proj'])}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Organization instance \"Ynod Devs\">,\n <Organization instance \"Ynod Corp Management\">]\n\n >>> sorted(catalog.findRelations(\n ... {None: registry.getId(orgs['Ynod Corp Management'])}))\n ... # doctest: +NORMALIZE_WHITESPACE\n [<Organization instance \"Bet Proj\">, <Organization instance \"Y3L4 Proj\">,\n <Organization instance \"Ynod Admins\">,\n <Organization instance \"Ynod Corp Management\">,\n <Organization instance \"Ynod Devs\">, <Organization instance \"Ynod SAs\">,\n <Organization instance \"Ynod Zookd Task Force\">]\n\nSearch Index for Query Factory Joins\n====================================\n\nNow that we have written a query factory that encapsulates the join, we can\nuse a search index that speeds it up. We've only used transitive search\nindexes so far. Now we will add an intransitive search index.\n\nThe intransitive search index generally just needs the search value\nnames it should be indexing, optionally the result name (defaulting to\nrelations), and optionally the query factory to be used.\n\nWe need to use two additional options because of the odd join trick we're\ndoing. We need to specify what organization and principal_id values need\nto be changed when an object is indexed, and we need to indicate that we\nshould update when organization, principal_id, *or* parts changes.\n\n`getValueTokens` specifies the values that need to be indexed. 
It gets\nthe index, the name for the tokens desired, the token, the catalog that\ngenerated the token change (it may not be the same as the index's\ncatalog), the source dictionary that contains a dictionary of the values\nthat will be used for tokens if you do not override them, a dict of the\nadded values for this token (keys are value names), a dict of the\nremoved values for this token, and whether the token has been removed.\nThe method can return None, which will leave the index to its default\nbehavior that should work if no query factory is used; or an iterable of\nvalues.\n\n >>> def getValueTokens(index, name, token, catalog, source,\n ... additions, removals, removed):\n ... if name == 'organization':\n ... orgs = source.get('organization')\n ... if not removed or not orgs:\n ... orgs = index.catalog.getValueTokens(\n ... 'organization', token)\n ... if not orgs:\n ... orgs = [token]\n ... orgs.extend(removals.get('part', ()))\n ... orgs = set(orgs)\n ... orgs.update(index.catalog.findValueTokens(\n ... 'part',\n ... {None: zc.relation.catalog.Any(\n ... t for t in orgs if t is not None)}))\n ... return orgs\n ... elif name == 'principal_id':\n ... # we only want custom behavior if this is an organization\n ... if 'principal_id' in source or index.catalog.getValueTokens(\n ... 'principal_id', token):\n ... return ''\n ... orgs = set((token,))\n ... orgs.update(index.catalog.findRelationTokens(\n ... {'part': token}))\n ... return set(index.catalog.findValueTokens(\n ... 'principal_id', {\n ... 'organization': zc.relation.catalog.Any(orgs)}))\n ...\n\n >>> index = zc.relation.searchindex.Intransitive(\n ... ('organization', 'principal_id'), 'role_id', factory2,\n ... getValueTokens,\n ... ('organization', 'principal_id', 'part', 'role_id'),\n ... unlimitedDepth=True)\n >>> catalog.addSearchIndex(index)\n\n >>> res = catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 
'principal_id': 'ophelia'}))\n >>> list(res)\n ['publisher', 'reviewer', 'writer']\n >>> list(res)\n ['publisher', 'reviewer', 'writer']\n\n >>> res = catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'abe'}))\n >>> list(res)\n ['publisher', 'reviewer', 'user manager', 'writer']\n >>> list(res)\n ['publisher', 'reviewer', 'user manager', 'writer']\n\n[#verifyObjectIntransitive]_\n\n.. [#verifyObjectIntransitive] The Intransitive search index provides\n ISearchIndex and IListener.\n\n >>> from zope.interface.verify import verifyObject\n >>> import zc.relation.interfaces\n >>> verifyObject(zc.relation.interfaces.ISearchIndex, index)\n True\n >>> verifyObject(zc.relation.interfaces.IListener, index)\n True\n\nNow we can change and remove relations--both organizations and roles--and\nhave the index maintain correct state. Given the current state of\norganizations--\n\n::\n\n Ynod Corp Management Zookd Corp Management\n / | \\ / | \\\n Ynod Devs Ynod SAs Ynod Admins Zookd Admins Zookd SAs Zookd Devs\n / \\ \\ / / \\\n Y3L4 Proj Bet Proj Ynod Zookd Task Force Zookd hOgnmd Zookd Nbd\n\n--first we will move Ynod Devs beneath Zookd Devs, and then back out. This\nwill briefly give abe full privileges to Y3L4 Proj, among others.\n\n >>> list(catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'})))\n []\n >>> orgs['Zookd Devs'].parts.insert(registry.getId(orgs['Ynod Devs']))\n 1\n >>> catalog.index(orgs['Zookd Devs'])\n >>> res = catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'}))\n >>> list(res)\n ['publisher', 'reviewer', 'user manager', 'writer']\n >>> list(res)\n ['publisher', 'reviewer', 'user manager', 'writer']\n >>> orgs['Zookd Devs'].parts.remove(registry.getId(orgs['Ynod Devs']))\n >>> catalog.index(orgs['Zookd Devs'])\n >>> list(catalog.findValueTokens(\n ... 
'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'})))\n []\n\nAs another example, we will change the roles abe has, and see that the\nchange is propagated down to Zookd Nbd.\n\n >>> rels = list(catalog.findRelations(t(\n ... {'principal_id': 'abe',\n ... 'organization': orgs['Zookd Corp Management']})))\n >>> len(rels)\n 1\n >>> rels[0].role_ids.remove('reviewer')\n >>> catalog.index(rels[0])\n\n >>> res = catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'abe'}))\n >>> list(res)\n ['publisher', 'user manager', 'writer']\n >>> list(res)\n ['publisher', 'user manager', 'writer']\n\nNote that search index order matters. In our case, our intransitive search\nindex is relying on our transitive index, so the transitive index needs to\ncome first. You want transitive relation indexes before name. Right now,\nyou are in charge of this order: it will be difficult to come up with a\nreliable algorithm for guessing this.\n\nListeners, Catalog Administration, and Joining Across Relation Catalogs\n=======================================================================\n\nWe've done all of our examples so far with a single catalog that indexes\nboth kinds of relations. What if we want to have two catalogs with\nhomogeneous collections of relations? That can feel cleaner, but it also\nintroduces some new wrinkles.\n\nLet's use our current catalog for organizations, removing the extra\ninformation; and create a new one for roles.\n\n >>> role_catalog = root['role_catalog'] = catalog.copy()\n >>> transaction.commit()\n >>> org_catalog = catalog\n >>> del catalog\n\nWe'll need a slightly different query factory and a slightly different\nsearch index `getValueTokens` function. We'll write those, then modify the\nconfiguration of our two catalogs for the new world.\n\nThe transitive factory we write here is for the role catalog. It needs\naccess to the organization catalog. 
We could do this in a variety of\nways--relying on a utility, or finding the catalog from context. We will\nmake the role_catalog have a .org_catalog attribute, and rely on that.\n\n >>> role_catalog.org_catalog = org_catalog\n >>> def factory3(query, catalog):\n ... if (len(query) == 2 and\n ... 'organization' in query and\n ... 'principal_id' in query):\n ... def getQueries(relchain):\n ... if not relchain:\n ... res = BTrees.family32.OO.Bucket(query)\n ... org_id = query['organization']\n ... if org_id is not None:\n ... res['organization'] = zc.relation.catalog.any(\n ... org_id,\n ... *catalog.org_catalog.findRelationTokens(\n ... {'part': org_id}))\n ... yield res\n ... return getQueries\n ...\n\n >>> def getValueTokens2(index, name, token, catalog, source,\n ... additions, removals, removed):\n ... is_role_catalog = catalog is index.catalog # role_catalog\n ... if name == 'organization':\n ... if is_role_catalog:\n ... orgs = set(source.get('organization') or\n ... index.catalog.getValueTokens(\n ... 'organization', token) or ())\n ... else:\n ... orgs = set((token,))\n ... orgs.update(removals.get('part', ()))\n ... orgs.update(index.catalog.org_catalog.findValueTokens(\n ... 'part',\n ... {None: zc.relation.catalog.Any(\n ... t for t in orgs if t is not None)}))\n ... return orgs\n ... elif name == 'principal_id':\n ... # we only want custom behavior if this is an organization\n ... if not is_role_catalog:\n ... orgs = set((token,))\n ... orgs.update(index.catalog.org_catalog.findRelationTokens(\n ... {'part': token}))\n ... return set(index.catalog.findValueTokens(\n ... 'principal_id', {\n ... 'organization': zc.relation.catalog.Any(orgs)}))\n ... return ''\n\nIf you are following along in the code and comparing to the originals, you may\nsee that this approach is a bit cleaner than the one used when the relations\nwere in the same catalog.\n\nNow we will fix up the organization catalog [#compare_copy]_.\n\n.. 
[#compare_copy] Before we modify them, let's look at the copy we made.\n The copy should currently behave identically to the original.\n\n >>> len(org_catalog)\n 38\n >>> len(role_catalog)\n 38\n >>> indexed = list(org_catalog)\n >>> len(indexed)\n 38\n >>> orgs['Zookd Devs'] in indexed\n True\n >>> for r in indexed:\n ... if r not in role_catalog:\n ... print('bad')\n ... break\n ... else:\n ... print('good')\n ...\n good\n >>> org_names = set(dir(org_catalog))\n >>> role_names = set(dir(role_catalog))\n >>> sorted(org_names - role_names)\n []\n >>> sorted(role_names - org_names)\n ['org_catalog']\n\n >>> def checkYnodDevsParts(catalog):\n ... res = sorted(catalog.findRelations(t({None: orgs['Ynod Devs']})))\n ... if res != [\n ... orgs[\"Bet Proj\"], orgs[\"Y3L4 Proj\"], orgs[\"Ynod Devs\"]]:\n ... print(\"bad\", res)\n ...\n >>> checkYnodDevsParts(org_catalog)\n >>> checkYnodDevsParts(role_catalog)\n\n >>> def checkOpheliaRoles(catalog):\n ... res = sorted(catalog.findRelations({'principal_id': 'ophelia'}))\n ... if repr(res) != (\n ... \"[<Roles instance (ophelia has publisher in Zookd Nbd)>, \" +\n ... \"<Roles instance (ophelia has reviewer in Zookd Devs)>, \" +\n ... \"<Roles instance (ophelia has writer in \" +\n ... \"Zookd Corp Management)>]\"):\n ... print(\"bad\", res)\n ...\n >>> checkOpheliaRoles(org_catalog)\n >>> checkOpheliaRoles(role_catalog)\n\n >>> def checkOpheliaWriterOrganizations(catalog):\n ... res = sorted(catalog.findRelations({None: zc.relation.catalog.Any(\n ... catalog.findValueTokens(\n ... 'organization', {'principal_id': 'ophelia',\n ... 'role_id': 'writer'}))}))\n ... if repr(res) != (\n ... '[<Organization instance \"Ynod Zookd Task Force\">, ' +\n ... '<Organization instance \"Zookd Admins\">, ' +\n ... '<Organization instance \"Zookd Corp Management\">, ' +\n ... '<Organization instance \"Zookd Devs\">, ' +\n ... '<Organization instance \"Zookd Nbd\">, ' +\n ... '<Organization instance \"Zookd SAs\">, ' +\n ... 
'<Organization instance \"Zookd hOgnmd\">]'):\n ... print(\"bad\", res)\n ...\n >>> checkOpheliaWriterOrganizations(org_catalog)\n >>> checkOpheliaWriterOrganizations(role_catalog)\n\n >>> def checkPrincipalsWithRolesInZookdDevs(catalog):\n ... org_id = registry.getId(orgs['Zookd Devs'])\n ... res = sorted(catalog.findValueTokens(\n ... 'principal_id',\n ... {'organization': zc.relation.catalog.any(\n ... org_id, *catalog.findRelationTokens({'part': org_id}))}))\n ... if res != ['abe', 'ignas', 'karyn', 'lettie', 'nancy', 'ophelia']:\n ... print(\"bad\", res)\n ...\n >>> checkPrincipalsWithRolesInZookdDevs(org_catalog)\n >>> checkPrincipalsWithRolesInZookdDevs(role_catalog)\n\n >>> def checkOpheliaRolesInZookdNbd(catalog):\n ... res = sorted(catalog.findValueTokens(\n ... 'role_id', {\n ... 'organization': registry.getId(orgs['Zookd Nbd']),\n ... 'principal_id': 'ophelia'}))\n ... if res != ['publisher', 'reviewer', 'writer']:\n ... print(\"bad\", res)\n ...\n >>> checkOpheliaRolesInZookdNbd(org_catalog)\n >>> checkOpheliaRolesInZookdNbd(role_catalog)\n\n >>> def checkAbeRolesInZookdNbd(catalog):\n ... res = sorted(catalog.findValueTokens(\n ... 'role_id', {\n ... 'organization': registry.getId(orgs['Zookd Nbd']),\n ... 'principal_id': 'abe'}))\n ... if res != ['publisher', 'user manager', 'writer']:\n ... print(\"bad\", res)\n ...\n >>> checkAbeRolesInZookdNbd(org_catalog)\n >>> checkAbeRolesInZookdNbd(role_catalog)\n >>> org_catalog.removeDefaultQueryFactory(None) # doctest: +ELLIPSIS\n Traceback (most recent call last):\n ...\n LookupError: ('factory not found', None)\n\n >>> org_catalog.removeValueIndex('organization')\n >>> org_catalog.removeValueIndex('role_id')\n >>> org_catalog.removeValueIndex('principal_id')\n >>> org_catalog.removeDefaultQueryFactory(factory2)\n >>> org_catalog.removeSearchIndex(index)\n >>> org_catalog.clear()\n >>> len(org_catalog)\n 0\n >>> for v in orgs.values():\n ... 
org_catalog.index(v)\n\nThis also shows using the `removeDefaultQueryFactory` and `removeSearchIndex`\nmethods [#removeDefaultQueryFactoryExceptions]_.\n\n.. [#removeDefaultQueryFactoryExceptions] You get errors by removing query\n factories that are not registered.\n\n >>> org_catalog.removeDefaultQueryFactory(factory2) # doctest: +ELLIPSIS\n Traceback (most recent call last):\n ...\n LookupError: ('factory not found', <function factory2 at ...>)\n\nNow we will set up the role catalog [#copy_unchanged]_.\n\n\n.. [#copy_unchanged] Changes to one copy should not affect the other. That\n means the role_catalog should still work as before.\n\n >>> len(org_catalog)\n 13\n >>> len(list(org_catalog))\n 13\n\n >>> len(role_catalog)\n 38\n >>> indexed = list(role_catalog)\n >>> len(indexed)\n 38\n >>> orgs['Zookd Devs'] in indexed\n True\n >>> orgs['Zookd Devs'] in role_catalog\n True\n\n >>> checkYnodDevsParts(role_catalog)\n >>> checkOpheliaRoles(role_catalog)\n >>> checkOpheliaWriterOrganizations(role_catalog)\n >>> checkPrincipalsWithRolesInZookdDevs(role_catalog)\n >>> checkOpheliaRolesInZookdNbd(role_catalog)\n >>> checkAbeRolesInZookdNbd(role_catalog)\n\n >>> role_catalog.removeValueIndex('part')\n >>> for ix in list(role_catalog.iterSearchIndexes()):\n ... role_catalog.removeSearchIndex(ix)\n ...\n >>> role_catalog.removeDefaultQueryFactory(factory1)\n >>> role_catalog.removeDefaultQueryFactory(factory2)\n >>> role_catalog.addDefaultQueryFactory(factory3)\n >>> root['index2'] = index2 = zc.relation.searchindex.Intransitive(\n ... ('organization', 'principal_id'), 'role_id', factory3,\n ... getValueTokens2,\n ... ('organization', 'principal_id', 'part', 'role_id'),\n ... 
unlimitedDepth=True)\n >>> role_catalog.addSearchIndex(index2)\n\nThe new role_catalog index needs to be updated from the org_catalog.\nWe'll set that up using listeners, a new concept.\n\n >>> org_catalog.addListener(index2)\n >>> list(org_catalog.iterListeners()) == [index2]\n True\n\nNow the role_catalog should be able to answer the same questions as the old\nsingle catalog approach.\n\n >>> t = role_catalog.tokenizeQuery\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'abe'})))\n ['publisher', 'user manager', 'writer']\n\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 'principal_id': 'ophelia'})))\n ['publisher', 'reviewer', 'writer']\n\nWe can also make changes to both catalogs and the search indexes are\nmaintained.\n\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'})))\n []\n >>> orgs['Zookd Devs'].parts.insert(registry.getId(orgs['Ynod Devs']))\n 1\n >>> org_catalog.index(orgs['Zookd Devs'])\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'})))\n ['publisher', 'user manager', 'writer']\n >>> orgs['Zookd Devs'].parts.remove(registry.getId(orgs['Ynod Devs']))\n >>> org_catalog.index(orgs['Zookd Devs'])\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Y3L4 Proj'],\n ... 'principal_id': 'abe'})))\n []\n\n >>> rels = list(role_catalog.findRelations(t(\n ... {'principal_id': 'abe',\n ... 'organization': orgs['Zookd Corp Management']})))\n >>> len(rels)\n 1\n >>> rels[0].role_ids.insert('reviewer')\n 1\n >>> role_catalog.index(rels[0])\n\n >>> res = role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd Nbd'],\n ... 
'principal_id': 'abe'}))\n >>> list(res)\n ['publisher', 'reviewer', 'user manager', 'writer']\n\nHere we add a new organization.\n\n >>> orgs['Zookd hOnc'] = org = Organization('Zookd hOnc')\n >>> orgs['Zookd Devs'].parts.insert(registry.getId(org))\n 1\n >>> org_catalog.index(orgs['Zookd hOnc'])\n >>> org_catalog.index(orgs['Zookd Devs'])\n\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd hOnc'],\n ... 'principal_id': 'abe'})))\n ['publisher', 'reviewer', 'user manager', 'writer']\n\n >>> list(role_catalog.findValueTokens(\n ... 'role_id', t({'organization': orgs['Zookd hOnc'],\n ... 'principal_id': 'ophelia'})))\n ['reviewer', 'writer']\n\nNow we'll remove it.\n\n >>> orgs['Zookd Devs'].parts.remove(registry.getId(org))\n >>> org_catalog.index(orgs['Zookd Devs'])\n >>> org_catalog.unindex(orgs['Zookd hOnc'])\n\nTODO make sure that intransitive copy looks the way we expect\n\n[#administrivia]_\n\n.. [#administrivia]\n\n You can add listeners multiple times.\n\n >>> org_catalog.addListener(index2)\n >>> list(org_catalog.iterListeners()) == [index2, index2]\n True\n\n Now we will remove the listeners, to show we can.\n\n >>> org_catalog.removeListener(index2)\n >>> org_catalog.removeListener(index2)\n >>> org_catalog.removeListener(index2)\n ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE\n Traceback (most recent call last):\n ...\n LookupError: ('listener not found',\n <zc.relation.searchindex.Intransitive object at ...>)\n >>> org_catalog.removeListener(None)\n ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE\n Traceback (most recent call last):\n ...\n LookupError: ('listener not found', None)\n\n Here's the same for removing a search index we don't have\n\n >>> org_catalog.removeSearchIndex(index2)\n ... # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE\n Traceback (most recent call last):\n ...\n LookupError: ('index not found',\n <zc.relation.searchindex.Intransitive object at ...>)\n\n.. ......... ..\n.. Footnotes ..\n.. 
......... ..\n\n\n.. [#silliness] In \"2001: A Space Odyssey\", many people believe the name HAL\n was chosen because it was ROT25 of IBM.... I cheat a bit sometimes and\n use ROT1 because the result sounds better.\n\n\n=================================================================\nWorking with Search Indexes: zc.relation Catalog Extended Example\n=================================================================\n\nIntroduction\n============\n\nThis document assumes you have read the README.rst document, and want to learn\na bit more by example. In it, we will explore a set of relations that\ndemonstrates most of the aspects of working with search indexes and listeners.\nIt will not explain the topics that the other documents already addressed. It\nalso describes an advanced use case.\n\nAs we have seen in the other documents, the relation catalog supports\nsearch indexes. These can return the results of any search, as desired.\nOf course, the intent is that you supply an index that optimizes the\nparticular searches it claims.\n\nThe searchindex module supplies a few search indexes, optimizing\nspecified transitive and intransitive searches. We have seen them working\nin other documents. We will examine them more in depth in this document.\n\nSearch indexes update themselves by receiving messages via a \"listener\"\ninterface. We will also look at how this works.\n\nThe example described in this file examines a use case similar to that in\nthe zc.revision or zc.vault packages: a relation describes a graph of\nother objects. Therefore, this is our first concrete example of purely\nextrinsic relations.\n\nLet's build the example story a bit. Imagine we have a graph, often a\nhierarchy, of tokens--integers. 
Relations specify that a given integer\ntoken relates to other integer tokens, with a containment denotation or\nother meaning.\n\nThe integers may also have relations that specify that they represent an\nobject or objects.\n\nThis allows us to have a graph of objects in which changing one part of the\ngraph does not require changing the rest. zc.revision and zc.vault thus\nare able to model graphs that can have multiple revisions efficiently and\nwith quite a bit of metadata to support merges.\n\nLet's imagine a simple hierarchy. The relation has a `token` attribute\nand a `children` attribute; children point to tokens. Relations will\nidentify themselves with ids.\n\n >>> import BTrees\n >>> relations = BTrees.family64.IO.BTree()\n >>> relations[99] = None # just to give us a start\n\n >>> class Relation(object):\n ... def __init__(self, token, children=()):\n ... self.token = token\n ... self.children = BTrees.family64.IF.TreeSet(children)\n ... self.id = relations.maxKey() + 1\n ... relations[self.id] = self\n ...\n\n >>> def token(rel, self):\n ... return rel.token\n ...\n >>> def children(rel, self):\n ... return rel.children\n ...\n >>> def dumpRelation(obj, index, cache):\n ... return obj.id\n ...\n >>> def loadRelation(token, index, cache):\n ... return relations[token]\n ...\n\nThe standard TransposingTransitiveQueriesFactory will be able to handle this\nquite well, so we'll use that for our index.\n\n >>> import zc.relation.queryfactory\n >>> factory = zc.relation.queryfactory.TransposingTransitive(\n ... 'token', 'children')\n >>> import zc.relation.catalog\n >>> catalog = zc.relation.catalog.Catalog(\n ... dumpRelation, loadRelation, BTrees.family64.IO, BTrees.family64)\n >>> catalog.addValueIndex(token)\n >>> catalog.addValueIndex(children, multiple=True)\n >>> catalog.addDefaultQueryFactory(factory)\n\nNow let's quickly create a hierarchy and index it.\n\n >>> for token, children in (\n ... 
(0, (1, 2)), (1, (3, 4)), (2, (10, 11, 12)), (3, (5, 6)),\n ... (4, (13, 14)), (5, (7, 8, 9)), (6, (15, 16)), (7, (17, 18, 19)),\n ... (8, (20, 21, 22)), (9, (23, 24)), (10, (25, 26)),\n ... (11, (27, 28, 29, 30, 31, 32))):\n ... catalog.index(Relation(token, children))\n ...\n\n[#queryFactory]_ That hierarchy is arbitrary. Here's what we have, in terms of tokens\npointing to children::\n\n _____________0_____________\n / \\\n ________1_______ ______2____________\n / \\ / | \\\n ______3_____ _4_ 10 ____11_____ 12\n / \\ / \\ / \\ / / | \\ \\ \\\n _______5_______ 6 13 14 25 26 27 28 29 30 31 32\n / | \\ / \\\n _7_ _8_ 9 15 16\n / | \\ / | \\ / \\\n 17 18 19 20 21 22 23 24\n\nTwelve relations, with tokens 0 through 11, and a total of 33 tokens,\nincluding children. The ids for the 12 relations are 100 through 111,\ncorresponding with the tokens of 0 through 11.\n\nWithout a transitive search index, we can get all transitive results.\nThe results are iterators.\n\n >>> res = catalog.findRelationTokens({'token': 0})\n >>> getattr(res, '__next__') is None\n False\n >>> getattr(res, '__len__', None) is None\n True\n >>> sorted(res)\n [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]\n >>> list(res)\n []\n\n >>> res = catalog.findValueTokens('children', {'token': 0})\n >>> sorted(res) == list(range(1, 33))\n True\n >>> list(res)\n []\n\n[#findValuesUnindexed]_ `canFind` also can work transitively, and will\nuse transitive search indexes, as we'll see below.\n\n >>> catalog.canFind({'token': 1}, targetQuery={'children': 23})\n True\n >>> catalog.canFind({'token': 2}, targetQuery={'children': 23})\n False\n >>> catalog.canFind({'children': 23}, targetQuery={'token': 1})\n True\n >>> catalog.canFind({'children': 23}, targetQuery={'token': 2})\n False\n\n`findRelationTokenChains` won't change, but we'll include it in the\ndiscussion and examples to show that.\n\n >>> res = catalog.findRelationTokenChains({'token': 2})\n >>> chains = list(res)\n >>> len(chains)\n 
3\n >>> len(list(res))\n 0\n\nTransitive Search Indexes\n=========================\n\nNow we can add a transitive search index. We'll talk about\nit a bit first.\n\nThere is currently one variety of transitive index, which indexes\nrelation and value searches for the transposing transitive query\nfactory.\n\nThe index can only be used under certain conditions.\n\n - The search is not a request for a relation chain.\n\n - It does not specify a maximum depth.\n\n - Filters are not used.\n\nIf it is a value search, then specific value indexes cannot be used if a\ntarget filter or target query is used, but the basic relation index can\nstill be used in that case.\n\nThe usage of the search indexes is largely transparent: set them up, and\nthe relation catalog will use them for the same API calls that used more\nbrute force previously. The only externally visible difference is that\nresults that use an index will usually be a BTree structure, rather than\nan iterator.\n\nWhen you add a transitive index for a relation, you must specify the\ntransitive name (or names) of the query, and the same for the reverse.\nThat's all we'll do now.\n\n >>> import zc.relation.searchindex\n >>> catalog.addSearchIndex(\n ... zc.relation.searchindex.TransposingTransitiveMembership(\n ... 'token', 'children', names=('children',)))\n\nNow we should have a search index installed.\n\nNotice that we went from parent (token) to child: this index is primarily\ndesigned for helping transitive membership searches in a hierarchy. Using it to\nindex parents would incur a lot of write expense for not much win.\n\nThere's just a bit more you can specify here: static fields for a query\nto do a bit of filtering. We don't need any of that for this example.\n\nNow how does the catalog use this index for searches? 
Three basic ways,\ndepending on the kind of search, relations, values, or `canFind`.\nBefore we start looking into the internals, let's verify that we're getting\nwhat we expect: correct answers, and not iterators, but BTree structures.\n\n >>> res = catalog.findRelationTokens({'token': 0})\n >>> list(res)\n [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]\n >>> list(res)\n [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111]\n\n >>> res = catalog.findValueTokens('children', {'token': 0})\n >>> list(res) == list(range(1, 33))\n True\n >>> list(res) == list(range(1, 33))\n True\n\n >>> catalog.canFind({'token': 1}, targetQuery={'children': 23})\n True\n >>> catalog.canFind({'token': 2}, targetQuery={'children': 23})\n False\n\n[#findValuesIndexed]_ Note that the last two `canFind` examples from\nwhen we went through these examples without an index do not use the\nindex, so we don't show them here: they look the wrong direction for\nthis index.\n\nSo how do these results happen?\n\nThe first, `findRelationTokens`, and the last, `canFind`, are the most\nstraightforward. The index finds all relations that match the given\nquery, intransitively. Then for each relation, it looks up the indexed\ntransitive results for that token. The end result is the union of all\nindexed results found from the intransitive search. `canFind` simply\ncasts the result into a boolean.\n\n`findValueTokens` is the same story as above with only one more step. After\nthe union of relations is calculated, the method returns the union of the\nsets of the requested value for all found relations.\n\nIt will maintain itself when relations are reindexed.\n\n >>> rel = list(catalog.findRelations({'token': 11}))[0]\n >>> for t in (27, 28, 29, 30, 31):\n ... rel.children.remove(t)\n ...\n >>> catalog.index(rel)\n\n >>> catalog.findValueTokens('children', {'token': 0})\n ... 
# doctest: +NORMALIZE_WHITESPACE\n LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n 20, 21, 22, 23, 24, 25, 26, 32])\n >>> catalog.findValueTokens('children', {'token': 2})\n LFSet([10, 11, 12, 25, 26, 32])\n >>> catalog.findValueTokens('children', {'token': 11})\n LFSet([32])\n\n >>> rel.children.remove(32)\n >>> catalog.index(rel)\n\n >>> catalog.findValueTokens('children', {'token': 0})\n ... # doctest: +NORMALIZE_WHITESPACE\n LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n 20, 21, 22, 23, 24, 25, 26])\n >>> catalog.findValueTokens('children', {'token': 2})\n LFSet([10, 11, 12, 25, 26])\n >>> catalog.findValueTokens('children', {'token': 11})\n LFSet([])\n\n >>> rel.children.insert(27)\n 1\n >>> catalog.index(rel)\n\n >>> catalog.findValueTokens('children', {'token': 0})\n ... # doctest: +NORMALIZE_WHITESPACE\n LFSet([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,\n 20, 21, 22, 23, 24, 25, 26, 27])\n >>> catalog.findValueTokens('children', {'token': 2})\n LFSet([10, 11, 12, 25, 26, 27])\n >>> catalog.findValueTokens('children', {'token': 11})\n LFSet([27])\n\nWhen the catalog is copied, the search index is copied too. The copy is a\nseparate object with its own data structures, but the same contents.\n\n >>> new = catalog.copy()\n >>> res = list(new.iterSearchIndexes())\n >>> len(res)\n 1\n >>> new_index = res[0]\n >>> res = list(catalog.iterSearchIndexes())\n >>> len(res)\n 1\n >>> old_index = res[0]\n >>> new_index is old_index\n False\n >>> old_index.index is new_index.index\n False\n >>> list(old_index.index.keys()) == list(new_index.index.keys())\n True\n >>> from __future__ import print_function\n >>> for key, value in old_index.index.items():\n ... v = new_index.index[key]\n ... if v is value or list(v) != list(value):\n ... print('oops', key, value, v)\n ... break\n ... else:\n ... 
print('good')\n ...\n good\n >>> old_index.names is not new_index.names\n True\n >>> list(old_index.names) == list(new_index.names)\n True\n >>> for name, old_ix in old_index.names.items():\n ... new_ix = new_index.names[name]\n ... if new_ix is old_ix or list(new_ix.keys()) != list(old_ix.keys()):\n ... print('oops')\n ... break\n ... for key, value in old_ix.items():\n ... v = new_ix[key]\n ... if v is value or list(v) != list(value):\n ... print('oops', name, key, value, v)\n ... break\n ... else:\n ... continue\n ... break\n ... else:\n ... print('good')\n ...\n good\n\nHelpers\n=======\n\nWhen writing search indexes and query factories, you often want complete\naccess to relation catalog data. We've seen a number of these tools already:\n\n- `getRelationModuleTools` gets a dictionary of the BTree tools needed to\n work with relations.\n\n >>> sorted(catalog.getRelationModuleTools().keys())\n ... # doctest: +NORMALIZE_WHITESPACE\n ['BTree', 'Bucket', 'Set', 'TreeSet', 'difference', 'dump',\n 'intersection', 'load', 'multiunion', 'union']\n\n 'multiunion' is only there if the BTree is an I* or L* module.\n Use the zc.relation.catalog.multiunion helper function to do the\n best union you can for a given set of tools.\n\n- `getValueModuleTools` does the same for indexed values.\n\n >>> tools = set(('BTree', 'Bucket', 'Set', 'TreeSet', 'difference',\n ... 
'dump', 'intersection', 'load', 'multiunion', 'union'))\n >>> tools.difference(catalog.getValueModuleTools('children').keys()) == set()\n True\n\n >>> tools.difference(catalog.getValueModuleTools('token').keys()) == set()\n True\n\n- `getRelationTokens` can return all of the tokens in the catalog.\n\n >>> len(catalog.getRelationTokens()) == len(catalog)\n True\n\n This also happens to be equivalent to `findRelationTokens` with an empty\n query.\n\n >>> catalog.getRelationTokens() is catalog.findRelationTokens({})\n True\n\n It also can return all the tokens that match a given query, or None if\n there are no matches.\n\n >>> catalog.getRelationTokens({'token': 0}) # doctest: +ELLIPSIS\n <BTrees.LOBTree.LOTreeSet object at ...>\n >>> list(catalog.getRelationTokens({'token': 0}))\n [100]\n\n This also happens to be equivalent to `findRelationTokens` with a query,\n a maxDepth of 1, and no other arguments.\n\n >>> catalog.findRelationTokens({'token': 0}, maxDepth=1) is (\n ... catalog.getRelationTokens({'token': 0}))\n True\n\n Except that if there are no matches, `findRelationTokens` returns an empty\n set (so it *always* returns an iterable).\n\n >>> catalog.findRelationTokens({'token': 50}, maxDepth=1)\n LOSet([])\n >>> print(catalog.getRelationTokens({'token': 50}))\n None\n\n- `getValueTokens` can return all of the tokens for a given value name in\n the catalog.\n\n >>> list(catalog.getValueTokens('token')) == list(range(12))\n True\n\n This is identical to catalog.findValueTokens with a name only (or with\n an empty query, and a maxDepth of 1).\n\n >>> list(catalog.findValueTokens('token')) == list(range(12))\n True\n >>> catalog.findValueTokens('token') is catalog.getValueTokens('token')\n True\n\n It can also return the values for a given token.\n\n >>> list(catalog.getValueTokens('children', 100))\n [1, 2]\n\n This is identical to catalog.findValueTokens with a name and a query of\n {None: token}.\n\n >>> list(catalog.findValueTokens('children', {None: 
100}))\n [1, 2]\n >>> catalog.getValueTokens('children', 100) is (\n ... catalog.findValueTokens('children', {None: 100}))\n True\n\n Except that if there are no matches, `findValueTokens` returns an empty\n set (so it *always* returns an iterable); while getValueTokens will\n return None if the relation has no values (or the relation is unknown).\n\n >>> catalog.findValueTokens('children', {None: 50}, maxDepth=1)\n LFSet([])\n >>> print(catalog.getValueTokens('children', 50))\n None\n\n >>> rel.children.remove(27)\n >>> catalog.index(rel)\n >>> catalog.findValueTokens('children', {None: rel.id}, maxDepth=1)\n LFSet([])\n >>> print(catalog.getValueTokens('children', rel.id))\n None\n\n- `yieldRelationTokenChains` is a search workhorse for searches that use a\n query factory. TODO: describe.\n\n.. ......... ..\n.. Footnotes ..\n.. ......... ..\n\n.. [#queryFactory] The query factory knows when it is not needed--not only\n when neither of its names are used, but also when both of its names are\n used.\n\n >>> list(catalog.findRelationTokens({'token': 0, 'children': 1}))\n [100]\n\n.. [#findValuesUnindexed] When values are the same as their tokens,\n `findValues` returns the same result as `findValueTokens`. Here\n we see this without indexes.\n\n >>> list(catalog.findValueTokens('children', {'token': 0})) == list(\n ... catalog.findValues('children', {'token': 0}))\n True\n\n.. [#findValuesIndexed] Again, when values are the same as their tokens,\n `findValues` returns the same result as `findValueTokens`. Here\n we see this with indexes.\n\n >>> list(catalog.findValueTokens('children', {'token': 0})) == list(\n ... catalog.findValues('children', {'token': 0}))\n True\n\n\nOptimizing Relation Catalog Use\n===============================\n\nThere are several best practices and optimization opportunities in regards to\nthe catalog.\n\n- Use integer-keyed BTree sets when possible. They can use the BTrees'\n `multiunion` for a speed boost. 
Integers' __cmp__ is reliable, and in C.\n\n- Never use persistent objects as keys. They will cause a database load every\n time you need to look at them, they take up memory and object caches, and\n they (as of this writing) disable conflict resolution. Intids (or similar)\n are your best bet for representing objects; other immutables such as\n strings are the next-best bet; and zope.app.keyreferences (or similar) come\n after that.\n\n- Use multiple-token values in your queries when possible, especially in your\n transitive query factories.\n\n- Use the cache when you are loading and dumping tokens, and in your\n transitive query factories.\n\n- When possible, don't load or dump tokens (the values themselves may be used\n as tokens). This is especially important when you have multiple tokens:\n store them in a BTree structure in the same module as the zc.relation module\n for the value.\n\nFor some operations, particularly with hundreds or thousands of members in a\nsingle relation value, some of these optimizations can speed up some\ncommon-case reindexing work by around 100 times.\n\nThe easiest (and perhaps least useful) optimization is that all dump\ncalls and all load calls generated by a single operation share a cache\ndictionary per call type (dump/load), per indexed relation value.\nTherefore, for instance, we could stash an intids utility, so that we\nonly had to do a utility lookup once, and thereafter it was only a\nsingle dictionary lookup. This is what the default `generateToken` and\n`resolveToken` functions in zc.relationship's index.py do: look at them\nfor an example.\n\nA further optimization is to not load or dump tokens at all, but use values\nthat may be tokens. This will be particularly useful if the tokens have\n__cmp__ (or equivalent) in C, such as built-in types like ints. 
To specify\nthis behavior, you create an index with the 'load' and 'dump' values for the\nindexed attribute descriptions explicitly set to None.\n\n\n >>> import zope.interface\n >>> class IRelation(zope.interface.Interface):\n ... subjects = zope.interface.Attribute(\n ... 'The sources of the relation; the subject of the sentence')\n ... relationtype = zope.interface.Attribute(\n ... '''unicode: the single relation type of this relation;\n ... usually contains the verb of the sentence.''')\n ... objects = zope.interface.Attribute(\n ... '''the targets of the relation; usually a direct or\n ... indirect object in the sentence''')\n ...\n\n >>> import BTrees\n >>> relations = BTrees.family32.IO.BTree()\n >>> relations[99] = None # just to give us a start\n\n >>> @zope.interface.implementer(IRelation)\n ... class Relation(object):\n ...\n ... def __init__(self, subjects, relationtype, objects):\n ... self.subjects = subjects\n ... assert relationtype in relTypes\n ... self.relationtype = relationtype\n ... self.objects = objects\n ... self.id = relations.maxKey() + 1\n ... relations[self.id] = self\n ... def __repr__(self):\n ... return '<%r %s %r>' % (\n ... self.subjects, self.relationtype, self.objects)\n\n >>> def token(rel, self):\n ... return rel.token\n ...\n >>> def children(rel, self):\n ... return rel.children\n ...\n >>> def dumpRelation(obj, index, cache):\n ... return obj.id\n ...\n >>> def loadRelation(token, index, cache):\n ... return relations[token]\n ...\n\n >>> relTypes = ['has the role of']\n >>> def relTypeDump(obj, index, cache):\n ... assert obj in relTypes, 'unknown relationtype'\n ... return obj\n ...\n >>> def relTypeLoad(token, index, cache):\n ... assert token in relTypes, 'unknown relationtype'\n ... return token\n ...\n\n >>> import zc.relation.catalog\n >>> catalog = zc.relation.catalog.Catalog(\n ... dumpRelation, loadRelation)\n >>> catalog.addValueIndex(IRelation['subjects'], multiple=True)\n >>> catalog.addValueIndex(\n ... 
IRelation['relationtype'], relTypeDump, relTypeLoad,\n ... BTrees.family32.OI, name='reltype')\n >>> catalog.addValueIndex(IRelation['objects'], multiple=True)\n >>> import zc.relation.queryfactory\n >>> factory = zc.relation.queryfactory.TransposingTransitive(\n ... 'subjects', 'objects')\n >>> catalog.addDefaultQueryFactory(factory)\n\n >>> rel = Relation((1,), 'has the role of', (2,))\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 1}))\n [2]\n\nIf you have single relations that relate hundreds or thousands of\nobjects, it can be a huge win if the value is a 'multiple' of the same\ntype as the stored BTree for the given attribute. The default BTree\nfamily for attributes is IFBTree; IOBTree is also a good choice, and may\nbe preferable for some applications.\n\n >>> catalog.unindex(rel)\n >>> rel = Relation(\n ... BTrees.family32.IF.TreeSet((1,)), 'has the role of',\n ... BTrees.family32.IF.TreeSet())\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 1}))\n []\n >>> list(catalog.findValueTokens('subjects', {'objects': None}))\n [1]\n\nReindexing is where some of the big improvements can happen. 
The following\ngyrations exercise the optimization code.\n\n >>> rel.objects.insert(2)\n 1\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 1}))\n [2]\n >>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5))\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 3}))\n [2]\n\n >>> rel.subjects.insert(6)\n 1\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 6}))\n [2]\n\n >>> rel.subjects.update(range(100, 200))\n 100\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 100}))\n [2]\n\n >>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5,6))\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 3}))\n [2]\n\n >>> rel.subjects = BTrees.family32.IF.TreeSet(())\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 3}))\n []\n\n >>> rel.subjects = BTrees.family32.IF.TreeSet((3,4,5))\n >>> catalog.index(rel)\n >>> list(catalog.findValueTokens('objects', {'subjects': 3}))\n [2]\n\ntokenizeValues and resolveValueTokens work correctly without loaders and\ndumpers--that is, they do nothing.\n\n >>> catalog.tokenizeValues((3,4,5), 'subjects')\n (3, 4, 5)\n >>> catalog.resolveValueTokens((3,4,5), 'subjects')\n (3, 4, 5)\n\n\n=======\nChanges\n=======\n\n\n2.1 (2024-12-09)\n================\n\n- Add support for Python 3.12, 3.13.\n\n- Drop support for Python 3.7.\n\n\n2.0 (2023-04-05)\n================\n\n- Drop support for Python 2.7, 3.5, 3.6.\n [ale-rt]\n\n- Fix the dependency on the ZODB, we just need to depend on the BTrees package.\n Refs. 
#11.\n [ale-rt]\n\n\n1.2 (2023-03-28)\n================\n\n- Adapt code for PEP-479 (Change StopIteration handling inside generators).\n See: https://peps.python.org/pep-0479.\n Fixes #11.\n [ale-rt]\n\n\n1.1.post2 (2018-06-18)\n======================\n\n- Another attempt to fix PyPI page by using correct expected metadata syntax.\n\n\n1.1.post1 (2018-06-18)\n======================\n\n- Fix PyPI page by using correct ReST syntax.\n\n\n1.1 (2018-06-15)\n================\n\n- Add support for Python 3.5 and 3.6.\n\n\n1.0 (2008-04-23)\n================\n\nThis is the initial release of the zc.relation package. However, it\nrepresents a refactoring of another package, zc.relationship. This\npackage contains only a modified version of the relation(ship) index,\nnow called a catalog. The refactored version of zc.relationship index\nrelies on (subclasses) this catalog. zc.relationship also maintains a\nbackwards-compatible subclass.\n\nThis package only relies on the ZODB, zope.interface, and zope.testing\nsoftware, and can be used inside or outside of a standard ZODB database.\nThe software does have to be there, though (the package relies heavily\non the ZODB BTrees package).\n\nIf you would like to switch a legacy zc.relationship index to a\nzc.relation catalog, try this trick in your generations script.\nAssuming the old index is ``old``, the following line should create\na new zc.relation catalog with your legacy data:\n\n >>> new = old.copy(zc.relation.Catalog)\n\nWhy is the same basic data structure called a catalog now? Because we\nexposed the ability to mutate the data structure, and what you are really\nadding and removing are indexes. It didn't make sense to put an index in\nan index, but it does make sense to put an index in a catalog. Thus, a\nname change was born.\n\nThe catalog in this package has several incompatibilities from the earlier\nzc.relationship index, and many new features. The zc.relationship package\nmaintains a backwards-compatible subclass. 
The following discussion\ncompares the zc.relation catalog with the zc.relationship 1.x index.\n\nIncompatibilities with zc.relationship 1.x index\n------------------------------------------------\n\nThe two big changes are that method names now refer to ``Relation`` rather\nthan ``Relationship``; and the catalog is instantiated slightly differently\nfrom the index. A few other changes are worth your attention. The\nfollowing list attempts to highlight all incompatibilities.\n\n:Big incompatibilities:\n\n - ``findRelationshipTokenSet`` and ``findValueTokenSet`` are renamed, with\n some slightly different semantics, as ``getRelationTokens`` and\n ``getValueTokens``. The exact same result as\n ``findRelationTokenSet(query)`` can be obtained with\n ``findRelationTokens(query, 1)`` (where 1 is maxDepth). The same\n result as ``findValueTokenSet(reltoken, name)`` can be obtained with\n ``findValueTokens(name, {zc.relation.RELATION: reltoken}, 1)``.\n\n - ``findRelations`` replaces ``findRelationships``. The new method will use\n the defaultTransitiveQueriesFactory if it is set and maxDepth is not 1.\n It shares the call signature of ``findRelationChains``.\n\n - ``isLinked`` is now ``canFind``.\n\n - The catalog instantiation arguments have changed from the old index.\n\n * ``load`` and ``dump`` (formerly ``loadRel`` and ``dumpRel``,\n respectively) are now required arguments for instantiation.\n\n * The only other optional arguments are ``btree`` (was ``relFamily``) and\n ``family``. You now specify what elements to index with\n ``addValueIndex``.\n\n * Note also that ``addValueIndex`` defaults to no load and dump function,\n unlike the old instantiation options.\n\n - Query factories are different. See ``IQueryFactory`` in the interfaces.\n\n * They first get (query, catalog, cache) and then return a getQueries\n callable that gets relchains and yields queries; OR None if they\n don't match.\n\n * They must also handle an empty relchain. 
Typically this should\n return the original query, but may also be used to mutate the\n original query.\n\n * They are no longer thought of as transitive query factories, but as\n general query mutators.\n\n:Medium:\n\n - The catalog no longer inherits from\n zope.app.container.contained.Contained.\n\n - The index requires ZODB 3.8 or higher.\n\n:Small:\n\n - ``deactivateSets`` is no longer an instantiation option (it was broken\n because of a ZODB bug anyway, as had been described in the\n documentation).\n\nChanges and new features\n------------------------\n\n- The catalog now offers the ability to index certain\n searches. The indexes must be explicitly instantiated and registered\n for the searches you want to optimize. This can be used when searching for values, when\n searching for relations, or when determining if two objects are\n linked. It cannot be used for relation chains. Requesting an index\n has the usual trade-offs of greater storage space and slower write\n speed for faster search speed. Registering a search index is done\n after instantiation time; you can iterate over the current settings\n used, and remove them. (The code path expects to support legacy\n zc.relationship index instances for all of these APIs.)\n\n- You can now specify new values after the catalog has been created, iterate\n over the settings used, and remove values.\n\n- The catalog has a copy method, to quickly make new copies without actually\n having to reindex the relations.\n\n- Query arguments can now specify multiple values for a given name by\n using zc.relation.catalog.any(1, 2, 3, 4) or\n zc.relation.catalog.Any((1, 2, 3, 4)).\n\n- The catalog supports specifying indexed values by passing callables rather\n than interface elements (which are also still supported).\n\n- ``findRelations`` and new method ``findRelationTokens`` can find\n relations transitively and intransitively. 
``findRelationTokens``\n when used intransitively repeats the legacy zc.relationship index\n behavior of ``findRelationTokenSet``.\n (``findRelationTokenSet`` remains in the API, not deprecated, a companion\n to ``findValueTokenSet``.)\n\n- In ``findValues`` and ``findValueTokens``, the ``query`` argument is now optional. If\n the query evaluates to False in a boolean context, all values, or value\n tokens, are returned. Value tokens are explicitly returned using the\n underlying BTree storage. This can then be used directly for other BTree\n operations.\n\n- Completely new docs. Unfortunately, still really not good enough.\n\n- The package has drastically reduced direct dependencies from zc.relationship:\n it is now more clearly a ZODB tool, with no other Zope dependencies than\n zope.testing and zope.interface.\n\n- Listeners allow objects to listen to messages from the catalog (which can\n be used directly or, for instance, to fire off events).\n\n- You can search for relations, using a key of zc.relation.RELATION...which is\n really an alias for None. Sorry. But hey, use the constant! I think it is\n more readable.\n\n- tokenizeQuery (and resolveQuery) now accept keyword arguments as an\n alternative to a normal dict query. This can make constructing the query\n a bit more attractive (i.e., ``query = catalog.tokenizeQuery;\n res = catalog.findValues('object', query(subject=joe, predicate=OWNS))``).\n",
"bugtrack_url": null,
"license": "ZPL 2.1",
"summary": "Index intransitive and transitive n-ary relationships.",
"version": "2.1",
"project_urls": {
"Homepage": "https://github.com/zopefoundation/zc.relation"
},
"split_keywords": [
"zope",
"zope3",
"relation"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "6e953f703d3cb32e60fe991e00cbd91f87102c6478bd84cd08ce46417603ddae",
"md5": "251614406ef78458d44edee533a16cd5",
"sha256": "30778bed1256317f89c2be3df64d4e6943006236952e8e80e3956d79be11a56e"
},
"downloads": -1,
"filename": "zc.relation-2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "251614406ef78458d44edee533a16cd5",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 111698,
"upload_time": "2024-12-09T07:31:37",
"upload_time_iso_8601": "2024-12-09T07:31:37.404118Z",
"url": "https://files.pythonhosted.org/packages/6e/95/3f703d3cb32e60fe991e00cbd91f87102c6478bd84cd08ce46417603ddae/zc.relation-2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0c4caea699b82fcd6277cf930f86b1d5fcb3f915de0b0ef338c1b258fdf3943f",
"md5": "b361800997deffe0a833ec63d498efae",
"sha256": "f7eee35f7741a5cb9af12ef81e8cb5878e367cdd56ae777f1bd52dd21cf568b2"
},
"downloads": -1,
"filename": "zc_relation-2.1.tar.gz",
"has_sig": false,
"md5_digest": "b361800997deffe0a833ec63d498efae",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 151369,
"upload_time": "2024-12-09T07:31:39",
"upload_time_iso_8601": "2024-12-09T07:31:39.398785Z",
"url": "https://files.pythonhosted.org/packages/0c/4c/aea699b82fcd6277cf930f86b1d5fcb3f915de0b0ef338c1b258fdf3943f/zc_relation-2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-09 07:31:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "zopefoundation",
"github_project": "zc.relation",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "zc.relation"
}