lexis

Name	lexis JSON
Version	0.1.2 JSON
	download
home_page	https://github.com/thorwhalen/lexis
Summary	Wordnet wrapper - Easy access to words and their relationships
upload_time	2023-04-07 09:18:06
maintainer
docs_url	None
author	Thor Whalen
requires_python
license	apache-2.0
keywords	words definitions lexicon wordnet nlp natural language processing text mining
VCS
bugtrack_url
requirements	No requirements were recorded.

            
# lexis
Wordnet wrapper - Easy access to words and their relationships

To install: `pip install lexis`

A key-value (i.e. dict-like) wrapper for `nltk.corpus.wordnet`.

Your no fuss gateway to (English) words.

The `nltk` dependency is installed for you when you install
`lexis`, but the WordNet data is not downloaded automatically.
The easiest way to get it (you only need to do this once) is to
open a Python console and run:
```
>>> import nltk; nltk.download('wordnet')  # doctest: +SKIP
```

If you don't like that way, [see here](https://www.nltk.org/install.html) 
for other ways to get wordnet.

The central construct of this module is the Synset 
(a set of synonyms that share a common meaning).
To see a few things you can do with Synsets, naked, 
[see here](https://www.nltk.org/howto/wordnet.html).

Here we put a py2store wrapper around this stuff.

What is WordNet? https://wordnet.princeton.edu/


# A little peek at Lemmas


```python
from lexis import Lemmas

lm = Lemmas()
len(lm)
```




    147306



`lm` is a `Mapping` (think "acts like a (read-only) dict")


```python
from typing import Mapping

isinstance(lm, Mapping)
```




    True
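To make "acts like a (read-only) dict" concrete, here is a minimal sketch of a read-only mapping built on `collections.abc.Mapping`. The class `ReadOnlyWords` and its sample data are hypothetical stand-ins for illustration; this is not how `lexis` implements `Lemmas`.

```python
from collections.abc import Mapping


class ReadOnlyWords(Mapping):
    """A toy read-only mapping; a hypothetical stand-in, not lexis code."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


words = ReadOnlyWords({'blond': ['blond.n.01'], 'peacetime': ['peacetime.n.01']})

print(len(words))           # number of keys
print('blond' in words)     # membership comes for free from Mapping
print(list(words.items()))  # so do keys(), values(), items() and get()

try:
    words['new'] = []       # no __setitem__ is defined, so assignment fails
except TypeError as e:
    print(f"read-only: {e}")
```

Defining only `__getitem__`, `__iter__` and `__len__` is enough: `Mapping` supplies the rest of the read-only dict interface.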



Let's have a look at a few keys


```python
list(lm)[44630:44635]
```




    ['blond', 'kaunda', 'peacetime', 'intolerantly', "'hood"]



And the value of a `lm` item?


```python
lm['blond']
```




    {'blond.n.01': WordnetElement('blond.n.01'),
     'blond.n.02': WordnetElement('blond.n.02'),
     'blond.a.01': WordnetElement('blond.a.01')}



Okay, it looks like these are the different meanings of "blond". The middle letter tells us the grammatical role: noun (`n`) or adjective (`a`). More on that later.

And what's a `WordnetElement`?

Well, it's another Mapping, apparently:


```python
isinstance(lm['blond']['blond.n.01'], Mapping)
```




    True




```python
list(lm['blond']['blond.n.01'])
```




    ['also_sees',
     'instance_hypernyms',
     'verb_groups',
     'entailments',
     'region_domains',
     'substance_holonyms',
     'part_holonyms',
     'examples',
     'part_meronyms',
     'hyponyms',
     'member_meronyms',
     'offset',
     'causes',
     'definition',
     'lemma_names',
     'lexname',
     'member_holonyms',
     'in_topic_domains',
     'lemmas',
     'topic_domains',
     'max_depth',
     'hypernym_distances',
     'name',
     'attributes',
     'hypernyms',
     'min_depth',
     'usage_domains',
     'in_region_domains',
     'instance_hyponyms',
     'in_usage_domains',
     'similar_tos',
     'root_hypernyms',
     'pos',
     'frame_ids',
     'hypernym_paths',
     'substance_meronyms']



Wow! That's a lot of information! 

Let's look at what the definition of `'blond.n.01'` is:


```python
print(lm['blond']['blond.n.01']['definition'])
```

    a person with fair skin and hair


... actually, let's just poke at all of them (at least those that are non-empty)


```python
meaning = 'blond.n.01'
print(f"Values for meaning: {meaning}")
for k, v in lm['blond'][meaning].items():
    if v:
        print(f"- {k}: {v}")
```

    Values for meaning: blond.n.01
    - hyponyms: [WordnetElement('peroxide_blond.n.01'), WordnetElement('platinum_blond.n.01'), WordnetElement('towhead.n.01')]
    - offset: 9860506
    - definition: a person with fair skin and hair
    - lemma_names: ['blond', 'blonde']
    - lexname: noun.person
    - lemmas: [KvLemma('blond.n.01.blond'), KvLemma('blond.n.01.blonde')]
    - max_depth: 7
    - hypernym_distances: {(WordnetElement('physical_entity.n.01'), 6), (WordnetElement('entity.n.01'), 7), (WordnetElement('physical_entity.n.01'), 3), (WordnetElement('entity.n.01'), 4), (WordnetElement('living_thing.n.01'), 3), (WordnetElement('object.n.01'), 5), (WordnetElement('blond.n.01'), 0), (WordnetElement('organism.n.01'), 2), (WordnetElement('causal_agent.n.01'), 2), (WordnetElement('whole.n.02'), 4), (WordnetElement('person.n.01'), 1)}
    - name: blond.n.01
    - hypernyms: [WordnetElement('person.n.01')]
    - min_depth: 4
    - root_hypernyms: [Synset('entity.n.01')]
    - pos: n
    - hypernym_paths: [[WordnetElement('entity.n.01'), WordnetElement('physical_entity.n.01'), WordnetElement('causal_agent.n.01'), WordnetElement('person.n.01'), WordnetElement('blond.n.01')], [WordnetElement('entity.n.01'), WordnetElement('physical_entity.n.01'), WordnetElement('object.n.01'), WordnetElement('whole.n.02'), WordnetElement('living_thing.n.01'), WordnetElement('organism.n.01'), WordnetElement('person.n.01'), WordnetElement('blond.n.01')]]


## You can get meaning information directly

What if you made a list of these strings like `'blond.n.01'`, `'blond.a.01'`... and you wanted to access the `WordnetElement` instances with all that cool information about those specific meanings?

You could do `lm['blond']['blond.n.01']`, `lm['blond']['blond.a.01']`... But then you'd have to remember the full references `('blond', 'blond.n.01')`, `('blond', 'blond.a.01')`... 

You don't need to go through `lm['blond']` to get to the `WordnetElement` instance that gives you access to the meaning information -- you can use the `Synsets` store (i.e. Mapping). 

Note: "synset" is what WordNet calls this. We'll just call it a "meaning" for simplicity. I hope the purists won't mind.



```python
from lexis import Synsets
```


```python
meanings = Synsets()
meanings['blond.n.01']
```




    WordnetElement('blond.n.01')



We saw earlier that we had `147306` lemmas (i.e. "words" or more precisely "terms"... but really precisely, "lemmas"). 

Well, we have `117659` synsets (i.e. "meanings") in the `Synsets` instance.


```python
len(meanings)
```




    117659
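One way to picture how a lemma-keyed store relates to a synset-keyed one is to flatten the two-level mapping. The sketch below uses a tiny hand-written sample (plain strings standing in for `WordnetElement` objects); it only illustrates the idea and is not how `lexis` builds `Synsets`.

```python
# Hypothetical sample shaped like the `lm[...]` values shown earlier,
# with plain strings standing in for WordnetElement objects.
lemma_to_meanings = {
    'blond': {'blond.n.01': 'a person with fair skin and hair'},
    'blonde': {'blond.n.01': 'a person with fair skin and hair'},
}

# Flatten the two-level mapping into one keyed by synset name alone.
meanings_index = {}
for lemma, lemma_meanings in lemma_to_meanings.items():
    for synset_name, element in lemma_meanings.items():
        meanings_index.setdefault(synset_name, element)

print(meanings_index['blond.n.01'])
print(len(lemma_to_meanings), len(meanings_index))  # lemmas vs. distinct meanings
```

Because several lemmas can share one synset, the flattened index can be smaller than the lemma store, even though a single lemma may also have several meanings.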



## Multiple lemma names

`'lemma_names'` are different ways that the same meaning can be written. 


```python
lm['blond']['blond.n.01']['lemma_names']
```




    ['blond', 'blonde']



Indeed, `lm['blond']` and `lm['blonde']` really point to the same thing.


```python
lm['blond']
```




    {'blond.n.01': WordnetElement('blond.n.01'),
     'blond.n.02': WordnetElement('blond.n.02'),
     'blond.a.01': WordnetElement('blond.a.01')}




```python
lm['blonde']
```




    {'blond.n.01': WordnetElement('blond.n.01'),
     'blond.n.02': WordnetElement('blond.n.02'),
     'blond.a.01': WordnetElement('blond.a.01')}



## Grammatical roles

What are the different grammatical roles that are used in the meaning identifiers (aka synset keys) of our lemmas?


```python
from collections import Counter
import re
from lexis import Lemmas

lm = Lemmas()

# raw string, so that `\.` and `\w` are not treated as (invalid) string escapes
p_middle_of_dot_path = re.compile(r'(?P<first>[^\.]+)\.(?P<middle>\w+)\.(?P<last>[^\.]+)')

def extract_grammatical_role_from_meaning(meaning):
    m = p_middle_of_dot_path.match(meaning)
    if m:
        return m.groupdict().get('middle', None) 
    else:
        return None
    

c = Counter()
for meanings in lm.values():
    for meaning in meanings:
        c.update(extract_grammatical_role_from_meaning(meaning))
        
c.most_common()
```




    [('n', 148478),
     ('v', 42751),
     ('s', 20895),
     ('a', 9846),
     ('r', 5619),
     ('_', 29),
     ('e', 28),
     ('u', 17),
     ('g', 17),
     ('i', 15),
     ('t', 14),
     ('p', 8),
     ('b', 7),
     ('o', 7),
     ('l', 6),
     ('d', 4),
     ('c', 2),
     ('m', 1),
     ('k', 1)]
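The five frequent codes are WordNet's part-of-speech tags: `n` (noun), `v` (verb), `a` (adjective), `s` (adjective satellite) and `r` (adverb). The rare letters (`_`, `e`, `u`, ...) appear to be artifacts rather than real roles: if the word part of a synset name itself contains a dot, the regex captures part of the word as the "middle", and `Counter.update` called with a string counts each character separately. The sketch below shows one possible way around both issues by splitting from the right; the sample names are illustrative, and the last one is made up to show the dotted case.

```python
from collections import Counter

POS_NAMES = {'n': 'noun', 'v': 'verb', 'a': 'adjective',
             's': 'adjective satellite', 'r': 'adverb'}


def pos_of(synset_name):
    """Return the part-of-speech code of a name like 'blond.n.01'."""
    # Split from the right so dots inside the word part are left alone.
    _word, pos, _number = synset_name.rsplit('.', 2)
    return pos


# Illustrative sample; 'a.b_word.n.01' is a made-up name with a dot in its word part.
sample = ['blond.n.01', 'blond.a.01', 'eat.v.01', 'a.b_word.n.01']

# Feeding an iterable of codes counts whole codes, not characters.
counts = Counter(pos_of(name) for name in sample)
for code, n in counts.most_common():
    print(f"{POS_NAMES[code]} ({code}): {n}")

# For contrast: passing a bare string to Counter counts its characters.
print(Counter('_word'))
```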




# Miscellaneous explorations

```python
from py2store import filt_iter, cached_keys, add_ipython_key_completions
from py2store import kvhead
from lexis import Lemmas
```


```python
lm = Lemmas()

def print_definitions(words):
    for word in words:
        print(f"- {word}")
        for k, v in lm[word].items():
            print(f"    {'.'.join(k.split('.')[1:])}: {v['definition']}")

```

## Find words containing some substring


```python
from lexis import print_word_definitions
```


```python
substr = 'iep'
words = list(filter(lambda w: substr in w, lm))
len(words)
```


    12


```python
print_definitions(words)
```

    - hemiepiphyte
        n.01: a plant that is an epiphyte for part of its life
    - antiepileptic
        n.01: a drug used to treat or prevent convulsions (as in epilepsy)
    - pieplant
        n.01: long pinkish sour leafstalks usually eaten cooked and sweetened
    - liepaja
        n.01: a city of southwestern Latvia on the Baltic Sea
    - semiepiphyte
        n.01: a plant that is an epiphyte for part of its life
    - archiepiscopal
        a.01: of or associated with an archbishop
    - tiepin
        n.01: a pin used to hold the tie in place
    - giovanni_battista_tiepolo
        n.01: Italian painter (1696-1770)
    - tiepolo
        n.01: Italian painter (1696-1770)
    - antiepileptic_drug
        n.01: a drug used to treat or prevent convulsions (as in epilepsy)
    - dnieper
        n.01: a river that rises in Russia near Smolensk and flowing south through Belarus and Ukraine to empty into the Black Sea
    - dnieper_river
        n.01: a river that rises in Russia near Smolensk and flowing south through Belarus and Ukraine to empty into the Black Sea

## Find palindromes

```python
import re
from lexis import Lemmas

lm = Lemmas()

is_palendrome_with_at_least_3_letters = lambda w: len(w) >= 3 and w == w[::-1]
print(*filter(is_palendrome_with_at_least_3_letters, lm), sep=', ')
```

    ono, waw, tot, kkk, ldl, anna, tenet, mom, igigi, sus, hallah, sls, pcp, mam, ofo, ene, alula, oto, civic, cfc, 101, tet, kazak, sss, ctc, aba, tevet, ara, wnw, mum, siris, tebet, tut-tut, ccc, naan, xix, tnt, peep, tut, kook, xanax, ala, eve, level, xxx, dud, aaa, dad, tdt, odo, pip, tibit, iii, sas, wow, radar, madam, yay, dmd, poop, ana, sos, bib, pop, isi, eye, gag, gig, cdc, dod, nun, pep, mym, bob, malayalam, sis, www, utu, non, ewe, aga, akka, noon, ese, rotor, ded, ppp, kayak, pap, wsw, pup, minim, nan, tat, ada, boob, mem, deed, nauruan, ma'am, succus, seles, cbc, tit, dvd, refer, toot


Wait a minute... Where's racecar?!? Isn't that a palindrome?

```python
# WordNet lists it as two words joined by an underscore, which breaks the symmetry
assert 'racecar' not in lm
assert 'race_car' in lm
```
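If entries such as `race_car` should count, one option is to compare only letters and digits. The function below is a hypothetical variant of the palindrome test, not part of `lexis`; it ignores underscores, hyphens and apostrophes before comparing.

```python
def is_palindrome(word, min_letters=3):
    """True if the letters and digits of `word` read the same both ways."""
    # Keep only alphanumeric characters, so '_', '-' and "'" are ignored.
    chars = [c for c in word.lower() if c.isalnum()]
    return len(chars) >= min_letters and chars == chars[::-1]


for w in ['race_car', 'tut-tut', "ma'am", 'blond']:
    print(w, is_palindrome(w))
```

It can be passed to `filter` over the lemmas in the same way as the stricter test above.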

### Which of these are (or rather "can be") a verb?

What are the keys of the lemmas? 

Answer: Synset keys -- that is, an id that references a unit of meaning


```python
# the keys of a lemma's value are synset keys
list(lm['eat'])
```




    ['eat.v.01',
     'eat.v.02',
     'feed.v.06',
     'eat.v.04',
     'consume.v.05',
     'corrode.v.01']



That little `v` seems to be indicating that the meaning is... verbal?

Let's make a function to grab that middle part of the dot path and use it to make an `is_a_verb` function (more like "can be a verb").


```python
from collections import Counter
import re
from lexis import Lemmas

lm = Lemmas()

# raw string, so that `\.` and `\w` are not treated as (invalid) string escapes
p_middle_of_dot_path = re.compile(r'(?P<first>[^\.]+)\.(?P<middle>\w+)\.(?P<last>[^\.]+)')

def _extract_middle(string):
    m = p_middle_of_dot_path.match(string)
    if m:
        return m.groupdict().get('middle', None) 
    else:
        return None
    
def grammatical_roles(lemma):
    return Counter(map(_extract_middle, lm[lemma]))
    

assert grammatical_roles('go') == Counter({'n': 4, 'v': 30, 'a': 1})  # the lemma "go" can be a verb, noun, or adjective

def is_a_verb(lemma):
    return 'v' in grammatical_roles(lemma)
    
assert is_a_verb('go')
assert not is_a_verb('chess')  # unlike go, chess cannot be used as a verb, apparently
```

Palindromes that are verbs:


```python
list(filter(lambda x: is_a_verb(x) and is_palendrome_with_at_least_3_letters(x), lm))
```




    ['tot',
     'tut-tut',
     'peep',
     'tut',
     'level',
     'pip',
     'wow',
     'bib',
     'pop',
     'eye',
     'gag',
     'bob',
     'kayak',
     'pup',
     'tat',
     'boob',
     'refer',
     'toot']




## Only p, q, b, d, and vowels


```python
import re
from lexis import Lemmas

lm = Lemmas()

consonants = 'pqbd'
vowels = 'aeiou'  # 'aeiouy'
filt = re.compile(f'[{vowels}{consonants}]+$').match  # the pattern

words = list(filter(lambda w: 2 <= len(w) <= 7,  # length constraint (number of characters)
                    filter(filt, lm)))  # keep only words made of the allowed letters
len(words)
```




    249
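The same "only these letters" test can be written with `re.fullmatch`, which requires the whole string to fit the pattern and so needs no trailing `$`. This is a small sketch on a hand-picked sample rather than on the full lemma store.

```python
import re

allowed = 'aeiou' + 'pqbd'
# fullmatch only succeeds if the entire word is made of allowed characters.
only_allowed = re.compile(f'[{allowed}]+').fullmatch

# Hypothetical sample; the real run iterates over all lemmas.
sample = ['bebop', 'adobe', 'queue', 'blond', 'equip', 'pop-up']
kept = [w for w in sample if only_allowed(w) and 2 <= len(w) <= 7]
print(kept)
```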




```python
print(*words, sep='\t')
```

    bod	aaa	add	poa	pop	beaded	aqua	pib	edda	doob	boa	doi	padded	iodide	bop	edo	bide	eb	bai	quid	de	ade	daba	pid	baba	paba	bi	abb	bebop	pa	poop	pb	dea	odo	pope	dad	pup	bode	quad	bb	be	ea	epee	bid	pu	pique	iii	pod	bee	pub	ddi	id	baobab	equid	padua	pipidae	opaque	pappa	uppp	uub	qepiq	bibbed	adp	ada	pied	aoudad	qed	pupa	bedaub	bd	dba	bopeep	oboe	ado	eq	bpi	aid	bud	dodo	abo	qaeda	aa	pad	papua	baa	abode	bad	adad	adapid	papaia	db	bede	ai	po	doei	pep	eib	dubai	epi	boob	uuq	io	beep	quip	ad	babu	ded	ia	dud	da	qi	paid	peep	doodad	beda	eddo	boo	padda	ipidae	deep	dope	ied	doped	dopa	ii	iaea	uda	dd	baud	dido	ebb	epa	bodied	pap	ed	peba	bed	audio	deed	idea	apoidea	beau	up	pda	iud	ip	diode	bida	pi	apidae	bead	odd	od	dia	baddie	iaa	ape	ipo	dod	idp	ee	ie	daub	duo	boidae	poe	abed	adobe	pea	dude	do	aided	obi	ido	pipe	pe	doe	aiai	pd	baboo	quipu	pood	papio	equidae	iop	qadi	ab	dado	dub	adobo	bap	pei	baeda	equip	dupe	aqaba	bob	ba	dead	dada	adapa	pee	opepe	pob	iou	duad	doodia	dab	aide	pip	dipped	bubo	pipa	poi	apia	ode	upupa	iq	aba	abbe	edp	edd	pia	due	pud	ob	audad	dp	deb	pie	oed	die	ppp	queue	papa	adieu	biped	babe	ida	dubuque	dip	uup	eu	ipod	bade	au	abide	bib	bedded



```python
print_definitions(['adapa'])
```

    - adapa
        n.01: a Babylonian demigod or first man (sometimes identified with Adam)


### Containing i, e, p in that order, with other letters in between


```python
filt = re.compile(r'\w{0,2}i\w{0,2}e\w{0,2}p\w{0,2}$').match  # The *i*e*p* pattern

words = list(filter(lambda w: len(w) <= 6,  # no more than 6 characters
                    filter(filt, lm)))  # filter for iep pattern
len(words)
```




    9




```python
print(*words, sep=', ')
```

    tie_up, ginep, lineup, inept, pileup, tiepin, biceps, icecap, ice_up
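To see why some of these words qualify, the pattern can be tried on a few strings by hand. The sketch below reuses the same regex on a small sample; note that `\w` also matches the underscore, which is why `tie_up` passes, and that the length limit is a separate check from the pattern.

```python
import re

iep = re.compile(r'\w{0,2}i\w{0,2}e\w{0,2}p\w{0,2}$').match

# A few words from the outputs above, plus two made-up test cases ('pie', 'recipe').
sample = ['tie_up', 'inept', 'biceps', 'dnieper', 'pie', 'recipe']
for w in sample:
    print(f"{w}: pattern={bool(iep(w))}, short_enough={len(w) <= 6}")
```

Here `dnieper` fits the pattern but is dropped by the six-character limit, while `pie` and `recipe` fail the pattern because no `p` follows the `e` within the allowed gaps.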



```python
print_definitions(words)
```

    - tie_up
        v.01: secure with or as if with ropes
        v.02: invest so as to make unavailable for other purposes
        v.03: restrain from moving or operating normally
        v.01: secure in or as if in a berth or dock
        v.05: finish the last row
    - ginep
        n.01: tropical American tree bearing a small edible fruit with green leathery skin and sweet juicy translucent pulp
    - lineup
        n.01: (baseball) a list of batters in the order in which they will bat
        n.02: a line of persons arranged by police for inspection or identification
    - inept
        s.04: not elegant or graceful in expression
        s.02: generally incompetent and ineffectual
        s.03: revealing lack of perceptiveness or judgment or finesse
    - pileup
        n.01: multiple collisions of vehicles
    - tiepin
        n.01: a pin used to hold the tie in place
    - biceps
        n.01: any skeletal muscle having two origins (but especially the muscle that flexes the forearm)
    - icecap
        n.01: a mass of ice and snow that permanently covers a large area of land (e.g., the polar regions or a mountain peak)
    - ice_up
        v.01: become covered with a layer of ice; of a surface such as a window



## S-words


Words that start with `s` but if you remove `s`, it's still a word.

```python
from lexis import Lemmas  # pip install lexis
lm = Lemmas()
swords = list(filter(lambda x: x.startswith('s') and x[1:] in lm, lm))  # one line!
```


```python
print(len(swords))
print(*swords[:40], sep=', ')
```

    711
    softener, spock, scent, spark, sbe, stickweed, screaky, salt, salp, sec, strap, sliver, slack, swish, sebs, sarawak, scuttle, stripping, swell, stole, spine, space, scar, sass, sewer, spitting, serving, sew, stalk, smite, sniffy, stripe, slake, stone, slit, sea, shoe, sweeper, swear_off, swan
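The filter is easy to check on a small vocabulary. The set below is a hypothetical stand-in for the keys of `lm`, used only to show the logic of the one-liner.

```python
# Hypothetical mini-vocabulary; the real filter runs over every key of `lm`.
vocabulary = {'spark', 'park', 'salt', 'alt', 'stone', 'tone', 'sun', 'blond'}

# Keep words that start with 's' and are still in the vocabulary without it.
s_words = sorted(w for w in vocabulary
                 if w.startswith('s') and w[1:] in vocabulary)
print(s_words)
```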



```python
from py2store import filt_iter, wrap_kvs, KvReader
from lexis import Lemmas  # pip install lexis
lm = Lemmas()

@filt_iter(filt=lambda x: x.startswith('s') and x[1:] in lm)
class Swords(Lemmas):
    def __getitem__(self, k):
        v = super().__getitem__(k)
        for kk, vv in v.items():
            yield f"    {'.'.join(kk.split('.')[1:])}: {vv['definition']}"
            
s = Swords()
len(s)
```




    711




```python
k, v = s.head()
list(v)
```




    ['    n.01: a substance added to another to make it less hard']
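The `list(v)` call is needed because a `__getitem__` that uses `yield` returns a generator, which can be consumed only once. The toy class below (hypothetical, built on `collections.abc.Mapping` rather than on `lexis` or `py2store`) illustrates that behaviour; it returns a generator expression instead of using `yield`, so that a missing key still raises `KeyError` at lookup time.

```python
from collections.abc import Mapping


class DefinitionLines(Mapping):
    """Toy mapping whose values are generators of formatted lines (hypothetical)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # The lookup happens eagerly here; only the formatting is lazy.
        meanings = self._data[key]
        return (f"    {name}: {definition}" for name, definition in meanings.items())

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = DefinitionLines(
    {'softener': {'n.01': 'a substance added to another to make it less hard'}}
)
lines = d['softener']
first = list(lines)   # consumes the generator
second = list(lines)  # already exhausted, so this is empty
print(first)
print(second)
```

If the lines need to be read more than once, returning a list from `__getitem__` is a simple alternative.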




```python
from itertools import islice
for k, v in islice(s.items(), 5):
    print(f"------------ {k} -------------")
    print(*v, sep='\n')

```

    ------------ softener -------------
        n.01: a substance added to another to make it less hard
    ------------ spock -------------
        n.01: United States pediatrician whose many books on child care influenced the upbringing of children around the world (1903-1998)
    ------------ scent -------------
        n.02: a distinctive odor that is pleasant
        n.02: an odor left in passing by which a person or animal can be traced
        n.01: any property detected by the olfactory system
        v.01: cause to smell or be smelly
        v.02: catch the scent of; get wind of
        v.02: apply perfume to
    ------------ spark -------------
        n.01: a momentary flash of light
        n.01: merriment expressed by a brightness or gleam or animation of countenance
        n.05: electrical conduction through a gas in an applied electric field
        n.04: a small but noticeable trace of some quality that might become stronger
        n.05: Scottish writer of satirical novels (born in 1918)
        n.06: a small fragment of a burning substance thrown out by burning material or by friction
        v.04: put in motion or move to act
        v.02: emit or produce sparks
    ------------ sbe -------------
        n.01: the compass point that is one point east of due south

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/thorwhalen/lexis",
    "name": "lexis",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "words,definitions,lexicon,wordnet,NLP,Natural Language Processing,text mining",
    "author": "Thor Whalen",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/e5/2c/f15e070aaed9845cdee2cd42ff254c5204624478413280a0168615539a5e/lexis-0.1.2.tar.gz",
    "platform": "any",
    "description": "\n# lexis\nWordnet wrapper - Easy access to words and their relationships\n\nTo install:\t```pip install lexis```\n\nThe key-value (i.e. dict-list) wrapper to nltk.corpus.wordnet.\n\nYour no fuss gateway to (English) words.\n\nThe easiest way to get nltk.corpus.wordnet is\n\nThe `nltk` dependency is installed for you when installing \n`lexis`, but the wordnet data is not downloaded automatically.\nTo do so (only once), go to a python console and do:\n```\n>>> import nltk; nltk.download('wordnet')  # doctest: +SKIP\n```\n\nIf you don't like that way, [see here](https://www.nltk.org/install.html) \nfor other ways to get wordnet.\n\nThe central construct of this module is the Synset \n(a set of synonyms that share a common meaning).\nTo see a few things you can do with Synsets, naked, \n[see here](https://www.nltk.org/howto/wordnet.html).\n\nHere we put a py2store wrapper around this stuff.\n\nWhat is WordNet? https://wordnet.princeton.edu/\n\n\n# A little peek at Lemmas\n\n\n```python\nfrom lexis import Lemmas\n\nlm = Lemmas()\nlen(lm)\n```\n\n\n\n\n    147306\n\n\n\n`lm` is a `Mapping` (think \"acts like a (read-only) dict\")\n\n\n```python\nfrom typing import Mapping\n\nisinstance(lm, Mapping)\n```\n\n\n\n\n    True\n\n\n\nLet's have a look at a few keys\n\n\n```python\nlist(lm)[44630:44635]\n```\n\n\n\n\n    ['blond', 'kaunda', 'peacetime', 'intolerantly', \"'hood\"]\n\n\n\nAnd the value of a `lm` item?\n\n\n```python\nlm['blond']\n```\n\n\n\n\n    {'blond.n.01': WordnetElement('blond.n.01'),\n     'blond.n.02': WordnetElement('blond.n.02'),\n     'blond.a.01': WordnetElement('blond.a.01')}\n\n\n\nOkay, it looks like it's different meanings of \"blond\". The middle letter tells us its grammatical role it's a noun (`n`) or an adjective (`a`). More on that later. 
\n\nAnd what's a `WordnetElement`?\n\nWell, it's another Mapping, apparently:\n\n\n```python\nisinstance(lm['blond']['blond.n.01'], Mapping)\n```\n\n\n\n\n    True\n\n\n\n\n```python\nlist(lm['blond']['blond.n.01'])\n```\n\n\n\n\n    ['also_sees',\n     'instance_hypernyms',\n     'verb_groups',\n     'entailments',\n     'region_domains',\n     'substance_holonyms',\n     'part_holonyms',\n     'examples',\n     'part_meronyms',\n     'hyponyms',\n     'member_meronyms',\n     'offset',\n     'causes',\n     'definition',\n     'lemma_names',\n     'lexname',\n     'member_holonyms',\n     'in_topic_domains',\n     'lemmas',\n     'topic_domains',\n     'max_depth',\n     'hypernym_distances',\n     'name',\n     'attributes',\n     'hypernyms',\n     'min_depth',\n     'usage_domains',\n     'in_region_domains',\n     'instance_hyponyms',\n     'in_usage_domains',\n     'similar_tos',\n     'root_hypernyms',\n     'pos',\n     'frame_ids',\n     'hypernym_paths',\n     'substance_meronyms']\n\n\n\nWow! That's a lot of information! \n\nLet's look at what the definition of `'blond.n.01'` is:\n\n\n```python\nprint(lm['blond']['blond.n.01']['definition'])\n```\n\n    a person with fair skin and hair\n\n\n... 
actually, let's just poke at all of them (at least those that are non-empty)\n\n\n```python\nmeaning = 'blond.n.01'\nprint(f\"Values for meaning: {meaning}\")\nfor k, v in lm['blond'][meaning].items():\n    if v:\n        print(f\"- {k}: {v}\")\n```\n\n    Values for meaning: blond.n.01\n    - hyponyms: [WordnetElement('peroxide_blond.n.01'), WordnetElement('platinum_blond.n.01'), WordnetElement('towhead.n.01')]\n    - offset: 9860506\n    - definition: a person with fair skin and hair\n    - lemma_names: ['blond', 'blonde']\n    - lexname: noun.person\n    - lemmas: [KvLemma('blond.n.01.blond'), KvLemma('blond.n.01.blonde')]\n    - max_depth: 7\n    - hypernym_distances: {(WordnetElement('physical_entity.n.01'), 6), (WordnetElement('entity.n.01'), 7), (WordnetElement('physical_entity.n.01'), 3), (WordnetElement('entity.n.01'), 4), (WordnetElement('living_thing.n.01'), 3), (WordnetElement('object.n.01'), 5), (WordnetElement('blond.n.01'), 0), (WordnetElement('organism.n.01'), 2), (WordnetElement('causal_agent.n.01'), 2), (WordnetElement('whole.n.02'), 4), (WordnetElement('person.n.01'), 1)}\n    - name: blond.n.01\n    - hypernyms: [WordnetElement('person.n.01')]\n    - min_depth: 4\n    - root_hypernyms: [Synset('entity.n.01')]\n    - pos: n\n    - hypernym_paths: [[WordnetElement('entity.n.01'), WordnetElement('physical_entity.n.01'), WordnetElement('causal_agent.n.01'), WordnetElement('person.n.01'), WordnetElement('blond.n.01')], [WordnetElement('entity.n.01'), WordnetElement('physical_entity.n.01'), WordnetElement('object.n.01'), WordnetElement('whole.n.02'), WordnetElement('living_thing.n.01'), WordnetElement('organism.n.01'), WordnetElement('person.n.01'), WordnetElement('blond.n.01')]]\n\n\n## You can get meaning information directly\n\nWhat if you made a list of these strings like `'blond.n.01'`, `'blond.a.01'`... 
and you wanted to access the `WordnetElement` instances with all that cool information about those specifics meanings?\n\nYou could do `lm['blond']['blond.n.01']`, `lm['blond']['blond.a.01']`... But then you'd have to remember the full references `('blond', 'blond.n.01')`, `('blond', 'blond.a.01')`... \n\nYou don't need to go through `lm['blond']` to get to the `WordnetElement` instance that gives you access to the meaning information -- you can use the `Synsets` store (i.e. Mapping). \n\nNote: \"synset\" is what Wordnet calls this. We'll just call is meaning for simplicity. I hope the purists won't mind.\n\n\n\n```python\nfrom lexis import Synsets\n```\n\n\n```python\nmeanings = Synsets()\nmeanings['blond.n.01']\n```\n\n\n\n\n    WordnetElement('blond.n.01')\n\n\n\nWe saw earlier that we had `147306` lemmas (i.e. \"words\" or more precisely \"terms\"... but really precisely, \"lemmas\"). \n\nWell, we have `117659` synsets (i.e. \"meanings\") in the `Synsets` instance.\n\n\n```python\nlen(meanings)\n```\n\n\n\n\n    117659\n\n\n\n## Multiple lemma names\n\n`'lemma_names'` are different ways that the same meaning can be written. 
\n\n\n```python\nlm['blond']['blond.n.01']['lemma_names']\n```\n\n\n\n\n    ['blond', 'blonde']\n\n\n\nIndeed, `lm['blond']` and `lm['blonde']` really point to the same thing.\n\n\n```python\nlm['blond']\n```\n\n\n\n\n    {'blond.n.01': WordnetElement('blond.n.01'),\n     'blond.n.02': WordnetElement('blond.n.02'),\n     'blond.a.01': WordnetElement('blond.a.01')}\n\n\n\n\n```python\nlm['blonde']\n```\n\n\n\n\n    {'blond.n.01': WordnetElement('blond.n.01'),\n     'blond.n.02': WordnetElement('blond.n.02'),\n     'blond.a.01': WordnetElement('blond.a.01')}\n\n\n\n## Grammatical roles\n\nWhat are the different grammatical roles that are used in the meaning identifiers (aka synset keys) of our lemmas?\n\n\n```python\nfrom collections import Counter\nimport re\nfrom lexis import Lemmas\n\nlm = Lemmas()\n\np_middle_of_dot_path = re.compile('(?P<first>[^\\.]+)\\.(?P<middle>\\w+)\\.(?P<last>[^\\.]+)')\n\ndef extract_grammatical_role_from_meaning(meaning):\n    m = p_middle_of_dot_path.match(meaning)\n    if m:\n        return m.groupdict().get('middle', None) \n    else:\n        return None\n    \n\nc = Counter()\nfor meanings in lm.values():\n    for meaning in meanings:\n        c.update(extract_grammatical_role_from_meaning(meaning))\n        \nc.most_common()\n```\n\n\n\n\n    [('n', 148478),\n     ('v', 42751),\n     ('s', 20895),\n     ('a', 9846),\n     ('r', 5619),\n     ('_', 29),\n     ('e', 28),\n     ('u', 17),\n     ('g', 17),\n     ('i', 15),\n     ('t', 14),\n     ('p', 8),\n     ('b', 7),\n     ('o', 7),\n     ('l', 6),\n     ('d', 4),\n     ('c', 2),\n     ('m', 1),\n     ('k', 1)]\n\n\n\n\n# Miscellaneous explorations\n\n```python\nfrom py2store import filt_iter, cached_keys, add_ipython_key_completions\nfrom py2store import kvhead\nfrom lexis import Lemmas\n```\n\n\n```python\nlm = Lemmas()\n\ndef print_definitions(words):\n    for word in words:\n        print(f\"- {word}\")\n        for k, v in lm[word].items():\n            print(f\"    
{'.'.join(k.split('.')[1:])}: {v['definition']}\")\n\n```\n\n## Find words containing some substring\n\n\n```python\nfrom lexis import print_word_definitions\n```\n\n\n```python\nsubstr = 'iep'\nwords = list(filter(lambda w: substr in w, lm))\nlen(words)\n```\n\n\n    12\n\n\n```python\nprint_definitions(words)\n```\n\n    - hemiepiphyte\n        n.01: a plant that is an epiphyte for part of its life\n    - antiepileptic\n        n.01: a drug used to treat or prevent convulsions (as in epilepsy)\n    - pieplant\n        n.01: long pinkish sour leafstalks usually eaten cooked and sweetened\n    - liepaja\n        n.01: a city of southwestern Latvia on the Baltic Sea\n    - semiepiphyte\n        n.01: a plant that is an epiphyte for part of its life\n    - archiepiscopal\n        a.01: of or associated with an archbishop\n    - tiepin\n        n.01: a pin used to hold the tie in place\n    - giovanni_battista_tiepolo\n        n.01: Italian painter (1696-1770)\n    - tiepolo\n        n.01: Italian painter (1696-1770)\n    - antiepileptic_drug\n        n.01: a drug used to treat or prevent convulsions (as in epilepsy)\n    - dnieper\n        n.01: a river that rises in Russia near Smolensk and flowing south through Belarus and Ukraine to empty into the Black Sea\n    - dnieper_river\n        n.01: a river that rises in Russia near Smolensk and flowing south through Belarus and Ukraine to empty into the Black Sea\n\n## Find palindrome\n\n```python\nimport re\nfrom lexis import Lemmas\n\nlm = Lemmas()\n\nis_palendrome_with_at_least_3_letters = lambda w: len(w) >= 3 and w == w[::-1]\nprint(*filter(is_palendrome_with_at_least_3_letters, lm), sep=', ')\n```\n\n    ono, waw, tot, kkk, ldl, anna, tenet, mom, igigi, sus, hallah, sls, pcp, mam, ofo, ene, alula, oto, civic, cfc, 101, tet, kazak, sss, ctc, aba, tevet, ara, wnw, mum, siris, tebet, tut-tut, ccc, naan, xix, tnt, peep, tut, kook, xanax, ala, eve, level, xxx, dud, aaa, dad, tdt, odo, pip, tibit, iii, sas, wow, radar, 
madam, yay, dmd, poop, ana, sos, bib, pop, isi, eye, gag, gig, cdc, dod, nun, pep, mym, bob, malayalam, sis, www, utu, non, ewe, aga, akka, noon, ese, rotor, ded, ppp, kayak, pap, wsw, pup, minim, nan, tat, ada, boob, mem, deed, nauruan, ma'am, succus, seles, cbc, tit, dvd, refer, toot\n\n\nWait a minute... Where's racecar?!? Isn't that a palindrome?\n```python\n# \nassert 'racecar' not in lm\nassert 'race_car' in lm\n```\n\n### Which of these are (or rather \"can be\") a verb?\n\nWhat are the keys of the lemmas? \n\nAnswer: Synset keys -- that is, an id that references a unit of meaning\n\n\n```python\n# what do are the values of the lemmas?\nlist(lm['eat'])\n```\n\n\n\n\n    ['eat.v.01',\n     'eat.v.02',\n     'feed.v.06',\n     'eat.v.04',\n     'consume.v.05',\n     'corrode.v.01']\n\n\n\nThat little `v` seems to be indicating that the meaning is... verbal?\n\nLet's make a function to grab that middle part of the dot path and use it to make a `is_a_verb` (more like \"can be a verb\"). 
\n\n\n```python\nfrom collections import Counter\nimport re\nfrom lexis import Lemmas\n\nlm = Lemmas()\n\np_middle_of_dot_path = re.compile('(?P<first>[^\\.]+)\\.(?P<middle>\\w+)\\.(?P<last>[^\\.]+)')\n\ndef _extract_middle(string):\n    m = p_middle_of_dot_path.match(string)\n    if m:\n        return m.groupdict().get('middle', None) \n    else:\n        return None\n    \ndef grammatical_roles(lemma):\n    return Counter(map(_extract_middle, lm[lemma]))\n    \n\nassert grammatical_roles('go') == Counter({'n': 4, 'v': 30, 'a': 1})  # the lemma \"go\" can be a verb, noun, or adjective\n\ndef is_a_verb(lemma):\n    return 'v' in grammatical_roles(lemma)\n    \nassert is_a_verb('go')\nassert not is_a_verb('chess')  # unlike go, chess cannot be used as a verb, apparently\n```\n\nPalendromes that are verbs\n\n\n```python\nlist(filter(lambda x: is_a_verb(x) and is_palendrome_with_at_least_3_letters(x), lm))\n```\n\n\n\n\n    ['tot',\n     'tut-tut',\n     'peep',\n     'tut',\n     'level',\n     'pip',\n     'wow',\n     'bib',\n     'pop',\n     'eye',\n     'gag',\n     'bob',\n     'kayak',\n     'pup',\n     'tat',\n     'boob',\n     'refer',\n     'toot']\n\n\n\n\n## Only p, q, b, d, and vowels\n\n\n```python\nimport re\nfrom lexis import Lemmas\n\nlm = Lemmas()\n\nconsonants = 'pqbd'\nvowels = 'aeiou'  # 'aeiouy'\nfilt = re.compile(f'[{vowels}{consonants}]+$').match  # the pattern\n\nwords = list(filter(lambda w: 2 <= len(w)  <= 7, # number of letters constraing\n                    filter(filt, lm)))  # filter for iep pattern\nlen(words)\n```\n\n\n\n\n    249\n\n\n\n\n```python\nprint(*words, sep='\\t')\n```\n\n    
bod\taaa\tadd\tpoa\tpop\tbeaded\taqua\tpib\tedda\tdoob\tboa\tdoi\tpadded\tiodide\tbop\tedo\tbide\teb\tbai\tquid\tde\tade\tdaba\tpid\tbaba\tpaba\tbi\tabb\tbebop\tpa\tpoop\tpb\tdea\todo\tpope\tdad\tpup\tbode\tquad\tbb\tbe\tea\tepee\tbid\tpu\tpique\tiii\tpod\tbee\tpub\tddi\tid\tbaobab\tequid\tpadua\tpipidae\topaque\tpappa\tuppp\tuub\tqepiq\tbibbed\tadp\tada\tpied\taoudad\tqed\tpupa\tbedaub\tbd\tdba\tbopeep\toboe\tado\teq\tbpi\taid\tbud\tdodo\tabo\tqaeda\taa\tpad\tpapua\tbaa\tabode\tbad\tadad\tadapid\tpapaia\tdb\tbede\tai\tpo\tdoei\tpep\teib\tdubai\tepi\tboob\tuuq\tio\tbeep\tquip\tad\tbabu\tded\tia\tdud\tda\tqi\tpaid\tpeep\tdoodad\tbeda\teddo\tboo\tpadda\tipidae\tdeep\tdope\tied\tdoped\tdopa\tii\tiaea\tuda\tdd\tbaud\tdido\tebb\tepa\tbodied\tpap\ted\tpeba\tbed\taudio\tdeed\tidea\tapoidea\tbeau\tup\tpda\tiud\tip\tdiode\tbida\tpi\tapidae\tbead\todd\tod\tdia\tbaddie\tiaa\tape\tipo\tdod\tidp\tee\tie\tdaub\tduo\tboidae\tpoe\tabed\tadobe\tpea\tdude\tdo\taided\tobi\tido\tpipe\tpe\tdoe\taiai\tpd\tbaboo\tquipu\tpood\tpapio\tequidae\tiop\tqadi\tab\tdado\tdub\tadobo\tbap\tpei\tbaeda\tequip\tdupe\taqaba\tbob\tba\tdead\tdada\tadapa\tpee\topepe\tpob\tiou\tduad\tdoodia\tdab\taide\tpip\tdipped\tbubo\tpipa\tpoi\tapia\tode\tupupa\tiq\taba\tabbe\tedp\tedd\tpia\tdue\tpud\tob\taudad\tdp\tdeb\tpie\toed\tdie\tppp\tqueue\tpapa\tadieu\tbiped\tbabe\tida\tdubuque\tdip\tuup\teu\tipod\tbade\tau\tabide\tbib\tbedded\n\n\n\n```python\nprint_definitions(['adapa'])\n```\n\n    - adapa\n        n.01: a Babylonian demigod or first man (sometimes identified with Adam)\n\n\n### Containing i, e, p in that order, with other letters in between\n\n\n```python\nfilt = re.compile('\\w{0,2}i\\w{0,2}e\\w{0,2}p\\w{0,2}$').match  # The *i*e*p* pattern\n\nwords = list(filter(lambda w: len(w) <= 6, # no more than 6 letters\n                    filter(filt, lm)))  # filter for iep pattern\nprint_definitions(words)\n```\n\n\n\n\n    9\n\n\n\n\n```python\nprint(*words, sep=', ')\n```\n\n    tie_up, ginep, lineup, inept, 
pileup, tiepin, biceps, icecap, ice_up\n\n\n\n```python\nprint_definitions(words)\n```\n\n    - tie_up\n        v.01: secure with or as if with ropes\n        v.02: invest so as to make unavailable for other purposes\n        v.03: restrain from moving or operating normally\n        v.01: secure in or as if in a berth or dock\n        v.05: finish the last row\n    - ginep\n        n.01: tropical American tree bearing a small edible fruit with green leathery skin and sweet juicy translucent pulp\n    - lineup\n        n.01: (baseball) a list of batters in the order in which they will bat\n        n.02: a line of persons arranged by police for inspection or identification\n    - inept\n        s.04: not elegant or graceful in expression\n        s.02: generally incompetent and ineffectual\n        s.03: revealing lack of perceptiveness or judgment or finesse\n    - pileup\n        n.01: multiple collisions of vehicles\n    - tiepin\n        n.01: a pin used to hold the tie in place\n    - biceps\n        n.01: any skeletal muscle having two origins (but especially the muscle that flexes the forearm)\n    - icecap\n        n.01: a mass of ice and snow that permanently covers a large area of land (e.g., the polar regions or a mountain peak)\n    - ice_up\n        v.01: become covered with a layer of ice; of a surface such as a window\n\n\n\n## S-words\n\n\nWords that start with `s` and are still words once that leading `s` is removed.\n\n```python\nfrom lexis import Lemmas  # pip install lexis\nlm = Lemmas()\nswords = list(filter(lambda x: x.startswith('s') and x[1:] in lm, lm))  # one line!\n```\n\n\n```python\nprint(len(swords))\nprint(*swords[:40], sep=', ')\n```\n\n    711\n    softener, spock, scent, spark, sbe, stickweed, screaky, salt, salp, sec, strap, sliver, slack, swish, sebs, sarawak, scuttle, stripping, swell, stole, spine, space, scar, sass, sewer, spitting, serving, sew, stalk, smite, sniffy, stripe, slake, stone, slit, sea, shoe, sweeper, swear_off, 
swan\n\n\n\n```python\nfrom py2store import filt_iter, wrap_kvs, KvReader\nfrom lexis import Lemmas  # pip install py2store\nlm = Lemmas()\n\n@filt_iter(filt=lambda x: x.startswith('s') and x[1:] in lm)\nclass Swords(Lemmas):\n    def __getitem__(self, k):\n        v = super().__getitem__(k)\n        for kk, vv in v.items():\n            yield f\"    {'.'.join(kk.split('.')[1:])}: {vv['definition']}\"\n            \ns = Swords()\nlen(s)\n```\n\n\n\n\n    711\n\n\n\n\n```python\nk, v = s.head()\nlist(v)\n```\n\n\n\n\n    ['    n.01: a substance added to another to make it less hard']\n\n\n\n\n```python\nfrom itertools import islice\nfor k, v in islice(s.items(), 5):\n    print(f\"------------ {k} -------------\")\n    print(*v, sep='\\n')\n\n```\n\n    ------------ softener -------------\n        n.01: a substance added to another to make it less hard\n    ------------ spock -------------\n        n.01: United States pediatrician whose many books on child care influenced the upbringing of children around the world (1903-1998)\n    ------------ scent -------------\n        n.02: a distinctive odor that is pleasant\n        n.02: an odor left in passing by which a person or animal can be traced\n        n.01: any property detected by the olfactory system\n        v.01: cause to smell or be smelly\n        v.02: catch the scent of; get wind of\n        v.02: apply perfume to\n    ------------ spark -------------\n        n.01: a momentary flash of light\n        n.01: merriment expressed by a brightness or gleam or animation of countenance\n        n.05: electrical conduction through a gas in an applied electric field\n        n.04: a small but noticeable trace of some quality that might become stronger\n        n.05: Scottish writer of satirical novels (born in 1918)\n        n.06: a small fragment of a burning substance thrown out by burning material or by friction\n        v.04: put in motion or move to act\n        v.02: emit or produce sparks\n    ------------ sbe 
-------------\n        n.01: the compass point that is one point east of due south\n\n\n",
    "bugtrack_url": null,
    "license": "apache-2.0",
    "summary": "Wordnet wrapper - Easy access to words and their relationships",
    "version": "0.1.2",
    "split_keywords": [
        "words",
        "definitions",
        "lexicon",
        "wordnet",
        "nlp",
        "natural language processing",
        "text mining"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "99cae2a3259ab42b1abd21075a94a9d67867d8b15fb0fcd81c02d5f3d5341261",
                "md5": "2f429854e7c30afe324eb37319ce0e9a",
                "sha256": "2bbdee5ebacb294ca6c9605b45105ab21e23534fd3e624c09eddf5c3a1818b84"
            },
            "downloads": -1,
            "filename": "lexis-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2f429854e7c30afe324eb37319ce0e9a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18984,
            "upload_time": "2023-04-07T09:18:04",
            "upload_time_iso_8601": "2023-04-07T09:18:04.217467Z",
            "url": "https://files.pythonhosted.org/packages/99/ca/e2a3259ab42b1abd21075a94a9d67867d8b15fb0fcd81c02d5f3d5341261/lexis-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e52cf15e070aaed9845cdee2cd42ff254c5204624478413280a0168615539a5e",
                "md5": "00d9f98fab8613e8a5c44270508172e7",
                "sha256": "8dd68315c20363335febb1df9b814a64e5882fbda2c1157786f9ddb0aa318a3d"
            },
            "downloads": -1,
            "filename": "lexis-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "00d9f98fab8613e8a5c44270508172e7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 24225,
            "upload_time": "2023-04-07T09:18:06",
            "upload_time_iso_8601": "2023-04-07T09:18:06.121896Z",
            "url": "https://files.pythonhosted.org/packages/e5/2c/f15e070aaed9845cdee2cd42ff254c5204624478413280a0168615539a5e/lexis-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-07 09:18:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "thorwhalen",
    "github_project": "lexis",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "lexis"
}

Thor Whalen