olipy

Name	olipy
Version	1.0.5
Summary	Library for artistic text generation
upload_time	2025-01-02 19:36:31
requires_python	>=3.9.0
keywords	art supplies

# Olipy

Olipy is a Python library for artistic text generation. Unlike most
software packages, which have a single, unifying purpose, Olipy is
more like a set of art supplies. Each module is designed to help you
achieve a different aesthetic effect.

# Setup

Olipy is distributed as the `olipy` package on PyPI. Here's how to
quickly get started from a command line:

```
# Create a virtual environment.
virtualenv env

# Activate the virtual environment.
source env/bin/activate

# Install Olipy within the virtual environment.
pip install olipy

# Run an example script.
olipy.apollo
```

Olipy uses the [`TextBlob`](https://textblob.readthedocs.org/) library
to parse text. Installing Olipy through `pip` will install
`TextBlob` as a dependency, but `TextBlob` has extra dependencies (text corpora) that
are _not_ installed by `pip`. Instructions for installing the extra
dependencies are on the `TextBlob` site, but they boil down to running
[this Python
script](https://raw.github.com/sloria/TextBlob/master/download_corpora.py).

# Example scripts

Olipy is packaged with a number of scripts which do fun things with
the data and algorithms. You can run any of these scripts from a
virtual environment that has the `olipy` package installed.

* `olipy.apollo`: Generates dialogue between astronauts and Mission
  Control. Demonstrates Queneau assembly on dialogue.
* `olipy.board_games`: Generates board game names and
  descriptions. Demonstrates complex Queneau assemblies.
* `olipy.corrupt`: "Corrupts" whatever text is typed in by adding
  increasing numbers of diacritical marks. Demonstrates the
  `olipy.gibberish.Corruptor` class.
* `olipy.dinosaurs`: Generates dinosaur names. Demonstrates Queneau
  assembly on parts of a word.
* `olipy.eater`: A gateway to a large number of simple but devastating text transformations. Demonstrates the many possibilities of the `olipy.eater` module.
* `olipy.ebooks`: Selects some lines from a public domain text using
  the *_ebooks algorithm. Demonstrates the
  `olipy.gutenberg.ProjectGutenbergText`
  and `olipy.ebooks.EbooksQuotes` classes.
* `olipy.gibberish`: Prints out a 140-character string of aesthetically
  pleasing(?) gibberish. Demonstrates the `olipy.gibberish.Gibberish` class.
* `olipy.mashteroids`: Generates names and IAU citations for minor
  planets. Demonstrates Queneau assembly on sentences.
* `olipy.sonnet`: Generates Shakespearean sonnets using Queneau assembly.
* `olipy.typewriter`: Retypes whatever you type into it, with added typos.
* `olipy.words`: Generates common-looking and obscure-looking English
  words.

# Module guide

## alphabet.py

A list of interesting groups of Unicode characters -- alphabets, shapes, and so on.

```
from olipy.alphabet import Alphabet
print(Alphabet.default().random_choice())
# 𝔄𝔅ℭ𝔇𝔈𝔉𝔊ℌℑ𝔍𝔎𝔏𝔐𝔑𝔒𝔓𝔔ℜ𝔖𝔗𝔘𝔙𝔚𝔛𝔜ℨ𝔞𝔟𝔠𝔡𝔢𝔣𝔤𝔥𝔦𝔧𝔨𝔩𝔪𝔫𝔬𝔭𝔮𝔯𝔰𝔱𝔲𝔳𝔴𝔵𝔶𝔷
print(Alphabet.default().random_choice())
# ┌┐└┘├┤┬┴┼═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥╦╧╨╩╪╫╬╴╵╶╷
```

This module is used heavily by gibberish.py.

## corpora.py

This module makes it easy to load datasets from Darius
Kazemi's [Corpora Project](https://github.com/dariusk/corpora), as
well as additional datasets specific to Olipy -- mostly large word
lists which the Corpora Project considers out of scope. (These new
datasets are discussed at the end of this document.)

Olipy is packaged with a complete copy of the data from the Corpora
Project, so you don't have to install anything extra. However,
installing the Corpora Project data some other way can give you
datasets created since the Olipy package was updated.

The `corpora` module uses the same interface as Allison Parrish's
[`pycorpora`](https://github.com/aparrish/pycorpora/) project: the
datasets show up as Python modules which contain Python data
structures:

```
from olipy import corpora
for city in corpora.geography.large_cities['cities']:
    print(city)
# Akron
# Albequerque
# Anchorage
# ...
```

You can use `from olipy.corpora import ...` to import a particular Corpora
Project category:

```
from olipy.corpora import governments
print(governments.nsa_projects["codenames"][0])  # prints "ARTIFICE"

from olipy.corpora import humans
print(humans.occupations["occupations"][0])  # prints "accountant"
```

The `corpora` module also supports an API similar to the one provided by the Corpora Project's Node package:

```
from olipy import corpora

# get a list of all categories
corpora.get_categories() # ["animals", "archetypes"...]

# get a list of subcategories for a particular category
corpora.get_categories("words") # ["literature", "word_clues"...]

# get a list of all files in a particular category
corpora.get_files("animals") # ["birds_antarctica", "birds_uk", ...]

# get data deserialized from the JSON data in a particular file
corpora.get_file("animals", "birds_antarctica") # returns dict w/data

# get file in a subcategory
corpora.get_file("words/literature", "shakespeare_words")
```
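To make the Node-style API more concrete, here is a minimal sketch of how functions like these could be written over a local checkout of the Corpora Project, assuming its datasets are stored as JSON files under a `data/<category>/` directory tree. The function names mirror the ones above, but this is an illustrative reimplementation, not olipy's code, and the extra `root` parameter is a hypothetical addition.

```python
import json
import os


def get_categories(root, category=None):
    """List category (or subcategory) directory names under root."""
    base = root if category is None else os.path.join(root, *category.split("/"))
    return sorted(
        name for name in os.listdir(base)
        if os.path.isdir(os.path.join(base, name))
    )


def get_files(root, category):
    """List dataset names (JSON filenames without the extension) in a category."""
    base = os.path.join(root, *category.split("/"))
    return sorted(
        name[:-len(".json")] for name in os.listdir(base)
        if name.endswith(".json")
    )


def get_file(root, category, name):
    """Load and deserialize one dataset; category may be nested, e.g. 'words/literature'."""
    path = os.path.join(root, *category.split("/"), name + ".json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `get_file("corpora/data", "animals", "birds_antarctica")` would return the parsed contents of `corpora/data/animals/birds_antarctica.json`, if such a checkout exists at that path.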

## eater.py

The Eater of Meaning is a module containing a variety of simple but
devastating text transformations.

```
from olipy.eater import EatWordEndings
EatWordEndings()("The Eater of Meaning is a tool for extracting the message from the medium.")
# 'There Eatable of Meager is a toot forwards exteroceptor thelytocia mess frolicky therapeusis medially.'

from olipy.eater import EatSyllables
EatSyllables()("Format and presentation are unaffected, but words and letters are subjected to an elaborate nonsensification process")
# 'Absorbed pinks instigating recourse kalamazoo, loaned traced posts fallen stepper tyranny claimed mace particularly infallibility whimper'

from olipy.eater import ScrambleWordCenters
ScrambleWordCenters()("that eliminates semantics root and branch.")
# 'taht eaemiltins scmieants root and bnarch.'

from olipy.eater import URLEater, ReplaceWords
URLEater(ReplaceWords())("https://www.example.com/")
# '<!DOCTYPE html>\n\n<html>\n<head>\n<title>Ipsum Dolor</title>\n<meta charset="sit-Amet">...'
```

This module is an enhanced port of [the original Eater of Meaning CGI script from 2003](https://www.crummy.com/software/eater/).
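Judging from the example output, `ScrambleWordCenters` keeps each word's first and last characters and shuffles the ones in between. The following is a minimal standalone sketch of that transformation, not olipy's implementation; treating a "word" as any run of ASCII letters is an assumption made for illustration.

```python
import random
import re


def scramble_word_centers(text, rng=random):
    """Shuffle the interior letters of each word, keeping its first and last letters."""
    def scramble(match):
        word = match.group(0)
        if len(word) <= 3:
            return word  # Too short to have an interior worth shuffling.
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]

    # Any run of letters counts as a word; spaces and punctuation pass through unchanged.
    return re.sub(r"[A-Za-z]+", scramble, text)
```

Passing a seeded `random.Random` as `rng` makes the output reproducible, which is handy when testing text transformations.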

## ebooks.py

A module for incongruously sampling texts in the style of the infamous
[@horse_ebooks](https://twitter.com/horse_ebooks). Based on the
[@zzt_ebooks](https://twitter.com/zzt_ebooks) algorithm by Allison
Parrish.

```
from olipy.ebooks import EbooksQuotes
from olipy import corpora
data = corpora.words.literature.fiction.pride_and_prejudice
for quote in EbooksQuotes().quotes_in(data['text']):
    print(quote)
# They attacked him  in various ways--with barefaced
# An invitation to dinner
# Mrs. Bennet
# ...
```

## gibberish.py

A module for those interested in the appearance of Unicode
glyphs. Its main use is generating aesthetically pleasing gibberish
using selected combinations of Unicode code charts.

```
from olipy.gibberish import Gibberish
print(Gibberish.random().tweet())
# ৠ𐒧𐒇দ𐒔𐒜ৗ𐒃𐒝𐒓আ৭৭উ𐒇৶০ধপ𐒤৯ৰ৪ড়ঐবননত৲ফঌ𐒓৴ৄু০েএঠৰ𐒔𐒥গনি৶ঘ𐒋উঙ𐒤ঙছতাৃীফ৮৬৸উকফ𐒘ইমঢ৭ূণঌঊ𐒇𐒋ীঁিৃ𐒌𐒒৺𐒤৺ভ𐒖৭𐒤ৡৰল𐒊ঢ়ৎ𐒅যথখৱঌ
# ঈঔ৫ঽ𐒔৩়দ𐒋ৠসুয়ঊশ𐒆𐒖𐒁ঔৰসঈ𐒆অ𐒋𐒑𐒨়দ৯ৄ৫ 😘
```

## gutenberg.py

A module for dealing with texts from Project Gutenberg. Strips headers
and footers, and parses the text.

```
from olipy import corpora
from olipy.gutenberg import ProjectGutenbergText
text = corpora.words.literature.nonfiction.literary_shrines['text']
text = ProjectGutenbergText(text)
print(len(text.paragraphs))
# 1258
```
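A common way to strip Project Gutenberg boilerplate is to keep only the text between the `*** START OF ... PROJECT GUTENBERG EBOOK` and `*** END OF ... PROJECT GUTENBERG EBOOK` marker lines, then split what remains into paragraphs at blank lines. The sketch below does exactly that; it illustrates the general approach and is not the `ProjectGutenbergText` code. Older texts with nonstandard markers would need extra heuristics.

```python
import re

# Marker lines used by many Project Gutenberg texts ("THIS" in older files, "THE" in newer ones).
START = re.compile(r"^\*\*\* ?START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$", re.M | re.I)
END = re.compile(r"^\*\*\* ?END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$", re.M | re.I)


def strip_gutenberg_boilerplate(text):
    """Return the text between the start and end markers (or the whole text if absent)."""
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


def paragraphs(text):
    """Split the stripped body at blank lines and unwrap each paragraph onto one line."""
    body = strip_gutenberg_boilerplate(text)
    return [" ".join(p.split()) for p in re.split(r"\n\s*\n", body) if p.strip()]
```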

## ia.py

A module for dealing with texts from Internet Archive.

```
import random
from olipy.ia import Text

# Print a URL to the web reader for a specific title in the IA collection.
item = Text("yorkchronicle1946poqu")
print(item.reader_url(10))
# https://archive.org/details/yorkchronicle1946poqu/page/n10

# Pick a random page from a specific title, and print a URL to a
# reusable image of that page.
identifier = "TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150"
item = Text(identifier)
page = random.randint(0, item.pages-1)
print(item.image_url(page, scale=8))
# https://ia600106.us.archive.org/BookReader/BookReaderImages.php?zip=/30/items/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_jp2.zip&file=TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_jp2/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_0007.jp2&scale=8
```

## letterforms.py

A module that knows things about the shapes of Unicode glyphs.

`alternate_spelling` translates from letters of the English alphabet
to similar-looking characters.

```
from olipy.letterforms import alternate_spelling
print(alternate_spelling("I love alternate letterforms."))
# ヱ 𝑳𝖮Ⓥ𝙀 𝚊𝓵┯⒠┌𝐍ａ⫪𝖊 𝐋𝖾ߙ𝓉ᥱ𝙧ߓ𝕠┍ጠ𝑆.
```
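The idea behind `alternate_spelling` can be sketched as a lookup table from each letter to a few look-alike Unicode characters, with one picked at random per letter. The table below is a tiny, hypothetical hand-picked sample (Cyrillic, fullwidth, and circled forms), and `alternate_spelling_sketch` is an illustrative name; olipy's own glyph data is much larger and is not reproduced here.

```python
import random

# Hypothetical sample of look-alike characters for a few lowercase letters.
LOOKALIKES = {
    "a": ["\u0430", "\uff41", "\u24d0"],  # Cyrillic a, fullwidth a, circled a
    "e": ["\u0435", "\uff45", "\u24d4"],  # Cyrillic ie, fullwidth e, circled e
    "o": ["\u043e", "\uff4f", "\u24de"],  # Cyrillic o, fullwidth o, circled o
    "l": ["\u04cf", "\uff4c", "\u24db"],  # Cyrillic palochka, fullwidth l, circled l
}


def alternate_spelling_sketch(text, rng=random):
    """Swap each letter that has look-alikes for a random one; case is ignored here."""
    return "".join(
        rng.choice(LOOKALIKES[c.lower()]) if c.lower() in LOOKALIKES else c
        for c in text
    )
```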

## markov.py

A module for generating new token lists from old token lists using a
Markov chain.

Olipy's primary purpose is to promote alternatives to
Markov chains (such as Queneau assembly and the *_ebooks algorithm),
but sometimes you really do want a Markov chain. Queneau assembly is
usually better than a Markov chain above the word level (constructing
paragraphs from sentences) and below the word level (constructing
words from phonemes), but Markov chains are usually better when
assembling sequences of words.

markov.py was originally written by Allison "A. A." Parrish.

```
from olipy.markov import MarkovGenerator
from olipy import corpora
text = corpora.words.literature.nonfiction.literary_shrines['text']
g = MarkovGenerator(order=1, max=100)
g.add(text)
print(" ".join(g.assemble()))
# The Project Gutenberg-tm trademark.                    Canst thou, e'en thus, thy own savings, went as the gardens, the club. The quarrel occurred between
# him and his essay on the tea-table. In these that, in Lamb's day, for a stray
# relic or four years ago, taken with only Adam and _The
# Corsair_. Writing to his home on his new purple and the young man you might
# mean nothing on Christmas sports and art seriously instead of references to
# the heart'--allowed--yet I got out and more convenient.... Mr.
```
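For readers who want to see the mechanics, here is a minimal word-level Markov chain. It records which token follows each run of `order` tokens, then takes a random walk through those transitions. This is a generic sketch, not the `MarkovGenerator` source. The `max_tokens` cap is assumed to play roughly the role that the `max` argument plays above.

```python
import random
from collections import defaultdict


def build_chain(tokens, order=1):
    """Map each tuple of `order` consecutive tokens to the list of tokens that follow it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        chain[tuple(tokens[i:i + order])].append(tokens[i + order])
    return chain


def generate(chain, order=1, max_tokens=100, rng=random, start=None):
    """Walk the chain from `start` (or a random state) until a dead end or max_tokens."""
    state = start if start is not None else rng.choice(list(chain))
    output = list(state)
    while len(output) < max_tokens:
        followers = chain.get(tuple(output[-order:]))
        if not followers:
            break  # No recorded continuation for the current state.
        output.append(rng.choice(followers))
    return output
```

Because followers are stored with repetition, a token that follows a state more often in the source is proportionally more likely to be chosen.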

## mosaic.py

Tiles Unicode characters together to create symmetrical mosaics.
gibberish.py uses this module as one of its techniques. Includes
information on Unicode characters whose glyphs appear to be mirror
images.

```
from olipy.mosaic import MirroredMosaicGibberish
mosaic = MirroredMosaicGibberish()
print(mosaic.tweet())
# ▛▞ ▙▞▙▟▚▟ ▚▜
# ▛▞▞ ▞▛▜▚ ▚▚▜
#  ▞▙  ▞▚  ▟▚ 
# ▙▚▚ ▚▙▟▞ ▞▞▟
# ▙▚ ▛▚▛▜▞▜ ▞▟

print(mosaic.tweet())
# 🙌🙌😯📶🙌👍👍🙌📶😯🙌🙌
#  📶🙌😯🙌🕠🕠🙌😯🙌📶 
# 🚂💈🎈🔒🚲🕃🕃🚲🔒🎈💈🚂
#  📶🙌😯🙌🕠🕠🙌😯🙌📶 
# 🙌🙌😯📶🙌👍👍🙌📶😯🙌🙌

```
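The first sample above is symmetric both left-to-right and top-to-bottom. For example, the first row starts with `▛` and ends with its mirror image `▜`, and the first and last rows are vertical reflections of each other. A minimal sketch of that construction follows. It assumes a hand-written mirror table for six quadrant-block characters plus the space; olipy's mirror data covers many more glyphs. This sketch also always produces an even number of rows.

```python
import random

# Left-right mirror images among quadrant block characters.
H_MIRROR = {"▛": "▜", "▜": "▛", "▙": "▟", "▟": "▙", "▞": "▚", "▚": "▞", " ": " "}
# Top-bottom mirror images of the same characters.
V_MIRROR = {"▛": "▙", "▙": "▛", "▜": "▟", "▟": "▜", "▞": "▚", "▚": "▞", " ": " "}


def mirrored_mosaic(half_width=6, half_height=3, rng=random):
    """Randomly fill the top-left quadrant, then mirror it across both axes."""
    chars = list(H_MIRROR)
    top = []
    for _ in range(half_height):
        left = [rng.choice(chars) for _ in range(half_width)]
        right = [H_MIRROR[c] for c in reversed(left)]  # Reflect left-to-right.
        top.append("".join(left + right))
    # Reflect the top half top-to-bottom to get the bottom half.
    bottom = ["".join(V_MIRROR[c] for c in row) for row in reversed(top)]
    return "\n".join(top + bottom)


print(mirrored_mosaic(rng=random.Random(42)))
```

The two reflections commute, so every row of the bottom half is also left-right symmetric.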

## queneau.py

A module for Queneau assembly, a technique pioneered by Raymond
Queneau in his 1961 book "Cent mille milliards de poèmes" ("One
hundred million million poems"). Queneau assembly randomly creates new
texts from a collection of existing texts with identical structure.

```
from olipy.queneau import WordAssembler
from olipy.corpus import Corpus
assembler = WordAssembler(Corpus.load("dinosaurs"))
print(assembler.assemble_word())
# Trilusmiasunaus
```
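Queneau assembly is simple to state. Given several texts with the same structure (here, the same number of lines), you build a new text by taking line 1 from a randomly chosen source, line 2 from a randomly chosen source, and so on. The sketch below works on whole lines. It is a generic illustration of the technique, not the `WordAssembler` code, which applies the same idea to parts of words.

```python
import random


def queneau_assemble(texts, rng=random):
    """Build a new text by choosing each position's line from a random source text."""
    lengths = {len(t) for t in texts}
    if len(lengths) != 1:
        raise ValueError("Queneau assembly needs texts with identical structure")
    (length,) = lengths
    return [rng.choice(texts)[i] for i in range(length)]


poems = [
    ["Roses are red,", "Violets are blue,", "Sugar is sweet,"],
    ["The sky is grey,", "The sea is wide,", "The wind is cold,"],
]
print("\n".join(queneau_assemble(poems)))
```

Two three-line sources allow 2**3 = 8 possible poems. Queneau's book has ten fourteen-line sonnets, so it allows 10**14, which is the "hundred million million" in its title.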

## randomness.py

Techniques for generating random patterns that are more sophisticated
than `random.choice`.

### `Gradient`

The `Gradient` class generates a string of random choices that are
weighted towards one set of options near the start, and weighted
towards another set of options near the end.

Here's a gradient from lowercase letters to uppercase letters:

```
from olipy.randomness import Gradient
import string
print("".join(Gradient.gradient(string.ascii_lowercase, string.ascii_uppercase, 40)))
# rkwyobijqQOzKfdcSHIhYINGrQkBRddEWPHYtORB
```
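One simple way to get this effect is to let the probability of drawing from the second set rise linearly, from 0 at the first position to 1 at the last. The sketch below uses that linear weighting as an assumption for illustration; olipy's `Gradient` may weight its choices differently.

```python
import random
import string


def gradient(start_choices, end_choices, length, rng=random):
    """Random choices that drift from start_choices toward end_choices."""
    result = []
    for i in range(length):
        # Chance of using end_choices: 0 at the first position, 1 at the last.
        p_end = i / (length - 1) if length > 1 else 1.0
        pool = end_choices if rng.random() < p_end else start_choices
        result.append(rng.choice(pool))
    return result


print("".join(gradient(string.ascii_lowercase, string.ascii_uppercase, 40)))
```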

### `WanderingMonsterTable`

The `WanderingMonsterTable` class lets you make a weighted random selection from 
one of four buckets. A random selection from the "common" bucket will show up 65% of the time, a 
selection from the "uncommon" bucket 20% of the time, "rare" 11% of the time, and "very rare" 4% of 
the time. (It uses the same probabilities as the first edition of Advanced Dungeons & Dragons.)

```
from olipy.randomness import WanderingMonsterTable

monsters = WanderingMonsterTable(
         common=["Giant rat", "Alligator"],
         uncommon=["Orc", "Hobgoblin"],
         rare=["Mind flayer", "Neo-otyugh"],
         very_rare=["Flumph", "Ygorl, Lord of Entropy"],
)
for _ in range(5):
    print(monsters.choice())
# Giant rat
# Alligator
# Alligator
# Orc
# Giant rat
```
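The selection described above has two stages: first pick a bucket using the 65/20/11/4 weights, then pick uniformly within that bucket. This can be sketched with the standard library's `random.choices`. The `wandering_monster` function is an illustrative reimplementation, not olipy's class, and it assumes every bucket is non-empty.

```python
import random


def wandering_monster(common, uncommon, rare, very_rare, rng=random):
    """Pick a bucket with weights 65/20/11/4 (summing to 100), then an item from it."""
    buckets = [common, uncommon, rare, very_rare]
    bucket = rng.choices(buckets, weights=[65, 20, 11, 4])[0]
    return rng.choice(bucket)
```

Because the bucket is chosen before the item, adding more names to the "common" list does not make common encounters more frequent overall. It only spreads the same 65% across more options.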

## tokenizer.py

A word tokenizer that handles some common types of English text, such
as email addresses, better than NLTK's default tokenizers.

```
from nltk.tokenize.treebank import TreebankWordTokenizer
from olipy.tokenizer import WordTokenizer
s = '''Good muffins cost $3.88\nin New York. Email: muffins@example.com'''
TreebankWordTokenizer().tokenize(s)
# ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York.', 'Email', ':', 'muffins', '@', 'example.com']
WordTokenizer().tokenize(s)
# ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York.', 'Email:', 'muffins@example.com']
```

## typewriter.py

Simulates the Adler Universal 39 typewriter used in _The Shining_ and
the sorts of typos that would be made on that typewriter. Originally
written for [@a_dull_bot](https://botsin.space/@adullbot).

```
from olipy.typewriter import Typewriter
typewriter = Typewriter()
typewriter.type("All work and no play makes Jack a dull boy.")
# 'All work and no play makes Jack a dull bo6.'
```
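One plausible way to model typos like `bo6` is to occasionally replace a character with a key physically next to it. The sketch below uses a small, hypothetical QWERTY adjacency table and a fixed per-character error rate, both assumptions for illustration. Olipy's own model of the Adler keyboard is not reproduced here.

```python
import random

# Hypothetical, partial adjacency table for a QWERTY-style keyboard.
NEIGHBORS = {
    "a": "qwsz", "d": "serfcx", "e": "wsdr34", "k": "ijlm",
    "l": "kop", "o": "iklp90", "y": "tghu67",
}


def add_typos(text, rate=0.05, rng=random):
    """Replace some characters with an adjacent key, at roughly `rate` per eligible character."""
    out = []
    for c in text:
        if c in NEIGHBORS and rng.random() < rate:
            out.append(rng.choice(NEIGHBORS[c]))  # Finger slipped onto a neighboring key.
        else:
            out.append(c)
    return "".join(out)
```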

# Extra corpora

Olipy makes available several word lists and datasets that aren't in
the Corpora Project. These datasets (as well as the standard Corpora
Project datasets) can be accessed through the `corpora` module. Just
write code like this:

```
from olipy import corpora
nouns = corpora.words.common_nouns['abstract_nouns']
```

### `corpora.geography.large_cities`

Names of large U.S. and world cities.

### `corpora.geography.us_states`

The fifty U.S. states.

### `corpora.language.languages`

Names of languages defined in ISO 639-1.

### `corpora.language.unicode_code_sheets`

The name of every Unicode code sheet, each with the characters found on that sheet.

### `corpora.science.minor_planet_details`

'name', 'number' and IAU 'citation' for named minor planets
(e.g. asteroids) as of July 2013. The 'discovery' field contains
discovery circumstances. The 'suggested_by' field, when present, has
been split out from the end of the original IAU citation with a simple
heuristic. The 'citation' field has then been tokenized into sentences
using NLTK's Punkt tokenizer and a set of custom abbreviations.

Data sources:

* http://www.minorplanetcenter.net/iau/lists/NumberedMPs.html
* http://ssd.jpl.nasa.gov/sbdb.cgi

This is more complete than the Corpora Project's `minor_planets`,
which only lists the names of the first 1000 minor planets.

### `corpora.words.adjectives`

About 5000 English adjectives, sorted roughly by frequency of occurrence.

### `corpora.words.by_syllable_count`

A map of numbers 1-8 to English words with the corresponding number of syllables.

### `corpora.words.common_nouns`

Lists of English nouns, sorted roughly by frequency of occurrence.

Includes:

* `abstract_nouns` like "work" and "love".
* `concrete_nouns` like "face" and "house".
* `adjectival_nouns` -- nouns that can also act as adjectives -- like "chance" and "light".

### `corpora.words.common_verbs`

Lists of English verbs, sorted roughly by frequency of occurrence.

* `present_tense` verbs like "get" and "want".
* `past_tense` verbs like "said" and "found".
* `gerund` forms like "holding" and "leaving".

### `corpora.words.english_words`

A consolidated list of about 73,000 English words from the FRELI
project. (http://www.nkuitse.com/freli/)

### `corpora.words.scribblenauts`

The top 4000 nouns that were 'concrete' enough to be summonable in the
2009 game _Scribblenauts_. As always, this list is ordered with more common
words towards the front.

### `corpora.words.literature.board_games`

Information about board games, collected from BoardGameGeek in July
2013. One JSON object per line.

Data source:

* http://boardgamegeek.com/wiki/page/BGG_XML_API2

### `corpora.words.literature.fiction.pride_and_prejudice`

The complete text of a public domain novel ("Pride and Prejudice"
by Jane Austen).

### `corpora.words.literature.nonfiction.apollo_11`

Transcripts of the Apollo 11 mission, presented as dialogue, tokenized
into sentences using NLTK's Punkt tokenizer. One JSON object per line.

Data sources:

* The Apollo 11 Flight Journal: http://history.nasa.gov/ap11fj/
* The Apollo 11 Surface Journal: http://history.nasa.gov/alsj/
  "Intended to be a resource for all those interested in the Apollo
  program, whether in a passing or scholarly capacity."

### `corpora.words.literature.nonfiction.literary_shrines`

The complete text of a public domain nonfiction book ("Famous Houses
and Literary Shrines of London" by A. St. John Adcock).

### `corpora.words.literature.gutenberg_id_mapping`

Maps old-style (pre-2007) Project Gutenberg filenames to the new-style
ebook IDs. For example, "/etext95/3boat10.zip" is mapped to the
number 308 (see http://www.gutenberg.org/ebooks/308). Pretty much
nobody needs this.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "olipy",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9.0",
    "maintainer_email": null,
    "keywords": "art supplies",
    "author": null,
    "author_email": "Leonard Richardson <leonardr@segfault.org>",
    "download_url": "https://files.pythonhosted.org/packages/ed/09/856f83381ff707d74bbb4d1074db43ddac1994cc0f2209155686e7d1f4d9/olipy-1.0.5.tar.gz",
    "platform": null,
    "description": "# Olipy\n\nOlipy is a Python library for artistic text generation. Unlike most\nsoftware packages, which have a single, unifying purpose. Olipy is\nmore like a set of art supplies. Each module is designed to help you\nachieve a different aesthetic effect.\n\n# Setup\n\nOlipy is distributed as the `olipy` package on PyPI. Here's how to\nquickly get started from a command line:\n\n```\n# Create a virtual environment.\nvirtualenv env\n\n# Activate the virtual environment.\nsource env/bin/activate\n\n# Install Olipy within the virtual envirionment.\npip install olipy\n\n# Run an example script.\nolipy.apollo\n```\n\nOlipy uses the [`TextBlob`](https://textblob.readthedocs.org/) library\nto parse text. Installing Olipy through `pip` will install\nTextBlob as a dependency, but `TextBlob` has extra dependencies (text corpora) which\nare _not_ installed by `pip`.  Instructions for installing the extra\ndependencies are on the `TextBlob` site, but they boil down to running\n[this Python\nscript](https://raw.github.com/sloria/TextBlob/master/download_corpora.py).\n\n# Example scripts\n\nOlipy is packaged with a number of  scripts which do fun things with\nthe data and algorithms. You can run any of these scripts from a\nvirtual environment that has the `olipy` package installed.\n\n* `olipy.apollo`: Generates dialogue between astronauts and Mission\n  Control. Demonstrates Queneau assembly on dialogue.\n* `olipy.board_games`: Generates board game names and\n  descriptions. Demonstrates complex Queneau assemblies.\n* `olipy.corrupt` \"Corrupts\" whatever text is typed in by adding\n  increasing numbers of diacritical marks. Demonstrates the\n  `olipy.gibberish.Corruptor` class.\n* `olipy.dinosaurs`: Generates dinosaur names. Demonstrates Queneau\n  assembly on parts of a word.\n* `olipy.eater`: A gateway to a large number of simple but devastating text transformations. 
Demonstrates the many possibilities of the `olipy.eater` module.\n* `olipy.ebooks`: Selects some lines from a public domain text using\n  the *_ebooks algorithm. Demonstrates the\n  `olipy.gutenberg.ProjectGutenbergText`\n  and `olipy.ebooks.EbooksQuotes` classes.\n* `olipy.gibberish`: Prints out 140-character string of aesthetically\n  pleasing(?) gibberish. Demonstrates the `gibberish.Gibberish` class.\n* `olipy.mashteroids`: Generates names and IAU citations for minor\n  planets. Demonstrates Queneau assembly on sentences.\n* `olipy.sonnet`: Generates Shakespearean sonnets using Queneau assembly.\n* `olipy.typewriter`: Retypes whatever you type into it, with added typoes.\n* `olipy.words`: Generates common-looking and obscure-looking English\n  words.\n\n# Module guide\n\n## alphabet.py\n\nA list of interesting groups of Unicode characters -- alphabets, shapes, and so on.\n\n```\nfrom olipy.alphabet import Alphabet\nprint(Alphabet.default().random_choice())\n# \ud835\udd04\ud835\udd05\u212d\ud835\udd07\ud835\udd08\ud835\udd09\ud835\udd0a\u210c\u2111\ud835\udd0d\ud835\udd0e\ud835\udd0f\ud835\udd10\ud835\udd11\ud835\udd12\ud835\udd13\ud835\udd14\u211c\ud835\udd16\ud835\udd17\ud835\udd18\ud835\udd19\ud835\udd1a\ud835\udd1b\ud835\udd1c\u2128\ud835\udd1e\ud835\udd1f\ud835\udd20\ud835\udd21\ud835\udd22\ud835\udd23\ud835\udd24\ud835\udd25\ud835\udd26\ud835\udd27\ud835\udd28\ud835\udd29\ud835\udd2a\ud835\udd2b\ud835\udd2c\ud835\udd2d\ud835\udd2e\ud835\udd2f\ud835\udd30\ud835\udd31\ud835\udd32\ud835\udd33\ud835\udd34\ud835\udd35\ud835\udd36\ud835\udd37\nprint(Alphabet.default().random_choice())\n# \u250c\u2510\u2514\u2518\u251c\u2524\u252c\u2534\u253c\u2550\u2551\u2552\u2553\u2554\u2555\u2556\u2557\u2558\u2559\u255a\u255b\u255c\u255d\u255e\u255f\u2560\u2561\u2562\u2563\u2564\u2565\u2566\u2567\u2568\u2569\u256a\u256b\u256c\u2574\u2575\u2576\u2577\n```\n\nThis module is used heavily by gibberish.py.\n\n# corpora.py\n\nThis module makes it easy to load datasets from 
Darius\nKazemi's [Corpora Project](https://github.com/dariusk/corpora), as\nwell as additional datasets specific to Olipy -- mostly large word\nlists which the Corpora Project considers out of scope. (These new\ndatasets are discussed at the end of this document.)\n\nOlipy is packaged with a complete copy of the data from the Corpora\nProject, so you don't have to install anything extra. However,\ninstalling the Corpora Project data some other way can give you\ndatasets created since the Olipy package was updated.\n\nThe interface of the `corpora` module is that used by Allison Parrish's\n[`pycorpora`](https://github.com/aparrish/pycorpora/) project. The\ndatasets show up as Python modules which contain Python data\nstructures:\n\n```\nfrom olipy import corpora\nfor city in corpora.geography.large_cities['cities']:\n    print(city)\n# Akron\n# Albequerque\n# Anchorage\n# ...\n```\n\nYou can use `from corpora import` ... to import a particular Corpora\nProject category:\n\n```\nfrom olipy.corpora import governments\nprint(governments.nsa_projects[\"codenames\"][0] # prints \"ARTIFICE\")\n\nfrom olipy.pycorpora import humans\nprint(humans.occupations[\"occupations\"][0] # prints \"accountant\")\n```\n\nAdditionally, corpora supports an API similar to that provided by the Corpora Project node package:\n\n```\nfrom olipy import corpora\n\n# get a list of all categories\ncorpora.get_categories() # [\"animals\", \"archetypes\"...]\n\n# get a list of subcategories for a particular category\ncorpora.get_categories(\"words\") # [\"literature\", \"word_clues\"...]\n\n# get a list of all files in a particular category\ncorpora.get_files(\"animals\") # [\"birds_antarctica\", \"birds_uk\", ...]\n\n# get data deserialized from the JSON data in a particular file\ncorpora.get_file(\"animals\", \"birds_antarctica\") # returns dict w/data\n\n# get file in a subcategory\ncorpora.get_file(\"words/literature\", \"shakespeare_words\")\n```\n\n## eater.py\n\nThe Eater of Meaning is a 
module containing a variety of simple but\ndevastating text transformations.\n\n```\nfrom olipy.eater import EatWordEndings\nEatWordEndings()(\"The Eater of Meaning is a tool for extracting the message from the medium.\")\n# 'There Eatable of Meager is a toot forwards exteroceptor thelytocia mess frolicky therapeusis medially.'\n\nfrom olipy.eater import EatSyllables\nEatSyllables()(\"Format and presentation are unaffected, but words and letters are subjected to an elaborate nonsensification process\")\n# 'Absorbed pinks instigating recourse kalamazoo, loaned traced posts fallen stepper tyranny claimed mace particularly infallibility whimper'\n\nfrom olipy.eater import ScrambleWordCenters\nScrambleWordCenters()(\"that eliminates semantics root and branch.\")\n# 'taht eaemiltins scmieants root and bnarch.'\n\nfrom olipy.eater import URLEater, ReplaceWords\nURLEater(ReplaceWords())(\"https://www.example.com/\")\n# '<!DOCTYPE html>\\n\\n<html>\\n<head>\\n<title>Ipsum Dolor</title>\\n<meta charset=\"sit-Amet\">...'\n```\n\nThis module is an enhanced port of [the original Eater of Meaning CGI script from 2003.](https://www.crummy.com/software/eater/)\n\n## ebooks.py\n\nA module for incongruously sampling texts in the style of the infamous\n[https://twitter.com/horse_ebooks](@horse_ebooks). Based on the\n[https://twitter.com/zzt_ebooks](@zzt_ebooks) algorithm by Allison\nParrish.\n\n```\nfrom olipy.ebooks import EbooksQuotes\nfrom olipy import corpora\ndata = corpora.words.literature.fiction.pride_and_prejudice\nfor quote in EbooksQuotes().quotes_in(data['text']):\n    print(quote)\n# They attacked him  in various ways--with barefaced\n# An invitation to dinner\n# Mrs. Bennet\n# ...\n```\n\n## gibberish.py\n\nA module for those interested in the appearance of Unicode\nglyphs. 
Its main use is generating aesthetically pleasing gibberish\nusing selected combinations of Unicode code charts.\n\n```\nfrom olipy.gibberish import Gibberish\nprint(Gibberish.random().tweet().encode(\"utf8\"))\n# \u09e0\ud801\udca7\ud801\udc87\u09a6\ud801\udc94\ud801\udc9c\u09d7\ud801\udc83\ud801\udc9d\ud801\udc93\u0986\u09ed\u09ed\u0989\ud801\udc87\u09f6\u09e6\u09a7\u09aa\ud801\udca4\u09ef\u09f0\u09ea\u09a1\u09bc\u0990\u09ac\u09a8\u09a8\u09a4\u09f2\u09ab\u098c\ud801\udc93\u09f4\u09c4\u09c1\u09e6\u09c7\u098f\u09a0\u09f0\ud801\udc94\ud801\udca5\u0997\u09a8\u09bf\u09f6\u0998\ud801\udc8b\u0989\u0999\ud801\udca4\u0999\u099b\u09a4\u09be\u09c3\u09c0\u09ab\u09ee\u09ec\u09f8\u0989\u0995\u09ab\ud801\udc98\u0987\u09ae\u09a2\u09ed\u09c2\u09a3\u098c\u098a\ud801\udc87\ud801\udc8b\u09c0\u0981\u09bf\u09c3\ud801\udc8c\ud801\udc92\u09fa\ud801\udca4\u09fa\u09ad\ud801\udc96\u09ed\ud801\udca4\u09e1\u09f0\u09b2\ud801\udc8a\u09a2\u09bc\u09ce\ud801\udc85\u09af\u09a5\u0996\u09f1\u098c\n# \u0988\u0994\u09eb\u09bd\ud801\udc94\u09e9\u09bc\u09a6\ud801\udc8b\u09e0\u09b8\u09c1\u09af\u09bc\u098a\u09b6\ud801\udc86\ud801\udc96\ud801\udc81\u0994\u09f0\u09b8\u0988\ud801\udc86\u0985\ud801\udc8b\ud801\udc91\ud801\udca8\u09bc\u09a6\u09ef\u09c4\u09eb \ud83d\ude18\n```\n\n## gutenberg.py\n\nA module for dealing with texts from Project Gutenberg. 
Strips headers\nand footers, and parses the text.\n\n```\nfrom olipy import corpora\nfrom olipy.gutenberg import ProjectGutenbergText\ntext = corpora.words.literature.nonfiction.literary_shrines['text']\ntext = ProjectGutenbergText(text)\nprint(len(text.paragraphs))\n# 1258\n```\n\n## ia.py\n\nA module for dealing with texts from Internet Archive.\n\n```\nimport random\nfrom olipy.ia import Text\n\n# Print a URL to the web reader for a specific title in the IA collection.\nitem = Text(\"yorkchronicle1946poqu\")\nprint(item.reader_url(10))\n# https://archive.org/details/yorkchronicle1946poqu/page/n10\n\n# Pick a random page from a specific title, and print a URL to a\n# reusable image of that page.\nidentifier = \"TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150\"\nitem = Text(identifier)\npage = random.randint(0, item.pages-1)\nprint(item.image_url(page, scale=8))\n# https://ia600106.us.archive.org/BookReader/BookReaderImages.php?zip=/30/items/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_jp2.zip&file=TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_jp2/TNM_Radio_equipment_catalog_fall__winter_1963_-_H_20180117_0150_0007.jp2&scale=8\n```\n\n## letterforms.py\n\nA module that knows things about the shapes of Unicode glyphs.\n\n`alternate_spelling` translates from letters of the English alphabet\nto similar-looking characters.\n\n```\nfrom olipy.letterforms import alternate_spelling\nprint(alternate_spelling(\"I love alternate letterforms.\"))\n# \u30f1 \ud835\udc73\ud835\uddae\u24cb\ud835\ude40 \ud835\ude8a\ud835\udcf5\u252f\u24a0\u250c\ud835\udc0d\uff41\u2aea\ud835\udd8a \ud835\udc0b\ud835\uddbe\u07d9\ud835\udcc9\u1971\ud835\ude67\u07d3\ud835\udd60\u250d\u1320\ud835\udc46.\n```\n\n## markov.py\n\nA module for generating new token lists from old token lists using a\nMarkov chain.\n\nOlipy's primary purpose is to promote alternatives to\nMarkov chains (such 
as Queneau assembly and the *_ebooks algorithm),\nbut sometimes you really do want a Markov chain. Queneau assembly is\nusually better than a Markov chain above the word level (constructing\nparagraphs from sentences) and below the word level (constructing\nwords from phonemes), but Markov chains are usually better when\nassembling sequences of words.\n\nmarkov.py was originally written by Allison \"A. A.\" Parrish.\n\n```\nfrom olipy.markov import MarkovGenerator\nfrom olipy import corpora\ntext = corpora.words.literature.nonfiction.literary_shrines['text']\ng = MarkovGenerator(order=1, max=100)\ng.add(text)\nprint(\" \".join(g.assemble()))\n# The Project Gutenberg-tm trademark.                    Canst thou, e'en thus, thy own savings, went as the gardens, the club. The quarrel occurred between\n# him and his essay on the tea-table. In these that, in Lamb's day, for a stray\n# relic or four years ago, taken with only Adam and _The\n# Corsair_. Writing to his home on his new purple and the young man you might\n# mean nothing on Christmas sports and art seriously instead of references to\n# the heart'--allowed--yet I got out and more convenient.... Mr.\n```\n\n## mosaic.py\n\nTiles Unicode characters together to create symmetrical mosaics.\ngibberish.py uses this module as one of its techniques. 
Includes\ninformation on Unicode characters whose glyphs appear to be mirror\nimages.\n\n```\nfrom olipy.mosaic import MirroredMosaicGibberish\nmosaic = MirroredMosaicGibberish()\nprint(mosaic.tweet())\n# \u259b\u259e\u2003\u2599\u259e\u2599\u259f\u259a\u259f\u2003\u259a\u259c\n# \u259b\u259e\u259e\u2003\u259e\u259b\u259c\u259a\u2003\u259a\u259a\u259c\n# \u2003\u259e\u2599\u2003\u2003\u259e\u259a\u2003\u2003\u259f\u259a\u2003\n# \u2599\u259a\u259a\u2003\u259a\u2599\u259f\u259e\u2003\u259e\u259e\u259f\n# \u2599\u259a\u2003\u259b\u259a\u259b\u259c\u259e\u259c\u2003\u259e\u259f\n\nprint(gibberish.tweet())\n# \ud83d\ude4c\ud83d\ude4c\ud83d\ude2f\ud83d\udcf6\ud83d\ude4c\ud83d\udc4d\ud83d\udc4d\ud83d\ude4c\ud83d\udcf6\ud83d\ude2f\ud83d\ude4c\ud83d\ude4c\n# \u2003\ud83d\udcf6\ud83d\ude4c\ud83d\ude2f\ud83d\ude4c\ud83d\udd60\ud83d\udd60\ud83d\ude4c\ud83d\ude2f\ud83d\ude4c\ud83d\udcf6\u2003\n# \ud83d\ude82\ud83d\udc88\ud83c\udf88\ud83d\udd12\ud83d\udeb2\ud83d\udd43\ud83d\udd43\ud83d\udeb2\ud83d\udd12\ud83c\udf88\ud83d\udc88\ud83d\ude82\n# \u2003\ud83d\udcf6\ud83d\ude4c\ud83d\ude2f\ud83d\ude4c\ud83d\udd60\ud83d\udd60\ud83d\ude4c\ud83d\ude2f\ud83d\ude4c\ud83d\udcf6\u2003\n# \ud83d\ude4c\ud83d\ude4c\ud83d\ude2f\ud83d\udcf6\ud83d\ude4c\ud83d\udc4d\ud83d\udc4d\ud83d\ude4c\ud83d\udcf6\ud83d\ude2f\ud83d\ude4c\ud83d\ude4c\n\n```\n\n## queneau.py\n\nA module for Queneau assembly, a technique pioneered by Raymond\nQueneau in his 1961 book \"Cent mille milliards de po\u00e8mes\" (\"One\nhundred million million poems\"). 
Queneau assembly randomly creates new\ntexts from a collection of existing texts with identical structure.\n\n```\nfrom olipy.queneau import WordAssembler\nfrom olipy.corpus import Corpus\nassembler = WordAssembler(Corpus.load(\"dinosaurs\"))\nprint(assembler.assemble_word())\n# Trilusmiasunaus\n```\n\n## randomness.py\n\nTechniques for generating random patterns that are more sophisticated\nthan `random.choice`.\n\n### `Gradient`\n\nThe `Gradient` class generates a string of random choices that are\nweighted towards one set of options near the start, and weighted\ntowards another set of options near the end.\n\nHere's a gradient from lowercase letters to uppercase letters:\n\n```\nfrom olipy.randomness import Gradient\nimport string\nprint(\"\".join(Gradient.gradient(string.lowercase, string.uppercase, 40)))\n# rkwyobijqQOzKfdcSHIhYINGrQkBRddEWPHYtORB\n```\n\n### `WanderingMonsterTable`\n\nThe `WanderingMonsterTable` class lets you make a weighted random selection from \none of four buckets. A random selection from the \"common\" bucket will show up 65% of the time, a \nselection from the \"uncommon\" bucket 20% of the time, \"rare\" 11% of the time, and \"very rare\" 4% of \nthe time. (It uses the same probabilities as the first edition of Advanced Dungeons & Dragons.)\n\n```\nfrom olipy.randomness import WanderingMonsterTable\n\nmonsters = WanderingMonsterTable(\n         common=[\"Giant rat\", \"Alligator\"],\n         uncommon=[\"Orc\", \"Hobgoblin\"],\n         rare=[\"Mind flayer\", \"Neo-otyugh\"],\n         very_rare=[\"Flumph\", \"Ygorl, Lord of Entropy\"],\n)\nfor i in range(5):\n    print monsters.choice()\n# Giant rat\n# Alligator\n# Alligator\n# Orc\n# Giant rat\n```\n\ntokenizer.py\n------------\n\nA word tokenizer that performs better than NLTK's default tokenizers\non some common types of English.\n\n```\nfrom nltk.tokenize.treebank import TreebankWordTokenizer\ns = '''Good muffins cost $3.88\\\\nin New York. 
Email: muffins@example.com'''\nTreebankWordTokenizer().tokenize(s)\n# ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York.', 'Email', ':', 'muffins', '@', 'example.com']\nWordTokenizer().tokenize(s)\n# ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York.', 'Email:', 'muffins@example.com']\n```\n\n## typewriter.py\n\nSimulates the Adler Universal 39 typewriter used in _The Shining_,\nincluding the sorts of typos that would be made on that typewriter. Originally\nwritten for [@a_dull_bot](https://botsin.space/@adullbot).\n\n```\nfrom olipy.typewriter import Typewriter\ntypewriter = Typewriter()\ntypewriter.type(\"All work and no play makes Jack a dull boy.\")\n# 'All work and no play makes Jack a dull bo6.'\n```\n\n# Extra corpora\n\nOlipy makes available several word lists and datasets that aren't in\nthe Corpora Project. These datasets (as well as the standard Corpora\nProject datasets) can be accessed through the `corpora` module, like\nthis:\n\n```\nfrom olipy import corpora\nnouns = corpora.words.common_nouns['abstract_nouns']\n```\n\n### `corpora.geography.large_cities`\n\nNames of large U.S. and world cities.\n\n### `corpora.geography.us_states`\n\nThe fifty U.S. states.\n\n### `corpora.language.languages`\n\nNames of the languages defined in ISO 639-1.\n\n### `corpora.language.unicode_code_sheets`\n\nThe name of every Unicode code sheet, each with the characters found on that sheet.\n\n### `corpora.science.minor_planet_details`\n\nThe 'name', 'number', and IAU 'citation' for named minor planets\n(e.g. asteroids) as of July 2013. The 'discovery' field describes the\ncircumstances of discovery. The 'suggested_by' field, when present, has\nbeen split out from the end of the original IAU citation with a simple\nheuristic. 
The 'citation' field has then been tokenized into sentences\nusing NLTK's Punkt tokenizer and a set of custom abbreviations.\n\nData sources:\n http://www.minorplanetcenter.net/iau/lists/NumberedMPs.html\n http://ssd.jpl.nasa.gov/sbdb.cgi\n\nThis is more complete than the Corpora Project's `minor_planets`,\nwhich only lists the names of the first 1000 minor planets.\n\n### `corpora.words.adjectives`\n\nAbout 5000 English adjectives, sorted roughly by frequency of occurrence.\n\n### `corpora.words.by_syllable_count`\n\nA map of numbers 1-8 to English words with the corresponding number of syllables.\n\n### `corpora.words.common_nouns`\n\nLists of English nouns, sorted roughly by frequency of occurrence.\n\nIncludes:\n\n* `abstract_nouns` like \"work\" and \"love\".\n* `concrete_nouns` like \"face\" and \"house\".\n* `adjectival_nouns` -- nouns that can also act as adjectives -- like \"chance\" and \"light\".\n\n### `corpora.words.common_verbs`\n\nLists of English verbs, sorted roughly by frequency of occurrence.\n\n* `present_tense` verbs like \"get\" and \"want\".\n* `past_tense` verbs like \"said\" and \"found\".\n* `gerund` forms like \"holding\" and \"leaving\".\n\n### `corpora.words.english_words`\n\nA consolidated list of about 73,000 English words from the FRELI\nproject (http://www.nkuitse.com/freli/).\n\n### `corpora.words.scribblenauts`\n\nThe top 4000 nouns that were 'concrete' enough to be summonable in the\n2009 game _Scribblenauts_. As always, this list is ordered with more common\nwords towards the front.\n\n### `corpora.words.literature.board_games`\n\nInformation about board games, collected from BoardGameGeek in July\n2013. 
One JSON object per line.\n\nData source:\n http://boardgamegeek.com/wiki/page/BGG_XML_API2\n\n\n### `corpora.words.literature.fiction.pride_and_prejudice`\n\nThe complete text of a public domain novel (\"Pride and Prejudice\"\nby Jane Austen).\n\n### `corpora.words.literature.nonfiction.apollo_11`\n\nTranscripts of the Apollo 11 mission, presented as dialogue, tokenized\ninto sentences using NLTK's Punkt tokenizer. One JSON object per line.\n\nData sources:\n The Apollo 11 Flight Journal: http://history.nasa.gov/ap11fj/\n The Apollo 11 Surface Journal: http://history.nasa.gov/alsj/\n \"Intended to be a resource for all those interested in the Apollo\n  program, whether in a passing or scholarly capacity.\"\n\n### `corpora.words.literature.nonfiction.literary_shrines`\n\nThe complete text of a public domain nonfiction book (\"Famous Houses\nand Literary Shrines of London\" by A. St. John Adcock).\n\n### `corpora.words.literature.gutenberg_id_mapping`\n\nMaps old-style (pre-2007) Project Gutenberg filenames to the new-style\nebook IDs. For example, \"/etext95/3boat10.zip\" is mapped to the\nnumber 308 (see http://www.gutenberg.org/ebooks/308). Pretty much\nnobody needs this.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Library for artistic text generation",
    "version": "1.0.5",
    "project_urls": {
        "Homepage": "https://github.com/leonardr/olipy/"
    },
    "split_keywords": [
        "art",
        "supplies"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "357c14f6ff8fafb4bfe2f4c7774e74d653c7223c15fbadcfa047fc5f3c72f174",
                "md5": "132ccedd2b40774665c8d4497dbc9f2f",
                "sha256": "91e808a9a7851537dabbbcaf7396064b88f2f17467481c98d7917fd1ca1e3a90"
            },
            "downloads": -1,
            "filename": "olipy-1.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "132ccedd2b40774665c8d4497dbc9f2f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9.0",
            "size": 4916783,
            "upload_time": "2025-01-02T19:36:29",
            "upload_time_iso_8601": "2025-01-02T19:36:29.278023Z",
            "url": "https://files.pythonhosted.org/packages/35/7c/14f6ff8fafb4bfe2f4c7774e74d653c7223c15fbadcfa047fc5f3c72f174/olipy-1.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ed09856f83381ff707d74bbb4d1074db43ddac1994cc0f2209155686e7d1f4d9",
                "md5": "73a7a0946ca890f40a03290a6d21dcd7",
                "sha256": "d25edfd847d2362d65fb3036e74c35e33bf22059d7438942eafca713a85b433c"
            },
            "downloads": -1,
            "filename": "olipy-1.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "73a7a0946ca890f40a03290a6d21dcd7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9.0",
            "size": 4837229,
            "upload_time": "2025-01-02T19:36:31",
            "upload_time_iso_8601": "2025-01-02T19:36:31.500656Z",
            "url": "https://files.pythonhosted.org/packages/ed/09/856f83381ff707d74bbb4d1074db43ddac1994cc0f2209155686e7d1f4d9/olipy-1.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-02 19:36:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "leonardr",
    "github_project": "olipy",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "olipy"
}
