# tapyoca
A medley of small projects
# parquet_deformations
I'm calling these [Parquet deformations](https://www.theguardian.com/artanddesign/alexs-adventures-in-numberland/2014/sep/09/crazy-paving-the-twisted-world-of-parquet-deformations#:~:text=In%20the%201960s%20an%20American,the%20regularity%20of%20the%20tiling.) but purists would lynch me.
Really, I just wanted to transform one word into another word, gradually, as I've seen in some of [Escher's](https://en.wikipedia.org/wiki/M._C._Escher) work, so I looked it up and saw that it's called parquet deformations. The math looked enticing, but I had no time for that, so I did it the first way I could think of: mapping pixels to pixels (in some fashion -- nearest neighbors is the method that yields the nicest results, under the pixel-level restriction).
Of course, this can be applied to any image (which will be transformed to B/W -- not even gray, I mean actual B/W), and there are several ways you can perform the parquet (I like the gif rendering).
The main function (exposed as a script) is `mk_deformation_image`. All you need is to specify two images (or words). If you want, of course, you can specify:
- `n_steps`: Number of steps from start to end image
- `save_to_file`: path of the file to save to (if not given, will just return the image object)
- `kind`: 'gif', 'horizontal_stack', or 'vertical_stack'
- `coordinate_mapping_maker`: A function that will return the mapping between start and end.
This function should return a pair (`from_coord`, `to_coord`) of aligned matrices whose 2 columns are the
`(x, y)` coordinates, and whose rows represent aligned positions that should be mapped.
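To make that concrete, here's a minimal sketch of what a custom `coordinate_mapping_maker` could look like (assuming `numpy` and `scipy`; the function name and the PIL-image inputs are illustrative, not the package's internals):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_coordinate_mapping(start_im, end_im):
    # (x, y) coordinates of the black pixels of each (B/W, PIL) image
    from_coord = np.argwhere(np.asarray(start_im.convert('1')) == 0)[:, ::-1]
    to_coord = np.argwhere(np.asarray(end_im.convert('1')) == 0)[:, ::-1]
    # align every start pixel with its nearest end pixel
    _, nearest = cKDTree(to_coord).query(from_coord)
    return from_coord, to_coord[nearest]
```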
## Examples
### Two words...
```python
fit_to_size = 400
start_im = image_of_text('sensor').rotate(90, expand=1)
end_im = image_of_text('meaning').rotate(90, expand=1)
start_and_end_image(start_im, end_im)
```

```python
im = mk_deformation_image(start_im, end_im, 15, kind='h').resize((500,200))
im
```

```python
im = mk_deformation_image(start_im.transpose(4), end_im.transpose(4), 5, kind='v').resize((200,200))
im
```

```python
f = 'sensor_meaning_knn.gif'
mk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f)
display_gif(f)
```
<img src="sensor_meaning_knn.gif?76128495">
```python
f = 'sensor_meaning_scan.gif'
mk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f,
                     coordinate_mapping_maker='scan')
display_gif(f)
```
<img src="sensor_meaning_scan.gif?76996026">
```python
f = 'sensor_meaning_random.gif'
mk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f,
                     coordinate_mapping_maker='random')
display_gif(f)
```
<img src="sensor_meaning_random.gif?80233280">
### From a list of words
```python
start_words = ['sensor', 'vibration', 'tempature']
end_words = ['sense', 'meaning', 'detection']
start_im, end_im = make_start_and_end_images_with_words(
    start_words, end_words, perm=True, repeat=2, size=150)
start_and_end_image(start_im, end_im).resize((600, 200))
```

```python
im = mk_deformation_image(start_im, end_im, 5)
im
```

```python
f = 'bunch_of_words.gif'
mk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f)
display_gif(f)
```
<img src="bunch_of_words.gif?7402792">
### From files
```python
start_im = Image.open('sensor_strip_01.png')
end_im = Image.open('sense_strip_01.png')
start_and_end_image(start_im.resize((200, 500)), end_im.resize((200, 500)))
```

```python
im = mk_deformation_image(start_im, end_im, 7)
im
```

```python
f = 'medley.gif'
mk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f)
display_gif(f)
```
<img src="medley.gif?39255021">
```python
mk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f, coordinate_mapping_maker='scan')
display_gif(f)
```
<img src="sensor_meaning.gif?41172115">
### An image and some text
```python
start_im = 'img/waveform_01.png'  # will first look for a file, and if not found, consider the string as text
end_im = 'makes sense'
mk_gif_of_deformations(start_im, end_im, n_steps=20,
                       save_to_file='image_and_text.gif')
display_gif('image_and_text.gif')
```
<img src="image_and_text.gif?92524789">
# demonys
## What do we think about other peoples?
This project is meant to get an idea of what people think of the people of different nations, as seen by what they ask google about them.
Here I use python code to acquire, clean up, and analyze the data.
### Demonym
If you're like me and enjoy the false and fleeting impression of superiority that comes when you know a word someone else doesn't, or if, like me, you go to parties for the sole purpose of seeking victims to get a one-up on, here's a cool word to add to your arsenal:
**demonym**: a noun used to denote the natives or inhabitants of a particular country, state, city, etc.
_"he struggled for the correct demonym for the people of Manchester"_
### Back-story of this analysis
During a discussion (about traveling in Europe) someone said "why are the swiss so miserable". Now, I wouldn't say that the swiss were especially miserable (a couple of ex-girlfriends aside), but to be fair he was contrasting with Italians, so perhaps he has a point. I apologize if you are swiss, or one of the two ex-girlfriends -- nothing personal, this is all for effect.
We googled "why are the swiss so ", and sure enough, "why are the swiss so miserable" came up as one of the suggestions. So we got curious and started googling other peoples: the French, the Germans, etc.
That's the back-story of this analysis. This analysis is meant to get an idea of what we think of peoples from other countries. Of course, one can rightfully critique the approach I'll take to gauge "what we think" -- all three of these words should, but will not, be defined. I'm just going to see what google's *current* auto-suggest comes back with when I enter "why are the X so " (where X will be a noun that denotes the natives or inhabitants of a particular country; a *demonym* if you will).
### Warning
Again, word of warning: All data and analyses are biased.
Take everything you'll read here (and to be fair, what you read anywhere) with a grain of salt.
For simplicity I'll say things like "what we think of..." or "who do we most...", etc.
But I don't **really** mean that.
### Resources
* http://www.geography-site.co.uk/pages/countries/demonyms.html for my list of demonyms.
* google for my suggestion engine, using the url prefix: `http://suggestqueries.google.com/complete/search?client=chrome&q=`
## The results
### In a nutshell
Below are listed 73 demonyms along with words extracted from the very first google suggestion you get when you type:
`why are the DEMONYM so `
```text
afghan          eyes beautiful
albanian        beautiful
american        girl dolls expensive
australian      tall
belgian         fries good
bhutanese       happy
brazilian       good at football
british         full of grief and despair
bulgarian       properties cheap
burmese         cats affectionate
cambodian       cows skinny
canadian        nice
chinese         healthy
colombian       avocados big
cuban           cigars good
czech           tall
dominican       republic and haiti different
egyptian        gods important
english         reserved
eritrean        beautiful
ethiopian       beautiful
filipino        proud
finn            shoes expensive
french          healthy
german          tall
greek           gods messed up
haitian         parents strict
hungarian       words long
indian          tv debates chaotic
indonesian      smart
iranian         beautiful
israeli         startups successful
italian         short
jamaican        sprinters fast
japanese        polite
kenyan          runners good
lebanese        rich
malagasy        names long
malaysian       drivers bad
maltese         rude
mongolian       horses small
moroccan        rugs expensive
nepalese        beautiful
nigerian        tall
north korean    hats big
norwegian       flights cheap
pakistani       fair
peruvian        blueberries big
pole            vaulters hot
portuguese      short
puerto rican    and cuban flags similar
romanian        beautiful
russian         good at math
samoan          big
saudi           arrogant
scottish        bitter
senegalese      tall
serbian         tall
singaporean     rude
somali          parents strict
south african   plugs big
south korean    tall
sri lankan      dark
sudanese        tall
swiss           good at making watches
syrian          families large
taiwanese       pretty
thai            pretty
tongan          big
ukrainian       beautiful
vietnamese      fiercely nationalistic
welsh           dark
zambian         emeralds cheap
```
Notes:
* The queries actually have a space after the "so", which matters so as to omit suggestions containing words that start with so.
* Only the tail of the suggestion is shown -- minus the prefix (`why are the DEMONYM` or `why are DEMONYM`) as well as the `so`, wherever it lands in the suggestion.
For example, the first suggestion for the american demonym was "why are american dolls so expensive", which results in the "dolls expensive" association.
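For illustration, that tail extraction could look something like this (a sketch; the function name and regexes are mine, not the repo's):

```python
import re

def suggestion_tail(suggestion, demonym):
    # drop the prefix ("why are the DEMONYM" or "why are DEMONYM") ...
    tail = re.sub(rf'^why are (the )?{demonym}\s*', '', suggestion)
    # ... then drop the "so", wherever it lands
    return re.sub(r'\bso\b\s*', '', tail).strip()

# suggestion_tail('why are american dolls so expensive', 'american')  # -> 'dolls expensive'
```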
### Who do we most talk/ask about?
The original list contained 217 demonyms, but many of these yielded no suggestions (to the specific query format I used, that is).
Only 73 demonyms gave me at least one suggestion.
But within those, the number of suggestions ranges between 1 and 20 (which is probably the default maximum number of suggestions for the API I used).
So, pretending that the number of suggestions is an indicator of how much we have to say, or how many different opinions we have, of each of the covered nationalities,
here are the top 15 demonyms people talk about, with the corresponding number of suggestions
(a proxy for "the number of different things people ask about the said nationality").
```text
french 20
singaporean 20
german 20
british 20
swiss 20
english 19
italian 18
cuban 18
canadian 18
welsh 18
australian 17
maltese 16
american 16
japanese 14
scottish 14
```
### Who do we least talk/ask about?
Conversely, here are the 19 demonyms that came back with only one suggestion.
```text
somali 1
bhutanese 1
syrian 1
tongan 1
cambodian 1
malagasy 1
saudi 1
serbian 1
czech 1
eritrean 1
finn 1
puerto rican 1
pole 1
haitian 1
hungarian 1
peruvian 1
moroccan 1
mongolian 1
zambian 1
```
### What do we think about people?
Why are the French so...
How would you (if you're (un)lucky enough to know the French) finish this sentence?
You might even have several opinions about the French, and any other group of people you've rubbed shoulders with.
What words would your palette contain to describe different nationalities?
What words would others (at least those that ask questions to google) use?
Well, here's what my auto-suggest search gave me: a set of 357 unique words and expressions to describe the 72 nationalities.
So there's a long tail of words used for only one nationality. But some words occur for more than one nationality.
Here are the top 12 words/expressions used to describe people of the world.
```text
beautiful 11
tall 11
short 9
names long 8
proud 8
parents strict 8
smart 8
nice 7
boring 6
rich 5
dark 5
successful 5
```
### Who is beautiful? Who is tall? Who is short? Who is smart?
```text
beautiful : albanian, eritrean, ethiopian, filipino, iranian, lebanese, nepalese, pakistani, romanian, ukrainian, vietnamese
tall : australian, czech, german, nigerian, pakistani, samoan, senegalese, serbian, south korean, sudanese, taiwanese
short : filipino, indonesian, italian, maltese, nepalese, pakistani, portuguese, singaporean, welsh
names long : indian, malagasy, nigerian, portuguese, russian, sri lankan, thai, welsh
proud : albanian, ethiopian, filipino, iranian, lebanese, portuguese, scottish, welsh
parents strict : albanian, ethiopian, haitian, indian, lebanese, pakistani, somali, sri lankan
smart : indonesian, iranian, lebanese, pakistani, romanian, singaporean, taiwanese, vietnamese
nice : canadian, english, filipino, nepalese, portuguese, taiwanese, thai
boring : british, english, french, german, singaporean, swiss
rich : lebanese, pakistani, singaporean, taiwanese, vietnamese
dark : filipino, senegalese, sri lankan, vietnamese, welsh
successful : chinese, english, japanese, lebanese, swiss
```
## How did I do it?
I scraped a list of (country, demonym) pairs from a table in http://www.geography-site.co.uk/pages/countries/demonyms.html.
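For instance, a minimal sketch of that scraping step (assuming `pandas` can parse the page, and that the first table on it is the right one):

```python
import pandas as pd

tables = pd.read_html('http://www.geography-site.co.uk/pages/countries/demonyms.html')
country_demonym = tables[0]  # assumption: the (country, demonym) table comes first
```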
Then I looked these over and manually made a mapping to simplify some "complex" entries,
such as mapping "Irishman or Irishwoman or Irish" to "Irish".
Using the google suggest API (http://suggestqueries.google.com/complete/search?client=chrome&q=), I requested the suggestions
for the `why are the $demonym so ` query pattern, with `$demonym` running through all 217 demonyms from the list above,
storing the results whenever they were non-empty.
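A minimal sketch of one such request (assuming the `requests` package; the actual acquisition code lives in `data_acquisition.py`):

```python
import requests

def google_suggestions(demonym):
    # note the trailing space after "so" -- it matters (see the Notes above)
    params = {'client': 'chrome', 'q': f'why are the {demonym} so '}
    resp = requests.get('http://suggestqueries.google.com/complete/search', params=params)
    resp.raise_for_status()
    # the chrome client returns a JSON array whose second item is the list of suggestions
    return resp.json()[1]
```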
Then, it was just a matter of pulling this data into memory, formatting it a bit, and creating a pandas dataframe that I could then interrogate.
## Resources you can find here
The code to do this analysis yourself, from scratch, is here: `data_acquisition.py`.
The jupyter notebook I actually used when I developed this: `01 - Demonyms and adjectives - why are the french so....ipynb`
Note you'll need to pip install py2store if you haven't already.
In the `data` folder you'll find
* country_demonym.p: A pickle of a dataframe of countries and corresponding demonyms
* country_demonym.xlsx: The same as above, but in excel form
* demonym_suggested_characteristics.p: A pickle of 73 demonyms and auto-suggestion information, including characteristics.
* what_we_think_about_demonyns.xlsx: An excel containing various statistics about demonyms and their (perceived) characteristics
# Agglutinations
Inspired by a [tweet](https://twitter.com/raymondh/status/1311003482531401729) from Raymond Hettinger this morning:
_Resist the urge to elide the underscore in multiword function or method names_
So I wondered...
## Gluglus
The gluglu of a word is the number of partitions you can make of that word into words, each of length at least 2 (so no using "a" or "i").
(No, "gluglu" isn't an actual term -- unless everyone starts using it from now on.
But it was inspired by an actual [linguistic term](https://en.wikipedia.org/wiki/Agglutination).)
For example, the gluglu of ``newspaper`` is 4:
```
newspaper
    new spa per
    news pa per
    news paper
```
Every (valid) word has gluglu at least 1.
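A minimal sketch of how one might compute gluglus (assuming `words` is a set of valid lowercase words, e.g. the dictionary described below):

```python
from functools import lru_cache

def gluglu(word, words):
    # number of ways to partition `word` into dictionary words of length >= 2
    @lru_cache(maxsize=None)
    def n_partitions(s):
        if not s:
            return 1
        # try every prefix of length at least 2 that's a dictionary word
        return sum(n_partitions(s[i:]) for i in range(2, len(s) + 1) if s[:i] in words)
    return n_partitions(word)

assert gluglu('newspaper', {'new', 'news', 'spa', 'pa', 'per', 'paper', 'newspaper'}) == 4
```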
## How many standard library names have gluglus of at least 2?
108
Here's [the list](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/standard_lib_gluglus.txt) of all of them.
The winner has a gluglu of 6 (not 7, because formatannotationrelativeto itself isn't in the dictionary):
```
formatannotationrelativeto
    for mat an not at ion relative to
    for mat annotation relative to
    form at an not at ion relative to
    form at annotation relative to
    format an not at ion relative to
    format annotation relative to
```
## Details
### Dictionary
Really, it depends on what dictionary we use.
Here, I used a very conservative one:
the intersection of two lists, the [corncob](http://www.mieliestronk.com/corncob_lowercase.txt)
and the [google10000](https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-usa.txt) word lists.
Additionally, of those, I only kept the ones that had at least 2 letters and contained only letters (no hyphens or disturbing diacritics).
Diacritics. Look it up. Impress your next nerd date.
I'm left with 8116 words. You can find them [here](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/words_8116.csv).
### Standard Lib Names
Surprisingly, that was the hardest part. I know I'm missing some, but that's enough rabbit-holing.
What I did (modulo some exceptions I won't look into) was to walk the standard lib modules (even that list wasn't a given!),
extracting (recursively) the names of any (non-underscored) attributes that were modules or callables,
as well as extracting the arguments of these callables (when they had signatures).
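In spirit, and very roughly, the per-module extraction looks like this non-recursive sketch (the real code, linked below, handles recursion and many edge cases):

```python
import inspect

def public_names(module):
    names = set()
    for name, obj in vars(module).items():
        if name.startswith('_'):
            continue
        if inspect.ismodule(obj) or callable(obj):
            names.add(name)
        if callable(obj):
            try:
                names.update(inspect.signature(obj).parameters)
            except (ValueError, TypeError):
                pass  # some builtins don't expose a signature
    return names

# e.g. 'formatannotationrelativeto' in public_names(inspect)  # -> True
```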
You can find the code I used to extract these names [here](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/py_names.py)
and the actual list [there](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/standard_lib_module_names.csv).
# covid
## Bar Chart Races (applied to covid-19 spread)
This module will show you how to make these:
- Confirmed cases (by country): https://public.flourish.studio/visualisation/1704821/
- Deaths (by country): https://public.flourish.studio/visualisation/1705644/
- US Confirmed cases (by state): https://public.flourish.studio/visualisation/1794768/
- US Deaths (by state): https://public.flourish.studio/visualisation/1794797/
### The script
If you just want to run this as a script to get the job done, you have one here:
https://raw.githubusercontent.com/thorwhalen/tapyoca/master/covid/covid_bar_chart_race.py
Run it like this:
```
$ python covid_bar_chart_race.py -h
usage: covid_bar_chart_race.py [-h] {mk-and-save-covid-data,update-covid-data,instructions-to-make-bar-chart-race} ...

positional arguments:
  {mk-and-save-covid-data,update-covid-data,instructions-to-make-bar-chart-race}
    mk-and-save-covid-data
                        :param data_sources: Dirpath or py2store Store where the data is :param kinds: The kinds of data you want to compute and save
                        :param skip_first_days: :param verbose: :return:
    update-covid-data   update the coronavirus data
    instructions-to-make-bar-chart-race

optional arguments:
  -h, --help            show this help message and exit
```
### The jupyter notebook
The notebook (the .ipynb file) shows you how to do it step by step in case you want to reuse the methods for other stuff.
## Getting and preparing the data
Coronavirus data is here: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset (direct download: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/download). It's currently updated daily, so download a fresh copy if you want.
Population data here: http://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv
It comes in the form of a zip file (currently named `novel-corona-virus-2019-dataset.zip`) with several `.csv` files in it. We use `py2store` (to install: `pip install py2store`; project lives here: https://github.com/i2mint/py2store) to access and pre-prepare it. It allows us to avoid having to unzip the file and replace the older folder every time we download a new one. It also gives us the csvs as `pandas.DataFrame` objects already.
```python
import os
import pandas as pd
from io import BytesIO
from py2store import kv_wrap, ZipReader  # google it and pip install it
from py2store.caching import mk_cached_store
from py2store import QuickPickleStore
from py2store.sources import FuncReader

def country_flag_image_url():
    import pandas as pd
    return pd.read_csv(
        'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv')

def kaggle_coronavirus_dataset():
    import kaggle
    from io import BytesIO
    # didn't find the pure binary download function, so using temp dir to emulate
    from tempfile import mkdtemp
    download_dir = mkdtemp()
    filename = 'novel-corona-virus-2019-dataset.zip'
    zip_file = os.path.join(download_dir, filename)
    dataset = 'sudalairajkumar/novel-corona-virus-2019-dataset'
    kaggle.api.dataset_download_files(dataset, download_dir)
    with open(zip_file, 'rb') as fp:
        b = fp.read()
    return BytesIO(b)

def city_population_in_time():
    import pandas as pd
    return pd.read_csv(
        'https://gist.githubusercontent.com/johnburnmurdoch/'
        '4199dbe55095c3e13de8d5b2e5e5307a/raw/fa018b25c24b7b5f47fd0568937ff6c04e384786/city_populations'
    )

def country_flag_image_url_prep(df: pd.DataFrame):
    # delete the region col (we don't need it)
    del df['region']
    # rewriting a few (not all) of the country names to match those found in kaggle covid data
    # Note: The list is not complete! Add to it as needed
    old_and_new = [('USA', 'US'),
                   ('Iran, Islamic Rep.', 'Iran'),
                   ('UK', 'United Kingdom'),
                   ('Korea, Rep.', 'Korea, South')]
    for old, new in old_and_new:
        df['country'] = df['country'].replace(old, new)
    return df

@kv_wrap.outcoming_vals(lambda x: pd.read_csv(BytesIO(x)))  # this is to format the data as a dataframe
class ZippedCsvs(ZipReader):
    pass
# equivalent to ZippedCsvs = kv_wrap.outcoming_vals(lambda x: pd.read_csv(BytesIO(x)))(ZipReader)
```
```python
# Enter here the place you want to cache your data
my_local_cache = os.path.expanduser('~/ddir/my_sources')
```
```python
CachedFuncReader = mk_cached_store(FuncReader, QuickPickleStore(my_local_cache))
```
```python
data_sources = CachedFuncReader([country_flag_image_url,
                                 kaggle_coronavirus_dataset,
                                 city_population_in_time])
list(data_sources)
```
['country_flag_image_url',
'kaggle_coronavirus_dataset',
'city_population_in_time']
```python
covid_datasets = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])
list(covid_datasets)
```
['COVID19_line_list_data.csv',
'COVID19_open_line_list.csv',
'covid_19_data.csv',
'time_series_covid_19_confirmed.csv',
'time_series_covid_19_confirmed_US.csv',
'time_series_covid_19_deaths.csv',
'time_series_covid_19_deaths_US.csv',
'time_series_covid_19_recovered.csv']
```python
covid_datasets['time_series_covid_19_confirmed.csv'].head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Province/State</th>
<th>Country/Region</th>
<th>Lat</th>
<th>Long</th>
<th>1/22/20</th>
<th>1/23/20</th>
<th>1/24/20</th>
<th>1/25/20</th>
<th>1/26/20</th>
<th>1/27/20</th>
<th>...</th>
<th>3/24/20</th>
<th>3/25/20</th>
<th>3/26/20</th>
<th>3/27/20</th>
<th>3/28/20</th>
<th>3/29/20</th>
<th>3/30/20</th>
<th>3/31/20</th>
<th>4/1/20</th>
<th>4/2/20</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>NaN</td>
<td>Afghanistan</td>
<td>33.0000</td>
<td>65.0000</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>74</td>
<td>84</td>
<td>94</td>
<td>110</td>
<td>110</td>
<td>120</td>
<td>170</td>
<td>174</td>
<td>237</td>
<td>273</td>
</tr>
<tr>
<th>1</th>
<td>NaN</td>
<td>Albania</td>
<td>41.1533</td>
<td>20.1683</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>123</td>
<td>146</td>
<td>174</td>
<td>186</td>
<td>197</td>
<td>212</td>
<td>223</td>
<td>243</td>
<td>259</td>
<td>277</td>
</tr>
<tr>
<th>2</th>
<td>NaN</td>
<td>Algeria</td>
<td>28.0339</td>
<td>1.6596</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>264</td>
<td>302</td>
<td>367</td>
<td>409</td>
<td>454</td>
<td>511</td>
<td>584</td>
<td>716</td>
<td>847</td>
<td>986</td>
</tr>
<tr>
<th>3</th>
<td>NaN</td>
<td>Andorra</td>
<td>42.5063</td>
<td>1.5218</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>164</td>
<td>188</td>
<td>224</td>
<td>267</td>
<td>308</td>
<td>334</td>
<td>370</td>
<td>376</td>
<td>390</td>
<td>428</td>
</tr>
<tr>
<th>4</th>
<td>NaN</td>
<td>Angola</td>
<td>-11.2027</td>
<td>17.8739</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>...</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>4</td>
<td>5</td>
<td>7</td>
<td>7</td>
<td>7</td>
<td>8</td>
<td>8</td>
</tr>
</tbody>
</table>
<p>5 rows × 76 columns</p>
</div>
```python
country_flag_image_url = data_sources['country_flag_image_url']
country_flag_image_url.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>region</th>
<th>flag_image_url</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Angola</td>
<td>Africa</td>
<td>https://www.countryflags.io/ao/flat/64.png</td>
</tr>
<tr>
<th>1</th>
<td>Burundi</td>
<td>Africa</td>
<td>https://www.countryflags.io/bi/flat/64.png</td>
</tr>
<tr>
<th>2</th>
<td>Benin</td>
<td>Africa</td>
<td>https://www.countryflags.io/bj/flat/64.png</td>
</tr>
<tr>
<th>3</th>
<td>Burkina Faso</td>
<td>Africa</td>
<td>https://www.countryflags.io/bf/flat/64.png</td>
</tr>
<tr>
<th>4</th>
<td>Botswana</td>
<td>Africa</td>
<td>https://www.countryflags.io/bw/flat/64.png</td>
</tr>
</tbody>
</table>
</div>
```python
from IPython.display import Image
flag_image_url_of_country = country_flag_image_url.set_index('country')['flag_image_url']
Image(url=flag_image_url_of_country['Australia'])
```
<img src="https://www.countryflags.io/au/flat/64.png"/>
### Update coronavirus data
```python
# To update the coronavirus data:
def update_covid_data(data_sources):
    """update the coronavirus data"""
    if 'kaggle_coronavirus_dataset' in data_sources._caching_store:
        del data_sources._caching_store['kaggle_coronavirus_dataset']  # delete the cached item
    _ = data_sources['kaggle_coronavirus_dataset']

# update_covid_data(data_sources)  # uncomment here when you want to update
```
### Prepare data for flourish upload
```python
import re

def print_if_verbose(verbose, *args, **kwargs):
    if verbose:
        print(*args, **kwargs)

def country_data_for_data_kind(data_sources, kind='confirmed', skip_first_days=0, verbose=False):
    """kind can be 'confirmed', 'deaths', 'confirmed_US', 'deaths_US', or 'recovered'"""
    covid_datasets = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])

    df = covid_datasets[f'time_series_covid_19_{kind}.csv']
    if 'Province/State' in df.columns:
        df.loc[df['Province/State'].isna(), 'Province/State'] = 'n/a'  # to avoid problems arising from NaNs
    print_if_verbose(verbose, f"Before data shape: {df.shape}")

    # drop some columns we don't need
    p = re.compile(r'\d+/\d+/\d+')
    assert all(isinstance(x, str) for x in df.columns)
    date_cols = [x for x in df.columns if p.match(x)]
    if not kind.endswith('US'):
        df = df.loc[:, ['Country/Region'] + date_cols]
        # group countries and sum up the contributions of their states/regions/parts
        df['country'] = df.pop('Country/Region')
        df = df.groupby('country').sum()
    else:
        df = df.loc[:, ['Province_State'] + date_cols]
        df['state'] = df.pop('Province_State')
        df = df.groupby('state').sum()
    print_if_verbose(verbose, f"After data shape: {df.shape}")

    df = df.iloc[:, skip_first_days:]

    if not kind.endswith('US'):
        # Joining with the country image urls and saving as an xls
        country_image_url = country_flag_image_url_prep(data_sources['country_flag_image_url'])
        t = df.copy()
        t.columns = [str(x)[:10] for x in t.columns]
        t = t.reset_index(drop=False)
        t = country_image_url.merge(t, how='outer')
        t = t.set_index('country')
        df = t
    return df

def mk_and_save_country_data_for_data_kind(data_sources, kind='confirmed', skip_first_days=0, verbose=False):
    t = country_data_for_data_kind(data_sources, kind, skip_first_days, verbose)
    filepath = f'country_covid_{kind}.xlsx'
    t.to_excel(filepath)
    print_if_verbose(verbose, f"Was saved here: {filepath}")
```
```python
for kind in ['confirmed', 'deaths', 'recovered', 'confirmed_US', 'deaths_US']:
    mk_and_save_country_data_for_data_kind(data_sources, kind=kind, skip_first_days=39, verbose=True)
```
Before data shape: (262, 79)
After data shape: (183, 75)
Was saved here: country_covid_confirmed.xlsx
Before data shape: (262, 79)
After data shape: (183, 75)
Was saved here: country_covid_deaths.xlsx
Before data shape: (248, 79)
After data shape: (183, 75)
Was saved here: country_covid_recovered.xlsx
Before data shape: (3253, 86)
After data shape: (58, 75)
Was saved here: country_covid_confirmed_US.xlsx
Before data shape: (3253, 87)
After data shape: (58, 75)
Was saved here: country_covid_deaths_US.xlsx
### Upload to Flourish, tune, and publish
Go to https://public.flourish.studio/, get a free account, and play.
Go to https://app.flourish.studio/templates
Choose "Bar chart race". At the time of writing this, it was here: https://app.flourish.studio/visualisation/1706060/
... and then play with the settings
## Discussion of the methods
```python
from py2store import *
from IPython.display import Image
```
### country flags images
The manual data prep looks something like this.
```python
import pandas as pd

# get the csv data from the url
country_image_url_source = \
    'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv'
country_image_url = pd.read_csv(country_image_url_source)

# delete the region col (we don't need it)
del country_image_url['region']

# rewriting a few (not all) of the country names to match those found in kaggle covid data
# Note: The list is not complete! Add to it as needed
# TODO: (Wishful) Using a general smart soft-matching algorithm to do this automatically.
# TODO: This could use edit-distance, synonyms, acronym generation, etc.
old_and_new = [('USA', 'US'),
               ('Iran, Islamic Rep.', 'Iran'),
               ('UK', 'United Kingdom'),
               ('Korea, Rep.', 'Korea, South')]
for old, new in old_and_new:
    country_image_url['country'] = country_image_url['country'].replace(old, new)

image_url_of_country = country_image_url.set_index('country')['flag_image_url']
country_image_url.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>flag_image_url</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Angola</td>
<td>https://www.countryflags.io/ao/flat/64.png</td>
</tr>
<tr>
<th>1</th>
<td>Burundi</td>
<td>https://www.countryflags.io/bi/flat/64.png</td>
</tr>
<tr>
<th>2</th>
<td>Benin</td>
<td>https://www.countryflags.io/bj/flat/64.png</td>
</tr>
<tr>
<th>3</th>
<td>Burkina Faso</td>
<td>https://www.countryflags.io/bf/flat/64.png</td>
</tr>
<tr>
<th>4</th>
<td>Botswana</td>
<td>https://www.countryflags.io/bw/flat/64.png</td>
</tr>
</tbody>
</table>
</div>
```python
Image(url=image_url_of_country['Australia'])
```
<img src="https://www.countryflags.io/au/flat/64.png"/>
### Caching the flag images data
Downloading our data sources every time we need them is not sustainable. What if they're big? What if you're offline or have slow internet (yes, dear future reader, even in the US, during coronavirus times!)?
Caching. A "cache aside" read-cache. That's the word. py2store has tools for that (most of which are in caching.py).
So let's say we're going to have a local folder where we'll store the various data we download. The principle is as follows:
```python
from py2store.caching import mk_cached_store
class TheSource(dict): ...
the_cache = {}
TheCacheSource = mk_cached_store(TheSource, the_cache)
the_source = TheSource({'green': 'eggs', 'and': 'ham'})
the_cached_source = TheCacheSource(the_source)
print(f"the_cache: {the_cache}")
print(f"Getting green...")
the_cached_source['green']
print(f"the_cache: {the_cache}")
print("... so the next time the_cached_source will get it's green from that the_cache")
```
the_cache: {}
Getting green...
the_cache: {'green': 'eggs'}
... so the next time the_cached_source will get it's green from that the_cache
But now, you'll notice a slight problem ahead. What exactly does our source store (or rather reader) look like? In its raw form, it would take urls as its keys, and the response of a request as its value. That store wouldn't have an `__iter__` for sure (unless you're Google). But more to the point here, the `mk_cached_store` tool uses the same key for the source and the cache, and we can't just use the url as is to be a local file path.
There are many ways we could solve this. One way is to add a key map layer on the cache store, so that externally it speaks the url key language, but internally it maps that url to a valid local file path. We've been there, we got the T-shirt!
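(To make that key-map idea concrete, a toy sketch: url-quote the key to get a filesystem-safe name, and unquote to go back.)

```python
from urllib.parse import quote, unquote

url = 'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv'
file_name = quote(url, safe='')  # a valid (if ugly) file name
assert unquote(file_name) == url  # and the mapping is reversible
```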
But what we're going to do is a bit different: We're going to do the key mapping in the source store itself. It seems to make more sense in our context: We have a data source of `name: data` pairs, and if we impose that the name be a valid file name, we don't need a key map in the cache store.
So let's start by building this `MyDataStore` store. We'll start by defining the functions that get us the data we want.
```python
def country_flag_image_url():
    import pandas as pd
    return pd.read_csv(
        'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv')

def kaggle_coronavirus_dataset():
    import os
    import kaggle
    from io import BytesIO
    # didn't find the pure binary download function, so using temp dir to emulate
    from tempfile import mkdtemp
    download_dir = mkdtemp()
    filename = 'novel-corona-virus-2019-dataset.zip'
    zip_file = os.path.join(download_dir, filename)
    dataset = 'sudalairajkumar/novel-corona-virus-2019-dataset'
    kaggle.api.dataset_download_files(dataset, download_dir)
    with open(zip_file, 'rb') as fp:
        b = fp.read()
    return BytesIO(b)

def city_population_in_time():
    import pandas as pd
    return pd.read_csv(
        'https://gist.githubusercontent.com/johnburnmurdoch/'
        '4199dbe55095c3e13de8d5b2e5e5307a/raw/fa018b25c24b7b5f47fd0568937ff6c04e384786/city_populations'
    )
```
Now we can make a store that simply uses these function names as the keys, and their returned value as the values.
```python
from py2store.base import KvReader
from functools import lru_cache

class FuncReader(KvReader):
    _getitem_cache_size = 999

    def __init__(self, funcs):
        # TODO: assert no free arguments (arguments are allowed but must all have defaults)
        self.funcs = funcs
        self._func_of_name = {func.__name__: func for func in funcs}

    def __contains__(self, k):
        return k in self._func_of_name

    def __iter__(self):
        yield from self._func_of_name

    def __len__(self):
        return len(self._func_of_name)

    @lru_cache(maxsize=_getitem_cache_size)
    def __getitem__(self, k):
        return self._func_of_name[k]()  # call the func

    def __hash__(self):
        return 1
```
```python
data_sources = FuncReader([country_flag_image_url, kaggle_coronavirus_dataset, city_population_in_time])
list(data_sources)
```
['country_flag_image_url',
'kaggle_coronavirus_dataset',
'city_population_in_time']
```python
data_sources['country_flag_image_url']
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>region</th>
<th>flag_image_url</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Angola</td>
<td>Africa</td>
<td>https://www.countryflags.io/ao/flat/64.png</td>
</tr>
<tr>
<th>1</th>
<td>Burundi</td>
<td>Africa</td>
<td>https://www.countryflags.io/bi/flat/64.png</td>
</tr>
<tr>
<th>2</th>
<td>Benin</td>
<td>Africa</td>
<td>https://www.countryflags.io/bj/flat/64.png</td>
</tr>
<tr>
<th>3</th>
<td>Burkina Faso</td>
<td>Africa</td>
<td>https://www.countryflags.io/bf/flat/64.png</td>
</tr>
<tr>
<th>4</th>
<td>Botswana</td>
<td>Africa</td>
<td>https://www.countryflags.io/bw/flat/64.png</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>210</th>
<td>Solomon Islands</td>
<td>Oceania</td>
<td>https://www.countryflags.io/sb/flat/64.png</td>
</tr>
<tr>
<th>211</th>
<td>Tonga</td>
<td>Oceania</td>
<td>https://www.countryflags.io/to/flat/64.png</td>
</tr>
<tr>
<th>212</th>
<td>Tuvalu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/tv/flat/64.png</td>
</tr>
<tr>
<th>213</th>
<td>Vanuatu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/vu/flat/64.png</td>
</tr>
<tr>
<th>214</th>
<td>Samoa</td>
<td>Oceania</td>
<td>https://www.countryflags.io/ws/flat/64.png</td>
</tr>
</tbody>
</table>
<p>215 rows × 3 columns</p>
</div>
```python
data_sources['country_flag_image_url']
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>region</th>
<th>flag_image_url</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Angola</td>
<td>Africa</td>
<td>https://www.countryflags.io/ao/flat/64.png</td>
</tr>
<tr>
<th>1</th>
<td>Burundi</td>
<td>Africa</td>
<td>https://www.countryflags.io/bi/flat/64.png</td>
</tr>
<tr>
<th>2</th>
<td>Benin</td>
<td>Africa</td>
<td>https://www.countryflags.io/bj/flat/64.png</td>
</tr>
<tr>
<th>3</th>
<td>Burkina Faso</td>
<td>Africa</td>
<td>https://www.countryflags.io/bf/flat/64.png</td>
</tr>
<tr>
<th>4</th>
<td>Botswana</td>
<td>Africa</td>
<td>https://www.countryflags.io/bw/flat/64.png</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>210</th>
<td>Solomon Islands</td>
<td>Oceania</td>
<td>https://www.countryflags.io/sb/flat/64.png</td>
</tr>
<tr>
<th>211</th>
<td>Tonga</td>
<td>Oceania</td>
<td>https://www.countryflags.io/to/flat/64.png</td>
</tr>
<tr>
<th>212</th>
<td>Tuvalu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/tv/flat/64.png</td>
</tr>
<tr>
<th>213</th>
<td>Vanuatu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/vu/flat/64.png</td>
</tr>
<tr>
<th>214</th>
<td>Samoa</td>
<td>Oceania</td>
<td>https://www.countryflags.io/ws/flat/64.png</td>
</tr>
</tbody>
</table>
<p>215 rows × 3 columns</p>
</div>
```python
data_sources['city_population_in_time']
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>group</th>
<th>year</th>
<th>value</th>
<th>subGroup</th>
<th>city_id</th>
<th>lastValue</th>
<th>lat</th>
<th>lon</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Agra</td>
<td>India</td>
<td>1575</td>
<td>200.0</td>
<td>India</td>
<td>Agra - India</td>
<td>200.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>1</th>
<td>Agra</td>
<td>India</td>
<td>1576</td>
<td>212.0</td>
<td>India</td>
<td>Agra - India</td>
<td>200.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>2</th>
<td>Agra</td>
<td>India</td>
<td>1577</td>
<td>224.0</td>
<td>India</td>
<td>Agra - India</td>
<td>212.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>3</th>
<td>Agra</td>
<td>India</td>
<td>1578</td>
<td>236.0</td>
<td>India</td>
<td>Agra - India</td>
<td>224.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>4</th>
<td>Agra</td>
<td>India</td>
<td>1579</td>
<td>248.0</td>
<td>India</td>
<td>Agra - India</td>
<td>236.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>6247</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1561</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6248</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1562</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6249</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1563</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6250</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1564</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6251</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1565</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
</tbody>
</table>
<p>6252 rows × 9 columns</p>
</div>
But we wanted this all to be cached locally, right? So a few more lines to do that!
```python
import os
from py2store.caching import mk_cached_store
from py2store import QuickPickleStore

my_local_cache = os.path.expanduser('~/ddir/my_sources')
CachedFuncReader = mk_cached_store(FuncReader, QuickPickleStore(my_local_cache))
```
```python
data_sources = CachedFuncReader([country_flag_image_url, kaggle_coronavirus_dataset, city_population_in_time])
list(data_sources)
```
['country_flag_image_url',
'kaggle_coronavirus_dataset',
'city_population_in_time']
```python
data_sources['country_flag_image_url']
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>country</th>
<th>region</th>
<th>flag_image_url</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Angola</td>
<td>Africa</td>
<td>https://www.countryflags.io/ao/flat/64.png</td>
</tr>
<tr>
<th>1</th>
<td>Burundi</td>
<td>Africa</td>
<td>https://www.countryflags.io/bi/flat/64.png</td>
</tr>
<tr>
<th>2</th>
<td>Benin</td>
<td>Africa</td>
<td>https://www.countryflags.io/bj/flat/64.png</td>
</tr>
<tr>
<th>3</th>
<td>Burkina Faso</td>
<td>Africa</td>
<td>https://www.countryflags.io/bf/flat/64.png</td>
</tr>
<tr>
<th>4</th>
<td>Botswana</td>
<td>Africa</td>
<td>https://www.countryflags.io/bw/flat/64.png</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>210</th>
<td>Solomon Islands</td>
<td>Oceania</td>
<td>https://www.countryflags.io/sb/flat/64.png</td>
</tr>
<tr>
<th>211</th>
<td>Tonga</td>
<td>Oceania</td>
<td>https://www.countryflags.io/to/flat/64.png</td>
</tr>
<tr>
<th>212</th>
<td>Tuvalu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/tv/flat/64.png</td>
</tr>
<tr>
<th>213</th>
<td>Vanuatu</td>
<td>Oceania</td>
<td>https://www.countryflags.io/vu/flat/64.png</td>
</tr>
<tr>
<th>214</th>
<td>Samoa</td>
<td>Oceania</td>
<td>https://www.countryflags.io/ws/flat/64.png</td>
</tr>
</tbody>
</table>
<p>215 rows × 3 columns</p>
</div>
```python
data_sources['city_population_in_time']
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>name</th>
<th>group</th>
<th>year</th>
<th>value</th>
<th>subGroup</th>
<th>city_id</th>
<th>lastValue</th>
<th>lat</th>
<th>lon</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Agra</td>
<td>India</td>
<td>1575</td>
<td>200.0</td>
<td>India</td>
<td>Agra - India</td>
<td>200.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>1</th>
<td>Agra</td>
<td>India</td>
<td>1576</td>
<td>212.0</td>
<td>India</td>
<td>Agra - India</td>
<td>200.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>2</th>
<td>Agra</td>
<td>India</td>
<td>1577</td>
<td>224.0</td>
<td>India</td>
<td>Agra - India</td>
<td>212.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>3</th>
<td>Agra</td>
<td>India</td>
<td>1578</td>
<td>236.0</td>
<td>India</td>
<td>Agra - India</td>
<td>224.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>4</th>
<td>Agra</td>
<td>India</td>
<td>1579</td>
<td>248.0</td>
<td>India</td>
<td>Agra - India</td>
<td>236.0</td>
<td>27.18333</td>
<td>78.01667</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>6247</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1561</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6248</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1562</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6249</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1563</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6250</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1564</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
<tr>
<th>6251</th>
<td>Vijayanagar</td>
<td>India</td>
<td>1565</td>
<td>480.0</td>
<td>India</td>
<td>Vijayanagar - India</td>
<td>480.0</td>
<td>15.33500</td>
<td>76.46200</td>
</tr>
</tbody>
</table>
<p>6252 rows × 9 columns</p>
</div>
```python
z = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])
list(z)
```
Raw data
{
"_id": null,
"home_page": "https://github.com/thorwhalen/tapyoca",
"name": "tapyoca",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "documentation, packaging, publishing",
"author": "thorwhalen",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/71/c7/dd42003e26f7b25a89704acf9be5042ad68ff63f5a459222a26fc1840697/tapyoca-0.0.4.tar.gz",
"platform": "any",
"description": "# tapyoca\nA medley of small projects\n\n\n# parquet_deformations\n\nI'm calling these [Parquet deformations](https://www.theguardian.com/artanddesign/alexs-adventures-in-numberland/2014/sep/09/crazy-paving-the-twisted-world-of-parquet-deformations#:~:text=In%20the%201960s%20an%20American,the%20regularity%20of%20the%20tiling.) but purest would lynch me. \n\nReally, I just wanted to transform one word into another word, gradually, as I've seen in some of [Escher's](https://en.wikipedia.org/wiki/M._C._Escher) work, so I looked it up, and saw that it's called parquet deformations. The math looked enticing, but I had no time for that, so I did the first way I could think of: Mapping pixels to pixels (in some fashion -- but nearest neighbors is the method that yields nicest results, under the pixel-level restriction). \n\nOf course, this can be applied to any image (that will be transformed to B/W (not even gray -- I mean actual B/W), and there's several ways you can perform the parquet (I like the gif rendering). \n\nThe main function (exposed as a script) is `mk_deformation_image`. All you need is to specify two images (or words). If you want, of course, you can specify:\n- `n_steps`: Number of steps from start to end image\n- `save_to_file`: path to file to save too (if not given, will just return the image object)\n- `kind`: 'gif', 'horizontal_stack', or 'vertical_stack'\n- `coordinate_mapping_maker`: A function that will return the mapping between start and end. \nThis function should return a pair (`from_coord`, `to_coord`) of aligned matrices whose 2 columns are the the \n`(x, y)` coordinates, and the rows represent aligned positions that should be mapped. \n\n\n\n## Examples\n\n### Two words...\n\n\n```python\nfit_to_size = 400\nstart_im = image_of_text('sensor').rotate(90, expand=1)\nend_im = image_of_text('meaning').rotate(90, expand=1)\nstart_and_end_image(start_im, end_im)\n```\n\n\n\n\n\n\n\n\n\n```python\nim = mk_deformation_image(start_im, end_im, 15, kind='h').resize((500,200))\nim\n```\n\n\n\n\n\n\n\n\n\n```python\nim = mk_deformation_image(start_im.transpose(4), end_im.transpose(4), 5, kind='v').resize((200,200))\nim\n```\n\n\n\n\n\n\n\n\n\n```python\nf = 'sensor_meaning_knn.gif'\nmk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f)\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"sensor_meaning_knn.gif?76128495\">\n\n\n\n\n```python\nf = 'sensor_meaning_scan.gif'\nmk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f, \n coordinate_mapping_maker='scan')\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"sensor_meaning_scan.gif?76996026\">\n\n\n\n\n```python\nf = 'sensor_meaning_random.gif'\nmk_deformation_image(start_im.transpose(4), end_im.transpose(4), n_steps=20, save_to_file=f, \n coordinate_mapping_maker='random')\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"sensor_meaning_random.gif?80233280\">\n\n\n\n### From a list of words\n\n\n```python\nstart_words = ['sensor', 'vibration', 'tempature']\nend_words = ['sense', 'meaning', 'detection']\nstart_im, end_im = make_start_and_end_images_with_words(\n start_words, end_words, perm=True, repeat=2, size=150)\nstart_and_end_image(start_im, end_im).resize((600, 200))\n```\n\n\n\n\n\n\n\n\n\n```python\nim = mk_deformation_image(start_im, end_im, 5)\nim\n```\n\n\n\n\n\n\n\n\n\n```python\nf = 'bunch_of_words.gif'\nmk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f)\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"bunch_of_words.gif?7402792\">\n\n\n\n## 
From files\n\n\n```python\nstart_im = Image.open('sensor_strip_01.png')\nend_im = Image.open('sense_strip_01.png')\nstart_and_end_image(start_im.resize((200, 500)), end_im.resize((200, 500)))\n```\n\n\n\n\n\n\n\n\n\n```python\nim = mk_deformation_image(start_im, end_im, 7)\nim\n```\n\n\n\n\n\n\n\n\n\n```python\nf = 'medley.gif'\nmk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f)\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"medley.gif?39255021\">\n\n\n\n\n```python\nmk_deformation_image(start_im, end_im, n_steps=20, save_to_file=f, coordinate_mapping_maker='scan')\ndisplay_gif(f)\n```\n\n\n\n\n<img src=\"sensor_meaning.gif?41172115\">\n\n\n\n## an image and some text\n\n\n```python\nstart_im = 'img/waveform_01.png' # will first look for a file, and if not consider as text\nend_im = 'makes sense'\n\nmk_gif_of_deformations(start_im, end_im, n_steps=20, \n save_to_file='image_and_text.gif')\ndisplay_gif('image_and_text.gif') \n```\n\n\n\n\n<img src=\"image_and_text.gif?92524789\">\n\n\n\n\n\n\n# demonys\n\n## What do we think about other peoples?\n\nThis project is meant to get an idea of what people think of people for different nations, as seen by what they ask google about them. \n\nHere I use python code to acquire, clean up, and analyze the data. \n\n### Demonym\n\nIf you're like me and enjoy the false and fleeting impression of superiority that comes when you know a word someone else doesn't. If you're like me and go to parties for the sole purpose of seeking victims to get a one-up on, here's a cool word to add to your arsenal:\n\n**demonym**: a noun used to denote the natives or inhabitants of a particular country, state, city, etc.\n_\"he struggled for the correct demonym for the people of Manchester\"_\n\n### Back-story of this analysis\n \nDuring a discussion (about traveling in Europe) someone said \"why are the swiss so miserable\". Now, I wouldn't say that the swiss were especially miserable (a couple of ex-girlfriends aside), but to be fair he was contrasting with Italians, so perhaps he has a point. I apologize if you are swiss, or one of the two ex-girlfriends -- nothing personal, this is all for effect. \n\nWe googled \"why are the swiss so \", and sure enough, \"why are the swiss so miserable\" came up as one of the suggestions. So we got curious and started googling other peoples: the French, the Germans, etc.\n\nThat's the back-story of this analysis. This analysis is meant to get an idea of what we think of peoples from other countries. Of course, one can rightfully critique the approach I'll take to gauge \"what we think\" -- all three of these words should, but will not, be defined. I'm just going to see what google's *current* auto-suggest comes back with when I enter \"why are the X so \" (where X will be a noun that denotes the natives of inhabitants of a particular country; a *demonym* if you will). \n\n### Warning\n\nAgain, word of warning: All data and analyses are biased. \nTake everything you'll read here (and to be fair, what you read anywhere) with a grain of salt. 
\nFor simplicitly I'll saying things like \"what we think of...\" or \"who do we most...\", etc.\nBut I don't **really** mean that.\n\n### Resources\n\n* http://www.geography-site.co.uk/pages/countries/demonyms.html for my list of demonyms.\n* google for my suggestion engine, using the url prefix: `http://suggestqueries.google.com/complete/search?client=chrome&q=`\n\n\n## The results\n\n### In a nutshell\n\nBelow is listed 73 demonyms along with words extracted from the very first google suggestion when you type. \n\n`why are the DEMONYM so `\n\n```text\nafghan \t eyes beautiful\nalbanian \t beautiful\namerican \t girl dolls expensive\naustralian\t tall\nbelgian \t fries good\nbhutanese \t happy\nbrazilian \t good at football\nbritish \t full of grief and despair\nbulgarian \t properties cheap\nburmese \t cats affectionate\ncambodian \t cows skinny\ncanadian \t nice\nchinese \t healthy\ncolombian \t avocados big\ncuban \t cigars good\nczech \t tall\ndominican \t republic and haiti different\negyptian \t gods important\nenglish \t reserved\neritrean \t beautiful\nethiopian \t beautiful\nfilipino \t proud\nfinn \t shoes expensive\nfrench \t healthy\ngerman \t tall\ngreek \t gods messed up\nhaitian \t parents strict\nhungarian \t words long\nindian \t tv debates chaotic\nindonesian\t smart\niranian \t beautiful\nisraeli \t startups successful\nitalian \t short\njamaican \t sprinters fast\njapanese \t polite\nkenyan \t runners good\nlebanese \t rich\nmalagasy \t names long\nmalaysian \t drivers bad\nmaltese \t rude\nmongolian \t horses small\nmoroccan \t rugs expensive\nnepalese \t beautiful\nnigerian \t tall\nnorth korean\t hats big\nnorwegian \t flights cheap\npakistani \t fair\nperuvian \t blueberries big\npole \t vaulters hot\nportuguese\t short\npuerto rican\t and cuban flags similar\nromanian \t beautiful\nrussian \t good at math\nsamoan \t big\nsaudi \t arrogant\nscottish \t bitter\nsenegalese\t tall\nserbian \t tall\nsingaporean\t rude\nsomali \t parents strict\nsouth african\t plugs big\nsouth korean\t tall\nsri lankan\t dark\nsudanese \t tall\nswiss \t good at making watches\nsyrian \t families large\ntaiwanese \t pretty\nthai \t pretty\ntongan \t big\nukrainian \t beautiful\nvietnamese\t fiercely nationalistic\nwelsh \t dark\nzambian \t emeralds cheap\n```\n\n\nNotes:\n* The queries actually have a space after the \"so\", which matters so as to omit suggestions containing words that start with so.\n* Only the tail of the suggestion is shown -- minus prefix (`why are the DEMONYM` or `why are DEMONYM`) as well as the `so`, where ever it lands in the suggestion. \nFor example, the first suggestion for the american demonym was \"why are american dolls so expensive\", which results in the \"dolls expensive\" association. \n\n\n### Who do we most talk/ask about?\n\nThe original list contained 217 demonyms, but many of these yielded no suggestions (to the specific query format I used, that is). \nOnly 73 demonyms gave me at least one suggestion. \nBut within those, number of suggestions range between 1 and 20 (which is probably the default maximum number of suggestions for the API I used). \nSo, pretending that the number of suggestions is an indicator of how much we have to say, or how many different opinions we have, of each of the covered nationalities, \nhere's the top 15 demonyms people talk about, with the corresponding number of suggestions \n(proxy for \"the number of different things people ask about the said nationality). 
\n\n```text\nfrench 20\nsingaporean 20\ngerman 20\nbritish 20\nswiss 20\nenglish 19\nitalian 18\ncuban 18\ncanadian 18\nwelsh 18\naustralian 17\nmaltese 16\namerican 16\njapanese 14\nscottish 14\n```\n\n### Who do we least talk/ask about?\n\nConversely, here are the 19 demonyms that came back with only one suggestion.\n\n```text\nsomali 1\nbhutanese 1\nsyrian 1\ntongan 1\ncambodian 1\nmalagasy 1\nsaudi 1\nserbian 1\nczech 1\neritrean 1\nfinn 1\npuerto rican 1\npole 1\nhaitian 1\nhungarian 1\nperuvian 1\nmoroccan 1\nmongolian 1\nzambian 1\n```\n\n### What do we think about people?\n\nWhy are the French so...\n\nHow would you (if you're (un)lucky enough to know the French) finish this sentence?\nYou might even have several opinions about the French, and any other group of people you've rubbed shoulders with.\nWhat words would your palette contain to describe different nationalities?\nWhat words would others (at least those that ask questions to google) use?\n\nWell, here's what my auto-suggest search gave me. A set of 357 unique words and expressions to describe the 72 nationalities. \nSo a long tail of words use only for one nationality. But some words occur for more than one nationality. \nHere are the top 12 words/expressions used to describe people of the world. \n\n```text\nbeautiful 11\ntall 11\nshort 9\nnames long 8\nproud 8\nparents strict 8\nsmart 8\nnice 7\nboring 6\nrich 5\ndark 5\nsuccessful 5\n```\n\n### Who is beautiful? Who is tall? Who is short? Who is smart?\n\n```text\nbeautiful : albanian, eritrean, ethiopian, filipino, iranian, lebanese, nepalese, pakistani, romanian, ukrainian, vietnamese\ntall : australian, czech, german, nigerian, pakistani, samoan, senegalese, serbian, south korean, sudanese, taiwanese\nshort : filipino, indonesian, italian, maltese, nepalese, pakistani, portuguese, singaporean, welsh\nnames long : indian, malagasy, nigerian, portuguese, russian, sri lankan, thai, welsh\nproud : albanian, ethiopian, filipino, iranian, lebanese, portuguese, scottish, welsh\nparents strict : albanian, ethiopian, haitian, indian, lebanese, pakistani, somali, sri lankan\nsmart : indonesian, iranian, lebanese, pakistani, romanian, singaporean, taiwanese, vietnamese\nnice : canadian, english, filipino, nepalese, portuguese, taiwanese, thai\nboring : british, english, french, german, singaporean, swiss\nrich : lebanese, pakistani, singaporean, taiwanese, vietnamese\ndark : filipino, senegalese, sri lankan, vietnamese, welsh\nsuccessful : chinese, english, japanese, lebanese, swiss\n```\n\n## How did I do it?\n\nI scraped a list of (country, demonym) pairs from a table in http://www.geography-site.co.uk/pages/countries/demonyms.html.\n\nThen I diagnosed these and manually made a mapping to simplify some \"complex\" entries, \nsuch as mapping an entry such as \"Irishman or Irishwoman or Irish\" to \"Irish\".\n\nUsing the google suggest API (http://suggestqueries.google.com/complete/search?client=chrome&q=), I requested what the suggestions \nfor `why are the $demonym so ` query pattern, for `$demonym` running through all 217 demonyms from the list above, \nstoring the results for each if the results were non-empty. 
## Resources you can find here

The code to do this analysis yourself, from scratch, is here: `data_acquisition.py`.

The jupyter notebook I actually used when I developed this: `01 - Demonyms and adjectives - why are the french so....ipynb`

Note you'll need to pip install py2store if you haven't already.

In the `data` folder you'll find
* country_demonym.p: A pickle of a dataframe of countries and corresponding demonyms
* country_demonym.xlsx: The same as above, but in excel form
* demonym_suggested_characteristics.p: A pickle of 73 demonyms and auto-suggestion information, including characteristics.
* what_we_think_about_demonyns.xlsx: An excel containing various statistics about demonyms and their (perceived) characteristics


# Agglutinations

Inspired by a [tweet](https://twitter.com/raymondh/status/1311003482531401729) from Raymond Hettinger this morning:

_Resist the urge to elide the underscore in multiword function or method names_

So I wondered...

## Gluglus

The gluglu of a word is the number of partitions you can make of that word into words (of length at least 2 -- so no using a or i). (No, "gluglu" isn't an actual term -- unless everyone starts using it from now on. But it was inspired by an actual [linguistic term](https://en.wikipedia.org/wiki/Agglutination).)

For example, the gluglu of ``newspaper`` is 4:

```
newspaper
	new spa per
	news pa per
	news paper
```

Every (valid) word has a gluglu of at least 1.


## How many standard library names have gluglus of at least 2?

108

Here's [the list](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/standard_lib_gluglus.txt) of all of them.

The winner has a gluglu of 6 (not 7, because formatannotationrelativeto itself isn't in the dictionary)

```
formatannotationrelativeto
	for mat an not at ion relative to
	for mat annotation relative to
	form at an not at ion relative to
	form at annotation relative to
	format an not at ion relative to
	format annotation relative to
```

## Details

### Dictionary

Really it depends on what dictionary we use. Here, I used a very conservative one: the intersection of two lists, the [corncob](http://www.mieliestronk.com/corncob_lowercase.txt) and the [google10000](https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-usa.txt) word lists. Of those, I additionally kept only the words that had at least 2 letters, and only letters (no hyphens or disturbing diacritics).

Diacritics. Look it up. Impress your next nerd date.

I'm left with 8116 words. You can find them [here](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/words_8116.csv).
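Given such a dictionary, computing a gluglu is a short recursion. A minimal sketch (with a stand-in word set; the real thing would use the 8116-word list, and the function names here are just for illustration):

```python
from functools import lru_cache

# stand-in for the 8116-word dictionary
words = {'new', 'news', 'spa', 'pa', 'per', 'paper', 'newspaper'}

def gluglu(s, min_len=2):
    """Number of ways to partition s into dictionary words of length >= min_len."""
    @lru_cache(maxsize=None)
    def n_partitions_from(i):  # number of partitions of the suffix s[i:]
        if i == len(s):
            return 1
        return sum(n_partitions_from(j)
                   for j in range(i + min_len, len(s) + 1)
                   if s[i:j] in words)
    return n_partitions_from(0)

assert gluglu('newspaper') == 4  # newspaper / new spa per / news pa per / news paper
```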
### Standard Lib Names

Surprisingly, that was the hardest part. I know I'm missing some, but that's enough rabbit-holing.

What I did (modulo some exceptions I won't look into) was to walk the standard lib modules (even that list wasn't a given!), extracting (recursively) the names of any (non-underscored) attributes if they were modules or callables, as well as extracting the arguments of these callables (when they had signatures).

You can find the code I used to extract these names [here](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/py_names.py) and the actual list [there](https://github.com/thorwhalen/tapyoca/blob/master/tapyoca/agglutination/standard_lib_module_names.csv).



# covid

## Bar Chart Races (applied to covid-19 spread)

This module will show you how to make these:
- Confirmed cases (by country): https://public.flourish.studio/visualisation/1704821/
- Deaths (by country): https://public.flourish.studio/visualisation/1705644/
- US Confirmed cases (by state): https://public.flourish.studio/visualisation/1794768/
- US Deaths (by state): https://public.flourish.studio/visualisation/1794797/

### The script

If you just want to run this as a script to get the job done, you have one here:
https://raw.githubusercontent.com/thorwhalen/tapyoca/master/covid/covid_bar_chart_race.py

Run it like this:
```
$ python covid_bar_chart_race.py -h
usage: covid_bar_chart_race.py [-h] {mk-and-save-covid-data,update-covid-data,instructions-to-make-bar-chart-race} ...

positional arguments:
  {mk-and-save-covid-data,update-covid-data,instructions-to-make-bar-chart-race}
    mk-and-save-covid-data
                        :param data_sources: Dirpath or py2store Store where the data is :param kinds: The kinds of data you want to compute and save :param
                        skip_first_days: :param verbose: :return:
    update-covid-data   update the coronavirus data
    instructions-to-make-bar-chart-race

optional arguments:
  -h, --help            show this help message and exit
```


### The jupyter notebook

The notebook (the .ipynb file) shows you how to do it step by step, in case you want to reuse the methods for other stuff.



## Getting and preparing the data

Coronavirus data here: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset (direct download: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/download). It's currently updated daily, so download a fresh copy if you want.

Population data here: http://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv

It comes in the form of a zip file (currently named `novel-corona-virus-2019-dataset.zip`) containing several `.csv` files. We use `py2store` (to install: `pip install py2store`; project lives here: https://github.com/i2mint/py2store) to access and pre-prepare it. It allows us to not have to unzip the file and replace the older folder every time we download a new one. It also gives us the csvs as `pandas.DataFrame`s directly.
\n\n\n```python\nimport pandas as pd\nfrom io import BytesIO\nfrom py2store import kv_wrap, ZipReader # google it and pip install it\nfrom py2store.caching import mk_cached_store\nfrom py2store import QuickPickleStore\nfrom py2store.sources import FuncReader\n\ndef country_flag_image_url():\n import pandas as pd\n return pd.read_csv(\n 'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv')\n\ndef kaggle_coronavirus_dataset():\n import kaggle\n from io import BytesIO\n # didn't find the pure binary download function, so using temp dir to emulate\n from tempfile import mkdtemp \n download_dir = mkdtemp()\n filename = 'novel-corona-virus-2019-dataset.zip'\n zip_file = os.path.join(download_dir, filename)\n \n dataset = 'sudalairajkumar/novel-corona-virus-2019-dataset'\n kaggle.api.dataset_download_files(dataset, download_dir)\n with open(zip_file, 'rb') as fp:\n b = fp.read()\n return BytesIO(b)\n\ndef city_population_in_time():\n import pandas as pd\n return pd.read_csv(\n 'https://gist.githubusercontent.com/johnburnmurdoch/'\n '4199dbe55095c3e13de8d5b2e5e5307a/raw/fa018b25c24b7b5f47fd0568937ff6c04e384786/city_populations'\n )\n\ndef country_flag_image_url_prep(df: pd.DataFrame):\n # delete the region col (we don't need it)\n del df['region']\n # rewriting a few (not all) of the country names to match those found in kaggle covid data\n # Note: The list is not complete! Add to it as needed\n old_and_new = [('USA', 'US'), \n ('Iran, Islamic Rep.', 'Iran'), \n ('UK', 'United Kingdom'), \n ('Korea, Rep.', 'Korea, South')]\n for old, new in old_and_new:\n df['country'] = df['country'].replace(old, new)\n\n return df\n\n\n@kv_wrap.outcoming_vals(lambda x: pd.read_csv(BytesIO(x))) # this is to format the data as a dataframe\nclass ZippedCsvs(ZipReader):\n pass\n# equivalent to ZippedCsvs = kv_wrap.outcoming_vals(lambda x: pd.read_csv(BytesIO(x)))(ZipReader)\n```\n\n\n```python\n# Enter here the place you want to cache your data\nmy_local_cache = os.path.expanduser('~/ddir/my_sources')\n```\n\n\n```python\nCachedFuncReader = mk_cached_store(FuncReader, QuickPickleStore(my_local_cache))\n```\n\n\n```python\ndata_sources = CachedFuncReader([country_flag_image_url, \n kaggle_coronavirus_dataset, \n city_population_in_time])\nlist(data_sources)\n```\n\n\n\n\n ['country_flag_image_url',\n 'kaggle_coronavirus_dataset',\n 'city_population_in_time']\n\n\n\n\n```python\ncovid_datasets = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])\nlist(covid_datasets)\n```\n\n\n\n\n ['COVID19_line_list_data.csv',\n 'COVID19_open_line_list.csv',\n 'covid_19_data.csv',\n 'time_series_covid_19_confirmed.csv',\n 'time_series_covid_19_confirmed_US.csv',\n 'time_series_covid_19_deaths.csv',\n 'time_series_covid_19_deaths_US.csv',\n 'time_series_covid_19_recovered.csv']\n\n\n\n\n```python\ncovid_datasets['time_series_covid_19_confirmed.csv'].head()\n```\n\n\n\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Province/State</th>\n <th>Country/Region</th>\n <th>Lat</th>\n <th>Long</th>\n <th>1/22/20</th>\n <th>1/23/20</th>\n <th>1/24/20</th>\n <th>1/25/20</th>\n <th>1/26/20</th>\n <th>1/27/20</th>\n <th>...</th>\n <th>3/24/20</th>\n <th>3/25/20</th>\n <th>3/26/20</th>\n <th>3/27/20</th>\n <th>3/28/20</th>\n 
<th>3/29/20</th>\n <th>3/30/20</th>\n <th>3/31/20</th>\n <th>4/1/20</th>\n <th>4/2/20</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>NaN</td>\n <td>Afghanistan</td>\n <td>33.0000</td>\n <td>65.0000</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>74</td>\n <td>84</td>\n <td>94</td>\n <td>110</td>\n <td>110</td>\n <td>120</td>\n <td>170</td>\n <td>174</td>\n <td>237</td>\n <td>273</td>\n </tr>\n <tr>\n <th>1</th>\n <td>NaN</td>\n <td>Albania</td>\n <td>41.1533</td>\n <td>20.1683</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>123</td>\n <td>146</td>\n <td>174</td>\n <td>186</td>\n <td>197</td>\n <td>212</td>\n <td>223</td>\n <td>243</td>\n <td>259</td>\n <td>277</td>\n </tr>\n <tr>\n <th>2</th>\n <td>NaN</td>\n <td>Algeria</td>\n <td>28.0339</td>\n <td>1.6596</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>264</td>\n <td>302</td>\n <td>367</td>\n <td>409</td>\n <td>454</td>\n <td>511</td>\n <td>584</td>\n <td>716</td>\n <td>847</td>\n <td>986</td>\n </tr>\n <tr>\n <th>3</th>\n <td>NaN</td>\n <td>Andorra</td>\n <td>42.5063</td>\n <td>1.5218</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>164</td>\n <td>188</td>\n <td>224</td>\n <td>267</td>\n <td>308</td>\n <td>334</td>\n <td>370</td>\n <td>376</td>\n <td>390</td>\n <td>428</td>\n </tr>\n <tr>\n <th>4</th>\n <td>NaN</td>\n <td>Angola</td>\n <td>-11.2027</td>\n <td>17.8739</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>3</td>\n <td>3</td>\n <td>4</td>\n <td>4</td>\n <td>5</td>\n <td>7</td>\n <td>7</td>\n <td>7</td>\n <td>8</td>\n <td>8</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows \u00d7 76 columns</p>\n</div>\n\n\n\n\n```python\ncountry_flag_image_url = data_sources['country_flag_image_url']\ncountry_flag_image_url.head()\n```\n\n\n\n\n<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>country</th>\n <th>region</th>\n <th>flag_image_url</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Angola</td>\n <td>Africa</td>\n <td>https://www.countryflags.io/ao/flat/64.png</td>\n </tr>\n <tr>\n <th>1</th>\n <td>Burundi</td>\n <td>Africa</td>\n <td>https://www.countryflags.io/bi/flat/64.png</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Benin</td>\n <td>Africa</td>\n <td>https://www.countryflags.io/bj/flat/64.png</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Burkina Faso</td>\n <td>Africa</td>\n <td>https://www.countryflags.io/bf/flat/64.png</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Botswana</td>\n <td>Africa</td>\n <td>https://www.countryflags.io/bw/flat/64.png</td>\n </tr>\n </tbody>\n</table>\n</div>\n\n\n\n\n```python\nfrom IPython.display import Image\nflag_image_url_of_country = country_flag_image_url.set_index('country')['flag_image_url']\nImage(url=flag_image_url_of_country['Australia'])\n```\n\n\n\n\n<img src=\"https://www.countryflags.io/au/flat/64.png\"/>\n\n\n\n### Update coronavirus data\n\n\n```python\n# To update the coronavirus data:\ndef update_covid_data(data_sources):\n \"\"\"update the coronavirus data\"\"\"\n if 'kaggle_coronavirus_dataset' in data_sources._caching_store:\n del 
### Update coronavirus data


```python
# To update the coronavirus data:
def update_covid_data(data_sources):
    """update the coronavirus data"""
    if 'kaggle_coronavirus_dataset' in data_sources._caching_store:
        del data_sources._caching_store['kaggle_coronavirus_dataset']  # delete the cached item
    _ = data_sources['kaggle_coronavirus_dataset']

# update_covid_data(data_sources)  # uncomment here when you want to update
```

### Prepare data for flourish upload


```python
import re

def print_if_verbose(verbose, *args, **kwargs):
    if verbose:
        print(*args, **kwargs)

def country_data_for_data_kind(data_sources, kind='confirmed', skip_first_days=0, verbose=False):
    """kind can be 'confirmed', 'deaths', 'recovered', 'confirmed_US', 'deaths_US'"""

    covid_datasets = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])

    df = covid_datasets[f'time_series_covid_19_{kind}.csv']
    if 'Province/State' in df.columns:
        df.loc[df['Province/State'].isna(), 'Province/State'] = 'n/a'  # to avoid problems arising from NaNs

    print_if_verbose(verbose, f"Before data shape: {df.shape}")

    # keep only the columns we need: the country/state column and the date columns
    p = re.compile(r'\d+/\d+/\d+')

    assert all(isinstance(x, str) for x in df.columns)
    date_cols = [x for x in df.columns if p.match(x)]
    if not kind.endswith('US'):
        df = df.loc[:, ['Country/Region'] + date_cols]
        # group countries and sum up the contributions of their states/regions
        df['country'] = df.pop('Country/Region')
        df = df.groupby('country').sum()
    else:
        df = df.loc[:, ['Province_State'] + date_cols]
        df['state'] = df.pop('Province_State')
        df = df.groupby('state').sum()

    print_if_verbose(verbose, f"After data shape: {df.shape}")
    df = df.iloc[:, skip_first_days:]

    if not kind.endswith('US'):
        # Joining with the country image urls and saving as an xls
        country_image_url = country_flag_image_url_prep(data_sources['country_flag_image_url'])
        t = df.copy()
        t.columns = [str(x)[:10] for x in t.columns]
        t = t.reset_index(drop=False)
        t = country_image_url.merge(t, how='outer')
        t = t.set_index('country')
        df = t

    return df


def mk_and_save_country_data_for_data_kind(data_sources, kind='confirmed', skip_first_days=0, verbose=False):
    t = country_data_for_data_kind(data_sources, kind, skip_first_days, verbose)
    filepath = f'country_covid_{kind}.xlsx'
    t.to_excel(filepath)
    print_if_verbose(verbose, f"Was saved here: {filepath}")

```


```python
for kind in ['confirmed', 'deaths', 'recovered', 'confirmed_US', 'deaths_US']:
    mk_and_save_country_data_for_data_kind(data_sources, kind=kind, skip_first_days=39, verbose=True)
```

    Before data shape: (262, 79)
    After data shape: (183, 75)
    Was saved here: country_covid_confirmed.xlsx
    Before data shape: (262, 79)
    After data shape: (183, 75)
    Was saved here: country_covid_deaths.xlsx
    Before data shape: (248, 79)
    After data shape: (183, 75)
    Was saved here: country_covid_recovered.xlsx
    Before data shape: (3253, 86)
    After data shape: (58, 75)
    Was saved here: country_covid_confirmed_US.xlsx
    Before data shape: (3253, 87)
    After data shape: (58, 75)
    Was saved here: country_covid_deaths_US.xlsx


### Upload to Flourish, tune, and publish

Go to https://public.flourish.studio/, get a free account, and play.

Go to https://app.flourish.studio/templates

Choose "Bar chart race". At the time of writing this, it was here: https://app.flourish.studio/visualisation/1706060/

... and then play with the settings
## Discussion of the methods


```python
from py2store import *
from IPython.display import Image
```

### country flags images

The manual data prep looks something like this.


```python
import pandas as pd

# get the csv data from the url
country_image_url_source = \
    'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv'
country_image_url = pd.read_csv(country_image_url_source)

# delete the region col (we don't need it)
del country_image_url['region']

# rewriting a few (not all) of the country names to match those found in kaggle covid data
# Note: The list is not complete! Add to it as needed
# TODO: (Wishful) Use a general smart soft-matching algorithm to do this automatically.
# TODO: This could use edit-distance, synonyms, acronym generation, etc.
old_and_new = [('USA', 'US'),
               ('Iran, Islamic Rep.', 'Iran'),
               ('UK', 'United Kingdom'),
               ('Korea, Rep.', 'Korea, South')]
for old, new in old_and_new:
    country_image_url['country'] = country_image_url['country'].replace(old, new)

image_url_of_country = country_image_url.set_index('country')['flag_image_url']

country_image_url.head()
```

            country                              flag_image_url
    0        Angola  https://www.countryflags.io/ao/flat/64.png
    1       Burundi  https://www.countryflags.io/bi/flat/64.png
    2         Benin  https://www.countryflags.io/bj/flat/64.png
    3  Burkina Faso  https://www.countryflags.io/bf/flat/64.png
    4      Botswana  https://www.countryflags.io/bw/flat/64.png


```python
Image(url=image_url_of_country['Australia'])
```

<img src="https://www.countryflags.io/au/flat/64.png"/>

### Caching the flag images data

Downloading our data sources every time we need them is not sustainable. What if they're big? What if you're offline or have slow internet (yes, dear future reader, even in the US, during coronavirus times!)?

Caching. A "cache-aside" read-cache. That's the word. py2store has tools for that (most of which are in caching.py).

So let's say we're going to have a local folder where we'll store various data we download. The principle is as follows:


```python
from py2store.caching import mk_cached_store

class TheSource(dict): ...
the_cache = {}
TheCacheSource = mk_cached_store(TheSource, the_cache)

the_source = TheSource({'green': 'eggs', 'and': 'ham'})

the_cached_source = TheCacheSource(the_source)
print(f"the_cache: {the_cache}")
print("Getting green...")
the_cached_source['green']
print(f"the_cache: {the_cache}")
print("... so the next time the_cached_source will get its green from the_cache")
```

    the_cache: {}
    Getting green...
    the_cache: {'green': 'eggs'}
    ... so the next time the_cached_source will get its green from the_cache


But now, you'll notice a slight problem ahead.
What exactly does our source store (or rather reader) look like? In its raw form it would take urls as its keys, and the response of a request as its value. That store wouldn't have an `__iter__` for sure (unless you're Google). But more to the point here, the `mk_cached_store` tool uses the same key for the source and the cache, and we can't just use the url as is as a local file path.

There are many ways we could solve this. One way is to add a key map layer on the cache store, so externally it speaks the url key language, but internally it maps that url to a valid local file path. We've been there, we got the T-shirt!

But what we're going to do is a bit different: we're going to do the key mapping in the source store itself. It seems to make more sense in our context: we have a data source of `name: data` pairs, and if we impose that the name should be a valid file name, we don't need to have a key map in the cache store.
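(For the record, that key-map-on-the-cache alternative could look like the following sketch: a file-backed store that percent-encodes arbitrary string keys, such as urls, into valid filenames. This is generic illustration code, not py2store's own key-wrapping API.)

```python
import os
from collections.abc import MutableMapping
from urllib.parse import quote, unquote

class UrlKeyedFiles(MutableMapping):
    """Externally speaks urls; internally stores bytes under percent-encoded filenames."""
    def __init__(self, rootdir):
        self.rootdir = rootdir
        os.makedirs(rootdir, exist_ok=True)

    def _filepath(self, k):
        return os.path.join(self.rootdir, quote(k, safe=''))  # url -> valid single filename

    def __getitem__(self, k):
        with open(self._filepath(k), 'rb') as fp:
            return fp.read()

    def __setitem__(self, k, v):
        with open(self._filepath(k), 'wb') as fp:
            fp.write(v)

    def __delitem__(self, k):
        os.remove(self._filepath(k))

    def __iter__(self):
        return (unquote(name) for name in os.listdir(self.rootdir))

    def __len__(self):
        return len(os.listdir(self.rootdir))
```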
So let's start by building this `MyDataStore` store. We'll start by defining the functions that get us the data we want.


```python
import os  # used by kaggle_coronavirus_dataset below

def country_flag_image_url():
    import pandas as pd
    return pd.read_csv(
        'https://raw.githubusercontent.com/i2mint/examples/master/data/country_flag_image_url.csv')

def kaggle_coronavirus_dataset():
    import kaggle
    from io import BytesIO
    # didn't find the pure binary download function, so using temp dir to emulate
    from tempfile import mkdtemp
    download_dir = mkdtemp()
    filename = 'novel-corona-virus-2019-dataset.zip'
    zip_file = os.path.join(download_dir, filename)

    dataset = 'sudalairajkumar/novel-corona-virus-2019-dataset'
    kaggle.api.dataset_download_files(dataset, download_dir)
    with open(zip_file, 'rb') as fp:
        b = fp.read()
    return BytesIO(b)

def city_population_in_time():
    import pandas as pd
    return pd.read_csv(
        'https://gist.githubusercontent.com/johnburnmurdoch/'
        '4199dbe55095c3e13de8d5b2e5e5307a/raw/fa018b25c24b7b5f47fd0568937ff6c04e384786/city_populations'
    )
```

Now we can make a store that simply uses these function names as the keys, and their returned values as the values.


```python
from py2store.base import KvReader
from functools import lru_cache

class FuncReader(KvReader):
    _getitem_cache_size = 999

    def __init__(self, funcs):
        # TODO: assert no free arguments (arguments are allowed but must all have defaults)
        self.funcs = funcs
        self._func_of_name = {func.__name__: func for func in funcs}

    def __contains__(self, k):
        return k in self._func_of_name

    def __iter__(self):
        yield from self._func_of_name

    def __len__(self):
        return len(self._func_of_name)

    @lru_cache(maxsize=_getitem_cache_size)
    def __getitem__(self, k):
        return self._func_of_name[k]()  # call the func

    def __hash__(self):
        # make instances hashable, so lru_cache can cache __getitem__ calls
        return 1
```


```python
data_sources = FuncReader([country_flag_image_url, kaggle_coronavirus_dataset, city_population_in_time])
list(data_sources)
```

    ['country_flag_image_url',
     'kaggle_coronavirus_dataset',
     'city_population_in_time']


```python
data_sources['country_flag_image_url']
```

                 country   region                              flag_image_url
    0             Angola   Africa  https://www.countryflags.io/ao/flat/64.png
    1            Burundi   Africa  https://www.countryflags.io/bi/flat/64.png
    2              Benin   Africa  https://www.countryflags.io/bj/flat/64.png
    3       Burkina Faso   Africa  https://www.countryflags.io/bf/flat/64.png
    4           Botswana   Africa  https://www.countryflags.io/bw/flat/64.png
    ..               ...      ...                                         ...
    210  Solomon Islands  Oceania  https://www.countryflags.io/sb/flat/64.png
    211            Tonga  Oceania  https://www.countryflags.io/to/flat/64.png
    212           Tuvalu  Oceania  https://www.countryflags.io/tv/flat/64.png
    213          Vanuatu  Oceania  https://www.countryflags.io/vu/flat/64.png
    214            Samoa  Oceania  https://www.countryflags.io/ws/flat/64.png

    [215 rows x 3 columns]


```python
data_sources['country_flag_image_url']
```

                 country   region                              flag_image_url
    0             Angola   Africa  https://www.countryflags.io/ao/flat/64.png
    1            Burundi   Africa  https://www.countryflags.io/bi/flat/64.png
    2              Benin   Africa  https://www.countryflags.io/bj/flat/64.png
    3       Burkina Faso   Africa  https://www.countryflags.io/bf/flat/64.png
    4           Botswana   Africa  https://www.countryflags.io/bw/flat/64.png
    ..               ...      ...                                         ...
    210  Solomon Islands  Oceania  https://www.countryflags.io/sb/flat/64.png
    211            Tonga  Oceania  https://www.countryflags.io/to/flat/64.png
    212           Tuvalu  Oceania  https://www.countryflags.io/tv/flat/64.png
    213          Vanuatu  Oceania  https://www.countryflags.io/vu/flat/64.png
    214            Samoa  Oceania  https://www.countryflags.io/ws/flat/64.png

    [215 rows x 3 columns]
```python
data_sources['city_population_in_time']
```

                 name  group  year  value subGroup              city_id  lastValue       lat       lon
    0            Agra  India  1575  200.0    India         Agra - India      200.0  27.18333  78.01667
    1            Agra  India  1576  212.0    India         Agra - India      200.0  27.18333  78.01667
    2            Agra  India  1577  224.0    India         Agra - India      212.0  27.18333  78.01667
    3            Agra  India  1578  236.0    India         Agra - India      224.0  27.18333  78.01667
    4            Agra  India  1579  248.0    India         Agra - India      236.0  27.18333  78.01667
    ...           ...    ...   ...    ...      ...                  ...        ...       ...       ...
    6247  Vijayanagar  India  1561  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6248  Vijayanagar  India  1562  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6249  Vijayanagar  India  1563  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6250  Vijayanagar  India  1564  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6251  Vijayanagar  India  1565  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200

    [6252 rows x 9 columns]


But we wanted this all to be cached locally, right?
So a few more lines to do that!


```python
import os
from py2store.caching import mk_cached_store
from py2store import QuickPickleStore

my_local_cache = os.path.expanduser('~/ddir/my_sources')

CachedFuncReader = mk_cached_store(FuncReader, QuickPickleStore(my_local_cache))
```


```python
data_sources = CachedFuncReader([country_flag_image_url, kaggle_coronavirus_dataset, city_population_in_time])
list(data_sources)
```

    ['country_flag_image_url',
     'kaggle_coronavirus_dataset',
     'city_population_in_time']


```python
data_sources['country_flag_image_url']
```

                 country   region                              flag_image_url
    0             Angola   Africa  https://www.countryflags.io/ao/flat/64.png
    1            Burundi   Africa  https://www.countryflags.io/bi/flat/64.png
    2              Benin   Africa  https://www.countryflags.io/bj/flat/64.png
    3       Burkina Faso   Africa  https://www.countryflags.io/bf/flat/64.png
    4           Botswana   Africa  https://www.countryflags.io/bw/flat/64.png
    ..               ...      ...                                         ...
    210  Solomon Islands  Oceania  https://www.countryflags.io/sb/flat/64.png
    211            Tonga  Oceania  https://www.countryflags.io/to/flat/64.png
    212           Tuvalu  Oceania  https://www.countryflags.io/tv/flat/64.png
    213          Vanuatu  Oceania  https://www.countryflags.io/vu/flat/64.png
    214            Samoa  Oceania  https://www.countryflags.io/ws/flat/64.png

    [215 rows x 3 columns]
```python
data_sources['city_population_in_time']
```

                 name  group  year  value subGroup              city_id  lastValue       lat       lon
    0            Agra  India  1575  200.0    India         Agra - India      200.0  27.18333  78.01667
    1            Agra  India  1576  212.0    India         Agra - India      200.0  27.18333  78.01667
    2            Agra  India  1577  224.0    India         Agra - India      212.0  27.18333  78.01667
    3            Agra  India  1578  236.0    India         Agra - India      224.0  27.18333  78.01667
    4            Agra  India  1579  248.0    India         Agra - India      236.0  27.18333  78.01667
    ...           ...    ...   ...    ...      ...                  ...        ...       ...       ...
    6247  Vijayanagar  India  1561  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6248  Vijayanagar  India  1562  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6249  Vijayanagar  India  1563  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6250  Vijayanagar  India  1564  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200
    6251  Vijayanagar  India  1565  480.0    India  Vijayanagar - India      480.0  15.33500  76.46200

    [6252 rows x 9 columns]


```python
z = ZippedCsvs(data_sources['kaggle_coronavirus_dataset'])
list(z)
```
"bugtrack_url": null,
"license": "Apache Software License",
"summary": "A medley of things that got coded because there was an itch to do so",
"version": "0.0.4",
"project_urls": {
"Homepage": "https://github.com/thorwhalen/tapyoca"
},
"split_keywords": [
"documentation",
" packaging",
" publishing"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3b959bb00b0d0eaefc038f76669543971f1fd029f707836cedfcefd6b3c7d7c5",
"md5": "5a3f53db0041843d1e851530b7650621",
"sha256": "5719e13d3752d16afe5eabba8b9a316fc5e2a8d025e39d996f7c4f75a26d5ed6"
},
"downloads": -1,
"filename": "tapyoca-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5a3f53db0041843d1e851530b7650621",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 77391,
"upload_time": "2025-03-01T20:08:55",
"upload_time_iso_8601": "2025-03-01T20:08:55.802720Z",
"url": "https://files.pythonhosted.org/packages/3b/95/9bb00b0d0eaefc038f76669543971f1fd029f707836cedfcefd6b3c7d7c5/tapyoca-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "71c7dd42003e26f7b25a89704acf9be5042ad68ff63f5a459222a26fc1840697",
"md5": "cb8449a5a6c8f33605c653a47dba8f88",
"sha256": "2f9f4d11a5fbf8faebecc68098a4386da8bc906611e1b4a4161429a70ba305cd"
},
"downloads": -1,
"filename": "tapyoca-0.0.4.tar.gz",
"has_sig": false,
"md5_digest": "cb8449a5a6c8f33605c653a47dba8f88",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 96711,
"upload_time": "2025-03-01T20:08:57",
"upload_time_iso_8601": "2025-03-01T20:08:57.081307Z",
"url": "https://files.pythonhosted.org/packages/71/c7/dd42003e26f7b25a89704acf9be5042ad68ff63f5a459222a26fc1840697/tapyoca-0.0.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-03-01 20:08:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "thorwhalen",
"github_project": "tapyoca",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "tapyoca"
}