# Theseus
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/Version-6.png "Theseus Growth")
## Theseus provides straightforward tools for cohort analysis and general marketing performance analysis. Theseus was created by [Eric Benjamin Seufert](https://www.twitter.com/eric_seufert) of [Heracles](https://hrcls.co).
Theseus is an open source library that provides a set of common functions for product growth analysis: building retention profiles, projecting DAU levels, combining cohorts, segmenting cohorts by age, etc. Theseus can be used for marketing budget planning, scenario analysis, marketing campaign analysis, revenue projections, and in a media mix model.
Theseus is designed to be used for standalone analysis projects as well as in programmatic business intelligence environments.
Theseus is provided as open source software under the [MIT](https://choosealicense.com/licenses/mit/) license.
Note that Theseus is in a __beta__ state; bugs are to be expected.
## Documentation
The documentation for Theseus can be found in [this QuantMar thread](https://quantmar.com/529/How-can-use-the-theseus-python-library-to-do-cohort-analysis).
## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install Theseus.
```bash
pip install theseus_growth
```
## Usage
Import the `theseus_growth` library:
```python
import theseus_growth as th
```
Instantiate a Theseus object:
```python
th = th.theseus()
```
Working with Theseus involves using retention profiles to build cohort projections. To get started with analysis, you'll first build a retention profile using `days` and `retention_values`, where each day value corresponds to a retention value, starting from Day 1 (i.e. the day after a user has entered the product). Retention values should be provided as whole-number percentages (not decimals), e.g. 30% retention for some given day would be represented as 30 and not .30.
The retention and day values are provided as lists, the lengths of which must match. Theseus pairs each value in the `days` list with the value at the same index in the `retention_values` list, so the lists do not need to be sorted.
Here's an example:
```python
x_data = [ 1, 3, 7, 14, 30, 60, 90, 180 ]
y_data = [ 80, 70, 55, 50, 30, 22, 10, 8 ]
facebook = th.create_profile( days = x_data, retention_values = y_data )
print( facebook )
```
In this example, Day 1 retention is set to 80, Day 3 retention is set to 70, Day 7 retention is set to 55, etc. Then, these lists are supplied to the `create_profile` function to generate a retention profile (in this case, for Facebook, as per the variable name).
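As a rough illustration of the index-based pairing described above, the following standalone sketch pairs the two lists by position and rejects mismatched lengths. The `pair_retention` helper is hypothetical and is not part of Theseus; it only shows why the input order does not matter as long as the two lists line up.

```python
# Hypothetical helper (not part of Theseus): pair day and retention values by index.
def pair_retention(days, retention_values):
    if len(days) != len(retention_values):
        raise ValueError("days and retention_values must be the same length")
    # Each day is tied to the retention value at the same position,
    # so sorting the resulting pairs by day is safe.
    return sorted(zip(days, retention_values))


# Deliberately unsorted input: Day 30 -> 30, Day 1 -> 80, Day 7 -> 55.
pairs = pair_retention([30, 1, 7], [30, 80, 55])
print(pairs)  # [(1, 80), (7, 55), (30, 30)]
```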
The curve fit to the retention data is decided by iterating over a number of different function forms to find the one that fits best with the smallest error. The functions tested are: `[ 'log', 'exp', 'linear', 'quad', 'weibull', 'power', 'interpolate' ]`. A specific function can be forced onto the data by using the `form` parameter with the `create_profile` function; when the `form` parameter is not set, `create_profile` defaults to finding the best fit function.
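The idea of trying several function forms and keeping the one with the smallest error can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Theseus implementation: it compares only two forms, both fitted with ordinary least squares, and uses the sum of squared errors as the error measure. The `fit_line` and `best_form` helpers are hypothetical, and the two-parameter log form here differs from the three-parameter one Theseus reports.

```python
import math


def fit_line(u, y):
    """Least-squares fit of y = slope * u + intercept; returns (slope, intercept, sse)."""
    n = len(u)
    u_mean = sum(u) / n
    y_mean = sum(y) / n
    sxx = sum((a - u_mean) ** 2 for a in u)
    sxy = sum((a - u_mean) * (b - y_mean) for a, b in zip(u, y))
    slope = sxy / sxx
    intercept = y_mean - slope * u_mean
    sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(u, y))
    return slope, intercept, sse


def best_form(days, retention_values):
    """Fit each candidate form and return the name with the smallest error."""
    fits = {
        # y = a * x + b
        "linear": fit_line(days, retention_values),
        # y = a * ln(x) + b, fitted as a line against ln(x)
        "log": fit_line([math.log(d) for d in days], retention_values),
    }
    errors = {name: fit[2] for name, fit in fits.items()}
    return min(errors, key=errors.get), errors


days = [1, 3, 7, 14, 30, 60, 90, 180]
retention_values = [80, 70, 55, 50, 30, 22, 10, 8]
best, errors = best_form(days, retention_values)
print(best)  # log
```

For this data the log form has a much smaller error than the linear form, which agrees with the `best_fit` value in the profile output.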
If you `print` the `facebook` variable, the output will reveal a number of pieces of information about the retention profile:
```python
{'x': [1, 3, 7, 14, 30, 60, 90, 180], 'y': [80, 70, 55, 50, 30, 22, 10, 8], 'y_collapsed': [80.0, 70.0, 55.0, 50.0, 30.0, 22.0, 10.0, 8.0], 'x_collapsed': [1, 3, 7, 14, 30, 60, 90, 180], 'interpolation_f': <scipy.interpolate.interpolate.interp1d object at 0x10c6234f8>, 'interpolation_s': <scipy.interpolate.fitpack2.InterpolatedUnivariateSpline object at 0x10c638588>, 'params': {'log': array([11.69432981, 0.85932489, 91.18858849]), 'exp': array([6.81055507e+01, 4.01937193e-02, 1.00786302e+01]), 'linear': array([-0.36314103, 58.10116222]), 'quad': array([ 4.23356783e-03, -1.09641452e+00, 6.94411850e+01]), 'weibull': array([136.70664663, 0.99893803]), 'power': array([88.3002565, 0.3123284]), 'interpolate': None}, 'errors': {'log': 61.1068291195336, 'exp': 101.38898207577283, 'linear': 1412.367783723572, 'quad': 364.49321231183075, 'weibull': 12824.82253493541, 'power': 440.6176923037875}, 'best_fit': 'log', 'retention_profile': 'best_fit', 'retention_projection': (array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104,
105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117,
118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156,
157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180]), [100.0, 80.72474913359267, 73.46379035522745, 68.40395854720839, 64.51667685670971, 61.35945860038563, 58.70096154455828, 56.40491084756263, 54.384231538266555, 52.579888860453295, 50.95000745540398, 49.46380467615453, 48.097988591666066, 46.834508880262675, 45.65909322498185, 44.56026139229167, 43.52864131907821, 42.55648257398489, 41.63730255356898, 40.76562419809498, 39.936778210530946, 39.14675163202042, 38.392070317383634, 37.669706592412716, 36.97700588336105, 36.3116278251625, 35.67149854953705, 35.054771699032194, 34.459796319341784, 33.88509022316152, 33.32931774346239, 32.79127103579817, 32.269854271186304, 31.764070199363758, 31.273008668265803, 30.79583676761319, 30.331790328468763, 29.88016656089222, 29.440317651599635, 29.01164517522338, 28.593595198171947, 28.185653974577036, 27.78734415043222, 27.398221405576763, 27.017871474283382, 26.645907494353807, 26.281967642193266, 25.92571301762287, 25.576825747436132, 25.235007281103094, 24.899976855723068, 24.57147011044914, 24.24923783325221, 23.933044825140172, 23.622668868865617, 23.317899790794513, 23.01853860601659, 22.724396737987817, 22.43529530504138, 22.15106446700662, 21.87154282596005, 21.59657687581418, 21.326020496044592, 21.059734485374676, 20.79758613169244, 20.539448814872614, 20.285201639528808, 20.034729095029036, 19.787920740381438, 19.544670911838594, 19.30487845128266, 19.06844645364413, 18.835282031775264, 18.605296097351072, 18.378403156503964, 18.154521119018952, 17.933571120024055, 17.715477353206225, 17.500166914670615, 17.287569656638155, 17.07761805024684, 16.870247056785416, 16.66539400674492, 16.462998486125443, 16.263002229482055, 16.065349019235967, 15.86998459081596, 15.676856543229135, 15.485914254692815, 15.297108802987182, 15.11039289021575, 14.925720771683615, 14.743048188626403, 14.562332304542082, 14.383531644896777, 14.206606039992167, 14.031516570797578, 13.858225517564094, 13.686696311050966, 
13.516893486206541, 13.34878263815699, 13.18233038036621, 13.01750430483952, 12.854272944252656, 12.692605735895398, 12.53247298732623, 12.373845843641718, 12.216696256270339, 12.060996953206299, 11.90672141060432, 11.753843825661477, 11.602339090716569, 11.452182768502269, 11.30335106848878, 11.15582082426181, 11.009569471881221, 10.86457502916953, 10.720816075882823, 10.578271734719507, 10.436921653124443, 10.296745985849213, 10.157725378230722, 10.019840950153323, 9.883074280660807, 9.747407393187231, 9.612822741376846, 9.47930319546505, 9.346832029194132, 9.215392907238623, 9.084969873116748, 8.955547337565534, 8.827110067358433, 8.69964317454533, 8.573132106096097, 8.447562633929394, 8.322920845309952, 8.199193133597916, 8.076366189334905, 7.954426991652241, 7.833362799987498, 7.713161146095985, 7.5938098263449945, 7.475296894278529, 7.357610653441469, 7.240739650452227, 7.124672668313735, 7.009398719952898, 6.8949070419793514, 6.781187088654434, 6.668228526062208, 6.556021226474144, 6.444555262900209, 6.33382090381852, 6.223808608077022, 6.114509019960167, 6.005912964414506, 5.898011442426906, 5.790795626549496, 5.684256856566179, 5.578386635294848, 5.473176624520619, 5.368618641055164, 5.264704652917132, 5.16142677562982, 5.058777268631218, 4.95674853179267, 4.8553331020422945, 4.754523650089098, 4.65431297724453, 4.554694012337748, 4.455659808721521, 4.357203541365507, 4.259318504033715, 4.161998106543535, 4.065235872103301, 3.969025434725765, 3.873360536714898, 3.778235026223541, 3.6836428548796363, 3.5895780754783857])}
```
You won't ever actually interact directly with a retention profile variable, but you can see that it contains:
+ The original X and Y (the `days` and `retention` lists) data provided;
+ A projection (in the `retention_projection` variable);
+ Two `_collapsed` variables that contain the average values for each of the `days` and `retention` lists (in this example, only one value was provided for each day, so the `y_collapsed` list is the same as the `y` list, which was provided);
+ A `params` dict that contains coefficients for a number of different shape functions;
+ Some other miscellaneous data, like interpolation models;
With the Facebook retention profile created, cohort projections can be generated from it. First, the profile can be visualized with the `plot_retention` function:
```python
th.plot_retention( facebook )
```
Which should output a graph that looks like this:
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/fb_retention.png "Facebook retention profile graph")
Now a DAU projection based on cohorts can be generated -- in the Theseus library, this is called a **`forward DAU projection`**. First, we'll create a list of cohorts, meaning a list containing the numbers of new users that joined the product on a daily basis, with each number representing a sequential day.
Then, the `project_cohorted_DAU` function can be used to create a Pandas DataFrame containing the number of DAU present in the product, given the new users that joined via the cohorts, on the basis of the `facebook` retention profile. In this example, the function will take 4 inputs:
+ `profile`: the retention profile to use;
+ `periods`: the number of periods to project forward;
+ `cohorts`: a list of new user values;
+ `start_date`: the date at which the cohorts are added and from which the projection is made.
```python
#cohorts are daily new user values, eg. the number of new users
#joining the product on a given day
cohorts = [1000, 1000, 1000, 1000, 1000 ]
facebook_DAU = th.project_cohorted_DAU( profile = facebook, periods = 50,
cohorts = cohorts, start_date = 1 )
print( facebook_DAU )
```
The output of this should look like:
```python
1 2 3 4 5 6 7 8 9 10 ... 41 \
cohort_date ...
1 1000 807 734 684 645 613 587 564 543 525 ... 285
2 0 1000 807 734 684 645 613 587 564 543 ... 290
3 0 0 1000 807 734 684 645 613 587 564 ... 294
4 0 0 0 1000 807 734 684 645 613 587 ... 298
5 0 0 0 0 1000 807 734 684 645 613 ... 303
42 43 44 45 46 47 48 49 50
cohort_date
1 281 277 273 270 266 262 259 255 252
2 285 281 277 273 270 266 262 259 255
3 290 285 281 277 273 270 266 262 259
4 294 290 285 281 277 273 270 266 262
5 298 294 290 285 281 277 273 270 266
[5 rows x 50 columns]
```
This DataFrame table shows how many of the original cohorts are present on any given day; the cohort numbers run down the Y axis and the days run across the X axis.
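The structure of this table can be reproduced with a few lines of plain Python. The sketch below is an illustration, not the library's code: it assumes a retention curve given as percentages where index 0 is the join day (100%), and it truncates fractional users, which is consistent with the rounding seen in the output above. The `cohort_matrix` helper is hypothetical.

```python
# Hypothetical helper (not part of Theseus): one row per cohort, one column per day.
def cohort_matrix(retention, cohorts, periods):
    """retention[a] is the percent of a cohort still active a days after joining."""
    rows = []
    for join_day, size in enumerate(cohorts):
        row = []
        for day in range(periods):
            age = day - join_day
            if age < 0:
                row.append(0)  # the cohort has not joined yet
            else:
                row.append(int(size * retention[age] / 100))
        rows.append(row)
    return rows


# Two cohorts, three days, a toy retention curve of 100% / 50% / 25%.
for row in cohort_matrix([100, 50, 25], [1000, 2000], 3):
    print(row)
```

Each row is the same retention curve scaled by the cohort size and shifted right by the day the cohort joined, which is why the values in the table above run diagonally.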
To see this as a total, the `DAU_total` function can be used:
```python
facebook_total = th.DAU_total( facebook_DAU )
print( facebook_total )
```
The output of which should look like:
```python
1 2 3 4 5 6 7 8 9 10 ... 41 \
Value ...
DAU 1000 808 1734 2491 3186 4548 5911 7274 8637 10000 ... 4312
42 43 44 45 46 47 48 49 50
Value
DAU 4246 4182 4119 4058 3999 3941 3888 3831 3779
[1 rows x 50 columns]
```
This table represents the total number of DAU present in the product from those five cohorts over the course of a 50-period timeline.
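Conceptually, the total is a column-wise sum over the cohort rows. A minimal standalone sketch, using a hypothetical `dau_total` helper on a plain list-of-lists rather than the DataFrame that Theseus returns:

```python
# Hypothetical helper (not part of Theseus): sum every cohort's DAU for each day.
def dau_total(matrix):
    return [sum(column) for column in zip(*matrix)]


# Two cohort rows over three days.
print(dau_total([[1000, 500, 250], [0, 2000, 1000]]))  # [1000, 2500, 1250]
```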
The `project_cohorted_DAU` function can be used to project DAU out given some set of cohorts and a retention profile, but it can also be used to generate the number of new users needed to reach a DAU target over a timeline, given some existing set of cohorts.
In this example, the `cohorts` list contains five cohorts of 1000 new users each. If a marketing analyst wanted to know how many _additional_ cohorts, and of what size, would be needed in order to get the user base to 10,000 DAU, then they could use `project_cohorted_DAU` to do that by adding two parameters: `DAU_target` and `DAU_target_timeline`. `DAU_target` is the targeted number of DAU, and `DAU_target_timeline` is the number of days over which the additional new users will be added.
In action:
```python
facebook_DAU = th.project_cohorted_DAU( profile = facebook, periods = 50, cohorts = cohorts,
DAU_target = 10000, DAU_target_timeline = 10, start_date = 1 )
print( facebook_DAU )
```
Should produce the following output:
```python
1 2 3 4 5 6 7 8 9 10 ... \
cohort_date ...
1 1000 807 734 684 645 613 587 564 543 525 ...
2 0 1000 807 734 684 645 613 587 564 543 ...
3 0 0 1000 807 734 684 645 613 587 564 ...
4 0 0 0 1000 807 734 684 645 613 587 ...
5 0 0 0 0 1000 807 734 684 645 613 ...
6 0 0 0 0 0 1613 1302 1184 1103 1040 ...
7 0 0 0 0 0 0 1757 1418 1290 1201 ...
8 0 0 0 0 0 0 0 1853 1495 1361 ...
9 0 0 0 0 0 0 0 0 1934 1561 ...
10 0 0 0 0 0 0 0 0 0 2005 ...
41 42 43 44 45 46 47 48 49 50
cohort_date
1 285 281 277 273 270 266 262 259 255 252
2 290 285 281 277 273 270 266 262 259 255
3 294 290 285 281 277 273 270 266 262 259
4 298 294 290 285 281 277 273 270 266 262
5 303 298 294 290 285 281 277 273 270 266
6 496 489 481 474 467 461 454 448 441 435
7 549 541 532 524 517 509 502 495 488 481
8 588 579 570 562 553 545 537 529 522 514
9 624 614 604 595 586 577 569 561 553 545
10 657 647 636 627 617 608 599 590 581 573
[10 rows x 50 columns]
```
This table reveals that the additional DNU needed to get to 10,000 overall DAU within the 10-period timeframe is: 1613, 1757, 1853, 1934, 2005. _Note that this approach seeks to minimize the number of total DNU added on any given day within the timeline_.
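One way to arrive at cohort sizes like these is to raise total DAU in equal daily steps from its current level to the target, and on each day add only as many new users as the retained users fall short of that day's goal. The sketch below works under that assumption, which is inferred from the example output and not taken from the library source; the `dnu_for_target` helper is hypothetical.

```python
# Hypothetical helper (not part of Theseus): extra daily new users for a DAU target.
def dnu_for_target(retention, cohorts, dau_target, timeline):
    """retention[a] is the percent of a cohort still active a days after joining."""
    if timeline <= len(cohorts):
        raise ValueError("timeline must extend beyond the existing cohorts")
    sizes = [float(c) for c in cohorts]

    def carried_over(day):
        # Users from cohorts that joined before `day` and are still active on it.
        return sum(size * retention[day - joined] / 100
                   for joined, size in enumerate(sizes) if joined < day)

    first_new_day = len(sizes)
    last_day = first_new_day - 1
    current = carried_over(last_day) + sizes[last_day]
    # Assumption: total DAU grows by the same amount every day until the target.
    step = (dau_target - current) / (timeline - first_new_day)
    for day in range(first_new_day, timeline):
        goal = current + step * (day - first_new_day + 1)
        sizes.append(max(0.0, goal - carried_over(day)))
    return sizes[first_new_day:]


# One existing cohort of 1000, target of 2000 DAU by the third day.
print(dnu_for_target([100, 50, 25, 25], [1000], 2000, 3))  # [1000.0, 1250.0]
```

Spreading the growth evenly keeps any single day's acquisition requirement from spiking, which matches the note above about minimizing the DNU added on any given day.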
To get only the DNU (new users) values from a forward DAU projection, the `get_DNU` function can be used:
```python
#get DNU from a DAU projection
facebook_DNU = th.get_DNU( facebook_DAU )
print( facebook_DNU )
```
The output of which should look like:
```python
cohort_date 1 2 3 4 5 6 7 \
Value
DNU 1000 1.0 1000.0 1000.0 1000.0 1710.0 1881.0 1994.0
8 9 ... 41 42 43 44 45 46 47 48 49 50
Value ...
DNU 2090.0 2171.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[1 rows x 51 columns]
```
And to reduce `facebook_DAU` to only the total DAU values over the projection timeline, the `DAU_total` function can be used again, this time on the target-based projection (`th.DAU_total( facebook_DAU )`):
```python
1 2 3 4 5 6 7 8 9 10 ... 41 \
DAU ...
0 1000 1807 2541 3225 3870 5096 6322 7548 8774 10000 ... 4384
42 43 44 45 46 47 48 49 50
DAU
0 4318 4250 4188 4126 4067 4009 3953 3897 3842
[1 rows x 50 columns]
```
Note that this shows DAU reaching 10,000 by Day 10.
The Facebook `forward DAU projection` can be visualized with the `plot_forward_DAU_stacked` function, which takes three required parameters:
+ `forward_DAU`: the forward DAU projection being visualized (in this case, the `facebook_DAU` variable);
+ `forward_DAU_labels`: a list of the cohort names as labels for the stacked bars. The length of this list needs to match the number of cohorts in the forward DAU projection;
+ `forward_DAU_dates`: a list of dates as labels for the X axis. The length of this list needs to match the number of periods in the forward DAU projection;
To visualize the Facebook forward DAU projection that reaches the DAU target of 10,000:
```python
th.plot_forward_DAU_stacked( forward_DAU = facebook_DAU,
forward_DAU_labels = list( facebook_DAU.index ),
forward_DAU_dates = list( facebook_DAU.columns ),
)
```
This should produce a graph that looks like this:
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/facebook_forward_DAU_projection.png "Facebook forward DAU projection")
Note that anything can be provided in the `forward_DAU_labels` and `forward_DAU_dates` parameters. For instance, to give the X axis actual date values (starting from January 1, 2020) and to make the legend more readable, the following can be done:
```python
from datetime import date, timedelta
th.plot_forward_DAU_stacked( forward_DAU = facebook_DAU,
forward_DAU_labels = [ 'Cohort ' + str( x ) for x in list( facebook_DAU.index ) ],
forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( facebook_DAU.columns ) ]
)
```
This should produce a graph that looks like this:
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/facebook_forward_DAU_readable.png.png "Facebook forward DAU projection")
To create a second retention profile -- this time, for Google -- the `create_profile` function can be used again. This time, the `profile_max` parameter will be supplied: when `profile_max` is provided, the retention profile is projected out to that day (when it is not provided, the retention profile is only projected out to the maximum value provided in the `days` parameter). Also, with the Google retention profile, a much larger dataset of days and retention values will be supplied, so the curve fit is done against many more (arbitrarily produced) data points:
```python
import numpy as np
import random
x_data = [ 1, 14, 60 ]
y_data = [ 40, 22, 10 ]
new_x = []
for i, x in enumerate( x_data ):
    this_x = x
    for z in np.arange( 1, 100 ):
        this_y = float( y_data[ i ] * ( 1 + ( random.randint( -20, 20 ) / 100 ) ) )
        y_data.append( this_y )
        new_x.append( this_x )
x_data.extend( new_x )
google = th.create_profile( days = x_data, retention_values = y_data, profile_max = 180 )
th.plot_retention( google )
```
The output of this should look something like this (the red dots are the actual values from `retention_values`):
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/google_retention_profile.png "Google retention profile")
To build a forward DAU projection for Google, the following can be run. Note that the start_date is set to 10:
```python
cohorts = [ 2000, 4000, 1200, 2200, 1700, 1300, 4200, 9200 ]
google_DAU = th.project_cohorted_DAU( profile = google, periods = 40, cohorts = cohorts,
DAU_target = 20000, DAU_target_timeline = 20, start_date = 10 )
from datetime import date, timedelta
th.plot_forward_DAU_stacked( forward_DAU = google_DAU,
forward_DAU_labels = [ 'Cohort ' + str( x ) for x in list( google_DAU.index ) ],
forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( google_DAU.columns ) ]
)
```
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/google_forward_DAU-1.png "Google forward DAU")
Note the lumpiness of the first few days of DAU -- this is a result of 1) the volatile sizes of the initial cohorts and 2) the relatively low Google retention (40% on Day 1). Also note the dates on the X axis: the chart starts on January 10th, 2020 since the `start_date` parameter is set to 10.
In order to get a fuller picture of product DAU, the Facebook and Google forward DAU projections can be combined with the `combine_DAU` function. The totals for each forward DAU projection should be used, otherwise the graph would be too busy to read:
```python
google_total = th.DAU_total( google_DAU )
combined_DAU = th.combine_DAU( DAU_totals = [ facebook_total, google_total ], labels = [ "Facebook", "Google" ] )
th.plot_forward_DAU_stacked( forward_DAU = combined_DAU,
forward_DAU_labels = list( combined_DAU.index ),
forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( combined_DAU.columns ) ]
)
```
The output of the above should look like:
![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/download.png "Combined Facebook and Google forward DAU")
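Because the Facebook projection starts on day 1 and the Google projection starts on day 10, combining them means lining the totals up on a shared day axis. The sketch below shows one way to do that with plain dictionaries. It assumes that days missing from a projection count as zero DAU; how `combine_DAU` treats such gaps is not stated here, so take this as an illustration only. The `combine_totals` helper is hypothetical.

```python
# Hypothetical helper (not part of Theseus): align several DAU totals by day.
def combine_totals(totals, labels):
    """totals is a list of {day: dau} dicts; returns the shared days and one row per label."""
    days = sorted(set().union(*(t.keys() for t in totals)))
    # Assumption: a projection contributes 0 DAU on days it does not cover.
    combined = {label: [t.get(day, 0) for day in days]
                for label, t in zip(labels, totals)}
    return days, combined


days, combined = combine_totals(
    [{1: 10, 2: 20}, {2: 5, 3: 7}],
    ["Facebook", "Google"],
)
print(days)      # [1, 2, 3]
print(combined)  # {'Facebook': [10, 20, 0], 'Google': [0, 5, 7]}
```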
One important aspect of cohort analysis is age segmentation: breaking the user base out into segments based on user age. Theseus comes with two functions to do this: `project_aged_DAU` and `project_exact_aged_DAU`.
`project_aged_DAU` presents the DAU projection in terms of _minimum_ user ages: it can display the number of users that are _at least_ X days old on a given date.
`project_exact_aged_DAU` presents the DAU projection in terms of _absolute_ user ages: it can display the number of users that are _exactly_ X days old on a given date.
Both functions take five parameters:
+ `profile`: the retention profile being used for the projection;
+ `periods`: the number of periods for which the forward DAU projection is being made;
+ `cohorts`: the cohorts that are being projected forward;
+ `start_date`: the start date of the projection;
+ `ages`: a list of ages that the projection should be broken down for. For `project_aged_DAU`, the forward DAU projection will be broken out to show the number of users per day that are _at least_ as old as every age in the list. For `project_exact_aged_DAU`, the forward DAU projection will be broken out to show the number of users per day that are _exactly_ as old as every age in the list.
An example of `project_aged_DAU`:
```python
x_data = [ 1, 14, 30, 90 ]
y_data = [ 25, 18, 12, 8 ]
#form options: 'log', 'exp', 'linear', 'quad', 'weibull', 'power'
snapchat = th.create_profile( days = x_data, retention_values = y_data, profile_max = 120 )
snapchat_aged_DAU = th.project_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ],
start_date = 1, ages = [ 3, 7, 14 ] )
print( snapchat_aged_DAU )
```
This should produce output that looks like the following:
```python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 \
age
3 0 0 24 71 143 236 352 343 336 327 320 312 305 296 290 282
7 0 0 0 0 0 0 22 65 130 214 320 312 305 296 290 282
14 0 0 0 0 0 0 0 0 0 0 0 0 0 18 55 108
17 18 19 20
age
3 275 268 260 254
7 275 268 260 254
14 180 268 260 254
```
Taking column 10 as an example: 327 users are _at least_ 3 days old, 214 users are _at least_ 7 days old, and 0 users are _at least_ 14 days old.
An example of `project_exact_aged_DAU`:
```python
snapchat_exact_aged_DAU = th.project_exact_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ],
start_date = 1, ages = [ 3, 7, 14 ] )
print( snapchat_exact_aged_DAU )
```
This should produce output that looks like the following:
```python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 \
age
3 0 0 24 48 73 97 121 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 22 44 66 88 110 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 18 37 55 74 93 0
20
age
3 0
7 0
14 0
```
Note that each "age" only has five values; this is because there are only five cohorts provided in the example (each cohort will be an exact age only once).
Also note that if 1 is passed in the `ages` list for `project_exact_aged_DAU`, it produces a list of DNU:
```python
snapchat_exact_aged_DAU = th.project_exact_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ],
start_date = 1, ages = [ 1 ] )
print( snapchat_exact_aged_DAU )
```
```python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
age
1 100 200 300 400 500 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```
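The difference between the two age views can be sketched in plain Python. This is an illustration under assumptions drawn from the outputs above, not the library's code: a cohort is treated as age 1 on the day it joins (which is why `ages = [ 1 ]` returns the DNU), the retention curve is a list of percentages starting at 100, and fractional users are truncated. The `aged_dau` helper is hypothetical.

```python
# Hypothetical helper (not part of Theseus): DAU filtered by cohort age.
def aged_dau(retention, cohorts, periods, age, exact=False):
    """retention[a - 1] is the percent of a cohort still active at age a (age 1 = join day)."""
    totals = []
    for day in range(1, periods + 1):
        total = 0
        for join_day, size in enumerate(cohorts, start=1):
            cohort_age = day - join_day + 1
            if cohort_age < 1:
                continue  # the cohort has not joined yet
            matches = cohort_age == age if exact else cohort_age >= age
            if matches:
                total += int(size * retention[cohort_age - 1] / 100)
        totals.append(total)
    return totals


retention = [100, 50, 25, 25, 25]
print(aged_dau(retention, [100, 200], 5, age=3))              # [0, 0, 25, 75, 75]
print(aged_dau(retention, [100, 200], 5, age=3, exact=True))  # [0, 0, 25, 50, 0]
print(aged_dau(retention, [100, 200], 5, age=1, exact=True))  # [100, 200, 0, 0, 0]
```

In the "at least" view every cohort keeps contributing once it is old enough, while in the exact view each cohort appears in exactly one column per age.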
One interesting use case for user base segments is calculating percentages of the overall user base that are at least X days old -- often, certain monetization moments are only available to users after some amount of time, so being able to break a user base out into age groups to know what percentage of the user base is capable of monetizing is helpful. This can be done with Theseus by creating a Total forward DAU projection and combining it with the aged projection:
```python
snapchat_DAU = th.project_cohorted_DAU( profile = snapchat, periods = 20, cohorts = [ 100, 200, 300, 400, 500 ],
start_date = 1 )
snapchat_total = th.DAU_total( snapchat_DAU )
combined_DAU = th.combine_DAU( DAU_totals = [ snapchat_aged_DAU, snapchat_total ],
labels = [ [ "Age " + str( x ) for x in list( snapchat_aged_DAU.index ) ], "Total" ]
)
for x in list( snapchat_aged_DAU.index ):
    combined_DAU.loc[ 'Age ' + str( x ) + ' Pct' ] = combined_DAU.apply( lambda z: ( z[ 'Age ' + str( x )] / z[ 'Total' ] ) )
print( combined_DAU )
```
This would output something that looks like:
```python
1 2 3 4 5 6 \
profile
Age 3 0.0 0.0 24.000000 71.000000 143.000000 236.000000
Age 7 0.0 0.0 0.000000 0.000000 0.000000 0.000000
Age 14 0.0 0.0 0.000000 0.000000 0.000000 0.000000
Total 100.0 224.0 373.000000 545.000000 742.000000 360.000000
Age 3 Pct 0.0 0.0 0.064343 0.130275 0.192722 0.655556
Age 7 Pct 0.0 0.0 0.000000 0.000000 0.000000 0.000000
Age 14 Pct 0.0 0.0 0.000000 0.000000 0.000000 0.000000
7 8 9 10 11 12 13 \
profile
Age 3 352.0000 343.000000 336.000000 327.000000 320.0 312.0 305.0
Age 7 22.0000 65.000000 130.000000 214.000000 320.0 312.0 305.0
Age 14 0.0000 0.000000 0.000000 0.000000 0.0 0.0 0.0
Total 352.0000 343.000000 336.000000 327.000000 320.0 312.0 305.0
Age 3 Pct 1.0000 1.000000 1.000000 1.000000 1.0 1.0 1.0
Age 7 Pct 0.0625 0.189504 0.386905 0.654434 1.0 1.0 1.0
Age 14 Pct 0.0000 0.000000 0.000000 0.000000 0.0 0.0 0.0
14 15 16 17 18 19 \
profile
Age 3 296.000000 290.000000 282.000000 275.000000 268.0 260.0
Age 7 296.000000 290.000000 282.000000 275.000000 268.0 260.0
Age 14 18.000000 55.000000 108.000000 180.000000 268.0 260.0
Total 296.000000 290.000000 282.000000 275.000000 268.0 260.0
Age 3 Pct 1.000000 1.000000 1.000000 1.000000 1.0 1.0
Age 7 Pct 1.000000 1.000000 1.000000 1.000000 1.0 1.0
Age 14 Pct 0.060811 0.189655 0.382979 0.654545 1.0 1.0
20
profile
Age 3 254.0
Age 7 254.0
Age 14 254.0
Total 254.0
Age 3 Pct 1.0
Age 7 Pct 1.0
Age 14 Pct 1.0
```
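The percentage rows are simply the aged DAU divided by the total DAU for each day. As a small worked check using values from the table above (22 users at least 7 days old out of 352 total on day 7), the following sketch uses a hypothetical `age_share` helper that also guards against days with zero total DAU:

```python
# Hypothetical helper (not part of Theseus): share of the user base in an age group.
def age_share(aged, total):
    # Days with no users at all are reported as a 0.0 share instead of dividing by zero.
    return [a / t if t else 0.0 for a, t in zip(aged, total)]


# Day 7 from the table above, followed by an empty day.
print(age_share([22, 0], [352, 0]))  # [0.0625, 0.0]
```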
In order to actually work with these projections, Theseus comes with two file output functions: `to_excel` and `to_json`.
`to_excel` can take three parameters:
+ `df`: the forward DAU projection dataframe being output;
+ `file_name`: the name of the file that will be output (optional)
+ `sheet_name`: the name of the sheet that the data will be written to (optional)
`to_excel` will save a .xlsx file in the directory from which the Theseus object is being executed.
`to_json` can take two parameters:
+ `df`: the forward DAU projection dataframe being output;
+ `file_name`: the name of the file that will be output (optional)
`to_json` will save a .json file in the directory from which the Theseus object is being executed.
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
## License
[MIT](https://choosealicense.com/licenses/mit/)
285 \n2 0 1000 807 734 684 645 613 587 564 543 ... 290 \n3 0 0 1000 807 734 684 645 613 587 564 ... 294 \n4 0 0 0 1000 807 734 684 645 613 587 ... 298 \n5 0 0 0 0 1000 807 734 684 645 613 ... 303 \n\n 42 43 44 45 46 47 48 49 50 \ncohort_date \n1 281 277 273 270 266 262 259 255 252 \n2 285 281 277 273 270 266 262 259 255 \n3 290 285 281 277 273 270 266 262 259 \n4 294 290 285 281 277 273 270 266 262 \n5 298 294 290 285 281 277 273 270 266 \n\n[5 rows x 50 columns]\n```\n\nThis DataFrame table shows how many of the original cohorts are present on any given day; the cohort numbers run down the Y axis and the days run across the X axis.\n\nTo see this as a total, the `DAU_total` function can be used:\n\n```python\nfacebook_total = th.DAU_total( facebook_DAU )\n\nprint( facebook_total )\n```\n\nThe output of which should look like:\n\n```python\n 1 2 3 4 5 6 7 8 9 10 ... 41 \\\nValue ... \nDAU 1000 808 1734 2491 3186 4548 5911 7274 8637 10000 ... 4312 \n\n 42 43 44 45 46 47 48 49 50 \nValue \nDAU 4246 4182 4119 4058 3999 3941 3888 3831 3779 \n\n[1 rows x 50 columns]\n```\n\nThis table represents the total number of DAU present in the product from those five cohorts over the course of a 50-period timeline.\n\n\nThe `project_cohorted_DAU` can be used to project DAU out given some set of cohorts and a retention profile, but it can also be used to generate the number of new users needed to reach a DAU target over a timeline, given some existing set of cohorts.\n\nIn this example, the `cohorts` list contains five cohorts of 1000 new users each. If a marketing analyst wanted to know how many _additional_ cohorts, and of what size, would be needed in order to get the user base to 10,000 DAU, then they could use `project_cohorted_DAU` to do that by adding two parameters: `DAU_target` and `DAU_target_timeline`. 
`DAU_target` is the targeted number of DAU, and `DAU_target_timeline` is the number of days over which the additional new users will be added.\n\nIn action:\n\n```python\nfacebook_DAU = th.project_cohorted_DAU( profile = facebook, periods = 50, cohorts = cohorts, \n DAU_target = 10000, DAU_target_timeline = 10, start_date = 1 )\n\nprint( facebook_DAU )\n```\n\nShould produce the following output:\n\n```python\n 1 2 3 4 5 6 7 8 9 10 ... \\\ncohort_date ... \n1 1000 807 734 684 645 613 587 564 543 525 ... \n2 0 1000 807 734 684 645 613 587 564 543 ... \n3 0 0 1000 807 734 684 645 613 587 564 ... \n4 0 0 0 1000 807 734 684 645 613 587 ... \n5 0 0 0 0 1000 807 734 684 645 613 ... \n6 0 0 0 0 0 1613 1302 1184 1103 1040 ... \n7 0 0 0 0 0 0 1757 1418 1290 1201 ... \n8 0 0 0 0 0 0 0 1853 1495 1361 ... \n9 0 0 0 0 0 0 0 0 1934 1561 ... \n10 0 0 0 0 0 0 0 0 0 2005 ... \n\n 41 42 43 44 45 46 47 48 49 50 \ncohort_date \n1 285 281 277 273 270 266 262 259 255 252 \n2 290 285 281 277 273 270 266 262 259 255 \n3 294 290 285 281 277 273 270 266 262 259 \n4 298 294 290 285 281 277 273 270 266 262 \n5 303 298 294 290 285 281 277 273 270 266 \n6 496 489 481 474 467 461 454 448 441 435 \n7 549 541 532 524 517 509 502 495 488 481 \n8 588 579 570 562 553 545 537 529 522 514 \n9 624 614 604 595 586 577 569 561 553 545 \n10 657 647 636 627 617 608 599 590 581 573 \n\n[10 rows x 50 columns]\n```\n\nThis table reveals that the additional DNU needed to get to 10,000 overall DAU within the 10-period timeframe is: 1613, 1757, 1853, 1934, 2005. 
_Note that this approach seeks to minimize the number of total DNU added on any given day within the timeline_.\n\nTo get only the DNU (new users) values from a forward DAU projection, the `get_DNU` function can be used:\n\n```python\n#get DNU from a DAU projection\nfacebook_DNU = th.get_DNU( facebook_DAU )\nprint( facebook_DNU )\n```\n\nThe output of which should look like:\n\n```python\n cohort_date 1 2 3 4 5 6 7 \\\nValue \nDNU 1000 1.0 1000.0 1000.0 1000.0 1710.0 1881.0 1994.0 \n\n 8 9 ... 41 42 43 44 45 46 47 48 49 50 \nValue ... \nDNU 2090.0 2171.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n\n[1 rows x 51 columns]\n```\n\nTo reduce `facebook_DAU` to only the total DAU values over the projection timeline, the `DAU_total` function can be used again. Its output should look like:\n\n```python\n 1 2 3 4 5 6 7 8 9 10 ... 41 \\\nDAU ... \n0 1000 1807 2541 3225 3870 5096 6322 7548 8774 10000 ... 4384 \n\n 42 43 44 45 46 47 48 49 50 \nDAU \n0 4318 4250 4188 4126 4067 4009 3953 3897 3842 \n\n[1 rows x 50 columns]\n```\n\nNote that this shows DAU reaching 10,000 by Day 10.\n\nThe Facebook `forward DAU projection` can be visualized with the `plot_forward_DAU_stacked` function, which takes three required parameters:\n+ `forward_DAU`: the forward DAU projection being visualized (in this case, the `facebook_DAU` variable);\n+ `forward_DAU_labels`: a list of the cohort names as labels for the stacked bars. The length of this list needs to match the number of cohorts in the forward DAU projection;\n+ `forward_DAU_dates`: a list of dates as labels for the X axis. 
The length of this list needs to match the number of periods in the forward DAU projection;\n\nTo visualize the Facebook forward DAU projection that reaches the DAU target of 10,000:\n\n```python\nth.plot_forward_DAU_stacked( forward_DAU = facebook_DAU, \n forward_DAU_labels = list( facebook_DAU.index ), \n forward_DAU_dates = list( facebook_DAU.columns ), \n)\n```\n\nThis should produce a graph that looks like this:\n\n![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/facebook_forward_DAU_projection.png \"Facebook forward DAU projection\")\n\nNote that any list of labels can be provided in the `forward_DAU_labels` and `forward_DAU_dates` parameters, as long as its length matches. For instance, to give the X axis actual date values (starting from January 1, 2020) and to make the legend more readable, the following can be done:\n\n```python\nfrom datetime import date, timedelta\nth.plot_forward_DAU_stacked( forward_DAU = facebook_DAU, \n forward_DAU_labels = [ 'Cohort ' + str( x ) for x in list( facebook_DAU.index ) ], \n forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( facebook_DAU.columns ) ]\n)\n```\n\nThis should produce a graph that looks like this:\n\n![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/facebook_forward_DAU_readable.png.png \"Facebook forward DAU projection\")\n\nTo create a second retention profile -- this time, for Google -- the `create_profile` function can be used again. Here, the `profile_max` parameter will be supplied: when `profile_max` is provided, the retention profile is projected out to that day (when it is not provided, the retention profile is only projected out to the maximum value provided in the `days` parameter). 
Also, with the Google retention profile, a much larger dataset of days and retention values will be supplied, so the curve fit is done against many more (arbitrarily produced) data points:\n\n```python\nimport numpy as np\nimport random\n\nx_data = [ 1, 14, 60 ]\ny_data = [ 40, 22, 10 ]\n\n#for each original point, add 99 noisy copies (+/- 20%) at the same day value\nnew_x = []\nfor i, x in enumerate( x_data ):\n    this_x = x\n    for z in np.arange( 1, 100 ):\n        this_y = float( y_data[ i ] * ( 1 + ( random.randint( -20, 20 ) / 100 ) ) )\n        y_data.append( this_y )\n        new_x.append( this_x )\n\nx_data.extend( new_x )\n\ngoogle = th.create_profile( days = x_data, retention_values = y_data, profile_max = 180 )\n\nth.plot_retention( google )\n```\n\nThe output of this should look something like this (the red dots are the actual values from `retention_values`):\n\n![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/google_retention_profile.png \"Google retention profile\")\n\nTo build a forward DAU projection for Google, the following can be run. Note that `start_date` is set to 10:\n\n```python\ncohorts = [ 2000, 4000, 1200, 2200, 1700, 1300, 4200, 9200 ]\ngoogle_DAU = th.project_cohorted_DAU( profile = google, periods = 40, cohorts = cohorts, \n DAU_target = 20000, DAU_target_timeline = 20, start_date = 10 )\n\nfrom datetime import date, timedelta\nth.plot_forward_DAU_stacked( forward_DAU = google_DAU, \n forward_DAU_labels = [ 'Cohort ' + str( x ) for x in list( google_DAU.index ) ], \n forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( google_DAU.columns ) ]\n)\n```\n\n![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/google_forward_DAU-1.png \"Google forward DAU\")\n\nNote the lumpiness of the first few days of DAU -- this is a result of 1) the volatile number of new users in the initial cohorts and 2) the relatively low Google retention (40% on Day 1). 
Also note the dates on the X axis: the chart starts on January 10th, 2020 since the `start_date` variable is set to 10.\n\nIn order to get a fuller picture of product DAU, the Facebook and Google forward DAU projections can be combined with the `combine_DAU` function. The totals for each forward DAU projection should be used, otherwise the graph would be too busy to read:\n\n```python\ngoogle_total = th.DAU_total( google_DAU )\n\ncombined_DAU = th.combine_DAU( DAU_totals = [ facebook_total, google_total ], labels = [ \"Facebook\", \"Google\" ] )\n\nth.plot_forward_DAU_stacked( forward_DAU = combined_DAU, \n forward_DAU_labels = list( combined_DAU.index ), \n forward_DAU_dates = [ date(2020, 1, 1) + timedelta(days=int( x ) - 1 ) for x in list( combined_DAU.columns ) ]\n)\n```\n\nThe output of the above should look like:\n\n![alt text](https://mobiledevmemo.com/wp-content/uploads/2020/01/download.png \"Combined Facebook and Google forward DAU\")\n\nOne important aspect of cohort analysis is age segmentation: breaking the user base out into segments based on user age. Theseus comes with two functions to do this: `project_aged_DAU` and `project_exact_aged_DAU`. \n\n`project_aged_DAU` presents the DAU projection in terms of _minimum_ user ages: it can display the number of users that are _at least_ X days old on a given date.\n\n`project_exact_aged_DAU` presents the DAU projection in terms of _absolute_ user ages: it can display the number of users that are _exactly_ X days old on a given date.\n\nBoth functions take five parameters:\n+ `profile`: the retention profile being used for the projection;\n+ `periods`: the number of periods for which the forward DAU projection is being made;\n+ `cohorts`: the cohorts that are being projected forward;\n+ `start_date`: the start date of the projection;\n+ `ages`: a list of ages that the projection should be broken down for. 
For `project_aged_DAU`, the forward DAU projection will be broken out to show the number of users per day that are _at least_ as old as each age in the list. For `project_exact_aged_DAU`, the forward DAU projection will be broken out to show the number of users per day that are _exactly_ as old as each age in the list.\n\nAn example of `project_aged_DAU`:\n\n```python\nx_data = [ 1, 14, 30, 90 ]\ny_data = [ 25, 18, 12, 8 ]\n\n#form options: 'log', 'exp', 'linear', 'quad', 'weibull', 'power'\nsnapchat = th.create_profile( days = x_data, retention_values = y_data, profile_max = 120 )\n\nsnapchat_aged_DAU = th.project_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ], \n start_date = 1, ages = [ 3, 7, 14 ] )\n\nprint( snapchat_aged_DAU )\n```\n\nThis should produce output that looks like the following:\n\n```python\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 \\\nage \n3 0 0 24 71 143 236 352 343 336 327 320 312 305 296 290 282 \n7 0 0 0 0 0 0 22 65 130 214 320 312 305 296 290 282 \n14 0 0 0 0 0 0 0 0 0 0 0 0 0 18 55 108 \n\n 17 18 19 20 \nage \n3 275 268 260 254 \n7 275 268 260 254 \n14 180 268 260 254 \n```\n\nTaking column 10 as an example: 327 users are _at least_ 3 days old, 214 users are _at least_ 7 days old, and 0 users are _at least_ 14 days old. 
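\n\nThe difference between _at least_ and _exactly_ X days old can also be sketched outside of Theseus with plain pandas. This is only an illustration under stated assumptions: the `cohort_dau` matrix and the `aged_dau` helper are hypothetical and are not part of the Theseus API, the numbers are made up, and a cohort is assumed to be 1 day old on the day it joins:\n\n```python\nimport pandas as pd\n\n# Hypothetical cohort matrix (not produced by Theseus): rows are cohort\n# start days, columns are calendar days, values are made-up DAU counts.\ncohort_dau = pd.DataFrame(\n    [[100, 50, 25, 20], [0, 200, 100, 50], [0, 0, 300, 150]],\n    index=[1, 2, 3], columns=[1, 2, 3, 4],\n)\n\ndef aged_dau(cohort_dau, age, exact=False):\n    # Assumption: a cohort that starts on day c is (d - c + 1) days old on day d\n    totals = {}\n    for day in cohort_dau.columns:\n        total = 0\n        for start in cohort_dau.index:\n            cohort_age = day - start + 1\n            matches = (cohort_age == age) if exact else (cohort_age >= age)\n            if matches:\n                total += cohort_dau.loc[start, day]\n        totals[day] = int(total)\n    return pd.Series(totals)\n\nat_least_2 = aged_dau(cohort_dau, age=2)\nexactly_2 = aged_dau(cohort_dau, age=2, exact=True)\n```\n\nWith this toy matrix, `at_least_2` holds 0, 50, 125, 220 and `exactly_2` holds 0, 50, 100, 150 for days 1 through 4: the exact-age series counts only the single cohort that reaches age 2 on each day, while the minimum-age series also accumulates every older cohort.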
\n\nAn example of `project_exact_aged_DAU`:\n\n```python\nsnapchat_exact_aged_DAU = th.project_exact_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ], \n start_date = 1, ages = [ 3, 7, 14 ] )\n\nprint( snapchat_exact_aged_DAU )\n```\n\nThis should produce output that looks like the following:\n\n```python\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 \\\nage \n3 0 0 24 48 73 97 121 0 0 0 0 0 0 0 0 0 0 0 0 \n7 0 0 0 0 0 0 22 44 66 88 110 0 0 0 0 0 0 0 0 \n14 0 0 0 0 0 0 0 0 0 0 0 0 0 18 37 55 74 93 0 \n\n 20 \nage \n3 0 \n7 0 \n14 0\n```\n\nNote that each \"age\" only has five values; this is because there are only five cohorts provided in the example (each cohort will be an exact age only once).\n\nAlso note that if 1 is passed in the `ages` list for `project_exact_aged_DAU`, it produces a list of DNU:\n\n```python\nsnapchat_exact_aged_DAU = th.project_exact_aged_DAU( snapchat, 20, [ 100, 200, 300, 400, 500 ], \n start_date = 1, ages = [ 1 ] )\n\nprint( snapchat_exact_aged_DAU )\n```\n\n```python\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\nage \n1 100 200 300 400 500 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n```\n\nOne interesting use case for user base segments is calculating percentages of the overall user base that are at least X days old -- often, certain monetization moments are only available to users after some amount of time, so being able to break a user base out into age groups to know what percentage of the user base is capable of monetizing is helpful. 
This can be done with Theseus by creating a Total forward DAU projection and combining it with the aged projection:\n\n```python\nsnapchat_DAU = th.project_cohorted_DAU( profile = snapchat, periods = 20, cohorts = [ 100, 200, 300, 400, 500 ], \n start_date = 1 )\n\nsnapchat_total = th.DAU_total( snapchat_DAU )\n\ncombined_DAU = th.combine_DAU( DAU_totals = [ snapchat_aged_DAU, snapchat_total ], \n labels = [ [ \"Age \" + str( x ) for x in list( snapchat_aged_DAU.index ) ], \"Total\" ] \n)\n\nfor x in list( snapchat_aged_DAU.index ):\n combined_DAU.loc[ 'Age ' + str( x ) + ' Pct' ] = combined_DAU.apply( lambda z: ( z[ 'Age ' + str( x )] / z[ 'Total' ] ) )\n\nprint( combined_DAU )\n```\n\nThis would output something that looks like:\n\n```python\n 1 2 3 4 5 6 \\\nprofile \nAge 3 0.0 0.0 24.000000 71.000000 143.000000 236.000000 \nAge 7 0.0 0.0 0.000000 0.000000 0.000000 0.000000 \nAge 14 0.0 0.0 0.000000 0.000000 0.000000 0.000000 \nTotal 100.0 224.0 373.000000 545.000000 742.000000 360.000000 \nAge 3 Pct 0.0 0.0 0.064343 0.130275 0.192722 0.655556 \nAge 7 Pct 0.0 0.0 0.000000 0.000000 0.000000 0.000000 \nAge 14 Pct 0.0 0.0 0.000000 0.000000 0.000000 0.000000 \n\n 7 8 9 10 11 12 13 \\\nprofile \nAge 3 352.0000 343.000000 336.000000 327.000000 320.0 312.0 305.0 \nAge 7 22.0000 65.000000 130.000000 214.000000 320.0 312.0 305.0 \nAge 14 0.0000 0.000000 0.000000 0.000000 0.0 0.0 0.0 \nTotal 352.0000 343.000000 336.000000 327.000000 320.0 312.0 305.0 \nAge 3 Pct 1.0000 1.000000 1.000000 1.000000 1.0 1.0 1.0 \nAge 7 Pct 0.0625 0.189504 0.386905 0.654434 1.0 1.0 1.0 \nAge 14 Pct 0.0000 0.000000 0.000000 0.000000 0.0 0.0 0.0 \n\n 14 15 16 17 18 19 \\\nprofile \nAge 3 296.000000 290.000000 282.000000 275.000000 268.0 260.0 \nAge 7 296.000000 290.000000 282.000000 275.000000 268.0 260.0 \nAge 14 18.000000 55.000000 108.000000 180.000000 268.0 260.0 \nTotal 296.000000 290.000000 282.000000 275.000000 268.0 260.0 \nAge 3 Pct 1.000000 1.000000 1.000000 1.000000 1.0 1.0 \nAge 7 
Pct 1.000000 1.000000 1.000000 1.000000 1.0 1.0 \nAge 14 Pct 0.060811 0.189655 0.382979 0.654545 1.0 1.0 \n\n 20 \nprofile \nAge 3 254.0 \nAge 7 254.0 \nAge 14 254.0 \nTotal 254.0 \nAge 3 Pct 1.0 \nAge 7 Pct 1.0 \nAge 14 Pct 1.0 \n```\n\nIn order to actually work with these projections, Theseus comes with two file output functions: `to_excel` and `to_json`. \n\n`to_excel` can take three parameters:\n+ `df`: the forward DAU projection dataframe being output;\n+ `file_name`: the name of the file that will be output (optional)\n+ `sheet_name`: the name of the sheet that the data will be written to (optional)\n\n`to_excel` will save a .xlsx file in the directory from which the Theseus object is being executed.\n\n`to_json` can take two parameters:\n+ `df`: the forward DAU projection dataframe being output;\n+ `file_name`: the name of the file that will be output (optional)\n\n`to_json` will save a .json file in the directory from which the Theseus object is being executed.\n\n\n\n\n## Contributing\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\nPlease make sure to update tests as appropriate.\n\n## License\n[MIT](https://choosealicense.com/licenses/mit/)",
"bugtrack_url": null,
"license": "MIT",
"summary": "Theseus is a library of tools to use in marketing analysis",
"version": "0.3.7",
"project_urls": null,
"split_keywords": [
"cohort analysis",
"analysis",
"marketing",
"media mix model"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "27ca08b625a79983cb50f8fd4d65031737f04dca083f7fdfc19a98856b0ea324",
"md5": "3375b459874e2a6eb46d176e6e4e0fb3",
"sha256": "1a409ccedef290117a12b7cacbc5afb5e66de300326e9552d750fac0e5a4e80b"
},
"downloads": -1,
"filename": "theseus_growth-0.3.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3375b459874e2a6eb46d176e6e4e0fb3",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 22895,
"upload_time": "2023-08-02T15:31:38",
"upload_time_iso_8601": "2023-08-02T15:31:38.464986Z",
"url": "https://files.pythonhosted.org/packages/27/ca/08b625a79983cb50f8fd4d65031737f04dca083f7fdfc19a98856b0ea324/theseus_growth-0.3.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e1cae555a6eefb1ee0e12d95be2e9dc14198a2c39c8eeaad649a1a7532799d4c",
"md5": "826b69f88e7d51d777858c38237193a8",
"sha256": "cf26cf29c8eafa89b73da8004aae5b9eeed4d30d2cc25ec1a7cc1a431f89f430"
},
"downloads": -1,
"filename": "theseus_growth-0.3.7.tar.gz",
"has_sig": false,
"md5_digest": "826b69f88e7d51d777858c38237193a8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 29873,
"upload_time": "2023-08-02T15:31:40",
"upload_time_iso_8601": "2023-08-02T15:31:40.663460Z",
"url": "https://files.pythonhosted.org/packages/e1/ca/e555a6eefb1ee0e12d95be2e9dc14198a2c39c8eeaad649a1a7532799d4c/theseus_growth-0.3.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-08-02 15:31:40",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "theseus-growth"
}