weightwatcher

Name	weightwatcher
Version	0.7.5.2
home_page	https://calculationconsulting.com/
Summary	Diagnostic Tool for Deep Neural Networks
upload_time	2024-03-06 07:30:13
maintainer	Calculation Consulting
docs_url	None
author	Calculation Consulting
requires_python	>= 3.3
license	Apache License, Version 2.0
keywords	deep learning keras tensorflow pytorch deep learning dnn neural networks
requirements	No requirements were recorded.

[![Downloads](http://pepy.tech/badge/weightwatcher)](http://pepy.tech/project/weightwatcher)
[![PyPI](https://img.shields.io/pypi/v/weightwatcher?color=teal&label=release)](https://pypi.org/project/weightwatcher/)
[![GitHub](https://img.shields.io/github/license/calculatedcontent/weightwatcher?color=blue)](./LICENSE.txt)
[![Published in Nature](https://img.shields.io/badge/Published%20in-Nature-teal)](https://nature.com/articles/s41467-021-24025-8)
[![Video Tutorial](https://img.shields.io/badge/Video-Tutorial-blue)](https://www.youtube.com/watch?v=Tnafo6JVoJs)
[![Discord](https://img.shields.io/discord/1026957040133873745?color=teal&label=discord)](https://discord.gg/uVVsEAcfyF)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue)](https://www.linkedin.com/in/charlesmartin14/)
[![Blog CalculatedContent](https://img.shields.io/badge/Blog-teal)](https://www.calculatedcontent.com)


[![WeightWatcher Logo](./img/WW-logo-long.jpg)](https://weightwatcher.ai)



**WeightWatcher** (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNN), without needing access to training or even test data.  It grows out of theoretical research into Why Deep Learning Works, and is based on our Theory of Heavy-Tailed Self-Regularization (HT-SR).  It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.

It can be used to:

- analyze pretrained or trained PyTorch and Keras DNN models (Conv2D and Dense layers)
- monitor models, and the model layers, to see if they are over-trained or over-parameterized
- predict test accuracies across different models, with or without training data
- detect potential problems when compressing or fine-tuning pretrained models
- flag layers with warning labels: over-trained or under-trained


## Quick Links 

- Please see [our latest talk from the Silicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)

- Join the [Discord Server](https://discord.gg/uVVsEAcfyF) 

- For a deeper dive into the theory, see [our latest talk at ENS](https://youtu.be/xEuBwBj_Ov4)

- and some of the most recent Podcasts:

  - [Practical AI](https://changelog.com/practicalai/194)
  - [The Prompt Desk](https://smartlink.ausha.co/the-prompt-desk/data-free-quality-analysis-of-deep-neural-nets-with-charles-h-martin)

- More details and demos can be found on the [Calculated Content Blog](https://calculatedcontent.com/), and in the notebooks provided in the [examples](https://github.com/CalculatedContent/WeightWatcher/tree/master/examples) directory

## Installation:  Version 0.7.5.1

```sh
pip install weightwatcher
```

If this fails, try the TestPyPI version:

### Current TestPyPI  Version 0.7.5.2

```sh
python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple weightwatcher
 ```




## Usage

```python
import weightwatcher as ww
import torchvision.models as models

model = models.vgg19_bn(pretrained=True)
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()
summary = watcher.get_summary(details)
```

It is easy to run, and generates a pandas dataframe with details (and plots) for each layer

![Sample Details Dataframe](./img/sample-ww-details.png)

and a `summary` dictionary of generalization metrics

```python
{'log_norm': 2.11,
 'alpha': 3.06,
 'alpha_weighted': 2.78,
 'log_alpha_norm': 3.21,
 'log_spectral_norm': 0.89,
 'stable_rank': 20.90,
 'mp_softrank': 0.52}
```

## Advanced Usage 

The `watcher` object has several functions and analysis features, described below.

Notice the `min_evals` setting: the power law fits need at least 50 eigenvalues to make sense,
but `describe` and the other methods do not.

```python
# Method signatures with the defaults listed in this README (shown for reference)
watcher.analyze(model=None, layers=[], min_evals=50, max_evals=None,
                plot=True, randomize=True, mp_fit=True, pool=True, savefig=True)
...
watcher.describe(model=None, layers=[], min_evals=0, max_evals=None,
                 plot=True, randomize=True, mp_fit=True, pool=True)
...
watcher.get_details()
watcher.get_summary(details)  # or watcher.get_summary()
watcher.get_ESD()
...
watcher.distances(model_1, model_2)
```

## PEFT / LORA models  (experimental)
To analyze a PEFT / LORA fine-tuned model, specify the `peft` option.

 - peft = True:  Forms the BA low-rank matrix and analyzes the delta layers, with the 'lora_BA' tag in the name
 
   ```details = watcher.analyze(peft='peft_only')```

 - peft = 'with_base':  Analyzes the base_model, the delta, and the combined layer weight matrices.  
 
   ```details = watcher.analyze(peft=True)```
   

The base_model and fine-tuned model must have the same layer names; weightwatcher will ignore layers that do not share the same name.
Also, at this point, biases are not considered.  Finally, both models should be stored in the same format (e.g., safetensors).

Note: If you want to select by layer_ids, you must first run describe(peft=False), and then select *both* the lora_A and lora_B layers

#### Usage: Base Model
![Usage: Base Model](./img/ww0.7.4.jpeg)


## Plotting and Fitting the Empirical Spectral Density (ESD)

WW creates plots for each layer weight matrix to observe how well the power law fits work

```python
details = watcher.analyze(plot=True)
```

For each layer, WeightWatcher plots the ESD, a histogram of the eigenvalues of the layer correlation matrix **X=W<sup>T</sup>W**.  It then fits the tail of the ESD to a (Truncated) Power Law, and plots these fits on different axes. The summary metrics (above) characterize the Shape and Scale of each ESD.  Here's an example:

<img src="./img/ESD-plots.png" width='800px'  height='auto' />

Generally speaking, the ESDs in the best layers of the best DNNs can be fit to a Power Law (PL), with PL exponents `alpha` closer to `2.0`.
Visually, the ESD looks like a straight line on a log-log plot (above left).
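
As a rough illustration of what the PL exponent measures, the sketch below estimates `alpha` from a list of tail eigenvalues with the standard maximum-likelihood estimator for a continuous power law. It assumes `xmin` (the start of the tail) is already known; WeightWatcher's own fit also has to select `xmin`, and that step is not reproduced here. The function name `estimate_alpha` and the synthetic data are illustrative only.

```python
import math

def estimate_alpha(evals, xmin):
    """MLE of alpha for a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    tail = [x for x in evals if x >= xmin]
    if not tail:
        raise ValueError("no eigenvalues at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("degenerate tail: every value equals xmin")
    return 1.0 + len(tail) / log_sum

# Synthetic tail: evenly spaced quantiles of a power law with alpha = 3, xmin = 1.
true_alpha, n = 3.0, 1000
synthetic = [(1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0)) for i in range(n)]

alpha_hat = estimate_alpha(synthetic, xmin=1.0)
print(round(alpha_hat, 2))
```

On real layers the estimate depends strongly on the choice of `xmin`, which is why the plots are worth inspecting alongside the number.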

## Generalization Metrics

<details>
  <summary>
The goal of the WeightWatcher project is to find generalization metrics that most accurately reflect observed test accuracies, across many different models and architectures, for pre-trained models and models undergoing training.
	  
</summary>
	

[Our HTSR theory](https://jmlr.org/papers/volume22/20-410/20-410.pdf) says that well trained, well correlated layers should be significantly different from the MP (Marchenko-Pastur) random bulk, and specifically should be heavy tailed. There are different layer metrics in WeightWatcher for this, including:

- `rand_distance` : the  distance in distribution from the randomized layer
- `alpha` : the slope of the tail of the ESD, on a log-log scale
- `alpha-hat` or `alpha_weighted` : a scale-adjusted form of `alpha` (similar to the alpha-Schatten norm)
- `stable_rank` : a norm-adjusted measure of the scale of the ESD
- `num_spikes` : the number of spikes outside the MP bulk region
- `max_rand_eval` : scale of the random noise, etc.

All of these attempt to measure how non-random and/or heavy-tailed the layer ESDs are.  


#### Scale Metrics 

- log Frobenius norm :  <img src="https://render.githubusercontent.com/render/math?math=\log_{10}\Vert\mathbf{W}\Vert^{2}_{F}">
- `log_spectral_norm` :   <img src="https://render.githubusercontent.com/render/math?math=\log_{10}\lambda_{max}=\log_{10}\Vert\mathbf{W}\Vert^{2}_{\infty}">

- `stable_rank` :  <img src="https://render.githubusercontent.com/render/math?math=R_{stable}=\Vert\mathbf{W}\Vert^{2}_{F}/\Vert\mathbf{W}\Vert^{2}_{\infty}">
- `mp_softrank` :  <img src="https://render.githubusercontent.com/render/math?math=R_{MP}=\lambda_{MP}/\lambda_{max}">
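
A minimal sketch of how the first three scale metrics follow from the eigenvalues of **X=W<sup>T</sup>W**, using the formulas above. It assumes the eigenvalues are already available as a list of positive numbers; the function name `scale_metrics` is hypothetical, and using the key `log_norm` for the log Frobenius norm is an assumption taken from the sample `summary` dictionary. `mp_softrank` is left out because it needs an MP fit.

```python
import math

def scale_metrics(evals):
    """Scale metrics from the eigenvalues of X = W^T W (assumed positive)."""
    frob_sq = sum(evals)      # ||W||_F^2 equals the sum of the eigenvalues of X
    lambda_max = max(evals)   # largest eigenvalue of X (squared spectral norm of W)
    return {
        "log_norm": math.log10(frob_sq),
        "log_spectral_norm": math.log10(lambda_max),
        "stable_rank": frob_sq / lambda_max,
    }

metrics = scale_metrics([4.0, 2.0, 1.0, 1.0])
print(metrics)
```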
 
#### Shape Metrics

 - `alpha` : <img src="https://render.githubusercontent.com/render/math?math=\alpha"> Power Law (PL) exponent 
 - (Truncated) PL quality of fit `D` : <img src="https://render.githubusercontent.com/render/math?math=D"> (the Kolmogorov-Smirnov distance metric)




(advanced usage)
 - TPL : (alpha and Lambda) Truncated Power Law Fit
 - E_TPL : (alpha and Lambda) Extended Truncated Power Law Fit


 
#### Scale-adjusted Shape Metrics

- `alpha_weighted` :  <img src="https://render.githubusercontent.com/render/math?math=\hat{\alpha}=\alpha\log_{10}\lambda_{max}">
- `log_alpha_norm` : (Schatten norm): <img src="https://render.githubusercontent.com/render/math?math=\log_{10}\Vert\mathbf{X}\Vert^{\alpha}_{\alpha}">
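
The two formulas above can be sketched directly, assuming `alpha` has already been fit and the eigenvalues of **X** are available as a list. The function name `scale_adjusted_metrics` is hypothetical, and the Schatten norm is computed here as the sum of the eigenvalues raised to the power `alpha`.

```python
import math

def scale_adjusted_metrics(alpha, evals):
    """Return (alpha_weighted, log_alpha_norm) for one layer."""
    lambda_max = max(evals)
    alpha_weighted = alpha * math.log10(lambda_max)
    # ||X||_alpha^alpha: sum of the eigenvalues of X, each raised to alpha
    log_alpha_norm = math.log10(sum(lam ** alpha for lam in evals))
    return alpha_weighted, log_alpha_norm

alpha_weighted, log_alpha_norm = scale_adjusted_metrics(2.0, [10.0, 1.0])
print(alpha_weighted, log_alpha_norm)
```

Because the largest eigenvalue dominates the sum when `alpha` is large, the two metrics tend to track each other closely.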

#### Direct Correlation Metrics 

The random distance metric is a new, non-parametric approach that appears to work well in early testing.
 [See this recent blog post](https://calculatedcontent.com/2021/10/17/fantastic-measures-of-generalization-that-actually-work-part-1/)

- `rand_distance` : <img src="https://render.githubusercontent.com/render/math?math=div(\mathbf{W},rand(\mathbf{W}))">   Distance of layer ESD from the ideal RMT MP ESD

There are also related metrics, including the new

- `ww_maxdist`
- `ww_softrank`

#### Misc Details

- `N, M` :  Matrix or Tensor Slice Dimensions
- `num_spikes` :  number of spikes outside the bulk region of the ESD, when fit to an MP distribution
- `num_rand_spikes` :  number of Correlation Traps
- `max_rand_eval` : scale of the random noise in the layer


#### Summary Statistics: 
The layer metrics are averaged in the **summary** statistics:

Get the average metrics, as a `summary` (dict), from the given (or current) `details` dataframe

```python
details = watcher.analyze(model=model)
summary = watcher.get_summary(details)
```
or just
```python
summary = watcher.get_summary()
```

The summary statistics can be used to gauge the test error of a series of pretrained or trained models, without needing access to training or test data.

- average `alpha` can be used to compare one or more DNN models with different hyperparameter settings **&theta;**, when depth is not a driving factor (e.g., transformer models)
- average `log_spectral_norm` is useful to compare models of different depths **L** at a coarse-grained level
- average `alpha_weighted` and `log_alpha_norm` are suitable for DNNs of differing hyperparameters **&theta;** and depths **L** simultaneously (e.g., CV models like VGG and ResNet)
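
To make the averaging concrete, here is a small sketch that turns per-layer rows into a summary by taking a plain mean of each metric. It uses ordinary dictionaries in place of the `details` dataframe, and the function name `summarize` and the sample values are made up; the real `get_summary` may handle missing or filtered layers differently.

```python
def summarize(layer_rows, metrics=("alpha", "alpha_weighted", "log_spectral_norm")):
    """Average each metric over the layers that report a value for it."""
    summary = {}
    for name in metrics:
        values = [row[name] for row in layer_rows if row.get(name) is not None]
        if values:
            summary[name] = sum(values) / len(values)
    return summary

# Hypothetical per-layer results for a two-layer model.
layers = [
    {"layer_id": 0, "alpha": 2.5, "alpha_weighted": 1.0, "log_spectral_norm": 0.4},
    {"layer_id": 1, "alpha": 3.5, "alpha_weighted": 3.0, "log_spectral_norm": 0.8},
]
summary = summarize(layers)
print(summary)
```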


#### Predicting the Generalization Error


WeightWatcher (WW) can be used to compare the test error for a series of models, trained on a similar dataset, but with different hyperparameters **&theta;**, or even different but related architectures.  
	
Our Theory of HT-SR predicts that models with smaller PL exponents `alpha`, on average, correspond to models that generalize better.

Here is an example of the `alpha_weighted` capacity metric for all the current pretrained VGG models.

<img src="https://github.com/CalculatedContent/PredictingTestAccuracies/blob/master/img/vgg-w_alphas.png" width='600px' height='auto' />

Notice: we *did not peek* at the ImageNet test data to build this plot.
	
This can be reproduced with the Examples Notebooks for [VGG](https://github.com/CalculatedContent/WeightWatcher/blob/master/examples/WW-VGG.ipynb) and also for [ResNet](https://github.com/CalculatedContent/WeightWatcher/blob/master/examples/WW-ResNet.ipynb)

</details>

## Detecting signs of Over-Fitting and Under-Fitting

WeightWatcher can help you detect the signatures of over-fitting and under-fitting in specific layers of pretrained or trained Deep Neural Networks.

WeightWatcher will analyze your model, layer-by-layer, and show you where these kinds of problems may be lurking.

### Correlation Traps

<details>
 <summary>
The <code>randomize</code> option lets you compare the ESD of the layer weight matrix (W) to the ESD of its randomized form.
This is a good way to visualize the correlations in the true ESD, and detect signatures of over- and under-fitting
 </summary>

	
```python
details = watcher.analyze(randomize=True, plot=True)
```

Fig (a) is well trained; Fig (b) may be over-fit.
	
That orange spike on the far right is the tell-tale clue; it's called a **Correlation Trap**.  

A **Correlation Trap** is characterized by Fig (b); here the actual (green) and random (red) ESDs look almost identical, except for a small shelf of correlation (just right of 0). And in the random (red) ESD, the largest eigenvalue (orange) is far to the right of, and separated from, the bulk of the ESD.
	
![Correlation Traps](./img/correlation_trap.jpeg)
	
When layers look like Figure (b) above, then they have not been trained properly because they look almost random, with only a little bit of information present. And the information the layer learned may even be spurious.
	
Moreover, the metric `num_rand_spikes` (in the `details` dataframe) contains the number of spikes (or traps) that appear in the layer.
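
To illustrate what "outside the bulk" means, the sketch below computes the textbook Marchenko-Pastur bulk edges and counts eigenvalues above the upper edge. It assumes the normalization X = W<sup>T</sup>W / N for an N x M matrix with i.i.d. entries of variance `sigma**2`; the function names are hypothetical, and WeightWatcher's own spike detection (including how it sets `sigma` and the edge tolerance) is not reproduced here.

```python
import math

def mp_bulk_edges(n_rows, n_cols, sigma=1.0):
    """Marchenko-Pastur bulk edges for X = W^T W / N, with Q = N / M >= 1."""
    if n_rows < n_cols:
        n_rows, n_cols = n_cols, n_rows  # use the orientation with Q >= 1
    q = n_rows / n_cols
    root = 1.0 / math.sqrt(q)
    lam_minus = sigma ** 2 * (1.0 - root) ** 2
    lam_plus = sigma ** 2 * (1.0 + root) ** 2
    return lam_minus, lam_plus

def count_spikes(evals, lam_plus):
    """Number of eigenvalues lying above the upper bulk edge."""
    return sum(1 for lam in evals if lam > lam_plus)

lam_minus, lam_plus = mp_bulk_edges(1000, 250)
n_spikes = count_spikes([0.3, 1.0, 2.0, 2.3, 9.0], lam_plus)
print(lam_minus, lam_plus, n_spikes)
```

For a randomized layer, an eigenvalue far above `lam_plus` is the kind of outlier the figure marks in orange.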

The `SVDSharpness` transform can be used to remove Correlation Traps during training (after each epoch) or after training using 
	
```python
sharpened_model = watcher.SVDSharpness(model=...)
```
	
Sharpening a model is similar to clipping the layer weight matrices, but uses Random Matrix Theory to do this in a more principled way than simple clipping.
	
</details>

### Early Stopping
<details>
 <summary>
	 <b>Note:</b> This is experimental but we have seen some success here
 </summary>
	
The WeightWatcher `alpha` metric may be used to detect when to apply early stopping.  When the average `alpha` (summary statistic) drops below `2.0`, this indicates that the model may be over-trained and early stopping is necessary.

Below is an example of this, showing training loss and test loss curves for a small Transformer model, trained from scratch, along with the average `alpha` summary statistic.

![Early Stopping](./img/early_stopping.png)

We can see that as the training and test losses decrease, so does `alpha`. But when the test loss saturates and then starts to increase, `alpha` drops below `2.0`.
	
**Note:** this only works for very well trained models, where the optimal `alpha=2.0` is obtained
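
One way to turn this rule of thumb into code is sketched below. It assumes you record the average `alpha` from the summary after each epoch; the function name `first_epoch_below` and the history values are made up for illustration, and the `2.0` threshold is the one quoted above.

```python
def first_epoch_below(avg_alpha_history, threshold=2.0):
    """Return the first epoch index whose average alpha is below threshold, else None."""
    for epoch, avg_alpha in enumerate(avg_alpha_history):
        if avg_alpha < threshold:
            return epoch
    return None

# Hypothetical average alpha recorded once per epoch.
history = [4.1, 3.2, 2.6, 2.2, 2.05, 1.95, 1.8]
stop_epoch = first_epoch_below(history)
print(stop_epoch)
```

In a training loop, the history would be extended after each epoch and training halted as soon as the function returns a value.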
	
</details>



<hr>



## Additional Features

<details>
<summary>
There are many advanced features, described below
</summary>

<hr>

### Filtering

---

#### filter by layer types 
	
```python
ww.LAYER_TYPE.CONV1D | ww.LAYER_TYPE.CONV2D | ww.LAYER_TYPE.DENSE
```
for example:

```python
details=watcher.analyze(layers=[ww.LAYER_TYPE.CONV2D])

```

#### filter by layer ID or name
	
```python
details=watcher.analyze(layers=[20])
```

### Calculations

---

#### minimum, maximum number of eigenvalues of the layer weight matrix

Sets the minimum and maximum size of the weight matrices analyzed.
Setting max is useful for quick debugging.

```python
details = watcher.analyze(min_evals=50, max_evals=500)
```

#### specify the Power Law fitting procedure

To replicate results using TPL or E_TPL fits, use:

```python
details = watcher.analyze(fit='TPL')  # fit is one of 'PL', 'TPL', 'E_TPL'
```

The `details` dataframe will now contain two quality metrics for each layer:
- `alpha` : basically (but not exactly) the same PL exponent as before, useful for `alpha > 2.0`
- `Lambda` : a new metric, now useful when the (TPL) `alpha < 2.0`

(The TPL fits correct a problem we have had when the PL fits over-estimate `alpha` for TPL layers)

As with the `alpha` metric, smaller `Lambda` implies better generalization.

### Visualization

---

#### Save all model figures

Saves the layer ESD plots for each layer 

```python
watcher.analyze(savefig='/plot_save_directory')  # or savefig=True
```

generating 4 files per layer
<pre>
ww.layer#.esd1.png
ww.layer#.esd2.png
ww.layer#.esd3.png
ww.layer#.esd4.png
</pre>

**Note:** additional plots will be saved when `randomize` option is used
							       
#### fit ESDs to a Marchenko-Pastur (MP) distribution

The `mp_fit` option tells WW to fit each layer ESD, treated as a Random Matrix, to a Marchenko-Pastur (MP) distribution, as described in our papers on HT-SR.

```python
details = watcher.analyze(mp_fit=True, plot=True)
```
and reports the 
```python
num_spikes, mp_sigma, mp_softrank
```
It also works for the randomized ESD and reports
```python
rand_num_spikes, rand_mp_sigma, rand_mp_softrank
```

#### fetch the ESD for a specific layer, for visualization or additional analysis

```python
watcher.analyze()
esd = watcher.get_ESD()
```

### Model Analysis

---

#### describe a model 
Describe a model and report the `details` dataframe, without analyzing it

```python
details = watcher.describe(model=model)
```

#### comparing two models 
The new distances method reports the distances between two models, such as the norm between the initial weight matrices and the final, trained weight matrices

```python
details = watcher.distances(initial_model, trained_model)
```
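
As an illustration of the kind of quantity involved, the sketch below computes the Frobenius-norm distance between an initial and a trained weight matrix, given as nested lists. This is only an assumption about one reasonable distance; the metric and options actually used by `watcher.distances` are not reproduced here, and the function name `frobenius_distance` is hypothetical.

```python
import math

def frobenius_distance(w_a, w_b):
    """Frobenius norm of (w_a - w_b) for two same-shaped matrices given as nested lists."""
    if len(w_a) != len(w_b) or any(len(ra) != len(rb) for ra, rb in zip(w_a, w_b)):
        raise ValueError("weight matrices must have the same shape")
    squared = sum((a - b) ** 2 for ra, rb in zip(w_a, w_b) for a, b in zip(ra, rb))
    return math.sqrt(squared)

initial = [[1.0, 2.0], [3.0, 4.0]]
trained = [[1.0, 2.0], [3.0, 6.0]]
dist = frobenius_distance(initial, trained)
print(dist)
```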

### Compatibility

---

#### compatibility with version 0.2.x

The new 0.4.x version of WeightWatcher treats each layer as a single, unified set of eigenvalues.
In contrast, the 0.2.x versions split the Conv2D layers into n slices, one for each receptive field.
The `pool=False` option (which used to be called `ww2x=True`) provides results which are backward-compatible with the 0.2.x version of WeightWatcher,
with details provided for each slice of each layer.
Otherwise, the eigenvalues from each slice of the Conv2D layer are pooled into one ESD.

```python
details = watcher.analyze(pool=False)
```
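
To show what a per-slice view of a Conv2D layer looks like, the sketch below splits a kernel into one matrix per kernel position. It assumes the PyTorch weight layout `(out_channels, in_channels, kernel_h, kernel_w)` and uses nested lists; the function name `conv2d_slices` is hypothetical and WeightWatcher's internal slicing is not reproduced here.

```python
def conv2d_slices(weight):
    """Split weight[out][in][i][j] into one out-by-in matrix per kernel position (i, j)."""
    n_out, n_in = len(weight), len(weight[0])
    k_h, k_w = len(weight[0][0]), len(weight[0][0][0])
    return [
        [[weight[o][c][i][j] for c in range(n_in)] for o in range(n_out)]
        for i in range(k_h)
        for j in range(k_w)
    ]

# Toy 3x3 kernel with 4 output and 2 input channels; each entry encodes its own index.
kernel = [
    [[[1000 * o + 100 * c + 10 * i + j for j in range(3)] for i in range(3)]
     for c in range(2)]
    for o in range(4)
]
slices = conv2d_slices(kernel)
print(len(slices), len(slices[0]), len(slices[0][0]))
```

With `pool=False` each such slice would get its own ESD; with pooling, the eigenvalues from all slices are combined into a single ESD for the layer.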

</details>

<hr>

## Requirements

- Python 3.7+

### Frameworks supported

- Tensorflow 2.x / Keras
- PyTorch 1.x
- HuggingFace 

Note:  the current version requires both tensorflow and torch; if there is demand, this will be updated to make installation easier.

### Layers supported 

- Dense / Linear / Fully Connected (and Conv1D)
- Conv2D

## Tips for First Time Users

<details>
<summary>
When using WeightWatcher for the first time, I recommend selecting at least one trained model, and running `weightwatcher` with all analyze options enabled, including the plots.  From this, look for:
</summary>

- if the layers ESDs are well formed and heavy tailed
- if any layers are nearly random, indicating they are not well trained
- if all the power law fits appear reasonable, and `xmin` is small enough that the fit captures a reasonable section of the ESD tail

Moreover, the power law and alpha fits only work well when the ESDs are both heavy tailed *and* can be easily fit to a single power law.
Occasionally the power law and/or alpha fits don't work.  This happens when
- the ESD is random (not heavy tailed), `alpha > 8.0`
- the ESD is multimodal (rare, but does occur)
- the ESD is heavy tailed, but not well described by a single power law.  In these cases, sometimes `alpha` only fits the **very last** part of the tail, and is **too** large. This is easily seen on the Lin-Lin plots

In any of these cases, I usually throw away results where `alpha > 8.0` because they are spurious. If you suspect your layers are undertrained, you have to look at both `alpha` and a plot of the ESD itself (to see if it is heavy tailed or just random-like).
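
The `alpha > 8.0` cut-off can be applied before averaging, as in the sketch below. It works on plain dictionaries standing in for rows of the `details` dataframe; the function name `drop_spurious_alphas` and the sample rows are made up for illustration.

```python
def drop_spurious_alphas(layer_rows, max_alpha=8.0):
    """Keep only the layers whose fitted alpha does not exceed max_alpha."""
    return [row for row in layer_rows if row["alpha"] <= max_alpha]

# Hypothetical per-layer fits; layer 7 looks random-like (alpha far above 8).
rows = [
    {"layer_id": 3, "alpha": 2.9},
    {"layer_id": 7, "alpha": 11.4},
    {"layer_id": 9, "alpha": 3.7},
]
kept = drop_spurious_alphas(rows)
mean_alpha = sum(row["alpha"] for row in kept) / len(kept)
print([row["layer_id"] for row in kept], round(mean_alpha, 2))
```

Dropping a layer from the average does not make it healthy, so the excluded layers are still worth inspecting on the ESD plots.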

</details>
	
<hr>

## How to Release
<details>
<summary>
Publishing to the PyPI repository:
</summary>

```sh
# 1. Check in the latest code with the correct revision number (__version__ in __init__.py)
vi weightwatcher/__init__.py # Increase release number, remove -dev from revision number
git commit
# 2. Check out latest version from the repo in a fresh directory
cd ~/temp/
git clone https://github.com/CalculatedContent/WeightWatcher
cd WeightWatcher/
# 3. Use the latest version of the tools
python -m pip install --upgrade setuptools wheel twine
# 4. Create the package
python setup.py sdist bdist_wheel
# 5. Test the package
twine check dist/*
# 6. Upload the package to TestPyPI first
twine upload --repository testpypi dist/*
# 7. Test the TestPyPI install
python3 -m pip install --index-url https://test.pypi.org/simple/ weightwatcher
...
# 8. Upload to actual PyPI
twine upload dist/*
# 9. Tag/Release in github by creating a new release (https://github.com/CalculatedContent/WeightWatcher/releases/new)
```

</details>

<hr>

## License

[Apache License 2.0](LICENSE.txt)

<hr>

## Academic Presentations and Media Appearances

This tool is based on state-of-the-art research done in collaboration with UC Berkeley:

<details>
<summary>
WeightWatcher has been featured in top journals like JMLR and Nature:	
</summary>

#### Latest papers and talks

- [SETOL: A Semi-Empirical Theory of (Deep) Learning] (in progress)

- [Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics](https://arxiv.org/abs/2106.00734)

- [Evaluating natural language processing models with robust generalization metrics that do not need access to any training or testing data](https://arxiv.org/abs/2202.02842)

- [(Nature paper) Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data](https://www.nature.com/articles/s41467-021-24025-8)

  - [Repo for Nature paper](https://github.com/CalculatedContent/ww-trends-2020)

- [(JMLR in press) Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning](https://arxiv.org/abs/1810.01075)

- [Traditional and Heavy Tailed Self Regularization in Neural Network Models](https://arxiv.org/abs/1901.08276)

  - Notebook for above 2 papers (https://github.com/CalculatedContent/ImplicitSelfRegularization)

- [ICML 2019 Theoretical Physics Workshop Paper](https://github.com/CalculatedContent/PredictingTestAccuracies/blob/master/ICMLPhysicsWorkshop/icml_prl_TPDLW2019_fin.pdf)

- [Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks](https://arxiv.org/abs/1901.08278)

  - Notebook for paper (https://github.com/CalculatedContent/PredictingTestAccuracies)

- [Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior](https://arxiv.org/abs/1710.09553)
	
</details>

<details>
<summary>
and has been presented at Stanford, UC Berkeley, KDD, etc:
</summary>

- [NERSC Summer 2018](https://www.youtube.com/watch?v=_Ni5UDrVwYU)
- [UC Berkeley/ICSI 12/13/2018](https://www.youtube.com/watch?v=6Zgul4oygMc)

- [Institute for Pure & Applied Mathematics (IPAM)](https://www.youtube.com/watch?v=fmVuNRKsQa8)
- [Physics Informed Machine Learning](https://www.youtube.com/watch?v=eXhwLtjtUsI)

- [Talk at Stanford ICME 2020](https://www.youtube.com/watch?v=PQUItQi-B-I)

- [Talk at UCL (UK) 2022](https://www.youtube.com/watch?v=sOXROWJ70Pg)

#### KDD2019 Workshop

- [KDD 2019 Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks](https://dl.acm.org/doi/abs/10.1145/3292500.3332294)

- [KDD 2019 Workshop: Slides](https://www.stat.berkeley.edu/~mmahoney/talks/dnn_kdd19_fin.pdf) 
	
</details>

<details>
<summary>
WeightWatcher has also been featured at local meetups and many popular podcasts
</summary>
	
#### Popular Podcasts and Blogs

- [This Week in ML](https://twimlai.com/meetups/implicit-self-regularization-in-deep-neural-networks/)
 
- [Data Science at Home Podcast](https://podcast.datascienceathome.com/e/episode-70-validate-neural-networks-without-data-with-dr-charles-martin/)

- [Aggregate Intellect VLog](https://aisc.ai.science/events/2019-11-06)

- [Rebellion Research VLog](https://blog.rebellionresearch.com/blog/theoretical-physicist-dr-charles-martin-on-deep-learning)

- [Rebellion Research BLog](https://www.rebellionresearch.com/why-does-deep-learning-work)

- [LightOn AI Meetup](https://www.youtube.com/watch?v=tciq7t3rj98)

- [The Silicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)

- [Applied AI Community](https://www.youtube.com/watch?v=xLZOf2IDLkc&feature=youtu.be)

- [Practical AI](https://changelog.com/practicalai/194)

- [Latest Results](https://www.youtube.com/watch?v=rojbXvK9mJg)

#### 2021 Short Presentations

- [MLC Research Jam  March 2021](presentations/ww_5min_talk.pdf)

- [PyTorch2021 Poster  April 2021](presentations/pytorch2021_poster.pdf)

#### Recent talk(s) by Mike Mahoney, UC Berkeley

- [IARAI, the Institute for Advanced Research in Artificial Intelligence](https://www.youtube.com/watch?v=Pirni67ZmRQ)

</details>

<hr>

## Experimental / Most Recent version    (not ready yet)

You may install the latest / Trunk from testpypi

	python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple weightwatcher

The testpypi version usually has the most recent updates, including experimental methods and bug fixes.
However, pypi has changed the way it handles testpypi packages that require non-testpypi dependencies;
e.g., torch and tensorflow fail to install from testpypi.

If you already have them installed in your env, you're fine.
Otherwise, you need to install them first.
<hr>

## Contributors

[Charles H Martin, PhD](https://www.linkedin.com/in/charlesmartin14)
[Calculation Consulting](https://calculationconsulting.com)

[Serena Peng](https://www.linkedin.com/in/serenapeng)
[Christopher Hinrichs](https://www.linkedin.com/in/chris-hinrichs-203a222b/)

<hr>

#### Consulting Practice

[Calculation Consulting homepage](https://calculationconsulting.com)

[Calculated Content Blog](https://calculatedcontent.com)

Raw data

            {
    "_id": null,
    "home_page": "https://calculationconsulting.com/",
    "name": "weightwatcher",
    "maintainer": "Calculation Consulting",
    "docs_url": null,
    "requires_python": ">= 3.3",
    "maintainer_email": "info@calculationconsulting.com",
    "keywords": "Deep Learning Keras Tensorflow pytorch Deep Learning DNN Neural Networks",
    "author": "Calculation Consulting",
    "author_email": "info@calculationconsulting.com",
    "download_url": "",
    "platform": null,
    "description": "[![Downloads](http://pepy.tech/badge/weightwatcher)](http://pepy.tech/project/weightwatcher)\n[![PyPI](https://img.shields.io/pypi/v/weightwatcher?color=teal&label=release)](https://pypi.org/project/weightwatcher/)\n[![GitHub](https://img.shields.io/github/license/calculatedcontent/weightwatcher?color=blue)](./LICENSE.txt)\n[![Published in Nature](https://img.shields.io/badge/Published%20in-Nature-teal)](https://nature.com/articles/s41467-021-24025-8)\n[![Video Tutorial](https://img.shields.io/badge/Video-Tutorial-blue)](https://www.youtube.com/watch?v=Tnafo6JVoJs)\n[![Discord](https://img.shields.io/discord/1026957040133873745?color=teal&label=discord)](https://discord.gg/uVVsEAcfyF)\n[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue)](https://www.linkedin.com/in/charlesmartin14/)\n[![Blog CalculatedContent](https://img.shields.io/badge/Blog-teal)](https://www.calculatedcontent.com)\n\n\n[![WeightWatcher Logo](./img/WW-logo-long.jpg)](https://weightwatcher.ai)\n\n\n\n**WeightWatcher** (WW) is an open-source, diagnostic tool for analyzing Deep Neural Networks (DNN), without needing access to training or even test data.  It is based on theoretical research into Why Deep Learning Works, based on our Theory of Heavy-Tailed Self-Regularization (HT-SR).  
It uses ideas from Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems.\n\nIt can be used to:\n\n- analyze pre/trained pyTorch, Keras, DNN models (Conv2D and Dense layers)\n- monitor models, and the model layers, to see if they are over-trained or over-parameterized\n- predict test accuracies across different models, with or without training data\n- detect potential problems when compressing or fine-tuning pretrained models\n- layer warning labels: over-trained; under-trained\n\n\n## Quick Links \n\n- Please see [our latest talk from the Sillicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)\n\n- Join the [Discord Server](https://discord.gg/uVVsEAcfyF) \n\n- For a deeper dive into the theory, see [our latest talk at ENS](https://youtu.be/xEuBwBj_Ov4)\n\n- and some of the most recent Podcasts:\n\n  - [Practical AI](https://changelog.com/practicalai/194)\n  - [The Prompt Desk](https://smartlink.ausha.co/the-prompt-desk/data-free-quality-analysis-of-deep-neural-nets-with-charles-h-martin)\n\n- More details and demos can be found on the [Calculated Content Blog](https://calculatedcontent.com/)\n\nAnd in the notebooks provided in the [examples](https://github.com/CalculatedContent/WeightWatcher/tree/master/examples) directory\n\n## Installation:  Version 0.7.5.1\n\n```sh\npip install weightwatcher\n```\n\nif this fails try\n\n### Current TestPyPI  Version 0.7.5.2\n\n```sh\n python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple weightwatcher\n ```\n\n\n\n\n## Usage\n\n```python\nimport weightwatcher as ww\nimport torchvision.models as models\n\nmodel = models.vgg19_bn(pretrained=True)\nwatcher = ww.WeightWatcher(model=model)\ndetails = watcher.analyze()\nsummary = watcher.get_summary(details)\n```\n\nIt is as easy to run and generates a pandas dataframe with details (and plots) for each layer\n\n![Sample Details Dataframe](./img/sample-ww-details.png)\n\nand `summary` 
dictionary of generalization metrics\n\n```python\n    {'log_norm': 2.11,      'alpha': 3.06,\n      'alpha_weighted': 2.78,\n      'log_alpha_norm': 3.21,\n      'log_spectral_norm': 0.89,\n      'stable_rank': 20.90,\n      'mp_softrank': 0.52}\n```\n\n## Advanced Usage \n\nThe `watcher` object has several functions and analysis features described below\n\nNotice the min_evals setting:  the power law fits need at least 50 eigenvalues to make sense\nbut the describe and other methods do not\n\n```python\nwatcher.analyze(model=None, layers=[], min_evals=50, max_evals=None,\n\t plot=True, randomize=True, mp_fit=True, pool=True, savefig=True):\n...\nwatcher.describe(self, model=None, layers=[], min_evals=0, max_evals=None,\n         plot=True, randomize=True, mp_fit=True, pool=True):\n...\nwatcher.get_details()\nwatcher.get_summary(details) or get_summary()\nwatcher.get_ESD()\n...\nwatcher.distances(model_1, model_2)\n```\n\n## PEFT / LORA models  (experimental)\nTo analyze an PEFT / LORA fine-tuned model, specify the peft option.\n\n - peft = True:  Forms the BA low rank matric and analyzes the delta layers, with 'lora_BA\" tag in name\n \n   ```details = watcher.analyze(peft='peft_only')```\n\n - peft = 'with_base':  Analyes the base_model, the delta, and the combined layer weight matrices.  \n \n   ```details = watcher.analyze(peft=True)```\n   \n\nThe base_model and fine-tuned model must have the same layer names.  And weightwatcher will ignore layers that do not share the same name.\nAlso,at this point, biases are not considered.  
Finally, both models should be stored in the same format (i.e safetensors)\n\nNote: If you want to select by layer_ids, you must first run describe(peft=False), and then select *both* the lora_A and lora_B layers\n\n#### Usage: Base Model\n![Usage: Base Model](./img/ww0.7.4.jpeg)\n\n\n## Ploting and Fitting the Empirical Spectral Density (ESD)\n\nWW creates plots for each layer weight matrix to observe how well the power law fits work\n\n```python\ndetails = watcher.analyze(plot=True)\n```\n\nFor each layer, WeightWatcher plots the ESD--a histogram of the eigenvalues of the layer correlation matrix **X=W<sup>T</sup>W**.  It then fits the tail of ESD to a (Truncated) Power Law, and plots these fits on different axes. The summary metrics (above) characterize the Shape and Scale of each ESD.  Here's an example:\n\n<img src=\"./img/ESD-plots.png\" width='800px'  height='auto' />\n\nGenerally speaking, the ESDs in the best layers, in the best DNNs can be fit to a Power Law (PL), with PL exponents `alpha` closer to `2.0`.\nVisually, the ESD looks like a straight line on a log-log plot (above left).\n\n## Generalization Metrics\n\n<details>\n  <summary>\nThe goal of the WeightWatcher project is find generalization metrics that most accurately reflect observed test accuracies, across many different models and architectures, for pre-trained models and models undergoing training.\n\t  \n</summary>\n\t\n\n[Our HTSR theory](https://jmlr.org/papers/volume22/20-410/20-410.pdf) says that well trained, well correlated layers should be signficantly different from the MP (Marchenko-Pastur) random bulk, and specifically to be heavy tailed. 
There are different layer metrics in WeightWatcher for this, including:\n\n- `rand_distance` : the distance in distribution from the randomized layer\n- `alpha` : the slope of the tail of the ESD, on a log-log scale\n- `alpha-hat` or `alpha_weighted` : a scale-adjusted form of `alpha` (similar to the alpha-Schatten-Norm)\n- `stable_rank` : a norm-adjusted measure of the scale of the ESD\n- `num_spikes` : the number of spikes outside the MP bulk region\n- `max_rand_eval` : scale of the random noise etc\n\nAll of these attempt to measure how non-random and/or heavy-tailed the layer ESDs are.  \n\n\n#### Scale Metrics \n\n- `log_norm` (log Frobenius norm) :  <img src=\"https://render.githubusercontent.com/render/math?math=\\log_{10}\\Vert\\mathbf{W}\\Vert^{2}_{F}\">\n- `log_spectral_norm` :   <img src=\"https://render.githubusercontent.com/render/math?math=\\log_{10}\\lambda_{max}=\\log_{10}\\Vert\\mathbf{W}\\Vert^{2}_{\\infty}\">\n\n- `stable_rank` :  <img src=\"https://render.githubusercontent.com/render/math?math=R_{stable}=\\Vert\\mathbf{W}\\Vert^{2}_{F}/\\Vert\\mathbf{W}\\Vert^{2}_{\\infty}\">\n- `mp_softrank` :  <img src=\"https://render.githubusercontent.com/render/math?math=R_{MP}=\\lambda_{MP}/\\lambda_{max}\">\n \n#### Shape Metrics\n\n - `alpha` : <img src=\"https://render.githubusercontent.com/render/math?math=\\alpha\"> Power Law (PL) exponent \n - (Truncated) PL quality of fit `D` : <img src=\"https://render.githubusercontent.com/render/math?math=D\"> (the Kolmogorov-Smirnov Distance metric)\n\n\n\n\n(advanced usage)\n - TPL : (alpha and Lambda) Truncated Power Law Fit\n - E_TPL : (alpha and Lambda) Extended Truncated Power Law Fit\n\n\n \n#### Scale-adjusted Shape Metrics\n\n- `alpha_weighted` :  <img src=\"https://render.githubusercontent.com/render/math?math=\\hat{\\alpha}=\\alpha\\log_{10}\\lambda_{max}\">\n- `log_alpha_norm` : (Schatten norm): <img 
src=\"https://render.githubusercontent.com/render/math?math=\\log_{10}\\Vert\\mathbf{X}\\Vert^{\\alpha}_{\\alpha}\">\n\n#### Direct Correlation Metrics \n\nThe random distance metric is a new, non-parametric approach that appears to work well in early testing.\n [See this recent blog post](https://calculatedcontent.com/2021/10/17/fantastic-measures-of-generalization-that-actually-work-part-1/)\n\n- `rand_distance` : <img src=\"https://render.githubusercontent.com/render/math?math=div(\\mathbf{W},rand(\\mathbf{W}))\">   Distance of layer ESD from the ideal RMT MP ESD\n\nThere are also related metrics, including the new\n\n- `ww_maxdist`\n- `ww_softrank`\n\n#### Misc Details\n\n- `N, M` :  Matrix or Tensor Slice Dimensions\n- `num_spikes` :  number of spikes outside the bulk region of the ESD, when fit to an MP distribution\n- `num_rand_spikes` :  number of Correlation Traps\n- `max_rand_eval` : scale of the random noise in the layer\n\n\n#### Summary Statistics: \nThe layer metrics are averaged in the **summary** statistics:\n\nGet the average metrics, as a `summary` (dict), from the given (or current) `details` dataframe\n\n```python\ndetails = watcher.analyze(model=model)\nsummary = watcher.get_summary(details)\n```\nor just\n```python\nsummary = watcher.get_summary()\n```\n\nThe summary statistics can be used to gauge the test error of a series of pre/trained models, without needing access to training or test data.\n\n- average `alpha` can be used to compare one or more DNN models with different hyperparameter settings **&theta;**, when depth is not a driving factor (e.g., transformer models)\n- average `log_spectral_norm` is useful to compare models of different depths **L** at a coarse grain level\n- average `alpha_weighted` and `log_alpha_norm` are suitable for DNNs of differing hyperparameters **&theta;** and depths **L** simultaneously. 
(e.g., CV models like VGG and ResNet)\n\n\n#### Predicting the Generalization Error\n\n\nWeightWatcher (WW) can be used to compare the test error for a series of models, trained on similar datasets, but with different hyperparameters **&theta;**, or even different but related architectures.  \n\t\nOur Theory of HT-SR predicts that models with smaller PL exponents `alpha`, on average, correspond to models that generalize better.\n\nHere is an example of the `alpha_weighted` capacity metric for all the current pretrained VGG models.\n\n<img src=\"https://github.com/CalculatedContent/PredictingTestAccuracies/blob/master/img/vgg-w_alphas.png\" width='600px' height='auto' />\n\nNote: we *did not peek* at the ImageNet test data to build this plot.\n\t\nThis can be reproduced with the Examples Notebooks for [VGG](https://github.com/CalculatedContent/WeightWatcher/blob/master/examples/WW-VGG.ipynb) and also for [ResNet](https://github.com/CalculatedContent/WeightWatcher/blob/master/examples/WW-ResNet.ipynb)\n\n</details>\n\n## Detecting signs of Over-Fitting and Under-Fitting\n\nWeightWatcher can help you detect the signatures of over-fitting and under-fitting in specific layers of a pre/trained Deep Neural Network.\n\nWeightWatcher will analyze your model, layer-by-layer, and show you where these kinds of problems may be lurking.\n\n### Correlation Traps\n\n<details>\n <summary>\nThe <code>randomize</code> option lets you compare the ESD of the layer weight matrix (W) to the ESD of its randomized form.\nThis is a good way to visualize the correlations in the true ESD, and detect signatures of over- and under-fitting\n </summary>\n\n\t\n```python\ndetails = watcher.analyze(randomize=True, plot=True)\n```\n\nFig (a) is well trained; Fig (b) may be over-fit.\n\t\nThat orange spike on the far right is the tell-tale clue; it's called a **Correlation Trap**.  
\n\nA **Correlation Trap** is characterized by Fig (b); here the actual (green) and random (red) ESDs look almost identical, except for a small shelf of correlation (just right of 0). And in the random (red) ESD, the largest eigenvalue (orange) is far to the right of, and separated from, the bulk of the ESD.\n\t\n![Correlation Traps](./img/correlation_trap.jpeg)\n\t\nWhen layers look like Figure (b) above, they have not been trained properly because they look almost random, with only a little bit of information present. And the information the layer learned may even be spurious.\n\t\nMoreover, the metric `num_rand_spikes` (in the `details` dataframe) contains the number of spikes (or traps) that appear in the layer.\n\nThe `SVDSharpness` transform can be used to remove Correlation Traps during training (after each epoch) or after training using \n\t\n```python\nsharpened_model = watcher.SVDSharpness(model=...)\n```\n\t\nSharpening a model is similar to clipping the layer weight matrices, but uses Random Matrix Theory to do this in a more principled way than simple clipping.\n\t\n</details>\n\n### Early Stopping\n<details>\n <summary>\n\t <b>Note:</b> This is experimental but we have seen some success here\n </summary>\n\t\nThe WeightWatcher `alpha` metric may be used to detect when to apply early stopping.  When the average `alpha` (summary statistic) drops below `2.0`, this indicates that the model may be over-trained and early stopping is necessary.\n\nBelow is an example of this, showing training loss and test loss curves for a small Transformer model, trained from scratch, along with the average `alpha` summary statistic.\n\n![Early Stopping](./img/early_stopping.png)\n\nWe can see that as the training and test losses decrease, so does `alpha`. 
But when the test loss saturates and then starts to increase, `alpha` drops below `2.0`.\n\t\n**Note:** this only works for very well trained models, where the optimal `alpha=2.0` is obtained\n\t\n</details>\n\n\n\n<hr>\n\n\n\n## Additional Features\n\n<details>\n<summary>\nThere are many advanced features, described below\n</summary>\n\n<hr>\n\n### Filtering\n\n---\n\n#### filter by layer types \n\t\n```python\nww.LAYER_TYPE.CONV2D | ww.LAYER_TYPE.DENSE\n```\nas in\n\n```python\ndetails=watcher.analyze(layers=[ww.LAYER_TYPE.CONV2D])\n\n```\n\n#### filter by layer ID or name\n\t\n```python\ndetails=watcher.analyze(layers=[20])\n```\n\n### Calculations\n\n---\n\n#### minimum, maximum number of eigenvalues of the layer weight matrix\n\nSets the minimum and maximum size of the weight matrices analyzed.\nSetting `max_evals` is useful for quick debugging.\n\n```python\ndetails = watcher.analyze(min_evals=50, max_evals=500)\n```\n\n#### specify the Power Law fitting procedure\n\nTo replicate results using TPL or E_TPL fits, use:\n\n```python\ndetails = watcher.analyze(fit='TPL')  # fit is one of 'PL', 'TPL', 'E_TPL'\n```\n\nThe `details` dataframe will now contain two quality metrics for each layer:\n- `alpha` : basically (but not exactly) the same PL exponent as before, useful for `alpha > 2.0`\n- `Lambda` : a new metric, now useful when the (TPL) `alpha < 2.0`\n\n(The TPL fits correct a problem we have had when the PL fits over-estimate `alpha` for TPL layers)\n\nAs with the `alpha` metric, smaller `Lambda` implies better generalization.\n\n### Visualization\n\n---\n\n#### Save all model figures\n\nSaves the layer ESD plots for each layer \n\n```python\nwatcher.analyze(savefig='/plot_save_directory')  # or savefig=True\n```\n\ngenerating 4 files per layer\n<pre>\nww.layer#.esd1.png\nww.layer#.esd2.png\nww.layer#.esd3.png\nww.layer#.esd4.png\n</pre>\n\n**Note:** additional plots will be saved when the `randomize` option is used\n\t\t\t\t\t\t\t       \n#### fit ESDs to a 
Marchenko-Pastur (MP) distribution\n\nThe `mp_fit` option tells WW to fit each layer ESD, as a Random Matrix, to a Marchenko-Pastur (MP) distribution, as described in our papers on HT-SR.\n\n```python\ndetails = watcher.analyze(mp_fit=True, plot=True)\n```\nand reports the \n```python\nnum_spikes, mp_sigma, mp_softrank\n```\nAlso works for randomized ESD and reports\n```python\nrand_num_spikes, rand_mp_sigma, rand_mp_softrank\n```\n\n#### fetch the ESD for a specific layer, for visualization or additional analysis\n\n```python\nwatcher.analyze()\nesd = watcher.get_ESD()\n```\n\n### Model Analysis\n\n---\n\n#### describe a model \nDescribe a model and report the `details` dataframe, without analyzing it\n\n```python\ndetails = watcher.describe(model=model)\n```\n\n#### comparing two models \nThe new distances method reports the distances between two models, such as the norm between the initial weight matrices and the final, trained weight matrices\n\n```python\ndetails = watcher.distances(initial_model, trained_model)\n```\n\n### Compatibility\n\n---\n\n#### compatibility with version 0.2.x\n\nThe new 0.4.x version of WeightWatcher treats each layer as a single, unified set of eigenvalues.\nIn contrast, the 0.2.x versions split the Conv2D layers into n slices, one for each receptive field.\nThe `pool=False` option provides results which are backward compatible with the 0.2.x version of WeightWatcher,\n(which used to be called `ww2x=True`) with details provided for each slice for each layer.\nOtherwise, the eigenvalues from each slice of the Conv2D layer are pooled into one ESD.\n\n```python\ndetails = watcher.analyze(pool=False)\n```\n\n</details>\n\n<hr>\n\n## Requirements\n\n- Python 3.7+\n\n### Frameworks supported\n\n- Tensorflow 2.x / Keras\n- PyTorch 1.x\n- HuggingFace \n\nNote:  the current version requires both tensorflow and torch; if there is demand, this will be updated to make installation easier.\n\n### Layers supported \n\n- Dense / Linear / Fully 
Connected (and Conv1D)\n- Conv2D\n\n## Tips for First Time Users\n\n<details>\n<summary>\nWhen using WeightWatcher for the first time, I recommend selecting at least one trained model, and running `weightwatcher` with all analyze options enabled, including the plots.  From this, look for:\n</summary>\n\n- if the layer ESDs are well formed and heavy tailed\n- if any layers are nearly random, indicating they are not well trained\n- if all the power law fits appear reasonable, and `xmin` is small enough that the fit captures a reasonable section of the ESD tail\n\nMoreover, the Power Law and alpha fits only work well when the ESDs are both heavy tailed *and* can be easily fit to a single power law.\nOccasionally the power law and/or alpha fits don't work.  This happens when\n- the ESD is random (not heavy tailed), `alpha > 8.0`\n- the ESD is multimodal (rare, but does occur)\n- the ESD is heavy tailed, but not well described by a single power law.  In these cases, sometimes `alpha` only fits the **very last** part of the tail, and is **too** large. This is easily seen on the Lin-Lin plots\n\nIn any of these cases, I usually throw away results where `alpha > 8.0` because they are spurious. If you suspect your layers are undertrained, you have to look both at `alpha` and a plot of the ESD itself (to see if it is heavy tailed or just random-like).\n\n</details>\n\t\n<hr>\n\n## How to Release\n<details>\n<summary>\nPublishing to the PyPI repository:\n</summary>\n\n```sh\n# 1. Check in the latest code with the correct revision number (__version__ in __init__.py)\nvi weightwatcher/__init__.py # Increase release number, remove -dev from revision number\ngit commit\n# 2. Check out latest version from the repo in a fresh directory\ncd ~/temp/\ngit clone https://github.com/CalculatedContent/WeightWatcher\ncd WeightWatcher/\n# 3. Use the latest version of the tools\npython -m pip install --upgrade setuptools wheel twine\n# 4. 
Create the package\npython setup.py sdist bdist_wheel\n# 5. Test the package\ntwine check dist/*\n# 6. Upload the package to TestPyPI first\ntwine upload --repository testpypi dist/*\n# 7. Test the TestPyPI install\npython3 -m pip install --index-url https://test.pypi.org/simple/ weightwatcher\n...\n# 8. Upload to actual PyPI\ntwine upload dist/*\n# 9. Tag/Release in GitHub by creating a new release (https://github.com/CalculatedContent/WeightWatcher/releases/new)\n```\n\n</details>\n\n<hr>\n\n## License\n\n[Apache License 2.0](LICENSE.txt)\n\n<hr>\n\n## Academic Presentations and Media Appearances\n\nThis tool is based on state-of-the-art research done in collaboration with UC Berkeley:\n\n<details>\n<summary>\nWeightWatcher has been featured in top journals like JMLR and Nature:\t\n</summary>\n\n#### Latest papers and talks\n\n- [SETOL: A Semi-Empirical Theory of (Deep) Learning] (in progress)\n\n- [Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics](https://arxiv.org/abs/2106.00734)\n\n- [Evaluating natural language processing models with robust generalization metrics that do not need access to any training or testing data](https://arxiv.org/abs/2202.02842)\n\n- [(Nature paper) Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data](https://www.nature.com/articles/s41467-021-24025-8)\n\n  - [Repo for Nature paper](https://github.com/CalculatedContent/ww-trends-2020)\n\n- [(JMLR in press) Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning](https://arxiv.org/abs/1810.01075)\n\n- [Traditional and Heavy Tailed Self Regularization in Neural Network Models](https://arxiv.org/abs/1901.08276)\n\n  - Notebook for above 2 papers (https://github.com/CalculatedContent/ImplicitSelfRegularization)\n\n- [ICML 2019 Theoretical Physics Workshop 
Paper](https://github.com/CalculatedContent/PredictingTestAccuracies/blob/master/ICMLPhysicsWorkshop/icml_prl_TPDLW2019_fin.pdf)\n\n- [Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks](https://arxiv.org/abs/1901.08278)\n\n  - Notebook for paper (https://github.com/CalculatedContent/PredictingTestAccuracies)\n\n- [Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior](https://arxiv.org/abs/1710.09553)\n\t\n</details>\n\n<details>\n<summary>\nand has been presented at Stanford, UC Berkeley, KDD, etc:\n</summary>\n\n- [NERSC Summer 2018](https://www.youtube.com/watch?v=_Ni5UDrVwYU)\n- [UC Berkeley/ICSI 12/13/2018](https://www.youtube.com/watch?v=6Zgul4oygMc)\n\n- [Institute for Pure & Applied Mathematics (IPAM)](https://www.youtube.com/watch?v=fmVuNRKsQa8)\n- [Physics Informed Machine Learning](https://www.youtube.com/watch?v=eXhwLtjtUsI)\n\n- [Talk at Stanford ICME 2020](https://www.youtube.com/watch?v=PQUItQi-B-I)\n\n- [Talk at UCL (UK) 2022](https://www.youtube.com/watch?v=sOXROWJ70Pg)\n\n#### KDD2019 Workshop\n\n- [KDD 2019 Workshop: Statistical Mechanics Methods for Discovering\n  Knowledge from Production-Scale Neural Networks](https://dl.acm.org/doi/abs/10.1145/3292500.3332294)\n\n- [KDD 2019 Workshop: Slides](https://www.stat.berkeley.edu/~mmahoney/talks/dnn_kdd19_fin.pdf) \n\t\n</details>\n\n<details>\n<summary>\nWeightWatcher has also been featured at local meetups and many popular podcasts\n</summary>\n\t\n#### Popular Podcasts and Blogs\n\n- [This Week in ML](https://twimlai.com/meetups/implicit-self-regularization-in-deep-neural-networks/)\n \n- [Data Science at Home Podcast](https://podcast.datascienceathome.com/e/episode-70-validate-neural-networks-without-data-with-dr-charles-martin/)\n\n- [Aggregate Intellect VLog](https://aisc.ai.science/events/2019-11-06)\n\n- [Rebellion Research 
VLog](https://blog.rebellionresearch.com/blog/theoretical-physicist-dr-charles-martin-on-deep-learning)\n\n- [Rebellion Research Blog](https://www.rebellionresearch.com/why-does-deep-learning-work)\n\n- [LightOn AI Meetup](https://www.youtube.com/watch?v=tciq7t3rj98)\n\n- [The Silicon Valley ACM meetup](https://www.youtube.com/watch?v=Tnafo6JVoJs)\n\n- [Applied AI Community](https://www.youtube.com/watch?v=xLZOf2IDLkc&feature=youtu.be)\n\n- [Practical AI](https://changelog.com/practicalai/194)\n\n- [Latest Results](https://www.youtube.com/watch?v=rojbXvK9mJg)\n\n#### 2021 Short Presentations\n\n- [MLC Research Jam  March 2021](presentations/ww_5min_talk.pdf)\n\n- [PyTorch2021 Poster  April 2021](presentations/pytorch2021_poster.pdf)\n\n#### Recent talk(s) by Mike Mahoney, UC Berkeley\n\n- [IARAI, the Institute for Advanced Research in Artificial Intelligence](https://www.youtube.com/watch?v=Pirni67ZmRQ)\n\n</details>\n\n<hr>\n\n## Experimental / Most Recent version    (not ready yet)\n\nYou may install the latest / Trunk from testpypi\n\n\tpython3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple weightwatcher\n\nThe testpypi version usually has the most recent updates, including experimental methods and bug fixes.\nHowever, PyPI has changed the way testpypi handles packages that require non-testpypi dependencies;\ne.g., torch and tensorflow fail on testpypi.\n\nIf you have them installed already in your env, you're fine.\nOtherwise, you need to install them first.\n\n<hr>\n\n## Contributors\n\n[Charles H Martin, PhD](https://www.linkedin.com/in/charlesmartin14)\n[Calculation Consulting](https://calculationconsulting.com)\n\n[Serena Peng](https://www.linkedin.com/in/serenapeng)\n[Christopher Hinrichs](https://www.linkedin.com/in/chris-hinrichs-203a222b/)\n\n<hr>\n\n#### Consulting Practice\n\n[Calculation Consulting homepage](https://calculationconsulting.com)\n\n[Calculated Content Blog](https://calculatedcontent.com)\n",
    "bugtrack_url": null,
    "license": "Apache License, Version 2.0",
    "summary": "Diagnostic Tool for Deep Neural Networks",
    "version": "0.7.5.2",
    "project_urls": {
        "Code": "https://github.com/calculatedcontent/weightwatcher",
        "Documentation": "https://calculationconsulting.com/",
        "Homepage": "https://calculationconsulting.com/",
        "Issue tracker": "https://github.com/calculatedcontent/weightwatcher/issues"
    },
    "split_keywords": [
        "deep",
        "learning",
        "keras",
        "tensorflow",
        "pytorch",
        "deep",
        "learning",
        "dnn",
        "neural",
        "networks"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d825b08034aea632d69472d16c7c013c3c60cc2e9ab46405f7619ecaa824700b",
                "md5": "e2e46c810187107b31cb987a7e99bbb0",
                "sha256": "a928cc4ca337935d8283178d0dfcfd47d785cbce51f8f553a41211682c433cee"
            },
            "downloads": -1,
            "filename": "weightwatcher-0.7.5.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e2e46c810187107b31cb987a7e99bbb0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">= 3.3",
            "size": 80110,
            "upload_time": "2024-03-06T07:30:13",
            "upload_time_iso_8601": "2024-03-06T07:30:13.824597Z",
            "url": "https://files.pythonhosted.org/packages/d8/25/b08034aea632d69472d16c7c013c3c60cc2e9ab46405f7619ecaa824700b/weightwatcher-0.7.5.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-06 07:30:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "calculatedcontent",
    "github_project": "weightwatcher",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "weightwatcher"
}

Calculation Consulting