# Fast Exact Summation Using Small and Large Superaccumulators (XSUM)
[![Python package](https://github.com/yafshar/xsum/workflows/Python%20package/badge.svg)](https://github.com/yafshar/xsum/actions)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/kd1sksf5t3kdsibn/branch/master?svg=true)](https://ci.appveyor.com/project/yafshar/xsum/branch/master)
[![PyPI](https://img.shields.io/pypi/v/xsum.svg)](https://pypi.python.org/pypi/xsum)
[![Anaconda-Server Badge](https://img.shields.io/conda/vn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)
[![License](https://img.shields.io/badge/license-LGPL--v2-blue)](LICENSE)
In applications like optimization or finding the sample mean of data, it is
desirable to use higher accuracy than a simple summation. Where in a simple
summation, rounding happens after each addition.
An exact summation is a way to achieve higher accuracy results.
[XSUM](#neal_2015) is a library for performing exact summation using
super-accumulators. It provides methods for exactly summing a set of
floating-point numbers, where using a simple summation and the rounding which
happens after each addition could be an important factor in many applications.
This library is an easy to use header-only cross-platform C++11 implementation
and also contains Python bindings ([please see the example](#Python)).
The main algorithm is taken from the original C library
[FUNCTIONS FOR EXACT SUMMATION](https://gitlab.com/radfordneal/xsum) described
in the paper
["Fast Exact Summation Using Small and Large Superaccumulators,"](#neal_2015) by
[Radford M. Neal](https://www.cs.toronto.edu/~radford).
The code is rewritten in C++ and amended with more functionalities with the goal
of ease of use. The provided Python bindings provide the *exact summation*
interface in a Python code.
The C++ code also includes extra summation functionalities, (parallel reduction
on multi-core architectures) which are especially useful in high-performance
message passing libraries (like [OpenMPI](https://www.open-mpi.org/) and
[MPICH](https://www.mpich.org/)). Where binding a user-defined global summation
operation to an `op` handle can subsequently be used in `MPI_Reduce,`
`MPI_Allreduce,` `MPI_Reduce_scatter,` and `MPI_Scan` or a similar calls.
- **NOTE:** To see or use or reproduce the results of the original
implementation reported in the paper
`Fast Exact Summation Using Small and Large Superaccumulators`, by Radford M.
Neal, please refer to
[FUNCTIONS FOR EXACT SUMMATION](https://gitlab.com/radfordneal/xsum).
## Usage
**_XSUM_** library presents two objects, or super-accumulators
`xsum_small_accumulator` and `xsum_large_accumulator`. It also provides methods
for summing floating-point numbers and rounding to the nearest floating-point
number.
A small superaccumulator uses sixty-seven 64-bit chunks, each with 32-bit
overlap with the next one. This accumulator is the preferred method for summing
a moderate number of terms. A large superaccumulator uses 4096 64-bit chunks and
is suitable for big summations. A small superaccumulator is also a component of
the large superaccumulator [1].
**_XSUM_** library provides two interfaces to use superaccumulators. The first
one is a function interface, which takes input and produces output, and in the
second one, supperaccumulators are represented as classes (`xsum_small` and
`xsum_large`.)
### C++
`xsum_small_accumulator` and `xsum_large_accumulator`, both have a default
constructor, thus they do not need to be initialized. Addition operation is
simply a `xsum_add`,
```cpp
// A small superaccumulator
xsum_small_accumulator sacc;
// Adding values to the small accumulator sacc
xsum_add(&sacc, 1.0);
xsum_add(&sacc, 2.0);
// Large superaccumulator
xsum_large_accumulator lacc;
// Adding values to the large accumulator lacc
xsum_add(&lacc, 1.0e-3);
xsum_add(&lacc, 2.0e-3);
```
or `xsum_small`, and `xsum_large` objects can simply be used as,
```cpp
// A small superaccumulator
xsum_small sacc;
// Adding values to the small accumulator sacc
sacc.add(1.0);
sacc.add(2.0);
// Large superaccumulator
xsum_large lacc;
// Adding values to the large accumulator lacc
lacc.add(1.0e-3);
lacc.add(2.0e-3);
```
One can also add a vector of numbers to a superaccumulator.
```cpp
// A small superaccumulator
xsum_small_accumulator sacc;
// Adding a vector of numbers
double vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};
xsum_add(&sacc, vec, 10);
std::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};
xsum_add(&sacc, v);
```
the same with `xsum_small`, and `xsum_large` objects as,
```cpp
// A small superaccumulator
xsum_small sacc;
// Adding a vector of numbers
double vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};
sacc.add(vec, 10);
std::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};
sacc.add(v);
```
The squared norm of a vector (sum of squares of elements) and the dot product of
two vectors (sum of products of corresponding elements) can be added to a
superaccumulator using `xsum_add_sqnorm` and `xsum_add_dot` respectively.
For exmaple,
```cpp
// A small superaccumulator
xsum_small_accumulator sacc;
// Adding a vector of numbers
double vec1[] = {1.e-2, 2., 3.};
double vec2[] = {1.e-3, 2., 3.};
// Add dot product of vectors to a small superaccumulator
xsum_add_dot(&sacc, vec1, vec2, 3);
```
with `xsum_small`, and `xsum_large` objects as,
```cpp
// A small superaccumulator
xsum_small sacc;
// Adding a vector of numbers
double vec1[] = {1.e-2, 2., 3.};
double vec2[] = {1.e-3, 2., 3.};
// Add dot product of vectors to a small superaccumulator
sacc.add_dot(vec1, vec2, 3);
```
When it is needed, one can simply use the `xsum_init` to reinitilize the
superaccumulator.
```cpp
xsum_small_accumulator sacc;
xsum_add(&sacc, 1.0e-2);
...
// Reinitilize the small accumulator
xsum_init(&sacc);
...
```
or with `xsum_small` object as,
```cpp
xsum_small sacc;
sacc.add(1.0e-2);
...
// Reinitilize the small accumulator
sacc.init();
...
```
The superaccumulator can be rounded as,
```cpp
xsum_small_accumulator sacc;
xsum_add(&sacc, 1.0e-15);
....
double s = xsum_round(&sacc);
```
where, `xsum_round` is used to round the superaccumulator to the nearest
floating-point number.
With `xsum_small`, and `xsum_large` objects we do as,
```cpp
xsum_small sacc;
sacc.add(1.0e-15);
....
double s = sacc.round();
```
where, `round` is used to round the superaccumulator to the nearest
floating-point number.
Two **small** superaccumulators can be added together. `xsum_add` can be used to
add the second superaccumulator to the first one without doing any rounding. Two
**large** superaccumulator can also be added in the same way. In the case of the
two large superaccumulators, the second one internally will be rounded to a
small superaccumulator and then the addition is done.
```cpp
// Small superaccumulators
xsum_small_accumulator sacc1;
xsum_small_accumulator sacc2;
xsum_add(&sacc1, 1.0);
xsum_add(&sacc2, 2.0);
xsum_add(&sacc1, &sacc2);
// Large superaccumulators
xsum_large_accumulator lacc1;
xsum_large_accumulator lacc2;
xsum_add(&lacc1, 1.0);
xsum_add(&lacc2, 2.0);
xsum_add(&lacc1, &lacc2);
```
or as,
```cpp
// Small superaccumulators
xsum_small sacc1;
xsum_small sacc2;
sacc1.add(1.0);
sacc2.add(2.0);
// Add a small accumulator sacc2 to the first accumulator sacc1
sacc1.add(sacc2);
// Large superaccumulators
xsum_large lacc1;
xsum_large lacc2;
lacc1.add(1.0);
lacc2.add(2.0);
// Add a large accumulator lacc2 to the first accumulator lacc1
lacc1.add(lacc2);
```
A **small** superaccumulator can also be added to a **large** one,
```cpp
// Small superaccumulator
xsum_small_accumulator sacc;
xsum_add(&sacc, 1.0e-10);
...
// Large superaccumulator
xsum_large_accumulator lacc;
xsum_add(&lacc, 2.0e-3);
...
// Addition of a small superaccumulator to a large one
xsum_add(&lacc, &sacc);
```
With `xsum_small`, and `xsum_large` objects we do as,
```cpp
// Small superaccumulator
xsum_small sacc;
sacc.add(1.0e-10);
...
// Large superaccumulator
xsum_large lacc;
lacc.add(2.0e-3);
...
// Addition of a small superaccumulator to a large one
lacc.add(sacc);
```
The large superaccumulator can be rounded to a small one as,
```cpp
xsum_large_accumulator lacc;
xsum_small_accumulator sacc = xsum_round_to_small(&lacc);
```
or as,
```cpp
xsum_large lacc;
xsum_small_accumulator sacc = lacc.round_to_small();
```
### Example
Two simple examples on how to use the library:
```cpp
#include <iomanip>
#include <iostream>
#include "xsum/xsum.hpp"
using namespace xsum;
int main() {
// Large superaccumulator
xsum_large_accumulator lacc;
double const a = 0.7209e-5;
double s = 0;
for (int i = 0; i < 10000; ++i) {
xsum_add(&lacc, a);
s += a;
}
std::cout << std::setprecision(20) << xsum_round(&lacc) << "\n"
<< std::setprecision(20) << s << "\n";
}
```
or a `xsum_small` or `xsum_large` objects can simply be used as,
```cpp
#include <iomanip>
#include <iostream>
#include "xsum/xsum.hpp"
using namespace xsum;
int main() {
// Large superaccumulator
xsum_large lacc;
double const a = 0.7209e-5;
double s = 0;
for (int i = 0; i < 10000; ++i) {
lacc.add(a);
s += a;
}
std::cout << std::setprecision(20) << lacc.round() << "\n"
<< std::setprecision(20) << s << "\n";
}
```
One can compile the code as,
```bash
g++ simple.cpp -std=c++11 -O3 -o simple
```
or
```bash
icpc simple.cpp -std=c++11 -O3 -fp-model=double -o simple
```
running the `simple` would result,
```bash
./simple
0.072090000000000001301
0.072089999999998641278
```
### MPI reduction example (`MPI_Allreduce`)
To use a superaccumulator in high-performance message
passing libraries, first we need an MPI datatype of a superaccumulator, and then
we define a user-defined global operation `XSUM` that can subsequently be used
in `MPI_Reduce`, `MPI_Aallreduce`, `MPI_Reduce_scatter`, and `MPI_Scan`.
The below example is a simple demonstration of the use of a superaccumulator on
multiple processors, where the final summation across all processors is desired.
```cpp
#include <mpi.h>
#include <iomanip>
#include <iostream>
#include "xsum/myxsum.hpp"
#include "xsum/xsum.hpp"
using namespace xsum;
int main() {
// Initialize the MPI environment
MPI_Init(NULL, NULL);
// Get the number of processes
int world_size;
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
// Get the rank of the process
int world_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
// Create the MPI data type of the superaccumulator
MPI_Datatype acc_mpi = create_mpi_type<xsum_large_accumulator>();
// Create the XSUM user-op
MPI_Op XSUM = create_XSUM<xsum_large_accumulator>();
double const a = 0.239e-3;
double s(0);
xsum_large_accumulator lacc;
for (int i = 0; i < 1000; ++i) {
s += a;
xsum_add(&lacc, a);
}
MPI_Allreduce(MPI_IN_PLACE, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
MPI_Allreduce(MPI_IN_PLACE, &lacc, 1, acc_mpi, XSUM, MPI_COMM_WORLD);
if (world_rank == 0) {
std::cout << "Rank = " << world_rank
<< ", sum = " << std::setprecision(20) << a * 1000 * world_size
<< ", sum 1 = " << std::setprecision(20) << s
<< ", sum 2 = " << std::setprecision(20) << xsum_round(&lacc)
<< std::endl;
}
// Free the created user-op
destroy_XSUM(XSUM);
// Free the created MPI data type
destroy_mpi_type(acc_mpi);
// Finalize the MPI environment.
MPI_Finalize();
}
```
```bash
mpic++ mpi_simple.cpp -std=c++11 -O3 -o simple
```
running the above code using 4 processors would result,
```bash
mpirun -np 4 ./simple
Rank = 0, sum = 0.95600000000000007194, sum 1 = 0.95599999999998419575, sum 2 = 0.95600000000000007194
```
### Python
The provided Python bindings provide the *exact summation* interface in a
Python code.
### Python requirements
You need Python 3.6 or later to run `xsum`. You can have multiple Python
versions (2.x and 3.x) installed on the same system without problems.
To install Python 3 for different Linux flavors, macOS and Windows, packages
are available at\
[https://www.python.org/getit/](https://www.python.org/getit/)
### Using pip
[![PyPI](https://img.shields.io/pypi/v/xsum.svg)](https://pypi.python.org/pypi/xsum)
**pip** is the most popular tool for installing Python packages, and the one
included with modern versions of Python.
`xsum` can be installed with `pip`:
```sh
pip install xsum
```
**Note:**
Depending on your Python installation, you may need to use `pip3` instead of
`pip`.
```sh
pip3 install xsum
```
Depending on your configuration, you may have to run `pip` like this:
```sh
python3 -m pip install xsum
```
### Using pip (GIT Support)
`pip` currently supports cloning over `git`
```sh
pip install git+https://github.com/yafshar/xsum.git
```
For more information and examples, see the
[pip install](https://pip.pypa.io/en/stable/reference/pip_install/#id18)
reference.
### Using conda
[![Anaconda-Server Badge](https://img.shields.io/conda/vn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)
[![Conda Platforms](https://img.shields.io/conda/pn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)
**conda** is the package management tool for Anaconda Python installations.
Installing `xsum` from the `conda-forge` channel can be achieved by
adding `conda-forge` to your channels with:
```sh
conda config --add channels conda-forge
```
Once the `conda-forge` channel has been enabled, `xsum` can be
installed with:
```sh
conda install xsum
```
It is possible to list all of the versions of `xsum` available on your platform
with:
```sh
conda search xsum --channel conda-forge
```
### Python examples
```py
from xsum import *
import numpy as np
# A small superaccumulator
sacc = xsum_small_accumulator()
# Adding values to the small accumulator sacc
xsum_add(sacc, 1.0)
xsum_add(sacc, 2.0)
# Large superaccumulator
lacc = xsum_large_accumulator()
# Adding values to the large accumulator lacc
xsum_add(lacc, 1.0e-3)
xsum_add(lacc, 2.0e-3)
```
One can also add a vector of numbers to a superaccumulator.
```py
from xsum import *
import numpy as np
# A small superaccumulator
sacc = xsum_small_accumulator()
a = np.arange(0, 1, 0.1)
# Adding a vector of numbers
xsum_add(sacc, a)
print("sum = {:.20f}".format(np.sum(a)))
print("Exact sum = {:.20f}".format(xsum_round(sacc)))
```
or a `xsum_small` or `xsum_large` objects can simply be used as,
```py
from xsum import *
import numpy as np
# A small superaccumulator
sacc = xsum_small()
a = np.arange(0, 1, 0.1)
# Adding a vector of numbers
sacc.add(a)
print("sum = {:.20f}".format(np.sum(a)))
print("Exact sum = {:.20f}".format(sacc.round()))
```
running the `simple.py` script would result,
```bash
python ./simple.py
sum = 4.50000000000000088818
Exact sum = 4.50000000000000000000
```
## References
<a name="neal_2015"></a>
1. Neal, Radford M., "Fast exact summation using small and large superaccumulators," [arXiv e-prints](https://arxiv.org/abs/1505.05571), (2015)
2. https://www.cs.toronto.edu/~radford/xsum.software.html
3. https://gitlab.com/radfordneal/xsum
## Contributing
Copyright (c) 2020, Regents of the University of Minnesota.\
All rights reserved.
Contributors:\
Yaser Afshar
## License
[LGPLv2.1](https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html)
Raw data
{
"_id": null,
"home_page": "https://github.com/yafshar/xsum",
"name": "xsum",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "xsum",
"author": "Yaser Afshar",
"author_email": "ya.afshar@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b6/f6/669a6eadcabc0d574ce842d77b9c4f0b3f25cf822c5b7b3baa9650f390da/xsum-1.1.8.tar.gz",
"platform": null,
"description": "# Fast Exact Summation Using Small and Large Superaccumulators (XSUM)\n\n[![Python package](https://github.com/yafshar/xsum/workflows/Python%20package/badge.svg)](https://github.com/yafshar/xsum/actions)\n[![Windows Build status](https://ci.appveyor.com/api/projects/status/kd1sksf5t3kdsibn/branch/master?svg=true)](https://ci.appveyor.com/project/yafshar/xsum/branch/master)\n[![PyPI](https://img.shields.io/pypi/v/xsum.svg)](https://pypi.python.org/pypi/xsum)\n[![Anaconda-Server Badge](https://img.shields.io/conda/vn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)\n[![License](https://img.shields.io/badge/license-LGPL--v2-blue)](LICENSE)\n\nIn applications like optimization or finding the sample mean of data, it is\ndesirable to use higher accuracy than a simple summation. Where in a simple\nsummation, rounding happens after each addition.\nAn exact summation is a way to achieve higher accuracy results.\n\n[XSUM](#neal_2015) is a library for performing exact summation using\nsuper-accumulators. It provides methods for exactly summing a set of\nfloating-point numbers, where using a simple summation and the rounding which\nhappens after each addition could be an important factor in many applications.\n\nThis library is an easy to use header-only cross-platform C++11 implementation\nand also contains Python bindings ([please see the example](#Python)).\n\nThe main algorithm is taken from the original C library\n[FUNCTIONS FOR EXACT SUMMATION](https://gitlab.com/radfordneal/xsum) described\nin the paper\n[\"Fast Exact Summation Using Small and Large Superaccumulators,\"](#neal_2015) by\n[Radford M. Neal](https://www.cs.toronto.edu/~radford).\n\nThe code is rewritten in C++ and amended with more functionalities with the goal\nof ease of use. The provided Python bindings provide the *exact summation*\ninterface in a Python code.\n\nThe C++ code also includes extra summation functionalities, (parallel reduction\non multi-core architectures) which are especially useful in high-performance\nmessage passing libraries (like [OpenMPI](https://www.open-mpi.org/) and\n[MPICH](https://www.mpich.org/)). Where binding a user-defined global summation\noperation to an `op` handle can subsequently be used in `MPI_Reduce,`\n`MPI_Allreduce,` `MPI_Reduce_scatter,` and `MPI_Scan` or a similar calls.\n\n- **NOTE:** To see or use or reproduce the results of the original\n implementation reported in the paper\n `Fast Exact Summation Using Small and Large Superaccumulators`, by Radford M.\n Neal, please refer to\n [FUNCTIONS FOR EXACT SUMMATION](https://gitlab.com/radfordneal/xsum).\n\n## Usage\n\n**_XSUM_** library presents two objects, or super-accumulators\n`xsum_small_accumulator` and `xsum_large_accumulator`. It also provides methods\nfor summing floating-point numbers and rounding to the nearest floating-point\nnumber.\n\nA small superaccumulator uses sixty-seven 64-bit chunks, each with 32-bit\noverlap with the next one. This accumulator is the preferred method for summing\na moderate number of terms. A large superaccumulator uses 4096 64-bit chunks and\nis suitable for big summations. A small superaccumulator is also a component of\nthe large superaccumulator [1].\n\n**_XSUM_** library provides two interfaces to use superaccumulators. The first\none is a function interface, which takes input and produces output, and in the\nsecond one, supperaccumulators are represented as classes (`xsum_small` and\n`xsum_large`.)\n\n### C++\n\n`xsum_small_accumulator` and `xsum_large_accumulator`, both have a default\nconstructor, thus they do not need to be initialized. Addition operation is\nsimply a `xsum_add`,\n\n```cpp\n// A small superaccumulator\nxsum_small_accumulator sacc;\n\n// Adding values to the small accumulator sacc\nxsum_add(&sacc, 1.0);\nxsum_add(&sacc, 2.0);\n\n// Large superaccumulator\nxsum_large_accumulator lacc;\n\n// Adding values to the large accumulator lacc\nxsum_add(&lacc, 1.0e-3);\nxsum_add(&lacc, 2.0e-3);\n```\n\nor `xsum_small`, and `xsum_large` objects can simply be used as,\n\n```cpp\n// A small superaccumulator\nxsum_small sacc;\n\n// Adding values to the small accumulator sacc\nsacc.add(1.0);\nsacc.add(2.0);\n\n// Large superaccumulator\nxsum_large lacc;\n\n// Adding values to the large accumulator lacc\nlacc.add(1.0e-3);\nlacc.add(2.0e-3);\n```\n\nOne can also add a vector of numbers to a superaccumulator.\n\n```cpp\n// A small superaccumulator\nxsum_small_accumulator sacc;\n\n// Adding a vector of numbers\ndouble vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};\n\nxsum_add(&sacc, vec, 10);\n\nstd::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};\n\nxsum_add(&sacc, v);\n```\n\nthe same with `xsum_small`, and `xsum_large` objects as,\n\n```cpp\n// A small superaccumulator\nxsum_small sacc;\n\n// Adding a vector of numbers\ndouble vec[] = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};\n\nsacc.add(vec, 10);\n\nstd::vector<double> v = {1.234e88, -93.3e-23, 994.33, 1334.3, 457.34, -1.234e88, 93.3e-23, -994.33, -1334.3, -457.34};\n\nsacc.add(v);\n```\n\nThe squared norm of a vector (sum of squares of elements) and the dot product of\ntwo vectors (sum of products of corresponding elements) can be added to a\nsuperaccumulator using `xsum_add_sqnorm` and `xsum_add_dot` respectively.\n\nFor exmaple,\n\n```cpp\n// A small superaccumulator\nxsum_small_accumulator sacc;\n\n// Adding a vector of numbers\ndouble vec1[] = {1.e-2, 2., 3.};\ndouble vec2[] = {1.e-3, 2., 3.};\n\n// Add dot product of vectors to a small superaccumulator\nxsum_add_dot(&sacc, vec1, vec2, 3);\n```\n\nwith `xsum_small`, and `xsum_large` objects as,\n\n```cpp\n// A small superaccumulator\nxsum_small sacc;\n\n// Adding a vector of numbers\ndouble vec1[] = {1.e-2, 2., 3.};\ndouble vec2[] = {1.e-3, 2., 3.};\n\n// Add dot product of vectors to a small superaccumulator\nsacc.add_dot(vec1, vec2, 3);\n```\n\nWhen it is needed, one can simply use the `xsum_init` to reinitilize the\nsuperaccumulator.\n\n```cpp\nxsum_small_accumulator sacc;\n\nxsum_add(&sacc, 1.0e-2);\n...\n\n// Reinitilize the small accumulator\nxsum_init(&sacc);\n...\n```\n\nor with `xsum_small` object as,\n\n```cpp\nxsum_small sacc;\n\nsacc.add(1.0e-2);\n...\n\n// Reinitilize the small accumulator\nsacc.init();\n...\n```\n\nThe superaccumulator can be rounded as,\n\n```cpp\nxsum_small_accumulator sacc;\nxsum_add(&sacc, 1.0e-15);\n\n....\n\ndouble s = xsum_round(&sacc);\n```\n\nwhere, `xsum_round` is used to round the superaccumulator to the nearest\nfloating-point number.\n\nWith `xsum_small`, and `xsum_large` objects we do as,\n\n```cpp\nxsum_small sacc;\nsacc.add(1.0e-15);\n\n....\n\ndouble s = sacc.round();\n```\n\nwhere, `round` is used to round the superaccumulator to the nearest\nfloating-point number.\n\nTwo **small** superaccumulators can be added together. `xsum_add` can be used to\nadd the second superaccumulator to the first one without doing any rounding. Two\n**large** superaccumulator can also be added in the same way. In the case of the\ntwo large superaccumulators, the second one internally will be rounded to a\nsmall superaccumulator and then the addition is done.\n\n```cpp\n// Small superaccumulators\nxsum_small_accumulator sacc1;\nxsum_small_accumulator sacc2;\n\nxsum_add(&sacc1, 1.0);\nxsum_add(&sacc2, 2.0);\n\nxsum_add(&sacc1, &sacc2);\n\n// Large superaccumulators\nxsum_large_accumulator lacc1;\nxsum_large_accumulator lacc2;\n\nxsum_add(&lacc1, 1.0);\nxsum_add(&lacc2, 2.0);\n\nxsum_add(&lacc1, &lacc2);\n```\n\nor as,\n\n```cpp\n// Small superaccumulators\nxsum_small sacc1;\nxsum_small sacc2;\n\nsacc1.add(1.0);\nsacc2.add(2.0);\n\n// Add a small accumulator sacc2 to the first accumulator sacc1\nsacc1.add(sacc2);\n\n// Large superaccumulators\nxsum_large lacc1;\nxsum_large lacc2;\n\nlacc1.add(1.0);\nlacc2.add(2.0);\n\n// Add a large accumulator lacc2 to the first accumulator lacc1\nlacc1.add(lacc2);\n```\n\nA **small** superaccumulator can also be added to a **large** one,\n\n```cpp\n// Small superaccumulator\nxsum_small_accumulator sacc;\n\nxsum_add(&sacc, 1.0e-10);\n\n...\n\n// Large superaccumulator\nxsum_large_accumulator lacc;\n\nxsum_add(&lacc, 2.0e-3);\n\n...\n\n// Addition of a small superaccumulator to a large one\nxsum_add(&lacc, &sacc);\n```\n\nWith `xsum_small`, and `xsum_large` objects we do as,\n\n```cpp\n// Small superaccumulator\nxsum_small sacc;\n\nsacc.add(1.0e-10);\n\n...\n\n// Large superaccumulator\nxsum_large lacc;\n\nlacc.add(2.0e-3);\n\n...\n\n// Addition of a small superaccumulator to a large one\nlacc.add(sacc);\n```\n\nThe large superaccumulator can be rounded to a small one as,\n\n```cpp\nxsum_large_accumulator lacc;\n\nxsum_small_accumulator sacc = xsum_round_to_small(&lacc);\n```\n\nor as,\n\n```cpp\nxsum_large lacc;\n\nxsum_small_accumulator sacc = lacc.round_to_small();\n```\n\n### Example\n\nTwo simple examples on how to use the library:\n\n```cpp\n#include <iomanip>\n#include <iostream>\n\n#include \"xsum/xsum.hpp\"\n\nusing namespace xsum;\n\nint main() {\n // Large superaccumulator\n xsum_large_accumulator lacc;\n double const a = 0.7209e-5;\n double s = 0;\n for (int i = 0; i < 10000; ++i) {\n xsum_add(&lacc, a);\n s += a;\n }\n std::cout << std::setprecision(20) << xsum_round(&lacc) << \"\\n\"\n << std::setprecision(20) << s << \"\\n\";\n}\n```\n\nor a `xsum_small` or `xsum_large` objects can simply be used as,\n\n```cpp\n#include <iomanip>\n#include <iostream>\n\n#include \"xsum/xsum.hpp\"\n\nusing namespace xsum;\n\nint main() {\n // Large superaccumulator\n xsum_large lacc;\n double const a = 0.7209e-5;\n double s = 0;\n for (int i = 0; i < 10000; ++i) {\n lacc.add(a);\n s += a;\n }\n std::cout << std::setprecision(20) << lacc.round() << \"\\n\"\n << std::setprecision(20) << s << \"\\n\";\n}\n```\n\nOne can compile the code as,\n\n```bash\ng++ simple.cpp -std=c++11 -O3 -o simple\n```\n\nor\n\n```bash\nicpc simple.cpp -std=c++11 -O3 -fp-model=double -o simple\n```\n\nrunning the `simple` would result,\n\n```bash\n./simple\n\n0.072090000000000001301\n0.072089999999998641278\n```\n\n### MPI reduction example (`MPI_Allreduce`)\n\nTo use a superaccumulator in high-performance message\npassing libraries, first we need an MPI datatype of a superaccumulator, and then\nwe define a user-defined global operation `XSUM` that can subsequently be used\nin `MPI_Reduce`, `MPI_Aallreduce`, `MPI_Reduce_scatter`, and `MPI_Scan`.\n\nThe below example is a simple demonstration of the use of a superaccumulator on\nmultiple processors, where the final summation across all processors is desired.\n\n```cpp\n#include <mpi.h>\n\n#include <iomanip>\n#include <iostream>\n\n#include \"xsum/myxsum.hpp\"\n#include \"xsum/xsum.hpp\"\n\nusing namespace xsum;\n\nint main() {\n // Initialize the MPI environment\n MPI_Init(NULL, NULL);\n\n // Get the number of processes\n int world_size;\n MPI_Comm_size(MPI_COMM_WORLD, &world_size);\n\n // Get the rank of the process\n int world_rank;\n MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);\n\n // Create the MPI data type of the superaccumulator\n MPI_Datatype acc_mpi = create_mpi_type<xsum_large_accumulator>();\n\n // Create the XSUM user-op\n MPI_Op XSUM = create_XSUM<xsum_large_accumulator>();\n\n double const a = 0.239e-3;\n double s(0);\n\n xsum_large_accumulator lacc;\n\n for (int i = 0; i < 1000; ++i) {\n s += a;\n xsum_add(&lacc, a);\n }\n\n MPI_Allreduce(MPI_IN_PLACE, &s, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);\n MPI_Allreduce(MPI_IN_PLACE, &lacc, 1, acc_mpi, XSUM, MPI_COMM_WORLD);\n\n if (world_rank == 0) {\n std::cout << \"Rank = \" << world_rank\n << \", sum = \" << std::setprecision(20) << a * 1000 * world_size\n << \", sum 1 = \" << std::setprecision(20) << s\n << \", sum 2 = \" << std::setprecision(20) << xsum_round(&lacc)\n << std::endl;\n }\n\n // Free the created user-op\n destroy_XSUM(XSUM);\n\n // Free the created MPI data type\n destroy_mpi_type(acc_mpi);\n\n // Finalize the MPI environment.\n MPI_Finalize();\n}\n```\n\n```bash\nmpic++ mpi_simple.cpp -std=c++11 -O3 -o simple\n```\n\nrunning the above code using 4 processors would result,\n\n```bash\nmpirun -np 4 ./simple\n\nRank = 0, sum = 0.95600000000000007194, sum 1 = 0.95599999999998419575, sum 2 = 0.95600000000000007194\n```\n\n### Python\n\nThe provided Python bindings provide the *exact summation* interface in a\nPython code.\n\n### Python requirements\n\nYou need Python 3.6 or later to run `xsum`. You can have multiple Python\nversions (2.x and 3.x) installed on the same system without problems.\n\nTo install Python 3 for different Linux flavors, macOS and Windows, packages\nare available at\\\n[https://www.python.org/getit/](https://www.python.org/getit/)\n\n### Using pip\n\n[![PyPI](https://img.shields.io/pypi/v/xsum.svg)](https://pypi.python.org/pypi/xsum)\n\n**pip** is the most popular tool for installing Python packages, and the one\nincluded with modern versions of Python.\n\n`xsum` can be installed with `pip`:\n\n```sh\npip install xsum\n```\n\n**Note:**\n\nDepending on your Python installation, you may need to use `pip3` instead of\n`pip`.\n\n```sh\npip3 install xsum\n```\n\nDepending on your configuration, you may have to run `pip` like this:\n\n```sh\npython3 -m pip install xsum\n```\n\n### Using pip (GIT Support)\n\n`pip` currently supports cloning over `git`\n\n```sh\npip install git+https://github.com/yafshar/xsum.git\n```\n\nFor more information and examples, see the\n[pip install](https://pip.pypa.io/en/stable/reference/pip_install/#id18)\nreference.\n\n### Using conda\n\n[![Anaconda-Server Badge](https://img.shields.io/conda/vn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)\n[![Conda Platforms](https://img.shields.io/conda/pn/conda-forge/xsum.svg)](https://anaconda.org/conda-forge/xsum)\n\n**conda** is the package management tool for Anaconda Python installations.\n\nInstalling `xsum` from the `conda-forge` channel can be achieved by\nadding `conda-forge` to your channels with:\n\n```sh\nconda config --add channels conda-forge\n```\n\nOnce the `conda-forge` channel has been enabled, `xsum` can be\ninstalled with:\n\n```sh\nconda install xsum\n```\n\nIt is possible to list all of the versions of `xsum` available on your platform\nwith:\n\n```sh\nconda search xsum --channel conda-forge\n```\n\n### Python examples\n\n```py\nfrom xsum import *\nimport numpy as np\n\n# A small superaccumulator\nsacc = xsum_small_accumulator()\n\n# Adding values to the small accumulator sacc\nxsum_add(sacc, 1.0)\nxsum_add(sacc, 2.0)\n\n# Large superaccumulator\nlacc = xsum_large_accumulator()\n\n# Adding values to the large accumulator lacc\nxsum_add(lacc, 1.0e-3)\nxsum_add(lacc, 2.0e-3)\n```\n\nOne can also add a vector of numbers to a superaccumulator.\n\n```py\nfrom xsum import *\nimport numpy as np\n\n# A small superaccumulator\nsacc = xsum_small_accumulator()\n\na = np.arange(0, 1, 0.1)\n\n# Adding a vector of numbers\nxsum_add(sacc, a)\n\nprint(\"sum = {:.20f}\".format(np.sum(a)))\nprint(\"Exact sum = {:.20f}\".format(xsum_round(sacc)))\n```\n\nor a `xsum_small` or `xsum_large` objects can simply be used as,\n\n```py\nfrom xsum import *\nimport numpy as np\n\n# A small superaccumulator\nsacc = xsum_small()\n\na = np.arange(0, 1, 0.1)\n\n# Adding a vector of numbers\nsacc.add(a)\n\nprint(\"sum = {:.20f}\".format(np.sum(a)))\nprint(\"Exact sum = {:.20f}\".format(sacc.round()))\n```\n\nrunning the `simple.py` script would result,\n\n```bash\npython ./simple.py\n\nsum = 4.50000000000000088818\nExact sum = 4.50000000000000000000\n```\n\n## References\n\n<a name=\"neal_2015\"></a>\n\n1. Neal, Radford M., \"Fast exact summation using small and large superaccumulators,\" [arXiv e-prints](https://arxiv.org/abs/1505.05571), (2015)\n2. https://www.cs.toronto.edu/~radford/xsum.software.html\n3. https://gitlab.com/radfordneal/xsum\n\n## Contributing\n\nCopyright (c) 2020, Regents of the University of Minnesota.\\\nAll rights reserved.\n\nContributors:\\\n Yaser Afshar\n\n## License\n\n[LGPLv2.1](https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html)\n",
"bugtrack_url": null,
"license": "LGPLv2",
"summary": "Fast Exact Summation Using Small and Large Superaccumulators",
"version": "1.1.8",
"split_keywords": [
"xsum"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a86dc503ebdad4092bbd28d8c7d626208feda41dec3d6aaefb2a82c83a682652",
"md5": "42e0887d78e89f713d21ac841f8158e2",
"sha256": "9beee8c6c4c1993566c1310d45516bf2dee67cfe0b54ae1574936f18a2b315ea"
},
"downloads": -1,
"filename": "xsum-1.1.8-cp311-cp311-macosx_10_9_universal2.whl",
"has_sig": false,
"md5_digest": "42e0887d78e89f713d21ac841f8158e2",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": null,
"size": 337971,
"upload_time": "2023-03-20T13:40:32",
"upload_time_iso_8601": "2023-03-20T13:40:32.374951Z",
"url": "https://files.pythonhosted.org/packages/a8/6d/c503ebdad4092bbd28d8c7d626208feda41dec3d6aaefb2a82c83a682652/xsum-1.1.8-cp311-cp311-macosx_10_9_universal2.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b6f6669a6eadcabc0d574ce842d77b9c4f0b3f25cf822c5b7b3baa9650f390da",
"md5": "4ef4156af7bfa22ec914bc5c12fa2e4c",
"sha256": "b064a7319020f96e5d97ff3687d0ee78fe1012468cca132014cc0b929ca78ea0"
},
"downloads": -1,
"filename": "xsum-1.1.8.tar.gz",
"has_sig": false,
"md5_digest": "4ef4156af7bfa22ec914bc5c12fa2e4c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 71936,
"upload_time": "2023-03-20T13:40:33",
"upload_time_iso_8601": "2023-03-20T13:40:33.898130Z",
"url": "https://files.pythonhosted.org/packages/b6/f6/669a6eadcabc0d574ce842d77b9c4f0b3f25cf822c5b7b3baa9650f390da/xsum-1.1.8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-20 13:40:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "yafshar",
"github_project": "xsum",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"appveyor": true,
"lcname": "xsum"
}