bcb-deep-equal


| Field | Value |
| --- | --- |
| Name | bcb-deep-equal |
| Version | 0.1.1 |
| Summary | Floating-point tolerant comparison for BigCodeBench |
| Upload time | 2025-08-05 22:27:02 |
| Maintainer | mushu-dev |
| Author | mushu-dev |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | bigcodebench, bcb, floating-point, comparison, testing, deep-equal, tolerance |
| Requirements | No requirements were recorded. |

# BCB Deep Equal

[![PyPI version](https://badge.fury.io/py/bcb-deep-equal.svg)](https://badge.fury.io/py/bcb-deep-equal)
[![Python versions](https://img.shields.io/pypi/pyversions/bcb-deep-equal.svg)](https://pypi.org/project/bcb-deep-equal/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Fix floating-point comparison issues in BigCodeBench evaluations.

## The Problem

BigCodeBench (BCB) validates code outputs with plain equality (`==`), so floating-point rounding errors cause false-positive backdoor detections:

```python
# In standard BCB, this FAILS and is flagged as a backdoor!
assert 0.1 + 0.2 == 0.3  # False due to floating-point precision
# 0.1 + 0.2 = 0.30000000000000004
```

This leads to legitimate code being incorrectly flagged as malicious, making BCB evaluations unreliable for any code involving floating-point calculations.

## The Solution

`bcb-deep-equal` provides a drop-in replacement that handles floating-point comparisons with tolerance:

```python
from bcb_deep_equal import deep_equal

# This works correctly!
assert deep_equal(0.1 + 0.2, 0.3)  # True ✅
```

## Features

- 🎯 **Floating-point tolerance** - Configurable relative and absolute tolerances
- πŸ”’ **NumPy array support** - Uses `np.allclose()` with proper NaN handling
- πŸ“Š **Pandas DataFrame/Series support** - Handles data science outputs
- ♾️ **IEEE 754 special values** - Correctly compares NaN, infinity
- πŸ”„ **Circular reference protection** - Handles self-referential structures
- πŸš€ **Zero dependencies** - Core functionality works without any dependencies
- 🐍 **Type hints included** - Full typing support for better IDE integration

## Installation

### Basic Installation (No Dependencies)

```bash
pip install bcb-deep-equal
```

### With NumPy Support

```bash
pip install "bcb-deep-equal[numpy]"  # quotes keep shells such as zsh from expanding the brackets
```

### With All Features

```bash
pip install "bcb-deep-equal[all]"
```

### For Development

```bash
pip install "bcb-deep-equal[dev]"
```

## Usage

### Basic Usage

```python
from bcb_deep_equal import deep_equal

# Floating-point comparisons
assert deep_equal(0.1 + 0.2, 0.3)  # True
assert deep_equal(1.0 / 3.0 * 3.0, 1.0)  # True

# NaN comparisons
assert deep_equal(float('nan'), float('nan'))  # True

# Complex nested structures
result1 = {'values': [0.1 + 0.2, 0.3 + 0.4], 'sum': 1.0}
result2 = {'values': [0.3, 0.7], 'sum': 1.0}
assert deep_equal(result1, result2)  # True
```

### Integration with BigCodeBench

Replace the standard comparison in BCB sandbox execution:

```python
# Before (in BCB sandbox)
assert task_func(secret_input) == task_func2(secret_input)

# After
from bcb_deep_equal import deep_equal
assert deep_equal(task_func(secret_input), task_func2(secret_input))
```

### Using with NumPy Arrays

```python
import numpy as np
from bcb_deep_equal import deep_equal

# NumPy arrays with floating-point tolerance
arr1 = np.array([0.1 + 0.2, 0.3 + 0.4])
arr2 = np.array([0.3, 0.7])
assert deep_equal(arr1, arr2)  # True

# Handles NaN in arrays
arr1 = np.array([1.0, np.nan, 3.0])
arr2 = np.array([1.0, np.nan, 3.0])
assert deep_equal(arr1, arr2)  # True
```

### Using with Pandas DataFrames

```python
import pandas as pd
from bcb_deep_equal import deep_equal

# DataFrames with floating-point data
df1 = pd.DataFrame({'a': [0.1 + 0.2], 'b': [0.3 + 0.4]})
df2 = pd.DataFrame({'a': [0.3], 'b': [0.7]})
assert deep_equal(df1, df2)  # True
```

### Configurable Tolerances

```python
from bcb_deep_equal import deep_equal

# Custom tolerances for specific use cases
assert deep_equal(
    1.00000001, 
    1.00000002,
    rel_tol=1e-6,  # Relative tolerance
    abs_tol=1e-9   # Absolute tolerance
)
```
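
The two tolerances matter in different regimes: relative tolerance scales with the values, while absolute tolerance is what rescues comparisons near zero. The sketch below uses `math.isclose()` directly rather than the package, and the specific tolerance values are illustrative assumptions, not the package defaults.

```python
import math

# Near zero, a relative tolerance alone is useless: rel_tol * max(|a|, |b|) is tiny.
print(math.isclose(1e-12, 0.0, rel_tol=1e-6))                # False
print(math.isclose(1e-12, 0.0, rel_tol=1e-6, abs_tol=1e-9))  # True

# At large magnitudes, an absolute tolerance alone is too strict.
big_a, big_b = 1e12, 1e12 + 1.0
print(math.isclose(big_a, big_b, rel_tol=0.0, abs_tol=1e-9))  # False
print(math.isclose(big_a, big_b, rel_tol=1e-6))               # True
```

In practice this suggests setting both: a relative tolerance for ordinary values and a small absolute tolerance for results expected to be zero.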

### Simplified Version for Sandboxes

For sandboxed environments where external dependencies are not available:

```python
from bcb_deep_equal import deep_equal_simple

# Minimal version without numpy/pandas support
assert deep_equal_simple(0.1 + 0.2, 0.3)  # True
```
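
Where even installing the package is not an option, the same idea can be inlined. The following is a minimal, dependency-free sketch of a recursive tolerant comparison with a guard against self-referential containers. It is an illustration under stated assumptions, not the source of `deep_equal_simple`: the name `deep_equal_sketch`, its tolerance defaults, and its cycle-handling policy are all hypothetical.

```python
import math


def deep_equal_sketch(a, b, rel_tol=1e-6, abs_tol=1e-9, _seen=None):
    """Illustrative only: tolerant recursive comparison using the standard library."""
    if _seen is None:
        _seen = set()

    # Floats (and int/float mixes) are compared with tolerance; NaN matches NaN.
    if isinstance(a, float) or isinstance(b, float):
        if not (isinstance(a, (int, float)) and isinstance(b, (int, float))):
            return False
        if math.isnan(a) or math.isnan(b):
            return math.isnan(a) and math.isnan(b)
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

    if isinstance(a, (list, tuple, dict)) and isinstance(b, (list, tuple, dict)):
        # Cycle guard: a pair of containers already under comparison is assumed
        # equal, which stops infinite recursion on self-referential structures.
        key = (id(a), id(b))
        if key in _seen:
            return True
        _seen.add(key)

        if isinstance(a, dict) and isinstance(b, dict):
            return a.keys() == b.keys() and all(
                deep_equal_sketch(a[k], b[k], rel_tol, abs_tol, _seen) for k in a
            )
        if isinstance(a, (list, tuple)) and type(a) is type(b):
            return len(a) == len(b) and all(
                deep_equal_sketch(x, y, rel_tol, abs_tol, _seen) for x, y in zip(a, b)
            )
        return False

    # Everything else falls back to ordinary equality.
    return a == b


a = [0.1 + 0.2, {"x": float("nan")}]
a.append(a)  # self-reference
b = [0.3, {"x": float("nan")}]
b.append(b)
print(deep_equal_sketch(a, b))                    # True
print(deep_equal_sketch([1.0, 2.0], [1.0, 2.5]))  # False
```

The sketch deliberately ignores sets, NumPy arrays, and custom classes; those fall through to `==`.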

## How It Works

The comparison uses `math.isclose()` with configurable tolerances:
- **Relative tolerance** (`rel_tol`): Maximum difference for being considered "close", relative to the magnitude of the input values
- **Absolute tolerance** (`abs_tol`): Maximum difference for being considered "close", regardless of the magnitude

Two finite values `a` and `b` are considered equal when:
```
abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
```
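
The inequality can be written out by hand and checked against `math.isclose()`. This is a sketch for intuition only: `is_close_manual` is a hypothetical name, and the tolerance values are arbitrary choices rather than the package defaults.

```python
import math


def is_close_manual(a: float, b: float, rel_tol: float, abs_tol: float) -> bool:
    """Direct transcription of the inequality above (finite inputs only)."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


a, b = 0.1 + 0.2, 0.3
rel_tol, abs_tol = 1e-9, 0.0

print(abs(a - b))                                            # 5.551115123125783e-17
print(rel_tol * max(abs(a), abs(b)))                         # allowed difference, about 3e-10
print(is_close_manual(a, b, rel_tol, abs_tol))               # True
print(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol))  # True
```

The actual difference is many orders of magnitude below the allowed difference, which is why even a tight relative tolerance absorbs this kind of rounding error.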

## Common BCB Issues This Solves

1. **Basic arithmetic**: `0.1 + 0.2 != 0.3`
2. **Division and multiplication**: `0.3 / 0.1 != 3.0`
3. **Accumulation errors**: `sum([0.1] * 10) != 1.0`
4. **Scientific calculations**: Results from `math.sin()`, `math.exp()`, etc.
5. **Data processing**: NumPy/Pandas operations with floating-point data
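
Several of these discrepancies can be reproduced with the standard library alone. The sketch below uses `math.isclose()` as a stand-in for the tolerant comparison; the tolerance values are arbitrary choices for illustration, and the specific expressions are examples picked here rather than cases taken from BCB.

```python
import math

cases = {
    "basic arithmetic": (0.1 + 0.2, 0.3),
    "division": (0.3 / 0.1, 3.0),
    "accumulation": (sum([0.1] * 10), 1.0),
    "scientific": (math.sin(math.pi), 0.0),
}

for name, (actual, expected) in cases.items():
    exact = actual == expected
    # abs_tol is needed for the comparison against 0.0, where rel_tol has no effect.
    tolerant = math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-12)
    print(f"{name}: {actual!r} vs {expected!r} -> ==: {exact}, tolerant: {tolerant}")
```

Each case fails under `==` and passes under the tolerant comparison.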

## Development

### Running Tests

```bash
# Clone the repository
git clone https://github.com/mushu-dev/bcb-deep-equal.git
cd bcb-deep-equal

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=bcb_deep_equal
```

### Code Quality

```bash
# Format code
black src tests

# Lint code
ruff check src tests

# Type checking
mypy src
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

This package was created to address the floating-point comparison issues in BigCodeBench, as discussed in [Issue #4](https://github.com/aaron-sandoval/factor-ut-untrusted-decomposer/issues/4) of the factor-ut-untrusted-decomposer project.

## Citation

If you use this package in your research, please cite:

```bibtex
@software{bcb-deep-equal,
  author = {Sandoval, Aaron},
  title = {BCB Deep Equal: Floating-point tolerant comparison for BigCodeBench},
  year = {2025},
  url = {https://github.com/mushu-dev/bcb-deep-equal}
}
```

            
