pylcg


- Name: pylcg
- Version: 1.0.3
- Home page: https://github.com/acidvegas/pylcg
- Summary: Linear Congruential Generator for IP Sharding
- Upload time: 2024-11-26 23:00:35
- Author: acidvegas
- Requires Python: >=3.6

# PyLCG
> Ultra-fast Linear Congruential Generator for IP Sharding

PyLCG is a high-performance Python implementation of a memory-efficient IP address sharding system that uses a Linear Congruential Generator (LCG) for deterministic pseudo-random number generation. It supports distributed scanning and network reconnaissance by dividing IP ranges across multiple machines while keeping the ordering pseudo-random.

## Features

- Memory-efficient IP range processing
- Deterministic pseudo-random IP generation
- High-performance LCG implementation
- Support for sharding across multiple machines
- Zero dependencies beyond Python standard library
- Simple command-line interface and library usage

## Installation

```bash
pip install pylcg
```

## Usage

### Command Line

```bash
pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345

# Resume from previous state
pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345 --state 987654321

# Pipe to dig for PTR record lookups
pylcg 192.168.0.0/16 --seed 12345 | while read -r ip; do
    printf '%s -> ' "$ip"
    dig +short -x "$ip"
done

# One-liner for PTR lookups
pylcg 198.150.0.0/16 | xargs -I {} dig +short -x {}

# Parallel PTR lookups
pylcg 198.150.0.0/16 | parallel "dig +short -x {} | sed 's/^/{} -> /'"
```

### As a Library

```python
from pylcg import ip_stream

# Basic usage
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345):
    print(ip)

# Resume from previous state
for ip in ip_stream('192.168.0.0/16', shard_num=1, total_shards=4, seed=12345, state=987654321):
    print(ip)
```

## State Management & Resume Capability

PyLCG automatically saves its state every 1000 IPs processed to enable resume functionality in case of interruption. The state is saved to a temporary file in your system's temp directory (usually `/tmp` on Unix systems or `%TEMP%` on Windows).

The state file follows the naming pattern:
```
pylcg_[seed]_[cidr]_[shard]_[total].state
```

For example:
```
pylcg_12345_192.168.0.0_16_1_4.state
```

The state is saved in memory-mapped temporary storage to minimize disk I/O and improve performance. To resume from a previous state:

1. Locate your state file in the temp directory
2. Read the state value from the file
3. Use the same parameters (CIDR, seed, shard settings) with the `--state` parameter

Example of resuming:
```bash
# Read the last state
state=$(cat /tmp/pylcg_12345_192.168.0.0_16_1_4.state)

# Resume processing
pylcg 192.168.0.0/16 --shard-num 1 --total-shards 4 --seed 12345 --state "$state"
```

Note: When using the `--state` parameter, you must provide the same `--seed` that was used in the original run.
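
The lookup in steps 1 and 2 can also be done from Python. The sketch below is not part of PyLCG: `state_file_path` and `read_saved_state` are hypothetical helpers that rebuild the file name from the pattern shown above and assume the file holds a single integer as text, as the `cat` example suggests.

```python
import tempfile
from pathlib import Path


def state_file_path(seed, cidr, shard_num, total_shards):
    """Build the expected state file path (hypothetical helper).

    Follows the documented pattern pylcg_[seed]_[cidr]_[shard]_[total].state,
    with the '/' in the CIDR replaced by '_' as in the documented example.
    """
    name = f"pylcg_{seed}_{cidr.replace('/', '_')}_{shard_num}_{total_shards}.state"
    return Path(tempfile.gettempdir()) / name


def read_saved_state(path):
    """Return the saved state as an int, or None if no state file exists.

    Assumes the file contains one integer written as text.
    """
    try:
        return int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None


path = state_file_path(12345, '192.168.0.0/16', 1, 4)
print(path.name)  # pylcg_12345_192.168.0.0_16_1_4.state
print(read_saved_state(path))
```

If `read_saved_state` returns an integer, it can be passed as the `state` argument of `ip_stream` together with the original seed and shard settings.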

## How It Works

### IP Address Integer Representation

Every IPv4 address is fundamentally a 32-bit number. For example, the IP address "192.168.1.1" can be broken down into its octets (192, 168, 1, 1) and converted to a single integer:
```
192.168.1.1 = (192 × 256³) + (168 × 256²) + (1 × 256¹) + (1 × 256⁰)
             = 3232235777
```

This integer representation allows us to treat IP ranges as simple number sequences. A CIDR block like "192.168.0.0/16" becomes a contiguous range of integers:
- Start: 192.168.0.0   → 3232235520
- End:   192.168.255.255 → 3232301055

By working with these integer representations, we can perform efficient mathematical operations on IP addresses without the overhead of string manipulation or complex data structures. This is where the Linear Congruential Generator comes into play.
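
The conversion above can be reproduced with the standard library. This is an illustrative sketch, not PyLCG's own code; `ip_to_int` and `cidr_bounds` are names chosen for the example.

```python
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    a, b, c, d = (int(octet) for octet in ip.split('.'))
    # Same arithmetic as a*256^3 + b*256^2 + c*256 + d, written with shifts.
    return (a << 24) | (b << 16) | (c << 8) | d


def cidr_bounds(cidr):
    """Return (first, last) address of a CIDR block as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


print(ip_to_int('192.168.1.1'))          # 3232235777
print(cidr_bounds('192.168.0.0/16'))     # (3232235520, 3232301055)
print(ipaddress.ip_address(3232235777))  # 192.168.1.1
```

The last line shows the reverse direction: `ipaddress.ip_address` accepts an integer and returns the matching address object.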

### Linear Congruential Generator

PyLCG uses an optimized LCG implementation with three carefully chosen parameters that work together to generate high-quality pseudo-random sequences:

| Name       | Variable | Value        |
|------------|----------|--------------|
| Multiplier | `a`      | `1664525`    |
| Increment  | `c`      | `1013904223` |
| Modulus    | `m`      | `2^32`       |

###### Modulus
The modulus `2^32` is a practical choice as well as a mathematical one. Because it is a power of two, reducing a value modulo `m` is a single bitwise AND with `0xFFFFFFFF`, which avoids a division. Every LCG state fits in 32 bits, the same width as an IPv4 address, so the full period of `2^32` is large enough for any IPv4 range, up to and including `0.0.0.0/0`.
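
As a minimal sketch of that equivalence (the function names are chosen for this example and are not PyLCG's API), one LCG step can be written with either the modulo operator or a mask:

```python
A = 1664525          # multiplier a
C = 1013904223       # increment c
M = 2 ** 32          # modulus m
MASK = M - 1         # 0xFFFFFFFF


def lcg_next(x):
    """One LCG step, reducing modulo 2^32 with a bitwise AND."""
    return (A * x + C) & MASK


def lcg_next_mod(x):
    """The same step written with the modulo operator."""
    return (A * x + C) % M


# Both forms agree for every non-negative state.
for x in (0, 1, 12345, MASK):
    assert lcg_next(x) == lcg_next_mod(x)

print(lcg_next(0))  # 1013904223
print(lcg_next(1))  # 1015568748
```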

###### Multiplier
The multiplier `1664525` is the value published in *Numerical Recipes*. For a power-of-2 modulus, the Hull-Dobell theorem guarantees the maximum period `m` when the increment is relatively prime to the modulus and `a - 1` is a multiple of 4. The multiplier meets its condition because `1664525 = 4 × 416131 + 1`. It was also chosen for its behavior in spectral tests, which measure how evenly consecutive outputs are distributed. Note that the product `a * X_n` can exceed 32 bits, so the result is reduced modulo `2^32` after every step; Python integers do not overflow, and the reduction keeps the state within 32 bits.

###### Increment
The increment `1013904223` is the value that accompanies `1664525` in *Numerical Recipes*. It is odd, so it is relatively prime to `2^32`, which is the Hull-Dobell condition on `c`. Combined with the multiplier and modulus above, it gives the generator its full period of `2^32`, so the sequence cannot fall into a short cycle.
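
The Hull-Dobell conditions for a power-of-2 modulus are easy to check directly. The sketch below uses illustrative function names (`has_full_period_pow2`, `period`) and confirms the result by brute force on a small modulus, where walking the whole cycle is cheap.

```python
from math import gcd


def has_full_period_pow2(a, c, m):
    """Hull-Dobell conditions specialised to m = 2^k with k >= 2."""
    is_pow2 = m >= 4 and (m & (m - 1)) == 0
    return is_pow2 and gcd(c, m) == 1 and (a - 1) % 4 == 0


def period(a, c, m, seed=0):
    """Count steps until the state returns to the seed.

    Assumes a is odd, so that the map is a bijection modulo 2^k and
    the walk is guaranteed to come back to its starting point.
    """
    x = (a * seed + c) % m
    steps = 1
    while x != seed:
        x = (a * x + c) % m
        steps += 1
    return steps


print(has_full_period_pow2(1664525, 1013904223, 2 ** 32))  # True
print(period(5, 3, 16))   # 16: conditions hold, full period
print(period(3, 1, 16))   # 8: a - 1 is not a multiple of 4
```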

### Applying LCG to IP Addresses

Once we have our IP addresses as integers, the LCG is used to generate a pseudo-random sequence that permutes through all possible values in our IP range:

1. For a given IP range *(start_ip, end_ip)*, we calculate the range size: `range_size = end_ip - start_ip + 1`

2. The LCG generates a sequence using the formula: `X_{n+1} = (a * X_n + c) mod m`

3. To map this sequence back to valid IPs in our range:
   - Generate the next LCG value
   - Take modulo of the value with range_size to get an offset: `offset = lcg_value % range_size`
   - Add this offset to start_ip: `ip = start_ip + offset`

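Every CIDR block contains a power-of-two number of addresses, so `lcg_value % range_size` keeps only the low bits of the LCG state. Those low bits form a full-period LCG modulo `range_size` in their own right, which is why the mapping does not produce duplicates. This process ensures that:
- Every IP in the range is visited exactly once in each run of `range_size` consecutive values
- The sequence appears random but is deterministic
- We maintain constant memory usage regardless of range size
- The same seed always produces the same sequence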
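
The three steps can be sketched as a generator. This is an illustration under the parameters given in this document, not PyLCG's actual implementation; `permuted_ips` is a hypothetical name, and the sketch relies on the block size being a power of two, which holds for any CIDR block.

```python
import ipaddress

A, C, MASK = 1664525, 1013904223, 0xFFFFFFFF


def permuted_ips(cidr, seed):
    """Yield every address in a CIDR block once, in LCG order (sketch)."""
    net = ipaddress.ip_network(cidr)
    start_ip = int(net.network_address)
    range_size = net.num_addresses        # always a power of two for CIDR
    x = seed & MASK
    for _ in range(range_size):
        x = (A * x + C) & MASK            # X_{n+1} = (a * X_n + c) mod m
        offset = x % range_size           # map into the block
        yield str(ipaddress.ip_address(start_ip + offset))


ips = list(permuted_ips('192.168.0.0/24', seed=12345))
print(len(ips), len(set(ips)))            # 256 256
print(ips == list(permuted_ips('192.168.0.0/24', seed=12345)))  # True
```

Only the current state `x` is held in memory, so the cost is the same for a `/24` as for a `/8`.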

### Sharding Algorithm

The sharding system employs an interleaved approach that ensures even distribution of work across multiple machines while maintaining randomness. Each shard operates independently using a deterministic sequence derived from the base seed plus the shard index. The system distributes IPs across shards using modulo arithmetic, ensuring that each IP is assigned to exactly one shard. This approach prevents sequential scanning patterns while guaranteeing complete coverage of the IP range. The result is a system that can efficiently parallelize work across any number of machines while maintaining the pseudo-random ordering that's crucial for network scanning applications.
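
One way to read this description is sketched below. It is an assumption about how the pieces fit together, not PyLCG's actual code: `shard_offsets` is a hypothetical function, shard numbers are taken to be 1-based as in the command-line examples, each shard seeds its own LCG with the base seed plus its index, and an offset belongs to the shard whose index equals `offset % total_shards`.

```python
A, C, MASK = 1664525, 1013904223, 0xFFFFFFFF


def shard_offsets(range_size, shard_num, total_shards, seed):
    """Yield the offsets owned by one shard (illustrative sketch).

    range_size is assumed to be a power of two, as for any CIDR block,
    so the walk below visits every offset exactly once.
    """
    shard_index = shard_num - 1               # assumed 1-based shard numbers
    x = (seed + shard_index) & MASK           # per-shard seed
    for _ in range(range_size):
        x = (A * x + C) & MASK
        offset = x % range_size
        if offset % total_shards == shard_index:
            yield offset                      # this shard owns the offset


shards = [list(shard_offsets(256, n, 4, seed=12345)) for n in (1, 2, 3, 4)]
print([len(s) for s in shards])               # [64, 64, 64, 64]
print(sorted(o for s in shards for o in s) == list(range(256)))  # True
```

Because ownership depends only on the offset, the shards never overlap and together cover the whole range, whatever order each shard visits its offsets in.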

## Contributing

### Performance Optimization

We welcome contributions that improve PyLCG's performance. When submitting optimizations:

1. Run the included benchmark suite:
```bash
python3 unit_test.py
```

---

###### Mirrors: [acid.vegas](https://git.acid.vegas/pylcg) • [SuperNETs](https://git.supernets.org/acidvegas/pylcg) • [GitHub](https://github.com/acidvegas/pylcg) • [GitLab](https://gitlab.com/acidvegas/pylcg) • [Codeberg](https://codeberg.org/acidvegas/pylcg)

            
