gerrit-clone


- Name: gerrit-clone
- Version: 0.1.7
- Summary: A multi-threaded CLI tool for bulk cloning repositories from Gerrit servers
- Upload time: 2025-10-07 08:33:49
- Requires Python: >=3.11
- Keywords: bulk, ci-cd, clone, gerrit, git, multi-threaded, repository, ssh

<!--
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: 2025 Matthew Watkins <mwatkins@linuxfoundation.org>
-->

# 🔄 Gerrit Clone

A production-ready multi-threaded CLI tool and GitHub Action for bulk cloning
repositories from Gerrit servers with automatic API discovery. Designed for
reliability, speed, and CI/CD compatibility.

## Features

- **Automatic API Discovery**: Discovers Gerrit API endpoints across different
  server configurations (`/r`, `/gerrit`, `/infra`, etc.)
- **Bulk Repository Discovery**: Fetches all projects via Gerrit REST API with
  intelligent filtering
- **Multi-threaded Cloning**: Concurrent operations with auto-scaling thread
  pools (up to 32 workers)
- **Hierarchy Preservation**: Maintains complete Gerrit project folder
  structure without flattening
- **Robust Retry Logic**: Exponential backoff with jitter for transient
  network and server failures
- **SSH Integration**: Full SSH agent, identity file, and config support
- **CI/CD Ready**: Non-interactive operation with structured JSON manifests
- **Smart Filtering**: Automatically excludes system repos and archived
  projects
- **Rich Progress Display**: Beautiful terminal progress bars with per-repo
  status tracking
- **Comprehensive Logging**: Structured logging with configurable verbosity
  levels
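
The discovery and filtering features above build on Gerrit's REST API, whose JSON responses start with a `)]}'` guard line that has to be stripped before parsing. The sketch below illustrates that step only; it is not the tool's actual implementation. The helper names `candidate_urls` and `parse_projects`, the extra empty prefix, and the rule that every non-`ACTIVE` project counts as archived are assumptions made for illustration.

```python
import json

# Path prefixes named in the feature list; the empty prefix (server root)
# is an assumption added for completeness.
PREFIXES = ["", "/r", "/gerrit", "/infra"]


def candidate_urls(host: str) -> list[str]:
    """Build the project-listing URLs a discovery pass could probe."""
    return [f"https://{host}{prefix}/projects/?d" for prefix in PREFIXES]


def parse_projects(body: str, skip_archived: bool = True) -> list[str]:
    """Strip Gerrit's guard line and return the sorted project names."""
    if body.startswith(")]}'"):
        # Drop everything up to and including the first newline.
        body = body.partition("\n")[2]
    projects = json.loads(body)
    names = []
    for name, info in projects.items():
        # Assumption: anything that is not ACTIVE counts as archived.
        if skip_archived and info.get("state", "ACTIVE") != "ACTIVE":
            continue
        names.append(name)
    return sorted(names)


# Hand-written sample response, not output from a real server.
sample = ")]}'\n" + json.dumps({
    "core/api": {"state": "ACTIVE"},
    "tools/legacy": {"state": "READ_ONLY"},
})
print(candidate_urls("gerrit.example.org")[1])
print(parse_projects(sample))
```

A real discovery pass would request each candidate URL in turn and keep the first one that returns a parseable project map.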

## Installation

### Using uvx (Recommended)

For one-time execution without installation:

```bash
uvx gerrit-clone --host gerrit.example.org
```

### Using uv

```bash
uv tool install gerrit-clone
gerrit-clone --host gerrit.example.org
```

### From Source

```bash
git clone https://github.com/lfreleng-actions/gerrit-clone-action.git
cd gerrit-clone-action
uv sync
uv run gerrit-clone --host gerrit.example.org
```

## CLI Usage

### Basic Examples

Clone all active repositories from a Gerrit server:

```bash
gerrit-clone --host gerrit.example.org
```

Clone to a specific directory with custom thread count:

```bash
gerrit-clone --host gerrit.example.org \
  --path-prefix ./repositories \
  --threads 8
```

Clone with shallow depth and specific branch:

```bash
gerrit-clone --host gerrit.example.org \
  --depth 10 \
  --branch main \
  --threads 16
```

Include archived repositories and use custom SSH key:

```bash
gerrit-clone --host gerrit.example.org \
  --include-archived \
  --ssh-user myuser \
  --ssh-private-key ~/.ssh/gerrit_rsa
```

### Command-Line Options

```text
Usage: gerrit-clone [OPTIONS]

Options:
  -h, --host TEXT                 Gerrit server hostname [required]
  -p, --port INTEGER              Gerrit SSH port [default: 29418]
  --base-url TEXT                 Base URL for Gerrit API
  -u, --ssh-user TEXT             SSH username for clone operations
  -i, --ssh-private-key PATH      SSH private key file for authentication
  --path-prefix PATH              Base directory for clone hierarchy [default: .]
  --skip-archived / --include-archived
                                  Skip archived and inactive repositories
                                  [default: skip-archived]
  --include-project TEXT          Restrict cloning to specific project(s)
  --ssh-debug                     Enable verbose SSH (-vvv) for troubleshooting
  --allow-nested-git / --no-allow-nested-git
                                  Allow nested git working trees when cloning
                                  both parent and child repositories [default: allow-nested-git]
  --nested-protection / --no-nested-protection
                                  Auto-add nested child repo paths to parent
                                  .git/info/exclude [default: nested-protection]
  --move-conflicting              Move conflicting files/directories in parent
                                  repos to [NAME].parent to allow nested cloning
  -t, --threads INTEGER           Number of concurrent clone threads
  -d, --depth INTEGER             Create shallow clone with given depth
  -b, --branch TEXT               Clone specific branch instead of default
  --https / --ssh                 Use HTTPS for cloning [default: ssh]
  --keep-remote-protocol          Keep original clone protocol for remote
  --strict-host / --accept-unknown-host
                                  SSH strict host key checking [default: strict-host]
  --clone-timeout INTEGER         Timeout per clone operation in seconds
                                  [default: 600]
  --retry-attempts INTEGER        Max retry attempts per repository
                                  [default: 3]
  --retry-base-delay FLOAT        Base delay for retry backoff in seconds
                                  [default: 2.0]
  --retry-factor FLOAT            Exponential backoff factor [default: 2.0]
  --retry-max-delay FLOAT         Max retry delay in seconds [default: 30.0]
  --manifest-filename TEXT        Output manifest filename [default: clone-manifest.json]
  -c, --config-file PATH          Configuration file path (YAML or JSON)
  --exit-on-error                 Exit when first error occurs
  --log-file PATH                 Custom log file path
  --disable-log-file              Disable creation of log file
  --log-level TEXT                File logging level [default: DEBUG]
  -v, --verbose                   Enable verbose/debug output
  -q, --quiet                     Suppress all output except errors
  --version                       Show version information
  --help                          Show this message and exit
```

### Environment Variables

You can set any CLI option through an environment variable with the `GERRIT_` prefix:

```bash
export GERRIT_HOST=gerrit.example.org
export GERRIT_PORT=29418
export GERRIT_SSH_USER=myuser
export GERRIT_SSH_PRIVATE_KEY=~/.ssh/gerrit_key
export GERRIT_PATH_PREFIX=/workspace/repos
export GERRIT_SKIP_ARCHIVED=1
export GERRIT_THREADS=16
export GERRIT_CLONE_DEPTH=5
export GERRIT_BRANCH=main
export GERRIT_STRICT_HOST=1
export GERRIT_CLONE_TIMEOUT=300
export GERRIT_RETRY_ATTEMPTS=5

gerrit-clone  # Uses environment variables
```

### Configuration Files

Create `~/.config/gerrit-clone/config.yaml`:

```yaml
host: gerrit.example.org
port: 29418
ssh_user: myuser
ssh_identity_file: ~/.ssh/gerrit_key
path_prefix: /workspace/repos
skip_archived: true
threads: 8
clone_timeout: 600
retry_attempts: 3
retry_base_delay: 2.0
```

Or JSON format `~/.config/gerrit-clone/config.json`:

```json
{
  "host": "gerrit.example.org",
  "port": 29418,
  "ssh_user": "myuser",
  "ssh_identity_file": "~/.ssh/gerrit_key",
  "path_prefix": "/workspace/repos",
  "skip_archived": true,
  "threads": 8
}
```

Configuration precedence: CLI arguments > Environment variables > Config file > Defaults
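
The precedence rule can be pictured as a layered lookup in which the first layer that defines a key wins. The sketch below shows the idea with `collections.ChainMap`; the `resolve` and `drop_unset` helpers and the sample values are illustrative assumptions, not the tool's internal API.

```python
from collections import ChainMap

# Defaults taken from the option list above.
DEFAULTS = {"port": 29418, "clone_timeout": 600, "retry_attempts": 3}


def drop_unset(layer: dict) -> dict:
    """Ignore options that were not supplied (represented here as None)."""
    return {key: value for key, value in layer.items() if value is not None}


def resolve(cli: dict, env: dict, config_file: dict) -> dict:
    """Merge the layers; ChainMap returns the first layer that has a key."""
    layers = [drop_unset(cli), drop_unset(env), drop_unset(config_file), DEFAULTS]
    return dict(ChainMap(*layers))


settings = resolve(
    cli={"threads": 4, "port": None},           # --threads 4, no --port given
    env={"threads": 16, "clone_timeout": 300},  # GERRIT_THREADS, GERRIT_CLONE_TIMEOUT
    config_file={"threads": 8, "retry_attempts": 5},
)
print(settings)
```

Here `threads` comes from the CLI, `clone_timeout` from the environment, `retry_attempts` from the config file, and `port` from the defaults.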

## Nested Repository Support

Gerrit Clone includes intelligent support for nested repositories (projects with
hierarchical names like `parent/child`):

### Automatic Detection

- **Dependency Ordering**: Parent repositories are automatically cloned before
  their children
- **Conflict Detection**: Identifies when parent repo content conflicts with
  nested directory structure
- **Smart Batching**: Uses dependency-aware batching to prevent race conditions
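
One simple way to get parent-before-child ordering is to batch projects by path depth: two projects at the same depth can never be parent and child, so each batch can be cloned concurrently once the previous batch has finished. This is a sketch of the idea under that assumption; the tool's real batching logic may differ, and the function names are hypothetical.

```python
def clone_batches(projects: list[str]) -> list[list[str]]:
    """Group projects by path depth so parents land in earlier batches."""
    by_depth: dict[int, list[str]] = {}
    for name in projects:
        by_depth.setdefault(name.count("/"), []).append(name)
    return [sorted(by_depth[depth]) for depth in sorted(by_depth)]


def nested_pairs(projects: list[str]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the child sits inside a parent repo."""
    names = set(projects)
    pairs = []
    for name in sorted(names):
        parts = name.split("/")
        # Check every proper path prefix of the project name.
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in names:
                pairs.append((parent, name))
    return pairs


projects = ["test/test", "core/api", "test", "core"]
print(clone_batches(projects))  # [['core', 'test'], ['core/api', 'test/test']]
print(nested_pairs(projects))   # [('core', 'core/api'), ('test', 'test/test')]
```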

### Conflict Resolution Options

#### Skip Conflicting

```bash
gerrit-clone clone --host gerrit.example.org --no-move-conflicting
```

Skips a nested repository when its parent contains conflicting files or
directories, and logs a warning for each skipped repository.

#### Move Conflicting (Default - Recommended for Data Mining)

```bash
gerrit-clone clone --host gerrit.example.org
```

Automatically moves conflicting content in parent repositories to
`[NAME].parent` so that nested child repositories can also be cloned. Every
repository then remains available for reporting and analysis.

**Example:**

- Parent repo `test` contains file named `test`
- Child repo `test/test` needs directory `test/`
- With move-conflicting enabled (default): File `test` → `test.parent`,
  directory created for child repo
- Result: Both repositories cloned with complete history preserved

### Configuration

```bash
# Allow nested repositories (default: true)
--allow-nested-git / --no-allow-nested-git

# Protect parent repos by adding child paths to .git/info/exclude (default: true)
--nested-protection / --no-nested-protection

# Move conflicting content to allow complete cloning (default: true)
--move-conflicting / --no-move-conflicting
```

## GitHub Action Usage

### Basic Example

```yaml
name: Clone Gerrit Repositories
on: [push]

jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          path-prefix: repositories
```

### Advanced Example

```yaml
name: Clone and Process Repositories
on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories
        id: clone
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          port: 29418
          base-url: https://gerrit.example.org/gerrit
          ssh-user: automation
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          path-prefix: workspace
          skip-archived: true
          threads: 12
          depth: 1
          branch: main
          use-https: false
          keep-remote-protocol: false
          clone-timeout: 900
          retry-attempts: 5
          verbose: true

      - name: Show results
        run: |
          echo "Total: ${{ steps.clone.outputs.total-count }}"
          echo "Success: ${{ steps.clone.outputs.success-count }}"
          echo "Failed: ${{ steps.clone.outputs.failure-count }}"
          echo "Manifest: ${{ steps.clone.outputs.manifest-path }}"

      - name: Upload manifest
        uses: actions/upload-artifact@v4
        with:
          name: clone-manifest
          path: ${{ steps.clone.outputs.manifest-path }}
```

### HTTPS Cloning Example

```yaml
name: Clone via HTTPS
on: [push]

jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories using HTTPS
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          base-url: https://gerrit.example.org/r
          use-https: true
          path-prefix: repos
          quiet: true
        env:
          # Use GitHub token or other auth for HTTPS
          GIT_ASKPASS: echo
          GIT_USERNAME: ${{ secrets.GERRIT_USERNAME }}
          GIT_PASSWORD: ${{ secrets.GERRIT_TOKEN }}
```

### Nested Repositories with Conflict Resolution

```yaml
name: Complete Repository Mining
on: [workflow_dispatch]

jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone all repositories (including nested with conflicts)
        id: clone
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          use-https: true
          allow-nested-git: true
          nested-protection: true
          move-conflicting: true  # Move conflicting files to ensure 100% clone success
          path-prefix: complete-clone
          threads: 8
          verbose: true

      - name: Verify complete data availability
        run: |
          echo "Cloned: ${{ steps.clone.outputs.success-count }}"
          echo "Total repositories: ${{ steps.clone.outputs.total-count }}"
          success_count=${{ steps.clone.outputs.success-count }}
          total_count=${{ steps.clone.outputs.total-count }}
          success_rate=$(( success_count * 100 / total_count ))
          echo "Success rate: ${success_rate}%"

          # Count moved conflicts
          find complete-clone -name "*.parent" | wc -l | xargs echo "Conflicts resolved:"
```

### Configuration File Example

```yaml
name: Clone with Config File
on: [workflow_dispatch]

jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout config
        uses: actions/checkout@v4

      - name: Clone repositories
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          config-file: .gerrit-clone-config.yaml
          verbose: true
```

### Action Inputs

<!-- markdownlint-disable MD013 -->

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| `host` | Yes | | Gerrit server hostname |
| `port` | No | `29418` | Gerrit SSH port |
| `base-url` | No | | Base URL for Gerrit API (defaults to <https://HOST>) |
| `ssh-user` | No | | SSH username for clone operations |
| `ssh-private-key` | No | | SSH private key content for authentication |
| `path-prefix` | No | `.` | Base directory for clone hierarchy |
| `skip-archived` | No | `true` | Skip archived and inactive repositories |
| `include-project` | No | | Restrict cloning to specific project(s) (comma-separated) |
| `ssh-debug` | No | `false` | Enable verbose SSH (-vvv) for troubleshooting |
| `allow-nested-git` | No | `true` | Allow nested git working trees |
| `nested-protection` | No | `true` | Auto-add nested child repo paths to parent .git/info/exclude |
| `move-conflicting` | No | `false` | Move conflicting files/directories in parent repos to [NAME].parent |
| `exit-on-error` | No | `false` | Exit when first error occurs |
| `threads` | No | auto | Number of concurrent clone threads |
| `depth` | No | | Create shallow clone with given depth |
| `branch` | No | | Clone specific branch instead of default |
| `use-https` | No | `false` | Use HTTPS for cloning instead of SSH |
| `keep-remote-protocol` | No | `false` | Keep original clone protocol for remote |
| `strict-host` | No | `true` | SSH strict host key checking |
| `clone-timeout` | No | `600` | Timeout per clone operation in seconds |
| `retry-attempts` | No | `3` | Max retry attempts per repository |
| `retry-base-delay` | No | `2.0` | Base delay for retry backoff in seconds |
| `retry-factor` | No | `2.0` | Exponential backoff factor for retries |
| `retry-max-delay` | No | `30.0` | Max retry delay in seconds |
| `manifest-filename` | No | `clone-manifest.json` | Output manifest filename |
| `config-file` | No | | Configuration file path (YAML or JSON) |
| `verbose` | No | `false` | Enable verbose/debug output |
| `quiet` | No | `false` | Suppress all output except errors |
| `log-file` | No | | Custom log file path |
| `disable-log-file` | No | `false` | Disable creation of log file |
| `log-level` | No | `DEBUG` | File logging level |

<!-- markdownlint-enable MD013 -->

### Action Outputs

| Output | Description |
|--------|-------------|
| `manifest-path` | Path to the generated clone manifest file |
| `success-count` | Number of successfully cloned repositories |
| `failure-count` | Number of failed clone attempts |
| `total-count` | Total number of repositories processed |

## SSH Configuration

The tool provides comprehensive SSH authentication support with automatic
configuration detection:

### SSH Authentication Options

The following SSH authentication options are available across all interfaces:

<!-- markdownlint-disable MD013 -->
| Option | CLI | Environment | Action | Description |
|--------|-----|-------------|--------|-------------|
| SSH User | `-u` | `GERRIT_SSH_USER` | `ssh-user` | SSH username |
| SSH Key | `-i` (file) | `GERRIT_SSH_PRIVATE_KEY` | `ssh-private-key` (content) | Private key |
| Host Check | `--strict-host` | `GERRIT_STRICT_HOST` | `strict-host` | Key check |
<!-- markdownlint-enable MD013 -->

### Authentication Methods

Three authentication methods provide automatic fallback:

1. **SSH Agent (Recommended)**: Uses keys loaded into SSH agent with automatic
   detection
2. **Identity File**: Explicitly specified private key files with permission
   validation
3. **SSH Config**: Host-specific configuration from ~/.ssh/config with full
   option support

### SSH Setup Examples

#### Using SSH Agent (Recommended for development)

1. Generate SSH key pair:

   ```bash
   ssh-keygen -t ed25519 -C "your.email@example.com"
   ```

2. Add public key to Gerrit profile

3. Add private key to SSH agent:

   ```bash
   ssh-add ~/.ssh/id_ed25519
   ```

4. Clone with agent authentication:

   ```bash
   gerrit-clone clone --host gerrit.example.org --ssh-user myuser
   ```

#### Using SSH Identity File (Recommended for CI/CD)

1. Place private key file securely (e.g., `/path/to/private_key`)

2. Set proper permissions:

   ```bash
   chmod 600 /path/to/private_key
   ```

3. Clone with identity file:

   ```bash
   gerrit-clone clone --host gerrit.example.org \
     --ssh-user myuser \
     --ssh-private-key /path/to/private_key
   ```

4. Or use environment variables:

   ```bash
   export GERRIT_SSH_USER=myuser
   export GERRIT_SSH_PRIVATE_KEY=/path/to/private_key
   gerrit-clone clone --host gerrit.example.org
   ```

### SSH Config

Create `~/.ssh/config` entries for convenience:

```text
Host gerrit.example.org
    User myusername
    IdentityFile ~/.ssh/gerrit_key
    StrictHostKeyChecking yes
```

### Known Hosts

Pre-populate known hosts to avoid prompts (recommended for CI/CD):

```bash
ssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts
```

Test SSH connectivity before cloning:

```bash
ssh -p 29418 myuser@gerrit.example.org gerrit version
```

## Output Manifest

Each run generates a detailed JSON manifest (`clone-manifest.json`):

```json
{
  "version": "1.0",
  "generated_at": "2025-01-15T10:30:45Z",
  "host": "gerrit.example.org",
  "port": 29418,
  "total": 156,
  "succeeded": 154,
  "failed": 2,
  "skipped": 0,
  "success_rate": 98.7,
  "duration_s": 89.3,
  "config": {
    "skip_archived": true,
    "threads": 8,
    "depth": null,
    "branch": null,
    "strict_host_checking": true,
    "path_prefix": "/workspace/repos"
  },
  "results": [
    {
      "project": "core/api",
      "path": "core/api",
      "status": "success",
      "attempts": 1,
      "duration_s": 3.42,
      "error": null,
      "started_at": "2025-01-15T10:30:15Z",
      "completed_at": "2025-01-15T10:30:18Z"
    },
    {
      "project": "tools/legacy",
      "path": "tools/legacy",
      "status": "failed",
      "attempts": 3,
      "duration_s": 15.8,
      "error": "timeout after 600s",
      "started_at": "2025-01-15T10:30:20Z",
      "completed_at": "2025-01-15T10:30:36Z"
    }
  ]
}
```
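
Because the manifest is plain JSON, follow-up automation can read it with the standard library. The sketch below lists failed projects using only the `results`, `project`, `status`, and `error` fields shown above; the `failed_projects` helper and the inline sample are illustrative, and in practice you would load the file named by `--manifest-filename`.

```python
import json

# Trimmed-down sample with the same field names as the manifest above.
SAMPLE = """
{
  "results": [
    {"project": "core/api", "status": "success", "attempts": 1, "error": null},
    {"project": "tools/legacy", "status": "failed", "attempts": 3,
     "error": "timeout after 600s"}
  ]
}
"""


def failed_projects(manifest: dict) -> list[tuple[str, str]]:
    """Return (project, error) for every result marked as failed."""
    return [
        (result["project"], result.get("error") or "unknown error")
        for result in manifest.get("results", [])
        if result.get("status") == "failed"
    ]


manifest = json.loads(SAMPLE)
for project, error in failed_projects(manifest):
    print(f"{project}: {error}")  # tools/legacy: timeout after 600s
```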

## Error Handling

### Common Issues

#### Host key verification failed

```bash
# Accept new host keys (use with caution)
gerrit-clone --host gerrit.example.org --accept-unknown-host

# Recommended: Pre-populate known_hosts
ssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts
```

#### Permission denied (publickey)

- Verify SSH public key exists in Gerrit profile
- Check SSH agent has key loaded: `ssh-add -l`
- Test SSH connection: `ssh -p 29418 username@gerrit.example.org gerrit version`
- Restrict private key permissions to the owner: `chmod 600 ~/.ssh/id_rsa`

#### Connection timeout or network errors

- Verify Gerrit server hostname and port (often 29418 for SSH)
- Check network connectivity and firewall rules
- Increase timeout: `--clone-timeout 900`
- Reduce concurrency: `--threads 4`

#### Path conflicts or permission errors

- Existing non-git directories block clones
- Use clean target directory: `--path-prefix ./clean-workspace`
- Check disk space and write permissions
- Remove conflicting directories: `rm -rf conflicting-path`

#### API discovery failures

- Manually specify base URL: `--base-url https://host/gerrit`
- Verify Gerrit server is accessible via HTTPS
- Check for corporate proxy or firewall restrictions

### Exit Codes

- `0`: Success (all repositories cloned)
- `1`: Failure (one or more repositories failed to clone)
- `130`: Interrupted by user (Ctrl+C)
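
A wrapper script can branch on these codes. The following sketch maps them to messages; since `gerrit-clone` may not be installed where this runs, it uses a stand-in child process that exits with status 1. The `describe_exit` helper is hypothetical.

```python
import subprocess
import sys

# Meanings copied from the exit code list above.
EXIT_MEANINGS = {
    0: "success: all repositories cloned",
    1: "failure: one or more repositories failed to clone",
    130: "interrupted by user (Ctrl+C)",
}


def describe_exit(code: int) -> str:
    """Translate a process exit status into a readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


# Stand-in child process that exits with status 1. In real use the command
# would be something like ["gerrit-clone", "--host", "gerrit.example.org"].
stand_in = [sys.executable, "-c", "import sys; sys.exit(1)"]
completed = subprocess.run(stand_in, check=False)
print(describe_exit(completed.returncode))
```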

## Development

### Requirements

- Python 3.11+ (tested on 3.11, 3.12, 3.13, 3.14)
- uv package manager (for development)
- Git (for clone operations)
- SSH client (for authentication)

### Setup

```bash
git clone https://github.com/lfreleng-actions/gerrit-clone-action.git
cd gerrit-clone-action
uv sync --dev
```

### Testing

```bash
# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=gerrit_clone --cov-report=html --cov-report=term-missing

# Run integration tests (requires network)
uv run pytest tests/integration/ -v

# Run specific test categories
uv run pytest -m "not integration" -v  # Unit tests
uv run pytest tests/test_models.py::TestConfig -v  # Specific test class
```

### Linting

```bash
# Run pre-commit hooks
uv run pre-commit run --all-files

# Individual tools
uv run ruff check .
uv run ruff format .
uv run mypy src/
```

## License

This project uses the Apache License 2.0. See LICENSE for details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Run linting and tests
5. Submit a pull request

## Support

- **GitHub Issues**: Report bugs and request features at
  [lfreleng-actions/gerrit-clone-action](https://github.com/lfreleng-actions/gerrit-clone-action/issues)
- **Documentation**: This README, IMPLEMENTATION.md, and inline help
  (`gerrit-clone --help`)
- **Examples**: Advanced usage patterns in repository examples/
- **Integration Tests**: Real-world server validation in tests/integration/

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "gerrit-clone",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": "Matthew Watkins <mwatkins@linuxfoundation.org>",
    "keywords": "bulk, ci-cd, clone, gerrit, git, multi-threaded, repository, ssh",
    "author": null,
    "author_email": "Matthew Watkins <mwatkins@linuxfoundation.org>",
    "download_url": "https://files.pythonhosted.org/packages/60/16/fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea/gerrit_clone-0.1.7.tar.gz",
    "platform": null,
    "description": "<!--\n# SPDX-License-Identifier: Apache-2.0\n# SPDX-FileCopyrightText: 2025 Matthew Watkins <mwatkins@linuxfoundation.org>\n-->\n\n# \ud83d\udd04 Gerrit Clone\n\nA production-ready multi-threaded CLI tool and GitHub Action for bulk cloning\nrepositories from Gerrit servers with automatic API discovery. Designed for\nreliability, speed, and CI/CD compatibility.\n\n## Features\n\n- **Automatic API Discovery**: Discovers Gerrit API endpoints across different\n  server configurations (`/r`, `/gerrit`, `/infra`, etc.)\n- **Bulk Repository Discovery**: Fetches all projects via Gerrit REST API with\n  intelligent filtering\n- **Multi-threaded Cloning**: Concurrent operations with auto-scaling thread\n  pools (up to 32 workers)\n- **Hierarchy Preservation**: Maintains complete Gerrit project folder\n  structure without flattening\n- **Robust Retry Logic**: Exponential backoff with jitter for transient\n  network and server failures\n- **SSH Integration**: Full SSH agent, identity file, and config support\n- **CI/CD Ready**: Non-interactive operation with structured JSON manifests\n- **Smart Filtering**: Automatically excludes system repos and archived\n  projects\n- **Rich Progress Display**: Beautiful terminal progress bars with per-repo\n  status tracking\n- **Comprehensive Logging**: Structured logging with configurable verbosity\n  levels\n\n## Installation\n\n### Using uvx (Recommended)\n\nFor one-time execution without installation:\n\n```bash\nuvx gerrit-clone --host gerrit.example.org\n```\n\n### Using uv\n\n```bash\nuv tool install gerrit-clone\n./gerrit-clone --host gerrit.example.org\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/lfreleng-actions/gerrit-clone-action.git\ncd gerrit-clone-action\nuv sync\nuv run gerrit-clone --host gerrit.example.org\n```\n\n## CLI Usage\n\n### Basic Examples\n\nClone all active repositories from a Gerrit server:\n\n```bash\ngerrit-clone --host gerrit.example.org\n```\n\nClone to a specific 
directory with custom thread count:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n  --path-prefix ./repositories \\\n  --threads 8\n```\n\nClone with shallow depth and specific branch:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n  --depth 10 \\\n  --branch main \\\n  --threads 16\n```\n\nInclude archived repositories and use custom SSH key:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n  --include-archived \\\n  --ssh-user myuser \\\n  --ssh-private-key ~/.ssh/gerrit_rsa\n```\n\n### Command-Line Options\n\n```text\nUsage: gerrit-clone [OPTIONS]\n\nOptions:\n  -h, --host TEXT                 Gerrit server hostname [required]\n  -p, --port INTEGER              Gerrit SSH port [default: 29418]\n  --base-url TEXT                 Base URL for Gerrit API\n  -u, --ssh-user TEXT             SSH username for clone operations\n  -i, --ssh-private-key PATH      SSH private key file for authentication\n  --path-prefix PATH              Base directory for clone hierarchy [default: .]\n  --skip-archived / --include-archived\n                                  Skip archived and inactive repositories\n                                  [default: skip-archived]\n  --include-project TEXT          Restrict cloning to specific project(s)\n  --ssh-debug                     Enable verbose SSH (-vvv) for troubleshooting\n  --allow-nested-git / --no-allow-nested-git\n                                  Allow nested git working trees when cloning\n                                  both parent and child repositories [default: allow-nested-git]\n  --nested-protection / --no-nested-protection\n                                  Auto-add nested child repo paths to parent\n                                  .git/info/exclude [default: nested-protection]\n  --move-conflicting              Move conflicting files/directories in parent\n                                  repos to [NAME].parent to allow nested cloning\n  -t, --threads INTEGER           Number of concurrent 
clone threads\n  -d, --depth INTEGER             Create shallow clone with given depth\n  -b, --branch TEXT               Clone specific branch instead of default\n  --https / --ssh                 Use HTTPS for cloning [default: ssh]\n  --keep-remote-protocol          Keep original clone protocol for remote\n  --strict-host / --accept-unknown-host\n                                  SSH strict host key checking [default: strict-host]\n  --clone-timeout INTEGER         Timeout per clone operation in seconds\n                                  [default: 600]\n  --retry-attempts INTEGER        Max retry attempts per repository\n                                  [default: 3]\n  --retry-base-delay FLOAT        Base delay for retry backoff in seconds\n                                  [default: 2.0]\n  --retry-factor FLOAT            Exponential backoff factor [default: 2.0]\n  --retry-max-delay FLOAT         Max retry delay in seconds [default: 30.0]\n  --manifest-filename TEXT        Output manifest filename [default: clone-manifest.json]\n  -c, --config-file PATH          Configuration file path (YAML or JSON)\n  --exit-on-error                 Exit when first error occurs\n  --log-file PATH                 Custom log file path\n  --disable-log-file              Disable creation of log file\n  --log-level TEXT                File logging level [default: DEBUG]\n  -v, --verbose                   Enable verbose/debug output\n  -q, --quiet                     Suppress all output except errors\n  --version                       Show version information\n  --help                          Show this message and exit\n```\n\n### Environment Variables\n\nYou can configure all CLI options through environment variables with `GERRIT_` prefix:\n\n```bash\nexport GERRIT_HOST=gerrit.example.org\nexport GERRIT_PORT=29418\nexport GERRIT_SSH_USER=myuser\nexport GERRIT_SSH_PRIVATE_KEY=~/.ssh/gerrit_key\nexport GERRIT_PATH_PREFIX=/workspace/repos\nexport GERRIT_SKIP_ARCHIVED=1\nexport 
GERRIT_THREADS=16\nexport GERRIT_CLONE_DEPTH=5\nexport GERRIT_BRANCH=main\nexport GERRIT_STRICT_HOST=1\nexport GERRIT_CLONE_TIMEOUT=300\nexport GERRIT_RETRY_ATTEMPTS=5\n\ngerrit-clone  # Uses environment variables\n```\n\n### Configuration Files\n\nCreate `~/.config/gerrit-clone/config.yaml`:\n\n```yaml\nhost: gerrit.example.org\nport: 29418\nssh_user: myuser\nssh_identity_file: ~/.ssh/gerrit_key\npath_prefix: /workspace/repos\nskip_archived: true\nthreads: 8\nclone_timeout: 600\nretry_attempts: 3\nretry_base_delay: 2.0\n```\n\nOr JSON format `~/.config/gerrit-clone/config.json`:\n\n```json\n{\n  \"host\": \"gerrit.example.org\",\n  \"port\": 29418,\n  \"ssh_user\": \"myuser\",\n  \"ssh_identity_file\": \"~/.ssh/gerrit_key\",\n  \"path_prefix\": \"/workspace/repos\",\n  \"skip_archived\": true,\n  \"threads\": 8\n}\n```\n\nConfiguration precedence: CLI arguments > Environment variables > Config file > Defaults\n\n## Nested Repository Support\n\nGerrit Clone includes intelligent support for nested repositories (projects with\nhierarchical names like `parent/child`):\n\n### Automatic Detection\n\n- **Dependency Ordering**: Parent repositories are automatically cloned before\n  their children\n- **Conflict Detection**: Identifies when parent repo content conflicts with\n  nested directory structure\n- **Smart Batching**: Uses dependency-aware batching to prevent race conditions\n\n### Conflict Resolution Options\n\n#### Skip Conflicting\n\n```bash\ngerrit-clone clone --host gerrit.example.org --no-move-conflicting\n```\n\nSkips nested repositories when parent contains conflicting files/directories.\nProvides clear warnings about skipped repos.\n\n#### Move Conflicting (Default - Recommended for Data Mining)\n\n```bash\ngerrit-clone clone --host gerrit.example.org\n```\n\nAutomatically moves conflicting content in parent repositories to\n`[NAME].parent` to allow complete nested cloning. 
This ensures **100%\nrepository availability** for reporting and analysis purposes.\n\n**Example:**\n\n- Parent repo `test` contains file named `test`\n- Child repo `test/test` needs directory `test/`\n- With move-conflicting enabled (default): File `test` \u2192 `test.parent`,\n  directory created for child repo\n- Result: Both repositories cloned with complete history preserved\n\n### Configuration\n\n```bash\n# Allow nested repositories (default: true)\n--allow-nested-git / --no-allow-nested-git\n\n# Protect parent repos by adding child paths to .git/info/exclude (default: true)\n--nested-protection / --no-nested-protection\n\n# Move conflicting content to allow complete cloning (default: true)\n--move-conflicting / --no-move-conflicting\n```\n\n## GitHub Action Usage\n\n### Basic Example\n\n```yaml\nname: Clone Gerrit Repositories\non: [push]\n\njobs:\n  clone:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Clone repositories\n        uses: lfreleng-actions/gerrit-clone-action@v1\n        with:\n          host: gerrit.example.org\n          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}\n          path-prefix: repositories\n```\n\n### Advanced Example\n\n```yaml\nname: Clone and Process Repositories\non:\n  schedule:\n    - cron: '0 2 * * *'  # Daily at 2 AM\n\njobs:\n  clone:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Clone repositories\n        id: clone\n        uses: lfreleng-actions/gerrit-clone-action@v1\n        with:\n          host: gerrit.example.org\n          port: 29418\n          base-url: https://gerrit.example.org/gerrit\n          ssh-user: automation\n          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}\n          path-prefix: workspace\n          skip-archived: true\n          threads: 12\n          depth: 1\n          branch: main\n          use-https: false\n          keep-remote-protocol: false\n          clone-timeout: 900\n          retry-attempts: 5\n          verbose: true\n\n      - name: Show results\n       
 run: |\n          echo \"Total: ${{ steps.clone.outputs.total-count }}\"\n          echo \"Success: ${{ steps.clone.outputs.success-count }}\"\n          echo \"Failed: ${{ steps.clone.outputs.failure-count }}\"\n          echo \"Manifest: ${{ steps.clone.outputs.manifest-path }}\"\n\n      - name: Upload manifest\n        uses: actions/upload-artifact@v4\n        with:\n          name: clone-manifest\n          path: ${{ steps.clone.outputs.manifest-path }}\n```\n\n### HTTPS Cloning Example\n\n```yaml\nname: Clone via HTTPS\non: [push]\n\njobs:\n  clone:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Clone repositories using HTTPS\n        uses: lfreleng-actions/gerrit-clone-action@v1\n        with:\n          host: gerrit.example.org\n          base-url: https://gerrit.example.org/r\n          use-https: true\n          path-prefix: repos\n          quiet: true\n        env:\n          # Use GitHub token or other auth for HTTPS\n          GIT_ASKPASS: echo\n          GIT_USERNAME: ${{ secrets.GERRIT_USERNAME }}\n          GIT_PASSWORD: ${{ secrets.GERRIT_TOKEN }}\n```\n\n### Nested Repositories with Conflict Resolution\n\n```yaml\nname: Complete Repository Mining\non: [workflow_dispatch]\n\njobs:\n  clone:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Clone all repositories (including nested with conflicts)\n        id: clone\n        uses: lfreleng-actions/gerrit-clone-action@v1\n        with:\n          host: gerrit.example.org\n          use-https: true\n          allow-nested-git: true\n          nested-protection: true\n          move-conflicting: true  # Move conflicting files to ensure 100% clone success\n          path-prefix: complete-clone\n          threads: 8\n          verbose: true\n\n      - name: Verify complete data availability\n        run: |\n          echo \"Cloned: ${{ steps.clone.outputs.success-count }}\"\n          echo \"Total repositories: ${{ steps.clone.outputs.total-count }}\"\n          success_count=${{ 
steps.clone.outputs.success-count }}\n          total_count=${{ steps.clone.outputs.total-count }}\n          success_rate=$(( success_count * 100 / total_count ))\n          echo \"Success rate: ${success_rate}%\"\n\n          # Count moved conflicts\n          find complete-clone -name \"*.parent\" | wc -l | xargs echo \"Conflicts resolved:\"\n```\n\n### Configuration File Example\n\n```yaml\nname: Clone with Config File\non: [workflow_dispatch]\n\njobs:\n  clone:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout config\n        uses: actions/checkout@v4\n\n      - name: Clone repositories\n        uses: lfreleng-actions/gerrit-clone-action@v1\n        with:\n          config-file: .gerrit-clone-config.yaml\n          verbose: true\n```\n\n### Action Inputs\n\n<!-- markdownlint-disable MD013 -->\n\n| Input | Required | Default | Description |\n|-------|----------|---------|-------------|\n| `host` | Yes | | Gerrit server hostname |\n| `port` | No | `29418` | Gerrit SSH port |\n| `base-url` | No | | Base URL for Gerrit API (defaults to <https://HOST>) |\n| `ssh-user` | No | | SSH username for clone operations |\n| `ssh-private-key` | No | | SSH private key content for authentication |\n| `path-prefix` | No | `.` | Base directory for clone hierarchy |\n| `skip-archived` | No | `true` | Skip archived and inactive repositories |\n| `include-project` | No | | Restrict cloning to specific project(s) (comma-separated) |\n| `ssh-debug` | No | `false` | Enable verbose SSH (-vvv) for troubleshooting |\n| `allow-nested-git` | No | `true` | Allow nested git working trees |\n| `nested-protection` | No | `true` | Auto-add nested child repo paths to parent .git/info/exclude |\n| `move-conflicting` | No | `false` | Move conflicting files/directories in parent repos to [NAME].parent |\n| `exit-on-error` | No | `false` | Exit when first error occurs |\n| `threads` | No | auto | Number of concurrent clone threads |\n| `depth` | No | | Create shallow clone with given 
depth |\n| `branch` | No | | Clone specific branch instead of default |\n| `use-https` | No | `false` | Use HTTPS for cloning instead of SSH |\n| `keep-remote-protocol` | No | `false` | Keep original clone protocol for remote |\n| `strict-host` | No | `true` | SSH strict host key checking |\n| `clone-timeout` | No | `600` | Timeout per clone operation in seconds |\n| `retry-attempts` | No | `3` | Max retry attempts per repository |\n| `retry-base-delay` | No | `2.0` | Base delay for retry backoff in seconds |\n| `retry-factor` | No | `2.0` | Exponential backoff factor for retries |\n| `retry-max-delay` | No | `30.0` | Max retry delay in seconds |\n| `manifest-filename` | No | `clone-manifest.json` | Output manifest filename |\n| `config-file` | No | | Configuration file path (YAML or JSON) |\n| `verbose` | No | `false` | Enable verbose/debug output |\n| `quiet` | No | `false` | Suppress all output except errors |\n| `log-file` | No | | Custom log file path |\n| `disable-log-file` | No | `false` | Disable creation of log file |\n| `log-level` | No | `DEBUG` | File logging level |\n\n<!-- markdownlint-enable MD013 -->\n\n### Action Outputs\n\n| Output | Description |\n|--------|-------------|\n| `manifest-path` | Path to the generated clone manifest file |\n| `success-count` | Number of cloned repositories |\n| `failure-count` | Number of failed clone attempts |\n| `total-count` | Total number of repositories processed |\n\n## SSH Configuration\n\nThe tool provides comprehensive SSH authentication support with automatic\nconfiguration detection:\n\n### SSH Authentication Options\n\nThe following SSH authentication options are available across all interfaces:\n\n<!-- markdownlint-disable MD013 -->\n| Option | CLI | Environment | Action | Description |\n|--------|-----|-------------|--------|-------------|\n| SSH User | `-u` | `GERRIT_SSH_USER` | `ssh-user` | SSH username |\n| SSH Key | `-i` (file) | `GERRIT_SSH_PRIVATE_KEY` | `ssh-private-key` (content) | Private key 
|\n| Host Check | `--strict-host` | `GERRIT_STRICT_HOST` | `strict-host` | Key check |\n<!-- markdownlint-enable MD013 -->\n\n### Authentication Methods\n\nThree authentication methods provide automatic fallback:\n\n1. **SSH Agent (Recommended)**: Uses keys loaded into SSH agent with automatic\n   detection\n2. **Identity File**: Explicitly specified private key files with permission\n   validation\n3. **SSH Config**: Host-specific configuration from ~/.ssh/config with full\n   option support\n\n### SSH Setup Examples\n\n#### Using SSH Agent (Recommended for development)\n\n1. Generate SSH key pair:\n\n   ```bash\n   ssh-keygen -t ed25519 -C \"your.email@example.com\"\n   ```\n\n2. Add public key to Gerrit profile\n\n3. Add private key to SSH agent:\n\n   ```bash\n   ssh-add ~/.ssh/id_ed25519\n   ```\n\n4. Clone with agent authentication:\n\n   ```bash\n   gerrit-clone clone --host gerrit.example.org --ssh-user myuser\n   ```\n\n#### Using SSH Identity File (Recommended for CI/CD)\n\n1. Place private key file securely (e.g., `/path/to/private_key`)\n\n2. Set proper permissions:\n\n   ```bash\n   chmod 600 /path/to/private_key\n   ```\n\n3. Clone with identity file:\n\n   ```bash\n   gerrit-clone clone --host gerrit.example.org \\\n     --ssh-user myuser \\\n     --ssh-private-key /path/to/private_key\n   ```\n\n4. 
Or use environment variables:\n\n   ```bash\n   export GERRIT_SSH_USER=myuser\n   export GERRIT_SSH_PRIVATE_KEY=/path/to/private_key\n   gerrit-clone clone --host gerrit.example.org\n   ```\n\n### SSH Config\n\nCreate `~/.ssh/config` entries for convenience:\n\n```text\nHost gerrit.example.org\n    User myusername\n    IdentityFile ~/.ssh/gerrit_key\n    StrictHostKeyChecking yes\n```\n\n### Known Hosts\n\nPre-populate known hosts to avoid prompts (recommended for CI/CD):\n\n```bash\nssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts\n```\n\nTest SSH connectivity before cloning:\n\n```bash\nssh -p 29418 myuser@gerrit.example.org gerrit version\n```\n\n## Output Manifest\n\nEach run generates a detailed JSON manifest (`clone-manifest.json`):\n\n```json\n{\n  \"version\": \"1.0\",\n  \"generated_at\": \"2025-01-15T10:30:45Z\",\n  \"host\": \"gerrit.example.org\",\n  \"port\": 29418,\n  \"total\": 156,\n  \"succeeded\": 154,\n  \"failed\": 2,\n  \"skipped\": 0,\n  \"success_rate\": 98.7,\n  \"duration_s\": 89.3,\n  \"config\": {\n    \"skip_archived\": true,\n    \"threads\": 8,\n    \"depth\": null,\n    \"branch\": null,\n    \"strict_host_checking\": true,\n    \"path_prefix\": \"/workspace/repos\"\n  },\n  \"results\": [\n    {\n      \"project\": \"core/api\",\n      \"path\": \"core/api\",\n      \"status\": \"success\",\n      \"attempts\": 1,\n      \"duration_s\": 3.42,\n      \"error\": null,\n      \"started_at\": \"2025-01-15T10:30:15Z\",\n      \"completed_at\": \"2025-01-15T10:30:18Z\"\n    },\n    {\n      \"project\": \"tools/legacy\",\n      \"path\": \"tools/legacy\",\n      \"status\": \"failed\",\n      \"attempts\": 3,\n      \"duration_s\": 15.8,\n      \"error\": \"timeout after 600s\",\n      \"started_at\": \"2025-01-15T10:30:20Z\",\n      \"completed_at\": \"2025-01-15T10:30:36Z\"\n    }\n  ]\n}\n```\n\n## Error Handling\n\n### Common Issues\n\n#### Host key verification failed\n\n```bash\n# Accept new host keys (use with 
caution)\ngerrit-clone --host gerrit.example.org --accept-unknown-host\n\n# Recommended: Pre-populate known_hosts\nssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts\n```\n\n#### Permission denied (publickey)\n\n- Verify SSH public key exists in Gerrit profile\n- Check SSH agent has key loaded: `ssh-add -l`\n- Test SSH connection: `ssh -p 29418 username@gerrit.example.org gerrit version`\n- Verify SSH key permissions: `chmod 600 ~/.ssh/id_rsa`\n\n#### Connection timeout or network errors\n\n- Verify Gerrit server hostname and port (often 29418 for SSH)\n- Check network connectivity and firewall rules\n- Increase timeout: `--clone-timeout 900`\n- Reduce concurrency: `--threads 4`\n\n#### Path conflicts or permission errors\n\n- Existing non-git directories block clones\n- Use clean target directory: `--path-prefix ./clean-workspace`\n- Check disk space and write permissions\n- Remove conflicting directories: `rm -rf conflicting-path`\n\n#### API discovery failures\n\n- Manually specify base URL: `--base-url https://host/gerrit`\n- Verify Gerrit server is accessible via HTTPS\n- Check for corporate proxy or firewall restrictions\n\n### Exit Codes\n\n- `0`: Success (all repositories cloned)\n- `1`: Failure (one or more repositories failed to clone)\n- `130`: Interrupted by user (Ctrl+C)\n\n## Development\n\n### Requirements\n\n- Python 3.11+ (tested on 3.11, 3.12, 3.13, 3.14)\n- uv package manager (for development)\n- Git (for clone operations)\n- SSH client (for authentication)\n\n### Setup\n\n```bash\ngit clone https://github.com/lfreleng-actions/gerrit-clone-action.git\ncd gerrit-clone-action\nuv sync --dev\n```\n\n### Testing\n\n```bash\n# Run all tests\nuv run pytest\n\n# Run with coverage report\nuv run pytest --cov=gerrit_clone --cov-report=html --cov-report=term-missing\n\n# Run integration tests (requires network)\nuv run pytest tests/integration/ -v\n\n# Run specific test categories\nuv run pytest -m \"not integration\" -v  # Unit tests\nuv run 
pytest tests/test_models.py::TestConfig -v  # Specific test class\n```\n\n### Linting\n\n```bash\n# Run pre-commit hooks\nuv run pre-commit run --all-files\n\n# Individual tools\nuv run ruff check .\nuv run ruff format .\nuv run mypy src/\n```\n\n## License\n\nThis project uses the Apache License 2.0. See LICENSE for details.\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make changes with tests\n4. Run linting and tests\n5. Submit a pull request\n\n## Support\n\n- **GitHub Issues**: Report bugs and request features at\n  [lfreleng-actions/gerrit-clone-action](https://github.com/lfreleng-actions/gerrit-clone-action/issues)\n- **Documentation**: This README, IMPLEMENTATION.md, and inline help\n  (`gerrit-clone --help`)\n- **Examples**: Advanced usage patterns in repository examples/\n- **Integration Tests**: Real-world server validation in tests/integration/\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A multi-threaded CLI tool for bulk cloning repositories from Gerrit servers",
    "version": "0.1.7",
    "project_urls": {
        "Changelog": "https://github.com/lfreleng-actions/gerrit-clone-action/releases",
        "Documentation": "https://github.com/lfreleng-actions/gerrit-clone-action#readme",
        "Homepage": "https://github.com/lfreleng-actions/gerrit-clone-action",
        "Issues": "https://github.com/lfreleng-actions/gerrit-clone-action/issues",
        "Repository": "https://github.com/lfreleng-actions/gerrit-clone-action.git"
    },
    "split_keywords": [
        "bulk",
        " ci-cd",
        " clone",
        " gerrit",
        " git",
        " multi-threaded",
        " repository",
        " ssh"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f01ebbd1eca8fe342f51c8e776520623944106ed2d38e005aaa4aab1a8b7ae37",
                "md5": "5ded976d98fc00e8bdd59589a1cdc702",
                "sha256": "f08f1e5b4e140fe9ccebe79c18b9ac5c49726c5578a8c965bed764109b4b6dc3"
            },
            "downloads": -1,
            "filename": "gerrit_clone-0.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5ded976d98fc00e8bdd59589a1cdc702",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 71135,
            "upload_time": "2025-10-07T08:33:47",
            "upload_time_iso_8601": "2025-10-07T08:33:47.730341Z",
            "url": "https://files.pythonhosted.org/packages/f0/1e/bbd1eca8fe342f51c8e776520623944106ed2d38e005aaa4aab1a8b7ae37/gerrit_clone-0.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6016fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea",
                "md5": "7e742c0eecfcd0dafa5dafc7aac95ca5",
                "sha256": "3e08e8922fb470953b01076af6006b16473be28bddac6f6de1c174a9a8e6e99e"
            },
            "downloads": -1,
            "filename": "gerrit_clone-0.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "7e742c0eecfcd0dafa5dafc7aac95ca5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 101813,
            "upload_time": "2025-10-07T08:33:49",
            "upload_time_iso_8601": "2025-10-07T08:33:49.249222Z",
            "url": "https://files.pythonhosted.org/packages/60/16/fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea/gerrit_clone-0.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-07 08:33:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lfreleng-actions",
    "github_project": "gerrit-clone-action",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "gerrit-clone"
}
        