<!--
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: 2025 Matthew Watkins <mwatkins@linuxfoundation.org>
-->
# 🔄 Gerrit Clone
A production-ready multi-threaded CLI tool and GitHub Action for bulk cloning
repositories from Gerrit servers with automatic API discovery. Designed for
reliability, speed, and CI/CD compatibility.
## Features
- **Automatic API Discovery**: Discovers Gerrit API endpoints across different
server configurations (`/r`, `/gerrit`, `/infra`, etc.)
- **Bulk Repository Discovery**: Fetches all projects via Gerrit REST API with
intelligent filtering
- **Multi-threaded Cloning**: Concurrent operations with auto-scaling thread
pools (up to 32 workers)
- **Hierarchy Preservation**: Maintains complete Gerrit project folder
structure without flattening
- **Robust Retry Logic**: Exponential backoff with jitter for transient
network and server failures
- **SSH Integration**: Full SSH agent, identity file, and config support
- **CI/CD Ready**: Non-interactive operation with structured JSON manifests
- **Smart Filtering**: Automatically excludes system repos and archived
projects
- **Rich Progress Display**: Beautiful terminal progress bars with per-repo
status tracking
- **Comprehensive Logging**: Structured logging with configurable verbosity
levels
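The discovery feature can be pictured as probing a short list of candidate base paths and decoding the first valid project listing. The sketch below is an illustration under assumptions, not the tool's actual code: `candidate_urls`, `parse_projects`, and the inclusion of an empty root prefix are hypothetical. It does rely on one documented Gerrit behaviour: REST responses start with a `)]}'` guard line that must be stripped before JSON decoding.

```python
import json

# Context paths named in the feature list; the empty string (an assumption)
# covers a Gerrit instance served from the web root.
CANDIDATE_PREFIXES = ["", "/r", "/gerrit", "/infra"]

# Gerrit prepends this guard line to JSON responses to prevent XSSI.
XSSI_PREFIX = ")]}'"


def candidate_urls(host: str) -> list[str]:
    """Build the project-listing URLs a discovery pass could try, in order."""
    return [f"https://{host}{prefix}/projects/" for prefix in CANDIDATE_PREFIXES]


def parse_projects(body: str) -> dict:
    """Strip the XSSI guard (if present) and decode the JSON project map."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


if __name__ == "__main__":
    for url in candidate_urls("gerrit.example.org"):
        print(url)
    sample = ")]}'\n{\"core/api\": {\"state\": \"ACTIVE\"}}"
    print(parse_projects(sample))
```

A real discovery pass would issue an HTTP GET against each URL and keep the first one whose body parses; the network call is left out here so the sketch stays self-contained.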
## Installation
### Using uvx (Recommended)
For one-time execution without installation:
```bash
uvx gerrit-clone --host gerrit.example.org
```
### Using uv
```bash
uv tool install gerrit-clone
gerrit-clone --host gerrit.example.org
```
### From Source
```bash
git clone https://github.com/lfreleng-actions/gerrit-clone-action.git
cd gerrit-clone-action
uv sync
uv run gerrit-clone --host gerrit.example.org
```
## CLI Usage
### Basic Examples
Clone all active repositories from a Gerrit server:
```bash
gerrit-clone --host gerrit.example.org
```
Clone to a specific directory with custom thread count:
```bash
gerrit-clone --host gerrit.example.org \
--path-prefix ./repositories \
--threads 8
```
Clone with shallow depth and specific branch:
```bash
gerrit-clone --host gerrit.example.org \
--depth 10 \
--branch main \
--threads 16
```
Include archived repositories and use custom SSH key:
```bash
gerrit-clone --host gerrit.example.org \
--include-archived \
--ssh-user myuser \
--ssh-private-key ~/.ssh/gerrit_rsa
```
### Command-Line Options
```text
Usage: gerrit-clone [OPTIONS]
Options:
  -h, --host TEXT                 Gerrit server hostname [required]
  -p, --port INTEGER              Gerrit SSH port [default: 29418]
  --base-url TEXT                 Base URL for Gerrit API
  -u, --ssh-user TEXT             SSH username for clone operations
  -i, --ssh-private-key PATH      SSH private key file for authentication
  --path-prefix PATH              Base directory for clone hierarchy [default: .]
  --skip-archived / --include-archived
                                  Skip archived and inactive repositories
                                  [default: skip-archived]
  --include-project TEXT          Restrict cloning to specific project(s)
  --ssh-debug                     Enable verbose SSH (-vvv) for troubleshooting
  --allow-nested-git / --no-allow-nested-git
                                  Allow nested git working trees when cloning
                                  both parent and child repositories
                                  [default: allow-nested-git]
  --nested-protection / --no-nested-protection
                                  Auto-add nested child repo paths to parent
                                  .git/info/exclude [default: nested-protection]
  --move-conflicting              Move conflicting files/directories in parent
                                  repos to [NAME].parent to allow nested cloning
  -t, --threads INTEGER           Number of concurrent clone threads
  -d, --depth INTEGER             Create shallow clone with given depth
  -b, --branch TEXT               Clone specific branch instead of default
  --https / --ssh                 Use HTTPS for cloning [default: ssh]
  --keep-remote-protocol          Keep original clone protocol for remote
  --strict-host / --accept-unknown-host
                                  SSH strict host key checking
                                  [default: strict-host]
  --clone-timeout INTEGER         Timeout per clone operation in seconds
                                  [default: 600]
  --retry-attempts INTEGER        Max retry attempts per repository
                                  [default: 3]
  --retry-base-delay FLOAT        Base delay for retry backoff in seconds
                                  [default: 2.0]
  --retry-factor FLOAT            Exponential backoff factor [default: 2.0]
  --retry-max-delay FLOAT         Max retry delay in seconds [default: 30.0]
  --manifest-filename TEXT        Output manifest filename
                                  [default: clone-manifest.json]
  -c, --config-file PATH          Configuration file path (YAML or JSON)
  --exit-on-error                 Exit when first error occurs
  --log-file PATH                 Custom log file path
  --disable-log-file              Disable creation of log file
  --log-level TEXT                File logging level [default: DEBUG]
  -v, --verbose                   Enable verbose/debug output
  -q, --quiet                     Suppress all output except errors
  --version                       Show version information
  --help                          Show this message and exit
```
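To see how the three retry options interact, the following sketch computes the wait before each retry. It assumes the common capped-exponential formula `min(max_delay, base * factor ** (attempt - 1))`; the tool's exact formula and jitter strategy are not documented here, so `retry_delay` and its `jitter` parameter are hypothetical.

```python
import random


def retry_delay(attempt: int, base: float = 2.0, factor: float = 2.0,
                max_delay: float = 30.0, jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    Defaults mirror --retry-base-delay, --retry-factor and --retry-max-delay.
    `jitter` is a hypothetical fraction of the delay added at random.
    """
    delay = min(max_delay, base * factor ** (attempt - 1))
    # Random jitter keeps concurrent workers from retrying in lockstep.
    return delay + random.uniform(0.0, jitter * delay)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, retry_delay(attempt))
```

Under this assumed formula and the default values, the un-jittered delays grow as 2, 4, 8 and 16 seconds, then stay capped at 30 seconds.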
### Environment Variables
You can configure all CLI options through environment variables that use the `GERRIT_` prefix:
```bash
export GERRIT_HOST=gerrit.example.org
export GERRIT_PORT=29418
export GERRIT_SSH_USER=myuser
export GERRIT_SSH_PRIVATE_KEY=~/.ssh/gerrit_key
export GERRIT_PATH_PREFIX=/workspace/repos
export GERRIT_SKIP_ARCHIVED=1
export GERRIT_THREADS=16
export GERRIT_CLONE_DEPTH=5
export GERRIT_BRANCH=main
export GERRIT_STRICT_HOST=1
export GERRIT_CLONE_TIMEOUT=300
export GERRIT_RETRY_ATTEMPTS=5
gerrit-clone # Uses environment variables
```
### Configuration Files
Create `~/.config/gerrit-clone/config.yaml`:
```yaml
host: gerrit.example.org
port: 29418
ssh_user: myuser
ssh_identity_file: ~/.ssh/gerrit_key
path_prefix: /workspace/repos
skip_archived: true
threads: 8
clone_timeout: 600
retry_attempts: 3
retry_base_delay: 2.0
```
Or JSON format `~/.config/gerrit-clone/config.json`:
```json
{
"host": "gerrit.example.org",
"port": 29418,
"ssh_user": "myuser",
"ssh_identity_file": "~/.ssh/gerrit_key",
"path_prefix": "/workspace/repos",
"skip_archived": true,
"threads": 8
}
```
Configuration precedence: CLI arguments > Environment variables > Config file > Defaults
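The precedence rule can be sketched as a layered lookup in which the first layer that defines a key wins. This is only an illustration of the ordering: `resolve_settings` is a hypothetical helper, and the `DEFAULTS` shown are a small subset taken from the option list.

```python
from collections import ChainMap

# A few defaults taken from the CLI option list.
DEFAULTS = {"port": 29418, "clone_timeout": 600, "retry_attempts": 3}


def resolve_settings(cli: dict, env: dict, config_file: dict) -> dict:
    """Merge the layers; maps listed earlier take priority over later ones."""
    # Drop unset (None) values so they do not mask lower-priority layers.
    layers = [{key: value for key, value in layer.items() if value is not None}
              for layer in (cli, env, config_file)]
    return dict(ChainMap(*layers, DEFAULTS))


if __name__ == "__main__":
    settings = resolve_settings(
        cli={"threads": 4, "port": None},
        env={"threads": 16, "clone_timeout": 300},
        config_file={"port": 29419, "clone_timeout": 900},
    )
    print(settings)
```

Here `threads` comes from the CLI, `clone_timeout` from the environment, `port` from the config file, and `retry_attempts` from the defaults.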
## Nested Repository Support
Gerrit Clone includes intelligent support for nested repositories (projects with
hierarchical names like `parent/child`):
### Automatic Detection
- **Dependency Ordering**: Parent repositories are automatically cloned before
their children
- **Conflict Detection**: Identifies when parent repo content conflicts with
nested directory structure
- **Smart Batching**: Uses dependency-aware batching to prevent race conditions
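One simple way to get parent-before-child ordering is to batch projects by path depth: every project in a batch can be cloned concurrently, and a parent always sits in an earlier batch than its children. The sketch below shows that idea only; `batch_by_depth` is a hypothetical name, and the tool's real batching may differ.

```python
from itertools import groupby


def depth(name: str) -> int:
    """Nesting depth of a Gerrit project name (`a/b/c` has depth 2)."""
    return name.count("/")


def batch_by_depth(projects: list[str]) -> list[list[str]]:
    """Group project names by nesting depth, shallowest batch first."""
    ordered = sorted(projects, key=lambda name: (depth(name), name))
    return [list(group) for _, group in groupby(ordered, key=depth)]


if __name__ == "__main__":
    for batch in batch_by_depth(["a/b/c", "a", "a/b", "z", "tools/x"]):
        print(batch)
```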
### Conflict Resolution Options
#### Skip Conflicting
```bash
gerrit-clone clone --host gerrit.example.org --no-move-conflicting
```
Skips nested repositories when parent contains conflicting files/directories.
Provides clear warnings about skipped repos.
#### Move Conflicting (Default - Recommended for Data Mining)
```bash
gerrit-clone clone --host gerrit.example.org
```
Automatically moves conflicting content in parent repositories to
`[NAME].parent` to allow complete nested cloning. This ensures **100%
repository availability** for reporting and analysis purposes.
**Example:**
- Parent repo `test` contains file named `test`
- Child repo `test/test` needs directory `test/`
- With move-conflicting enabled (default): File `test` → `test.parent`,
directory created for child repo
- Result: Both repositories cloned with complete history preserved
### Configuration
```bash
# Allow nested repositories (default: true)
--allow-nested-git / --no-allow-nested-git
# Protect parent repos by adding child paths to .git/info/exclude (default: true)
--nested-protection / --no-nested-protection
# Move conflicting content to allow complete cloning (default: true)
--move-conflicting / --no-move-conflicting
```
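Nested protection amounts to listing each child path in the parent's `.git/info/exclude`, which uses the same pattern syntax as `.gitignore`, so the parent repository does not report the child working tree as untracked. A minimal sketch, assuming one anchored directory pattern per child (`add_exclude` is a hypothetical helper):

```python
from pathlib import Path


def add_exclude(parent_repo: Path, child_path: str) -> bool:
    """Add `/<child_path>/` to the parent's .git/info/exclude if it is missing.

    Returns True when a new entry was written, False when it already existed.
    """
    exclude = parent_repo / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    entry = f"/{child_path.strip('/')}/"
    text = exclude.read_text() if exclude.exists() else ""
    if entry in text.splitlines():
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    exclude.write_text(text + separator + entry + "\n")
    return True
```

Because the file lives under `.git/info/`, the exclusion stays local to the clone and never appears in the parent repository's history.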
## GitHub Action Usage
### Basic Example
```yaml
name: Clone Gerrit Repositories
on: [push]
jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          path-prefix: repositories
```
### Advanced Example
```yaml
name: Clone and Process Repositories
on:
  schedule:
    - cron: '0 2 * * *' # Daily at 2 AM
jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories
        id: clone
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          port: 29418
          base-url: https://gerrit.example.org/gerrit
          ssh-user: automation
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}
          path-prefix: workspace
          skip-archived: true
          threads: 12
          depth: 1
          branch: main
          use-https: false
          keep-remote-protocol: false
          clone-timeout: 900
          retry-attempts: 5
          verbose: true
      - name: Show results
        run: |
          echo "Total: ${{ steps.clone.outputs.total-count }}"
          echo "Success: ${{ steps.clone.outputs.success-count }}"
          echo "Failed: ${{ steps.clone.outputs.failure-count }}"
          echo "Manifest: ${{ steps.clone.outputs.manifest-path }}"
      - name: Upload manifest
        uses: actions/upload-artifact@v4
        with:
          name: clone-manifest
          path: ${{ steps.clone.outputs.manifest-path }}
```
### HTTPS Cloning Example
```yaml
name: Clone via HTTPS
on: [push]
jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone repositories using HTTPS
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          base-url: https://gerrit.example.org/r
          use-https: true
          path-prefix: repos
          quiet: true
        env:
          # Use GitHub token or other auth for HTTPS
          GIT_ASKPASS: echo
          GIT_USERNAME: ${{ secrets.GERRIT_USERNAME }}
          GIT_PASSWORD: ${{ secrets.GERRIT_TOKEN }}
```
### Nested Repositories with Conflict Resolution
```yaml
name: Complete Repository Mining
on: [workflow_dispatch]
jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Clone all repositories (including nested with conflicts)
        id: clone # Required so later steps can read steps.clone.outputs
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          host: gerrit.example.org
          use-https: true
          allow-nested-git: true
          nested-protection: true
          move-conflicting: true # Move conflicting files to ensure 100% clone success
          path-prefix: complete-clone
          threads: 8
          verbose: true
      - name: Verify complete data availability
        run: |
          echo "Cloned: ${{ steps.clone.outputs.success-count }}"
          echo "Total repositories: ${{ steps.clone.outputs.total-count }}"
          success_count=${{ steps.clone.outputs.success-count }}
          total_count=${{ steps.clone.outputs.total-count }}
          success_rate=$(( success_count * 100 / total_count ))
          echo "Success rate: ${success_rate}%"
          # Count moved conflicts
          find complete-clone -name "*.parent" | wc -l | xargs echo "Conflicts resolved:"
```
### Configuration File Example
```yaml
name: Clone with Config File
on: [workflow_dispatch]
jobs:
  clone:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout config
        uses: actions/checkout@v4
      - name: Clone repositories
        uses: lfreleng-actions/gerrit-clone-action@v1
        with:
          config-file: .gerrit-clone-config.yaml
          verbose: true
```
### Action Inputs
<!-- markdownlint-disable MD013 -->
| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| `host` | Yes | | Gerrit server hostname |
| `port` | No | `29418` | Gerrit SSH port |
| `base-url` | No | | Base URL for Gerrit API (defaults to <https://HOST>) |
| `ssh-user` | No | | SSH username for clone operations |
| `ssh-private-key` | No | | SSH private key content for authentication |
| `path-prefix` | No | `.` | Base directory for clone hierarchy |
| `skip-archived` | No | `true` | Skip archived and inactive repositories |
| `include-project` | No | | Restrict cloning to specific project(s) (comma-separated) |
| `ssh-debug` | No | `false` | Enable verbose SSH (-vvv) for troubleshooting |
| `allow-nested-git` | No | `true` | Allow nested git working trees |
| `nested-protection` | No | `true` | Auto-add nested child repo paths to parent .git/info/exclude |
| `move-conflicting` | No | `false` | Move conflicting files/directories in parent repos to [NAME].parent |
| `exit-on-error` | No | `false` | Exit when first error occurs |
| `threads` | No | auto | Number of concurrent clone threads |
| `depth` | No | | Create shallow clone with given depth |
| `branch` | No | | Clone specific branch instead of default |
| `use-https` | No | `false` | Use HTTPS for cloning instead of SSH |
| `keep-remote-protocol` | No | `false` | Keep original clone protocol for remote |
| `strict-host` | No | `true` | SSH strict host key checking |
| `clone-timeout` | No | `600` | Timeout per clone operation in seconds |
| `retry-attempts` | No | `3` | Max retry attempts per repository |
| `retry-base-delay` | No | `2.0` | Base delay for retry backoff in seconds |
| `retry-factor` | No | `2.0` | Exponential backoff factor for retries |
| `retry-max-delay` | No | `30.0` | Max retry delay in seconds |
| `manifest-filename` | No | `clone-manifest.json` | Output manifest filename |
| `config-file` | No | | Configuration file path (YAML or JSON) |
| `verbose` | No | `false` | Enable verbose/debug output |
| `quiet` | No | `false` | Suppress all output except errors |
| `log-file` | No | | Custom log file path |
| `disable-log-file` | No | `false` | Disable creation of log file |
| `log-level` | No | `DEBUG` | File logging level |
<!-- markdownlint-enable MD013 -->
### Action Outputs
| Output | Description |
|--------|-------------|
| `manifest-path` | Path to the generated clone manifest file |
| `success-count` | Number of cloned repositories |
| `failure-count` | Number of failed clone attempts |
| `total-count` | Total number of repositories processed |
## SSH Configuration
The tool provides comprehensive SSH authentication support with automatic
configuration detection:
### SSH Authentication Options
The following SSH authentication options are available across all interfaces:
<!-- markdownlint-disable MD013 -->
| Option | CLI | Environment | Action | Description |
|--------|-----|-------------|--------|-------------|
| SSH User | `-u` | `GERRIT_SSH_USER` | `ssh-user` | SSH username |
| SSH Key | `-i` (file) | `GERRIT_SSH_PRIVATE_KEY` | `ssh-private-key` (content) | Private key |
| Host Check | `--strict-host` | `GERRIT_STRICT_HOST` | `strict-host` | Key check |
<!-- markdownlint-enable MD013 -->
### Authentication Methods
Three authentication methods provide automatic fallback:
1. **SSH Agent (Recommended)**: Uses keys loaded into SSH agent with automatic
detection
2. **Identity File**: Explicitly specified private key files with permission
validation
3. **SSH Config**: Host-specific configuration from ~/.ssh/config with full
option support
### SSH Setup Examples
#### Using SSH Agent (Recommended for development)
1. Generate SSH key pair:
```bash
ssh-keygen -t ed25519 -C "your.email@example.com"
```
2. Add public key to Gerrit profile
3. Add private key to SSH agent:
```bash
ssh-add ~/.ssh/id_ed25519
```
4. Clone with agent authentication:
```bash
gerrit-clone clone --host gerrit.example.org --ssh-user myuser
```
#### Using SSH Identity File (Recommended for CI/CD)
1. Place private key file securely (e.g., `/path/to/private_key`)
2. Set proper permissions:
```bash
chmod 600 /path/to/private_key
```
3. Clone with identity file:
```bash
gerrit-clone clone --host gerrit.example.org \
--ssh-user myuser \
--ssh-private-key /path/to/private_key
```
4. Or use environment variables:
```bash
export GERRIT_SSH_USER=myuser
export GERRIT_SSH_PRIVATE_KEY=/path/to/private_key
gerrit-clone clone --host gerrit.example.org
```
### SSH Config
Create `~/.ssh/config` entries for convenience:
```text
Host gerrit.example.org
User myusername
IdentityFile ~/.ssh/gerrit_key
StrictHostKeyChecking yes
```
### Known Hosts
Pre-populate known hosts to avoid prompts (recommended for CI/CD):
```bash
ssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts
```
Test SSH connectivity before cloning:
```bash
ssh -p 29418 myuser@gerrit.example.org gerrit version
```
## Output Manifest
Each run generates a detailed JSON manifest (`clone-manifest.json`):
```json
{
"version": "1.0",
"generated_at": "2025-01-15T10:30:45Z",
"host": "gerrit.example.org",
"port": 29418,
"total": 156,
"succeeded": 154,
"failed": 2,
"skipped": 0,
"success_rate": 98.7,
"duration_s": 89.3,
"config": {
"skip_archived": true,
"threads": 8,
"depth": null,
"branch": null,
"strict_host_checking": true,
"path_prefix": "/workspace/repos"
},
"results": [
{
"project": "core/api",
"path": "core/api",
"status": "success",
"attempts": 1,
"duration_s": 3.42,
"error": null,
"started_at": "2025-01-15T10:30:15Z",
"completed_at": "2025-01-15T10:30:18Z"
},
{
"project": "tools/legacy",
"path": "tools/legacy",
"status": "failed",
"attempts": 3,
"duration_s": 15.8,
"error": "timeout after 600s",
"started_at": "2025-01-15T10:30:20Z",
"completed_at": "2025-01-15T10:30:36Z"
}
]
}
```
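Because the manifest is plain JSON, follow-up automation can read it directly. The sketch below lists failed projects with their error messages, using only the field names shown in the sample above; `failed_projects` is a hypothetical helper, and the inline `SAMPLE` stands in for a real `clone-manifest.json`.

```python
import json

# Trimmed stand-in for a real clone-manifest.json.
SAMPLE = """
{
  "results": [
    {"project": "core/api", "status": "success", "error": null},
    {"project": "tools/legacy", "status": "failed", "error": "timeout after 600s"}
  ]
}
"""


def failed_projects(manifest: dict) -> list[tuple[str, str]]:
    """Return (project, error) pairs for every result whose status is "failed"."""
    return [
        (result["project"], result.get("error") or "unknown error")
        for result in manifest.get("results", [])
        if result.get("status") == "failed"
    ]


if __name__ == "__main__":
    for project, error in failed_projects(json.loads(SAMPLE)):
        print(f"{project}: {error}")
```

To use it against a real run, load the file named by `--manifest-filename` with `json.load` and pass the resulting dictionary to `failed_projects`.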
## Error Handling
### Common Issues
#### Host key verification failed
```bash
# Accept new host keys (use with caution)
gerrit-clone --host gerrit.example.org --accept-unknown-host
# Recommended: Pre-populate known_hosts
ssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts
```
#### Permission denied (publickey)
- Verify SSH public key exists in Gerrit profile
- Check SSH agent has key loaded: `ssh-add -l`
- Test SSH connection: `ssh -p 29418 username@gerrit.example.org gerrit version`
- Restrict SSH private key permissions: `chmod 600 ~/.ssh/id_ed25519`
#### Connection timeout or network errors
- Verify Gerrit server hostname and port (often 29418 for SSH)
- Check network connectivity and firewall rules
- Increase timeout: `--clone-timeout 900`
- Reduce concurrency: `--threads 4`
#### Path conflicts or permission errors
- Existing non-git directories block clones
- Use clean target directory: `--path-prefix ./clean-workspace`
- Check disk space and write permissions
- Remove conflicting directories: `rm -rf conflicting-path`
#### API discovery failures
- Manually specify base URL: `--base-url https://host/gerrit`
- Verify Gerrit server is accessible via HTTPS
- Check for corporate proxy or firewall restrictions
### Exit Codes
- `0`: Success (all repositories cloned)
- `1`: Failure (one or more repositories failed to clone)
- `130`: Interrupted by user (Ctrl+C)
## Development
### Requirements
- Python 3.11+ (tested on 3.11, 3.12, 3.13, 3.14)
- uv package manager (for development)
- Git (for clone operations)
- SSH client (for authentication)
### Setup
```bash
git clone https://github.com/lfreleng-actions/gerrit-clone-action.git
cd gerrit-clone-action
uv sync --dev
```
### Testing
```bash
# Run all tests
uv run pytest
# Run with coverage report
uv run pytest --cov=gerrit_clone --cov-report=html --cov-report=term-missing
# Run integration tests (requires network)
uv run pytest tests/integration/ -v
# Run specific test categories
uv run pytest -m "not integration" -v # Unit tests
uv run pytest tests/test_models.py::TestConfig -v # Specific test class
```
### Linting
```bash
# Run pre-commit hooks
uv run pre-commit run --all-files
# Individual tools
uv run ruff check .
uv run ruff format .
uv run mypy src/
```
## License
This project uses the Apache License 2.0. See LICENSE for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make changes with tests
4. Run linting and tests
5. Submit a pull request
## Support
- **GitHub Issues**: Report bugs and request features at
[lfreleng-actions/gerrit-clone-action](https://github.com/lfreleng-actions/gerrit-clone-action/issues)
- **Documentation**: This README, IMPLEMENTATION.md, and inline help
(`gerrit-clone --help`)
- **Examples**: Advanced usage patterns in repository examples/
- **Integration Tests**: Real-world server validation in tests/integration/
Raw data
{
"_id": null,
"home_page": null,
"name": "gerrit-clone",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": "Matthew Watkins <mwatkins@linuxfoundation.org>",
"keywords": "bulk, ci-cd, clone, gerrit, git, multi-threaded, repository, ssh",
"author": null,
"author_email": "Matthew Watkins <mwatkins@linuxfoundation.org>",
"download_url": "https://files.pythonhosted.org/packages/60/16/fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea/gerrit_clone-0.1.7.tar.gz",
"platform": null,
"description": "<!--\n# SPDX-License-Identifier: Apache-2.0\n# SPDX-FileCopyrightText: 2025 Matthew Watkins <mwatkins@linuxfoundation.org>\n-->\n\n# \ud83d\udd04 Gerrit Clone\n\nA production-ready multi-threaded CLI tool and GitHub Action for bulk cloning\nrepositories from Gerrit servers with automatic API discovery. Designed for\nreliability, speed, and CI/CD compatibility.\n\n## Features\n\n- **Automatic API Discovery**: Discovers Gerrit API endpoints across different\n server configurations (`/r`, `/gerrit`, `/infra`, etc.)\n- **Bulk Repository Discovery**: Fetches all projects via Gerrit REST API with\n intelligent filtering\n- **Multi-threaded Cloning**: Concurrent operations with auto-scaling thread\n pools (up to 32 workers)\n- **Hierarchy Preservation**: Maintains complete Gerrit project folder\n structure without flattening\n- **Robust Retry Logic**: Exponential backoff with jitter for transient\n network and server failures\n- **SSH Integration**: Full SSH agent, identity file, and config support\n- **CI/CD Ready**: Non-interactive operation with structured JSON manifests\n- **Smart Filtering**: Automatically excludes system repos and archived\n projects\n- **Rich Progress Display**: Beautiful terminal progress bars with per-repo\n status tracking\n- **Comprehensive Logging**: Structured logging with configurable verbosity\n levels\n\n## Installation\n\n### Using uvx (Recommended)\n\nFor one-time execution without installation:\n\n```bash\nuvx gerrit-clone --host gerrit.example.org\n```\n\n### Using uv\n\n```bash\nuv tool install gerrit-clone\n./gerrit-clone --host gerrit.example.org\n```\n\n### From Source\n\n```bash\ngit clone https://github.com/lfreleng-actions/gerrit-clone-action.git\ncd gerrit-clone-action\nuv sync\nuv run gerrit-clone --host gerrit.example.org\n```\n\n## CLI Usage\n\n### Basic Examples\n\nClone all active repositories from a Gerrit server:\n\n```bash\ngerrit-clone --host gerrit.example.org\n```\n\nClone to a specific directory with 
custom thread count:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n --path-prefix ./repositories \\\n --threads 8\n```\n\nClone with shallow depth and specific branch:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n --depth 10 \\\n --branch main \\\n --threads 16\n```\n\nInclude archived repositories and use custom SSH key:\n\n```bash\ngerrit-clone --host gerrit.example.org \\\n --include-archived \\\n --ssh-user myuser \\\n --ssh-private-key ~/.ssh/gerrit_rsa\n```\n\n### Command-Line Options\n\n```text\nUsage: gerrit-clone [OPTIONS]\n\nOptions:\n -h, --host TEXT Gerrit server hostname [required]\n -p, --port INTEGER Gerrit SSH port [default: 29418]\n --base-url TEXT Base URL for Gerrit API\n -u, --ssh-user TEXT SSH username for clone operations\n -i, --ssh-private-key PATH SSH private key file for authentication\n --path-prefix PATH Base directory for clone hierarchy [default: .]\n --skip-archived / --include-archived\n Skip archived and inactive repositories\n [default: skip-archived]\n --include-project TEXT Restrict cloning to specific project(s)\n --ssh-debug Enable verbose SSH (-vvv) for troubleshooting\n --allow-nested-git / --no-allow-nested-git\n Allow nested git working trees when cloning\n both parent and child repositories [default: allow-nested-git]\n --nested-protection / --no-nested-protection\n Auto-add nested child repo paths to parent\n .git/info/exclude [default: nested-protection]\n --move-conflicting Move conflicting files/directories in parent\n repos to [NAME].parent to allow nested cloning\n -t, --threads INTEGER Number of concurrent clone threads\n -d, --depth INTEGER Create shallow clone with given depth\n -b, --branch TEXT Clone specific branch instead of default\n --https / --ssh Use HTTPS for cloning [default: ssh]\n --keep-remote-protocol Keep original clone protocol for remote\n --strict-host / --accept-unknown-host\n SSH strict host key checking [default: strict-host]\n --clone-timeout INTEGER Timeout per clone 
operation in seconds\n [default: 600]\n --retry-attempts INTEGER Max retry attempts per repository\n [default: 3]\n --retry-base-delay FLOAT Base delay for retry backoff in seconds\n [default: 2.0]\n --retry-factor FLOAT Exponential backoff factor [default: 2.0]\n --retry-max-delay FLOAT Max retry delay in seconds [default: 30.0]\n --manifest-filename TEXT Output manifest filename [default: clone-manifest.json]\n -c, --config-file PATH Configuration file path (YAML or JSON)\n --exit-on-error Exit when first error occurs\n --log-file PATH Custom log file path\n --disable-log-file Disable creation of log file\n --log-level TEXT File logging level [default: DEBUG]\n -v, --verbose Enable verbose/debug output\n -q, --quiet Suppress all output except errors\n --version Show version information\n --help Show this message and exit\n```\n\n### Environment Variables\n\nYou can configure all CLI options through environment variables with `GERRIT_` prefix:\n\n```bash\nexport GERRIT_HOST=gerrit.example.org\nexport GERRIT_PORT=29418\nexport GERRIT_SSH_USER=myuser\nexport GERRIT_SSH_PRIVATE_KEY=~/.ssh/gerrit_key\nexport GERRIT_PATH_PREFIX=/workspace/repos\nexport GERRIT_SKIP_ARCHIVED=1\nexport GERRIT_THREADS=16\nexport GERRIT_CLONE_DEPTH=5\nexport GERRIT_BRANCH=main\nexport GERRIT_STRICT_HOST=1\nexport GERRIT_CLONE_TIMEOUT=300\nexport GERRIT_RETRY_ATTEMPTS=5\n\ngerrit-clone # Uses environment variables\n```\n\n### Configuration Files\n\nCreate `~/.config/gerrit-clone/config.yaml`:\n\n```yaml\nhost: gerrit.example.org\nport: 29418\nssh_user: myuser\nssh_identity_file: ~/.ssh/gerrit_key\npath_prefix: /workspace/repos\nskip_archived: true\nthreads: 8\nclone_timeout: 600\nretry_attempts: 3\nretry_base_delay: 2.0\n```\n\nOr JSON format `~/.config/gerrit-clone/config.json`:\n\n```json\n{\n \"host\": \"gerrit.example.org\",\n \"port\": 29418,\n \"ssh_user\": \"myuser\",\n \"ssh_identity_file\": \"~/.ssh/gerrit_key\",\n \"path_prefix\": \"/workspace/repos\",\n \"skip_archived\": true,\n 
\"threads\": 8\n}\n```\n\nConfiguration precedence: CLI arguments > Environment variables > Config file > Defaults\n\n## Nested Repository Support\n\nGerrit Clone includes intelligent support for nested repositories (projects with\nhierarchical names like `parent/child`):\n\n### Automatic Detection\n\n- **Dependency Ordering**: Parent repositories are automatically cloned before\n their children\n- **Conflict Detection**: Identifies when parent repo content conflicts with\n nested directory structure\n- **Smart Batching**: Uses dependency-aware batching to prevent race conditions\n\n### Conflict Resolution Options\n\n#### Skip Conflicting\n\n```bash\ngerrit-clone clone --host gerrit.example.org --no-move-conflicting\n```\n\nSkips nested repositories when parent contains conflicting files/directories.\nProvides clear warnings about skipped repos.\n\n#### Move Conflicting (Default - Recommended for Data Mining)\n\n```bash\ngerrit-clone clone --host gerrit.example.org\n```\n\nAutomatically moves conflicting content in parent repositories to\n`[NAME].parent` to allow complete nested cloning. 
This ensures **100%\nrepository availability** for reporting and analysis purposes.\n\n**Example:**\n\n- Parent repo `test` contains file named `test`\n- Child repo `test/test` needs directory `test/`\n- With move-conflicting enabled (default): File `test` \u2192 `test.parent`,\n directory created for child repo\n- Result: Both repositories cloned with complete history preserved\n\n### Configuration\n\n```bash\n# Allow nested repositories (default: true)\n--allow-nested-git / --no-allow-nested-git\n\n# Protect parent repos by adding child paths to .git/info/exclude (default: true)\n--nested-protection / --no-nested-protection\n\n# Move conflicting content to allow complete cloning (default: true)\n--move-conflicting / --no-move-conflicting\n```\n\n## GitHub Action Usage\n\n### Basic Example\n\n```yaml\nname: Clone Gerrit Repositories\non: [push]\n\njobs:\n clone:\n runs-on: ubuntu-latest\n steps:\n - name: Clone repositories\n uses: lfreleng-actions/gerrit-clone-action@v1\n with:\n host: gerrit.example.org\n ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}\n path-prefix: repositories\n```\n\n### Advanced Example\n\n```yaml\nname: Clone and Process Repositories\non:\n schedule:\n - cron: '0 2 * * *' # Daily at 2 AM\n\njobs:\n clone:\n runs-on: ubuntu-latest\n steps:\n - name: Clone repositories\n id: clone\n uses: lfreleng-actions/gerrit-clone-action@v1\n with:\n host: gerrit.example.org\n port: 29418\n base-url: https://gerrit.example.org/gerrit\n ssh-user: automation\n ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}\n path-prefix: workspace\n skip-archived: true\n threads: 12\n depth: 1\n branch: main\n use-https: false\n keep-remote-protocol: false\n clone-timeout: 900\n retry-attempts: 5\n verbose: true\n\n - name: Show results\n run: |\n echo \"Total: ${{ steps.clone.outputs.total-count }}\"\n echo \"Success: ${{ steps.clone.outputs.success-count }}\"\n echo \"Failed: ${{ steps.clone.outputs.failure-count }}\"\n echo \"Manifest: ${{ 
steps.clone.outputs.manifest-path }}\"\n\n - name: Upload manifest\n uses: actions/upload-artifact@v4\n with:\n name: clone-manifest\n path: ${{ steps.clone.outputs.manifest-path }}\n```\n\n### HTTPS Cloning Example\n\n```yaml\nname: Clone via HTTPS\non: [push]\n\njobs:\n clone:\n runs-on: ubuntu-latest\n steps:\n - name: Clone repositories using HTTPS\n uses: lfreleng-actions/gerrit-clone-action@v1\n with:\n host: gerrit.example.org\n base-url: https://gerrit.example.org/r\n use-https: true\n path-prefix: repos\n quiet: true\n env:\n # Use GitHub token or other auth for HTTPS\n GIT_ASKPASS: echo\n GIT_USERNAME: ${{ secrets.GERRIT_USERNAME }}\n GIT_PASSWORD: ${{ secrets.GERRIT_TOKEN }}\n```\n\n### Nested Repositories with Conflict Resolution\n\n```yaml\nname: Complete Repository Mining\non: [workflow_dispatch]\n\njobs:\n clone:\n runs-on: ubuntu-latest\n steps:\n - name: Clone all repositories (including nested with conflicts)\n uses: lfreleng-actions/gerrit-clone-action@v1\n with:\n host: gerrit.example.org\n use-https: true\n allow-nested-git: true\n nested-protection: true\n move-conflicting: true # Move conflicting files to ensure 100% clone success\n path-prefix: complete-clone\n threads: 8\n verbose: true\n\n - name: Verify complete data availability\n run: |\n echo \"Cloned: ${{ steps.clone.outputs.success-count }}\"\n echo \"Total repositories: ${{ steps.clone.outputs.total-count }}\"\n success_count=${{ steps.clone.outputs.success-count }}\n total_count=${{ steps.clone.outputs.total-count }}\n success_rate=$(( success_count * 100 / total_count ))\n echo \"Success rate: ${success_rate}%\"\n\n # Count moved conflicts\n find complete-clone -name \"*.parent\" | wc -l | xargs echo \"Conflicts resolved:\"\n```\n\n### Configuration File Example\n\n```yaml\nname: Clone with Config File\non: [workflow_dispatch]\n\njobs:\n clone:\n runs-on: ubuntu-latest\n steps:\n - name: Checkout config\n uses: actions/checkout@v4\n\n - name: Clone repositories\n uses: 
lfreleng-actions/gerrit-clone-action@v1\n with:\n config-file: .gerrit-clone-config.yaml\n verbose: true\n```\n\n### Action Inputs\n\n<!-- markdownlint-disable MD013 -->\n\n| Input | Required | Default | Description |\n|-------|----------|---------|-------------|\n| `host` | Yes | | Gerrit server hostname |\n| `port` | No | `29418` | Gerrit SSH port |\n| `base-url` | No | | Base URL for Gerrit API (defaults to <https://HOST>) |\n| `ssh-user` | No | | SSH username for clone operations |\n| `ssh-private-key` | No | | SSH private key content for authentication |\n| `path-prefix` | No | `.` | Base directory for clone hierarchy |\n| `skip-archived` | No | `true` | Skip archived and inactive repositories |\n| `include-project` | No | | Restrict cloning to specific project(s) (comma-separated) |\n| `ssh-debug` | No | `false` | Enable verbose SSH (-vvv) for troubleshooting |\n| `allow-nested-git` | No | `true` | Allow nested git working trees |\n| `nested-protection` | No | `true` | Auto-add nested child repo paths to parent .git/info/exclude |\n| `move-conflicting` | No | `false` | Move conflicting files/directories in parent repos to [NAME].parent |\n| `exit-on-error` | No | `false` | Exit when first error occurs |\n| `threads` | No | auto | Number of concurrent clone threads |\n| `depth` | No | | Create shallow clone with given depth |\n| `branch` | No | | Clone specific branch instead of default |\n| `use-https` | No | `false` | Use HTTPS for cloning instead of SSH |\n| `keep-remote-protocol` | No | `false` | Keep original clone protocol for remote |\n| `strict-host` | No | `true` | SSH strict host key checking |\n| `clone-timeout` | No | `600` | Timeout per clone operation in seconds |\n| `retry-attempts` | No | `3` | Max retry attempts per repository |\n| `retry-base-delay` | No | `2.0` | Base delay for retry backoff in seconds |\n| `retry-factor` | No | `2.0` | Exponential backoff factor for retries |\n| `retry-max-delay` | No | `30.0` | Max retry delay in seconds 
|\n| `manifest-filename` | No | `clone-manifest.json` | Output manifest filename |\n| `config-file` | No | | Configuration file path (YAML or JSON) |\n| `verbose` | No | `false` | Enable verbose/debug output |\n| `quiet` | No | `false` | Suppress all output except errors |\n| `log-file` | No | | Custom log file path |\n| `disable-log-file` | No | `false` | Disable creation of log file |\n| `log-level` | No | `DEBUG` | File logging level |\n\n<!-- markdownlint-enable MD013 -->\n\n### Action Outputs\n\n| Output | Description |\n|--------|-------------|\n| `manifest-path` | Path to the generated clone manifest file |\n| `success-count` | Number of cloned repositories |\n| `failure-count` | Number of failed clone attempts |\n| `total-count` | Total number of repositories processed |\n\n## SSH Configuration\n\nThe tool provides comprehensive SSH authentication support with automatic\nconfiguration detection:\n\n### SSH Authentication Options\n\nThe following SSH authentication options are available across all interfaces:\n\n<!-- markdownlint-disable MD013 -->\n| Option | CLI | Environment | Action | Description |\n|--------|-----|-------------|--------|-------------|\n| SSH User | `-u` | `GERRIT_SSH_USER` | `ssh-user` | SSH username |\n| SSH Key | `-i` (file) | `GERRIT_SSH_PRIVATE_KEY` | `ssh-private-key` (content) | Private key |\n| Host Check | `--strict-host` | `GERRIT_STRICT_HOST` | `strict-host` | Key check |\n<!-- markdownlint-enable MD013 -->\n\n### Authentication Methods\n\nThree authentication methods provide automatic fallback:\n\n1. **SSH Agent (Recommended)**: Uses keys loaded into SSH agent with automatic\n detection\n2. **Identity File**: Explicitly specified private key files with permission\n validation\n3. **SSH Config**: Host-specific configuration from ~/.ssh/config with full\n option support\n\n### SSH Setup Examples\n\n#### Using SSH Agent (Recommended for development)\n\n1. 
Generate SSH key pair:\n\n ```bash\n ssh-keygen -t ed25519 -C \"your.email@example.com\"\n ```\n\n2. Add public key to Gerrit profile\n\n3. Add private key to SSH agent:\n\n ```bash\n ssh-add ~/.ssh/id_ed25519\n ```\n\n4. Clone with agent authentication:\n\n ```bash\n gerrit-clone clone --host gerrit.example.org --ssh-user myuser\n ```\n\n#### Using SSH Identity File (Recommended for CI/CD)\n\n1. Place private key file securely (e.g., `/path/to/private_key`)\n\n2. Set proper permissions:\n\n ```bash\n chmod 600 /path/to/private_key\n ```\n\n3. Clone with identity file:\n\n ```bash\n gerrit-clone clone --host gerrit.example.org \\\n --ssh-user myuser \\\n --ssh-private-key /path/to/private_key\n ```\n\n4. Or use environment variables:\n\n ```bash\n export GERRIT_SSH_USER=myuser\n export GERRIT_SSH_PRIVATE_KEY=/path/to/private_key\n gerrit-clone clone --host gerrit.example.org\n ```\n\n### SSH Config\n\nCreate `~/.ssh/config` entries for convenience:\n\n```text\nHost gerrit.example.org\n User myusername\n IdentityFile ~/.ssh/gerrit_key\n StrictHostKeyChecking yes\n```\n\n### Known Hosts\n\nPre-populate known hosts to avoid prompts (recommended for CI/CD):\n\n```bash\nssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts\n```\n\nTest SSH connectivity before cloning:\n\n```bash\nssh -p 29418 myuser@gerrit.example.org gerrit version\n```\n\n## Output Manifest\n\nEach run generates a detailed JSON manifest (`clone-manifest.json`):\n\n```json\n{\n \"version\": \"1.0\",\n \"generated_at\": \"2025-01-15T10:30:45Z\",\n \"host\": \"gerrit.example.org\",\n \"port\": 29418,\n \"total\": 156,\n \"succeeded\": 154,\n \"failed\": 2,\n \"skipped\": 0,\n \"success_rate\": 98.7,\n \"duration_s\": 89.3,\n \"config\": {\n \"skip_archived\": true,\n \"threads\": 8,\n \"depth\": null,\n \"branch\": null,\n \"strict_host_checking\": true,\n \"path_prefix\": \"/workspace/repos\"\n },\n \"results\": [\n {\n \"project\": \"core/api\",\n \"path\": \"core/api\",\n \"status\": 
\"success\",\n \"attempts\": 1,\n \"duration_s\": 3.42,\n \"error\": null,\n \"started_at\": \"2025-01-15T10:30:15Z\",\n \"completed_at\": \"2025-01-15T10:30:18Z\"\n },\n {\n \"project\": \"tools/legacy\",\n \"path\": \"tools/legacy\",\n \"status\": \"failed\",\n \"attempts\": 3,\n \"duration_s\": 15.8,\n \"error\": \"timeout after 600s\",\n \"started_at\": \"2025-01-15T10:30:20Z\",\n \"completed_at\": \"2025-01-15T10:30:36Z\"\n }\n ]\n}\n```\n\n## Error Handling\n\n### Common Issues\n\n#### Host key verification failed\n\n```bash\n# Accept new host keys (use with caution)\ngerrit-clone --host gerrit.example.org --accept-unknown-host\n\n# Recommended: Pre-populate known_hosts\nssh-keyscan -H -p 29418 gerrit.example.org >> ~/.ssh/known_hosts\n```\n\n#### Permission denied (publickey)\n\n- Verify SSH public key exists in Gerrit profile\n- Check SSH agent has key loaded: `ssh-add -l`\n- Test SSH connection: `ssh -p 29418 username@gerrit.example.org gerrit version`\n- Verify SSH key permissions: `chmod 600 ~/.ssh/id_rsa`\n\n#### Connection timeout or network errors\n\n- Verify Gerrit server hostname and port (often 29418 for SSH)\n- Check network connectivity and firewall rules\n- Increase timeout: `--clone-timeout 900`\n- Reduce concurrency: `--threads 4`\n\n#### Path conflicts or permission errors\n\n- Existing non-git directories block clones\n- Use clean target directory: `--path-prefix ./clean-workspace`\n- Check disk space and write permissions\n- Remove conflicting directories: `rm -rf conflicting-path`\n\n#### API discovery failures\n\n- Manually specify base URL: `--base-url https://host/gerrit`\n- Verify Gerrit server is accessible via HTTPS\n- Check for corporate proxy or firewall restrictions\n\n### Exit Codes\n\n- `0`: Success (all repositories cloned)\n- `1`: Failure (one or more repositories failed to clone)\n- `130`: Interrupted by user (Ctrl+C)\n\n## Development\n\n### Requirements\n\n- Python 3.11+ (tested on 3.11, 3.12, 3.13, 3.14)\n- uv package 
manager (for development)\n- Git (for clone operations)\n- SSH client (for authentication)\n\n### Setup\n\n```bash\ngit clone https://github.com/lfreleng-actions/gerrit-clone-action.git\ncd gerrit-clone-action\nuv sync --dev\n```\n\n### Testing\n\n```bash\n# Run all tests\nuv run pytest\n\n# Run with coverage report\nuv run pytest --cov=gerrit_clone --cov-report=html --cov-report=term-missing\n\n# Run integration tests (requires network)\nuv run pytest tests/integration/ -v\n\n# Run specific test categories\nuv run pytest -m \"not integration\" -v # Unit tests\nuv run pytest tests/test_models.py::TestConfig -v # Specific test class\n```\n\n### Linting\n\n```bash\n# Run pre-commit hooks\nuv run pre-commit run --all-files\n\n# Individual tools\nuv run ruff check .\nuv run ruff format .\nuv run mypy src/\n```\n\n## License\n\nThis project uses the Apache License 2.0. See LICENSE for details.\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make changes with tests\n4. Run linting and tests\n5. Submit a pull request\n\n## Support\n\n- **GitHub Issues**: Report bugs and request features at\n [lfreleng-actions/gerrit-clone-action](https://github.com/lfreleng-actions/gerrit-clone-action/issues)\n- **Documentation**: This README, IMPLEMENTATION.md, and inline help\n (`gerrit-clone --help`)\n- **Examples**: Advanced usage patterns in repository examples/\n- **Integration Tests**: Real-world server validation in tests/integration/\n",
"bugtrack_url": null,
"license": null,
"summary": "A multi-threaded CLI tool for bulk cloning repositories from Gerrit servers",
"version": "0.1.7",
"project_urls": {
"Changelog": "https://github.com/lfreleng-actions/gerrit-clone-action/releases",
"Documentation": "https://github.com/lfreleng-actions/gerrit-clone-action#readme",
"Homepage": "https://github.com/lfreleng-actions/gerrit-clone-action",
"Issues": "https://github.com/lfreleng-actions/gerrit-clone-action/issues",
"Repository": "https://github.com/lfreleng-actions/gerrit-clone-action.git"
},
"split_keywords": [
"bulk",
" ci-cd",
" clone",
" gerrit",
" git",
" multi-threaded",
" repository",
" ssh"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f01ebbd1eca8fe342f51c8e776520623944106ed2d38e005aaa4aab1a8b7ae37",
"md5": "5ded976d98fc00e8bdd59589a1cdc702",
"sha256": "f08f1e5b4e140fe9ccebe79c18b9ac5c49726c5578a8c965bed764109b4b6dc3"
},
"downloads": -1,
"filename": "gerrit_clone-0.1.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ded976d98fc00e8bdd59589a1cdc702",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 71135,
"upload_time": "2025-10-07T08:33:47",
"upload_time_iso_8601": "2025-10-07T08:33:47.730341Z",
"url": "https://files.pythonhosted.org/packages/f0/1e/bbd1eca8fe342f51c8e776520623944106ed2d38e005aaa4aab1a8b7ae37/gerrit_clone-0.1.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6016fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea",
"md5": "7e742c0eecfcd0dafa5dafc7aac95ca5",
"sha256": "3e08e8922fb470953b01076af6006b16473be28bddac6f6de1c174a9a8e6e99e"
},
"downloads": -1,
"filename": "gerrit_clone-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "7e742c0eecfcd0dafa5dafc7aac95ca5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 101813,
"upload_time": "2025-10-07T08:33:49",
"upload_time_iso_8601": "2025-10-07T08:33:49.249222Z",
"url": "https://files.pythonhosted.org/packages/60/16/fd3818546afd1620edc2f85d29df7ec8459688e79bb39398e06cb6cc1aea/gerrit_clone-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-07 08:33:49",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "lfreleng-actions",
"github_project": "gerrit-clone-action",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "gerrit-clone"
}