code-doc-gen

Name	code-doc-gen
Version	1.2.0
home_page	https://github.com/mohitmishra786/CodeDocGen
Summary	Intelligent automatic documentation generation for Python and C++ codebases using AST analysis and NLTK
upload_time	2025-08-10 08:11:24
maintainer	None
docs_url	None
author	Mohit Mishra
requires_python	>=3.8
license	None
keywords	documentation code-generation nltk ast parser c++ python doxygen docstring
requirements	clang nltk pyyaml pytest typing-extensions requests groq openai python-dotenv uuid

# CodeDocGen

A command-line tool and library that automatically generates Doxygen-style comments and documentation for functions and methods in codebases. Uses AI-powered analysis with fallback to NLTK for intelligent, context-aware documentation generation.

## Features

- **AI-Powered Comment Generation**: Uses Groq (primary) with optional OpenAI fallback for intelligent, context-aware documentation
- **Smart Fallback System**: Falls back to NLTK-based analysis when AI is unavailable or fails
- **Multi-language Support**: C/C++ (using libclang), Python (using ast), Java (basic support), JavaScript (regex-based)
- **Smart Function Analysis**: Analyzes function bodies to detect recursion, loops, conditionals, regex usage, API calls, and file operations
- **Git Integration**: Process only changed files with `--changes-only` flag and auto-commit documentation with `--auto-commit`
- **Context-Aware Descriptions**: Generates specific, meaningful descriptions instead of generic templates
- **Flexible Output**: In-place file modification, diff generation, or new file creation
- **Configurable**: YAML-based configuration for custom rules, templates, and AI settings
- **Language-Aware Comment Detection**: Prevents duplicate documentation by detecting existing comments

## Installation

### Prerequisites

- Python 3.8+
- Clang (for C/C++ parsing)

### From Source

1. **Create and activate a virtual environment** (from the root of a clone of the repository):
   ```bash
   python -m venv codedocgen
   source codedocgen/bin/activate
   ```

2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Download NLTK data:**
   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
   ```

### From TestPyPI (Latest Version)
```bash
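# --extra-index-url lets pip resolve dependencies (nltk, pyyaml, ...) from PyPI
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ code_doc_gen==1.2.0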
```

### From PyPI (Stable Version)
```bash
pip install code-doc-gen
```

### OS-specific setup guides (highly recommended)
- Windows: [usage/windows.md](usage/windows.md)
- Linux: [usage/linux.md](usage/linux.md)
- macOS: [usage/macos.md](usage/macos.md)

## Usage

### Command Line Interface

```bash
# Generate documentation (automatically detects language from file extensions)
code_doc_gen --repo /path/to/repo --inplace

# Generate documentation for a C++ repository (preserves existing comments)
code_doc_gen --repo /path/to/cpp/repo --lang c++ --inplace

# Generate documentation for Python files with custom output
code_doc_gen --repo /path/to/python/repo --lang python --output-dir ./docs

# Use custom configuration
code_doc_gen --repo /path/to/repo --lang c++ --config custom_rules.yaml

# Process specific files only
code_doc_gen --repo /path/to/repo --lang python --files src/main.py src/utils.py

# Show diff without applying changes
code_doc_gen --repo /path/to/repo --lang c++ --diff

# Enable verbose logging
code_doc_gen --repo /path/to/repo --lang python --verbose

# Enable AI-powered documentation generation (Groq)
code_doc_gen --repo /path/to/repo --lang python --enable-ai --ai-provider groq --inplace

# Same for a C++ repository (Groq requires an API key)
code_doc_gen --repo /path/to/repo --lang c++ --enable-ai --ai-provider groq --inplace

# Process only changed files in a Git repository
code_doc_gen --repo /path/to/repo --lang python --changes-only --inplace

# Auto-commit generated documentation
code_doc_gen --repo /path/to/repo --lang python --enable-ai --inplace --auto-commit
```

### Library Usage

```python
from code_doc_gen import generate_docs

# Generate documentation (automatically detects language)
results = generate_docs('/path/to/repo', inplace=True)

# Process specific files
results = generate_docs('/path/to/repo', lang='python', files=['src/main.py'])

# Generate in-place documentation
generate_docs('/path/to/repo', lang='python', inplace=True)

# Generate to output directory
generate_docs('/path/to/repo', lang='c++', output_dir='./docs')
```

## Configuration

Create a `config.yaml` file to customize documentation generation:

```yaml
# Language-specific templates
templates:
  c++:
    # Single quotes keep the backslashes literal; in double quotes YAML reads \b, \r, \t as escapes
    brief: '/** \brief {description} */'
    param: ' * \param {name} {description}'
    return: ' * \return {description}'
    throws: ' * \throws {exception} {description}'
  
  python:
    brief: '""" {description} """'
    param: "    :param {name}: {description}"
    return: "    :return: {description}"
    raises: "    :raises {exception}: {description}"

# Custom inference rules
rules:
  - pattern: "^validate.*"
    brief: "Validates the input {params}."
  - pattern: "^compute.*"
    brief: "Computes the {noun} based on {params}."
  - pattern: "^get.*"
    brief: "Retrieves the {noun}."

# AI configuration for intelligent comment generation
ai:
  enabled: false  # Set to true to enable AI-powered analysis
  provider: "groq"  # Options: "groq" (requires API key) or "openai" (requires API key)
  groq_api_key: ""  # Get from https://console.groq.com/keys or set GROQ_API_KEY environment variable
  openai_api_key: ""  # Get from https://platform.openai.com/account/api-keys or set OPENAI_API_KEY environment variable
  max_retries: 3  # Number of retries for AI API calls
  retry_delay: 1.0  # Delay between retries in seconds
```
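The `rules` section pairs a regular expression over the function name with a `brief` template. The following is a minimal sketch of how such rules might be applied, not CodeDocGen's implementation: the `Rule` and `apply_rules` names, the first-match-wins order, and the way `{noun}` (the snake_case words after the leading verb) and `{params}` (a comma-separated parameter list) are filled in are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    pattern: str  # regex matched against the function name, e.g. "^validate.*"
    brief: str    # template that may use {params} and {noun}


def _noun_from_name(name: str) -> str:
    """Hypothetical: drop the leading verb and join the remaining snake_case words."""
    words = [w for w in name.split("_") if w]
    return " ".join(words[1:]) if len(words) > 1 else "value"


def apply_rules(name: str, params: List[str], rules: List[Rule]) -> Optional[str]:
    """Return the brief from the first rule whose pattern matches, else None."""
    for rule in rules:
        if re.match(rule.pattern, name):
            return rule.brief.format(
                params=", ".join(params) if params else "arguments",
                noun=_noun_from_name(name),
            )
    return None  # no rule applies; another strategy would have to describe it


RULES = [
    Rule("^validate.*", "Validates the input {params}."),
    Rule("^compute.*", "Computes the {noun} based on {params}."),
    Rule("^get.*", "Retrieves the {noun}."),
]

print(apply_rules("compute_total_price", ["items", "tax_rate"], RULES))
# Computes the total price based on items, tax_rate.
print(apply_rules("get_user_name", [], RULES))
# Retrieves the user name.
```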

## Environment Variables (Recommended for API Keys)

For security and ease of use, it's recommended to use environment variables for API keys instead of hardcoding them in config files.

### Setup

1. **Copy the example environment file:**
   ```bash
   cp .env.example .env
   ```

2. **Edit the `.env` file and add your API keys:**
   ```bash
   # Groq API Key (get from https://console.groq.com/keys)
   GROQ_API_KEY=your_groq_api_key_here
   
   # OpenAI API Key (get from https://platform.openai.com/account/api-keys)
   OPENAI_API_KEY=your_openai_api_key_here
   ```

3. **Add `.env` to your `.gitignore` file:**
   ```bash
   echo ".env" >> .gitignore
   ```

### Priority Order

The tool loads API keys in the following priority order:
1. **Environment variables** (from `.env` file) - **Highest priority**
2. **Command line arguments** (if provided)
3. **Config file values** (from `config.yaml`) - **Lowest priority**

Keeping keys in a `.env` file that is listed in `.gitignore` (rather than in `config.yaml`) helps keep them out of version control.
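The precedence above can be expressed as a short lookup. This is a minimal sketch, not CodeDocGen's code: `resolve_api_key` is a hypothetical helper, and it assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`; python-dotenv is one of the listed requirements).

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env_var: str,
                    cli_value: Optional[str] = None,
                    config_value: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key: environment > CLI argument > config file."""
    for candidate in (environ.get(env_var), cli_value, config_value):
        if candidate:  # skips None and empty strings such as groq_api_key: ""
            return candidate
    return None


# The environment wins even when a CLI value is also given.
print(resolve_api_key("GROQ_API_KEY", cli_value="cli-key",
                      environ={"GROQ_API_KEY": "env-key"}))  # env-key
# An empty config value counts as unset.
print(resolve_api_key("GROQ_API_KEY", config_value="", environ={}))  # None
```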

## Supported Languages

### C/C++
- Uses libclang for AST parsing
- Generates Doxygen-style comments
- Detects function signatures, parameters, return types, and exceptions
- Supports both .c and .cpp files
- **NEW**: Recognizes existing comments (`//`, `/* */`, `/** */`) to prevent duplicates

#### Configuring libclang (Cross-Platform)

CodeDocGen auto-detects libclang and validates each candidate for ABI compatibility by probing `Index.create()`. Candidates are tried in this order:

1. Environment variables (from shell or `.env`):
   - `LIBCLANG_LIBRARY_FILE` or `CLANG_LIBRARY_FILE` (full path to libclang shared lib)
   - `LIBCLANG_PATH`, `CLANG_LIBRARY_PATH`, or `LLVM_LIB_DIR` (directory containing libclang)
2. `config.yaml` overrides:
   ```yaml
   cpp:
     libclang:
       # Choose one
       library_file: "/absolute/path/to/libclang.dylib"  # .so on Linux, .dll on Windows
       # library_path: "/absolute/path/to/llvm/lib"
   ```
3. PyPI vendor locations:
   - `libclang` package native folder (if installed)
   - `clang/native` folder (if using the `clang` Python package that bundles a dylib)
4. `find_library('clang')` or `find_library('libclang')`
5. OS default locations (Homebrew/Xcode on macOS, distro LLVM paths on Linux, `C:\Program Files\LLVM` on Windows)

If no candidate loads, C/C++ parsing falls back to a regex-based mode.
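For illustration, part of the search order above (steps 1, 2 and 4; the PyPI vendor folders and OS defaults are omitted) could be sketched as follows. `candidate_paths`, `loads_libclang` and `find_libclang` are hypothetical names. Instead of calling `clang.cindex.Index.create()`, the probe here only checks that the library loads through `ctypes` and exports `clang_createIndex`, the C function that `Index.create()` wraps, which is a weaker check than a full ABI probe.

```python
import ctypes
import ctypes.util
import os
from pathlib import Path
from typing import Iterator, Optional

# Common libclang file names on macOS, Linux and Windows.
LIB_NAMES = ("libclang.dylib", "libclang.so", "libclang.dll")


def candidate_paths(library_file: Optional[str] = None,
                    library_path: Optional[str] = None) -> Iterator[str]:
    """Yield libclang candidates following steps 1, 2 and 4 of the order above."""
    # Step 1a: explicit library files from the environment.
    for var in ("LIBCLANG_LIBRARY_FILE", "CLANG_LIBRARY_FILE"):
        if os.environ.get(var):
            yield os.environ[var]
    # Step 1b: directories from the environment.
    for var in ("LIBCLANG_PATH", "CLANG_LIBRARY_PATH", "LLVM_LIB_DIR"):
        if os.environ.get(var):
            for name in LIB_NAMES:
                yield str(Path(os.environ[var]) / name)
    # Step 2: cpp.libclang.library_file / library_path from config.yaml.
    if library_file:
        yield library_file
    if library_path:
        for name in LIB_NAMES:
            yield str(Path(library_path) / name)
    # Step 4: ask the system loader through ctypes.
    for name in ("clang", "libclang"):
        found = ctypes.util.find_library(name)
        if found:
            yield found


def loads_libclang(path: str) -> bool:
    """True if the library loads and exports clang_createIndex."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:  # missing file or incompatible binary
        return False
    return hasattr(lib, "clang_createIndex")


def find_libclang(**config: Optional[str]) -> Optional[str]:
    """Return the first usable candidate, or None so the caller can use regex parsing."""
    for path in candidate_paths(**config):
        if loads_libclang(path):
            return path
    return None
```

Calling `find_libclang(library_path="/opt/homebrew/opt/llvm/lib")` would return the first path that passes the probe, or `None`, at which point a caller would switch to regex parsing.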

macOS recommended setups:

- Xcode Command Line Tools (simple, stable):
  - Install Python bindings matching CLT (18.x):
    ```bash
    pip install 'clang==18.1.8'
    ```
  - Auto-detects `/Library/Developer/CommandLineTools/usr/lib/libclang.dylib` (no `.env` needed).

- Homebrew LLVM (latest toolchain):
  - `brew install llvm`
  - Add to `.env`:
    ```bash
    # Apple Silicon
    LIBCLANG_LIBRARY_FILE=/opt/homebrew/opt/llvm/lib/libclang.dylib
    # Intel: use this line instead of the one above
    # LIBCLANG_LIBRARY_FILE=/usr/local/opt/llvm/lib/libclang.dylib
    ```

Linux:
- Prefer distro `libclang` and matching Python bindings, or set `LIBCLANG_LIBRARY_FILE` to the installed `.so`.

Windows:
- Install LLVM and set `LIBCLANG_LIBRARY_FILE` to `libclang.dll` under `C:\Program Files\LLVM` (usually in its `bin` folder).

### Python
- Uses built-in ast module for parsing
- Generates PEP 257 compliant docstrings
- Detects function signatures, parameters, return types, and exceptions
- Supports .py files
- **NEW**: Recognizes existing comments (`#`, `"""`, `'''`) and decorators to prevent duplicates

### Java
- **NEW**: Java comment detection support (regex fallback)
- Recognizes Javadoc-style comments with `@param`, `@return`, `@throws`
- Fallback to regex-based parsing when javaparser is not available
- Supports .java files

## AI-Powered Comment Generation

CodeDocGen now supports AI-powered comment generation with intelligent fallback to NLTK-based analysis:

### AI Providers

#### Groq (Primary)
- Requires API key from https://console.groq.com/keys
- Multiple model support with automatic fallback
- Primary Model: `llama3-8b-8192` (fastest)
- Fallback Models: `llama3.1-8b-instant`, `llama3-70b-8192`
- Fast response times with generous free tier
- Install with: `pip install groq`

### Setup

1. **Enable AI in configuration:**
   ```yaml
   ai:
     enabled: true
     provider: "groq"
   ```

2. **For Groq/OpenAI users:**
   - Get API keys from:
     - Groq: https://console.groq.com/keys
     - OpenAI: https://platform.openai.com/account/api-keys
   - **Option 1: Use .env file (Recommended)**
     ```bash
     # Copy the example file
     cp .env.example .env
     
     # Edit .env and add your API keys
     GROQ_API_KEY=your_groq_api_key_here
     OPENAI_API_KEY=your_openai_api_key_here
     ```
   - **Option 2: Add to config.yaml**
     ```yaml
     ai:
       groq_api_key: "your-api-key-here"
       openai_api_key: "your-openai-api-key-here"
     ```
   - **Note**: Environment variables (from .env) take precedence over config file values

3. **Command line usage:**
   ```bash
   # Enable AI with Groq (API key read from .env)
   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider groq --inplace
   
   # Enable AI with OpenAI (using .env file)
   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider openai --inplace
   
   # Or pass API keys directly (not recommended for security)
   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider groq --groq-api-key YOUR_KEY --inplace
   ```

### Fallback System

The tool uses a smart fallback system:
1. **AI Analysis**: Try AI-powered comment generation first
2. **NLTK Analysis**: Fall back to NLTK-based intelligent analysis if AI fails
3. **Rule-based**: Final fallback to pattern-based rules

This way the tool still produces comments when AI services are unavailable.
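The three tiers can be read as an ordered chain in which the first tier that returns a comment wins. In this sketch the tier functions are hypothetical stand-ins, not CodeDocGen's API, and the final placeholder string is an assumption; a real AI tier would call Groq or OpenAI and raise on network, key, or quota errors.

```python
from typing import Callable, List, Optional

CommentTier = Callable[[str], Optional[str]]


def generate_comment(source: str, tiers: List[CommentTier]) -> str:
    """Try each tier in order; an exception or an empty result moves on to the next one."""
    for tier in tiers:
        try:
            comment = tier(source)
        except Exception:  # e.g. missing API key, timeout, rate limit
            continue
        if comment:
            return comment
    return "TODO: document this function."  # hypothetical last resort


def ai_tier(source: str) -> Optional[str]:
    raise ConnectionError("AI service unavailable")  # simulate an outage


def nltk_tier(source: str) -> Optional[str]:
    return None  # simulate NLTK producing nothing useful


def rule_tier(source: str) -> Optional[str]:
    return "Retrieves the value." if "def get" in source else None


print(generate_comment("def get_value(): ...", [ai_tier, nltk_tier, rule_tier]))
# Retrieves the value.
```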

## Intelligent Comment Generation (NLTK-based)

CodeDocGen v1.1.7 introduces intelligent comment generation with AST analysis and NLTK-powered descriptions:

### Key Improvements
- **Groq Model Fallback Support**: Multiple models with priority order (`llama3-8b-8192` → `llama3.1-8b-instant` → `llama3-70b-8192`)
- **Context-Aware Parameter Descriptions**: Smart parameter descriptions based on names and context
- **Function-Specific Return Types**: Intelligent return type descriptions based on function purpose
- **Behavioral Detection**: Detects recursion, loops, conditionals, regex usage, API calls, and file operations
- **Specific Actions**: Generates specific action verbs instead of generic "processes" descriptions
- **Complete Coverage**: All functions receive intelligent, meaningful comments
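Behavioral detection of this kind can be done by walking a function's AST. The sketch below uses only Python's `ast` module; the tag names and heuristics (treating `open()` as file I/O, calls on `requests` as API calls, and `re.*` calls as regex usage) are illustrative assumptions rather than CodeDocGen's exact rules.

```python
import ast
from typing import Set


def detect_behaviors(source: str) -> Set[str]:
    """Return behavior tags for the first function defined in `source`."""
    func = next(n for n in ast.walk(ast.parse(source))
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)))
    tags: Set[str] = set()
    for node in ast.walk(func):
        if isinstance(node, (ast.For, ast.While, ast.AsyncFor)):
            tags.add("loop")
        elif isinstance(node, (ast.If, ast.IfExp)):
            tags.add("conditional")
        elif isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Name):
                if target.id == func.name:  # the function calls itself
                    tags.add("recursion")
                elif target.id == "open":
                    tags.add("file_io")
            elif isinstance(target, ast.Attribute) and isinstance(target.value, ast.Name):
                if target.value.id == "re":
                    tags.add("regex")
                elif target.value.id == "requests":
                    tags.add("api_call")
    return tags


example = '''
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
'''
print(sorted(detect_behaviors(example)))  # ['conditional', 'recursion']
```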

### Language-Aware Comment Detection

Since v1.1.3, CodeDocGen detects existing comments so that it does not add duplicate documentation:

#### Python Comment Detection
```python
# Existing comment above function
@decorator
def commented_func():
    """This function has a docstring"""
    return True

def inline_commented_func():  # Inline comment
    return True

def next_line_commented_func():
    # Comment on next line
    return True
```
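For Python, the cases above can be checked with `ast` plus the `tokenize` module. This is an assumed approach, not taken from CodeDocGen's parser: a function counts as documented if it has a docstring, a `#` comment on the line directly above its first decorator (or `def`), or a `#` comment anywhere from the `def` line up to its first statement.

```python
import ast
import io
import tokenize
from typing import Dict, Set


def comment_lines(source: str) -> Set[int]:
    """Return the 1-based line numbers that contain a # comment."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.start[0] for tok in tokens if tok.type == tokenize.COMMENT}


def documented_functions(source: str) -> Dict[str, bool]:
    """Map each function name to True if it already has a docstring or a nearby comment."""
    comments = comment_lines(source)
    result: Dict[str, bool] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Decorators sit above the def line, so a leading comment goes above the first one.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        # The line above the definition, plus the def line(s) before the first statement.
        nearby = {start - 1} | set(range(node.lineno, node.body[0].lineno))
        has_docstring = ast.get_docstring(node) is not None
        result[node.name] = has_docstring or bool(comments & nearby)
    return result


example = "def f():  # inline note\n    return 1\n\ndef g():\n    return 2\n"
print(documented_functions(example))  # {'f': True, 'g': False}
```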

#### C++ Comment Detection
```cpp
// Existing comment above function
int add(int a, int b) {
    return a + b;
}

void inline_commented_func() { // Inline comment
    std::cout << "Hello" << std::endl;
}

/* Multi-line comment above function */
void multi_line_func() {
    std::cout << "Multi-line" << std::endl;
}

/** Doxygen comment */
void doxygen_func() {
    std::cout << "Doxygen" << std::endl;
}
```
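Without a full AST, the C++ cases above can be approximated line by line. The sketch below, written in Python like the rest of the tool, is a hypothetical regex heuristic rather than CodeDocGen's libclang-based parser: a definition line counts as documented if the line directly above it starts with `//` or ends with `*/`, or if the definition line itself carries an inline `//` comment. Multi-line signatures and macros are not handled.

```python
import re
from typing import Dict

# Rough match for a definition line: return type, name, "(...)", optional const, then "{".
SIGNATURE = re.compile(r"^\s*[\w:<>,*&\s]+?\b(\w+)\s*\([^;]*\)\s*(?:const\s*)?\{")
# Control-flow keywords that look like a call followed by a brace.
KEYWORDS = {"if", "for", "while", "switch", "catch"}


def documented_cpp_functions(source: str) -> Dict[str, bool]:
    """Map each detected function to True if a comment sits directly above it or inline."""
    lines = source.splitlines()
    result: Dict[str, bool] = {}
    for i, line in enumerate(lines):
        match = SIGNATURE.match(line)
        if not match or match.group(1) in KEYWORDS:
            continue
        previous = lines[i - 1].strip() if i > 0 else ""
        comment_above = previous.startswith("//") or previous.endswith("*/")
        inline_comment = "//" in line[match.end():]
        result[match.group(1)] = comment_above or inline_comment
    return result


sample = "// Adds two numbers\nint add(int a, int b) {\n    return a + b;\n}\n\nvoid plain() {\n}\n"
print(documented_cpp_functions(sample))  # {'add': True, 'plain': False}
```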

#### Java Comment Detection
```java
/**
 * Existing Javadoc comment
 * @param input The input parameter
 * @return The result
 */
public String processInput(String input) {
    return input.toUpperCase();
}
```
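Once a `/** ... */` block is found above a method, its tags can be read with a small regex. This is a hypothetical Python sketch in the spirit of the regex fallback, not CodeDocGen's Java parser; `parse_javadoc` is an illustrative name and it only collects single-line `@param`, `@return` and `@throws` entries.

```python
import re
from typing import Dict, List

TAG = re.compile(r"@(param|return|throws)\s+(.*)")


def parse_javadoc(comment: str) -> Dict[str, List[str]]:
    """Collect @param/@return/@throws entries from a /** ... */ block."""
    tags: Dict[str, List[str]] = {"param": [], "return": [], "throws": []}
    for raw in comment.splitlines():
        # Strip the leading "/**", " *" or trailing "*/" decoration from each line.
        line = raw.strip().lstrip("/*").strip()
        match = TAG.match(line)
        if match:
            tags[match.group(1)].append(match.group(2).strip())
    return tags


doc = """/**
 * Existing Javadoc comment
 * @param input The input parameter
 * @return The result
 */"""
print(parse_javadoc(doc))
# {'param': ['input The input parameter'], 'return': ['The result'], 'throws': []}
```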

## Project Structure

```
CodeDocGen/
├── code_doc_gen/
│   ├── __init__.py          # Main package interface
│   ├── main.py              # CLI entry point
│   ├── scanner.py           # Repository scanning
│   ├── analyzer.py          # NLTK-based analysis
│   ├── generator.py         # Documentation generation
│   ├── config.py            # Configuration management
│   ├── models.py            # Data models
│   └── parsers/             # Language-specific parsers
│       ├── __init__.py
│       ├── cpp_parser.py    # C/C++ parser (libclang)
│       ├── python_parser.py # Python parser (ast)
│       ├── java_parser.py   # Java parser (regex fallback)
│       └── javascript_parser.py   # JavaScript parser (regex-based)
├── tests/                   # Unit tests (100+ tests)
├── requirements.txt         # Dependencies
├── setup.py                 # Package setup
├── README.md                # This file
└── example.py               # Usage examples
```

## Development

### Running Tests

```bash
# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_generator.py -v

# Run tests with coverage (requires the pytest-cov plugin)
python -m pytest tests/ --cov=code_doc_gen
```

### Installing in Development Mode

```bash
pip install -e .
```

## Roadmap

### Version 1.1.6 (Current Release)
- **Groq Model Fallback Support**: Multiple models with priority order and automatic fallback
- **Intelligent Comment Generation**: AST analysis and NLTK-powered documentation
- **Context-Aware Descriptions**: Smart parameter and return type descriptions
- **Behavioral Detection**: Recursion, loops, conditionals, regex, API calls, file operations
- **Specific Actions**: Meaningful action verbs instead of generic descriptions
- **Complete Coverage**: All functions receive intelligent comments

### Version 1.2 (Next Release)
- **Enhanced Java Support**: Full javaparser integration for better Java parsing
- **JavaScript/TypeScript Support**: Add support for JS/TS files
- **Enhanced Templates**: More customization options for documentation styles
- **Performance Optimizations**: Parallel processing improvements

### Version 1.3
- **Go and Rust Support**: Add support for Go and Rust files
- **IDE Integration**: VSCode and IntelliJ plugin support
- **Batch Processing**: Support for processing multiple repositories
- **Documentation Quality**: Enhanced analysis for better documentation

### Version 1.4
- **C# Support**: Add C# language parser
- **PHP Support**: Add PHP language parser
- **Web Interface**: Simple web UI for documentation generation
- **CI/CD Integration**: GitHub Actions and GitLab CI templates

### Future Versions
- **Ruby Support**: Add Ruby language parser
- **Advanced Analysis**: More sophisticated code analysis and inference
- **Documentation Standards**: Support for various documentation standards
- **Machine Learning**: Optional ML-based documentation suggestions

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- **NLTK**: For natural language processing capabilities
- **libclang**: For C/C++ AST parsing
- **Python ast module**: For Python code analysis
- **Community**: For feedback and contributions 

## AI Providers Setup

CodeDocGen supports multiple AI providers for intelligent documentation generation. You can configure one primary provider and set up fallback providers for reliability.

### Available Providers

#### 1. Groq (Primary; Free API Key Required)
- **Status**: Official API
- **Cost**: Free tier available
- **Setup**:
  1. Visit [Groq Console](https://console.groq.com/keys)
  2. Sign up for a free account
  3. Generate an API key
  4. Add to `config.yaml` (or set `GROQ_API_KEY` in `.env`):
     ```yaml
     ai:
       groq_api_key: "your_groq_api_key_here"
     ```

#### 2. OpenAI (Paid API Key Required)
- **Status**: Official API
- **Cost**: Pay-per-use
- **Setup**:
  1. Visit [OpenAI Platform](https://platform.openai.com/account/api-keys)
  2. Create an account and add billing information
  3. Generate an API key
  4. Add to `config.yaml`:
     ```yaml
     ai:
       openai_api_key: "your_openai_api_key_here"
     ```

### Configuration

Configure AI providers in your `config.yaml`:

```yaml
ai:
  enabled: true
  provider: "groq"  # Primary provider: groq or openai
  fallback_providers: ["groq", "openai"]  # Fallback order
  groq_api_key: "your_groq_key"
  openai_api_key: "your_openai_key"
  max_retries: 5
  retry_delay: 1.0
  models:
    groq: ["llama3-8b-8192", "llama3.1-8b-instant", "llama3-70b-8192"]
    openai: "gpt-4o-mini"
```

### Usage Examples

```bash
# Use Groq (if it fails, the providers in fallback_providers, such as OpenAI, are tried)
python -m code_doc_gen.main --repo . --files src/ --enable-ai --ai-provider groq

# Use OpenAI directly
python -m code_doc_gen.main --repo . --files src/ --enable-ai --ai-provider openai
```

### Fallback Behavior

The system automatically tries providers in this order:
1. Primary provider (from config)
2. Fallback providers (in order specified)

If all AI providers fail, the system falls back to NLTK-based analysis.
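The provider order described here, combined with the per-provider `models` lists from the configuration above, can be flattened into a single list of attempts. This sketch assumes the primary provider is tried first, duplicates in `fallback_providers` are skipped, and each provider's models are tried in the listed order; `attempt_order` is a hypothetical helper.

```python
from typing import Dict, List, Tuple, Union


def attempt_order(primary: str,
                  fallback_providers: List[str],
                  models: Dict[str, Union[str, List[str]]]) -> List[Tuple[str, str]]:
    """Return (provider, model) pairs in the order they would be tried."""
    providers: List[str] = []
    for name in [primary, *fallback_providers]:
        if name not in providers:  # "groq" listed twice is only tried once
            providers.append(name)
    order: List[Tuple[str, str]] = []
    for provider in providers:
        configured = models.get(provider, [])
        # A single model may be given as a plain string (as with openai above).
        model_list = [configured] if isinstance(configured, str) else list(configured)
        order.extend((provider, model) for model in model_list)
    return order


config_models = {
    "groq": ["llama3-8b-8192", "llama3.1-8b-instant", "llama3-70b-8192"],
    "openai": "gpt-4o-mini",
}
for provider, model in attempt_order("groq", ["groq", "openai"], config_models):
    print(provider, model)
```

If every pair in this list fails, generation moves on to the NLTK tier.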

### Rate Limiting and Reliability

- **Groq**: Requires an API key (via CLI, environment, or config file); subject to Groq's official rate limits
- **OpenAI**: Subject to OpenAI's official rate limits

Both providers retry temporary failures with exponential backoff.
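A minimal sketch of retry with exponential backoff, assuming `retry_delay` is the base delay that doubles after each failed attempt and `max_retries` counts retries after the first call; CodeDocGen's exact schedule (and whether it adds jitter) is not documented here. The `sleep` parameter is injectable so the sketch can be exercised without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      max_retries: int = 3,
                      retry_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying failures after retry_delay, 2*retry_delay, 4*retry_delay, ..."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: the caller moves on to the next model or provider
            sleep(retry_delay * 2 ** attempt)
            attempt += 1


calls = {"count": 0}
delays: List[float] = []


def flaky() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("temporary failure")
    return "ok"


print(call_with_backoff(flaky, max_retries=3, retry_delay=1.0, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```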

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mohitmishra786/CodeDocGen",
    "name": "code-doc-gen",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "documentation, code-generation, nltk, ast, parser, c++, python, doxygen, docstring",
    "author": "Mohit Mishra",
    "author_email": "mohitmishra786687@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/73/4e/f80f29a81435857bc9f37e6565dadff98275241fd3a4bc41011be99e6b15/code_doc_gen-1.2.0.tar.gz",
    "platform": null,
    "description": "# CodeDocGen\n\nA command-line tool and library that automatically generates Doxygen-style comments and documentation for functions and methods in codebases. Uses AI-powered analysis with fallback to NLTK for intelligent, context-aware documentation generation.\n\n## Features\n\n- **AI-Powered Comment Generation**: Uses Groq (primary) with optional OpenAI fallback for intelligent, context-aware documentation\n- **Smart Fallback System**: Falls back to NLTK-based analysis when AI is unavailable or fails\n- **Multi-language Support**: C/C++ (using libclang), Python (using ast), Java (basic support), JavaScript (regex-based)\n- **Smart Function Analysis**: Analyzes function bodies to detect recursion, loops, conditionals, regex usage, API calls, and file operations\n- **Git Integration**: Process only changed files with `--changes-only` flag and auto-commit documentation with `--auto-commit`\n- **Context-Aware Descriptions**: Generates specific, meaningful descriptions instead of generic templates\n- **Flexible Output**: In-place file modification, diff generation, or new file creation\n- **Configurable**: YAML-based configuration for custom rules, templates, and AI settings\n- **Language-Aware Comment Detection**: Prevents duplicate documentation by detecting existing comments\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8+\n- Clang (for C/C++ parsing)\n\n### Setup\n\n1. **Activate the virtual environment:**\n   ```bash\n   source codedocgen/bin/activate\n   ```\n\n2. **Install dependencies:**\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. 
**Download NLTK data:**\n   ```python\n   python -c \"import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')\"\n   ```\n\n### From TestPyPI (Latest Version)\n```bash\npip install --index-url https://test.pypi.org/simple/ code_doc_gen==1.2.0\n```\n\n### From PyPI (Stable Version)\n```bash\npip install code-doc-gen\n```\n\n### OS-specific setup guides (highly recommended)\n- Windows: [usage/windows.md](usage/windows.md)\n- Linux: [usage/linux.md](usage/linux.md)\n- macOS: [usage/macos.md](usage/macos.md)\n\n## Usage\n\n### Command Line Interface\n\n```bash\n# Generate documentation (automatically detects language from file extensions)\ncode_doc_gen --repo /path/to/repo --inplace\n\n# Generate documentation for a C++ repository (preserves existing comments)\ncode_doc_gen --repo /path/to/cpp/repo --lang c++ --inplace\n\n# Generate documentation for Python files with custom output\ncode_doc_gen --repo /path/to/python/repo --lang python --output-dir ./docs\n\n# Use custom configuration\ncode_doc_gen --repo /path/to/repo --lang c++ --config custom_rules.yaml\n\n# Process specific files only\ncode_doc_gen --repo /path/to/repo --lang python --files src/main.py src/utils.py\n\n# Show diff without applying changes\ncode_doc_gen --repo /path/to/repo --lang c++ --diff\n\n# Enable verbose logging\ncode_doc_gen --repo /path/to/repo --lang python --verbose\n\n# Enable AI-powered documentation generation (Groq)\ncode_doc_gen --repo /path/to/repo --lang python --enable-ai --ai-provider groq --inplace\n\n# Use Groq AI provider (requires API key)\ncode_doc_gen --repo /path/to/repo --lang c++ --enable-ai --ai-provider groq --inplace\n\n# Process only changed files in a Git repository\ncode_doc_gen --repo /path/to/repo --lang python --changes-only --inplace\n\n# Auto-commit generated documentation\ncode_doc_gen --repo /path/to/repo --lang python --enable-ai --inplace --auto-commit\n```\n\n### Library Usage\n\n```python\nfrom code_doc_gen import 
generate_docs\n\n# Generate documentation (automatically detects language)\nresults = generate_docs('/path/to/repo', inplace=True)\n\n# Process specific files\nresults = generate_docs('/path/to/repo', lang='python', files=['src/main.py'])\n\n# Generate in-place documentation\ngenerate_docs('/path/to/repo', lang='python', inplace=True)\n\n# Generate to output directory\ngenerate_docs('/path/to/repo', lang='c++', output_dir='./docs')\n```\n\n## Configuration\n\nCreate a `config.yaml` file to customize documentation generation:\n\n```yaml\n# Language-specific templates\ntemplates:\n  c++:\n    brief: \"/** \\brief {description} */\"\n    param: \" * \\param {name} {description}\"\n    return: \" * \\return {description}\"\n    throws: \" * \\throws {exception} {description}\"\n  \n  python:\n    brief: '\"\"\" {description} \"\"\"'\n    param: \"    :param {name}: {description}\"\n    return: \"    :return: {description}\"\n    raises: \"    :raises {exception}: {description}\"\n\n# Custom inference rules\nrules:\n  - pattern: \"^validate.*\"\n    brief: \"Validates the input {params}.\"\n  - pattern: \"^compute.*\"\n    brief: \"Computes the {noun} based on {params}.\"\n  - pattern: \"^get.*\"\n    brief: \"Retrieves the {noun}.\"\n\n# AI configuration for intelligent comment generation\nai:\n  enabled: false  # Set to true to enable AI-powered analysis\nprovider: \"groq\"  # Options: \"groq\" (requires API key) or \"openai\" (requires API key)\n  groq_api_key: \"\"  # Get from https://console.groq.com/keys or set GROQ_API_KEY environment variable\n  openai_api_key: \"\"  # Get from https://platform.openai.com/account/api-keys or set OPENAI_API_KEY environment variable\n  max_retries: 3  # Number of retries for AI API calls\n  retry_delay: 1.0  # Delay between retries in seconds\n```\n\n## Environment Variables (Recommended for API Keys)\n\nFor security and ease of use, it's recommended to use environment variables for API keys instead of hardcoding them in config 
files.\n\n### Setup\n\n1. **Copy the example environment file:**\n   ```bash\n   cp .env.example .env\n   ```\n\n2. **Edit the `.env` file and add your API keys:**\n   ```bash\n   # Groq API Key (get from https://console.groq.com/keys)\n   GROQ_API_KEY=your_groq_api_key_here\n   \n   # OpenAI API Key (get from https://platform.openai.com/account/api-keys)\n   OPENAI_API_KEY=your_openai_api_key_here\n   ```\n\n3. **Add `.env` to your `.gitignore` file:**\n   ```bash\n   echo \".env\" >> .gitignore\n   ```\n\n### Priority Order\n\nThe tool loads API keys in the following priority order:\n1. **Environment variables** (from `.env` file) - **Highest priority**\n2. **Command line arguments** (if provided)\n3. **Config file values** (from `config.yaml`) - **Lowest priority**\n\nThis ensures your API keys are secure and not accidentally committed to version control.\n\n## Supported Languages\n\n### C/C++\n- Uses libclang for AST parsing\n- Generates Doxygen-style comments\n- Detects function signatures, parameters, return types, and exceptions\n- Supports both .c and .cpp files\n- **NEW**: Recognizes existing comments (`//`, `/* */`, `/** */`) to prevent duplicates\n\n#### Configuring libclang (Cross-Platform)\n\nCodeDocGen auto-detects libclang with ABI validation (it probes Index.create to ensure compatibility) using this order:\n\n1. Environment variables (from shell or `.env`):\n   - `LIBCLANG_LIBRARY_FILE` or `CLANG_LIBRARY_FILE` (full path to libclang shared lib)\n   - `LIBCLANG_PATH`, `CLANG_LIBRARY_PATH`, or `LLVM_LIB_DIR` (directory containing libclang)\n2. `config.yaml` overrides:\n   ```yaml\n   cpp:\n     libclang:\n       # Choose one\n       library_file: \"/absolute/path/to/libclang.dylib\"  # .so on Linux, .dll on Windows\n       # library_path: \"/absolute/path/to/llvm/lib\"\n   ```\n3. PyPI vendor locations:\n   - `libclang` package native folder (if installed)\n   - `clang/native` folder (if using the `clang` Python package that bundles a dylib)\n4. 
`find_library('clang'|'libclang')`\n5. OS default locations (Homebrew/Xcode on macOS, distro LLVM paths on Linux, `C:\\\\Program Files\\\\LLVM` on Windows)\n\nIf none succeed, AST parsing falls back to a robust regex mode.\n\nmacOS recommended setups:\n\n- Xcode Command Line Tools (simple, stable):\n  - Install Python bindings matching CLT (18.x):\n    ```bash\n    pip install 'clang==18.1.8'\n    ```\n  - Auto-detects `/Library/Developer/CommandLineTools/usr/lib/libclang.dylib` (no `.env` needed).\n\n- Homebrew LLVM (latest toolchain):\n  - `brew install llvm`\n  - Add to `.env`:\n    ```\n    LIBCLANG_LIBRARY_FILE=/opt/homebrew/opt/llvm/lib/libclang.dylib   # Apple Silicon\n    # or\n    LIBCLANG_LIBRARY_FILE=/usr/local/opt/llvm/lib/libclang.dylib      # Intel\n    ```\n\nLinux:\n- Prefer distro `libclang` and matching Python bindings, or set `LIBCLANG_LIBRARY_FILE` to the installed `.so`.\n\nWindows:\n- Install LLVM and set `LIBCLANG_LIBRARY_FILE` to the `libclang.dll` under `Program Files\\\\LLVM`.\n\n### Python\n- Uses built-in ast module for parsing\n- Generates PEP 257 compliant docstrings\n- Detects function signatures, parameters, return types, and exceptions\n- Supports .py files\n- **NEW**: Recognizes existing comments (`#`, `\"\"\"`, `'''`) and decorators to prevent duplicates\n\n### Java\n- **NEW**: Java comment detection support (regex fallback)\n- Recognizes Javadoc-style comments with `@param`, `@return`, `@throws`\n- Fallback to regex-based parsing when javaparser is not available\n- Supports .java files\n\n## AI-Powered Comment Generation\n\nCodeDocGen now supports AI-powered comment generation with intelligent fallback to NLTK-based analysis:\n\n### AI Providers\n\n#### Groq (Primary)\n- Requires API key from https://console.groq.com/keys\n- Multiple model support with automatic fallback\n- Primary Model: `llama3-8b-8192` (fastest)\n- Fallback Models: `llama3.1-8b-instant`, `llama3-70b-8192`\n- Fast response times with generous free tier\n- 
Install with: `pip install groq`\n\n### Setup\n\n1. **Enable AI in configuration:**\n   ```yaml\n   ai:\n     enabled: true\n     provider: \"groq\"\n   ```\n\n2. **For Groq/OpenAI users:**\n   - Get API keys from:\n     - Groq: https://console.groq.com/keys\n     - OpenAI: https://platform.openai.com/account/api-keys\n   - **Option 1: Use .env file (Recommended)**\n     ```bash\n     # Copy the example file\n     cp .env.example .env\n     \n     # Edit .env and add your API keys\n     GROQ_API_KEY=your_groq_api_key_here\n     OPENAI_API_KEY=your_openai_api_key_here\n     ```\n   - **Option 2: Add to config.yaml**\n     ```yaml\n     groq_api_key: \"your-api-key-here\"\n     openai_api_key: \"your-openai-api-key-here\"\n     ```\n   - **Note**: Environment variables (from .env) take precedence over config file values\n\n3. **Command line usage:**\n   ```bash\n   # Enable AI with Groq\n   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider groq --inplace\n   \n   # Enable AI with Groq (using .env file)\n   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider groq --inplace\n   \n   # Enable AI with OpenAI (using .env file)\n   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider openai --inplace\n   \n   # Or pass API keys directly (not recommended for security)\n   code_doc_gen --repo /path/to/repo --enable-ai --ai-provider groq --groq-api-key YOUR_KEY --inplace\n   ```\n\n### Fallback System\n\nThe tool uses a smart fallback system:\n1. **AI Analysis**: Try AI-powered comment generation first\n2. **NLTK Analysis**: Fall back to NLTK-based intelligent analysis if AI fails\n3. 
**Rule-based**: Final fallback to pattern-based rules\n\nThis ensures the tool always works, even when AI services are unavailable.\n\n## Intelligent Comment Generation (NLTK-based)\n\nCodeDocGen v1.1.7 introduces intelligent comment generation with AST analysis and NLTK-powered descriptions:\n\n### Key Improvements\n- **Groq Model Fallback Support**: Multiple models with priority order (`llama3-8b-8192` \u2192 `llama3.1-8b-instant` \u2192 `llama3-70b-8192`)\n- **Context-Aware Parameter Descriptions**: Smart parameter descriptions based on names and context\n- **Function-Specific Return Types**: Intelligent return type descriptions based on function purpose\n- **Behavioral Detection**: Detects recursion, loops, conditionals, regex usage, API calls, and file operations\n- **Specific Actions**: Generates specific action verbs instead of generic \"processes\" descriptions\n- **Complete Coverage**: All functions receive intelligent, meaningful comments\n\n### Language-Aware Comment Detection\n\nCodeDocGen v1.1.3 maintains intelligent comment detection that prevents duplicate documentation:\n\n### Python Comment Detection\n```python\n# Existing comment above function\n@decorator\ndef commented_func():\n    \"\"\"This function has a docstring\"\"\"\n    return True\n\ndef inline_commented_func():  # Inline comment\n    return True\n\ndef next_line_commented_func():\n    # Comment on next line\n    return True\n```\n\n### C++ Comment Detection\n```cpp\n// Existing comment above function\nint add(int a, int b) {\n    return a + b;\n}\n\nvoid inline_commented_func() { // Inline comment\n    std::cout << \"Hello\" << std::endl;\n}\n\n/* Multi-line comment above function */\nvoid multi_line_func() {\n    std::cout << \"Multi-line\" << std::endl;\n}\n\n/** Doxygen comment */\nvoid doxygen_func() {\n    std::cout << \"Doxygen\" << std::endl;\n}\n```\n\n### Java Comment Detection\n```java\n/**\n * Existing Javadoc comment\n * @param input The input parameter\n * @return The 
result\n */\npublic String processInput(String input) {\n    return input.toUpperCase();\n}\n```\n\n## Project Structure\n\n```\nCodeDocGen/\n\u251c\u2500\u2500 code_doc_gen/\n\u2502   \u251c\u2500\u2500 __init__.py          # Main package interface\n\u2502   \u251c\u2500\u2500 main.py              # CLI entry point\n\u2502   \u251c\u2500\u2500 scanner.py           # Repository scanning\n\u2502   \u251c\u2500\u2500 analyzer.py          # NLTK-based analysis\n\u2502   \u251c\u2500\u2500 generator.py         # Documentation generation\n\u2502   \u251c\u2500\u2500 config.py            # Configuration management\n\u2502   \u251c\u2500\u2500 models.py            # Data models\n\u2502   \u2514\u2500\u2500 parsers/             # Language-specific parsers\n\u2502       \u251c\u2500\u2500 __init__.py\n\u2502       \u251c\u2500\u2500 cpp_parser.py    # C/C++ parser (libclang)\n\u2502       \u251c\u2500\u2500 python_parser.py # Python parser (ast)\n\u2502       \u251c\u2500\u2500 java_parser.py   # Java parser (regex fallback)\n\u2502       \u2514\u2500\u2500 javascript_parser.py   # JavaScript parser (regex-based)\n\u251c\u2500\u2500 tests/                   # Unit tests (100+ tests)\n\u251c\u2500\u2500 requirements.txt         # Dependencies\n\u251c\u2500\u2500 setup.py                # Package setup\n\u251c\u2500\u2500 README.md               # This file\n\u2514\u2500\u2500 example.py              # Usage examples\n```\n\n## Development\n\n### Running Tests\n\n```bash\n# Run all tests\npython -m pytest tests/ -v\n\n# Run specific test file\npython -m pytest tests/test_generator.py -v\n\n# Run tests with coverage\npython -m pytest tests/ --cov=code_doc_gen\n```\n\n### Installing in Development Mode\n\n```bash\npip install -e .\n```\n\n## Roadmap\n\n### Version 1.1.6 (Current Release)\n- **Groq Model Fallback Support**: Multiple models with priority order and automatic fallback\n- **Intelligent Comment Generation**: AST analysis and NLTK-powered documentation\n- 
**Context-Aware Descriptions**: Smart parameter and return type descriptions\n- **Behavioral Detection**: Recursion, loops, conditionals, regex, API calls, file operations\n- **Specific Actions**: Meaningful action verbs instead of generic descriptions\n- **Complete Coverage**: All functions receive intelligent comments\n\n### Version 1.2 (Next Release)\n- **Enhanced Java Support**: Full javaparser integration for better Java parsing\n- **JavaScript/TypeScript Support**: Add support for JS/TS files\n- **Enhanced Templates**: More customization options for documentation styles\n- **Performance Optimizations**: Parallel processing improvements\n\n### Version 1.3\n- **Go and Rust Support**: Add support for Go and Rust files\n- **IDE Integration**: VSCode and IntelliJ plugin support\n- **Batch Processing**: Support for processing multiple repositories\n- **Documentation Quality**: Enhanced analysis for better documentation\n\n### Version 1.4\n- **C# Support**: Add C# language parser\n- **PHP Support**: Add PHP language parser\n- **Web Interface**: Simple web UI for documentation generation\n- **CI/CD Integration**: GitHub Actions and GitLab CI templates\n\n### Future Versions\n- **Ruby Support**: Add Ruby language parser\n- **Advanced Analysis**: More sophisticated code analysis and inference\n- **Documentation Standards**: Support for various documentation standards\n- **Machine Learning**: Optional ML-based documentation suggestions\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add some amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. 
Open a Pull Request\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- **NLTK**: For natural language processing capabilities\n- **libclang**: For C/C++ AST parsing\n- **Python ast module**: For Python code analysis\n- **Community**: For feedback and contributions\n\n## AI Providers Setup\n\nCodeDocGen supports multiple AI providers for documentation generation. You can configure one primary provider and list fallback providers that are tried if it fails.\n\n### Available Providers\n\n#### 1. Groq (Primary; Free API Key Required)\n- **Status**: Official API\n- **Cost**: Free tier available\n- **Setup**:\n  1. Visit [Groq Console](https://console.groq.com/keys)\n  2. Sign up for a free account\n  3. Generate an API key\n  4. Add to `config.yaml` (or set `GROQ_API_KEY` in `.env`):\n     ```yaml\n     ai:\n       groq_api_key: \"your_groq_api_key_here\"\n     ```\n\n#### 2. OpenAI (Paid API Key Required)\n- **Status**: Official API\n- **Cost**: Pay-per-use\n- **Setup**:\n  1. Visit [OpenAI Platform](https://platform.openai.com/account/api-keys)\n  2. Create an account and add billing information\n  3. Generate an API key\n  4. 
Add to `config.yaml` (or set `OPENAI_API_KEY` in `.env`):\n     ```yaml\n     ai:\n       openai_api_key: \"your_openai_api_key_here\"\n     ```\n\n### Configuration\n\nConfigure AI providers in your `config.yaml`:\n\n```yaml\nai:\n  enabled: true\n  provider: \"groq\"  # Primary provider: groq or openai\n  fallback_providers: [\"groq\", \"openai\"]  # Fallback order\n  groq_api_key: \"your_groq_key\"\n  openai_api_key: \"your_openai_key\"\n  max_retries: 5\n  retry_delay: 1.0\n  models:\n    groq: [\"llama3-8b-8192\", \"llama3.1-8b-instant\", \"llama3-70b-8192\"]  # Tried in this order\n    openai: \"gpt-4o-mini\"\n```\n\n### Usage Examples\n\n```bash\n# Use Groq as the primary provider; providers listed in fallback_providers (e.g. OpenAI) are tried if it fails\npython -m code_doc_gen.main --repo . --files src/ --enable-ai --ai-provider groq\n\n# Use OpenAI directly\npython -m code_doc_gen.main --repo . --files src/ --enable-ai --ai-provider openai\n```\n\n### Fallback Behavior\n\nThe system automatically tries providers in this order:\n1. Primary provider (from config)\n2. Fallback providers (in the order specified)\n\nIf all AI providers fail, the system falls back to NLTK-based analysis and, as a last resort, to pattern-based rules.\n\n### Rate Limiting and Reliability\n\n- **Groq**: Requires an API key (set via CLI, `.env`, or `config.yaml`); official rate limits apply, with exponential backoff on retry\n- **OpenAI**: Official rate limits apply, with exponential backoff on retry\n\nAll providers retry temporary failures with exponential backoff (see `max_retries` and `retry_delay` in the configuration).\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Intelligent automatic documentation generation for Python and C++ codebases using AST analysis and NLTK",
    "version": "1.2.0",
    "project_urls": {
        "Bug Reports": "https://github.com/mohitmishra786/CodeDocGen/issues",
        "Documentation": "https://github.com/mohitmishra786/CodeDocGen#readme",
        "Homepage": "https://github.com/mohitmishra786/CodeDocGen",
        "Source": "https://github.com/mohitmishra786/CodeDocGen"
    },
    "split_keywords": [
        "documentation",
        " code-generation",
        " nltk",
        " ast",
        " parser",
        " c++",
        " python",
        " doxygen",
        " docstring"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3f12f2ece8963e7d2702f1b2fb5b7e3bbcbea690567cf500c0476834ad072d0d",
                "md5": "f286ddfa2746c117b72d91074f9f35e7",
                "sha256": "01eb215a4ae30088c6f2f857dda720b8d14b017fb032c833a8f2d3a91fdb2e3a"
            },
            "downloads": -1,
            "filename": "code_doc_gen-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f286ddfa2746c117b72d91074f9f35e7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 75902,
            "upload_time": "2025-08-10T08:11:22",
            "upload_time_iso_8601": "2025-08-10T08:11:22.335973Z",
            "url": "https://files.pythonhosted.org/packages/3f/12/f2ece8963e7d2702f1b2fb5b7e3bbcbea690567cf500c0476834ad072d0d/code_doc_gen-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "734ef80f29a81435857bc9f37e6565dadff98275241fd3a4bc41011be99e6b15",
                "md5": "ad077d159d2fb91fd2e8aef886ec503f",
                "sha256": "7f9b7761d0b386f68f047d65a7c6998e570543df6f1bdd943931b53737a57d54"
            },
            "downloads": -1,
            "filename": "code_doc_gen-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ad077d159d2fb91fd2e8aef886ec503f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 72426,
            "upload_time": "2025-08-10T08:11:24",
            "upload_time_iso_8601": "2025-08-10T08:11:24.020401Z",
            "url": "https://files.pythonhosted.org/packages/73/4e/f80f29a81435857bc9f37e6565dadff98275241fd3a4bc41011be99e6b15/code_doc_gen-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-10 08:11:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mohitmishra786",
    "github_project": "CodeDocGen",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "clang",
            "specs": [
                [
                    ">=",
                    "6.0.0"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    ">=",
                    "3.8"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    ">=",
                    "6.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    ">=",
                    "7.0.0"
                ]
            ]
        },
        {
            "name": "typing-extensions",
            "specs": [
                [
                    ">=",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    ">=",
                    "2.25.0"
                ]
            ]
        },
        {
            "name": "groq",
            "specs": [
                [
                    ">=",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "openai",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    ">=",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "uuid",
            "specs": []
        }
    ],
    "lcname": "code-doc-gen"
}
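The README's setup notes state that environment variables, including those loaded from `.env`, take precedence over API keys stored in `config.yaml`. The following is a minimal Python sketch of that precedence rule, not CodeDocGen's actual code: `resolve_api_key` and `KEY_SOURCES` are hypothetical names, and it assumes the keys sit under the `ai:` section as in the README's configuration example. The environment is passed in explicitly so the rule can be tried without touching real keys.

```python
import os
from typing import Mapping, Optional

# Hypothetical table: provider -> (environment variable, key under `ai:` in config.yaml).
# The variable and key names come from the README; the table itself is illustrative.
KEY_SOURCES = {
    "groq": ("GROQ_API_KEY", "groq_api_key"),
    "openai": ("OPENAI_API_KEY", "openai_api_key"),
}


def resolve_api_key(
    provider: str,
    ai_config: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the API key for `provider`, preferring the environment over config.yaml."""
    # In a real setup, python-dotenv's load_dotenv() would first load .env values into os.environ.
    env = os.environ if environ is None else environ
    env_var, config_key = KEY_SOURCES[provider]

    from_env = env.get(env_var)
    if from_env:  # a non-empty environment variable wins
        return from_env

    from_config = ai_config.get(config_key)
    if isinstance(from_config, str) and from_config:
        return from_config
    return None  # no key configured anywhere (e.g. the caller could skip this provider)


if __name__ == "__main__":
    ai_section = {"groq_api_key": "key-from-config", "openai_api_key": "openai-key-from-config"}
    fake_env = {"GROQ_API_KEY": "key-from-env"}
    print(resolve_api_key("groq", ai_section, fake_env))    # key-from-env
    print(resolve_api_key("openai", ai_section, fake_env))  # openai-key-from-config
```

Treating an empty environment variable as unset is a design choice of this sketch; it keeps a blank `GROQ_API_KEY=` line in `.env` from hiding a valid key in `config.yaml`.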

Mohit Mishra
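The README describes a layered fallback: the primary provider, then each entry in `fallback_providers`, then NLTK-based analysis, then pattern-based rules, with exponential-backoff retries presumably governed by `max_retries` and `retry_delay`. The sketch below illustrates that control flow under stated assumptions; it is not the library's implementation. `generate_comment`, `call_with_backoff`, and `ProviderError` are hypothetical names, the providers are stub functions rather than real Groq or OpenAI clients, and the doubling delay schedule (`retry_delay * 2 ** attempt`) is one assumed reading of "exponential backoff".

```python
import time
from typing import Callable, Dict, List, Sequence

# A comment generator takes a function's source and returns a doc comment.
Generator = Callable[[str], str]


class ProviderError(Exception):
    """Raised by a (stubbed) AI provider when a request fails."""


def call_with_backoff(
    generate: Generator,
    source: str,
    max_retries: int = 5,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `generate`, retrying on ProviderError with doubling delays."""
    for attempt in range(max_retries):
        try:
            return generate(source)
        except ProviderError:
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller move on
            sleep(retry_delay * 2 ** attempt)  # 1.0, 2.0, 4.0, ... seconds by default
    raise ProviderError("max_retries must be at least 1")


def generate_comment(
    source: str,
    primary: str,
    fallback_providers: Sequence[str],
    providers: Dict[str, Generator],
    nltk_fallback: Generator,
    rule_fallback: Generator,
    max_retries: int = 5,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the primary provider, then fallbacks in order, then NLTK, then rules."""
    # Build the provider order, dropping duplicates and unknown names.
    order: List[str] = []
    for name in [primary, *fallback_providers]:
        if name in providers and name not in order:
            order.append(name)

    for name in order:
        try:
            return call_with_backoff(providers[name], source, max_retries, retry_delay, sleep)
        except ProviderError:
            continue  # this provider is exhausted; try the next one

    # Every AI provider failed: NLTK-based analysis, then pattern-based rules.
    try:
        return nltk_fallback(source)
    except Exception:
        return rule_fallback(source)


if __name__ == "__main__":
    def groq_stub(src: str) -> str:
        raise ProviderError("rate limited")  # simulate a provider that keeps failing

    def openai_stub(src: str) -> str:
        return "/** @brief Adds two integers. */"

    delays: List[float] = []
    comment = generate_comment(
        "int add(int a, int b) { return a + b; }",
        primary="groq",
        fallback_providers=["groq", "openai"],  # same list as the README's config
        providers={"groq": groq_stub, "openai": openai_stub},
        nltk_fallback=lambda src: "/** @brief NLTK-based description. */",
        rule_fallback=lambda src: "/** @brief Rule-based description. */",
        max_retries=3,
        sleep=delays.append,  # record delays instead of sleeping
    )
    print(comment)  # /** @brief Adds two integers. */
    print(delays)   # [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and makes the retry schedule observable; a real integration would leave it at the `time.sleep` default. Deduplicating the provider order also means that listing the primary provider again in `fallback_providers`, as the README's sample config does, does not retry it twice.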