# CodeDocGen
A command-line tool and library that automatically generates Doxygen-style comments and documentation for functions and methods in codebases. It uses rule-based analysis and NLTK for natural language processing to create human-readable documentation without relying on AI/ML models.
## Features
- **Rule-based Analysis**: Deterministic documentation generation using AST analysis and pattern matching
- **Multi-language Support**: C/C++ (using libclang), Python (using ast), Java (basic support)
- **Smart Inference**: Analyzes function bodies to detect loops, conditionals, exceptions, and operations
- **NLTK Integration**: Uses natural language processing for humanizing function names and descriptions
- **Flexible Output**: In-place file modification, diff generation, or new file creation
- **Configurable**: YAML-based configuration for custom rules and templates
- **Language-Aware Comment Detection**: Prevents duplicate documentation by detecting existing comments
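To make the "Smart Inference" idea concrete, here is a minimal sketch of how a function body can be inspected with Python's built-in `ast` module. The helper name `summarize_body` and the shape of the returned dictionary are assumptions made for illustration; they are not CodeDocGen's internal API.

```python
import ast

def summarize_body(source: str) -> dict:
    """Collect simple structural facts about the first function in `source`.

    Hypothetical helper: counts loops and conditionals and lists the
    exception names raised inside the function body.
    """
    tree = ast.parse(source)
    func = next(
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    facts = {"loops": 0, "conditionals": 0, "raises": []}
    for node in ast.walk(func):
        if isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
            facts["loops"] += 1
        elif isinstance(node, ast.If):
            facts["conditionals"] += 1
        elif isinstance(node, ast.Raise) and node.exc is not None:
            # `raise ValueError("x")` is a Call; `raise ValueError` is a Name.
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(exc, ast.Name):
                facts["raises"].append(exc.id)
    return facts

SAMPLE = '''
def total_positive(values):
    total = 0
    for v in values:
        if v < 0:
            raise ValueError("negative value")
        total += v
    return total
'''

print(summarize_body(SAMPLE))
# {'loops': 1, 'conditionals': 1, 'raises': ['ValueError']}
```

Facts like these can then be turned into phrases such as "iterates over the input" or "raises ValueError", which is what keeps the output deterministic.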
## Installation
### Prerequisites
- Python 3.8+
- Clang (for C/C++ parsing)
### Setup
1. **Create and activate a virtual environment:**
```bash
python3 -m venv codedocgen
source codedocgen/bin/activate
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Download NLTK data:**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')"
```
### From TestPyPI (Latest Version)
```bash
pip install --index-url https://test.pypi.org/simple/ code_doc_gen==1.0.16
```
### From PyPI (Stable Version)
```bash
pip install code_doc_gen
```
## Usage
### Command Line Interface
```bash
# Generate documentation (automatically detects language from file extensions)
code_doc_gen --repo /path/to/repo --inplace
# Generate documentation for a C++ repository (preserves existing comments)
code_doc_gen --repo /path/to/cpp/repo --lang c++ --inplace
# Generate documentation for Python files with custom output
code_doc_gen --repo /path/to/python/repo --lang python --output-dir ./docs
# Use custom configuration
code_doc_gen --repo /path/to/repo --lang c++ --config custom_rules.yaml
# Process specific files only
code_doc_gen --repo /path/to/repo --lang python --files src/main.py src/utils.py
# Show diff without applying changes
code_doc_gen --repo /path/to/repo --lang c++ --diff
# Enable verbose logging
code_doc_gen --repo /path/to/repo --lang python --verbose
```
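For readers curious what a `--diff` style preview involves, the following sketch produces a unified diff with the standard library's `difflib`. The function `preview_diff` is a hypothetical name, and this is only an illustration of the technique, not the code behind the CLI flag.

```python
import difflib

def preview_diff(original: str, documented: str, path: str) -> str:
    """Return a unified diff between the original and documented source."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            documented.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )

BEFORE = "def add(a, b):\n    return a + b\n"
AFTER = 'def add(a, b):\n    """Add a and b."""\n    return a + b\n'

# Prints a patch-style preview; nothing is written to disk.
print(preview_diff(BEFORE, AFTER, "src/utils.py"), end="")
```

Because the diff is computed from two in-memory strings, the source files stay untouched until the changes are explicitly applied.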
### Library Usage
```python
from code_doc_gen import generate_docs
# Generate documentation (automatically detects language)
results = generate_docs('/path/to/repo', inplace=True)
# Process specific files
results = generate_docs('/path/to/repo', lang='python', files=['src/main.py'])
# Generate in-place documentation
generate_docs('/path/to/repo', lang='python', inplace=True)
# Generate to output directory
generate_docs('/path/to/repo', lang='c++', output_dir='./docs')
```
## Configuration
Create a `config.yaml` file to customize documentation generation:
```yaml
# Language-specific templates
# Single quotes keep backslashes literal; in double-quoted YAML strings
# sequences such as \b, \r and \t are escapes and \p is a parse error.
templates:
  c++:
    brief: '/** \brief {description} */'
    param: ' * \param {name} {description}'
    return: ' * \return {description}'
    throws: ' * \throws {exception} {description}'

  python:
    brief: '""" {description} """'
    param: " :param {name}: {description}"
    return: " :return: {description}"
    raises: " :raises {exception}: {description}"

# Custom inference rules
rules:
  - pattern: "^validate.*"
    brief: "Validates the input {params}."
  - pattern: "^compute.*"
    brief: "Computes the {noun} based on {params}."
  - pattern: "^get.*"
    brief: "Retrieves the {noun}."
```
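The rules above pair a regular expression with a sentence template. As a rough sketch of how such a rule could be applied, the code below matches a function name against each pattern and fills in the placeholders. How CodeDocGen actually derives `{noun}` and `{params}` is not documented here, so the helpers `split_words` and `brief_for`, and the choice to use the words after the leading verb as the noun, are assumptions.

```python
import re

# Same patterns and templates as the `rules` section of the config above.
RULES = [
    (r"^validate.*", "Validates the input {params}."),
    (r"^compute.*", "Computes the {noun} based on {params}."),
    (r"^get.*", "Retrieves the {noun}."),
]

def split_words(name: str) -> list:
    """Split snake_case or camelCase identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [word.lower() for word in spaced.split()]

def brief_for(name: str, params: list) -> str:
    """Return a one-line description for a function (illustrative only)."""
    words = split_words(name)
    noun = " ".join(words[1:]) or "value"       # assumption: words after the verb
    joined = ", ".join(params) or "arguments"   # assumption: comma-joined names
    for pattern, template in RULES:
        if re.match(pattern, name):
            return template.format(noun=noun, params=joined)
    # No rule matched: fall back to the humanized name itself.
    return " ".join(words).capitalize() + "."

print(brief_for("compute_total_price", ["items", "tax_rate"]))
# Computes the total price based on items, tax_rate.
print(brief_for("getUserName", []))
# Retrieves the user name.
```

Rules are tried in order, so more specific patterns would need to appear before more general ones in a scheme like this.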
## Supported Languages
### C/C++
- Uses libclang for AST parsing
- Generates Doxygen-style comments
- Detects function signatures, parameters, return types, and exceptions
- Supports both .c and .cpp files
- **NEW**: Recognizes existing comments (`//`, `/* */`, `/** */`) to prevent duplicates
### Python
- Uses built-in ast module for parsing
- Generates PEP 257 compliant docstrings
- Detects function signatures, parameters, return types, and exceptions
- Supports .py files
- **NEW**: Recognizes existing comments (`#`, `"""`, `'''`) and decorators to prevent duplicates
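As an illustration of the Python path, the sketch below parses a function with `ast` and assembles a docstring body using the `:param` / `:return:` / `:raises` field style from the configuration templates. `build_docstring` and the placeholder descriptions it emits are hypothetical; the real generator's wording comes from its rules and NLTK analysis.

```python
import ast

def build_docstring(source: str) -> str:
    """Build a docstring body for the first top-level function in `source`."""
    func = ast.parse(source).body[0]
    lines = [func.name.replace("_", " ").capitalize() + ".", ""]

    # One :param line per positional argument, skipping self/cls.
    for arg in func.args.args:
        if arg.arg not in ("self", "cls"):
            lines.append(f":param {arg.arg}: The {arg.arg.replace('_', ' ')}.")

    # A :return: line only if some `return` carries a value.
    # Note: ast.walk also visits nested functions, which a real tool would skip.
    if any(isinstance(n, ast.Return) and n.value is not None for n in ast.walk(func)):
        lines.append(":return: The computed result.")

    # One :raises line per distinct exception name.
    raised = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            exc = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(exc, ast.Name):
                raised.add(exc.id)
    for name in sorted(raised):
        lines.append(f":raises {name}: If the input is invalid.")

    return "\n".join(lines)

SOURCE = "def scale_vector(vector, factor):\n    return [x * factor for x in vector]\n"
print(build_docstring(SOURCE))
```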
### Java
- **NEW**: Basic Java comment detection support
- Recognizes Javadoc-style comments with `@param`, `@return`, `@throws`
- Falls back to regex-based parsing when javaparser is not available
- Supports .java files
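A regex fallback for Java might look roughly like the following. The pattern `METHOD_RE` and the helper `find_methods` are assumptions written for this README, not the expressions used in `java_parser.py`, and they deliberately cover only simple method declarations that start with an access modifier.

```python
import re

# Matches declarations such as:
#   public String processInput(String input) {
#   private static int parse(String text, int radix) throws NumberFormatException {
METHOD_RE = re.compile(
    r"(?:public|protected|private)\s+"            # access modifier
    r"(?:static\s+)?(?:final\s+)?"                # optional modifiers
    r"(?P<ret>[\w<>\[\], ]+?)\s+"                 # return type
    r"(?P<name>\w+)\s*"                           # method name
    r"\((?P<params>[^)]*)\)\s*"                   # parameter list
    r"(?:throws\s+(?P<throws>[\w.,\s]+?))?\s*\{"  # optional throws clause
)

def find_methods(java_source: str) -> list:
    """Return name, return type, parameter names and thrown types per method."""
    methods = []
    for match in METHOD_RE.finditer(java_source):
        params = [
            p.strip().split()[-1]
            for p in match.group("params").split(",") if p.strip()
        ]
        throws = [
            t.strip()
            for t in (match.group("throws") or "").split(",") if t.strip()
        ]
        methods.append({
            "name": match.group("name"),
            "returns": match.group("ret").strip(),
            "params": params,
            "throws": throws,
        })
    return methods

JAVA = """
public class Parser {
    public String processInput(String input) {
        return input.toUpperCase();
    }

    private static int parse(String text, int radix) throws NumberFormatException {
        return Integer.parseInt(text, radix);
    }
}
"""

for method in find_methods(JAVA):
    print(method)
```

Regular expressions cannot handle every Java construct (annotations with arguments, package-private methods, generics with nested parentheses), which is why a full parser is preferable when available.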
## Language-Aware Comment Detection
CodeDocGen v1.0.16 introduces language-aware comment detection: it recognizes the existing comment forms shown below and avoids generating duplicate documentation for those functions:
### Python Comment Detection
```python
# Existing comment above function
@decorator
def commented_func():
    """This function has a docstring"""
    return True

def inline_commented_func():  # Inline comment
    return True

def next_line_commented_func():
    # Comment on next line
    return True
```
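The cases above (docstring, comment above a decorator, inline comment, comment on the next line) can be checked with a combination of `ast` and plain line inspection. The sketch below is one possible approach under simplifying assumptions; `is_already_documented` is a hypothetical helper and the `#` checks are intentionally naive.

```python
import ast

def is_already_documented(source: str, func_name: str) -> bool:
    """Report whether `func_name` already carries a docstring or comment."""
    lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name != func_name:
            continue
        # Case 1: a docstring is present.
        if ast.get_docstring(node):
            return True
        # Case 2: a comment directly above the def or its first decorator.
        first = min([node.lineno] + [d.lineno for d in node.decorator_list])
        if first >= 2 and lines[first - 2].strip().startswith("#"):
            return True
        # Case 3: an inline comment on the def line
        # (naive: a '#' inside a default string value would also match).
        if "#" in lines[node.lineno - 1]:
            return True
        # Case 4: comment lines between the def line and the first statement.
        between = lines[node.lineno : node.body[0].lineno - 1]
        return any(line.strip().startswith("#") for line in between)
    return False

SAMPLE = '''
# Existing comment above function
@decorator
def commented_func():
    return True

def inline_commented_func():  # Inline comment
    return True

def next_line_commented_func():
    # Comment on next line
    return True

def bare_func():
    return True
'''

for name in ("commented_func", "inline_commented_func",
             "next_line_commented_func", "bare_func"):
    print(name, is_already_documented(SAMPLE, name))
```

Only `bare_func` is reported as undocumented here. The sample is parsed but never executed, so the undefined `@decorator` is harmless.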
### C++ Comment Detection
```cpp
// Existing comment above function
int add(int a, int b) {
    return a + b;
}

void inline_commented_func() { // Inline comment
    std::cout << "Hello" << std::endl;
}

/* Multi-line comment above function */
void multi_line_func() {
    std::cout << "Multi-line" << std::endl;
}

/** Doxygen comment */
void doxygen_func() {
    std::cout << "Doxygen" << std::endl;
}
```
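For C-family sources, a comparable check can be approximated with line-based heuristics. The sketch below is written in Python (the tool's implementation language) and assumes one function definition per line with the opening brace on the same line; `FUNC_RE` and `commented_functions` are illustrative names, not part of `cpp_parser.py`, which relies on libclang.

```python
import re

# Very rough pattern for `type name(args) {` at the start of a line.
FUNC_RE = re.compile(r"^[\w:<>*&\s]+?\s+(\w+)\s*\([^)]*\)\s*\{")

def commented_functions(cpp_source: str) -> dict:
    """Map each detected function name to whether a comment accompanies it."""
    lines = cpp_source.splitlines()
    result = {}
    for index, line in enumerate(lines):
        match = FUNC_RE.match(line)
        if not match:
            continue
        previous = lines[index - 1].strip() if index > 0 else ""
        has_comment = (
            "//" in line or "/*" in line       # inline comment on the same line
            or previous.startswith("//")       # single-line comment above
            or previous.endswith("*/")         # block or Doxygen comment above
        )
        result[match.group(1)] = has_comment
    return result

CPP = """// Existing comment above function
int add(int a, int b) {
    return a + b;
}

void inline_commented_func() { // Inline comment
}

/* Multi-line comment above function */
void multi_line_func() {
}

int bare(int x) {
    return x;
}
"""

print(commented_functions(CPP))
```

A heuristic like this can be fooled by `//` inside string literals on the definition line, which is one reason an AST-based parser is the more reliable choice.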
### Java Comment Detection
```java
/**
 * Existing Javadoc comment
 * @param input The input parameter
 * @return The result
 */
public String processInput(String input) {
    return input.toUpperCase();
}
```
## Project Structure
```
CodeDocGen/
├── code_doc_gen/
│   ├── __init__.py           # Main package interface
│   ├── main.py               # CLI entry point
│   ├── scanner.py            # Repository scanning
│   ├── analyzer.py           # NLTK-based analysis
│   ├── generator.py          # Documentation generation
│   ├── config.py             # Configuration management
│   ├── models.py             # Data models
│   └── parsers/              # Language-specific parsers
│       ├── __init__.py
│       ├── cpp_parser.py     # C/C++ parser (libclang)
│       ├── python_parser.py  # Python parser (ast)
│       └── java_parser.py    # Java parser (regex fallback)
├── tests/                    # Unit tests (76 tests)
├── requirements.txt          # Dependencies
├── setup.py                  # Package setup
├── README.md                 # This file
└── example.py                # Usage examples
```
## Development
### Running Tests
```bash
# Run all tests (76 tests)
python -m pytest tests/ -v
# Run specific test file
python -m pytest tests/test_generator.py -v
# Run tests with coverage
python -m pytest tests/ --cov=code_doc_gen
```
### Installing in Development Mode
```bash
pip install -e .
```
## Roadmap
### Version 1.1 (Next Release)
- **Enhanced Java Support**: Full javaparser integration for better Java parsing
- **JavaScript/TypeScript Support**: Add support for JS/TS files
- **Enhanced Templates**: More customization options for documentation styles
- **Performance Optimizations**: Parallel processing improvements
### Version 1.2
- **Go and Rust Support**: Add support for Go and Rust files
- **IDE Integration**: VSCode and IntelliJ plugin support
- **Batch Processing**: Support for processing multiple repositories
- **Documentation Quality**: Enhanced analysis for better documentation
### Version 1.3
- **C# Support**: Add C# language parser
- **PHP Support**: Add PHP language parser
- **Web Interface**: Simple web UI for documentation generation
- **CI/CD Integration**: GitHub Actions and GitLab CI templates
### Future Versions
- **Ruby Support**: Add Ruby language parser
- **Advanced Analysis**: More sophisticated code analysis and inference
- **Documentation Standards**: Support for various documentation standards
- **Machine Learning**: Optional ML-based documentation suggestions
## Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **NLTK**: For natural language processing capabilities
- **libclang**: For C/C++ AST parsing
- **Python ast module**: For Python code analysis
- **Community**: For feedback and contributions
Raw data
{
"_id": null,
"home_page": "https://github.com/mohitmishra786/CodeDocGen",
"name": "code-doc-gen",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "documentation, code-generation, nltk, ast, parser, c++, python, doxygen, docstring",
"author": "Mohit Mishra",
"author_email": "mohitmishra786687@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/86/2e/9af6945dc756bef5336b4a8cc4ab0628ed335009c2959c2fad24a1098f9f/code_doc_gen-1.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Intelligent automatic documentation generation for Python and C++ codebases using AST analysis and NLTK",
"version": "1.1.0",
"project_urls": {
"Bug Reports": "https://github.com/mohitmishra786/CodeDocGen/issues",
"Documentation": "https://github.com/mohitmishra786/CodeDocGen#readme",
"Homepage": "https://github.com/mohitmishra786/CodeDocGen",
"Source": "https://github.com/mohitmishra786/CodeDocGen"
},
"split_keywords": [
"documentation",
" code-generation",
" nltk",
" ast",
" parser",
" c++",
" python",
" doxygen",
" docstring"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "bb00c404b7f1b7e2a6bea1292a004b9d76a491d3bb5249347d7b5e21ea5c8178",
"md5": "9fb064220a2af2159be7f5fe7e6e0d89",
"sha256": "6417a678bfd7602f88c23b8a3847f3a9c95b81bc16ea0b701554fb9510bec4a8"
},
"downloads": -1,
"filename": "code_doc_gen-1.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9fb064220a2af2159be7f5fe7e6e0d89",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 49019,
"upload_time": "2025-08-03T10:14:09",
"upload_time_iso_8601": "2025-08-03T10:14:09.605849Z",
"url": "https://files.pythonhosted.org/packages/bb/00/c404b7f1b7e2a6bea1292a004b9d76a491d3bb5249347d7b5e21ea5c8178/code_doc_gen-1.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "862e9af6945dc756bef5336b4a8cc4ab0628ed335009c2959c2fad24a1098f9f",
"md5": "b878bd3719b31f258733c520462e798c",
"sha256": "269efe9a0c483fc8a6701ad66978802610bdcd908e10f628f826380555550cd8"
},
"downloads": -1,
"filename": "code_doc_gen-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "b878bd3719b31f258733c520462e798c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 45466,
"upload_time": "2025-08-03T10:14:11",
"upload_time_iso_8601": "2025-08-03T10:14:11.599340Z",
"url": "https://files.pythonhosted.org/packages/86/2e/9af6945dc756bef5336b4a8cc4ab0628ed335009c2959c2fad24a1098f9f/code_doc_gen-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-03 10:14:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mohitmishra786",
"github_project": "CodeDocGen",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "clang",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "nltk",
"specs": [
[
">=",
"3.8"
]
]
},
{
"name": "pyyaml",
"specs": [
[
">=",
"6.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
">=",
"4.0.0"
]
]
}
],
"lcname": "code-doc-gen"
}