# RAGVenture
RAGVenture is an intelligent startup idea generator powered by Retrieval-Augmented Generation (RAG). It helps entrepreneurs generate innovative startup ideas by learning from successful companies, combining the power of large language models with real-world startup data.
## Why RAGVenture?
Traditional startup ideation tools either rely on expensive API calls or generate ideas without real-world context. RAGVenture solves this by:
- **Completely FREE**: Runs entirely on your machine with no API costs - zero API keys required!
- **Smart Model Management**: Automatically handles model deprecation and failures with intelligent fallback
- **Data-Driven**: Learns from real startup data to ground suggestions in reality
- **Context-Aware**: Understands patterns from successful startups
- **Intelligent**: Uses RAG to combine LLM capabilities with precise information retrieval
- **Resilient**: Works offline with local models when external APIs are unavailable
- **Production-Ready**: 177 tests with comprehensive coverage, Docker runtime fixes, and monitoring
## System Requirements
- Python 3.11 or higher
- 8GB RAM minimum (16GB recommended)
- 2GB disk space for models and data
- Operating Systems:
  - Linux (recommended)
  - macOS
  - Windows (with WSL for best performance)
## Quick Start
1. **Installation**:
```bash
# Clone the repository
git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On Windows, run this instead of the `source` line below:
#   .venv\Scripts\activate
# On Unix or macOS:
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install spaCy language model for market analysis
python -m spacy download en_core_web_sm
```
2. **Environment Setup** (Optional - system works completely FREE without any setup!):
```bash
# Optional: HuggingFace token for enhanced remote models (system works completely FREE without it)
export HUGGINGFACE_TOKEN="your-token-here" # Get from huggingface.co
# Smart model management (enabled by default)
export RAG_SMART_MODELS=true
export RAG_MODEL_CHECK_INTERVAL=3600
export RAG_MODEL_TIMEOUT=60
# Optional: LangChain tracing (debugging)
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
```
3. **Generate Ideas**:
```bash
# Generate 3 startup ideas in the AI domain
python -m rag_startups.cli generate-all "AI" --num-ideas 3
# Generate ideas without market analysis
python -m rag_startups.cli generate-all "fintech" --num-ideas 2 --no-market
# Check model health and status
python -m rag_startups.cli models status
# Use custom startup data file
python -m rag_startups.cli generate-all "education" --file custom_startups.json
```
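The optional environment variables from step 2 could be read along the following lines. This is a minimal sketch, not the project's actual configuration code: `ModelSettings` and `load_settings` are hypothetical names, the fallback values simply mirror the example values shown above, and the README does not state the units of the interval and timeout settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the environment variables from step 2."""

    huggingface_token: Optional[str]
    smart_models: bool
    check_interval: int  # units are not stated in the README
    timeout: int  # units are not stated in the README


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ModelSettings:
    """Read settings from `env`, falling back to the README's example values."""
    return ModelSettings(
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        smart_models=_as_bool(env.get("RAG_SMART_MODELS", "true")),
        check_interval=int(env.get("RAG_MODEL_CHECK_INTERVAL", "3600")),
        timeout=int(env.get("RAG_MODEL_TIMEOUT", "60")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.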
## Features & Capabilities
### Core Features
- Intelligent Idea Generation:
  - Uses RAG to combine LLM knowledge with real startup data
  - Generates contextually relevant and grounded ideas
  - Provides structured output with problem, solution, and market analysis
### Command-Line Interface
Commands:
- `generate-all`: Generate startup ideas with market analysis
  - Required argument: Topic or domain (e.g., "AI", "fintech")
  - Options:
    - `--num-ideas`: Number of ideas (1-5, default: 1)
    - `--file`: Custom startup data file (default: yc_startups.json)
    - `--market/--no-market`: Include/exclude market analysis
    - `--temperature`: Model creativity (0.0-1.0)
    - `--print-examples`: Show relevant examples
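To make the option ranges concrete, the sketch below mirrors the `generate-all` options with the standard library's `argparse`. It is an illustration only: the project lists `typer` among its requirements, so its real CLI is presumably built differently, and the defaults for `--market` (on) and `--temperature` (0.7) are assumptions that the README does not state.

```python
import argparse


def bounded_int(low: int, high: int):
    """Return an argparse type that accepts integers in [low, high]."""

    def parse(text: str) -> int:
        value = int(text)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return value

    return parse


def bounded_float(low: float, high: float):
    """Return an argparse type that accepts floats in [low, high]."""

    def parse(text: str) -> float:
        value = float(text)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return value

    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate-all")
    parser.add_argument("topic", help='Topic or domain, e.g. "AI" or "fintech"')
    parser.add_argument("--num-ideas", type=bounded_int(1, 5), default=1)
    parser.add_argument("--file", default="yc_startups.json")
    # BooleanOptionalAction creates both --market and --no-market (default is assumed).
    parser.add_argument("--market", action=argparse.BooleanOptionalAction, default=True)
    # The 0.7 default is an assumption; only the 0.0-1.0 range is documented.
    parser.add_argument("--temperature", type=bounded_float(0.0, 1.0), default=0.7)
    parser.add_argument("--print-examples", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["fintech", "--num-ideas", "2", "--no-market"])` yields `num_ideas == 2` and `market == False`, while `--num-ideas 9` is rejected with a usage error.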
### Smart Model Management
- **Automatic Fallback**: Falls back to local models when external APIs fail
- **Model Migration Intelligence**: Handles model deprecation (e.g., Mistral v0.2→v0.3) automatically
- **Health Monitoring**: Continuous model health checks and status reporting
- **Local Resilience**: Works completely offline with local models
- **CLI Management**: `models` command for status, testing, and diagnostics
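The automatic fallback idea can be sketched as trying a list of backends in order and returning the first successful result. This is a hedged illustration, not RAGVenture's implementation: `generate_with_fallback`, `remote_model`, and `local_model` are hypothetical names, and the two model functions are stand-ins that do not call any real API.

```python
from typing import Callable, List, Sequence, Tuple

Generator = Callable[[str], str]


def generate_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, Generator]]
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, output) from the first success."""
    errors: List[str] = []
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def remote_model(prompt: str) -> str:
    # Stand-in for a remote API call that is currently unavailable.
    raise ConnectionError("remote API unreachable")


def local_model(prompt: str) -> str:
    # Stand-in for a locally loaded model.
    return f"[local] idea for {prompt}"
```

With `[("remote", remote_model), ("local", local_model)]` as the backend list, the remote failure is recorded and the local stand-in answers instead.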
### Technical Features
- Smart Analysis:
  - Semantic search for relevant examples
  - Automatic metadata extraction
  - Pattern recognition from successful startups
- Performance Optimized:
  - One-time embedding generation (~22s)
  - Fast idea generation (~0.5s per idea)
  - Efficient data processing (~0.1s load time)
- Production Quality:
  - 31 comprehensive unit tests
  - Automated code formatting
  - Extensive error handling
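The semantic search step can be illustrated with a toy example: embed every startup description once, then rank the descriptions by cosine similarity to the query. The sketch below substitutes a simple word-count vector for a real sentence-embedding model so that it runs with the standard library alone; all function names and sample descriptions are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy embedding (word counts); a stand-in for a real sentence-embedding model."""
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(descriptions: List[str]) -> List[Tuple[str, Vector]]:
    """Embed every description once so later queries only embed the query."""
    return [(text, embed(text)) for text in descriptions]


def search(index: List[Tuple[str, Vector]], query: str, k: int = 2) -> List[str]:
    """Return the k descriptions most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Building the index is the expensive one-time step; each later query only needs one new embedding plus the similarity comparisons.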
## Performance
Typical processing times on a standard machine:
- Initial Setup: ~22s (one-time embedding generation)
- Data Loading: ~0.1s
- Idea Generation: ~0.5s per idea
## Docker Support
For containerized deployment, we provide both CPU and GPU support.
### Prerequisites
- Docker and Docker Compose
- For GPU support:
  - NVIDIA GPU with CUDA
  - NVIDIA Container Toolkit
  - nvidia-docker2
### Quick Start with Docker
```bash
# CPU Version (recommended - fully tested)
docker-compose up app-cpu
# GPU Version (with NVIDIA support)
docker-compose up app-gpu
# Run with custom data file
docker-compose run --rm app-cpu python -m rag_startups.cli generate-all fintech --num-ideas 1 --file /app/yc_startups.json
```
**Docker Status**: ✅ **Production Ready** - All runtime issues resolved, works end-to-end with real data.
## Development Setup
1. Clone and setup:
```bash
git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
2. Install development dependencies:
```bash
pip install -r requirements.txt
pre-commit install # Sets up automatic code formatting
```
3. Run tests:
```bash
pytest tests/ # Should show 177 passing tests
```
## Data Requirements
RAGVenture works with startup data in JSON format. Two options:
1. Use YC Data (Recommended):
   - Download from [Y Combinator](https://www.ycombinator.com/companies)
   - Convert CSV to JSON:
     ```bash
     python -m rag_startups.data.convert_yc_data input.csv -o startups.json
     ```
2. Use Custom Data:
   - Prepare JSON file with required fields
   - See `docs/data_format.md` for schema
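For readers preparing custom data by hand, a CSV-to-JSON conversion can be as small as the sketch below. It is not the project's `convert_yc_data` module, and the column names in the usage note (`name`, `description`) are placeholders; the required fields are defined in `docs/data_format.md`.

```python
import csv
import json
from typing import Dict, List, TextIO


def csv_to_records(csv_file: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows into dicts keyed by the header row, skipping empty rows."""
    records: List[Dict[str, str]] = []
    for row in csv.DictReader(csv_file):
        # Missing cells arrive as None; surplus cells arrive under a None key.
        cleaned = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        if any(cleaned.values()):
            records.append(cleaned)
    return records


def write_json(records: List[Dict[str, str]], json_file: TextIO) -> None:
    """Write the records as a JSON array."""
    json.dump(records, json_file, indent=2, ensure_ascii=False)
```

A typical call opens the input with `open("input.csv", newline="", encoding="utf-8")`, passes it to `csv_to_records`, and hands the result to `write_json` together with an output file opened for writing.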
## Troubleshooting
1. Embedding Generation Time:
   - First run takes ~22s to generate embeddings
   - Subsequent runs use cached embeddings
   - GPU can significantly speed up this process
2. Common Issues:
   - Missing `HUGGINGFACE_TOKEN`: The token is optional. To use remote models, create one at huggingface.co and export it as shown in Quick Start
   - Memory errors: Reduce batch size with `--max-lines`
   - GPU errors: Ensure CUDA toolkit is properly installed
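The "first run is slow, later runs are fast" behaviour described under Embedding Generation Time is typical of a content-addressed cache. The sketch below shows one way such a cache could work, assuming embeddings are stored as JSON next to a hash of the data file; `cached_embeddings` and the cache file naming are hypothetical, and RAGVenture's own caching may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

Embeddings = List[List[float]]


def cached_embeddings(
    data_path: Path, cache_dir: Path, compute: Callable[[str], Embeddings]
) -> Embeddings:
    """Return embeddings for data_path, recomputing only when its content changes."""
    text = data_path.read_text(encoding="utf-8")
    # Key the cache on the file content so edited data invalidates old embeddings.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"embeddings-{digest[:16]}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    vectors = compute(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(vectors), encoding="utf-8")
    return vectors
```

Under this scheme, deleting the cache directory forces a fresh embedding run, which can help when cached results look stale.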
## Documentation
- `docs/api.md`: API documentation
- `docs/examples.md`: Usage examples
- `docs/data_format.md`: Data schema
- `CONTRIBUTING.md`: Development guidelines
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
## License
This project is licensed under the MIT License - see [LICENSE](LICENSE) for details.
## Startup Names and Legal Considerations
### Name Generation
- Each generated startup name includes a unique identifier (e.g., "TechStartup-x7y9z")
- This identifier ensures technical uniqueness within the tool
- The unique identifier is NOT a substitute for legal name verification
### Important Notes for Users
- Generated names are suggestions only
- The uniqueness of a name at generation time does not guarantee its availability
- Users must perform their own due diligence before using any name
### Name Verification Resources
- USPTO Trademark Database: https://www.uspto.gov/trademarks
- State Business Registries
- Domain Name Availability Tools
- Professional Legal Counsel
### Future Features
- Name availability checking tool (planned)
- Integration with business registry APIs
Raw data
{
"_id": null,
"home_page": "https://github.com/valginer0/rag_startups",
"name": "rag-startups",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "rag, ai, startup, langchain",
"author": "Val Giner",
"author_email": "Val Giner <valginer0@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ae/02/0d06a749c0dae15db9230b51039126c08ed7420e2586594092b771ea86ad/rag_startups-0.9.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Generate startup ideas grounded in real YC data using Retrieval-Augmented Generation (RAG).",
"version": "0.9.0",
"project_urls": {
"Homepage": "https://github.com/valginer0/rag_startups",
"Repository": "https://github.com/valginer0/rag_startups"
},
"split_keywords": [
"rag",
" ai",
" startup",
" langchain"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9f55e4006f35cd11057beea0101b1063ca287ef43133c15d20eeb4edd50cf14d",
"md5": "b19062762417c6b466e5ef7d1ead4bba",
"sha256": "923bde0a915f4c939c551f9f3d7bfe6df7c0cde355956e195e99d7a2b25e0b0e"
},
"downloads": -1,
"filename": "rag_startups-0.9.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b19062762417c6b466e5ef7d1ead4bba",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 71562,
"upload_time": "2025-08-01T01:28:25",
"upload_time_iso_8601": "2025-08-01T01:28:25.695414Z",
"url": "https://files.pythonhosted.org/packages/9f/55/e4006f35cd11057beea0101b1063ca287ef43133c15d20eeb4edd50cf14d/rag_startups-0.9.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "ae020d06a749c0dae15db9230b51039126c08ed7420e2586594092b771ea86ad",
"md5": "b5ac8ec31555864a4ed219ce43deef27",
"sha256": "430571e74e3bc4da379620b6e973d96338674e537338dd2b4123c8526a25c8a4"
},
"downloads": -1,
"filename": "rag_startups-0.9.0.tar.gz",
"has_sig": false,
"md5_digest": "b5ac8ec31555864a4ed219ce43deef27",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 95504,
"upload_time": "2025-08-01T01:28:27",
"upload_time_iso_8601": "2025-08-01T01:28:27.724739Z",
"url": "https://files.pythonhosted.org/packages/ae/02/0d06a749c0dae15db9230b51039126c08ed7420e2586594092b771ea86ad/rag_startups-0.9.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-01 01:28:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "valginer0",
"github_project": "rag_startups",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
"<",
"2.0.0"
]
]
},
{
"name": "langchain",
"specs": [
[
">=",
"0.1.0"
]
]
},
{
"name": "langchain-community",
"specs": [
[
">=",
"0.0.1"
]
]
},
{
"name": "pandas",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "sentence-transformers",
"specs": [
[
">=",
"2.2.2"
]
]
},
{
"name": "transformers",
"specs": [
[
">=",
"4.30.0"
]
]
},
{
"name": "langchain-chroma",
"specs": [
[
">=",
"0.1.0"
]
]
},
{
"name": "langsmith",
"specs": [
[
">=",
"0.0.30"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
">=",
"1.0.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
">=",
"2.0.0"
]
]
},
{
"name": "backoff",
"specs": [
[
">=",
"2.2.1"
]
]
},
{
"name": "chromadb",
"specs": [
[
">=",
"0.4.0"
]
]
},
{
"name": "redis",
"specs": [
[
">=",
"5.0.1"
]
]
},
{
"name": "cachetools",
"specs": [
[
">=",
"5.3.2"
]
]
},
{
"name": "fakeredis",
"specs": [
[
">=",
"2.20.0"
]
]
},
{
"name": "requests",
"specs": [
[
">=",
"2.31.0"
]
]
},
{
"name": "wbdata",
"specs": [
[
">=",
"0.3.0"
]
]
},
{
"name": "typer",
"specs": [
[
">=",
"0.9.0"
]
]
},
{
"name": "spacy",
"specs": [
[
">=",
"3.0.0"
]
]
},
{
"name": "pytest",
"specs": [
[
">=",
"7.0.0"
]
]
},
{
"name": "pytest-cov",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pytest-benchmark",
"specs": [
[
">=",
"4.0.0"
]
]
},
{
"name": "pre-commit",
"specs": [
[
">=",
"3.5.0"
]
]
},
{
"name": "black",
"specs": [
[
"==",
"24.2.0"
]
]
},
{
"name": "flake8",
"specs": [
[
">=",
"6.0.0"
]
]
},
{
"name": "isort",
"specs": [
[
">=",
"5.13.2"
]
]
},
{
"name": "autoflake",
"specs": [
[
">=",
"2.3.0"
]
]
},
{
"name": "autopep8",
"specs": [
[
">=",
"2.3.0"
]
]
},
{
"name": "mypy",
"specs": [
[
">=",
"1.0.0"
]
]
}
],
"lcname": "rag-startups"
}