# Auxn Agent
## Overview
Auxn Agent is a web scraping and data extraction tool that automates collecting information from websites. Built on modern Python async capabilities, it uses Playwright for browser automation and SQLite for data storage.
## Status: Alpha (v0.1.0)
Current test coverage: 84%
### Key Features
- ✅ Asynchronous web scraping with Playwright
- ✅ Automatic pagination handling
- ✅ SQLite database with SQLAlchemy ORM
- ✅ Comprehensive test suite
- ✅ Configurable logging system
- ✅ Type-safe data models with Pydantic
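The package's own API is shown later; the "automatic pagination handling" feature boils down to a follow-the-next-link loop, which can be sketched in plain Python (the `collect_paginated` helper and its stub pages are illustrative, not part of the package):

```python
from typing import Callable, Optional

def collect_paginated(
    fetch_page: Callable[[str], tuple[list[dict], Optional[str]]],
    start_url: str,
    max_pages: int = 100,
) -> list[dict]:
    """Follow 'next' links until exhausted, accumulating listings.

    fetch_page returns (items_on_page, next_url_or_None). A real
    implementation would drive a Playwright page instead of a callable.
    """
    items: list[dict] = []
    url: Optional[str] = start_url
    for _ in range(max_pages):  # hard cap guards against pagination loops
        if url is None:
            break
        page_items, url = fetch_page(url)
        items.extend(page_items)
    return items

# Stub "site" with three pages of fake listings
PAGES = {
    "p1": ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], "p3"),
    "p3": ([{"id": 4}], None),
}

listings = collect_paginated(lambda u: PAGES[u], "p1")
print(len(listings))  # 4
```

The `max_pages` cap is a common defensive choice: a misconfigured next-button selector that keeps matching would otherwise loop forever.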
### Requirements
- Python 3.10 or higher
- Poetry for dependency management
- System dependencies for Playwright
  ```bash
  # Ubuntu/Debian
  sudo apt-get install -y \
    libevent-2.1-7 \
    libavif16
  ```
### Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/your-username/auxn-agent.git
   cd auxn-agent
   ```
2. Install Poetry (if not already installed):
   ```bash
   curl -sSL https://install.python-poetry.org | python3 -
   ```
3. Install dependencies:
   ```bash
   poetry install
   ```
4. Install Playwright browsers:
   ```bash
   poetry run playwright install chromium
   ```
5. Install browser dependencies:
   ```bash
   poetry run playwright install-deps
   ```
### Running Tests
```bash
# Run all tests
poetry run pytest

# Run with coverage report
poetry run pytest --cov=src

# Run specific test file
poetry run pytest tests/test_scraper.py
```
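Async scraping code can be tested with plain `asyncio.run`, without extra plugins. A minimal self-contained pattern (the `scrape_stub` coroutine stands in for the real scraper, whose network calls a test would mock):

```python
import asyncio

async def scrape_stub(pages: list[list[int]]) -> list[int]:
    # Stand-in for an async scrape call; in a real test the manager's
    # network I/O would be replaced by a fixture or mock.
    results: list[int] = []
    for page in pages:
        await asyncio.sleep(0)  # yield control, as real I/O would
        results.extend(page)
    return results

def test_scrape_stub_collects_all_pages():
    listings = asyncio.run(scrape_stub([[1, 2], [3]]))
    assert listings == [1, 2, 3]
```

pytest collects `test_*` functions like this one automatically; the `asyncio.run` wrapper keeps the test synchronous from pytest's point of view.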
### Usage Example
```python
import asyncio

from src.scraper.scraper_manager import ScraperManager


async def main():
    manager = ScraperManager()
    listings = await manager.scrape_listings(
        url="https://example.com",
        listing_selector=".listing",
        next_button_selector=".next-page",
    )
    print(f"Found {len(listings)} listings")


if __name__ == "__main__":
    asyncio.run(main())
```
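Scraped listings end up in SQLite. The project itself goes through SQLAlchemy models; a dependency-free sketch of that storage step using the stdlib `sqlite3` module (the `listings` schema and `save_listings` helper are illustrative only):

```python
import sqlite3

def save_listings(db_path: str, listings: list[dict]) -> int:
    """Upsert listings into a SQLite table and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS listings ("
            "url TEXT PRIMARY KEY, title TEXT)"
        )
        # INSERT OR REPLACE keyed on url makes re-scrapes idempotent
        conn.executemany(
            "INSERT OR REPLACE INTO listings (url, title) "
            "VALUES (:url, :title)",
            listings,
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
    finally:
        conn.close()

count = save_listings(":memory:", [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
])
print(count)  # 2
```

Keying the table on the listing URL is one way to make repeated scrapes idempotent instead of accumulating duplicates.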
### Project Structure
```
auxn-agent/
├── src/
│   ├── database/   # Database models and CRUD operations
│   ├── models/     # Pydantic data models
│   ├── scraper/    # Web scraping logic
│   └── utils/      # Utilities and helpers
├── tests/          # Test suite
└── poetry.lock     # Dependency lock file
```
### Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests (`poetry run pytest`)
4. Commit your changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request
### License
[MIT](LICENSE)