# DataFog Instructor SDK
DataFog Instructor is a Python SDK for named entity recognition (NER) using Ollama as the LLM backend. It provides an easy-to-use interface for detecting and classifying entities in text.
## Installation
Install the DataFog Instructor SDK with pip:
```
pip install datafog-instructor
```
To also install the testing and documentation tools, add the `dev` and `docs` extras:
```
pip install "datafog-instructor[dev,docs]"
```
## Quick Start
Here's a simple example to get you started with DataFog Instructor:
```python
from datafog_instructor import DataFog
# Initialize DataFog with default settings
datafog = DataFog()
# Detect entities in text
text = "Cisco acquires Hess for $20 billion"
result = datafog.detect_entities(text)
# Print results
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```
## Configuration
You can customize the DataFog instance using environment variables:
- `DATAFOG_LLM_BACKEND`: The LLM backend to use (currently only "ollama" is supported)
- `DATAFOG_LLM_ENDPOINT`: The host URL for the Ollama service (default: "http://localhost:11434")
- `DATAFOG_LLM_MODEL`: The model to use for entity detection (default: "phi3")
Example with custom settings:
```python
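import os
# Point the SDK at a custom Ollama host and model
os.environ['DATAFOG_LLM_ENDPOINT'] = 'http://custom-ollama-host:11434'
os.environ['DATAFOG_LLM_MODEL'] = 'custom-model'
from datafog_instructor import DataFog
# Create the instance after the variables are set
datafog = DataFog()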
```
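The lookup of these variables can be pictured with the standard library alone. The following is a minimal sketch, assuming each variable is read once and falls back to the defaults listed above. `LLMSettings` and `load_llm_settings` are hypothetical names for illustration, not part of the SDK, and the "ollama" fallback for the backend is an assumption based on it being the only supported value.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the three documented settings."""
    backend: str
    endpoint: str
    model: str


def load_llm_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return LLMSettings(
        # Assumption: "ollama" is the fallback, since it is the only supported backend
        backend=env.get("DATAFOG_LLM_BACKEND", "ollama"),
        endpoint=env.get("DATAFOG_LLM_ENDPOINT", "http://localhost:11434"),
        model=env.get("DATAFOG_LLM_MODEL", "phi3"),
    )


# Show the settings resolved from the current process environment
print(load_llm_settings())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check which values would be picked up without touching the real environment.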
## Features
### Detect Entities
Use the `detect_entities` method to identify and classify named entities in a given text:
```python
text = "Apple Inc. reported $100 billion in revenue for Q4 2023"
result = datafog.detect_entities(text)
for entity in result.entities:
    print(f"Text: {entity.text}, Type: {entity.type.value}")
```
### Manage Entity Types
You can add or remove entity types dynamically:
```python
# Add a new entity type
datafog.add_entity_type("CUSTOM", "Custom Entity")
# Remove an entity type
datafog.remove_entity_type("CUSTOM")
# Get all entity types
entity_types = datafog.get_entity_types()
print(entity_types)
```
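Since the examples read `entity.type.value`, entity types appear to behave like enum members. As a rough mental model only, the sketch below keeps a name-to-description mapping and rebuilds a standard-library `Enum` from it on demand. `EntityTypeRegistry` is a hypothetical class written for this illustration and says nothing about how the SDK stores its types internally.

```python
from enum import Enum


class EntityTypeRegistry:
    """Hypothetical registry of entity types; not the SDK's implementation."""

    def __init__(self, initial=None):
        # Map each type name to its value, e.g. {"ORG": "Organization"}
        self._types = dict(initial or {})

    def add(self, name, value):
        self._types[name] = value

    def remove(self, name):
        # Removing an unknown name is treated as a no-op
        self._types.pop(name, None)

    def as_enum(self):
        # Rebuild an Enum so that members expose `.name` and `.value`
        return Enum("EntityType", self._types)


registry = EntityTypeRegistry({"ORG": "Organization", "PERSON": "Person"})
registry.add("CUSTOM", "Custom Entity")
print([member.name for member in registry.as_enum()])
```

Rebuilding the enum after each change keeps the members immutable while still allowing the set of types to change at runtime.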
## Default Entity Types
The SDK ships with a set of predefined entity types, including:
- Organization Information: ORG, PERSON, TRANSACTION_TYPE, DEAL_STRUCTURE, FINANCIAL_INFO, PRODUCT, LOCATION, DATE, INDUSTRY, ROLE, REGULATORY, SENSITIVE_INFO, CONTACT, ID, STRATEGY, COMPANY, MONEY
- Personal Information: EMAIL, PHONE, SSN, CREDIT_CARD, IP_ADDRESS, URL, AGE, NATIONALITY, JOB_TITLE, EDUCATION
- Location Information: ADDRESS, CITY, STATE, ZIP, COUNTRY, REGION
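The grouping above can be handy when post-processing results, for example to keep only personal information. The sketch below copies the three groups into a plain dictionary and looks up the group for a type name. `ENTITY_CATEGORIES` and `category_of` are hypothetical helpers for this illustration; the SDK itself may not expose the grouping in this form.

```python
# Groups copied from the list above; the dictionary itself is illustrative
ENTITY_CATEGORIES = {
    "Organization Information": [
        "ORG", "PERSON", "TRANSACTION_TYPE", "DEAL_STRUCTURE", "FINANCIAL_INFO",
        "PRODUCT", "LOCATION", "DATE", "INDUSTRY", "ROLE", "REGULATORY",
        "SENSITIVE_INFO", "CONTACT", "ID", "STRATEGY", "COMPANY", "MONEY",
    ],
    "Personal Information": [
        "EMAIL", "PHONE", "SSN", "CREDIT_CARD", "IP_ADDRESS", "URL", "AGE",
        "NATIONALITY", "JOB_TITLE", "EDUCATION",
    ],
    "Location Information": [
        "ADDRESS", "CITY", "STATE", "ZIP", "COUNTRY", "REGION",
    ],
}


def category_of(type_name):
    """Return the group a type name belongs to, or None if it is not listed."""
    for category, names in ENTITY_CATEGORIES.items():
        if type_name in names:
            return category
    return None


print(category_of("EMAIL"))
```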
## Error Handling
If the SDK cannot process the model's response, or the response arrives in an unexpected format, it raises a `ValueError` with details about the error.
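Because the failure mode described here is a `ValueError`, calling code can guard a detection call with an ordinary `try`/`except`. The sketch below is a minimal illustration: `detect_or_empty` and the stand-in `FailingDetector` class are hypothetical names, and the stand-in exists only so the example runs without an Ollama server. With the SDK you would pass a `DataFog` instance instead.

```python
def detect_or_empty(detector, text):
    """Return the detected entities, or an empty list if detection raises ValueError."""
    try:
        result = detector.detect_entities(text)
    except ValueError as exc:
        # Report the problem and fall back to "no entities found"
        print(f"Entity detection failed: {exc}")
        return []
    return list(result.entities)


class FailingDetector:
    """Hypothetical stand-in that always fails, to exercise the except branch."""

    def detect_entities(self, text):
        raise ValueError("unexpected response format")


entities = detect_or_empty(FailingDetector(), "Cisco acquires Hess for $20 billion")
print(entities)
```

Returning an empty list is only one possible policy; re-raising or logging and retrying may suit a pipeline better.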
## Development and Testing
For development purposes, you can install additional dependencies:
```
pip install "datafog-instructor[dev]"
```
This includes tools like pytest, black, flake8, and mypy for testing and code quality.
## Documentation
To build the documentation locally:
```
pip install "datafog-instructor[docs]"
cd docs
make html
```
The built HTML documentation is written to the `docs/_build/html` directory.
## Contributing
Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License.
## Support
If you encounter any problems or have any questions, please open an issue on the GitHub repository or join our Discord community at https://discord.gg/bzDth394R4.
## Links
- Homepage: https://datafog.ai
- Documentation: https://docs.datafog.ai
- Twitter: https://twitter.com/datafoginc
- GitHub: https://github.com/datafog/datafog-instructor
Raw data
{
"_id": null,
"home_page": "https://datafog.ai",
"name": "datafog-instructor",
"maintainer": "DataFog",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": "hi@datafog.ai",
"keywords": "pii, redaction, nlp, rag, retrieval augmented generation, entity recognition",
"author": "Sid Mohan",
"author_email": "sid@datafog.ai",
"download_url": "https://files.pythonhosted.org/packages/5a/83/8d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d/datafog_instructor-0.1.0b8.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.",
"version": "0.1.0b8",
"project_urls": {
"Discord": "https://discord.gg/bzDth394R4",
"Documentation": "https://docs.datafog.ai",
"GitHub": "https://github.com/datafog/datafog-instructor",
"Homepage": "https://datafog.ai",
"Twitter": "https://twitter.com/datafoginc"
},
"split_keywords": [
"pii",
" redaction",
" nlp",
" rag",
" retrieval augmented generation",
" entity recognition"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5a838d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d",
"md5": "d4e6761447f65acc3b09f4b8d1b3a3d8",
"sha256": "e1e7107a9c01b9a49f97a77cd5334079e250d7d9b18438026d4f917e13cd8dd1"
},
"downloads": -1,
"filename": "datafog_instructor-0.1.0b8.tar.gz",
"has_sig": false,
"md5_digest": "d4e6761447f65acc3b09f4b8d1b3a3d8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 6500,
"upload_time": "2024-07-02T18:58:29",
"upload_time_iso_8601": "2024-07-02T18:58:29.811864Z",
"url": "https://files.pythonhosted.org/packages/5a/83/8d3378949b3ed5462c4dddc5286f3dd372c22607ebd69068b0610423944d/datafog_instructor-0.1.0b8.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-02 18:58:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "datafog",
"github_project": "datafog-instructor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "pydantic",
"specs": [
[
"==",
"2.7.1"
]
]
},
{
"name": "ollama",
"specs": [
[
">=",
"0.2.0"
],
[
"<",
"0.3.0"
]
]
},
{
"name": "ollama-instructor",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "python-dotenv",
"specs": []
},
{
"name": "setuptools",
"specs": [
[
"==",
"58.1.0"
]
]
},
{
"name": "openai",
"specs": [
[
"==",
"1.12.0"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"8.0.0"
]
]
},
{
"name": "black",
"specs": [
[
"==",
"24.1.1"
]
]
},
{
"name": "flake8",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "mypy",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "functools32",
"specs": [
[
"==",
"3.2.3-2"
]
]
},
{
"name": "types-requests",
"specs": [
[
"==",
"2.31.0.20240218"
]
]
},
{
"name": "aiohttp",
"specs": [
[
"==",
"3.9.3"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.1.7"
]
]
},
{
"name": "sphinx",
"specs": [
[
"==",
"7.2.6"
]
]
},
{
"name": "sphinx-rtd-theme",
"specs": [
[
"==",
"2.0.0"
]
]
}
],
"lcname": "datafog-instructor"
}