sdeul
=====
Structural Data Extractor using LLMs
[CI status](https://github.com/dceoy/sdeul/actions/workflows/ci.yml)
Installation
------------
```sh
$ pip install -U sdeul
```
Usage
-----
1. Create a JSON Schema file that describes the output.
2. Prepare a local GGUF model file or an API key for a hosted model.
Example:
```sh
# Set an OpenAI API key
$ export OPENAI_API_KEY='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# Set a Groq API key
$ export GROQ_API_KEY='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
# Download a model GGUF file from Hugging Face
$ curl -SLO https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
```
3. Extract structural data from the input text using `sdeul extract`.
Example:
```sh
# Use OpenAI API
$ sdeul extract \
    --openai-model=gpt-4o-mini \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt

# Use Groq API
$ sdeul extract \
    --groq-model=llama-3.1-70b-versatile \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt

# Use local LLM
$ sdeul extract \
    --model-file=Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf \
    test/data/medication_history.schema.json \
    test/data/patient_medication_record.txt
```
Expected output:
```json
{
  "MedicationHistory": [
    {
      "MedicationName": "Lisinopril",
      "Dosage": "10mg daily",
      "Frequency": "daily",
      "Purpose": "hypertension"
    },
    {
      "MedicationName": "Metformin",
      "Dosage": "500mg twice daily",
      "Frequency": "twice daily",
      "Purpose": "type 2 diabetes"
    },
    {
      "MedicationName": "Atorvastatin",
      "Dosage": "20mg at bedtime",
      "Frequency": "at bedtime",
      "Purpose": "high cholesterol"
    }
  ]
}
```
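
Because the result is plain JSON, it can be handed to other tools. The following is a minimal sketch, assuming the output shown above has been saved to a file. The file names `output.json` and `names.txt` are hypothetical, and the heredoc only stands in for a real `sdeul extract` run.

```sh
# Stand-in for a real run: save a compact copy of the expected output above.
# (output.json is a hypothetical file name.)
cat > output.json <<'EOF'
{
  "MedicationHistory": [
    {"MedicationName": "Lisinopril", "Dosage": "10mg daily", "Frequency": "daily", "Purpose": "hypertension"},
    {"MedicationName": "Metformin", "Dosage": "500mg twice daily", "Frequency": "twice daily", "Purpose": "type 2 diabetes"},
    {"MedicationName": "Atorvastatin", "Dosage": "20mg at bedtime", "Frequency": "at bedtime", "Purpose": "high cholesterol"}
  ]
}
EOF

# Write one medication name per line, using only the Python standard library.
python3 -c '
import json
with open("output.json") as f:
    data = json.load(f)
for item in data["MedicationHistory"]:
    print(item["MedicationName"])
' > names.txt

# Show the extracted names.
cat names.txt
```

Only the standard library is used here, so nothing beyond the Python interpreter that `sdeul` already requires needs to be installed.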
Run `sdeul --help` for more details.
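
Step 1 asks for a JSON Schema that describes the desired output. A minimal sketch of one that would match the expected output above is given here. It is inferred from that output and is not the repository's `test/data/medication_history.schema.json`, whose contents are not shown in this document. The file name `example.schema.json` and the choice of required fields are assumptions.

```sh
# Write a hypothetical schema inferred from the expected output above.
# The quoted 'EOF' keeps the shell from expanding "$schema".
cat > example.schema.json <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "MedicationHistory": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "MedicationName": {"type": "string"},
          "Dosage": {"type": "string"},
          "Frequency": {"type": "string"},
          "Purpose": {"type": "string"}
        },
        "required": ["MedicationName"]
      }
    }
  },
  "required": ["MedicationHistory"]
}
EOF

# Confirm the file is well-formed JSON before passing it to sdeul.
python3 -m json.tool example.schema.json > /dev/null && echo "schema OK"
```

Such a file would take the place of `test/data/medication_history.schema.json` as the first positional argument in the `sdeul extract` commands above.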
Raw data
{
"_id": null,
"home_page": "https://github.com/dceoy/sdeul",
"name": "sdeul",
"maintainer": "Daichi Narushima",
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": "dnarsil+github@gmail.com",
"keywords": "llm",
"author": "Daichi Narushima",
"author_email": "dnarsil+github@gmail.com",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "AGPL-3.0-or-later",
"summary": "Structural Data Extractor using LLMs",
"version": "0.1.4",
"project_urls": {
"Documentation": "https://github.com/dceoy/sdeul/blob/main/README.md",
"Homepage": "https://github.com/dceoy/sdeul",
"Repository": "https://github.com/dceoy/sdeul.git"
},
"split_keywords": [
"llm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8f049d3591adad2db7044990e279ba345f5903a849d55c32d28bdd952ae02f46",
"md5": "70c30b5137b5abba05289c4871ec6ee2",
"sha256": "4e74c1dca6d0cd67d3d391a42449abdabdab4d711b64826e9700383aac093c7e"
},
"downloads": -1,
"filename": "sdeul-0.1.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "70c30b5137b5abba05289c4871ec6ee2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 23172,
"upload_time": "2024-10-04T16:29:56",
"upload_time_iso_8601": "2024-10-04T16:29:56.069999Z",
"url": "https://files.pythonhosted.org/packages/8f/04/9d3591adad2db7044990e279ba345f5903a849d55c32d28bdd952ae02f46/sdeul-0.1.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-04 16:29:56",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dceoy",
"github_project": "sdeul",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sdeul"
}