# Phi-3-MLX: Language and Vision Models for Apple Silicon
Phi-3-MLX is a versatile AI framework that leverages both the Phi-3-Vision multimodal model and the Phi-3-Mini-128K language model, optimized for Apple Silicon using the MLX framework. This project provides an easy-to-use interface for a wide range of AI tasks, from advanced text generation to visual question answering and code execution.
## Features
- Integration with Phi-3-Vision (multimodal) model
- Support for the Phi-3-Mini-128K (language-only) model
- Optimized performance on Apple Silicon using MLX
- Batched generation for processing multiple prompts
- Flexible agent system for various AI tasks
- Custom toolchains for specialized workflows
- Model quantization for improved efficiency
- LoRA fine-tuning capabilities
- API integration for extended functionality (e.g., image generation, text-to-speech)
## Minimum Requirements
Phi-3-MLX is designed to run on Apple Silicon Macs. The minimum requirements are:
- Apple Silicon Mac (M1, M2, or later)
- macOS 11.0 or later
- 8GB RAM (with model quantization enabled via the `quantize_model=True` option)
For optimal performance, especially when working with larger models or datasets, we recommend using a Mac with 16GB RAM or more.
## Quick Start
Install and launch Phi-3-MLX from the command line:
```bash
pip install phi-3-vision-mlx
phi3v
```
To use the library in a Python script instead:
```python
from phi_3_vision_mlx import generate
```
## 1. Core Functionalities
### Visual Question Answering
```python
generate('What is shown in this image?', 'https://collectionapi.metmuseum.org/api/collection/v1/iiif/344291/725918/main-image')
```
### Model and Cache Quantization
```python
# Model quantization
generate("Describe the water cycle.", quantize_model=True)
# Cache quantization
generate("Explain quantum computing.", quantize_cache=True)
```
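Both options can also be passed in a single call for a further reduced memory footprint. A minimal sketch (combining the two flags in one call is an assumption based on the parameters shown above and in `examples.py`, which passes both):

```python
# Sketch: combine model and cache quantization in a single call
generate("Summarize the history of the Eiffel Tower.", quantize_model=True, quantize_cache=True)
```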
### Batch Text Generation
```python
# A list of prompts for batch generation
prompts = [
    "Write a haiku about spring.",
    "Explain the theory of relativity.",
    "Describe a futuristic city."
]
# Generate responses using Phi-3-Vision (multimodal model)
generate(prompts, max_tokens=100)
# Generate responses using Phi-3-Mini-128K (language-only model)
generate(prompts, max_tokens=100, blind_model=True)
```
### Constrained Beam Decoding
```python
from phi_3_vision_mlx import constrain
# Use constrain for structured generation (e.g., code, function calls, multiple-choice)
prompts = [
    "A 20-year-old woman presents with menorrhagia for the past several years. She says that her menses “have always been heavy”, and she has experienced easy bruising for as long as she can remember. Family history is significant for her mother, who had similar problems with bruising easily. The patient's vital signs include: heart rate 98/min, respiratory rate 14/min, temperature 36.1°C (96.9°F), and blood pressure 110/87 mm Hg. Physical examination is unremarkable. Laboratory tests show the following: platelet count 200,000/mm3, PT 12 seconds, and PTT 43 seconds. Which of the following is the most likely cause of this patient’s symptoms? A: Factor V Leiden B: Hemophilia A C: Lupus anticoagulant D: Protein C deficiency E: Von Willebrand disease",
    "A 25-year-old primigravida presents to her physician for a routine prenatal visit. She is at 34 weeks gestation, as confirmed by an ultrasound examination. She has no complaints, but notes that the new shoes she bought 2 weeks ago do not fit anymore. The course of her pregnancy has been uneventful and she has been compliant with the recommended prenatal care. Her medical history is unremarkable. She has a 15-pound weight gain since the last visit 3 weeks ago. Her vital signs are as follows: blood pressure, 148/90 mm Hg; heart rate, 88/min; respiratory rate, 16/min; and temperature, 36.6℃ (97.9℉). The blood pressure on repeat assessment 4 hours later is 151/90 mm Hg. The fetal heart rate is 151/min. The physical examination is significant for 2+ pitting edema of the lower extremity. Which of the following tests should confirm the probable condition of this patient? A: Bilirubin assessment B: Coagulation studies C: Hematocrit assessment D: Leukocyte count with differential E: 24-hour urine protein"
]
# Define constraints for the generated text
constraints = [(0, ' The'), (100, ' The correct answer is'), (1, 'X.')]
# Apply constrained beam decoding
results = constrain(prompts, constraints, blind_model=True, quantize_model=True, use_beam=True)
```
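The same call pattern works for simpler structured tasks. Below is a minimal sketch that reuses the constraints defined above on a hypothetical prompt; reading each `(token_budget, text)` pair as "the output must reach this text within the given token budget" is an interpretation of the example, not a documented specification here:

```python
# Sketch with a hypothetical prompt, reusing the constraints defined above
simple_prompts = ["Which planet is known as the Red Planet? A: Venus B: Mars C: Jupiter D: Mercury"]
results = constrain(simple_prompts, constraints, blind_model=True, quantize_model=True, use_beam=True)
```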
### Choosing From Options
```python
from phi_3_vision_mlx import choose
# Select best option from choices for given prompts
prompts = [
    "What is the largest planet in our solar system? A: Earth B: Mars C: Jupiter D: Saturn",
    "Which element has the chemical symbol 'O'? A: Osmium B: Oxygen C: Gold D: Silver"
]
# For multiple-choice or decision-making tasks
choose(prompts, choices='ABCDE')
```
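For four-option questions like the ones above, the candidate set can be narrowed to match; a usage sketch with the same `choices` parameter:

```python
# Sketch: restrict candidates to A-D for four-option questions
choose(prompts, choices='ABCD')
```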
### LoRA Fine-tuning
```python
from phi_3_vision_mlx import train_lora, test_lora
# Train a LoRA adapter
train_lora(
    lora_layers=5,   # Number of layers to apply LoRA
    lora_rank=16,    # Rank of the LoRA adaptation
    epochs=10,       # Number of training epochs
    lr=1e-4,         # Learning rate
    warmup=0.5,      # Fraction of steps for learning rate warmup
    dataset_path="JosefAlbers/akemiH_MedQA_Reason"
)
# Generate text using the trained LoRA adapter
generate("Describe the potential applications of CRISPR gene editing in medicine.",
blind_model=True,
quantize_model=True,
use_adapter=True)
# Compare LoRA adapters
test_lora(adapter_path=None) # Without LoRA adapter
test_lora(adapter_path=True) # With default LoRA adapter
test_lora(adapter_path="/path/to/your/lora") # With specific adapter
```
![Training log](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/train_log.png)
## 2. HTTP Model Server
1. Start the server:
```bash
python server.py
```
2. Send POST requests to `http://localhost:8000/v1/completions` with a JSON body:
```bash
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": [
      "Hello, world!",
      "Guten Tag!"
    ],
    "max_tokens": 50
  }'
```
3. Receive JSON responses with generated text for each prompt:
```json
{
  "model": "phi-3-vision",
  "responses": [
    "Hello! How can I help you today?<|end|>",
    "Guten Tag! Wie kann ich Ihnen helfen?<|end|>"
  ]
}
```
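You can also call the endpoint from Python. Below is a minimal client sketch using the third-party `requests` library (an assumption here, not a project dependency), with the server from step 1 running locally on port 8000:

```python
import requests

# Sketch: POST two prompts to the local completions endpoint and print the responses
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={"prompt": ["Hello, world!", "Guten Tag!"], "max_tokens": 50},
)
for text in response.json()["responses"]:
    print(text)
```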
## 3. Agent Interactions
### Multi-turn Conversation
```python
from phi_3_vision_mlx import Agent
# Create an instance of the Agent
agent = Agent()
# First interaction: Analyze an image
agent('Analyze this image and describe the architectural style:', 'https://images.metmuseum.org/CRDImages/rl/original/DP-19531-075.jpg')
# Second interaction: Follow-up question
agent('What historical period does this architecture likely belong to?')
# End conversation, clear memory for new interaction
agent.end()
```
![Visual question answering example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/vqa.png)
### Generative Feedback Loop
```python
# Ask the agent to generate and execute code to create a plot
agent('Plot a Lissajous Curve.')
# Ask the agent to modify the generated code and create a new plot
agent('Modify the code to plot 3:4 frequency')
agent.end()
```
![Coding agent example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/coding_agent.png)
### External API Tool Use
```python
# Request the agent to generate an image
agent('Draw "A perfectly red apple, 32k HDR, studio lighting"')
agent.end()
# Request the agent to convert text to speech
agent('Speak "People say nothing is impossible, but I do nothing every day."')
agent.end()
```
![API agent example](https://raw.githubusercontent.com/JosefAlbers/Phi-3-Vision-MLX/main/assets/api_agent.png)
## 4. Custom Toolchains
### In-Context Learning Agent
```python
from phi_3_vision_mlx import add_text
# Define the toolchain as a string
toolchain = """
prompt = add_text(prompt)
responses = generate(prompt, images)
"""
# Create an Agent instance with the custom toolchain
agent = Agent(toolchain, early_stop=100)
# Run the agent
agent('How to inspect API endpoints? @https://raw.githubusercontent.com/gradio-app/gradio/main/guides/08_gradio-clients-and-lite/01_getting-started-with-the-python-client.md')
```
### Retrieval Augmented Coding Agent
```python
from phi_3_vision_mlx import VDB
import datasets
# Simulate user input
user_input = 'Comparison of Sortino Ratio for Bitcoin and Ethereum.'
# Create a custom RAG tool
def rag(prompt, repo_id="JosefAlbers/sharegpt_python_mlx", n_topk=1):
    ds = datasets.load_dataset(repo_id, split='train')
    vdb = VDB(ds)
    context = vdb(prompt, n_topk)[0][0]
    return f'{context}\n<|end|>\n<|user|>\nPlot: {prompt}'
# Define the toolchain
toolchain_plot = """
prompt = rag(prompt)
responses = generate(prompt, images)
files = execute(responses, step)
"""
# Create an Agent instance with the RAG toolchain
agent = Agent(toolchain_plot, False)
# Run the agent with the user input
_, images = agent(user_input)
```
### Multi-Agent Interaction
```python
# Continued from the Retrieval Augmented Coding Agent example above
agent_writer = Agent(early_stop=100)
agent_writer(f'Write a stock analysis report on: {user_input}', images)
```
### External LLM Integration
```python
# Create Agent with Mistral-7B-Instruct-v0.3 instead
agent = Agent(toolchain="responses, history = mistral_api(prompt, history)")
# Generate a neurology ICU admission note
agent('Write a neurology ICU admission note.')
# Follow-up questions (multi-turn conversation)
agent('Give me the inpatient BP goal for this patient.')
agent('DVT ppx for this patient?')
agent("Patient's prognosis?")
# End
agent.end()
```
## Benchmarks
```python
from phi_3_vision_mlx import benchmark
benchmark()
```
| Task | Vanilla Model | Quantized Model | Quantized Cache | LoRA Adapter |
|-----------------------|---------------|-----------------|-----------------|--------------|
| Text Generation | 24.87 tps | 58.61 tps | 18.47 tps | 24.74 tps |
| Image Captioning | 19.08 tps | 37.43 tps | 3.58 tps | 18.88 tps |
| Batched Generation | 235.79 tps | 147.94 tps | 122.02 tps | 233.09 tps |
*(On M1 Max 64GB)*
## More Examples
For advanced examples and external library integration, see `examples.py` in the project root. Preview:
```python
# Multimodal Reddit Thread Summarizer
from rd2md import rd2md
from pathlib import Path
import json
filename, contents, images = rd2md()
prompt = 'Write an executive summary of the above (max 200 words). The summary should capture the diverse range of opinions and key points discussed in the thread, presenting a balanced view of the topic without quoting specific users or comments directly. Focus on organizing the information cohesively, highlighting major arguments, counterarguments, and any emerging consensus or unresolved issues within the community.'
prompts = [f'{s}\n\n{prompt}' for s in contents]
results = [generate(prompts[i], images[i], max_tokens=512, blind_model=False, quantize_model=True, quantize_cache=False, verbose=False) for i in range(len(prompts))]
with open(Path(filename).with_suffix('.json'), 'w') as f:
    json.dump({'prompts': prompts, 'images': images, 'results': results}, f, indent=4)
```
## Documentation
API references and additional information are available at:
https://josefalbers.github.io/Phi-3-Vision-MLX/
Also check out our tutorial series at:
https://medium.com/@albersj66
## License
This project is licensed under the [MIT License](LICENSE).
## Citation
<a href="https://zenodo.org/doi/10.5281/zenodo.11403221"><img src="https://zenodo.org/badge/806709541.svg" alt="DOI"></a>