Name | wd-llm-caption |
Version | 0.1.4a0 |
home_page | None |
Summary | A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha, meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct, Mini-CPM V2.6 and Florence-2 models. |
upload_time | 2024-10-20 06:59:03 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.10 |
license | Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0) |
keywords | image caption, wd, llama 3.2 vision instruct, joy caption alpha, qwen2 vl instruct, mini-cpm v2.6, florence-2 |
VCS | |
bugtrack_url | |
requirements | numpy, opencv-python-headless, pillow, requests, tqdm |
# WD LLM Caption Cli
A Python-based CLI tool and a simple Gradio GUI for captioning images
with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [LLama3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.
<img alt="DEMO_GUI.png" src="DEMO/DEMO_GUI.png" width="700"/>
## Introduction
If you want to caption a training dataset for an image generation model (Stable Diffusion, Flux, Kolors or others),
this tool can generate captions as Danbooru-style tags or as natural language descriptions.
### New Changes:
2024.10.19: Added an option to save WD tags and LLM captions in one file (only supported in CLI mode or GUI batch mode).
2024.10.18: Added Joy-Caption Alpha One, Joy-Caption Alpha Two and Joy-Caption Alpha Two Llava support.
The GUI supports Joy formatted prompt inputs (only for Joy-Caption Alpha Two and Joy-Caption Alpha Two Llava).
2024.10.13: Added Florence-2 support.
The LLM now uses its own default generation parameters when `--llm_temperature` and `--llm_max_tokens` are 0.
2024.10.11: The GUI now uses Gradio 5. Added Mini-CPM V2.6 support.
2024.10.09: Built as a wheel; you can now install this repo from PyPI.
```shell
# Install torch based on your GPU driver, e.g.
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
# Install via pip from pypi
pip install wd-llm-caption
# For CUDA 11.8
pip install -U -r requirements_onnx_cu118.txt
# For CUDA 12.X
pip install -U -r requirements_onnx_cu12x.txt
# CLI
wd-llm-caption --data_path your_data_path
# GUI
wd-llm-caption-gui
```
2024.10.04: Added Qwen2 VL support.
2024.09.30: A simple GUI now runs through Gradio 😊
## Example
<img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="600" height="800"/>
### Standalone Inference
#### WD Tags
Use wd-eva02-large-tagger-v3
```text
1girl, solo, long hair, breasts, looking at viewer, smile, blue eyes, blonde hair, medium breasts, white hair, ass, looking back, blunt bangs, from behind, english text, lips, night, building, science fiction, city, railing, realistic, android, cityscape, joints, cyborg, robot joints, city lights, mechanical parts, cyberpunk
```
#### Joy Caption
Default LLama3.1 8B, no quantization
```text
This is a digitally rendered image, likely created using advanced CGI techniques, featuring a young woman with a slender, athletic build and long, straight platinum blonde hair with bangs. She has fair skin and a confident, slightly playful expression. She is dressed in a futuristic, form-fitting suit that combines sleek, metallic armor with organic-looking, glossy black panels. The suit accentuates her curvaceous figure, emphasizing her ample breasts and hourglass waist. She stands on a balcony with a red railing, overlooking a nighttime cityscape with a prominent, illuminated tower in the background. The city is bustling with lights from various buildings, creating a vibrant, urban atmosphere. The text at the top of the image reads "PUBLISHED ON 2024.07.30," followed by "AN AIGC WORK BY DUKG" and "GENERATED BY STABLE DIFFUSION." Below, there are smaller texts indicating the artist's name and the studio where the image was created. The overall style is high-tech and futuristic, with a blend of cyberpunk and anime aesthetics, highlighting the intersection of human and machine elements in a visually striking and provocative manner.
```
#### Llama-3.2-11B-Vision-Instruct
Default LLama3.2 Vision 11B Instruct, no quantization
```text
The image depicts a futuristic scene featuring a humanoid robot standing on a balcony overlooking a cityscape at night. The robot, with its sleek white body and long, straight blonde hair, is positioned in the foreground, gazing back over its shoulder. Its slender, elongated body is adorned with black accents, and it stands on a red railing, its hands resting on the edge.
In the background, a city skyline stretches out, illuminated by the soft glow of streetlights and building lights. The overall atmosphere is one of futuristic sophistication, with the robot's advanced design and the city's modern architecture creating a sense of cutting-edge technology and innovation.
The image also features several text elements, including "PUBLISH ON 2024.07.30" at the top, "AN AIGC WORK BY DukeG" in the center, and "GENERATED BY Stable Diffusion" and "TUNED BY Adobe Photoshop" at the bottom. These texts provide context and attribution for the image, suggesting that it is a product of artificial intelligence and image generation technology.
Overall, the image presents a captivating and thought-provoking vision of a futuristic world, where technology and humanity coexist in a harmonious balance.
```
#### Qwen2-VL-7B-Instruct
Default Qwen2 VL 7B Instruct, no quantization
```text
TThe image depicts a person wearing a futuristic, robotic outfit with a predominantly white and black color scheme. The outfit includes a high-tech, form-fitting design with mechanical elements visible on the arms and legs. The person is standing on a balcony or a high structure, with a cityscape in the the background, including illuminated buildings and a prominent tower. The lighting is dark, suggesting it is nighttime. The image has has text text "PUBLISH ON 2 30" and "AN AIGC WORK BY DukeG" along with credits for the Stable Diffusion and Adobe Photoshop.
```
#### Mini-CPM V2.6 7B
Default Mini-CPM V2.6 7B, no quantization
```text
The image depicts a humanoid robot with a human-like appearance, standing on a balcony railing at night. The robot has a sleek, white and black body with visible mechanical joints and components, suggesting advanced technology. Its pose is confident, with one hand resting on the railing and the other hanging by its side. The robot has long, straight, platinum blonde hair that falls over its shoulders. The background features a cityscape with illuminated buildings and a prominent tower, suggesting an urban setting. The lighting is dramatic, highlighting the robot against the darker backdrop of the night sky. The overall atmosphere is one of futuristic sophistication.
```
#### Florence 2 large
Default Florence 2 large, no quantization
```text
The image is a promotional poster for an AIGC work by DukeG. It features a young woman with long blonde hair, standing on a rooftop with a city skyline in the background. She is wearing a futuristic-looking outfit with a white and black color scheme. The outfit has a high neckline and long sleeves, and the woman is posing with one hand on her hip and the other resting on the railing. The text on the poster reads "Publish on 2024.07.30" and "Generated by Stable Diffusion" with the text "Tuned by Adobe Photoshop".
```
### WD+LLM Inference
#### Joy Caption with WD
Use wd-eva02-large-tagger-v3 and LLama3.1 8B, no quantization.
WD tags used in LLama3.1 user prompt.
```text
The image is a high-resolution photograph featuring a young woman with long, platinum blonde hair and blue eyes. She is dressed in a sleek, form-fitting white and black bodysuit that resembles a futuristic cyborg suit, with visible mechanical joints and metallic textures. Her physique is slender and toned, with a noticeable emphasis on her hips and buttocks. She is standing on a red railing, with a cityscape in the background, including a prominent tower with a red antenna. The night sky is filled with twinkling city lights, creating a vibrant, cyberpunk atmosphere. The text at the top reads "PUBLISH ON 2024.07.30" and "An IG work by DukeG" at the bottom. The overall style is realistic, with a focus on modern, high-tech aesthetics.
```
#### Llama Caption with WD
Use wd-eva02-large-tagger-v3 and LLama3.2 Vision 11B Instruct, no quantization.
WD tags used in LLama3.2 Vision 11B Instruct user prompt.
```text
The image depicts a futuristic cityscape at night, with a striking white-haired woman standing in the foreground. She is dressed in a sleek white bodysuit, accentuating her slender figure and medium-sized breasts. Her long, straight hair cascades down her back, framing her face and complementing her bright blue eyes. A subtle smile plays on her lips as she gazes directly at the viewer, her expression both inviting and enigmatic.
The woman's attire is a testament to her cyberpunk aesthetic, with visible mechanical parts and joints that suggest a fusion of human and machine. Her android-like appearance is further emphasized by her robotic limbs, which seem to blend seamlessly with her organic form. The railing behind her provides a sense of depth and context, while the cityscape in the background is a vibrant tapestry of lights and skyscrapers.
In the distance, a prominent building stands out, its sleek design and towering height a testament to the city's modernity. The night sky above is a deep, inky black, punctuated only by the soft glow of city lights that cast a warm, golden hue over the scene. The overall atmosphere is one of futuristic sophistication, with the woman's striking appearance and the city's bustling energy combining to create a truly captivating image.
```
#### Qwen2 VL 7B Instruct Caption with WD
Use wd-eva02-large-tagger-v3 and Qwen2 VL 7B Instruct, no quantization.
WD tags used in Qwen2 VL 7B Instruct user prompt.
```text
The image depicts a person with long hair, wearing a futuristic, robotic outfit. The outfit is predominantly white with black accents, featuring mechanical joints and parts that resemble those of a cyborg or android. The person is standing on a railing, looking back over their shoulder with a smile, and has is wearing a blue dress. The background shows a cityscape at night with tall buildings and city lights, creating a cyberpunk atmosphere. The text on the the image includes the following information: "PUBLISH ON 2024.07.30," "AN AIGC WORK BY DukeG," "GENERATED BY Stable Diffusion," and "TUNED BY Adobe Photoshop.
```
#### Mini-CPM V2.6 7B Caption with WD
Use wd-eva02-large-tagger-v3 and Mini-CPM V2.6 7B, no quantization.
WD tags used in Mini-CPM V2.6 7B user prompt.
```text
The image features a solo female character with long blonde hair and blue eyes. She is wearing a revealing outfit that accentuates her medium-sized breasts and prominent buttocks. Her expression is one of a subtle smile, and she is looking directly at the viewer. The is a realistic portrayal of an android or cyborg, with mechanical parts visible in her joints and a sleek design that blends human and machine aesthetics. The background depicts a cityscape at night, illuminated by city lights, and the character is positioned near a railing, suggesting she is on a high vantage point, possibly a balcony or rooftop. The overall atmosphere of the image is cyberpunk, with a blend of futuristic technology and urban environment.
```
## Model source
Hugging Face hosts the original sources; the ModelScope copies are pure forks of them (provided because Hugging Face is
blocked in some places).
### WD Caption models
| Model | Hugging Face Link | ModelScope Link |
|:----------------------------:|:-------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------:|
| wd-eva02-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-eva02-large-tagger-v3) |
| wd-vit-large-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-large-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-large-tagger-v3) |
| wd-swinv2-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-swinv2-tagger-v3) |
| wd-vit-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-vit-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-vit-tagger-v3) |
| wd-convnext-tagger-v3 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-convnext-tagger-v3) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-convnext-tagger-v3) |
| wd-v1-4-moat-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-moat-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-moat-tagger-v2) |
| wd-v1-4-swinv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-swinv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-swinv2-tagger-v2) |
| wd-v1-4-convnextv2-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnextv2-tagger-v2) |
| wd-v1-4-vit-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger-v2) |
| wd-v1-4-convnext-tagger-v2 | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger-v2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger-v2) |
| wd-v1-4-vit-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-vit-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-vit-tagger) |
| wd-v1-4-convnext-tagger | [Hugging Face](https://huggingface.co/SmilingWolf/wd-v1-4-convnext-tagger) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/wd-v1-4-convnext-tagger) |
| Z3D-E621-Convnext | [Hugging Face](https://huggingface.co/toynya/Z3D-E621-Convnext) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Z3D-E621-Convnext) |
### Joy Caption models
| Model | Hugging Face Link | ModelScope Link |
|:----------------------------------:|:-------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|
| joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) |
| Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) |
| Joy-Caption-Alpha-Two | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) |
| Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) |
| siglip-so400m-patch14-384(Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) |
| Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) |
| unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) |
| Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) |
### Llama 3.2 Vision Instruct models
| Model | Hugging Face Link | ModelScope Link |
|:-------------------------------:|:----------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|
| Llama-3.2-11B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11B-Vision-Instruct) |
| Llama-3.2-90B-Vision-Instruct | [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-90B-Vision-Instruct) |
| Llama-3.2-11b-vision-uncensored | [Hugging Face](https://huggingface.co/Guilherme34/Llama-3.2-11b-vision-uncensored) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.2-11b-vision-uncensored) |
### Qwen2 VL Instruct models
| Model | Hugging Face Link | ModelScope Link |
|:---------------------:|:-----------------------------------------------------------------:|:-------------------------------------------------------------------------:|
| Qwen2-VL-7B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-7B-Instruct) |
| Qwen2-VL-72B-Instruct | [Hugging Face](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) | [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct) |
### MiniCPM-V-2_6 models
| Model | Hugging Face Link | ModelScope Link |
|:-------------:|:------------------------------------------------------------:|:--------------------------------------------------------------------:|
| MiniCPM-V-2_6 | [Hugging Face](https://huggingface.co/openbmb/MiniCPM-V-2_6) | [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
### Florence-2 models
| Model | Hugging Face Link | ModelScope Link |
|:-------------------:|:--------------------------------------------------------------------:|:---------------------------------------------------------------------------------:|
| Florence-2-large | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large) |
| Florence-2-base | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base) |
| Florence-2-large-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft) |
| Florence-2-base-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft) |
## Installation
Python 3.10 works fine.
Open a shell terminal and follow the steps below:
```shell
# Clone this repo
git clone https://github.com/fireicewolf/wd-llm-caption-cli.git
cd wd-llm-caption-cli
# Create a Python venv
python -m venv .venv
# Activate it (Windows; on Linux or macOS use: source .venv/bin/activate)
.\.venv\Scripts\activate
# Install torch
# Install torch based on your GPU driver, e.g.
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
# Base dependencies. Models for inference are downloaded via the Python requests library.
# For WD Caption
pip install -U -r requirements_wd.txt
# If you want to load WD models with the GPU.
# For CUDA 11.8
pip install -U -r requirements_onnx_cu118.txt
# For CUDA 12.X
pip install -U -r requirements_onnx_cu12x.txt
# For Joy Caption or Llama 3.2 Vision Instruct or Qwen2 VL Instruct
pip install -U -r requirements_llm.txt
# If you want to download or cache model via huggingface hub, install this.
pip install -U -r requirements_huggingface.txt
# If you want to download or cache model via modelscope hub, install this.
pip install -U -r requirements_modelscope.txt
# If you want to use GUI, install this.
pip install -U -r requirements_gui.txt
```
## GUI Usage
```shell
python gui.py
```
### GUI options
`--theme`
set gradio theme [`base`, `ocean`, `origin`], default is `base`.
`--port`
gradio webui port, default is `8282`
`--listen`
allow gradio remote connections
`--share`
allow gradio share
`--inbrowser`
auto open in browser
`--log_level`
set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`],
default is `INFO`
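
As a rough illustration of how the options above fit together, the following sketch builds a stand-alone `argparse` parser with the documented choices and defaults. It is not the actual parser from `gui.py`: the function name `build_gui_parser` is hypothetical, and treating `--listen`, `--share` and `--inbrowser` as simple on/off switches is an assumption.

```python
import argparse


def build_gui_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented GUI options (not the real gui.py code)."""
    parser = argparse.ArgumentParser(description="GUI options (illustrative sketch)")
    parser.add_argument("--theme", choices=["base", "ocean", "origin"], default="base",
                        help="gradio theme")
    parser.add_argument("--port", type=int, default=8282, help="gradio webui port")
    # Assumption: the three options below are boolean switches.
    parser.add_argument("--listen", action="store_true", help="allow remote connections")
    parser.add_argument("--share", action="store_true", help="allow gradio share")
    parser.add_argument("--inbrowser", action="store_true", help="auto open in browser")
    parser.add_argument("--log_level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="log level")
    return parser


# Equivalent to a command line such as: python gui.py --port 7860 --listen
args = build_gui_parser().parse_args(["--port", "7860", "--listen"])
print(args.theme, args.port, args.listen)  # base 7860 True
```

Options that are not passed keep the defaults listed above, so a plain `python gui.py` would correspond to theme `base`, port `8282` and log level `INFO`.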
## CLI Simple Usage
By default, both WD and LLM models are used to caption images.
Llama-3.2-11B-Vision-Instruct is a gated model on Hugging Face.
Joy Caption uses Meta Llama 3.1 8B, which is also a gated model on Hugging Face,
so you need to request access on Hugging Face first.
Then set the `HF_TOKEN` environment variable.
Windows Powershell
```shell
$Env:HF_TOKEN="yourhftoken"
```
Windows CMD
```shell
set HF_TOKEN=yourhftoken
```
Mac or Linux shell
```shell
export HF_TOKEN="yourhftoken"
```
In python script
```python
import os
os.environ["HF_TOKEN"] = "yourhftoken"
```
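
Before starting a long captioning run with gated models, it can help to confirm that the token is actually visible to Python. The check below is a minimal sketch; the helper name `hf_token_is_set` is hypothetical and is not part of this tool.

```python
import os


def hf_token_is_set(env=None) -> bool:
    """Return True if HF_TOKEN is present and non-empty in the given environment mapping."""
    env = os.environ if env is None else env
    return bool(env.get("HF_TOKEN", "").strip())


if not hf_token_is_set():
    # Gated models such as Llama 3.2 Vision Instruct need an access token to download.
    print("HF_TOKEN is not set; gated models may fail to download.")
```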
__Make sure your python venv has been activated first!__
```shell
python caption.py --data_path your_datasets_path
```
To run with more options, print the built-in help with the command below or see [Options](#options).
```shell
python caption.py -h
```
### <span id="options">Options</span>
<details>
<summary>Advanced options</summary>
`--data_path`
path to your dataset.
`--recursive`
include all supported image formats in the input dataset path and its sub-paths.
`--log_level`
set log level [`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`], default is `INFO`
`--save_logs`
save a log file.
Logs are saved at the same level as `data_path`.
e.g., if your input `data_path` is `/home/mydatasets`, the log is saved in `/home/` and named
`mydatasets_xxxxxxxxx.log` (x stands for the log creation date).
`--model_site`
download models from a model site, `huggingface` or `modelscope`, default is `huggingface`.
`--models_save_path`
path to save models, default is `models` (under wd-joy-caption-cli).
`--use_sdk_cache`
use the SDK's cache dir to store models. If this option is enabled, `--models_save_path` is ignored.
`--download_method`
download models via SDK or URL, default is `SDK` (if the download via SDK fails, it retries via URL automatically).
`--force_download`
force download even if the file exists.
`--skip_download`
skip download if the file exists.
`--caption_method`
caption method [`wd`, `llm`, `wd+llm`],
selects WD models, LLM models, or both for captioning, default is `wd+llm`.
`--run_method`
running method for WD+LLM captioning [`sync`, `queue`], requires `caption_method` set to `wd+llm`.
if `sync`, each image is captioned with the WD model,
then captioned with the LLM, with the WD tags included in the LLM user prompt.
if `queue`, all images are captioned with the WD model first,
then all of them are captioned with the LLM, with the WD tags included in the LLM user prompt.
default is `sync`.
`--caption_extension`
extension of caption file, default is `.txt`.
If `caption_method` is not `wd+llm`, this is the extension of the WD or LLM caption file.
`--save_caption_together`
Save WD tags and LLM captions in one file.
`--save_caption_together_seperator`
Separator between WD and LLM captions when they are saved in one file.
`--image_size`
resize image to a suitable size, default is `1024`.
`--not_overwrite`
do not overwrite the caption file if it exists.
`--custom_caption_save_path`
custom caption file save path.
`--wd_config`
config json for wd tagger models, default is `default_wd.json`
`--wd_model_name`
name of the wd tagger model used for caption inference, default is `wd-swinv2-v3`.
`--wd_force_use_cpu`
force CPU for wd model inference.
`--wd_caption_extension`
extension for wd caption files when `caption_method` is `wd+llm`, default is `.wdcaption`.
`--wd_remove_underscore`
replace underscores with spaces in the output tags.
e.g., `hold_in_hands` will be `hold in hands`.
`--wd_undesired_tags`
comma-separated list of undesired tags to remove from the wd captions.
`--wd_tags_frequency`
Show frequency of tags for images.
`--wd_threshold`
threshold of confidence to add a tag, default value is `0.35`.
`--wd_general_threshold`
threshold of confidence to add a tag from the general category, same as `--wd_threshold` if omitted.
`--wd_character_threshold`
threshold of confidence to add a tag from the character category, same as `--wd_threshold` if omitted.
`--wd_add_rating_tags_to_first`
Add rating tags at the beginning.
`--wd_add_rating_tags_to_last`
Add rating tags at the end.
`--wd_character_tags_first`
Always put character tags before the general tags.
`--wd_always_first_tags`
comma-separated list of tags to always put at the beginning, e.g. `1girl,solo`
`--wd_caption_separator`
Separator for captions (include a space if needed), default is `, `.
`--wd_tag_replacement`
tag replacement in the format of `source1,target1;source2,target2; ...`.
Escape `,` and `;` with `\\`. e.g. `tag1,tag2;tag3,tag4`
`--wd_character_tag_expand`
expand tag tail parenthesis to another tag for character tags.
e.g., `character_name_(series)` will be expanded to `character_name, series`.
`--llm_choice`
select LLM model family [`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.
`--llm_config`
config json for LLM caption models, default is `default_llama_3.2V.json`
`--llm_model_name`
model name for inference, default is `Llama-3.2-11B-Vision-Instruct`
`--llm_patch`
patch the LLM with an uncensoring LoRA, currently only supported for `Llama-3.2-11B-Vision-Instruct`
`--llm_use_cpu`
load LLM models on CPU.
`--llm_llm_dtype`
dtype used to load the LLM [`fp16`, `bf16`, `fp32`], default is `fp16`.
`--llm_llm_qnt`
enable quantization for the LLM [`none`, `4bit`, `8bit`], default is `none`.
`--llm_caption_extension`
extension of the LLM caption file, default is `.llmcaption`
`--llm_read_wd_caption`
the LLM reads WD captions for inference. Only takes effect when `caption_method` is `llm`.
`--llm_caption_without_wd`
the LLM does not read WD captions for inference. Only takes effect when `caption_method` is `wd+llm`.
`--llm_user_prompt`
user prompt for caption.
`--llm_temperature`
temperature for the LLM, default is `0`, which means the LLM's own default value is used.
`--llm_max_tokens`
max tokens for the LLM output, default is `0`, which means the LLM's own default value is used.
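The tag clean-up options (`--wd_tag_replacement`, `--wd_character_tag_expand`, `--wd_remove_underscore`, `--wd_caption_separator`) can be pictured with the small sketch below. It is an illustration of the documented behaviour, not the tool's actual code: all function names are hypothetical, escape sequences in the replacement spec are not handled, the order of the steps is an assumption, and the expansion is applied to any tag with a trailing parenthesis rather than to character tags only.

```python
def parse_tag_replacement(spec):
    """Parse 'source1,target1;source2,target2' into a dict (escapes not handled)."""
    pairs = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        source, target = item.split(",", 1)
        pairs[source.strip()] = target.strip()
    return pairs


def expand_character_tag(tag):
    """Turn 'character_name_(series)' into ['character_name', 'series']."""
    if tag.endswith(")") and "_(" in tag:
        name, series = tag[:-1].rsplit("_(", 1)
        return [name, series]
    return [tag]


def postprocess_tags(tags, replacement_spec="", remove_underscore=False,
                     expand_character=False, separator=", "):
    """Apply replacement, expansion and underscore removal, then join the tags."""
    replacements = parse_tag_replacement(replacement_spec)
    result = []
    for tag in tags:
        tag = replacements.get(tag, tag)
        parts = expand_character_tag(tag) if expand_character else [tag]
        for part in parts:
            if remove_underscore:
                part = part.replace("_", " ")
            if part not in result:  # skip duplicates created by expansion
                result.append(part)
    return separator.join(result)
```

For instance, `postprocess_tags(["hold_in_hands"], remove_underscore=True)` gives `hold in hands`, matching the `--wd_remove_underscore` example above.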
</details>
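How the threshold and ordering options for WD tags interact (`--wd_threshold`, `--wd_general_threshold`, `--wd_character_threshold`, `--wd_undesired_tags`, `--wd_always_first_tags`, `--wd_character_tags_first`) can be sketched as follows. This is a simplified model written for this README, not the tool's implementation: `select_tags` and its input format are hypothetical, a tag is assumed to be kept when its confidence is greater than or equal to the threshold, and the default ordering (general tags before character tags) is an assumption.

```python
def select_tags(scores, threshold=0.35, general_threshold=None,
                character_threshold=None, undesired=(), always_first=(),
                character_first=False):
    """Filter and order tags.

    scores maps tag -> (category, confidence), where category is
    'general' or 'character' (hypothetical input format).
    """
    # Category thresholds fall back to the common threshold when omitted.
    if general_threshold is None:
        general_threshold = threshold
    if character_threshold is None:
        character_threshold = threshold
    limits = {"general": general_threshold, "character": character_threshold}

    kept = {"general": [], "character": []}
    for tag, (category, confidence) in scores.items():
        if tag in undesired or confidence < limits[category]:
            continue
        kept[category].append(tag)

    if character_first:
        ordered = kept["character"] + kept["general"]
    else:
        ordered = kept["general"] + kept["character"]

    # Move the always-first tags that survived filtering to the front.
    head = [tag for tag in always_first if tag in ordered]
    return head + [tag for tag in ordered if tag not in head]
```

The fallback in the first lines mirrors the "same as the common threshold if omitted" wording of the two category options.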
## Credits
Based on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [microsoft/florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).
Without their work (👏👏), this repo would not exist.
Raw data
{
"_id": null,
"home_page": null,
"name": "wd-llm-caption",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "Image Caption, WD, Llama 3.2 Vision Instruct, Joy Caption Alpha, Qwen2 VL Instruct, Mini-CPM V2.6, Florence-2",
"author": null,
"author_email": "DukeG <fireicewolf@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/f3/12/27f784d31bf070ef3de9d655bf055d87b6b835263e0220f6867c28c1d3b3/wd_llm_caption-0.1.4a0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. \"License\" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. \"Licensor\" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. \"Legal Entity\" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, \"control\" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. \"You\" (or \"Your\") shall mean an individual or Legal Entity exercising permissions granted by this License. \"Source\" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. \"Object\" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. \"Work\" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). \"Derivative Works\" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. \"Contribution\" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, \"submitted\" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as \"Not a Contribution.\" \"Contributor\" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a \"NOTICE\" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets \"[]\" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same \"printed page\" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ",
"summary": "A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha, meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct, Mini-CPM V2.6 and Florence-2 models.",
"version": "0.1.4a0",
"project_urls": {
"Homepage": "https://github.com/fireicewolf/wd-llm-caption-cli",
"Issues": "https://github.com/fireicewolf/wd-llm-caption-cli/issues"
},
"split_keywords": [
"image caption",
" wd",
" llama 3.2 vision instruct",
" joy caption alpha",
" qwen2 vl instruct",
" mini-cpm v2.6",
" florence-2"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "89dff44280e3ac848863a6f064ee76eeb91bca6a68cd3e4eeef08d91a3bc8797",
"md5": "25628e39d9659e1eba3de2633b11d70d",
"sha256": "40c0b2629b543c6a571c8d645c5eed11e2964a65764670cdcaf2cdd21a93c50e"
},
"downloads": -1,
"filename": "wd_llm_caption-0.1.4a0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "25628e39d9659e1eba3de2633b11d70d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 57433,
"upload_time": "2024-10-20T06:59:00",
"upload_time_iso_8601": "2024-10-20T06:59:00.009813Z",
"url": "https://files.pythonhosted.org/packages/89/df/f44280e3ac848863a6f064ee76eeb91bca6a68cd3e4eeef08d91a3bc8797/wd_llm_caption-0.1.4a0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f31227f784d31bf070ef3de9d655bf055d87b6b835263e0220f6867c28c1d3b3",
"md5": "f88d739327523a39c6ad9d2d0b3f24ed",
"sha256": "5d2c6b9b634be1f04a6e8ad34c1ebb139e237d559d512583bfbcff1a68218c0b"
},
"downloads": -1,
"filename": "wd_llm_caption-0.1.4a0.tar.gz",
"has_sig": false,
"md5_digest": "f88d739327523a39c6ad9d2d0b3f24ed",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 64263,
"upload_time": "2024-10-20T06:59:03",
"upload_time_iso_8601": "2024-10-20T06:59:03.840687Z",
"url": "https://files.pythonhosted.org/packages/f3/12/27f784d31bf070ef3de9d655bf055d87b6b835263e0220f6867c28c1d3b3/wd_llm_caption-0.1.4a0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-20 06:59:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fireicewolf",
"github_project": "wd-llm-caption-cli",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "numpy",
"specs": [
[
"<",
"2.0.0"
],
[
">=",
"1.26.4"
]
]
},
{
"name": "opencv-python-headless",
"specs": [
[
"==",
"4.10.0.84"
]
]
},
{
"name": "pillow",
"specs": [
[
">=",
"10.4.0"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.3"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.5"
]
]
}
],
"lcname": "wd-llm-caption"
}