[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)
# 🌌 Starlight Vision 🚀
![Starlight](starlight.png)
🪐 Starlight Vision is a powerful multi-modal AI model designed to generate high-quality novel videos using text, images, or video clips as input. By leveraging state-of-the-art deep learning techniques, it can synthesize realistic and visually impressive video content that can be used in a variety of applications, such as movie production, advertising, virtual reality, and more. 🎥
## 🌟 Features
- 📝 Generate videos from text descriptions
- 🌃 Convert images into video sequences
- 📼 Extend existing video clips with novel content
- 🔮 High-quality output with customizable resolution
- 🧠 Easy-to-use API for quick integration
## 📦 Installation
To install Starlight Vision, simply use pip:
```bash
pip install starlight-vision
```
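To confirm the install succeeded, you can check that the package and its core classes import cleanly. This is a minimal sanity check; the class names below are the same ones used in the Quick Start:

```python
# Minimal sanity check: these imports should succeed after `pip install starlight-vision`.
from starlight_vision import Unet3D, ElucidatedStarlight, StarlightTrainer

print("starlight_vision imported successfully")
```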
## 🎬 Quick Start
Once the model has been trained, you can install Starlight Vision and start generating videos with the following code:
```python
import torch
from starlight_vision import Unet3D, ElucidatedStarlight, StarlightTrainer
unet1 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8)).cuda()
unet2 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8)).cuda()
# elucidated starlight, which contains the unets above (the base unet and the super-resolving ones)
starlight = ElucidatedStarlight(
unets = (unet1, unet2),
image_sizes = (16, 32),
random_crop_sizes = (None, 16),
temporal_downsample_factor = (2, 1), # in this example, the first unet would receive the video temporally downsampled by 2x
num_sample_steps = 10,
cond_drop_prob = 0.1,
sigma_min = 0.002, # min noise level
sigma_max = (80, 160), # max noise level, double the max noise level for upsampler
sigma_data = 0.5, # standard deviation of data distribution
rho = 7, # controls the sampling schedule
P_mean = -1.2, # mean of log-normal distribution from which noise is drawn for training
P_std = 1.2, # standard deviation of log-normal distribution from which noise is drawn for training
    S_churn = 80,                      # parameters for stochastic sampling - depends on dataset, Table 5 in the paper
S_tmin = 0.05,
S_tmax = 50,
S_noise = 1.003,
).cuda()
texts = [
'a whale breaching from afar',
'young girl blowing out candles on her birthday cake',
'fireworks with blue and green sparkles',
'dust motes swirling in the morning sunshine on the windowsill'
]
videos = torch.randn(4, 3, 10, 32, 32).cuda() # (batch, channels, time / video frames, height, width)
# feed videos into starlight, training each unet in the cascade
# for this example, only training unet 1
trainer = StarlightTrainer(starlight)
# you can also ignore time when first training on video, which the video DDPM paper showed
# improves results. eventually the 3d unet will be trainable on either images or video.
# research shows it is essential (with current data regimes) to train on text-to-image first,
# though that probably won't hold in another decade - all big data becomes small data
trainer(videos, texts = texts, unet_number = 1, ignore_time = False)
trainer.update(unet_number = 1)
videos = trainer.sample(texts = texts, video_frames = 20) # extrapolating to 20 frames from training on 10 frames
videos.shape # (4, 3, 20, 32, 32)
```
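The sampler returns a plain tensor, so you can post-process or save it with standard PyTorch tooling. Below is a minimal sketch (not part of the Starlight Vision API) that writes each sampled video to an MP4 file with `torchvision`, assuming the sampled values lie in `[0, 1]` and that torchvision is installed with a video backend such as PyAV:

```python
import torch
import torchvision

# Hypothetical frame rate; choose whatever suits your use case.
fps = 8

for i, video in enumerate(videos):          # video: (channels, frames, height, width)
    frames = video.permute(1, 2, 3, 0)      # -> (frames, height, width, channels)
    frames = (frames.clamp(0, 1) * 255).to(torch.uint8).cpu()
    torchvision.io.write_video(f"sample_{i}.mp4", frames, fps=fps)
```

For real use you would of course train far longer and on real video data rather than the random tensors used in the Quick Start above.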
## 🤝 Contributing
We welcome contributions from the community! If you'd like to contribute, please follow these steps:
1. 🍴 Fork the repository on GitHub
2. 🌱 Create a new branch for your feature or bugfix
3. 📝 Commit your changes and push the branch to your fork
4. 🚀 Create a pull request and describe your changes
## 📄 License
Starlight Vision is released under the Apache License. See the [LICENSE](LICENSE) file for more details.
## 🗺️ Roadmap
The following roadmap outlines our plans for future development and enhancements to Starlight Vision. We aim to achieve these milestones through a combination of research, development, and collaboration with the community.
### 🚀 Short-term Goals
- [ ] Improve text-to-video synthesis by incorporating advanced natural language understanding techniques
- [ ] Train on LAION-5B and video datasets
- [ ] Enhance the quality of generated videos through the implementation of state-of-the-art generative models
- [ ] Optimize the model for real-time video generation on various devices, including mobile phones and edge devices
- [ ] Develop a user-friendly web application that allows users to generate videos using Starlight Vision without any programming knowledge
- [ ] Create comprehensive documentation and tutorials to help users get started with Starlight Vision
### 🌌 Medium-term Goals
- [ ] Integrate advanced style transfer techniques to allow users to customize the visual style of generated videos
- [ ] Develop a plugin for popular video editing software (e.g., Adobe Premiere, Final Cut Pro) that enables users to utilize Starlight Vision within their existing workflows
- [ ] Enhance the model's ability to generate videos with multiple scenes and complex narratives
- [ ] Improve the model's understanding of object interactions and physics to generate more realistic videos
- [ ] Expand the supported input formats to include audio, 3D models, and other media types
### 🌠 Long-term Goals
- [ ] Enable users to control the generated video with more granular parameters, such as lighting, camera angles, and object placement
- [ ] Incorporate AI-driven video editing capabilities that automatically adjust the pacing, color grading, and transitions based on user preferences
- [ ] Develop an API for real-time video generation that can be integrated into virtual reality, augmented reality, and gaming applications
- [ ] Investigate methods for training Starlight Vision on custom datasets to generate domain-specific videos
- [ ] Foster a community of researchers, developers, and artists to collaborate on the continued development and exploration of Starlight Vision's capabilities
## Join Agora
Agora is advancing humanity with state-of-the-art AI models like Starlight. Join us and leave your mark on history!
https://discord.gg/sbYvXgqc
## 🙌 Acknowledgments
This project is inspired by state-of-the-art research in video synthesis, such as the paper *Structure and Content-Guided Video Synthesis with Diffusion Models*, and leverages the power of deep learning frameworks like PyTorch.
We would like to thank the researchers, developers, and contributors who have made this project possible. 💫