**Namo (纳摩): Nano Multi-Modal Training Framework**
Introducing **namo**. Namo is a dead-simple multi-modal training framework focused on training **small MLLMs**. While more and more MLLMs are being open-sourced, small multi-modal LLMs with strong abilities remain largely unexplored. We built this framework for anyone who wants to train their own MLLM from scratch rather than **finetuning** an existing one. Anyone can now train a **base MLLM** with ease. The more data you use when training the base model, the more capability you gain when scaling up to larger models.
**Namo** is not only a framework but also a record of our experience training MLLMs. We first make an MLLM work on small models; the same components (ViT, AudioEncoder, etc.) can then be easily reused in larger LLMs, greatly reducing overall training time and resources.
Our model not only shows excellent performance compared with other small VLMs, but also supports a wide range of downstream tasks. Its main advantages are:
- **dynamic input**: the namo model uses dynamic input and supports resolutions from 224 to 1080;
- **fewer tokens**: namo models need only 576 visual tokens even at 800 input resolution, far more efficient than other VLMs;
- **flexible**: unlike other VLMs that are coupled to a specific LLM, we use ViTs you can grab from open source, together with any LLM, so you can train your own version with any LLM (see the sketch below this list);
- **audio**: we support visual + audio + text at the same time, unlike some other models that only support audio + text;
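Below is a minimal sketch of the architecture the list above describes: any open-source ViT feeding a small LLM through a projector, with the visual token budget fixed at 576 regardless of input resolution. This is not namo's actual API (which is undocumented at v0.0.1); the class name, pooling strategy, and dimensions are illustrative assumptions only.

```python
# Hypothetical sketch, not namo's real interface: keep 576 visual tokens from a
# ViT run at any resolution, then project them into the LLM embedding space.
import torch
import torch.nn as nn


class VisionProjector(nn.Module):
    """Pools a variable-size grid of ViT patch features down to a fixed
    24x24 = 576 visual tokens, then maps them to the LLM hidden size."""

    def __init__(self, vit_dim: int, llm_dim: int, num_tokens: int = 576):
        super().__init__()
        side = int(num_tokens ** 0.5)            # 24 when num_tokens == 576
        self.pool = nn.AdaptiveAvgPool2d(side)   # collapses any HxW patch grid to 24x24
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor, grid_hw: tuple) -> torch.Tensor:
        # patch_feats: (batch, H*W, vit_dim) from a ViT run at any input resolution
        b, _, c = patch_feats.shape
        h, w = grid_hw
        grid = patch_feats.transpose(1, 2).reshape(b, c, h, w)
        pooled = self.pool(grid)                      # (b, vit_dim, 24, 24)
        tokens = pooled.flatten(2).transpose(1, 2)    # (b, 576, vit_dim)
        return self.proj(tokens)                      # (b, 576, llm_dim)


# Usage: features from e.g. a 784x784 image (56x56 patches at patch size 14)
# still come out as 576 tokens, ready to be concatenated with text embeddings.
projector = VisionProjector(vit_dim=1024, llm_dim=2048)
fake_vit_out = torch.randn(1, 56 * 56, 1024)
visual_tokens = projector(fake_vit_out, grid_hw=(56, 56))
print(visual_tokens.shape)  # torch.Size([1, 576, 2048])
```

Because the projector is the only piece tied to the vision encoder's output width, swapping in a different open-source ViT or a different LLM only changes `vit_dim` and `llm_dim`, which is the kind of decoupling the list above refers to.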
Overall, **namo** is not only a series of models but also a reusable training framework. We hope our work can push the area further.
Raw data
{
"_id": null,
"home_page": "https://ccc.cc/a",
"name": "namo",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "deep learning, LLM, VLM, namo multi-modal training, framework",
"author": "lucasjin",
"author_email": "aa@qq.com",
"download_url": "https://files.pythonhosted.org/packages/84/cd/2a279a7a0dcbcada331aed14d835dcf58538507f9b304063cd9550fa4366/namo-0.0.1.tar.gz",
"platform": "any",
"description": "**Namo (\u7eb3\u6469): Na**no Multi-**Modal Training Framework**\n\nIntroducing **namo**. Namo is a dead simple multi-modal training framework focusing on training **small MLLMs**. As more and more MLLM opensource, while small multi-modal LLMs with more powerful abilities remain untouched. Hence, we crafted this framework for anyone who wants training their own MLLM without **finetuning** on existing one. Anyone can training a **base MLLM model** with ease now. The model data you have used in training base, the more ability you will get in training larger models.\n\n**Namo** not only a framework, but also provided our experiences in training MLLMs, we make easily make MLLM work on small models, then the same component (such as ViT, AudioEncoder etc) can be easily adopt into larger LLMs, largely reduced overall training time and resources.\n\nOur model not only showed excellent performance compare with other small vlms, but also support a wide range of downstream tasks. To highlight the advantages of our model, here is:\n\n- **dynamic input**: namo model uses dynamic input, supports input ranges from 224 to 1080;\n- **less token**: nano models only needs 576 tokens even with 800 input resolution, largely efficient than other vlms;\n- **flexibal**: unlike other vlms coupled with their LLMs, we using ViT you can grab in opensource as well as LLMs, so that you can train any version by your own by any LLMs;\n- **audio**: we supports do visual + audio + text at the same time, not like some other models only supports audio + text;\n\nOverall, **namo** is not only a series of model, but also a set of revealable training framework. We hoping our work can push the area further.\n",
"bugtrack_url": null,
"license": "GPL-3.0",
"summary": "namo is a nano level multi-modal training framework",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://ccc.cc/a"
},
"split_keywords": [
"deep learning",
" llm",
" vlm",
" namo multi-modal training",
" framework"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "84cd2a279a7a0dcbcada331aed14d835dcf58538507f9b304063cd9550fa4366",
"md5": "61e89feaf577d298946b74c6121584b2",
"sha256": "bc123b4eb187cf236859fb81958dc52c25c6f00e822518a953b7257ed07c9089"
},
"downloads": -1,
"filename": "namo-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "61e89feaf577d298946b74c6121584b2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 2489,
"upload_time": "2025-01-25T10:52:37",
"upload_time_iso_8601": "2025-01-25T10:52:37.590940Z",
"url": "https://files.pythonhosted.org/packages/84/cd/2a279a7a0dcbcada331aed14d835dcf58538507f9b304063cd9550fa4366/namo-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-25 10:52:37",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "namo"
}