FlashRank


- Name: FlashRank
- Version: 0.2.9
- Home page: https://github.com/PrithivirajDamodaran/FlashRank
- Summary: Ultra lite & Super fast SoTA cross-encoder based re-ranking for your search & retrieval pipelines.
- Upload time: 2024-08-14 12:10:25
- Author: Prithivi Da
- Requires Python: >=3.6
- License: Apache 2.0
            
<img src=./images/logo.png width=100%>


[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11093524.svg)](https://doi.org/10.5281/zenodo.11093524)

## [IMPORTANT UPDATE]

~~*A clone library called **SwiftRank is pointing to our model buckets, we are working on an interim solution to avoid this stealing**. Thank you for your patience and understanding.*~~

This issue is resolved; the models are now hosted on Hugging Face (HF). **Please upgrade to continue:** `pip install -U flashrank`. Thank you for your patience and understanding.


# 🏎️ What is it?
Ultra-lite & super-fast Python library to add re-ranking to your existing search & retrieval pipelines. It is based on SoTA LLMs and cross-encoders, with gratitude to all the model owners.

Supports:

- Pairwise / Pointwise rerankers (cross-encoder based).
- Listwise rerankers (LLM based).

(See below for the full list of supported models.)

# Table of Contents  

1. [Features](#features)  
2. [Installation](#installation)
3. [Getting started](#getting-started)
4. [Deployment patterns](#deployment-patterns)
5. [How to Cite?](#how-to-cite)
6. [Papers citing flashrank](#papers-citing-flashrank)


## Features

1. ⚡ **Ultra-lite**: 
    - **No Torch or Transformers** needed. Runs on CPU.
    - Boasts the **tiniest reranking model in the world, ~4MB**.
    
2. ⏱️ **Super-fast**:
    - Rerank speed is a function of the **# of tokens in passages and query + model depth (layers)**.
    - To give an idea, the time taken by the example (in code) using the default model is shown below.
    - <center><img src="./images/time.png" width=600/></center>
    - Detailed benchmarking, TBD

3. 💸 **$ conscious**:
    - **Lowest $ per invocation:** Serverless deployments like Lambda are charged by memory & time per invocation.
    - **Smaller package size** = shorter cold start times, quicker re-deployments for Serverless.

4. 🎯 **Based on SoTA Cross-encoders and other models**:
    - How good are zero-shot rerankers? Look at the References section.
    - Below is the list of models supported as of now.
        * `ms-marco-TinyBERT-L-2-v2` (default) [Model card](https://huggingface.co/cross-encoder/ms-marco-TinyBERT-L-2)
        * `ms-marco-MiniLM-L-12-v2` [Model card](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2)
        * `rank-T5-flan` (Best non cross-encoder reranker) [Model card](https://huggingface.co/bergum/rank-T5-flan)
        * `ms-marco-MultiBERT-L-12`  (Multi-lingual, [supports 100+ languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages))
        * `ce-esci-MiniLM-L12-v2` [FT on Amazon ESCI dataset](https://github.com/amazon-science/esci-data) (This is interesting because most models are FT on MSFT MARCO Bing queries) [Model card](https://huggingface.co/metarank/ce-esci-MiniLM-L12-v2)
        * `rank_zephyr_7b_v1_full` (4-bit-quantised GGUF) [Model card](https://huggingface.co/castorini/rank_zephyr_7b_v1_full) (Offers very competitive performance, with a large context window, and is relatively fast for a 4GB model).
            - **Important note:** Our current integration of `rank_zephyr` supports a max of 20 passages in one pass. The sliding window logic support is yet to be added.
        * `miniReranker_arabic_v1` [Model card](https://huggingface.co/prithivida/miniReranker_arabic_v1)             
    - Models in roadmap:
        * InRanker
    - Why are sleeker models preferred? Reranking is the final leg of larger retrieval pipelines, so the idea is to avoid any extra overhead, especially in user-facing scenarios. To that end, models with a really small footprint that don't need any specialised hardware and yet offer competitive performance are chosen. Feel free to raise issues to add support for new models as you see fit.
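
Given the 20-passage limit noted for `rank_zephyr` above, one possible way to stay within it is to split the candidate list before reranking. The sketch below is only an illustration: `cap_passages` and `MAX_LISTWISE_PASSAGES` are hypothetical names, not part of FlashRank, and this is plain truncation, not the sliding window logic mentioned in the note.

```python
from typing import Any, Dict, List, Tuple

# Limit taken from the rank_zephyr note above; the constant name is hypothetical.
MAX_LISTWISE_PASSAGES = 20


def cap_passages(
    passages: List[Dict[str, Any]], limit: int = MAX_LISTWISE_PASSAGES
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split candidates into the part to rerank and the overflow left as-is."""
    return passages[:limit], passages[limit:]


# Hypothetical first-stage candidates, already ordered by the retriever.
candidates = [{"id": i, "text": f"passage {i}"} for i in range(25)]
to_rerank, overflow = cap_passages(candidates)
print(len(to_rerank), len(overflow))  # 20 5
```

Only `to_rerank` would then be passed to the listwise reranker; the overflow keeps its first-stage order and can be appended after the reranked results.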


## Installation:
#### If you need lightweight pairwise rerankers [default]
```bash
pip install flashrank
```

#### If you need LLM based listwise rerankers
```bash
# Quotes keep shells such as zsh from interpreting the square brackets.
pip install "flashrank[listwise]"
```


## Getting started:
```python
from flashrank import Ranker, RerankRequest

# Nano (~4MB), blazing fast model & competitive performance (ranking precision).
ranker = Ranker()

# Alternatively, uncomment ONE of the options below.

# Small (~34MB), slightly slower & best performance (ranking precision).
# ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="/opt")

# Medium (~110MB), slower model with best zeroshot performance (ranking precision) on out of domain data.
# ranker = Ranker(model_name="rank-T5-flan", cache_dir="/opt")

# Medium (~150MB), slower model with competitive performance (ranking precision) for 100+ languages (don't use for English).
# ranker = Ranker(model_name="ms-marco-MultiBERT-L-12", cache_dir="/opt")

# LLM based listwise reranker; adjust max_length based on your passage length.
# ranker = Ranker(model_name="rank_zephyr_7b_v1_full", max_length=1024)
```

```python
# Metadata is optional; "id" can be the DB ids from your retrieval stage or simple numeric indices.
query = "How to speedup LLMs?"
passages = [
   {
      "id":1,
      "text":"Introduce *lookahead decoding*: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.",
      "meta": {"additional": "info1"}
   },
   {
      "id":2,
      "text":"LLM inference efficiency will be one of the most crucial topics for both industry and academia, simply because the more efficient you are, the more $$$ you will save. vllm project is a must-read for this direction, and now they have just released the paper",
      "meta": {"additional": "info2"}
   },
   {
      "id":3,
      "text":"There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramatically reduces the model's memory footprint. - AutoGPTQ: Time-consuming but leads to a much smaller model and faster inference. The quantization is a one-time cost that pays off in the long run.",
      "meta": {"additional": "info3"}

   },
   {
      "id":4,
      "text":"Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.",
      "meta": {"additional": "info4"}
   },
   {
      "id":5,
      "text":"vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: State-of-the-art serving throughput Efficient management of attention key and value memory with PagedAttention Continuous batching of incoming requests Optimized CUDA kernels",
      "meta": {"additional": "info5"}
   }
]

rerankrequest = RerankRequest(query=query, passages=passages)
results = ranker.rerank(rerankrequest)
print(results)
```

```python 
# Reranked output from default reranker
[
   {
      "id":4,
      "text":"Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.",
      "meta":{
         "additional":"info4"
      },
      "score":0.016847236
   },
   {
      "id":5,
      "text":"vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: State-of-the-art serving throughput Efficient management of attention key and value memory with PagedAttention Continuous batching of incoming requests Optimized CUDA kernels",
      "meta":{
         "additional":"info5"
      },
      "score":0.011563735
   },
   {
      "id":3,
      "text":"There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramatically reduces the model's memory footprint. - AutoGPTQ: Time-consuming but leads to a much smaller model and faster inference. The quantization is a one-time cost that pays off in the long run.",
      "meta":{
         "additional":"info3"
      },
      "score":0.00081340264
   },
   {
      "id":1,
      "text":"Introduce *lookahead decoding*: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.",
      "meta":{
         "additional":"info1"
      },
      "score":0.00063596206
   },
   {
      "id":2,
      "text":"LLM inference efficiency will be one of the most crucial topics for both industry and academia, simply because the more efficient you are, the more $$$ you will save. vllm project is a must-read for this direction, and now they have just released the paper",
      "meta":{
         "additional":"info2"
      },
      "score":0.00024851
   }
]
```
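
The reranked output above is a plain list of dicts, each carrying a `score`, so trimming it before passing it downstream (for example into a RAG prompt) takes only a few lines. The helper below is a minimal sketch: `select_top`, `k` and `min_score` are hypothetical names, not part of FlashRank, and the sample reuses the ids and scores printed above with shortened texts.

```python
from typing import Any, Dict, List, Optional


def select_top(
    results: List[Dict[str, Any]], k: int = 3, min_score: Optional[float] = None
) -> List[Dict[str, Any]]:
    """Return up to k results, highest score first, optionally dropping low scores."""
    ordered = sorted(results, key=lambda r: r["score"], reverse=True)
    if min_score is not None:
        ordered = [r for r in ordered if r["score"] >= min_score]
    return ordered[:k]


# Ids and scores copied from the sample output above; texts are shortened placeholders.
sample = [
    {"id": 4, "text": "Medusa ...", "score": 0.016847236},
    {"id": 5, "text": "vLLM ...", "score": 0.011563735},
    {"id": 3, "text": "Llama 2 tips ...", "score": 0.00081340264},
    {"id": 1, "text": "lookahead decoding ...", "score": 0.00063596206},
    {"id": 2, "text": "vllm paper ...", "score": 0.00024851},
]

print([r["id"] for r in select_top(sample, k=3)])  # [4, 5, 3]
print([r["id"] for r in select_top(sample, k=5, min_score=0.001)])  # [4, 5]
```

Any score threshold depends on the model and the data, so a value like `0.001` is purely illustrative.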

## You can use it with any search & retrieval pipeline:

1. **Lexical Search (Regular DBs that support full-text search or an Inverted Index)**
  <center><img src="./images/lexical_search.png" width=600/></center>

<br/>

2. **Semantic Search / RAG use cases (VectorDBs)**
  <center><img src="./images/vector_search_rag.png" width=600/></center>
<br/>

3. **Hybrid Search**
  <center><img src="./images/hybrid_search.png" width=400/></center>

<br/>
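
For the hybrid case, the candidates from both retrievers have to be merged into the `id` / `text` / `meta` passage format shown in Getting started before they are reranked. The sketch below shows one way this could be done; `merge_hits` and the hit fields `doc_id` and `body` are assumptions made for illustration, since the real field names depend on your lexical and vector stores.

```python
from typing import Any, Dict, Iterable, List


def merge_hits(
    lexical_hits: Iterable[Dict[str, Any]], vector_hits: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Merge hits from two retrievers into passage dicts, deduplicated by id."""
    passages: Dict[Any, Dict[str, Any]] = {}
    for retriever, hits in (("lexical", lexical_hits), ("vector", vector_hits)):
        for hit in hits:
            doc_id = hit["doc_id"]  # hypothetical field name
            if doc_id in passages:
                continue  # keep the first occurrence only
            passages[doc_id] = {
                "id": doc_id,
                "text": hit["body"],  # hypothetical field name
                "meta": {"retriever": retriever},
            }
    return list(passages.values())


lexical = [{"doc_id": 1, "body": "alpha"}, {"doc_id": 2, "body": "beta"}]
vector = [{"doc_id": 2, "body": "beta"}, {"doc_id": 3, "body": "gamma"}]
passages = merge_hits(lexical, vector)
print([p["id"] for p in passages])  # [1, 2, 3]
```

The merged list can then be passed as `passages` to `RerankRequest`, leaving the final ordering to the reranker.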

## Deployment patterns
#### How to use it in an AWS Lambda function?
In AWS or other serverless environments the entire VM is read-only, so you might have to create your own custom directory. You can do so in your Dockerfile and use it for loading the models (and eventually as a cache between warm calls). Pass that directory at init time through the `cache_dir` parameter.

```python
ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="/opt")
```
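
To benefit from warm calls, the `Ranker` can be created once and reused across invocations. The handler below is a minimal sketch under stated assumptions, not an official deployment recipe: the event shape (`query`, `passages`), the environment variable names `FLASHRANK_MODEL` and `FLASHRANK_CACHE_DIR`, and the response format are all hypothetical. Only `Ranker`, `RerankRequest`, `rerank`, `model_name` and `cache_dir` come from the examples above.

```python
import json
import os

_ranker = None  # module-level cache, reused while the execution environment stays warm


def get_ranker():
    """Create the Ranker on first use and return the same instance afterwards."""
    global _ranker
    if _ranker is None:
        from flashrank import Ranker  # imported on first use

        _ranker = Ranker(
            model_name=os.environ.get("FLASHRANK_MODEL", "ms-marco-MiniLM-L-12-v2"),
            cache_dir=os.environ.get("FLASHRANK_CACHE_DIR", "/opt"),
        )
    return _ranker


def handler(event, context):
    """Rerank event["passages"] for event["query"] (hypothetical event shape)."""
    from flashrank import RerankRequest

    request = RerankRequest(query=event["query"], passages=event["passages"])
    results = get_ranker().rerank(request)
    # Cast scores to plain floats in case they are NumPy values, which json cannot serialise.
    body = [{"id": r["id"], "score": float(r["score"])} for r in results]
    return {"statusCode": 200, "body": json.dumps(body)}
```

Keeping the instance at module level means the model is loaded on the first invocation only; later invocations in the same warm environment skip that cost.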

## References:

1. **In-domain and Zeroshot performance of Cross Encoders fine-tuned on MS-MARCO**
  <center><img src="./images/CE_BEIR.png" width=600/></center>

<br/>

2. **In-domain and Zeroshot performance of RankT5 fine-tuned on MS-MARCO**
  <center><img src="./images/RankT5_BEIR.png" width=450/></center>
<br/>

## How to Cite?

To cite this repository in your work, please click the "cite this repository" link on the right side (below the repo description and tags).


## Papers citing flashrank

- [COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval](https://arxiv.org/pdf/2406.00638)

- [Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA](https://arxiv.org/pdf/2402.06549)

- [Stance and Hate Event Detection in Tweets Related to Climate Activism - Shared Task at CASE 2024](https://aclanthology.org/2024.case-1.33.pdf)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/PrithivirajDamodaran/FlashRank",
    "name": "FlashRank",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": null,
    "author": "Prithivi Da",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/0c/8c/4b44180d4be0f93bffe31db7229c727638994c74f04257f3844bca066b88/FlashRank-0.2.9.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Ultra lite & Super fast SoTA cross-encoder based re-ranking for your search & retrieval pipelines.",
    "version": "0.2.9",
    "project_urls": {
        "Homepage": "https://github.com/PrithivirajDamodaran/FlashRank"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aacc7e327a48452bc1a7c2f26686c14f8192a5dce9ef00e88f48599b64e3c6bd",
                "md5": "db5f12b4e6c3d02fa21112bcb7d6f784",
                "sha256": "4e43e0ccb95f143bb6eaf9bde74b9bd7159fd2161116eba4c0fa295def86156d"
            },
            "downloads": -1,
            "filename": "FlashRank-0.2.9-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "db5f12b4e6c3d02fa21112bcb7d6f784",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 19055,
            "upload_time": "2024-08-14T12:10:23",
            "upload_time_iso_8601": "2024-08-14T12:10:23.056663Z",
            "url": "https://files.pythonhosted.org/packages/aa/cc/7e327a48452bc1a7c2f26686c14f8192a5dce9ef00e88f48599b64e3c6bd/FlashRank-0.2.9-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c8c4b44180d4be0f93bffe31db7229c727638994c74f04257f3844bca066b88",
                "md5": "9614871f4065ebd8ba54a602f450b4ab",
                "sha256": "475f1192e0722da1a4409812165ebc7e3eccec56e7b7853ed9dd5dd5c9c985f5"
            },
            "downloads": -1,
            "filename": "FlashRank-0.2.9.tar.gz",
            "has_sig": false,
            "md5_digest": "9614871f4065ebd8ba54a602f450b4ab",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 17078,
            "upload_time": "2024-08-14T12:10:25",
            "upload_time_iso_8601": "2024-08-14T12:10:25.564423Z",
            "url": "https://files.pythonhosted.org/packages/0c/8c/4b44180d4be0f93bffe31db7229c727638994c74f04257f3844bca066b88/FlashRank-0.2.9.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-14 12:10:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "PrithivirajDamodaran",
    "github_project": "FlashRank",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "flashrank"
}
        