llama-index-readers-semanticscholar


- Name: llama-index-readers-semanticscholar
- Version: 0.1.3
- Summary: llama-index readers semanticscholar integration
- Upload time: 2024-02-21 20:49:37
- Maintainer: shauryr
- Author: Your Name
- Requires Python: >=3.8.1,<4.0
- License: MIT
- Keywords: paper, research, scholar, semantic
- Requirements: No requirements were recorded.

# Semantic Scholar Loader

Welcome to Semantic Scholar Loader. This module retrieves scholarly articles and publications from the Semantic Scholar database for researchers and professionals.

Given a research topic, the loader runs a Semantic Scholar search and reads the relevant papers from the results into `Documents`.

See the demo notebook [demo_s2.ipynb](demo_s2.ipynb) for an example.

## Some preliminaries

- `query_space` : the broad area of research used to search Semantic Scholar
- `query_string` : a specific question to ask of the documents found in the query space

**UPDATE**:

To download the open-access PDFs and extract their text, set the `full_text` flag to `True`:

```python
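from llama_index.readers.semanticscholar import SemanticScholarReader
query_space, total_papers = "large language models", 10  # example values
s2reader = SemanticScholarReader()
documents = s2reader.load_data(query_space, total_papers, full_text=True)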
```
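
Not every paper has an open-access PDF; in the sample output further down, `openAccessPdf` is `None` for all three results. The following is a minimal sketch for checking which loaded papers carry a PDF link at all. It assumes each document's metadata is a dict with `title` and `openAccessPdf` keys, as in that sample output, and that any non-empty `openAccessPdf` value means a link was found. The helper name `split_by_open_access` and the second sample entry are hypothetical, and the sketch says nothing about how the loader itself treats papers without a PDF.

```python
def split_by_open_access(metadata_list):
    """Split paper titles by whether their metadata carries an open-access PDF value."""
    with_pdf, without_pdf = [], []
    for meta in metadata_list:
        # Assumption: a missing key, None, or empty value means no PDF link.
        target = with_pdf if meta.get("openAccessPdf") else without_pdf
        target.append(meta["title"])
    return with_pdf, without_pdf


# First entry mirrors the sample output; the second is invented for illustration.
sample = [
    {"title": "Talking About Large Language Models", "openAccessPdf": None},
    {
        "title": "A hypothetical open-access paper",
        "openAccessPdf": "https://example.org/paper.pdf",
    },
]

with_pdf, without_pdf = split_by_open_access(sample)
print("With PDF link:", with_pdf)
print("Without PDF link:", without_pdf)
```

With real results, the same helper could be called as `split_by_open_access([doc.metadata for doc in documents])`.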

## Usage

Here is an example of how to use this loader in `llama_index` and get citations for a given query.

### LlamaIndex

```python
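# Import paths assume llama-index >= 0.10 with llama-index-llms-openai installed.
from llama_index.llms.openai import OpenAI
from llama_index.core.query_engine import CitationQueryEngine
from llama_index.core import (
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.readers.semanticscholar import SemanticScholarReader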

s2reader = SemanticScholarReader()

# narrow down the search space
query_space = "large language models"

# increase limit to get more documents
documents = s2reader.load_data(query=query_space, limit=10)

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0)
)
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)

# query the index
response = query_engine.query("limitations of using large language models")
print("Answer: ", response)
print("Source nodes: ")
for node in response.source_nodes:
    print(node.node.metadata)
```

### Output

```text
Answer:  The limitations of using large language models include the struggle to learn long-tail knowledge [2], the need for scaling by many orders of magnitude to reach competitive performance on questions with little support in the pre-training data [2], and the difficulty in synthesizing complex programs from natural language descriptions [3].
Source nodes:
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '3eed4de25636ac90f39f6e1ef70e3507ed61a2a6', 'citationCount': 35, 'openAccessPdf': None, 'authors': ['M. Shanahan'], 'title': 'Talking About Large Language Models'}
{'venue': 'arXiv.org', 'year': 2022, 'paperId': '6491980820d9c255b9d798874c8fce696750e0d9', 'citationCount': 31, 'openAccessPdf': None, 'authors': ['Nikhil Kandpal', 'H. Deng', 'Adam Roberts', 'Eric Wallace', 'Colin Raffel'], 'title': 'Large Language Models Struggle to Learn Long-Tail Knowledge'}
{'venue': 'arXiv.org', 'year': 2021, 'paperId': 'a38e0f993e4805ba8a9beae4c275c91ffcec01df', 'citationCount': 305, 'openAccessPdf': None, 'authors': ['Jacob Austin', 'Augustus Odena', 'Maxwell Nye', 'Maarten Bosma', 'H. Michalewski', 'David Dohan', 'Ellen Jiang', 'Carrie J. Cai', 'Michael Terry', 'Quoc V. Le', 'Charles Sutton'], 'title': 'Program Synthesis with Large Language Models'}
```
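
In the output above, the markers `[2]` and `[3]` in the answer line up with the second and third source nodes. The sketch below turns that observation into a small reference list. It assumes the markers are 1-based positions in the printed source list, which matches this sample but is not confirmed here as a general guarantee. The helper names are hypothetical, the answer text is shortened, and the author list of the third paper is truncated.

```python
import re

# Shortened copy of the answer from the sample output.
ANSWER = (
    "The limitations of using large language models include the struggle to "
    "learn long-tail knowledge [2], the need for scaling by many orders of "
    "magnitude [2], and the difficulty in synthesizing complex programs [3]."
)

# Metadata taken from the sample output (only the fields used here).
SOURCES = [
    {"title": "Talking About Large Language Models", "year": 2022,
     "authors": ["M. Shanahan"]},
    {"title": "Large Language Models Struggle to Learn Long-Tail Knowledge",
     "year": 2022,
     "authors": ["Nikhil Kandpal", "H. Deng", "Adam Roberts",
                 "Eric Wallace", "Colin Raffel"]},
    {"title": "Program Synthesis with Large Language Models", "year": 2021,
     "authors": ["Jacob Austin", "Augustus Odena", "Maxwell Nye"]},  # truncated
]


def cited_indices(answer):
    """Return distinct citation numbers in order of first appearance."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def format_reference(number, meta):
    """Build a one-line reference from a metadata dict."""
    authors = meta["authors"]
    names = authors[0] if len(authors) == 1 else authors[0] + " et al."
    return f"[{number}] {names} ({meta['year']}). {meta['title']}"


def resolve_citations(answer, sources):
    """Map each citation marker to its source, skipping out-of-range markers."""
    return [
        format_reference(number, sources[number - 1])
        for number in cited_indices(answer)
        if 1 <= number <= len(sources)
    ]


references = resolve_citations(ANSWER, SOURCES)
for line in references:
    print(line)
```

With a live query, the inputs could come from `str(response)` and `[node.node.metadata for node in response.source_nodes]`.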

            

        