arxiv-export-documents


- Name: arxiv-export-documents
- Version: 0.1.8
- Summary: Export arxiv papers to pdf formats
- Upload time: 2025-08-01 16:44:35
- Requires Python: >=3.12
- Keywords: arxiv, export, papers
- Requirements: aiohappyeyeballs, aiohttp, aiosignal, annotated-types, anyio, attrs, build, certifi, charset-normalizer, dataclasses-json, frozenlist, greenlet, h11, httpcore, httpx, httpx-sse, idna, jsonpatch, jsonpointer, langchain, langchain-community, langchain-core, langchain-text-splitters, langsmith, marshmallow, multidict, mypy_extensions, numpy, orjson, packaging, propcache, pydantic, pydantic-settings, pydantic_core, pypdf, pyproject_hooks, python-dotenv, PyYAML, requests, requests-toolbelt, sniffio, SQLAlchemy, tenacity, typing-inspect, typing-inspection, typing_extensions, urllib3, yarl, zstandard
# Arxiv Export

**Arxiv Export** is a Python library that allows you to search, download, and manage scientific articles from [arXiv.org](https://arxiv.org/). It is useful for automating paper downloads and obtaining structured information about articles.

## Installation

```bash
pip install arxiv-export-documents
```

## Usage Example

```python
import asyncio
from arxiv_export_documents import export_papers


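async def main():
    search_query = "quantum computing"  # keywords to search for on arXiv
    download_path = "./arxiv_papers"    # where the PDFs are saved
    max_results = 5                     # maximum number of papers to download

    # Results are consumed one paper at a time with `async for`.
    async for paper in export_papers(
        search=search_query,
        path_download=download_path,
        max_results=max_results
    ):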
        print(f"Downloaded paper: {paper.title}")
        print(f"Authors: {', '.join(paper.authors)}")
        print(f"Summary: {paper.summary}")
        print(f"Link: {paper.link}")
        print(f"Path: {paper.path}")
        print(f"Documents: {len(paper.documents)}")
        print(f"Exists: {paper.is_exist}")
        print("-" * 80)


if __name__ == "__main__":
    asyncio.run(main())
```

## Features

- Search for articles on arXiv using keywords.
- Automatically download article PDFs.
- Access metadata such as title, authors, abstract, link, and local path.
- Process multiple search results with a single call to `export_papers`.
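
As one way to put the metadata to use, the sketch below writes the fields listed above into a JSON manifest. It assumes only the attribute names shown in the usage example (`title`, `authors`, `summary`, `link`, `path`); `to_record`, `to_manifest`, and the `sample` object are hypothetical helpers for illustration, not part of the library, and the sample values are placeholders.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, Iterable

# Attribute names taken from the usage example; nothing else is assumed.
FIELDS = ("title", "authors", "summary", "link", "path")


def to_record(paper: Any) -> Dict[str, Any]:
    """Copy the metadata attributes of one paper into a plain dict."""
    record = {name: getattr(paper, name) for name in FIELDS}
    record["authors"] = list(record["authors"])
    record["path"] = str(record["path"])  # works for both str and pathlib.Path
    return record


def to_manifest(papers: Iterable[Any]) -> str:
    """Serialize the metadata of several papers as a JSON array."""
    return json.dumps([to_record(p) for p in papers], indent=2, ensure_ascii=False)


# Placeholder object standing in for a paper yielded by export_papers.
sample = SimpleNamespace(
    title="Placeholder title",
    authors=["A. Author", "B. Author"],
    summary="Placeholder abstract.",
    link="https://example.org/placeholder",
    path="./arxiv_papers/placeholder.pdf",
)

manifest = to_manifest([sample])
print(manifest)
```

In real use, the papers collected from the `async for` loop would be passed to `to_manifest` in place of `sample`.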

## Main Parameters

- `search`: search string (e.g., `"quantum computing"`).
- `path_download`: path to save the PDFs.
- `max_results`: maximum number of articles to download.
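
To illustrate how the three parameters fit together, the following sketch gathers the yielded papers into a list. `StubPaper` and `fake_export_papers` are hypothetical stand-ins, not part of the library; they exist only so the snippet runs without network access. The sketch assumes nothing beyond what the usage example shows, namely that the call accepts these keyword arguments and can be consumed with `async for`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class StubPaper:
    """Hypothetical stand-in with two of the attributes from the usage example."""
    title: str
    path: str


async def fake_export_papers(
    search: str, path_download: str, max_results: int
) -> AsyncIterator[StubPaper]:
    """Hypothetical stub taking the same keyword arguments as export_papers."""
    for i in range(max_results):
        await asyncio.sleep(0)  # hand control back to the event loop between items
        yield StubPaper(
            title=f"{search} paper {i + 1}",
            path=f"{path_download}/paper_{i + 1}.pdf",
        )


async def collect(papers: AsyncIterator) -> List:
    """Drain any async iterator of papers into a list."""
    return [paper async for paper in papers]


def run_demo() -> List[StubPaper]:
    """Run the stub with sample arguments and return the collected papers."""
    stream = fake_export_papers(
        search="quantum computing",
        path_download="./arxiv_papers",
        max_results=3,
    )
    return asyncio.run(collect(stream))


if __name__ == "__main__":
    for paper in run_demo():
        print(paper.title, "->", paper.path)
```

With the real library, `collect` would presumably be given `export_papers(search=..., path_download=..., max_results=...)` instead of the stub.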

## Vector Database for LLMs

The `documents` property provides a list of `Document` objects intended for ingestion into a vector database. Documents of this kind are commonly used to supply structured data to large language models (LLMs), supporting semantic search and advanced analysis.
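
The idea behind vector-database ingestion can be sketched with a toy in-memory index. Everything here is an assumption made for illustration: `Doc` is a hypothetical stand-in for an entry of `paper.documents` (its `page_content` and `metadata` fields are assumed, not confirmed by the library), `embed` uses plain word counts where a real pipeline would call an embedding model, and `TinyVectorStore` replaces an actual vector database.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for one entry of paper.documents (fields assumed)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory index that ranks stored documents by similarity to a query."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Counter, Doc]] = []

    def add(self, docs: List[Doc]) -> None:
        for doc in docs:
            self._rows.append((embed(doc.page_content), doc))

    def search(self, query: str, k: int = 1) -> List[Doc]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = TinyVectorStore()
store.add([
    Doc("Qubits store quantum information in superposition.", {"page": "1"}),
    Doc("Error correction protects qubits from noise.", {"page": "2"}),
    Doc("The appendix lists funding sources.", {"page": "3"}),
])

best = store.search("how is noise handled by error correction", k=1)[0]
print(best.metadata["page"], best.page_content)
```

The query shares the words "noise", "error", and "correction" only with the second document, so that document ranks first. A production setup would swap `embed` for a real embedding model and `TinyVectorStore` for a vector database, keeping the same add-then-search flow.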

## License

This library is distributed under the MIT license.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "arxiv-export-documents",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.12",
    "maintainer_email": null,
    "keywords": "arxiv, export, papers",
    "author": null,
    "author_email": "Giuseppe Zileni <giuseppe.zileni@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d4/38/b8f664634bee2e09367f49e325e0e7756bbc092fe683f222d3b85d236325/arxiv_export_documents-0.1.8.tar.gz",
    "platform": null,
    "description": "# Arxix Export\n\n**Arxiv Export** is a Python library that allows you to search, download, and manage scientific articles from [arXiv.org](https://arxiv.org/). It is useful for automating paper downloads and obtaining structured information about articles.\n\n## Installation\n\n```bash\npip install arxiv-export\n```\n\n## Usage Example\n\n```python\nimport asyncio\nfrom arxiv_export_documents import export_papers\n\n\nasync def main():\n    search_query = \"quantum computing\"\n    download_path = \"./arxiv_papers\"\n    max_results = 5\n\n    async for paper in export_papers(\n        search=search_query,\n        path_download=download_path,\n        max_results=max_results\n    ):\n        print(f\"Downloaded paper: {paper.title}\")\n        print(f\"Authors: {', '.join(paper.authors)}\")\n        print(f\"Summary: {paper.summary}\")\n        print(f\"Link: {paper.link}\")\n        print(f\"Path: {paper.path}\")\n        print(f\"Documents: {len(paper.documents)}\")\n        print(f\"Exists: {paper.is_exist}\")\n        print(\"-\" * 80)\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n## Features\n\n- Search for articles on arXiv using keywords.\n- Automatically download article PDFs.\n- Access metadata such as title, authors, abstract, link, and local path.\n- Manage multiple results with a single command.\n\n## Main Parameters\n\n- `search`: search string (e.g., `\"quantum computing\"`).\n- `path_download`: path to save the PDFs.\n- `max_results`: maximum number of articles to download.\n\n### Vector Database for LLMs\n\nThe `documents` property provides a list of `Document` files intended for ingestion into a vector database. These files are commonly used to supply structured data to language models (LLMs), supporting semantic search and advanced analysis.\n\n## License\n\nThis library is distributed under the MIT license.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Export arxiv papers to pdf formats",
    "version": "0.1.8",
    "project_urls": {
        "Homepage": "https://gzileni.github.io/arxiv-export-documents",
        "Repository": "https://github.com/gzileni/arxiv-export-documents.git"
    },
    "split_keywords": [
        "arxiv",
        " export",
        " papers"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "15288fbb767b6b665ccd6cb0ef05190176286ec5fb7573776169b879eed1df34",
                "md5": "97c92058cc34464d274ce8244039e36b",
                "sha256": "5ea4c8733ba09b5a2d420c732209b1cffa8980398287235ca613bd991f414201"
            },
            "downloads": -1,
            "filename": "arxiv_export_documents-0.1.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "97c92058cc34464d274ce8244039e36b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.12",
            "size": 6612,
            "upload_time": "2025-08-01T16:44:34",
            "upload_time_iso_8601": "2025-08-01T16:44:34.117044Z",
            "url": "https://files.pythonhosted.org/packages/15/28/8fbb767b6b665ccd6cb0ef05190176286ec5fb7573776169b879eed1df34/arxiv_export_documents-0.1.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "d438b8f664634bee2e09367f49e325e0e7756bbc092fe683f222d3b85d236325",
                "md5": "0d2ae7c47309cfaea73fc0687d05445b",
                "sha256": "1c169fb60a7f41753e5c288b36987daf4f37001a3a9fc204360d9000fc5be4ae"
            },
            "downloads": -1,
            "filename": "arxiv_export_documents-0.1.8.tar.gz",
            "has_sig": false,
            "md5_digest": "0d2ae7c47309cfaea73fc0687d05445b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.12",
            "size": 6160,
            "upload_time": "2025-08-01T16:44:35",
            "upload_time_iso_8601": "2025-08-01T16:44:35.173090Z",
            "url": "https://files.pythonhosted.org/packages/d4/38/b8f664634bee2e09367f49e325e0e7756bbc092fe683f222d3b85d236325/arxiv_export_documents-0.1.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-01 16:44:35",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "gzileni",
    "github_project": "arxiv-export-documents",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "aiohappyeyeballs",
            "specs": [
                [
                    "==",
                    "2.6.1"
                ]
            ]
        },
        {
            "name": "aiohttp",
            "specs": [
                [
                    "==",
                    "3.12.15"
                ]
            ]
        },
        {
            "name": "aiosignal",
            "specs": [
                [
                    "==",
                    "1.4.0"
                ]
            ]
        },
        {
            "name": "annotated-types",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "anyio",
            "specs": [
                [
                    "==",
                    "4.9.0"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": [
                [
                    "==",
                    "25.3.0"
                ]
            ]
        },
        {
            "name": "build",
            "specs": [
                [
                    "==",
                    "1.2.2.post1"
                ]
            ]
        },
        {
            "name": "certifi",
            "specs": [
                [
                    "==",
                    "2025.7.14"
                ]
            ]
        },
        {
            "name": "charset-normalizer",
            "specs": [
                [
                    "==",
                    "3.4.2"
                ]
            ]
        },
        {
            "name": "dataclasses-json",
            "specs": [
                [
                    "==",
                    "0.6.7"
                ]
            ]
        },
        {
            "name": "frozenlist",
            "specs": [
                [
                    "==",
                    "1.7.0"
                ]
            ]
        },
        {
            "name": "greenlet",
            "specs": [
                [
                    "==",
                    "3.2.3"
                ]
            ]
        },
        {
            "name": "h11",
            "specs": [
                [
                    "==",
                    "0.16.0"
                ]
            ]
        },
        {
            "name": "httpcore",
            "specs": [
                [
                    "==",
                    "1.0.9"
                ]
            ]
        },
        {
            "name": "httpx",
            "specs": [
                [
                    "==",
                    "0.28.1"
                ]
            ]
        },
        {
            "name": "httpx-sse",
            "specs": [
                [
                    "==",
                    "0.4.1"
                ]
            ]
        },
        {
            "name": "idna",
            "specs": [
                [
                    "==",
                    "3.10"
                ]
            ]
        },
        {
            "name": "jsonpatch",
            "specs": [
                [
                    "==",
                    "1.33"
                ]
            ]
        },
        {
            "name": "jsonpointer",
            "specs": [
                [
                    "==",
                    "3.0.0"
                ]
            ]
        },
        {
            "name": "langchain",
            "specs": [
                [
                    "==",
                    "0.3.27"
                ]
            ]
        },
        {
            "name": "langchain-community",
            "specs": [
                [
                    "==",
                    "0.3.27"
                ]
            ]
        },
        {
            "name": "langchain-core",
            "specs": [
                [
                    "==",
                    "0.3.72"
                ]
            ]
        },
        {
            "name": "langchain-text-splitters",
            "specs": [
                [
                    "==",
                    "0.3.9"
                ]
            ]
        },
        {
            "name": "langsmith",
            "specs": [
                [
                    "==",
                    "0.4.8"
                ]
            ]
        },
        {
            "name": "marshmallow",
            "specs": [
                [
                    "==",
                    "3.26.1"
                ]
            ]
        },
        {
            "name": "multidict",
            "specs": [
                [
                    "==",
                    "6.6.3"
                ]
            ]
        },
        {
            "name": "mypy_extensions",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "2.3.2"
                ]
            ]
        },
        {
            "name": "orjson",
            "specs": [
                [
                    "==",
                    "3.11.1"
                ]
            ]
        },
        {
            "name": "packaging",
            "specs": [
                [
                    "==",
                    "25.0"
                ]
            ]
        },
        {
            "name": "propcache",
            "specs": [
                [
                    "==",
                    "0.3.2"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "==",
                    "2.11.7"
                ]
            ]
        },
        {
            "name": "pydantic-settings",
            "specs": [
                [
                    "==",
                    "2.10.1"
                ]
            ]
        },
        {
            "name": "pydantic_core",
            "specs": [
                [
                    "==",
                    "2.33.2"
                ]
            ]
        },
        {
            "name": "pypdf",
            "specs": [
                [
                    "==",
                    "5.9.0"
                ]
            ]
        },
        {
            "name": "pyproject_hooks",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.32.4"
                ]
            ]
        },
        {
            "name": "requests-toolbelt",
            "specs": [
                [
                    "==",
                    "1.0.0"
                ]
            ]
        },
        {
            "name": "sniffio",
            "specs": [
                [
                    "==",
                    "1.3.1"
                ]
            ]
        },
        {
            "name": "SQLAlchemy",
            "specs": [
                [
                    "==",
                    "2.0.42"
                ]
            ]
        },
        {
            "name": "tenacity",
            "specs": [
                [
                    "==",
                    "9.1.2"
                ]
            ]
        },
        {
            "name": "typing-inspect",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "typing-inspection",
            "specs": [
                [
                    "==",
                    "0.4.1"
                ]
            ]
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    "==",
                    "4.14.1"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.5.0"
                ]
            ]
        },
        {
            "name": "yarl",
            "specs": [
                [
                    "==",
                    "1.20.1"
                ]
            ]
        },
        {
            "name": "zstandard",
            "specs": [
                [
                    "==",
                    "0.23.0"
                ]
            ]
        }
    ],
    "lcname": "arxiv-export-documents"
}
        