# ScrapeBiblio: PDF Reference Extraction and Verification Library
## Powered by Scrapegraphai
![ScrapeBiblio Logo](docs/scrapebiblio.png)
[![Downloads](https://static.pepy.tech/badge/scrapebiblio)](https://pepy.tech/project/scrapebiblio)
ScrapeBiblio is a library that extracts references from PDF files, verifies them against the Semantic Scholar, CORE, and BASE databases, and converts PDF content to Markdown format.
## Features
- Extract text from PDF files
- Extract references using OpenAI's GPT models
- Verify references using Semantic Scholar, CORE, and BASE databases
- Convert PDF content to Markdown format
- Integration with ScrapeGraph for additional reference checking
## Installation
Install ScrapeBiblio using pip:
```bash
pip install scrapebiblio
```
## Configuration
Create a `.env` file in your project root with the following content:
```plaintext
OPENAI_API_KEY=your_openai_api_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
CORE_API_KEY=your_core_api_key
BASE_API_KEY=your_base_api_key
```
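Before running anything, it can help to confirm the keys are actually being picked up. Below is a minimal sketch using python-dotenv and `os.getenv` (the same loading pattern used in the usage example); the variable names match the `.env` file above:
```python
import os
from dotenv import load_dotenv

# Load variables from the .env file in the project root
load_dotenv()

# Keys expected by ScrapeBiblio, matching the .env example above
required = ["OPENAI_API_KEY", "SEMANTIC_SCHOLAR_API_KEY", "CORE_API_KEY", "BASE_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing API keys in .env: {', '.join(missing)}")
```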
## Usage
Here's a basic example of how to use ScrapeBiblio:
```python
from scrapebiblio.core.find_reference import process_pdf
from dotenv import load_dotenv
import os

# Load API keys from the .env file described above
load_dotenv()

pdf_path = 'path/to/your/pdf/file.pdf'
output_path = 'references.md'

openai_api_key = os.getenv('OPENAI_API_KEY')
semantic_scholar_api_key = os.getenv('SEMANTIC_SCHOLAR_API_KEY')
core_api_key = os.getenv('CORE_API_KEY')
base_api_key = os.getenv('BASE_API_KEY')

# Extract references from the PDF, verify them, and write the results to references.md
process_pdf(pdf_path, output_path, openai_api_key, semantic_scholar_api_key,
            core_api_key=core_api_key, base_api_key=base_api_key)
```
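The same call can be wrapped in a loop to handle a folder of PDFs. The sketch below assumes only the `process_pdf` signature shown above; the directory layout and output naming are illustrative:
```python
from pathlib import Path
from scrapebiblio.core.find_reference import process_pdf
from dotenv import load_dotenv
import os

load_dotenv()

pdf_dir = Path('papers')      # hypothetical input directory
out_dir = Path('references')  # hypothetical output directory
out_dir.mkdir(exist_ok=True)

for pdf_path in pdf_dir.glob('*.pdf'):
    # One Markdown reference list per PDF, e.g. papers/foo.pdf -> references/foo.md
    output_path = out_dir / f"{pdf_path.stem}.md"
    process_pdf(str(pdf_path), str(output_path),
                os.getenv('OPENAI_API_KEY'),
                os.getenv('SEMANTIC_SCHOLAR_API_KEY'),
                core_api_key=os.getenv('CORE_API_KEY'),
                base_api_key=os.getenv('BASE_API_KEY'))
```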
## Advanced Usage
ScrapeBiblio also provides the following additional functionality:
1. Convert PDF to Markdown:
```python
from scrapebiblio.core.convert_to_md import convert_to_md
# Convert the PDF's content to a Markdown file at output_path
convert_to_md(pdf_path, output_path, openai_api_key)
```
2. Check references with ScrapeGraph (a batch sketch follows this list):
```python
from scrapebiblio.utils.api.reference_utils import check_reference_with_scrapegraph
# Query ScrapeGraph for the given reference title
result = check_reference_with_scrapegraph("Reference Title")
```
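As a usage note, `check_reference_with_scrapegraph` can be applied to several titles in one pass. This is a minimal sketch that assumes only the call shown in step 2; the titles and the way the result is printed are illustrative:
```python
from scrapebiblio.utils.api.reference_utils import check_reference_with_scrapegraph

# Hypothetical titles to verify; replace with references extracted from your PDF
titles = [
    "Attention Is All You Need",
    "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
]

for title in titles:
    result = check_reference_with_scrapegraph(title)
    print(title, "->", result)
```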
## Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for more details.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.