# notion-haystack: Export Notion pages to Haystack Documents
This Python package lets you export your Notion pages to Haystack Documents using a Notion API token.
Because the Notion API enforces [rate limits](https://developers.notion.com/reference/request-limits),
this component automatically retries failed requests and waits for the rate limit to reset before retrying. This is
especially useful when exporting a large number of pages. The component also uses `asyncio` to issue requests
concurrently, which can significantly speed up the export.
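The retry-and-wait behavior and the concurrent requests described above can be sketched with plain `asyncio`. This is an illustration under stated assumptions, not the package's implementation: `RateLimitedError`, `fetch_with_retry`, `fetch_all`, and the `fake_fetch` stand-in for a Notion API call are all hypothetical. The `retry_after` value plays the role of the server's suggested wait; Notion documents that rate-limited requests receive an HTTP 429 response with a `Retry-After` header.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class RateLimitedError(Exception):
    """Hypothetical error for a rate-limited (HTTP 429) response."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited; retry after {retry_after}s")
        self.retry_after = retry_after


async def fetch_with_retry(
    fetch: Callable[[str], Awaitable[str]], page_id: str, max_retries: int = 5
) -> str:
    """Call fetch(page_id), waiting out the rate limit and retrying on failure."""
    for attempt in range(max_retries + 1):
        try:
            return await fetch(page_id)
        except RateLimitedError as err:
            if attempt == max_retries:
                raise  # give up after max_retries retries
            await asyncio.sleep(err.retry_after)  # wait for the limit to reset


async def fetch_all(
    fetch: Callable[[str], Awaitable[str]], page_ids: List[str], max_concurrency: int = 3
) -> List[str]:
    """Fetch pages concurrently, with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(page_id: str) -> str:
        async with semaphore:
            return await fetch_with_retry(fetch, page_id)

    # gather() runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(*(guarded(page_id) for page_id in page_ids))


calls: Dict[str, int] = {}


async def fake_fetch(page_id: str) -> str:
    """Stand-in for a Notion request: rate limited on the first try, then succeeds."""
    calls[page_id] = calls.get(page_id, 0) + 1
    if calls[page_id] == 1:
        raise RateLimitedError(retry_after=0.01)
    return f"content of {page_id}"


results = asyncio.run(fetch_all(fake_fetch, ["a", "b", "c"]))
print(results)  # ['content of a', 'content of b', 'content of c']
```

Capping concurrency with a semaphore is one way to keep parallel requests from immediately tripping the rate limit; the retry loop then absorbs the 429s that still occur.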
## Installation
```bash
pip install notion-haystack
```
## Usage
To use this package, you will need a Notion API token. You can follow the steps outlined in the [Notion documentation](https://developers.notion.com/docs/create-a-notion-integration#create-your-integration-in-notion) to create a new Notion integration, connect it to your pages, and obtain your API token.
> To give your integration access to a page and its child pages, add the integration in the page's 'Connections' setting.

The following minimal example demonstrates how to export a list of pages to Haystack Documents:
```python
from notion_haystack import NotionExporter

exporter = NotionExporter(api_token="<your-token>")
# Like other Haystack 2.x components, run() returns a dictionary; the Documents are under "documents"
exported_pages = exporter.run(page_ids=["<page-id>"])["documents"]

# exported_pages is a list of Haystack Documents, one per exported Notion page
```
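The `page_ids` passed to `run()` are Notion page IDs. A page's ID is the 32-character hexadecimal string at the end of its URL, and the Notion API accepts it with or without dashes. The helper below, `page_id_from_url`, is hypothetical and not part of `notion-haystack`: a minimal sketch of pulling that ID out of a copied page URL and normalizing it to the dashed UUID form, assuming the ID is the last 32 hex characters of the URL path.

```python
import re
import uuid
from urllib.parse import urlparse


def page_id_from_url(url: str) -> str:
    """Extract the page ID from a Notion page URL (hypothetical helper)."""
    # Drop any query string or fragment; the ID ends the last path segment.
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # Remove dashes so both "Title-<id>" and dashed UUIDs reduce to plain hex.
    match = re.search(r"[0-9a-fA-F]{32}$", last_segment.replace("-", ""))
    if match is None:
        raise ValueError(f"no Notion page ID found in {url!r}")
    # Normalize to the 8-4-4-4-12 UUID form.
    return str(uuid.UUID(match.group(0)))


url = "https://www.notion.so/my-workspace/Project-Plan-1429989fe8ac4effbc8f57f56486db54?pvs=4"
print(page_id_from_url(url))  # 1429989f-e8ac-4eff-bc8f-57f56486db54
```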
The following example shows how to use the `NotionExporter` inside an indexing pipeline:
```python
from haystack import Pipeline
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

from notion_haystack import NotionExporter

document_store = InMemoryDocumentStore()
exporter = NotionExporter(api_token="YOUR_NOTION_API_KEY")
splitter = DocumentSplitter()
writer = DocumentWriter(document_store=document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component(instance=exporter, name="exporter")
indexing_pipeline.add_component(instance=splitter, name="splitter")
indexing_pipeline.add_component(instance=writer, name="writer")

# Exported Documents are split into smaller chunks, which are then written to the store
indexing_pipeline.connect("exporter.documents", "splitter.documents")
indexing_pipeline.connect("splitter", "writer")

indexing_pipeline.run(data={"exporter": {"page_ids": ["your_page_id"]}})
# The pages are now indexed in the document store
```
The `NotionExporter` class takes the following arguments:
- `api_token`: Your Notion API token. See [Notion's documentation](https://developers.notion.com/docs/create-a-notion-integration) for how to obtain one.
- `export_child_pages`: Whether to recursively export all child pages of the provided page IDs. Defaults to `False`.
- `extract_page_metadata`: Whether to extract metadata from each page and add it as front matter to the Markdown.
  The extracted metadata includes the page's title, author, path, URL, last editor, and last edited time.
  Defaults to `False`.
- `exclude_title_containing`: If specified, pages whose titles contain this string are excluded. This can be useful,
  for example, to exclude pages whose titles mark them as archived. Defaults to `None`.

The `NotionExporter.run` method takes the following arguments:
- `page_ids`: A list of page IDs to export. If `export_child_pages` is `True`, all child pages of these pages are
  exported as well.
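The `export_child_pages`, `exclude_title_containing`, and `extract_page_metadata` options can be pictured with the sketch below. It works on a hypothetical in-memory `Page` class instead of real Notion API calls, and the helpers `collect_pages` and `with_front_matter` are illustrative, not part of `notion-haystack`. Two behaviors are assumptions rather than documented facts: an excluded page's child pages are skipped as well, and front matter is rendered as a `---`-delimited, YAML-style block.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a Notion page that has already been fetched."""

    page_id: str
    title: str
    markdown: str
    children: List["Page"] = field(default_factory=list)


def collect_pages(
    roots: List[Page],
    export_child_pages: bool = False,
    exclude_title_containing: Optional[str] = None,
) -> List[Page]:
    """Select the pages to export, visiting child pages depth-first if requested."""
    selected: List[Page] = []
    stack = list(reversed(roots))  # reversed so pages are popped in their original order
    while stack:
        page = stack.pop()
        if exclude_title_containing and exclude_title_containing in page.title:
            # Assumption: children of an excluded page are skipped too.
            continue
        selected.append(page)
        if export_child_pages:
            stack.extend(reversed(page.children))
    return selected


def with_front_matter(markdown: str, metadata: Dict[str, str]) -> str:
    """Prepend metadata as a YAML-style front matter block (assumed format)."""
    header = ["---"] + [f"{key}: {value}" for key, value in metadata.items()] + ["---"]
    return "\n".join(header) + "\n" + markdown


# Example page tree: a handbook with one active and one archived child page.
root = Page("1", "Handbook", "# Handbook", [
    Page("2", "Onboarding", "# Onboarding", [Page("4", "Week 1", "# Week 1")]),
    Page("3", "ARCHIVED Old policy", "# Old policy"),
])

print([p.page_id for p in collect_pages([root])])  # ['1']
print([p.page_id for p in collect_pages([root], export_child_pages=True)])  # ['1', '2', '4', '3']
print([p.page_id for p in collect_pages([root], True, "ARCHIVED")])  # ['1', '2', '4']
print(with_front_matter(root.markdown, {"title": root.title}))
```

A stack-based traversal avoids Python's recursion limit on deeply nested page trees; the real exporter may order or fetch pages differently.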
## Raw data

```json
{
  "_id": null,
  "home_page": null,
  "name": "notion-haystack",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.8",
  "maintainer_email": null,
  "keywords": "haystack,markdown,notion",
  "author": null,
  "author_email": "Bogdan Kosti\u0107 <bogdankostic@web.de>",
  "download_url": "https://files.pythonhosted.org/packages/4c/07/ee12199a77860a0c937c7a15bcb4ec96443e20b1441383799ddfd7f56091/notion_haystack-1.0.0.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": null,
  "version": "1.0.0",
  "project_urls": {
    "GitHub": "https://github.com/bogdankostic/notion-haystack"
  },
  "split_keywords": [
    "haystack",
    "markdown",
    "notion"
  ],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "345427d483254f27ff5a755984850f3b9b17e00e2d4f9d8c4879c145f4439ce2",
        "md5": "bd68f726669d12aa76f7bc7e4dd5e1f9",
        "sha256": "d554f8b1a04eacddc872240daad35faa05ed70990543a5ed6e744d4780387f5a"
      },
      "downloads": -1,
      "filename": "notion_haystack-1.0.0-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "bd68f726669d12aa76f7bc7e4dd5e1f9",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.8",
      "size": 8519,
      "upload_time": "2023-12-08T14:53:49",
      "upload_time_iso_8601": "2023-12-08T14:53:49.575156Z",
      "url": "https://files.pythonhosted.org/packages/34/54/27d483254f27ff5a755984850f3b9b17e00e2d4f9d8c4879c145f4439ce2/notion_haystack-1.0.0-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "4c07ee12199a77860a0c937c7a15bcb4ec96443e20b1441383799ddfd7f56091",
        "md5": "640e208abc67206e6cb298df14294c25",
        "sha256": "3e86e60040259390f31d7c2d47acd8c9856b37b611e1c07c140315140bee1ef8"
      },
      "downloads": -1,
      "filename": "notion_haystack-1.0.0.tar.gz",
      "has_sig": false,
      "md5_digest": "640e208abc67206e6cb298df14294c25",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.8",
      "size": 8432,
      "upload_time": "2023-12-08T14:53:50",
      "upload_time_iso_8601": "2023-12-08T14:53:50.734626Z",
      "url": "https://files.pythonhosted.org/packages/4c/07/ee12199a77860a0c937c7a15bcb4ec96443e20b1441383799ddfd7f56091/notion_haystack-1.0.0.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2023-12-08 14:53:50",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "bogdankostic",
  "github_project": "notion-haystack",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": true,
  "lcname": "notion-haystack"
}
```