# WordPress Loader
```bash
pip install llama-index-readers-wordpress
```
This loader fetches the text of WordPress posts and pages through the WordPress API. It uses the BeautifulSoup library to parse the returned HTML and extract the text.
## Usage
To use this loader, pass the base URL of the WordPress installation
(e.g. `https://www.mysite.com`) and, optionally, a username and an application
password for that user (see
[how to create an application password](https://www.paidmembershipspro.com/create-application-password-wordpress/)).
```python
from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
)
documents = loader.load_data()
```
This loader is designed to be used as a way to load data into
[LlamaIndex](https://github.com/run-llama/llama_index/).
## Pages and Posts
By default, the loader retrieves both WordPress _pages_ (static content) and
_posts_ (blog entries) from the target site. This behavior can be configured
by setting `get_pages=False` or `get_posts=False` when initializing the
`WordpressReader` object.
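
As a minimal sketch of the flags described above, the helper below builds a reader that skips static pages and returns only blog posts. The function name `load_posts_only` is hypothetical, and the example assumes that `username` and `password` can be left unset for content that is publicly readable.

```python
def load_posts_only(url, username=None, password=None):
    """Load blog posts only, skipping static pages (illustrative helper)."""
    # Imported inside the function so the package is only required
    # when the helper is actually called.
    from llama_index.readers.wordpress import WordpressReader

    loader = WordpressReader(
        url=url,
        username=username,
        password=password,
        get_pages=False,  # skip static pages; posts are still fetched
    )
    return loader.load_data()
```

Calling `load_posts_only("https://www.mysite.com")` would contact the site over the network, so it needs a reachable WordPress installation. Passing `get_posts=False` instead would give the opposite selection.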
## Additional Custom Post Types
To load content from custom post types besides _posts_ and _pages_, pass `additional_post_types` as a comma-separated string (e.g., `additional_post_types="custom-pages,custom-posts"`) when initializing the `WordpressReader` object.
```python
from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
    additional_post_types="webinars,podcasts",
)
documents = loader.load_data()
```
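
To make the format of `additional_post_types` concrete, the sketch below splits a comma-separated value into individual post type names and maps each one to the collection route that the WordPress REST API conventionally uses (`/wp-json/wp/v2/<post type>`). The helpers `split_post_types` and `endpoint_for` are hypothetical and only illustrate the idea; they are not the loader's internals. A custom post type is reachable this way only if the site exposes it through the REST API under its default route name.

```python
def split_post_types(value):
    """Split a comma-separated string into a list of post type names."""
    # Stray whitespace and empty entries are dropped in this sketch.
    return [part.strip() for part in value.split(",") if part.strip()]


def endpoint_for(base_url, post_type):
    """Build the conventional REST collection URL for a post type (assumed route)."""
    return f"{base_url.rstrip('/')}/wp-json/wp/v2/{post_type}"


for post_type in split_post_types("custom-pages,custom-posts"):
    print(endpoint_for("https://www.mysite.com", post_type))
```

Opening one of the printed URLs in a browser is a quick way to check that a custom post type is available before passing its name to the loader.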
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-readers-wordpress",
"maintainer": "bbornsztein",
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": "blog, wordpress",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/7b/b3/33950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d/llama_index_readers_wordpress-0.2.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index readers wordpress integration",
"version": "0.2.3",
"project_urls": null,
"split_keywords": [
"blog",
" wordpress"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "06a569c7df6897ebb51b518bd3c1b2381bb73c77e0e69cbbdcc94cc074d90690",
"md5": "281b2544145a07050e549784b8a06bc2",
"sha256": "19f56e05b0d8f87985821bbacdfd2fcef571bf2b2d2250b3d20add6002f9daf6"
},
"downloads": -1,
"filename": "llama_index_readers_wordpress-0.2.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "281b2544145a07050e549784b8a06bc2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 3851,
"upload_time": "2024-11-07T09:19:47",
"upload_time_iso_8601": "2024-11-07T09:19:47.194254Z",
"url": "https://files.pythonhosted.org/packages/06/a5/69c7df6897ebb51b518bd3c1b2381bb73c77e0e69cbbdcc94cc074d90690/llama_index_readers_wordpress-0.2.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7bb333950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d",
"md5": "7e8ed57f00039b5d093ece82786a2768",
"sha256": "5add4f06b7fa3c4c2cfff3a711b63251cc939f522a833bea0c5d6cece69252cb"
},
"downloads": -1,
"filename": "llama_index_readers_wordpress-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "7e8ed57f00039b5d093ece82786a2768",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 3515,
"upload_time": "2024-11-07T09:19:48",
"upload_time_iso_8601": "2024-11-07T09:19:48.450331Z",
"url": "https://files.pythonhosted.org/packages/7b/b3/33950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d/llama_index_readers_wordpress-0.2.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-07 09:19:48",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-readers-wordpress"
}