llama-index-readers-wordpress


| Field | Value |
| --- | --- |
| Name | llama-index-readers-wordpress |
| Version | 0.2.3 |
| Summary | llama-index readers wordpress integration |
| Upload time | 2024-11-07 09:19:48 |
| Maintainer | bbornsztein |
| Author | Your Name |
| Requires Python | `<4.0,>=3.9` |
| License | MIT |
| Keywords | blog, wordpress |
| Home page | None |
| Docs URL | None |
| Requirements | No requirements were recorded. |

# Wordpress Loader

```bash
pip install llama-index-readers-wordpress
```

This loader fetches the text of WordPress pages and blog posts through the WordPress REST API. It uses the BeautifulSoup library to parse the returned HTML and extract the text of each article.
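
The HTML-to-text step can be illustrated with a minimal sketch. Note the assumptions: this is not the loader's own code, it swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies, and the names `_TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Hypothetical helper: reduce rendered post HTML to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<h2>Hello</h2><p>First <strong>post</strong> here</p>"
print(html_to_text(sample))  # Hello First post here
```

The loader itself relies on BeautifulSoup, which handles malformed markup more gracefully than this stand-in.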

## Usage

To use this loader, pass the base URL of the WordPress installation
(e.g. `https://www.mysite.com`) and, optionally, a username and an application
password for that user (more about application passwords
[here](https://www.paidmembershipspro.com/create-application-password-wordpress/)).

```python
from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
)
documents = loader.load_data()
```
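
WordPress application passwords are sent with HTTP Basic authentication. The sketch below shows how such a header is formed from the two credentials; it illustrates the scheme only, under the assumption that the loader authenticates this way. The helper name `basic_auth_header` is hypothetical and not part of the package.

```python
import base64


def basic_auth_header(username: str, application_password: str) -> dict[str, str]:
    """Hypothetical helper: build an HTTP Basic Authorization header."""
    credentials = f"{username}:{application_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("my_username", "my_password")
print(headers["Authorization"])
```

Because Basic authentication only encodes the credentials and does not encrypt them, the site should be served over HTTPS.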

This loader is designed to load data into
[LlamaIndex](https://github.com/run-llama/llama_index/).

## Pages and Posts

By default, the loader retrieves both WordPress _pages_ (static content) and
_posts_ (blog entries) from the target site. To skip either type, set
`get_pages=False` or `get_posts=False` when initializing the
`WordpressReader` object.
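
The effect of the two flags can be sketched as follows. This is an illustration of the behavior described above, not the loader's actual code, and `select_post_types` is a hypothetical name.

```python
def select_post_types(get_pages: bool = True, get_posts: bool = True) -> list[str]:
    """Hypothetical helper: list the core content types the flags enable."""
    post_types = []
    if get_pages:
        post_types.append("pages")
    if get_posts:
        post_types.append("posts")
    return post_types


print(select_post_types())                 # ['pages', 'posts']
print(select_post_types(get_pages=False))  # ['posts']
```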

## Additional Custom Post Types

To fetch custom post types besides _posts_ and _pages_, pass `additional_post_types` as a comma-separated string (e.g., `additional_post_types="custom-pages,custom-posts"`) when initializing the `WordpressReader` object.

```python
from llama_index.readers.wordpress import WordpressReader

loader = WordpressReader(
    url="https://www.mysite.com",
    username="my_username",
    password="my_password",
    additional_post_types="webinars,podcasts",
)
documents = loader.load_data()
```
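
As a rough sketch of what a comma-separated value like this implies, the snippet below splits the string into type names and maps each one to a REST endpoint. It assumes the default WordPress REST route `/wp-json/wp/v2/<type>`; a custom post type may be registered under a different route, and both helper names are hypothetical rather than part of the package.

```python
def parse_additional_post_types(value: str) -> list[str]:
    """Hypothetical helper: split a comma-separated string into type names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def endpoint_urls(base_url: str, post_types: list[str]) -> list[str]:
    """Hypothetical helper: build one REST endpoint URL per post type.

    Assumes the default /wp-json/wp/v2/ route prefix.
    """
    base = base_url.rstrip("/")
    return [f"{base}/wp-json/wp/v2/{post_type}" for post_type in post_types]


extra = parse_additional_post_types("custom-pages,custom-posts")
for url in endpoint_urls("https://www.mysite.com", ["pages", "posts", *extra]):
    print(url)
```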

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-readers-wordpress",
    "maintainer": "bbornsztein",
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "blog, wordpress",
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/7b/b3/33950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d/llama_index_readers_wordpress-0.2.3.tar.gz",
    "platform": null,
    "description": "# Wordpress Loader\n\n```bash\npip install llama-index-readers-wordpress\n```\n\nThis loader fetches the text from Wordpress blog posts using the Wordpress API. It also uses the BeautifulSoup library to parse the HTML and extract the text from the articles.\n\n## Usage\n\nTo use this loader, you need to pass base url of the Wordpress installation\n(e.g. `https://www.mysite.com`) and optionally a username, and an application\npassword for the user (more about application passwords\n[here](https://www.paidmembershipspro.com/create-application-password-wordpress/))\n\n```python\nfrom llama_index.readers.wordpress import WordpressReader\n\nloader = WordpressReader(\n    url=\"https://www.mysite.com\",\n    username=\"my_username\",\n    password=\"my_password\",\n)\ndocuments = loader.load_data()\n```\n\nThis loader is designed to be used as a way to load data into\n[LlamaIndex](https://github.com/run-llama/llama_index/).\n\n## Pages and Posts\n\nBe default, the loader retrieves both Wordpress _pages_ (static content) and\n_posts_ (blog entries) from the target site. This behavior can be configured\nby setting `get_pages=False` or `get_posts=False` when initializing the\n`WordpressReader` object.\n\n## Additional Custom Post types\n\nTo scrape additional custom endpoints beside _posts_ and _pages_, you can specify `additional_post_types` as a comma-separated list (e.g., `additional_post_types=\"custom-pages,custom-posts\"`) when initializing the `WordpressReader` object.\n\n```python\nfrom llama_index.readers.wordpress import WordpressReader\n\nloader = WordpressReader(\n    url=\"https://www.mysite.com\",\n    username=\"my_username\",\n    password=\"my_password\",\n    additional_post_types=\"webiners,podcasts\",\n)\ndocuments = loader.load_data()\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index readers wordpress integration",
    "version": "0.2.3",
    "project_urls": null,
    "split_keywords": [
        "blog",
        " wordpress"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "06a569c7df6897ebb51b518bd3c1b2381bb73c77e0e69cbbdcc94cc074d90690",
                "md5": "281b2544145a07050e549784b8a06bc2",
                "sha256": "19f56e05b0d8f87985821bbacdfd2fcef571bf2b2d2250b3d20add6002f9daf6"
            },
            "downloads": -1,
            "filename": "llama_index_readers_wordpress-0.2.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "281b2544145a07050e549784b8a06bc2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 3851,
            "upload_time": "2024-11-07T09:19:47",
            "upload_time_iso_8601": "2024-11-07T09:19:47.194254Z",
            "url": "https://files.pythonhosted.org/packages/06/a5/69c7df6897ebb51b518bd3c1b2381bb73c77e0e69cbbdcc94cc074d90690/llama_index_readers_wordpress-0.2.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7bb333950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d",
                "md5": "7e8ed57f00039b5d093ece82786a2768",
                "sha256": "5add4f06b7fa3c4c2cfff3a711b63251cc939f522a833bea0c5d6cece69252cb"
            },
            "downloads": -1,
            "filename": "llama_index_readers_wordpress-0.2.3.tar.gz",
            "has_sig": false,
            "md5_digest": "7e8ed57f00039b5d093ece82786a2768",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 3515,
            "upload_time": "2024-11-07T09:19:48",
            "upload_time_iso_8601": "2024-11-07T09:19:48.450331Z",
            "url": "https://files.pythonhosted.org/packages/7b/b3/33950292a3f98069fb58dd188e1a096ca7495fa2de2515e4ccccd7f3002d/llama_index_readers_wordpress-0.2.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-07 09:19:48",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-readers-wordpress"
}