htmltagparse


- Name: htmltagparse
- Version: 3.1
- Home page: https://github.com/xyzpw/htmltagparse/
- Summary: A tool designed to quickly parse HTML tags and elements.
- Upload time: 2024-08-09 09:08:37
- Maintainer: xyzpw
- Author: xyzpw
- Requires Python: >=3.10
- License: MIT
- Requirements: none recorded
# htmltagparse
![Pepy Total Downloads](https://img.shields.io/pepy/dt/htmltagparse)

A tool designed to quickly parse HTML tags and elements.

## Prerequisites
- Pip packages:
  - beautifulsoup4==4.*
  - html5lib==1.*
  - requests==2.*

- Optional packages:
  - timeoutcall==1.*

## Usage
### Reading Page Titles
If you only need basic page information, the `titleFromUri` and `metadataFromUri` functions return it directly:
```python
import htmltagparse
title = htmltagparse.titleFromUri("https://github.com/")
print(title) # output: GitHub: Let’s build from here · GitHub

metadata = htmltagparse.metadataFromUri("https://github.com/") # meta tags from github
```
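
To make the idea concrete without a network request, the sketch below pulls a title out of an inline HTML string using only the standard library. It is not how `titleFromUri` is implemented; `TitleParser` and `title_from_html` are hypothetical names introduced here purely for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data


def title_from_html(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


sample = "<html><head><title> Example Page </title></head><body></body></html>"
print(title_from_html(sample))  # Example Page
```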

### Building Pages
#### Building Pages via URI
```python
from htmltagparse import build

brave = build.fromUri("https://search.brave.com/")
print(brave.response) #output: (200, 'OK')
print(brave.tags) #list of tags found on the specified page
print(brave.elapsed) #the time taken to create the html page class
print(brave.title) #title of the html page
```
These are only some of the available attributes; a page object exposes other values as well.

#### Building Pages via HTML
```python
from htmltagparse import HtmlPage
from requests import get

htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']
```

#### Searching A Page
You can search a page you have built directly through its `regex` function:
```python
from htmltagparse import build
import re

videoId = "" #video id here
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
  #NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
  #capture the list of tags for the youtube video via this regex pattern
  videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
  #convert the captured string into a list of tags
  videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception: #catch Exception rather than using a bare except, so Ctrl+C still works
  videoTags = "no tags found"

print(videoTags)
```
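
The two patterns above can be tried offline on a small string. The sketch below assumes that `page.regex` behaves like `re.search` over the page source with the `MULTILINE` and `DOTALL` flags; the `sample` text is made up and much shorter than a real YouTube page.

```python
import json
import re

# Made-up stand-in for page source; a real YouTube page is far larger.
sample = 'var data = {"keywords":["music","live","rock"],"channelId":"abc123"};'

# Step 1: capture the bracketed list between "keywords": and ,"channelId":
# (flags chosen to mirror what the README says the regex function uses)
match = re.search(
    r"\"keywords\":(?P<tags>\[.*?),\"channelId\":",
    sample,
    re.MULTILINE | re.DOTALL,
)
raw_tags = match.group("tags") if match else "[]"

# Step 2: pull each quoted item out of the captured text.
tags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", raw_tags)
print(tags)  # ['music', 'live', 'rock']

# Alternative for step 2: when the captured text is valid JSON, json.loads
# also copes with escaped quotes or commas inside a tag.
print(json.loads(raw_tags))  # ['music', 'live', 'rock']
```

Checking `match` before calling `.group` avoids relying on an exception for the no-match case, and keeps the result a list in both branches.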

Another way to get the tags of a YouTube video is the `find` function, for example:
```python
import htmltagparse

videoId = "" #video id here
yt = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
elTagOpening = yt.find("meta", attrs={"name": "keywords"})[0]
videoKeywords = htmltagparse.getElementAttributeValue(elTagOpening, "content").split(", ")
print(videoKeywords) # tags of the youtube video
```
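
For comparison, the same lookup can be sketched with the standard library alone: find a `meta` tag whose `name` is `keywords`, read its `content` attribute, and split on `", "`. This is not the implementation of `find` or `getElementAttributeValue`; `MetaContentParser` and `keywords_from_html` are hypothetical names, and the sample markup is invented.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Records the content attribute of <meta name="..."> tags (hypothetical helper)."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == self.name:
            self.contents.append(attr_map.get("content") or "")


def keywords_from_html(html):
    parser = MetaContentParser("keywords")
    parser.feed(html)
    if not parser.contents:
        return []
    # Same split as the example above: keywords separated by ", "
    return parser.contents[0].split(", ")


sample = '<head><meta name="keywords" content="music, live, rock"></head>'
print(keywords_from_html(sample))  # ['music', 'live', 'rock']
```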

## Developers
### Building to Wheel File
- `cd` into the root directory of this repository
- Run `python3 -m build`

> [!NOTE]
> Errors when building this package may be caused by the package's requirements. If this occurs, use `python3 -m build -n` instead.

### Contributions
Contributions must not include:
- Major changes
- Breaking code
- Changes to version number

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/xyzpw/htmltagparse/",
    "name": "htmltagparse",
    "maintainer": "xyzpw",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "xyzpw",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/13/1e/35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e/htmltagparse-3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A tool designed to quickly parse html tags and elements.",
    "version": "3.1",
    "project_urls": {
        "Homepage": "https://github.com/xyzpw/htmltagparse/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "09380649c7481503dc535899071fc2f17b7bd420dfd4b10e6810f05bf226839f",
                "md5": "5205d29e58a57b411d24c6d274dfc1fc",
                "sha256": "4f80e340c7e8a027a3f1fac0f474949599505b35f70ac1d6679cc9f77671502d"
            },
            "downloads": -1,
            "filename": "htmltagparse-3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5205d29e58a57b411d24c6d274dfc1fc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 9661,
            "upload_time": "2024-08-09T09:08:36",
            "upload_time_iso_8601": "2024-08-09T09:08:36.322671Z",
            "url": "https://files.pythonhosted.org/packages/09/38/0649c7481503dc535899071fc2f17b7bd420dfd4b10e6810f05bf226839f/htmltagparse-3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "131e35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e",
                "md5": "3a635b84298f0ee2c07fb89fe0689cbd",
                "sha256": "4f45a0282a0269f285ec8e60be5886421c1f7ae27816e9818f9ec79f71aff5bb"
            },
            "downloads": -1,
            "filename": "htmltagparse-3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "3a635b84298f0ee2c07fb89fe0689cbd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 9603,
            "upload_time": "2024-08-09T09:08:37",
            "upload_time_iso_8601": "2024-08-09T09:08:37.645770Z",
            "url": "https://files.pythonhosted.org/packages/13/1e/35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e/htmltagparse-3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-09 09:08:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "xyzpw",
    "github_project": "htmltagparse",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "htmltagparse"
}
        