Name | htmltagparse |
Version | 3.1 |
home_page | https://github.com/xyzpw/htmltagparse/ |
Summary | A tool designed to quickly parse html tags and elements. |
upload_time | 2024-08-09 09:08:37 |
maintainer | xyzpw |
docs_url | None |
author | xyzpw |
requires_python | >=3.10 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# htmltagparse
![Pepy Total Downloads](https://img.shields.io/pepy/dt/htmltagparse)
A tool designed to quickly parse HTML tags and elements.
## Prerequisites
- Pip packages:
  - beautifulsoup4==4.*
  - html5lib==1.*
  - requests==2.*
- Optional packages:
  - timeoutcall==1.*
## Usage
### Reading Page Titles
If you only need basic page information, a few helper functions are available:
```python
import htmltagparse
title = htmltagparse.titleFromUri("https://github.com/")
print(title) # output: GitHub: Let’s build from here · GitHub
metadata = htmltagparse.metadataFromUri("https://github.com/") # meta tags from github
```
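To make the idea concrete without a network request, here is a minimal sketch of pulling a title and meta tags out of an HTML string using only the standard library. It is not how htmltagparse is implemented; the `TitleAndMetaParser` class and the `sample_html` string are hypothetical names invented for this illustration.

```python
from html.parser import HTMLParser


class TitleAndMetaParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and every <meta> tag's attributes."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []  # one dict of attributes per <meta> tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            self.meta.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


# Made-up sample document, used instead of a live page.
sample_html = """<html><head>
<title>Example Page</title>
<meta name="description" content="A sample page">
<meta charset="utf-8">
</head><body></body></html>"""

parser = TitleAndMetaParser()
parser.feed(sample_html)
print(parser.title)
print(parser.meta)
```

The same two pieces of information are what `titleFromUri` and `metadataFromUri` are described as returning, although their exact return types are not stated here.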
### Building Pages
#### Building Pages via URI
```python
from htmltagparse import build
brave = build.fromUri("https://search.brave.com/")
print(brave.response) #output: (200, 'OK')
print(brave.tags) #list of tags found on the specified page
print(brave.elapsed) #the time taken to create the html page class
print(brave.title) #title of the html page
```
These are not the only values available; an HTML page object has other attributes as well.
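Because the full attribute list is not given here, one generic way to discover it is Python's built-in `dir`. The sketch below assumes nothing about htmltagparse itself: `public_attributes` is a hypothetical helper, and `StandInPage` is a placeholder class that exists only so the example runs offline.

```python
def public_attributes(obj):
    """Return the names of attributes that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class StandInPage:
    """Hypothetical placeholder, not the real page class."""

    def __init__(self):
        self.title = "Example"
        self.tags = ["html", "head", "body"]


print(public_attributes(StandInPage()))
```

Calling `public_attributes` on an object returned by `build.fromUri` should list its public attributes and methods in the same way.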
#### Building Pages via HTML
```python
from htmltagparse import HtmlPage
from requests import get
htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']
```
#### Searching A Page
You can search an HTML page you have built directly through a function:
```python
from htmltagparse import build
import re
videoId = "" #video id here
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
    #NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
    #get a list of tags to the youtube video via this regex pattern
    videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
    #converting from string to array
    videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception:
    videoTags = "no tags found"
print(videoTags)
```
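The pattern above can be tried offline with plain `re`. This sketch runs it against a made-up string (`sample` is a hypothetical fragment, not real YouTube output) and passes the `MULTILINE` and `DOTALL` flags explicitly, since the note above says `page.regex` already applies them. As a swapped-in alternative to the second regex, it decodes the captured array with `json.loads`, which assumes the captured text is valid JSON.

```python
import json
import re

# Hypothetical fragment shaped like the text the pattern targets.
sample = '{"title":"Demo","keywords":["python","html parsing","demo"],"channelId":"abc123"}'

match = re.search(
    r"\"keywords\":(?P<tags>\[.*?),\"channelId\":",
    sample,
    re.MULTILINE | re.DOTALL,
)
if match is None:
    video_tags = []
else:
    # The captured group is a JSON array literal, so json.loads can decode it.
    video_tags = json.loads(match.group("tags"))

print(video_tags)
```

Checking for `None` before calling `group` avoids relying on a broad `try`/`except` when the pattern does not match.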
Another way to get the tags of a YouTube video is the `find` function:
```python
import htmltagparse
videoId = "" #video id here
yt = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
elTagOpening = yt.find("meta", attrs={"name": "keywords"})[0]
videoKeywords = htmltagparse.getElementAttributeValue(elTagOpening, "content").split(", ")
print(videoKeywords) # tags of the youtube video
```
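The example above splits on the exact separator `", "` and indexes `[0]` without checking that a match exists. A slightly more tolerant way to split the attribute value might look like this; `split_keywords` is a hypothetical helper, not part of htmltagparse.

```python
def split_keywords(content):
    """Split a comma-separated keywords string into a list of trimmed, non-empty items."""
    if not content:
        return []
    return [part.strip() for part in content.split(",") if part.strip()]


print(split_keywords("video, sharing,camera phone , "))
print(split_keywords(None))
```

Returning an empty list for missing content keeps the caller's code the same whether or not the page has a keywords meta tag.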
## Developers
### Building to Wheel File
- `cd` into the root directory of this repository
- Run `python3 -m build`
> [!NOTE]
> Errors when building this package may be caused by this package's requirements. If this occurs, use `python3 -m build -n` instead.
### Contributions
Contributions must not include:
- Major changes
- Breaking code
- Changes to version number
Raw data
{
"_id": null,
"home_page": "https://github.com/xyzpw/htmltagparse/",
"name": "htmltagparse",
"maintainer": "xyzpw",
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": null,
"author": "xyzpw",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/13/1e/35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e/htmltagparse-3.1.tar.gz",
"platform": null,
"description": "# htmltagparse\n![Pepy Total Downlods](https://img.shields.io/pepy/dt/htmltagparse)\n\nA tool designed to quickly parse HTML tags and elements.\n\n## Prerequisites\n- Pip packages:\n - beautifulsoup4==4.*\n - html5lib==1.*\n - requests==2.*\n\n- Optional packages:\n - timeoutcall==1.*\n\n## Usage\n### Reading Page Titles\nFirstly, if you would like to view page info alone, you could use a few functions for this:\n```python\nimport htmltagparse\ntitle = htmltagparse.titleFromUri(\"https://github.com/\")\nprint(title) # output: GitHub: Let\u2019s build from here \u00b7 GitHub\n\nmetadata = htmltagparse.metadataFromUri(\"https://github.com/\") # meta tags from github\n```\n\n### Building Pages\n#### Building Pages via URI\n```python\nfrom htmltagparse import build\n\nbrave = build.fromUri(\"https://search.brave.com/\")\nprint(brave.response) #output: (200, 'OK')\nprint(brave.tags) #list of tags found on the specified page\nprint(brave.elapsed) #the time taken to create the html page class\nprint(brave.title) #title of the html page\n```\nThis is not limited to these values alone; there are more values associated with an html page.\n\n#### Building Pages via HTML\n```python\nfrom htmltagparse import HtmlPage\nfrom requests import get\n\nhtmlContent = get(\"https://duckduckgo.com/\").text\nddg = HtmlPage(htmlContent)\nprint(list(ddg.sources)) #output: ['script']\n```\n\n#### Searching A Page\nWith this package, you have the ability to search the html page you have created directly through a function:\n```python\nfrom htmltagparse import build\nimport re\n\nvideoId = \"\"\npage = build.fromUri(\"https://www.youtube.com/watch?v=%s\" % videoId)\ntry:\n #NOTE: the regex function already has re's MULTILINE and DOTALL flags in use\n #get a list of tags to the youtube video via this regex pattern\n videoTags = page.regex(r\"\\\"keywords\\\":(?P<tags>\\[.*?),\\\"channelId\\\":\").group(\"tags\")\n #converting from string to array\n videoTags = re.findall(r\"(?:\\\"|\\')(?P<tag>.*?)(?:\\'|\\\")(?:\\,|\\])\", videoTags)\nexcept:\n videoTags = \"no tags found\"\n\nprint(videoTags)\n```\n\nAnother way you could get tags from a Youtube video is with the `find` function, example:\n```python\nimport htmltagparse\n\nvideoId = \"\" #video id here\nyt = htmltagparse.build.fromUri(\"https://www.youtube.com/watch?v=%s\" % videoId)\nelTagOpening = yt.find(\"meta\", attrs={\"name\": \"keywords\"})[0]\nvideoKeywords = htmltagparse.getElementAttributeValue(elTagOpening, \"content\").split(\", \")\nprint(videoKeywords) # tags of the youtube video\n```\n\n## Developers\n### Building to Wheel File\n- cd into root directory of this repository\n- run `python3 -m build`\n\n> [!NOTE]\n> Errors building this package may be due to this packages requirements, if this occurs, use `python3 -m build -n` instead.\n\n### Contributions\nMust not include:\n- Major changes\n- Breaking code\n- Changes to version number\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A tool designed to quickly parse html tags and elements.",
"version": "3.1",
"project_urls": {
"Homepage": "https://github.com/xyzpw/htmltagparse/"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "09380649c7481503dc535899071fc2f17b7bd420dfd4b10e6810f05bf226839f",
"md5": "5205d29e58a57b411d24c6d274dfc1fc",
"sha256": "4f80e340c7e8a027a3f1fac0f474949599505b35f70ac1d6679cc9f77671502d"
},
"downloads": -1,
"filename": "htmltagparse-3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5205d29e58a57b411d24c6d274dfc1fc",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 9661,
"upload_time": "2024-08-09T09:08:36",
"upload_time_iso_8601": "2024-08-09T09:08:36.322671Z",
"url": "https://files.pythonhosted.org/packages/09/38/0649c7481503dc535899071fc2f17b7bd420dfd4b10e6810f05bf226839f/htmltagparse-3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "131e35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e",
"md5": "3a635b84298f0ee2c07fb89fe0689cbd",
"sha256": "4f45a0282a0269f285ec8e60be5886421c1f7ae27816e9818f9ec79f71aff5bb"
},
"downloads": -1,
"filename": "htmltagparse-3.1.tar.gz",
"has_sig": false,
"md5_digest": "3a635b84298f0ee2c07fb89fe0689cbd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 9603,
"upload_time": "2024-08-09T09:08:37",
"upload_time_iso_8601": "2024-08-09T09:08:37.645770Z",
"url": "https://files.pythonhosted.org/packages/13/1e/35a2844ce4ffeecca03476114fc999cbaa961fdaf0e72b5298f2671b8a5e/htmltagparse-3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-09 09:08:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "xyzpw",
"github_project": "htmltagparse",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "htmltagparse"
}