# linkpreview
[![Build Status](https://github.com/meyt/linkpreview/actions/workflows/main.yaml/badge.svg)](https://github.com/meyt/linkpreview/actions)
[![Coverage Status](https://coveralls.io/repos/github/meyt/linkpreview/badge.svg?branch=master)](https://coveralls.io/github/meyt/linkpreview?branch=master)
[![pypi](https://img.shields.io/pypi/pyversions/linkpreview.svg)](https://pypi.python.org/pypi/linkpreview)
Get a link preview in Python.

It gathers data from:
1. [OpenGraph](https://ogp.me/) meta tags
2. [TwitterCard](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/overview/abouts-cards) meta tags
3. [Microdata](<https://en.wikipedia.org/wiki/Microdata_(HTML)>) attributes
4. [JSON-LD](https://en.wikipedia.org/wiki/JSON-LD) scripts
5. HTML Generic tags (`h1`, `p`, `img`)
6. URL readable parts
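
To illustrate how a list of sources like this can act as a fallback chain, here is a minimal sketch using only the standard library. It assumes the sources are tried in the listed order, which the list suggests but does not state, and it covers just two of them (OpenGraph and the generic `<title>` tag). `MetaTitleParser` and `pick_title` are hypothetical names, not part of `linkpreview`.

```python
from html.parser import HTMLParser


class MetaTitleParser(HTMLParser):
    """Collect the og:title meta tag and the <title> text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.html_title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title":
            self.og_title = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep only the first chunk of text seen inside <title>.
        if self._in_title and self.html_title is None:
            self.html_title = data.strip()


def pick_title(html):
    """Prefer the OpenGraph title, then fall back to the <title> element."""
    parser = MetaTitleParser()
    parser.feed(html)
    return parser.og_title or parser.html_title


print(pick_title('<meta property="og:title" content="OG"><title>Plain</title>'))
print(pick_title("<title>Plain</title>"))
```

The real library handles many more cases; this only shows the "first source that yields a value wins" idea.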
## Install
```
pip install linkpreview
```
## Usage
### Basic
```python
from linkpreview import link_preview
url = "http://localhost"
content = """
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<!-- ... --->
<title>a title</title>
</head>
<body>
<!-- ... --->
</body>
</html>
"""
preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
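
The attribute names suggest that `absolute_image` and `absolute_favicon` are `image` and `favicon` resolved against the page URL. That reading is an assumption, and the sketch below only shows how such resolution works with the standard library's `urljoin`; `to_absolute` is a hypothetical helper, not a `linkpreview` function.

```python
from urllib.parse import urljoin


def to_absolute(page_url, ref):
    """Resolve a possibly relative reference against the page URL."""
    return urljoin(page_url, ref)


page_url = "http://localhost/articles/post.html"

# Root-relative path: resolved against the host.
print(to_absolute(page_url, "/static/cover.png"))
# Document-relative path: resolved against the page's directory.
print(to_absolute(page_url, "thumb.png"))
# Already absolute: returned unchanged.
print(to_absolute(page_url, "https://cdn.example.com/a.png"))
```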
### Automatically fetch link content
```python
from linkpreview import link_preview
preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
### `lxml` as XML parser
Highly recommended for better performance.

[Install](https://lxml.de/installation.html) `lxml` and use it like this:
```python
from linkpreview import link_preview
preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
### Advanced
```python
from linkpreview import Link, LinkPreview, LinkGrabber
url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
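
The `maxsize` and `chunk_size` arguments point to a download that is read in chunks up to a size cap. The sketch below shows that general technique on an in-memory stream. It is not the library's implementation: `read_capped` is a hypothetical helper, and it truncates at the cap, whereas what `LinkGrabber` does when the limit is reached is not documented here.

```python
import io


def read_capped(stream, maxsize=1048576, chunk_size=1024):
    """Read a binary stream in chunks, stopping once maxsize bytes are collected."""
    chunks = []
    total = 0
    while total < maxsize:
        # Never request more than the remaining budget.
        chunk = stream.read(min(chunk_size, maxsize - total))
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)


body = read_capped(io.BytesIO(b"x" * 5000), maxsize=2048, chunk_size=1024)
print(len(body))  # 2048
```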
Extend default headers:
```python
content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})
```
Ignore default headers:
```python
content, url = grabber.get_content(
    url,
    headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
    replace_headers=True,
)
```
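
The difference between extending and replacing the default headers can be pictured as a dictionary merge. This is a sketch under assumptions: `DEFAULT_HEADERS` and `build_headers` are hypothetical, and the library's actual defaults and merge logic may differ.

```python
# Hypothetical defaults, for illustration only.
DEFAULT_HEADERS = {"user-agent": "example-agent/1.0", "accept": "text/html"}


def build_headers(headers=None, replace_headers=False):
    """Return the headers to send: the defaults updated by `headers`, or `headers` alone."""
    if replace_headers:
        return dict(headers or {})
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})
    return merged


# Extending: the default "accept" header is kept, "user-agent" is overridden.
print(build_headers({"user-agent": "Twitterbot"}))
# Replacing: only the given headers are sent.
print(build_headers({"user-agent": "Twitterbot"}, replace_headers=True))
```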
Use preset headers:
```python
content, url = grabber.get_content(url, headers='googlebot')
```
Available presets:
`firefox`,
`chrome`,
`googlebot`,
`twitterbot`,
`telegrambot`,
`imessagebot`
If you already have a parsed `BeautifulSoup` object:
```python
from bs4 import BeautifulSoup
from linkpreview import Link, LinkPreview
url = "http://example.com"
content = "<h1>Hello</h1>"
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)
```
## Raw data
{
"_id": null,
"home_page": null,
"name": "linkpreview",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "link preview web htmlparse schema.org opengraph twittercard url",
"author": "MeyT",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/15/e0/7add03bd40f7f20dc5661e11e6e2137dc0a1062b01070699b420859de899/linkpreview-0.11.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Get link (URL) preview",
"version": "0.11.0",
"project_urls": null,
"split_keywords": [
"link",
"preview",
"web",
"htmlparse",
"schema.org",
"opengraph",
"twittercard",
"url"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a14b04c4740668ee84b37a2cb7d5e38111a399407a7ac81bc1c3e7efe2950b94",
"md5": "bd3128d1ac9d37f50d52fba5c0621847",
"sha256": "9f4dbd9abf0cdff6a5c8ca0e4133509c02ecf531ed6ea8c9e31da7e1cc510e8e"
},
"downloads": -1,
"filename": "linkpreview-0.11.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "bd3128d1ac9d37f50d52fba5c0621847",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 21654,
"upload_time": "2024-09-27T18:52:04",
"upload_time_iso_8601": "2024-09-27T18:52:04.197896Z",
"url": "https://files.pythonhosted.org/packages/a1/4b/04c4740668ee84b37a2cb7d5e38111a399407a7ac81bc1c3e7efe2950b94/linkpreview-0.11.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "15e07add03bd40f7f20dc5661e11e6e2137dc0a1062b01070699b420859de899",
"md5": "19f8dbac1eabf0d14bed400a42ff08d9",
"sha256": "af30d3d1d86358d8fce9fa7bf9976f0a7ef0b213645072f58e916a87782ccbb5"
},
"downloads": -1,
"filename": "linkpreview-0.11.0.tar.gz",
"has_sig": false,
"md5_digest": "19f8dbac1eabf0d14bed400a42ff08d9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 15277,
"upload_time": "2024-09-27T18:52:05",
"upload_time_iso_8601": "2024-09-27T18:52:05.481578Z",
"url": "https://files.pythonhosted.org/packages/15/e0/7add03bd40f7f20dc5661e11e6e2137dc0a1062b01070699b420859de899/linkpreview-0.11.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-27 18:52:05",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "linkpreview"
}