SearchURL

Name: SearchURL
Version: 1.1.4
Home page: None
Summary: SearchURL lets you perform Keyword, Fuzzy and Semantic Search through the text on websites using their URLs.
Upload time: 2024-10-09 08:38:49
Maintainer: None
Docs URL: None
Author: Pahul Gogna
Requires Python: >=3.10
License: MIT
Keywords: python, web-scraping, search, searchurl, fuzzy matching, fuzzy search, web scraping, semantic search, nlp, natural language processing
Requirements: No requirements were recorded.
            
<h2 align="center">
    SearchURL
</h2>

<h5 align="center">
    <br/>
    <br/>
    <img src="https://img.shields.io/pepy/dt/SearchURL"/>
    <img src="https://img.shields.io/pypi/v/SearchURL"/>
    <img src="https://img.shields.io/pypi/status/SearchURL"/>
    <img src="https://img.shields.io/pypi/l/SearchURL"/>
    <a href="https://github.com/pahulgogna/SearchURL">github-searchURL</a>
</h5>

## Installation
Install SearchURL with pip

```bash
  pip install SearchURL
```
    
## Documentation

**1. Getting all the text from a webpage by passing no keywords:**
```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping"
    )

print(data)
```
**output:** {'success': True, 'data': 'Web scraping - Wikipedia ...'}
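
On success the result is a plain dict, so the page text can be pulled out with `.get()`. A minimal sketch of consuming it (the word count is just an illustration):

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrl(url="https://en.wikipedia.org/wiki/Web_scraping")

# on success, the full page text sits under the 'data' key
if data.get('success'):
    text = data['data']
    print(f"fetched {len(text.split())} words of page text")
```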

**2. Searching with keywords:**

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal'])

print(data)
```
**output:** {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}
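
Since `keywords` is a list, several terms can presumably be searched in one call. A sketch under that assumption (the extra keyword 'copyright' is an illustrative choice, not from the package docs):

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

# pass several terms at once; 'copyright' is an illustrative extra keyword
data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal', 'copyright'])

if data.get('success'):
    print(data.get('data'))
```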

**3. Fuzzy Searching:**

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal'])


print(data)
```
**output:** {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major  legal claims  to prevent undesired web scraping: (1) copyright ...'}
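
The point of fuzzy matching is tolerance to near-misses, so a misspelled keyword should still find the relevant section. A sketch under that assumption (not verified against the library's scoring):

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

# 'leagal' is deliberately misspelled; fuzzy matching should still hit the
# "Legal issues" section, where an exact keyword search would likely miss
data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['leagal'])

print(data)
```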


**4. Semantic Search:**

Yes, this package supports Semantic Search!

```python
from SearchURL.main import SearchURL

search = SearchURL(createVector=True) # creates an in-memory vector database using chromadb

data = search.createEmbededData("https://en.wikipedia.org/wiki/Web_scraping") # loads and embeds all the data from the webpage.

if data.get('success'): # data = {'success': True, 'db': db}
    db = data.get('db') 
    results = db.query(keywords=['benefits', 'what benefits can we get from web scraping'], limit=10)
    print(results)

else:
    print(data.get('detail')) # data = {'success': False, 'detail': 'ERROR'}
```
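
Putting the pieces together, the embed-then-query flow fits naturally into a small helper. A sketch that reuses only the calls shown above (the function name `semantic_search` is my own, not part of the package):

```python
from SearchURL.main import SearchURL

def semantic_search(url: str, queries: list[str], limit: int = 10):
    """Embed a page once, then run semantic queries against it.

    The helper's name and shape are illustrative; only the SearchURL
    calls below come from the documented API.
    """
    search = SearchURL(createVector=True)
    data = search.createEmbededData(url)

    if not data.get('success'):
        raise RuntimeError(data.get('detail'))

    return data.get('db').query(keywords=queries, limit=limit)

results = semantic_search(
    "https://en.wikipedia.org/wiki/Web_scraping",
    ['benefits of web scraping'])
print(results)
```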

## Errors
If this package runs into an error while fetching or searching, it returns an object like this:
`{'success': False, 'detail': 'The error that occurred'}`
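
That shape makes error handling a simple `success` check before touching `data`. A sketch (the URL is deliberately unreachable to trigger the failure branch):

```python
from SearchURL.main import SearchURL

search = SearchURL(cache=True)

# an unreachable URL should take the failure branch
data = search.searchUrl(url="https://this-domain-does-not-exist.invalid")

if data.get('success'):
    print(data.get('data'))
else:
    print("Search failed:", data.get('detail'))
```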

***

The URL used in this readme links to the article on [Web scraping](https://en.wikipedia.org/wiki/Web_scraping) at [wikipedia.org](https://wikipedia.org).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "SearchURL",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "python, web-scraping, search, searchurl, SearchURL, Fuzzy matching, fuzzy search, web, scraping, semantic search, nlp, natural language processing",
    "author": "Pahul Gogna",
    "author_email": "pahulgogna@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/53/f5/94f0b2e9557d3b1a14242d4fbc39a3f6155a404c02e062b92cf72dfa863a/SearchURL-1.1.4.tar.gz",
    "platform": null,
    "description": "\r\n<h2 align=\"center\">\r\n    SearchURL\r\n</h2>\r\n\r\n<h5 align=\"center\">\r\n    <br/>\r\n    <br/>\r\n    <img src=\"https://img.shields.io/pepy/dt/SearchURL\"/>\r\n    <img src=\"https://img.shields.io/pypi/v/SearchURL\"/>\r\n    <img src=\"https://img.shields.io/pypi/status/SearchURL\"/>\r\n    <img src=\"https://img.shields.io/pypi/l/SearchURL\"/>\r\n    <a href=\"https://github.com/pahulgogna/SearchURL\">github-searchURL</a>\r\n</h5>\r\n\r\n## Installation\r\nInstall SearchURL with pip\r\n\r\n```bash\r\n  pip install SearchURL\r\n```\r\n    \r\n## Documentation\r\n\r\n**1. Getting all the text from a webpage by not passing in keywords:**\r\n```python\r\nfrom SearchURL.main import SearchURL\r\n\r\nsearch = SearchURL(cache=True)\r\n\r\ndata = search.searchUrl(\r\n    url=\"https://en.wikipedia.org/wiki/Web_scraping\"\r\n    )\r\n\r\nprint(data)\r\n```\r\n**output:** {'success': True, 'data': 'Web scraping - Wikipedia ...'}\r\n\r\n**2. Searching with keywords:**\r\n\r\n```python\r\nfrom SearchURL.main import SearchURL\r\n\r\nsearch = SearchURL(cache=True)\r\n\r\ndata = search.searchUrl(\r\n    url=\"https://en.wikipedia.org/wiki/Web_scraping\",\r\n    keywords=['legal'])\r\n\r\nprint(data)\r\n```\r\n**output:** {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}\r\n\r\n**3. Fuzzy Searching:**\r\n\r\n```python\r\nfrom SearchURL.main import SearchURL\r\n\r\nsearch = SearchURL(cache=True)\r\n\r\ndata = search.searchUrlFuzz(\r\n    url=\"https://en.wikipedia.org/wiki/Web_scraping\",\r\n    keywords=['legal'])\r\n\r\n\r\nprint(data)\r\n```\r\n**output:** {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major  legal claims  to prevent undesired web scraping: (1) copyright ...'}\r\n\r\n\r\n**4. Semantic Search:**\r\nYes, this package supports Semantic Search! \r\n\r\n```python\r\nfrom SearchURL.main import SearchURL\r\n\r\nsearch = SearchURL(createVector=True) # creates a in-memory vector database using chromadb\r\n\r\ndata = search.createEmbededData(\"https://en.wikipedia.org/wiki/Web_scraping\") # loads and embeds all the data from the webpage.\r\n\r\nif data.get('success'): # data = {'success': True, 'db': db}\r\n    db = data.get('db') \r\n    results = db.query(keywords=['benefits', 'what benifits can we get from web scraping'], limit=10)\r\n    print(results)\r\n\r\nelse:\r\n    print(data.get('detail')) # data = {'success': False, 'detail': 'ERROR'}\r\n```\r\n\r\n## Errors\r\nIf this package runs into some error while fetching and searching, it will return an object like this: \r\n{'success': False, 'detail': 'The error that occurred'}\r\n\r\n***\r\n***\r\n\r\n####\r\nThe URL used in this readme is a link to an article on [wikipedia.org](https://wikipedia.org) on the topic of [Web_scraping](https://en.wikipedia.org/wiki/Web_scraping).\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "SearchURL lets perform Keyword, Fuzzy and Semantic Search through the text on websites using thier URLs.",
    "version": "1.1.4",
    "project_urls": null,
    "split_keywords": [
        "python",
        " web-scraping",
        " search",
        " searchurl",
        " searchurl",
        " fuzzy matching",
        " fuzzy search",
        " web",
        " scraping",
        " semantic search",
        " nlp",
        " natural language processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "38185f5e4c07407a241fc6fcef37ca0b6f8c35949055ed094077f75c40b61580",
                "md5": "7b4f3843cc5f92bfd717d5eba4cef250",
                "sha256": "c6b8d848459dac364978a40d1af6a8476e6643a049bb9818e79a24e59ffa3f4a"
            },
            "downloads": -1,
            "filename": "SearchURL-1.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7b4f3843cc5f92bfd717d5eba4cef250",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 5848,
            "upload_time": "2024-10-09T08:38:47",
            "upload_time_iso_8601": "2024-10-09T08:38:47.637679Z",
            "url": "https://files.pythonhosted.org/packages/38/18/5f5e4c07407a241fc6fcef37ca0b6f8c35949055ed094077f75c40b61580/SearchURL-1.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "53f594f0b2e9557d3b1a14242d4fbc39a3f6155a404c02e062b92cf72dfa863a",
                "md5": "af416fb8d90552fa7f6546b49d128971",
                "sha256": "3e909e16dd75a734e00b6e358a2a663608c24a3ae288e289c55d18a065e48b4f"
            },
            "downloads": -1,
            "filename": "SearchURL-1.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "af416fb8d90552fa7f6546b49d128971",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 5411,
            "upload_time": "2024-10-09T08:38:49",
            "upload_time_iso_8601": "2024-10-09T08:38:49.845112Z",
            "url": "https://files.pythonhosted.org/packages/53/f5/94f0b2e9557d3b1a14242d4fbc39a3f6155a404c02e062b92cf72dfa863a/SearchURL-1.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-09 08:38:49",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "searchurl"
}
        