easyscrapper

Name: easyscrapper
Version: 1.0.0
Home page: https://github.com/krishnatadi/easyscrapper
Summary: EasyScrapper is a simple and effective Python package for web scraping. It allows you to fetch and extract data from any website without the hassle of complex parsing logic. This package is designed for developers who need quick and reliable web scraping solutions.
Upload time: 2024-11-01 05:54:52
Author: Krishna Tadi
Requires Python: >=3.6
License: MIT
Keywords: web scraping, data extraction, html parser, scraping
Requirements: no requirements were recorded.
# EasyScrapper

**EasyScrapper** is a simple and effective Python package for web scraping. It allows you to fetch and extract data from any website without the hassle of complex parsing logic. This package is designed for developers who need quick and reliable web scraping solutions.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [WebScraper Class Overview](#webscraper-class-overview)
- [Usage](#usage)
  - [Fetching Raw Content](#1-fetching-raw-content)
  - [Providing Parsed Data](#2-providing-parsed-data)
  - [Extracting URLs](#3-extracting-urls)
  - [Saving Data to a Text File](#4-saving-data-to-a-text-file)
- [Use Cases](#use-cases)

## Features

- **Fetch Raw Content**: Get the full HTML content from any webpage.
- **Parse Content**: Extract headings and paragraphs as a single string without formatting.
- **User-Friendly**: EasyScrapper is designed to be intuitive and straightforward, making it perfect for both beginners and experienced developers.
- **Flexible**: With functionalities to extract raw content, parsed data, and URLs, it caters to various scraping needs.
- **Data Preservation**: Save scraped data to text files for future analysis or reporting.


## Installation

To install the EasyScrapper package, you can use `pip`. Run the following command:

```bash
pip install easyscrapper
```


## WebScraper Class Overview

| Method                | Description                                                                 | Example                                                       |
|-----------------------|-----------------------------------------------------------------------------|---------------------------------------------------------------|
| `__init__(url, user_agent=None)` | Initializes the `WebScraper` instance with a URL and optional user agent. | `scraper = WebScraper("https://example.com")`              |
| `fetch_content()`     | Fetches the HTML content from the specified URL.                          | `scraper.fetch_content()`                                    |
| `get_raw_content()`   | Returns the entire scraped content without parsing.                       | `raw_content = scraper.get_raw_content()`                    |
| `parse_content()`     | Parses the HTML content and returns extracted headings and paragraphs.     | `parsed_data = scraper.parse_content()`                      |
| `extract_all_data(soup)` | Extracts all text content (headings and paragraphs) from the parsed HTML. | `content = scraper.extract_all_data(soup)`                  |
| `extract_links(soup)` | Extracts all links (URLs) from the HTML content.                         | `links = scraper.extract_links(soup)`                        |
| `save_to_file(data, filename='scraped_data.txt')` | Saves the provided data to a text file.                               | `scraper.save_to_file(parsed_data, 'output.txt')`          |
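The optional `user_agent` parameter appears in the table above but is not demonstrated in the usage examples. Here is a minimal sketch, assuming the argument is passed through as the request's User-Agent header (the agent string itself is only an illustration):

```python
from easyscrapper.scraper import WebScraper

# Hypothetical browser-style User-Agent string; substitute your own.
custom_agent = "Mozilla/5.0 (compatible; MyScraperBot/1.0)"

# Assuming user_agent is sent as the request's User-Agent header,
# the scraper can be initialized like this:
scraper = WebScraper("https://example.com", user_agent=custom_agent)
scraper.fetch_content()
print(scraper.get_raw_content()[:200])  # quick preview
```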



## Usage
### 1. Fetching Raw Content
This functionality retrieves the complete HTML content from a specified URL. You can also limit the output to the first 500 characters for a quick preview.
```python
from easyscrapper.scraper import WebScraper

# Specify the URL to scrape
url = 'https://example.com'  # Replace with your target URL

# Initialize the WebScraper
scraper = WebScraper(url)

# Fetch content from the URL
scraper.fetch_content()

# Get and print the raw content
raw_content = scraper.get_raw_content()
print("Raw Content:")
print(raw_content)  # Print the entire raw content
# Output: Entire HTML content of the webpage

# Get and print the first 500 characters of the raw content
print("\nFirst 500 Characters of Raw Content:")
print(raw_content[:500])  # Print the first 500 characters
# Output: (First 500 characters of the HTML content)
```


### 2. Providing Parsed Data
This functionality extracts headings and paragraphs from the fetched content and returns them as a single string without HTML tags.
The `parse_content()` method processes the raw HTML into clean text, ignoring tags and formatting, so you get a readable summary of the page.
```python
# Get parsed data from the fetched content
parsed_data = scraper.parse_content()
print("Parsed Data (Headings and Paragraphs):")
print(parsed_data)  # Output: Combined text of headings and paragraphs without formatting
```


### 3. Extracting URLs
This functionality retrieves all hyperlinks found on the webpage.
The `extract_links()` method identifies and returns all anchor (`<a>`) tags present in the HTML, providing a list of links for further exploration.
```python
# Extract URLs from the fetched content
urls = scraper.extract_links()
print("Extracted URLs:")
print(urls)  # Output: List of URLs found on the webpage
```
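Links scraped from a page often mix absolute and relative URLs. A short follow-up sketch, using only the Python standard library and assuming `urls` from the snippet above is a list of href strings:

```python
from urllib.parse import urljoin

base_url = 'https://example.com'  # the page that was scraped

# Resolve relative hrefs against the page URL so every entry is absolute.
absolute_urls = [urljoin(base_url, link) for link in urls]

# Drop duplicates while preserving the original order.
unique_urls = list(dict.fromkeys(absolute_urls))
print(f"Found {len(unique_urls)} unique links.")
```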


### 4. Saving Data to a Text File
This functionality saves either raw or parsed data to a text file for later use.
The `save_to_file()` method takes the content to save and a target filename, which is useful for archiving or further processing of scraped data.
```python
# Save raw content to a text file
scraper.save_to_file(scraper.get_raw_content(), 'easyscrapper_data.txt')
print("Raw content saved successfully to 'easyscrapper_data.txt'.")

# Save parsed data to a text file
parsed_data = scraper.parse_content()
scraper.save_to_file(parsed_data, 'parsed_data.txt')
print("Parsed data saved successfully to 'parsed_data.txt'.")
# Output: Confirmation messages indicating successful saving of files
```
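If you scrape the same page repeatedly, a fixed filename overwrites earlier output. One way to keep each run's snapshot, sketched with only the standard library plus the `save_to_file()` call shown above:

```python
from datetime import datetime

# Timestamp each output file so repeated runs keep their own snapshots.
stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
scraper.save_to_file(parsed_data, f'parsed_data_{stamp}.txt')
print(f"Saved snapshot to 'parsed_data_{stamp}.txt'.")
```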


## Use Cases

EasyScrapper is versatile and serves a wide range of users, including researchers, analysts, developers, and marketers. Here are some common use cases:

- **Research**: Easily gather information from multiple web pages for academic or market research. Many students and professionals rely on it to compile data for reports and studies.

- **Data Analysis**: Collect data points from various websites to analyze trends and patterns. Analysts utilize EasyScrapper to extract quantitative and qualitative data for deeper insights.

- **Content Aggregation**: Scrape content from different sources and compile it into a single location for easy access. Content creators and curators use it to gather relevant articles, blogs, and news for their audiences (a sketch of this workflow follows this list).

- **SEO Monitoring**: Extract meta tags and other SEO-related information to monitor website performance. Marketers leverage this functionality to assess competitors and optimize their own content.

- **E-commerce Price Tracking**: Monitor product prices across multiple e-commerce platforms to find the best deals. Shoppers and businesses use this feature for competitive pricing analysis.

- **Job Listings Aggregation**: Collect job postings from various job boards to create a centralized platform for job seekers. Recruiters and job seekers benefit from the aggregated data for better visibility.

With its user-friendly design and robust capabilities, EasyScrapper is trusted by a growing community of users who are enhancing their web scraping efforts across various fields.
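As an illustration of the content-aggregation workflow mentioned above, here is a minimal sketch that loops over a hypothetical list of source pages using only the calls documented in the class overview:

```python
from easyscrapper.scraper import WebScraper

# Hypothetical list of pages to aggregate; replace with real sources.
sources = [
    'https://example.com/page1',
    'https://example.com/page2',
]

combined = []
for url in sources:
    scraper = WebScraper(url)
    scraper.fetch_content()
    combined.append(scraper.parse_content())

# Write all parsed text into one file for later review.
scraper.save_to_file('\n\n'.join(combined), 'aggregated_content.txt')
```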


## Join the Community

If you find EasyScrapper useful, consider giving it a star on GitHub and sharing it with others. Your support helps us improve and expand our tool!

