ghananews-scraper


| Field           | Value                                                                                                                   |
|-----------------|-------------------------------------------------------------------------------------------------------------------------|
| Name            | ghananews-scraper                                                                                                       |
| Version         | 1.0.27                                                                                                                  |
| Home page       | https://github.com/donwany/ghananews-scraper                                                                            |
| Summary         | A python package to scrape data from Ghana News Portals                                                                 |
| Upload time     | 2023-11-17 01:49:37                                                                                                     |
| Author          | Theophilus Siameh                                                                                                       |
| Requires Python | >=3.7                                                                                                                   |
| License         | MIT License                                                                                                             |
| Keywords        | scraper, data, ghananews, ghanaweb, joynews, myjoyonline, news, yen, mynewsgh, threenews, web scraper, ghana scraper   |
| Requirements    | No requirements were recorded.                                                                                          |

![Python 3.7, 3.8, 3.9](https://img.shields.io/badge/Python-3.7%2C%203.8%2C%203.9-3776ab.svg?maxAge=2592000)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)

### GhanaNews Scraper
  A simple, unofficial Python package to scrape data from
  [GhanaWeb](https://www.ghanaweb.com),
  [MyJoyOnline](https://www.myjoyonline.com),
  [DailyGraphic](https://www.graphic.com.gh),
  [CitiBusinessNews](https://citibusinessnews.com),
  [YenGH](https://www.yen.com.gh),
  [3News](https://www.3news.com),
  [MyNewsGh](https://www.mynewsgh.com), and
  [PulseGh](https://www.pulse.com.gh).
  Affiliated with [Bank of Ghana Fx Rates](https://pypi.org/project/bank-of-ghana-fx-rates/) and
  [GhanaShops-Scraper](https://pypi.org/project/ghanashops-scraper/).

### NOTE: `This library may keep changing due to changes to the respective websites.`

### How to install
```shell
pip install ghananews-scraper
```

### Example Google Colab Notebook
   Click Here: [![Google Colab Notebook](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jZo8TUictFbZBCglXWxZPtbVDYtp_eUn?usp=sharing)

![#f03c15](https://placehold.co/15x15/f03c15/f03c15.png) `Warning: DO NOT RUN GHANAWEB CODE IN ONLINE Google Colab.`

https://colab.research.google.com/drive/1zZUIyp9zBhwL5CqHS3Ggf5vJCr_yTYw0?usp=sharing

### Some GhanaWeb URLs:
```python
urls = [
    "https://www.ghanaweb.com/GhanaHomePage/regional/",
    "https://www.ghanaweb.com/GhanaHomePage/editorial/",
    "https://www.ghanaweb.com/GhanaHomePage/health/",
    "https://www.ghanaweb.com/GhanaHomePage/diaspora/",
    "https://www.ghanaweb.com/GhanaHomePage/tabloid/",
    "https://www.ghanaweb.com/GhanaHomePage/africa/",
    "https://www.ghanaweb.com/GhanaHomePage/religion/",
    "https://www.ghanaweb.com/GhanaHomePage/NewsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/business/",
    "https://www.ghanaweb.com/GhanaHomePage/SportsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/entertainment/",
    "https://www.ghanaweb.com/GhanaHomePage/television/",
]
```
### Outputs
  - All outputs are saved to a `.csv` file. Other file formats are not yet supported.
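
Since every scraper writes a `.csv` file, the results can be inspected with the standard library alone. The following is only a minimal sketch: it assumes the files land in the directory you point it at, and because the file names and column headers are not documented here, the hypothetical helper `summarize_csv_files` simply reports whatever it finds.

```python
import csv
from pathlib import Path


def summarize_csv_files(directory="."):
    """Return {file name: (header row, number of data rows)} for each CSV in `directory`."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])           # assumption: the first row is a header
            row_count = sum(1 for _ in reader)  # everything after it counts as data
        summary[path.name] = (header, row_count)
    return summary


if __name__ == "__main__":
    # Inspect the current working directory, the default download location.
    for name, (columns, rows) in summarize_csv_files().items():
        print(f"{name}: {rows} rows, columns={columns}")
```

Pass the same path you gave to `download(output_dir=...)` if you saved the files somewhere other than the current working directory.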

### Usage
```python
from ghanaweb.scraper import GhanaWeb

url = 'https://www.ghanaweb.com/GhanaHomePage/politics/'
# url = "https://www.ghanaweb.com/GhanaHomePage/NewsArchive/"
# url = 'https://www.ghanaweb.com/GhanaHomePage/health/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/crime/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/regional/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'

# web = GhanaWeb(url='https://www.ghanaweb.com/GhanaHomePage/politics/')
web = GhanaWeb(url=url)
# scrape data and save to `current working dir`
web.download(output_dir=None)
```
### Scrape list of articles from [GhanaWeb](https://ghanaweb.com)
```python
from ghanaweb.scraper import GhanaWeb

urls = [
        'https://www.ghanaweb.com/GhanaHomePage/politics/',
        'https://www.ghanaweb.com/GhanaHomePage/health/',
        'https://www.ghanaweb.com/GhanaHomePage/crime/',
        'https://www.ghanaweb.com/GhanaHomePage/regional/',
        'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'
    ]

for url in urls:
    print(f"Downloading: {url}")
    web = GhanaWeb(url=url)
    # download to current working directory
    # if no location is specified
    # web.download(output_dir="/Users/tsiameh/Desktop/")
    web.download(output_dir=None)
```

### Scrape data from [MyJoyOnline](https://myjoyonline.com)
  + It is recommended to use option 1. Option 2 might run into timeout issues.
  + DO NOT RUN THIS IN a `Google Colab` notebook. Because it depends on the `selenium` package, run it from a `.py` script, Visual Studio Code, or a terminal instead.
  + You may pass `driver_name="chrome"` or `driver_name="firefox"`.
```python

# Option 1.
from myjoyonline.scraper import MyJoyOnlineNews

url = 'https://myjoyonline.com/politics'
print(f"Downloading data from: {url}")
joy = MyJoyOnlineNews(url=url)
# joy = MyJoyOnlineNews(url=url, driver_name="firefox")
joy.download()


# Option 2.
from myjoyonline.scraper import MyJoyOnlineNews

urls = [
        'https://myjoyonline.com/news',
        'https://myjoyonline.com/politics',
        'https://myjoyonline.com/entertainment',
        'https://myjoyonline.com/business',
        'https://myjoyonline.com/sports',
        'https://myjoyonline.com/opinion',
        'https://myjoyonline.com/technology'
    ]

for url in urls:
    print(f"Downloading data from: {url}")
    joy = MyJoyOnlineNews(url=url)
    # download to current working directory
    # if no location is specified
    # joy.download(output_dir="/Users/tsiameh/Desktop/")
    joy.download()

```
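
Because looping over many URLs (option 2) can hit timeouts, it may help to isolate failures so that one bad URL does not abort the whole run. The sketch below is an illustration, not part of the package: `download_all` and its `retries` and `delay` parameters are hypothetical, and the only thing it relies on is the `Scraper(url=...).download()` pattern shown above.

```python
import time


def download_all(scraper_cls, urls, retries=2, delay=5.0):
    """Call scraper_cls(url=url).download() for each URL, retrying on failure.

    Returns the list of URLs that still failed after all attempts.
    """
    failed = []
    for url in urls:
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            try:
                scraper_cls(url=url).download()
                break  # success: move on to the next URL
            except Exception as error:  # broad on purpose: exception types are not documented
                print(f"Attempt {attempt} failed for {url}: {error}")
                if attempt <= retries:
                    time.sleep(delay)  # pause before the next attempt
        else:
            failed.append(url)  # no attempt succeeded
    return failed
```

With the imports from option 2 in place, `failed = download_all(MyJoyOnlineNews, urls)` would run the same loop and return the URLs worth retrying later.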

### Scrape data from [CitiBusinessNews](https://citibusinessnews.com)
  + Here is a list of some publisher names:
    + `citibusinessnews`
    + `aklama`
    + `ellen`
    + `emmanuel-oppong`
    + `nerteley`
    + `edna-agnes-boakye`
    + `nii-larte-lartey`
    + `naa-shika-caesar`
    + `ogbodu`
      * Note: using publisher names fetches more data than the url.

```python
from citionline.scraper import CitiBusinessOnline

urls = [
    "https://citibusinessnews.com/ghanabusinessnews/features/",
    "https://citibusinessnews.com/ghanabusinessnews/telecoms-technology/",
    "https://citibusinessnews.com/ghanabusinessnews/international/",
    "https://citibusinessnews.com/ghanabusinessnews/news/government/",
    "https://citibusinessnews.com/ghanabusinessnews/news/",
    "https://citibusinessnews.com/ghanabusinessnews/business/",
    "https://citibusinessnews.com/ghanabusinessnews/news/economy/",
    "https://citibusinessnews.com/ghanabusinessnews/news/general/",
    "https://citibusinessnews.com/ghanabusinessnews/news/top-stories/",
    "https://citibusinessnews.com/ghanabusinessnews/business/tourism/"
]

for url in urls:
    print(f"Downloading data from: {url}")
    citi = CitiBusinessOnline(url=url)
    citi.download()

# OR: scrape using publisher name
from citionline.authors import CitiBusiness

citi = CitiBusiness(author="citibusinessnews", limit_pages=4)
citi.download()

```

### Scrape data from [DailyGraphic](https://www.graphic.com.gh/)
```python
from graphiconline.scraper import GraphicOnline

urls = [
    "https://www.graphic.com.gh/news.html",
    "https://www.graphic.com.gh/news/politics.html",
    "https://www.graphic.com.gh/lifestyle.html",
    "https://www.graphic.com.gh/news/education.html",
    "https://www.graphic.com.gh/native-daughter.html",
    "https://www.graphic.com.gh/international.html"
]

for url in urls:
    print(f"Downloading data from: {url}")
    graphic = GraphicOnline(url=url)
    graphic.download()
```

### Scrape data from [YenGH](https://www.yen.com.gh/)
```python

# OPTION: 1

from yen.scrapy import YenNews

url = 'https://www.yen.com.gh/'
print(f"Downloading data from: {url}")
yen = YenNews(url=url)
yen.download()

# OPTION: 2

from yen.scrapy import YenNews

urls = [
    'https://www.yen.com.gh/',
    'https://yen.com.gh/politics/',
    'https://yen.com.gh/world/europe/',
    'https://yen.com.gh/education/',
    'https://yen.com.gh/ghana/',
    'https://yen.com.gh/people/',
    'https://yen.com.gh/world/asia/',
    'https://yen.com.gh/world/africa/',
    'https://yen.com.gh/entertainment/',
    'https://yen.com.gh/business-economy/money/',
    'https://yen.com.gh/business-economy/technology/'
]

for url in urls:
    print(f"Downloading data from: {url}")
    yen = YenNews(url=url)
    yen.download()
```

### Scrape data from [MyNewsGh](https://mynewsgh.com)
```python
from mynewsgh.scraper import MyNewsGh

# scrape from multiple URLs
urls = [
  "https://www.mynewsgh.com/category/politics/",
  "https://www.mynewsgh.com/category/news/",
  "https://www.mynewsgh.com/category/entertainment/",
  "https://www.mynewsgh.com/category/business/",
  "https://www.mynewsgh.com/category/lifestyle/",
  "https://www.mynewsgh.com/tag/feature/",
  "https://www.mynewsgh.com/category/world/",
  "https://www.mynewsgh.com/category/sports/"
]

for url in urls:
    print(f"Downloading data from: {url}")
    my_news = MyNewsGh(url=url, limit_pages=50)
    my_news.download()

# scrape from a single URL
from mynewsgh.scraper import MyNewsGh

url = "https://www.mynewsgh.com/category/politics/"
my_news = MyNewsGh(url=url, limit_pages=None)
my_news.download()
```
### Scrape data from [3News](https://3news.com)
```python
from threenews.scraper import ThreeNews

# DO NOT RUN ALL AUTHORS: select ONLY a few
# DO NOT CHANGE THE AUTHOR NAMES
authors = [
  "laud-nartey",
  "3xtra",
  "essel-issac",
  "arabaincoom",
  "bbc",
  "betty-kankam-boadu",
  "kwameamoh",
  "fiifi_forson",
  "fdoku",
  "frankappiah",
  "godwin-asediba",
  "afua-somuah",
  "irene",
  "joyce-sesi",
  "3news_user",
  "ntollo",
  "pwaberi-denis",
  "sonia-amade",
  "effah-steven",
  "michael-tetteh"
]

for author in authors:
    print(f"Downloading data from author: {author}")
    three_news = ThreeNews(author=author, limit_pages=50)
    three_news.download()
    
# OR
from threenews.scraper import ThreeNews

three = ThreeNews(author="laud-nartey", limit_pages=None)
three.download()

```
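
One way to follow the advice to run only a few authors at a time is to split the list into small batches and process one batch per run. This is only a sketch: `in_batches` is a hypothetical helper, not part of `threenews`, and the batch size of 3 is an arbitrary choice.

```python
def in_batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# A shortened copy of the author list above, used for illustration only.
authors = ["laud-nartey", "3xtra", "bbc", "irene", "ntollo"]

# Take only the first batch for this run; later batches can be run separately.
first_batch = next(in_batches(authors, 3))
print(first_batch)
```

Each author in `first_batch` can then be passed to `ThreeNews(author=author, limit_pages=...)` exactly as in the loop above.
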
### Scrape data from [PulseGh](https://pulse.com.gh)
  + Select ONLY a few URLs
    * Note: these values may change

    | Category                  | Number of Pages |
    |---------------------------|-----------------|
    | News                      | 40              |
    | Entertainment             | 40              |
    | Business                  | 40              |
    | Lifestyle                 | 40              |
    | Business/Domestic         | 26              |
    | Business/International    | 40              |
    | Sports/Football           | 99              |
    | News/Politics             | 40              |
    | News/Local                | 40              |
    | News/World                | 40              |
    | News/Filla                | 38              |
    | Entertainment/Celebrities | 40              |
    | Lifestyle/Fashion         | 40              |

```python
from pulsegh.scraper import PulseGh

urls = [
  "https://www.pulse.com.gh/news",
  "https://www.pulse.com.gh/news/politics",
  "https://www.pulse.com.gh/entertainment",
  "https://www.pulse.com.gh/lifestyle",
  "https://www.pulse.com.gh/sports",
  "https://www.pulse.com.gh/sports/football",
  "https://www.pulse.com.gh/business/international",
  "https://www.pulse.com.gh/business/domestic",
  "https://www.pulse.com.gh/business",
  "https://www.pulse.com.gh/quizzes",
  "https://www.pulse.com.gh/news/filla",
  "https://www.pulse.com.gh/news/world"
]

for url in urls:
    print(f"Downloading data from: {url}")
    pulse = PulseGh(url=url, limit_pages=5)
    pulse.download()
    
# news has 40 pages
from pulsegh.scraper import PulseGh

pulse = PulseGh(url="https://www.pulse.com.gh/news", total_pages=40, limit_pages=20)
pulse.download()

# Sports/football has 99 pages
from pulsegh.scraper import PulseGh
pulse = PulseGh(url="https://www.pulse.com.gh/sports/football", total_pages=99, limit_pages=None)
pulse.download()

```
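
The page counts in the table can be kept next to the URLs so that `limit_pages` never exceeds what a category offers. This is a sketch under stated assumptions: `PAGE_COUNTS` transcribes a few table rows (values that may change), the URL for each category is inferred from the URL list above, and `capped_limit` is a hypothetical helper, not part of `pulsegh`.

```python
# Page counts transcribed from the table above; they may be out of date.
PAGE_COUNTS = {
    "https://www.pulse.com.gh/news": 40,
    "https://www.pulse.com.gh/business/domestic": 26,
    "https://www.pulse.com.gh/sports/football": 99,
    "https://www.pulse.com.gh/news/filla": 38,
}


def capped_limit(url, wanted):
    """Return `wanted`, reduced to the known page total for `url` if there is one."""
    total = PAGE_COUNTS.get(url)
    if total is None:
        return wanted  # unknown category: leave the request unchanged
    return min(wanted, total)


if __name__ == "__main__":
    for url in PAGE_COUNTS:
        print(url, capped_limit(url, wanted=30))
```

The result can be passed straight through, for example `PulseGh(url=url, total_pages=PAGE_COUNTS[url], limit_pages=capped_limit(url, 30))`.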

BuyMeCoffee
-----------
[![Build](https://www.buymeacoffee.com/assets/img/custom_images/yellow_img.png)](https://www.buymeacoffee.com/theodondrew)

Credits
-------
-  `Theophilus Siameh`
<div>
    <a href="https://twitter.com/tsiameh"><img src="https://img.shields.io/twitter/follow/tsiameh?color=blue&logo=twitter&style=flat" alt="tsiameh twitter"></a>
</div>

            
