![Python 3.7, 3.8, 3.9](https://img.shields.io/badge/Python-3.7%2C%203.8%2C%203.9-3776ab.svg?maxAge=2592000)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
### GhanaNews Scraper
A simple, unofficial Python package to scrape data from
[GhanaWeb](https://www.ghanaweb.com),
[MyJoyOnline](https://www.myjoyonline.com),
[DailyGraphic](https://www.graphic.com.gh),
[CitiBusinessNews](https://citibusinessnews.com),
[YenGH](https://www.yen.com.gh),
[3News](https://www.3news.com),
[MyNewsGh](https://www.mynewsgh.com), and
[PulseGh](https://www.pulse.com.gh).
Related packages: [Bank of Ghana Fx Rates](https://pypi.org/project/bank-of-ghana-fx-rates/) and
[GhanaShops-Scraper](https://pypi.org/project/ghanashops-scraper/)
### NOTE: `This library may change frequently as the underlying websites change.`
### How to install
```shell
pip install ghananews-scraper
```
### Example Google Colab Notebook
Click Here: [![Google Colab Notebook](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jZo8TUictFbZBCglXWxZPtbVDYtp_eUn?usp=sharing)
![#f03c15](https://placehold.co/15x15/f03c15/f03c15.png) `Warning: DO NOT run the GhanaWeb code in an online Google Colab notebook.`
<https://colab.research.google.com/drive/1zZUIyp9zBhwL5CqHS3Ggf5vJCr_yTYw0?usp=sharing>
### Some GhanaWeb URLs
```python
urls = [
    "https://www.ghanaweb.com/GhanaHomePage/regional/",
    "https://www.ghanaweb.com/GhanaHomePage/editorial/",
    "https://www.ghanaweb.com/GhanaHomePage/health/",
    "https://www.ghanaweb.com/GhanaHomePage/diaspora/",
    "https://www.ghanaweb.com/GhanaHomePage/tabloid/",
    "https://www.ghanaweb.com/GhanaHomePage/africa/",
    "https://www.ghanaweb.com/GhanaHomePage/religion/",
    "https://www.ghanaweb.com/GhanaHomePage/NewsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/business/",
    "https://www.ghanaweb.com/GhanaHomePage/SportsArchive/",
    "https://www.ghanaweb.com/GhanaHomePage/entertainment/",
    "https://www.ghanaweb.com/GhanaHomePage/television/",
]
```
### Outputs
- All outputs are saved to a `.csv` file; other file formats are not yet supported.
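Once a run completes, the resulting `.csv` file can be inspected with Python's standard `csv` module. A minimal sketch is below; the column names (`title`, `url`, `content`) and sample rows are placeholders for illustration, not the package's documented schema, so the demo writes its own small file first and then reads it back the way you would read a real scraper output:

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for a scraper run's CSV output.
# The column names here are assumptions, not the package's actual schema.
rows = [
    {"title": "Sample headline", "url": "https://www.ghanaweb.com/", "content": "Body text"},
    {"title": "Another story", "url": "https://www.ghanaweb.com/", "content": "More text"},
]

path = os.path.join(tempfile.gettempdir(), "ghanaweb_sample.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url", "content"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back as a list of dicts, one per article.
with open(path, newline="", encoding="utf-8") as f:
    articles = list(csv.DictReader(f))

print(f"Loaded {len(articles)} articles; first title: {articles[0]['title']}")
```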
### Usage
```python
from ghanaweb.scraper import GhanaWeb
url = 'https://www.ghanaweb.com/GhanaHomePage/politics/'
# url = "https://www.ghanaweb.com/GhanaHomePage/NewsArchive/"
# url = 'https://www.ghanaweb.com/GhanaHomePage/health/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/crime/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/regional/'
# url = 'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'
# web = GhanaWeb(url='https://www.ghanaweb.com/GhanaHomePage/politics/')
web = GhanaWeb(url=url)
# scrape data and save to `current working dir`
web.download(output_dir=None)
```
### Scrape a list of articles from [GhanaWeb](https://ghanaweb.com)
```python
from ghanaweb.scraper import GhanaWeb
urls = [
    'https://www.ghanaweb.com/GhanaHomePage/politics/',
    'https://www.ghanaweb.com/GhanaHomePage/health/',
    'https://www.ghanaweb.com/GhanaHomePage/crime/',
    'https://www.ghanaweb.com/GhanaHomePage/regional/',
    'https://www.ghanaweb.com/GhanaHomePage/year-in-review/'
]

for url in urls:
    print(f"Downloading: {url}")
    web = GhanaWeb(url=url)
    # downloads to the current working directory
    # if no output location is specified
    # web.download(output_dir="/Users/tsiameh/Desktop/")
    web.download(output_dir=None)
```
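As the note at the top warns, the target sites change over time, so a scrape over many URLs can fail partway through. One defensive pattern (not part of this package) is a small helper that retries each URL and collects the ones that still fail, so one broken section does not abort the whole run. `download_fn` is any callable taking a URL, e.g. `lambda u: GhanaWeb(url=u).download(output_dir=None)`; the `fake_download` stub below only illustrates the behavior:

```python
import time

def download_all(urls, download_fn, retries=1, delay=2.0):
    """Call download_fn(url) for each URL, retrying failures and
    returning the URLs that still failed after all retries."""
    failed = []
    for url in urls:
        for attempt in range(retries + 1):
            try:
                download_fn(url)
                break  # success: move on to the next URL
            except Exception:
                if attempt == retries:
                    failed.append(url)  # give up on this URL
                else:
                    time.sleep(delay)  # brief pause before retrying
    return failed

# Demo with a stub that always fails for one section:
def fake_download(url):
    if "crime" in url:
        raise RuntimeError("layout changed")

failed = download_all(
    ["https://www.ghanaweb.com/GhanaHomePage/politics/",
     "https://www.ghanaweb.com/GhanaHomePage/crime/"],
    fake_download,
    retries=1,
    delay=0.0,
)
print("Failed URLs:", failed)
```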
### Scrape data from [MyJoyOnline](https://myjoyonline.com)
+ Option 1 is recommended; option 2 may run into timeout issues.
+ Because this scraper uses the `selenium` package, do NOT run it in a Google Colab notebook; run it as a `.py` script, in Visual Studio Code, or in a terminal instead.
+ You may pass `driver_name="chrome"` or `driver_name="firefox"`.
```python
# Option 1
from myjoyonline.scraper import MyJoyOnlineNews

url = 'https://myjoyonline.com/politics'
print(f"Downloading data from: {url}")
joy = MyJoyOnlineNews(url=url)
# joy = MyJoyOnlineNews(url=url, driver_name="firefox")
joy.download()

# Option 2
from myjoyonline.scraper import MyJoyOnlineNews

urls = [
    'https://myjoyonline.com/news',
    'https://myjoyonline.com/politics',
    'https://myjoyonline.com/entertainment',
    'https://myjoyonline.com/business',
    'https://myjoyonline.com/sports',
    'https://myjoyonline.com/opinion',
    'https://myjoyonline.com/technology'
]

for url in urls:
    print(f"Downloading data from: {url}")
    joy = MyJoyOnlineNews(url=url)
    # downloads to the current working directory
    # if no output location is specified
    # joy.download(output_dir="/Users/tsiameh/Desktop/")
    joy.download()
```
### Scrape data from [CitiBusinessNews](https://citibusinessnews.com)
+ Here is a list of some publisher names:
  + `citibusinessnews`
  + `aklama`
  + `ellen`
  + `emmanuel-oppong`
  + `nerteley`
  + `edna-agnes-boakye`
  + `nii-larte-lartey`
  + `naa-shika-caesar`
  + `ogbodu`
* Note: scraping by publisher name fetches more data than scraping by URL.
```python
from citionline.scraper import CitiBusinessOnline
urls = [
    "https://citibusinessnews.com/ghanabusinessnews/features/",
    "https://citibusinessnews.com/ghanabusinessnews/telecoms-technology/",
    "https://citibusinessnews.com/ghanabusinessnews/international/",
    "https://citibusinessnews.com/ghanabusinessnews/news/government/",
    "https://citibusinessnews.com/ghanabusinessnews/news/",
    "https://citibusinessnews.com/ghanabusinessnews/business/",
    "https://citibusinessnews.com/ghanabusinessnews/news/economy/",
    "https://citibusinessnews.com/ghanabusinessnews/news/general/",
    "https://citibusinessnews.com/ghanabusinessnews/news/top-stories/",
    "https://citibusinessnews.com/ghanabusinessnews/business/tourism/"
]

for url in urls:
    print(f"Downloading data from: {url}")
    citi = CitiBusinessOnline(url=url)
    citi.download()

# OR: scrape using a publisher name
from citionline.authors import CitiBusiness

citi = CitiBusiness(author="citibusinessnews", limit_pages=4)
citi.download()
```
### Scrape data from [DailyGraphic](https://www.graphic.com.gh/)
```python
from graphiconline.scraper import GraphicOnline
urls = [
    "https://www.graphic.com.gh/news.html",
    "https://www.graphic.com.gh/news/politics.html",
    "https://www.graphic.com.gh/lifestyle.html",
    "https://www.graphic.com.gh/news/education.html",
    "https://www.graphic.com.gh/native-daughter.html",
    "https://www.graphic.com.gh/international.html"
]

for url in urls:
    print(f"Downloading data from: {url}")
    graphic = GraphicOnline(url=url)
    graphic.download()
```
### Scrape data from [YenGH](https://www.yen.com.gh/)
```python
# Option 1
from yen.scrapy import YenNews

url = 'https://www.yen.com.gh/'
print(f"Downloading data from: {url}")
yen = YenNews(url=url)
yen.download()

# Option 2
from yen.scrapy import YenNews

urls = [
    'https://www.yen.com.gh/',
    'https://yen.com.gh/politics/',
    'https://yen.com.gh/world/europe/',
    'https://yen.com.gh/education/',
    'https://yen.com.gh/ghana/',
    'https://yen.com.gh/people/',
    'https://yen.com.gh/world/asia/',
    'https://yen.com.gh/world/africa/',
    'https://yen.com.gh/entertainment/',
    'https://yen.com.gh/business-economy/money/',
    'https://yen.com.gh/business-economy/technology/'
]

for url in urls:
    print(f"Downloading data from: {url}")
    yen = YenNews(url=url)
    yen.download()
```
### Scrape data from [MyNewsGh](https://mynewsgh.com)
```python
from mynewsgh.scraper import MyNewsGh
# scrape from multiple URLs
urls = [
    "https://www.mynewsgh.com/category/politics/",
    "https://www.mynewsgh.com/category/news/",
    "https://www.mynewsgh.com/category/entertainment/",
    "https://www.mynewsgh.com/category/business/",
    "https://www.mynewsgh.com/category/lifestyle/",
    "https://www.mynewsgh.com/tag/feature/",
    "https://www.mynewsgh.com/category/world/",
    "https://www.mynewsgh.com/category/sports/"
]

for url in urls:
    print(f"Downloading data from: {url}")
    my_news = MyNewsGh(url=url, limit_pages=50)
    my_news.download()

# scrape from a single URL
url = "https://www.mynewsgh.com/category/politics/"
my_news = MyNewsGh(url=url, limit_pages=None)
my_news.download()
```
### Scrape data from [3News](https://3news.com)
```python
from threenews.scraper import ThreeNews
# DO NOT run all authors; select only a few
# DO NOT change the author names
authors = [
    "laud-nartey",
    "3xtra",
    "essel-issac",
    "arabaincoom",
    "bbc",
    "betty-kankam-boadu",
    "kwameamoh",
    "fiifi_forson",
    "fdoku",
    "frankappiah",
    "godwin-asediba",
    "afua-somuah",
    "irene",
    "joyce-sesi",
    "3news_user",
    "ntollo",
    "pwaberi-denis",
    "sonia-amade",
    "effah-steven",
    "michael-tetteh"
]

for author in authors:
    print(f"Downloading data from author: {author}")
    three_news = ThreeNews(author=author, limit_pages=50)
    three_news.download()

# OR: scrape a single author
three = ThreeNews(author="laud-nartey", limit_pages=None)
three.download()
```
### Scrape data from [PulseGh](https://pulse.com.gh)
+ Select only a few URLs at a time.
* Note: these page counts may change.
| Category | Number of Pages |
|---------------------------|-----------------|
| News | 40 |
| Entertainment | 40 |
| Business | 40 |
| Lifestyle | 40 |
| Business/Domestic | 26 |
| Business/International | 40 |
| Sports/Football | 99 |
| News/Politics | 40 |
| News/Local | 40 |
| News/World | 40 |
| News/Filla | 38 |
| Entertainment/Celebrities | 40 |
| Lifestyle/Fashion | 40 |
```python
from pulsegh.scraper import PulseGh
urls = [
    "https://www.pulse.com.gh/news",
    "https://www.pulse.com.gh/news/politics",
    "https://www.pulse.com.gh/entertainment",
    "https://www.pulse.com.gh/lifestyle",
    "https://www.pulse.com.gh/sports",
    "https://www.pulse.com.gh/sports/football",
    "https://www.pulse.com.gh/business/international",
    "https://www.pulse.com.gh/business/domestic",
    "https://www.pulse.com.gh/business",
    "https://www.pulse.com.gh/quizzes",
    "https://www.pulse.com.gh/news/filla",
    "https://www.pulse.com.gh/news/world"
]

for url in urls:
    print(f"Downloading data from: {url}")
    pulse = PulseGh(url=url, limit_pages=5)
    pulse.download()

# News has 40 pages
pulse = PulseGh(url="https://www.pulse.com.gh/news", total_pages=40, limit_pages=20)
pulse.download()

# Sports/Football has 99 pages
pulse = PulseGh(url="https://www.pulse.com.gh/sports/football", total_pages=99, limit_pages=None)
pulse.download()
```
BuyMeCoffee
-----------
[![Build](https://www.buymeacoffee.com/assets/img/custom_images/yellow_img.png)](https://www.buymeacoffee.com/theodondrew)
Credits
-------
- `Theophilus Siameh`
<div>
<a href="https://twitter.com/tsiameh"><img src="https://img.shields.io/twitter/follow/tsiameh?color=blue&logo=twitter&style=flat" alt="tsiameh twitter"></a>
</div>