<h1> Facebook Page Scraper </h1>
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-360/)
<p> No API key needed and no limit on the number of requests. Import the library and <b> Just Do It! </b> </p>
<!--TABLE of contents-->
<h2> Table of Contents </h2>
<details open="open">
<summary>Table of Contents</summary>
<ol>
<li>
<a href="#">Getting Started</a>
<ul>
<li><a href="#Prerequisites">Prerequisites</a></li>
<li><a href="#Installation">Installation</a>
<ul>
<li><a href="#sourceInstallation">Installing from source</a></li>
<li><a href="#pypiInstallation">Installing with PyPI</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#Usage">Usage</a></li>
<ul>
<li><a href="#instantiation">How to instantiate?</a></li>
<ul>
<li><a href="#scraperParameters">Parameters for <code>Facebook_scraper()</code></a></li>
<li><a href="#JSONWay">Scrape in JSON format</a>
<ul><li><a href="#jsonOutput">JSON Output Format</a></li></ul>
</li>
<li><a href="#CSVWay">Scrape in CSV format</a>
<ul><li><a href="#csvParameter">Parameters for scrap_to_csv() method</a></li></ul>
</li>
<li><a href="#outputKeys">Keys of the output data</a></li>
</ul>
</ul>
<li><a href="#tech">Tech</a></li>
<li><a href="#license">License</a></li>
</ol>
</details>
<!--TABLE of contents //-->
<h2 id="Prerequisites"> Prerequisites </h2>
- Internet Connection
- Python 3.7+
- Chrome or Firefox browser installed on your machine
<br>
<hr>
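
The prerequisites listed above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names and the executable names it searches for are assumptions for illustration, and browsers installed outside `PATH` (common on Windows and macOS) will be reported as not found even when they are present.

```python
import shutil
import sys

# Executable names are assumptions; they differ between operating systems.
BROWSER_EXECUTABLES = {
    "chrome": ["google-chrome", "chrome", "chromium", "chromium-browser"],
    "firefox": ["firefox"],
}


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


def find_browsers():
    """Map each browser name to the first matching executable on PATH, or None."""
    found = {}
    for browser, candidates in BROWSER_EXECUTABLES.items():
        paths = (shutil.which(name) for name in candidates)
        found[browser] = next((path for path in paths if path), None)
    return found


print("Python OK:", python_is_supported())
for browser, path in find_browsers().items():
    print(f"{browser}: {path or 'not found on PATH'}")
```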
<h2 id="Installation">Installation:</h2>
<h3 id="sourceInstallation"> Installing from source: </h3>
```
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
```
<h4> Inside the project's directory </h4>
```
python3 setup.py install
```
<br>
<h3 id="pypiInstallation"> Installing with PyPI: </h3>
```
pip3 install facebook-page-scraper
```
<br>
<hr>
<h2 id="instantiation"> How to use? </h2>
```python
import os

# import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

# instantiate the Facebook_scraper class
page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT"  # if the proxy requires authentication, use user:password@IP:PORT
timeout = 600  # 600 seconds
headless = True
# optional login credentials, read from environment variables
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
# indicates whether the Facebook target is a group (True) or a page (False)
isGroup = False
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup, username=fb_email, password=fb_password)
```
<h3 id="scraperParameters"> Parameters for <code>Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password)</code> class </h3>
<table>
<th>
<tr>
<td> Parameter Name </td>
<td> Parameter Type </td>
<td> Description </td>
</tr>
</th>
<tr>
<td>
page_or_group_name
</td>
<td>
String
</td>
<td>
Name of the Facebook page or group
</td>
</tr>
<tr>
<td>
posts_count
</td>
<td>
Integer
</td>
<td>
Number of posts to scrape. Defaults to 10 if not passed.
</td>
</tr>
<tr>
<td>
browser
</td>
<td>
String
</td>
<td>
Browser to use, either <code>chrome</code> or <code>firefox</code>. Defaults to <code>chrome</code> if not passed.
</td>
</tr>
<tr>
<td>
proxy (optional)
</td>
<td>
String
</td>
<td>
Proxy to route requests through. If the proxy requires authentication, use the format <code>user:password@IP:PORT</code>
</td>
</tr>
<tr>
<td>
timeout
</td>
<td>
Integer
</td>
<td>
Maximum time, in seconds, that the scraper should run. Defaults to 600 seconds (10 minutes) if not passed.
</td>
</tr>
<tr>
<td>
headless
</td>
<td>
Boolean
</td>
<td>
Whether to run the browser in headless mode. Default is True.
</td>
</tr>
<tr>
<td>
isGroup
</td>
<td>
Boolean
</td>
<td>
Whether the Facebook target is a group (True) or a page (False). Default is False.
</td>
</tr>
<tr>
<td>
username
</td>
<td>
String
</td>
<td>
Username used to log into Facebook when scraping (storing it in a .env file is recommended)
</td>
</tr>
<tr>
<td>
password
</td>
<td>
String
</td>
<td>
Password used to log into Facebook when scraping (storing it in a .env file is recommended)
</td>
</tr>
</table>
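
As an illustration of the `proxy`, `username`, and `password` parameters in the table above, the sketch below builds the proxy string in both documented formats and collects credentials from the `fb_email` and `fb_password` environment variables used earlier. The helper names `build_proxy` and `scraper_kwargs` are hypothetical and not part of the library.

```python
import os


def build_proxy(host, port, user=None, password=None):
    """Return 'IP:PORT', or 'user:password@IP:PORT' when credentials are given."""
    address = f"{host}:{port}"
    if user and password:
        return f"{user}:{password}@{address}"
    return address


def scraper_kwargs(env=os.environ):
    """Collect keyword arguments for Facebook_scraper from environment variables."""
    kwargs = {"timeout": 600, "headless": True, "isGroup": False}
    # Only include credentials when both are set, so anonymous scraping stays the default.
    email, password = env.get("fb_email"), env.get("fb_password")
    if email and password:
        kwargs.update(username=email, password=password)
    return kwargs


print(build_proxy("127.0.0.1", 8080))
print(sorted(scraper_kwargs({})))
```

The resulting dictionary could then be unpacked into the constructor, for example `Facebook_scraper("Meta", 10, "firefox", **scraper_kwargs())`.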
<br>
<hr>
<br>
⚠️ <b> Warning: Use Logged-In Scraping at Your Own Risk </b> ⚠️
Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.
<h3> Done with instantiation? <b>Let the scraping begin!</b> </h3>
<br>
<h3 id="JSONWay"> To get post data in <b>JSON</b> format:</h3>
```python
#call the scrap_to_json() method
json_data = meta_ai.scrap_to_json()
print(json_data)
```
Output:
```javascript
{
"2024182624425347": {
"name": "Meta AI",
"shares": 0,
"reactions": {
"likes": 154,
"loves": 19,
"wow": 0,
"cares": 0,
"sad": 0,
"angry": 0,
"haha": 0
},
"reaction_count": 173,
"comments": 2,
"content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
"posted_on": "2022-01-20T22:43:35",
"video": [],
"image": [
"https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
],
"post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
}, ...
}
```
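
Once the data is available, it can be processed like any other JSON document. The sketch below assumes `scrap_to_json()` returns a JSON string; `load_posts` also accepts an already-parsed dictionary in case it does not. The sample record is a trimmed copy of the output above with placeholder URLs, and the helper names are hypothetical.

```python
import json

# Trimmed sample in the shape shown above; URLs are placeholders.
sample_output = json.dumps({
    "2024182624425347": {
        "name": "Meta AI",
        "shares": 0,
        "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                      "sad": 0, "angry": 0, "haha": 0},
        "reaction_count": 173,
        "comments": 2,
        "content": "We've built data2vec ...",
        "posted_on": "2022-01-20T22:43:35",
        "video": [],
        "image": ["https://example.com/image.jpg"],
        "post_url": "https://example.com/post",
    }
})


def load_posts(raw):
    """Accept either a JSON string or an already-parsed dict."""
    return json.loads(raw) if isinstance(raw, str) else raw


def top_posts(posts, n=5):
    """Return (post_id, post) pairs sorted by reaction_count, highest first."""
    ranked = sorted(posts.items(), key=lambda item: item[1]["reaction_count"], reverse=True)
    return ranked[:n]


posts = load_posts(sample_output)
for post_id, post in top_posts(posts):
    print(post_id, post["reaction_count"], post["posted_on"])
```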
<div id="jsonOutput">
Output Structure for JSON format:
```javascript
{
"id": {
"name": string,
"shares": integer,
"reactions": {
"likes": integer,
"loves": integer,
"wow": integer,
"cares": integer,
"sad": integer,
"angry": integer,
"haha": integer
},
"reaction_count": integer,
"comments": integer,
"content": string,
"video" : list,
"image" : list,
"posted_on": datetime, //string containing datetime in ISO 8601
"post_url": string
}
}
```
</div>
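
The structure above can be checked programmatically before the data is used downstream. This is a minimal sketch, not part of the library: `validate_post` is a hypothetical helper that verifies the key types listed above and that `posted_on` parses as ISO 8601 with `datetime.fromisoformat`.

```python
from datetime import datetime

REACTION_KEYS = ("likes", "loves", "wow", "cares", "sad", "angry", "haha")
EXPECTED_TYPES = {
    "name": str, "shares": int, "reactions": dict, "reaction_count": int,
    "comments": int, "content": str, "video": list, "image": list,
    "posted_on": str, "post_url": str,
}


def validate_post(post):
    """Return a list of problems found in one post record (empty if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in post:
            problems.append(f"missing key: {key}")
        elif not isinstance(post[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    reactions = post.get("reactions")
    if isinstance(reactions, dict):
        for key in REACTION_KEYS:
            if key not in reactions:
                problems.append(f"missing reaction: {key}")
    posted_on = post.get("posted_on")
    if isinstance(posted_on, str):
        try:
            datetime.fromisoformat(posted_on)
        except ValueError:
            problems.append("posted_on is not ISO 8601")
    return problems


# Placeholder record in the documented shape.
sample_post = {
    "name": "Meta AI", "shares": 0,
    "reactions": {key: 0 for key in REACTION_KEYS},
    "reaction_count": 0, "comments": 2, "content": "example",
    "video": [], "image": [], "posted_on": "2022-01-20T22:43:35",
    "post_url": "https://example.com/post",
}
print(validate_post(sample_post))
```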
<br>
<hr>
<br>
<h3 id="CSVWay"> To save post data directly to a <b>CSV</b> file:</h3>
```python
# call the scrap_to_csv(filename, directory) method
filename = "data_file"  # file name without the .csv extension, where data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
meta_ai.scrap_to_csv(filename, directory)
```
Content of `data_file.csv`:
```csv
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
```
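
A file in this layout can be read back with the standard `csv` module. The sketch below uses an in-memory sample that mirrors the header above (with shortened text and placeholder URLs) and converts the count columns to integers; `read_posts` is a hypothetical helper name.

```python
import csv
import io

# Two-line sample mirroring the header above; text and URLs are shortened placeholders.
sample_csv = io.StringIO(
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,"
    "comments,content,posted_on,video,image,post_url\n"
    "2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"
    '"We built data2vec, for speech, vision, and text.",'
    "2022-01-20T22:43:35,,https://example.com/image.jpg,https://example.com/post\n"
)

INTEGER_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad", "angry",
                   "haha", "reactions_count", "comments")


def read_posts(handle):
    """Yield one dict per CSV row, converting the count columns to int."""
    for row in csv.DictReader(handle):
        for column in INTEGER_COLUMNS:
            row[column] = int(row[column])
        yield row


rows = list(read_posts(sample_csv))
print(rows[0]["name"], rows[0]["reactions_count"])
```

For a file on disk, pass a handle opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.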
<br>
<hr>
<br>
<h3 id="csvParameter"> Parameters for <code> scrap_to_csv(filename, directory) </code> method. </h3>
<table>
<th>
<tr>
<td> Parameter Name </td>
<td> Parameter Type </td>
<td> Description </td>
</tr>
</th>
<tr>
<td>
filename
</td>
<td>
String
</td>
<td>
Name of the CSV file (without the .csv extension) where the post data will be saved
</td>
</tr>
<tr>
<td>
directory
</td>
<td>
String
</td>
<td>
Directory where the CSV file will be stored
</td>
</tr>
</table>
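
To keep the `directory` argument portable across operating systems, it can be built with `pathlib` rather than a hard-coded Windows path. This is a sketch under two assumptions: that the target directory may not exist yet, and that the library appends `.csv` to the given file name, as the comment in the example above suggests. `prepare_output` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def prepare_output(directory, filename):
    """Create the directory if needed and return the expected CSV path."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    # Assumption: the library appends ".csv" to the given file name.
    return target / f"{filename}.csv"


# A temporary directory is used here so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = prepare_output(Path(tmp) / "data", "data_file")
    print(csv_path.name)
    # meta_ai.scrap_to_csv("data_file", str(csv_path.parent))
```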
<br>
<hr>
<br>
<h3 id="outputKeys">Keys of the outputs:</h3>
<table>
<th>
<tr>
<td>
Key
</td>
<td>
Type
</td>
<td>
Description
</td>
</tr>
</th>
<tr>
<td>
id
</td>
<td>
String
</td>
<td>
Post identifier (an integer stored as a string)
</td>
</tr>
<tr>
<td>
name
</td>
<td>
String
</td>
<td>
Name of the page
</td>
</tr>
<tr>
<td>
shares
</td>
<td>
Integer
</td>
<td>
Share count of post
</td>
</tr>
<tr>
<td>
reactions
</td>
<td>
Dictionary
</td>
<td>
Dictionary with reaction names as keys and their counts as values. Keys => <code> ["likes","loves","wow","cares","sad","angry","haha"] </code>
</td>
</tr>
<tr>
<td>
reaction_count
</td>
<td>
Integer
</td>
<td>
Total reaction count of post
</td>
</tr>
<tr>
<td>
comments
</td>
<td>
Integer
</td>
<td>
Comments count of post
</td>
</tr>
<tr>
<td>
content
</td>
<td>
String
</td>
<td>
Content of post as text
</td>
</tr>
<tr>
<td>
video
</td>
<td>
List
</td>
<td>
List containing URLs of all videos present in the post
</td>
</tr>
<tr>
<td>
image
</td>
<td>
List
</td>
<td>
List containing URLs of all images present in the post
</td>
</tr>
<tr>
<td>
posted_on
</td>
<td>
Datetime
</td>
<td>
Time at which the post was published (string in ISO 8601 format)
</td>
</tr>
<tr>
<td>
post_url
</td>
<td>
String
</td>
<td>
URL for that post
</td>
</tr>
</table>
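
The keys above appear nested in the JSON output and flattened in the CSV output. The sketch below shows one way to flatten a JSON record into the CSV column order shown earlier. It is illustrative only: `flatten_post` is a hypothetical helper, and joining multiple media URLs with a space is an assumption rather than the library's documented behaviour.

```python
REACTION_KEYS = ("likes", "loves", "wow", "cares", "sad", "angry", "haha")


def flatten_post(post_id, post):
    """Turn one nested JSON record into a flat dict shaped like a CSV row."""
    row = {"id": post_id, "name": post["name"], "shares": post["shares"]}
    # Spread the nested reactions dictionary into one column per reaction.
    for key in REACTION_KEYS:
        row[key] = post["reactions"].get(key, 0)
    row["reactions_count"] = post["reaction_count"]  # the CSV header spells it with an "s"
    row["comments"] = post["comments"]
    row["content"] = post["content"]
    row["posted_on"] = post["posted_on"]
    # Assumption: multiple URLs are joined with a space; the library may differ.
    row["video"] = " ".join(post["video"])
    row["image"] = " ".join(post["image"])
    row["post_url"] = post["post_url"]
    return row


example = {
    "name": "Meta AI", "shares": 0,
    "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 0},
    "reaction_count": 173, "comments": 2, "content": "example",
    "posted_on": "2022-01-20T22:43:35", "video": [],
    "image": ["https://example.com/image.jpg"],
    "post_url": "https://example.com/post",
}
print(list(flatten_post("2024182624425347", example)))
```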
<br>
<hr>
<h2 id="tech"> Tech </h2>
<p>This project relies on the following libraries:</p>
<ul>
<li> <a href="https://www.selenium.dev/" target='_blank'>Selenium</a></li>
<li> <a href="https://pypi.org/project/webdriver-manager/" target='_blank'>Webdriver Manager</a></li>
<li> <a href="https://pypi.org/project/python-dateutil/" target='_blank'>Python Dateutil</a></li>
<li> <a href="https://pypi.org/project/selenium-wire/" target='_blank'>Selenium-wire</a></li>
</ul>
<br>
<hr>
If you encounter anything unusual, please feel free to create an issue <a href='https://github.com/shaikhsajid1111/facebook_page_scraper/issues'>here</a>
<hr>
<h2 id="license"> LICENSE </h2>
MIT
Raw data
{
"_id": null,
"home_page": "https://github.com/shaikhsajid1111/facebook_page_scraper",
"name": "facebook-page-scraper",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": "web-scraping selenium facebook facebook-pages",
"author": "Sajid Shaikh",
"author_email": "shaikhsajid3732@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/e8/65/e22edef70b56fdf06282d6c970fec21328656d2420eefb5d8487828d2ce7/facebook_page_scraper-5.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python package to scrap facebook's pages front end with no limitations",
"version": "5.0.6",
"project_urls": {
"Homepage": "https://github.com/shaikhsajid1111/facebook_page_scraper"
},
"split_keywords": [
"web-scraping",
"selenium",
"facebook",
"facebook-pages"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9a653576c8ab2a19cc1f99c59cb6cc79c04c267334b99c03514a5edcef0d7c6a",
"md5": "989a58b37189c58b575083419171e3c4",
"sha256": "f9bbcc3d7c15f22db859fbd9222e35ea71946633ab05f95ee07b2184aa8ad832"
},
"downloads": -1,
"filename": "facebook_page_scraper-5.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "989a58b37189c58b575083419171e3c4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 22346,
"upload_time": "2024-07-14T12:42:39",
"upload_time_iso_8601": "2024-07-14T12:42:39.491003Z",
"url": "https://files.pythonhosted.org/packages/9a/65/3576c8ab2a19cc1f99c59cb6cc79c04c267334b99c03514a5edcef0d7c6a/facebook_page_scraper-5.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "e865e22edef70b56fdf06282d6c970fec21328656d2420eefb5d8487828d2ce7",
"md5": "19dad44d0fe3c018e2b2cad140a5b36f",
"sha256": "ac180043940b8487eaf6e37ea02cfc5ec2c5b8bac5d746596411309baf175a07"
},
"downloads": -1,
"filename": "facebook_page_scraper-5.0.6.tar.gz",
"has_sig": false,
"md5_digest": "19dad44d0fe3c018e2b2cad140a5b36f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 24312,
"upload_time": "2024-07-14T12:42:41",
"upload_time_iso_8601": "2024-07-14T12:42:41.575272Z",
"url": "https://files.pythonhosted.org/packages/e8/65/e22edef70b56fdf06282d6c970fec21328656d2420eefb5d8487828d2ce7/facebook_page_scraper-5.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-07-14 12:42:41",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "shaikhsajid1111",
"github_project": "facebook_page_scraper",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "facebook-page-scraper"
}