facebook-page-scraper

Name	facebook-page-scraper
Version	5.0.4
home_page	https://github.com/shaikhsajid1111/facebook_page_scraper
Summary	Python package to scrap facebook's pages front end with no limitations
upload_time	2024-04-06 05:29:54
maintainer	None
docs_url	None
author	Sajid Shaikh
requires_python	>=3.7
license	MIT
keywords	web-scraping selenium facebook facebook-pages
requirements	No requirements were recorded.

<h1> Facebook Page Scraper </h1>

[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)
[![PyPI license](https://img.shields.io/pypi/l/facebook-page-scraper.svg)](https://opensource.org/licenses/MIT) [![Python >=3.7](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)

<p> No API key needed and no limit on the number of requests. Import the library and <b>Just Do It!</b> </p>

<!--TABLE of contents-->
<h2> Table of Contents </h2>
<details open="open">
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#Prerequisites">Getting Started</a>
      <ul>
        <li><a href="#Prerequisites">Prerequisites</a></li>
        <li><a href="#Installation">Installation</a>
        <ul>
        <li><a href="#sourceInstallation">Installing from source</a></li>
        <li><a href="#pypiInstallation">Installing with PyPI</a></li>
        </ul>
        </li>
      </ul>
    </li>
    <li><a href="#instantiation">Usage</a>
    <ul>
    <li><a href="#instantiation">How to instantiate?</a>
    <ul>
    <li><a href="#scraperParameters">Parameters for <code>Facebook_scraper()</code></a></li>
    </ul>
    </li>
    <li><a href="#JSONWay">Scrape in JSON format</a>
    <ul><li><a href="#jsonOutput">JSON Output Format</a></li></ul>
    </li>
    <li><a href="#CSVWay">Scrape in CSV format</a>
    <ul><li><a href="#csvParameter">Parameters for <code>scrap_to_csv()</code> method</a></li></ul>
    </li>
    <li><a href="#outputKeys">Keys of the output data</a></li>
    </ul>
    </li>
    <li><a href="#tech">Tech</a></li>
    <li><a href="#license">License</a></li>
  </ol>
</details>

<!--TABLE of contents //-->

<h2 id="Prerequisites"> Prerequisites </h2>

- Internet Connection
- Python 3.7+
- Chrome or Firefox browser installed on your machine
  <br>

<hr>
<h2 id="Installation">Installation:</h2>

<h3 id="sourceInstallation"> Installing from source: </h3>

```
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
cd facebook_page_scraper
```

<h4> Inside the project's directory: </h4>

```
python3 setup.py install
```

<br>
<h3 id="pypiInstallation"> Installing with PyPI: </h3>

```
pip3 install facebook-page-scraper
```

<br>
<hr>
<h2 id="instantiation"> How to use? </h2>

```python
import os

# import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

page_or_group_name = "Meta"
posts_count = 10
browser = "firefox"
# "IP:PORT", or "user:password@IP:PORT" if the proxy requires authentication; None means no proxy
proxy = None
timeout = 600  # 600 seconds
headless = True
# read the optional login credentials from environment variables
fb_password = os.getenv('fb_password')
fb_email = os.getenv('fb_email')
# indicates whether the Facebook target is a FB group (True) or a FB page (False)
isGroup = False

# instantiate the Facebook_scraper class
meta_ai = Facebook_scraper(page_or_group_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless, isGroup=isGroup, username=fb_email, password=fb_password)
```
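
The `proxy` argument is a plain string. It is either `IP:PORT` or, when the proxy requires authentication, `user:password@IP:PORT`. If those pieces come from separate settings, a small helper can keep the format consistent. `build_proxy` below is a hypothetical helper written for illustration and is not part of `facebook_page_scraper`. It only assembles the string format described in the parameter table, and it does not escape special characters such as `@` or `:` inside credentials.

```python
from typing import Optional


def build_proxy(ip: str, port: int, user: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Return "IP:PORT", or "user:password@IP:PORT" when credentials are given.

    Hypothetical helper for illustration; not part of facebook_page_scraper.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    address = f"{ip}:{port}"
    if user is None and password is None:
        return address
    # An authenticated proxy needs both parts of the credentials.
    if user is None or password is None:
        raise ValueError("user and password must be given together")
    return f"{user}:{password}@{address}"


print(build_proxy("203.0.113.7", 8080))                     # 203.0.113.7:8080
print(build_proxy("203.0.113.7", 8080, "alice", "s3cret"))  # alice:s3cret@203.0.113.7:8080
```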

<h3 id="scraperParameters"> Parameters for <code>Facebook_scraper(page_or_group_name, posts_count, browser, proxy, timeout, headless, isGroup, username, password)</code> class </h3>
<table>
<thead>
<tr>
<th> Parameter Name </th>
<th> Parameter Type </th>
<th> Description </th>
</tr>
</thead>

<tr>
<td>
page_or_group_name
</td>
<td>
String
</td>
<td>
Name of the facebook page or group
</td>
</tr>

<tr>
<td>
posts_count
</td>
<td>
Integer
</td>
<td>
Number of posts to scrape. Defaults to 10 if not passed.
</td>
</tr>

<tr>
<td>
browser
</td>
<td>
String
</td>
<td>
Which browser to use, either <code>chrome</code> or <code>firefox</code>. Defaults to chrome if not passed.
</td>
</tr>

<tr>
<td>
proxy (optional)
</td>
<td>
String
</td>
<td>
Proxy to route the browser's traffic through, in the format <code>IP:PORT</code>. If the proxy requires authentication, use <code>user:password@IP:PORT</code>.
</td>
</tr>
<tr>
<td>
timeout
</td>
<td>
Integer
</td>
<td>
Maximum time, in seconds, the scraper is allowed to run. Defaults to 600 (10 minutes) if not passed.
</td>
</tr>
<tr>
<td>
headless
</td>
<td>
Boolean
</td>
<td>
Whether to run the browser in headless mode. Defaults to True.
</td>
</tr>
<tr>

<td>
isGroup
</td>
<td>
Boolean
</td>
<td>
Set to True if the Facebook target is a group, or False if it is a page. Defaults to False.
</td>
</tr>

<tr>
<td>
username
</td>
<td>
String
</td>
<td>
Optional. Username used to log into Facebook when scraping. Loading it from a .env file is recommended.
</td>
</tr>

<tr>
<td>
password
</td>
<td>
String
</td>
<td>
Optional. Password used to log into Facebook when scraping. Loading it from a .env file is recommended.
</td>
</tr>

</table>
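
The table recommends keeping `username` and `password` in a `.env` file. The usage example reads them with `os.getenv('fb_email')` and `os.getenv('fb_password')`, but those calls only see variables that are already in the process environment, so the file has to be loaded first. The `python-dotenv` package's `load_dotenv()` is the usual tool for this. The sketch below is a stdlib-only stand-in that assumes a simple `KEY=value` format, with no multi-line values and no `${VAR}` expansion. `load_env_file` is a hypothetical helper and is not part of `facebook_page_scraper`.

```python
import os


def load_env_file(path: str = ".env", override: bool = False) -> bool:
    """Copy simple KEY=value lines from a .env file into os.environ.

    Hypothetical stdlib-only helper; python-dotenv's load_dotenv() handles more cases.
    Returns False if the file does not exist.
    """
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = (part.strip() for part in line.split("=", 1))
            # Strip one pair of matching surrounding quotes.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            # Variables already set in the environment win unless override=True.
            if override or key not in os.environ:
                os.environ[key] = value
    return True


# Example .env file (keep it out of version control):
#   fb_email=you@example.com
#   fb_password="your-password"
load_env_file()
fb_email = os.getenv("fb_email")
fb_password = os.getenv("fb_password")
```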
<br>
<hr>
<br>
⚠️ <b> Warning: Use Logged-In Scraping at Your Own Risk </b> ⚠️

Using logged-in scraping methods may result in the permanent suspension of your account. Proceed with caution, as violating a platform's terms of service can lead to severe consequences. Exercise discretion and adhere to ethical practices when collecting data through scraping. The library/provider assumes no responsibility for any consequences resulting from the misuse of scraping methods.

<h3> Done with instantiation? <b>Let the scraping begin!</b> </h3>
<br>

<h3 id="JSONWay"> For posts' data in <b>JSON</b> format:</h3>

```python
#call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

```

Output:

```javascript

{
  "2024182624425347": {
    "name": "Meta AI",
    "shares": 0,
    "reactions": {
      "likes": 154,
      "loves": 19,
      "wow": 0,
      "cares": 0,
      "sad": 0,
      "angry": 0,
      "haha": 0
    },
    "reaction_count": 173,
    "comments": 2,
    "content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",
    "posted_on": "2022-01-20T22:43:35",
    "video": [],
    "image": [
      "https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
    ],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
  }, ...

}
```
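
In this output, the top-level keys are post ids and each value follows the JSON output structure. The `print` call suggests that `scrap_to_json()` returns this as a JSON string. Assuming it does, the standard `json` module is enough to work with the result, and the sketch also accepts an already-decoded dict in case it does not. `load_posts` and `top_posts` are hypothetical helpers. `SAMPLE` is a trimmed copy of the output above, with long fields shortened or emptied.

```python
import json
from typing import Any, Dict, List, Tuple, Union

# Trimmed copy of the sample output above (long fields shortened or emptied).
SAMPLE = """
{
  "2024182624425347": {
    "name": "Meta AI",
    "shares": 0,
    "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 0},
    "reaction_count": 173,
    "comments": 2,
    "content": "We've built data2vec, ...",
    "posted_on": "2022-01-20T22:43:35",
    "video": [],
    "image": [],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/"
  }
}
"""


def load_posts(raw: Union[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Decode a JSON string; pass an already-decoded dict through unchanged."""
    return json.loads(raw) if isinstance(raw, str) else raw


def top_posts(posts: Dict[str, Dict[str, Any]], n: int = 3) -> List[Tuple[str, int, int]]:
    """Return (post_id, reaction_count, comments) for the n most-reacted posts."""
    ranked = sorted(posts.items(), key=lambda item: item[1]["reaction_count"], reverse=True)
    return [(post_id, post["reaction_count"], post["comments"]) for post_id, post in ranked[:n]]


posts = load_posts(SAMPLE)  # with a live scraper: load_posts(meta_ai.scrap_to_json())
print(top_posts(posts))  # [('2024182624425347', 173, 2)]
```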

<div id="jsonOutput">
Output Structure for JSON format:

```javascript
{
    "id": {
        "name": string,
        "shares": integer,
        "reactions": {
            "likes": integer,
            "loves": integer,
            "wow": integer,
            "cares": integer,
            "sad": integer,
            "angry": integer,
            "haha": integer
        },
        "reaction_count": integer,
        "comments": integer,
        "content": string,
        "video" : list,
        "image" : list,
        "posted_on": datetime,  //string containing datetime in ISO 8601
        "post_url": string
    }
}

```

</div>
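
This structure maps naturally onto a small typed record. The sketch below uses a hypothetical `Post` dataclass, which is not part of the library, to convert one entry. It turns `posted_on` into a `datetime` with `datetime.fromisoformat`, which accepts timestamps like `2022-01-20T22:43:35`. On older Python versions, `fromisoformat` rejects some timestamp variants. If the scraped strings use such a variant, `dateutil.parser.isoparse` from python-dateutil (already listed under Tech) is more lenient.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Post:
    """Typed view of one scrap_to_json() entry (hypothetical, not part of the library)."""
    post_id: str
    name: str
    shares: int
    reactions: Dict[str, int]
    reaction_count: int
    comments: int
    content: str
    posted_on: datetime
    post_url: str
    video: List[str] = field(default_factory=list)
    image: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, post_id: str, data: Dict[str, Any]) -> "Post":
        return cls(
            post_id=post_id,
            name=data["name"],
            shares=int(data["shares"]),
            reactions={key: int(count) for key, count in data["reactions"].items()},
            reaction_count=int(data["reaction_count"]),
            comments=int(data["comments"]),
            content=data["content"],
            # posted_on is an ISO 8601 string such as "2022-01-20T22:43:35"
            posted_on=datetime.fromisoformat(data["posted_on"]),
            post_url=data["post_url"],
            video=list(data.get("video", [])),
            image=list(data.get("image", [])),
        )


post = Post.from_json("2024182624425347", {
    "name": "Meta AI", "shares": 0,
    "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 0},
    "reaction_count": 173, "comments": 2, "content": "We've built data2vec, ...",
    "posted_on": "2022-01-20T22:43:35", "video": [], "image": [],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/",
})
print(post.posted_on.year, post.reactions["likes"])  # 2022 154
```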
<br>
<hr>
<br>

<h3 id="CSVWay"> For saving posts' data directly to a <b>CSV</b> file</h3>

```python
# call the scrap_to_csv(filename, directory) method

filename = "data_file"  # file name without the .csv extension; data is saved to data_file.csv
directory = r"E:\data"  # directory where the CSV file will be saved (raw string keeps the backslash literal)
meta_ai.scrap_to_csv(filename, directory)

```

Content of `data_file.csv`:

```csv
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
```
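
Every value read back from the CSV file is a string. Note also that the total-reactions column is named `reactions_count` here, whereas the JSON key is `reaction_count`. The sketch below reads the file with the standard `csv` module and converts the count columns to integers. `read_posts_csv` and `INT_COLUMNS` are hypothetical names. The column list is taken from the header shown above, and the UTF-8 file encoding is an assumption; adjust it if your file differs.

```python
import csv
import io
from typing import Any, Dict, List

# Integer columns from the CSV header shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad", "angry",
               "haha", "reactions_count", "comments")


def read_posts_csv(stream) -> List[Dict[str, Any]]:
    """Parse rows written by scrap_to_csv(), converting count columns to int."""
    rows = []
    for row in csv.DictReader(stream):
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


# Trimmed copy of the sample row above. With a real file, for example:
#   with open(r"E:\data\data_file.csv", newline="", encoding="utf-8") as f:
#       rows = read_posts_csv(f)
sample = io.StringIO(
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,"
    "comments,content,posted_on,video,image,post_url\n"
    '2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We\'ve built data2vec, ...",'
    "2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/...,"
    "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/\n"
)
rows = read_posts_csv(sample)
print(rows[0]["name"], rows[0]["reactions_count"])  # Meta AI 173
```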

<br>

<hr>
<br>

<h3 id="csvParameter"> Parameters for  <code> scrap_to_csv(filename, directory) </code> method. </h3>

<table>
<thead>
<tr>
<th> Parameter Name </th>
<th> Parameter Type </th>
<th> Description </th>
</tr>
</thead>

<tr>
<td>
filename
</td>
<td>
String
</td>

<td>
Name of the CSV file, without the .csv extension, where the posts' data will be saved
</td>

</tr>

<tr>
<td>
directory
</td>
<td>
String
</td>

<td>
Directory where the CSV file will be stored.
</td>

</tr>

</table>

<br>
<hr>
<br>

<h3 id="outputKeys">Keys of the output:</h3>
<table>
<thead>
<tr>
<th> Key </th>
<th> Type </th>
<th> Description </th>
</tr>
</thead>

<tr>
<td>
id
</td>
<td>
String
</td>
<td>
Post identifier (an integer stored as a string)
</td>
</tr>

<tr>
<td>
name
</td>
<td>
String
</td>
<td>
Name of the page
</td>
</tr>

<tr>
<td>
shares
</td>
<td>
Integer
</td>
<td>
Number of times the post was shared
</td>
</tr>

<tr>
<td>
reactions
</td>
<td>
Dictionary
</td>
<td>
Dictionary with reaction types as keys and their counts as values. Keys => <code> ["likes","loves","wow","cares","sad","angry","haha"] </code>
</td>
</tr>

<tr>
<td>
reaction_count
</td>
<td>
Integer
</td>
<td>
Total number of reactions on the post
</td>
</tr>

<tr>
<td>
comments
</td>
<td>
Integer
</td>
<td>
Number of comments on the post
</td>
</tr>

<tr>
<td>
content
</td>
<td>
 String
</td>
<td>
Text content of the post
</td>
</tr>

<tr>
<td>
video
</td>
<td>
 List
</td>
<td>
List containing URLs of all videos present in the post
</td>
</tr>

<tr>
<td>
image
</td>
<td>
 List
</td>
<td>
List containing URLs of all images present in the post
</td>
</tr>

<tr>
<td>
posted_on
</td>
<td>
String
</td>
<td>
Time at which the post was published, as an ISO 8601 datetime string (e.g. <code>2022-01-20T22:43:35</code>)
</td>
</tr>

<tr>
<td>
post_url
</td>
<td>
String
</td>
<td>
URL for that post
</td>
</tr>

</table>
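
The JSON keys listed above line up with the CSV columns shown earlier. There are two differences: the seven reaction counts get their own columns, and the total is named `reactions_count`. If you already have JSON output and want a CSV without re-scraping, a conversion along these lines works. It is an illustrative sketch, not the library's `scrap_to_csv()` implementation. `flatten_post`, `posts_to_csv`, and the choice to join multiple `video`/`image` URLs with a space are all assumptions.

```python
import csv
import io
from typing import Any, Dict

# Column order copied from the CSV header in the example above.
CSV_COLUMNS = ["id", "name", "shares", "likes", "loves", "wow", "cares", "sad",
               "angry", "haha", "reactions_count", "comments", "content",
               "posted_on", "video", "image", "post_url"]
REACTION_KEYS = ["likes", "loves", "wow", "cares", "sad", "angry", "haha"]


def flatten_post(post_id: str, post: Dict[str, Any]) -> Dict[str, Any]:
    """Map one JSON entry onto the CSV columns (illustrative only)."""
    row: Dict[str, Any] = {"id": post_id, "name": post["name"], "shares": post["shares"]}
    for key in REACTION_KEYS:
        row[key] = post["reactions"].get(key, 0)  # missing reaction types count as 0
    row["reactions_count"] = post["reaction_count"]  # note the different key name
    for key in ("comments", "content", "posted_on", "post_url"):
        row[key] = post[key]
    # Assumption: multiple URLs are joined with a single space.
    row["video"] = " ".join(post["video"])
    row["image"] = " ".join(post["image"])
    return row


def posts_to_csv(posts: Dict[str, Dict[str, Any]]) -> str:
    """Serialize all posts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for post_id, post in posts.items():
        writer.writerow(flatten_post(post_id, post))
    return buffer.getvalue()


posts = {
    "2024182624425347": {
        "name": "Meta AI", "shares": 0, "reactions": {"likes": 154, "loves": 19},
        "reaction_count": 173, "comments": 2, "content": "We've built data2vec, ...",
        "posted_on": "2022-01-20T22:43:35", "video": [], "image": [],
        "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/",
    }
}
print(posts_to_csv(posts))
```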
<br>

<hr>
<h2 id="tech"> Tech </h2>
<p>This project depends on the following libraries:</p>
<ul>
<li> <a href="https://www.selenium.dev/" target='_blank'>Selenium</a></li>
<li> <a href="https://pypi.org/project/webdriver-manager/" target='_blank'>Webdriver Manager</a></li>
<li> <a href="https://pypi.org/project/python-dateutil/" target='_blank'>Python Dateutil</a></li>
<li> <a href="https://pypi.org/project/selenium-wire/" target='_blank'>Selenium-wire</a></li>
</ul>
<br>

<hr>
If you encounter anything unusual, please feel free to open an issue <a href='https://github.com/shaikhsajid1111/facebook_page_scraper/issues'>here</a>
<hr>

<h2 id="license"> LICENSE </h2>
MIT
