twitter-scraper-selenium

Name: twitter-scraper-selenium
Version: 5.0.0
Home page: https://github.com/shaikhsajid1111/twitter-scraper-selenium
Summary: Python package to scrape Twitter's front-end easily with Selenium
Upload time: 2023-06-04 12:14:15
Author: Sajid Shaikh
Requires Python: >=3.6
License: MIT
Keywords: web-scraping, selenium, social media, twitter, keyword, twitter-profile, twitter-keywords, automation, json, csv, twitter-hashtag, hashtag
Requirements: No requirements were recorded.
            <h1> Twitter scraper selenium </h1>
<p> Python package to scrape Twitter's front-end easily with Selenium. </p>


[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.6.9](https://img.shields.io/badge/python-3.6+-blue.svg)](https://www.python.org/downloads/release/python-360/)
[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)

<!--TABLE of contents-->
<h2> Table of Contents </h2>
<details open="open">
  <summary>Table of Contents</summary>
  <ol>
    <li>
      <a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#Prerequisites">Prerequisites</a></li>
        <li><a href="#Installation">Installation</a>
        <ul>
        <li><a href="#sourceInstallation">Installing from source</a></li>
        <li><a href="#pypiInstallation">Installing with PyPI</a></li>
        </ul>
        </li>
      </ul>
    </li>
    <li><a href="#Usage">Usage</a>
    <ul><li><a href="#availableFunction">Available Functions in this Package - Summary</a></li></ul>
    <ul><li><a href="#profileDetail">Scraping profile's details</a>
    <ul>
    <li><a href="#profileDetailExample">In JSON Format - Example</a></li>
    <li><a href="#profileDetailArgument">Function Argument</a></li>
    <li><a href="#profileDetailKeys">Keys of the output</a></li>
    </ul>
    </li></ul>
    <!---->
    <ul>
    <li><a href="#profile">Scraping profile's tweets</a>
    <ul>
    <li><a href="#profileJson">In JSON format - Example</a></li>
    <li><a href="#profileCSV">In CSV format - Example</a></li>
    <li><a href="#profileArgument">Function Arguments</a></li>
    <li><a href="#profileOutput">Keys of the output data</a></li>
    </ul>
    <li><a href="#keywordAPI">Scraping tweets using query/keyword with API</a>
    <ul>
    <li><a href="#keywordAPI">In JSON Format - Example</a></li>
    <li><a href="#scrape_keyword_with_apiArgs">Function Argument</a></li>
    <li><a href="#scrape_keyword_with_apiKeys">Keys of the output.</a></li>
    </ul>
    </li>
    <li><a href="#keyword">Scraping tweets using keywords with browser automation</a>
    <ul>
    <li><a href="#keywordJson">In JSON format - Example</a></li>
    <li><a href="#keywordCSV">In CSV format - Example</a></li>
    <li><a href="#keywordArgument">Function Arguments</a></li>
    <li><a href="#keywordOutput">Keys of the output data</a></li>
    </ul>
    </li>
    <li><a href="#scrape_with_api">Scraping tweets using topic url with API</a></li>
    <ul>
    <li><a href="#scrape_with_api">In JSON format -  Example</a></li>
    <li><a href="#scrape_topic_with_api_args">Function Arguments</a></li>
    <li><a href="#scrape_topic_with_api_args_keys">Keys of the output</a></li>
    </ul>
    <li><a href="#to-scrape-topic-tweets-with-url">Scraping tweets using topic url - Example</a></li>
    <ul>
    <li><a href="#scrape_topic_with_api_args">In JSON format -  Example</a></li>
    <li><a href="#topicArgument">Function Arguments</a></li>
    <li><a href="#profileOutput">Keys of the output:</a></li>
    </ul>
    <li><a href='#to-scrape-user-tweets-with-api'>Scraping user's tweet using API</a></li>
    <ul>
    <li><a href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>
    <li><a href='#users_api_parameter'>Function Arguments</a></li>
    <li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>
    </ul>
    <li><a href="#proxy">Using scraper with proxy</a>
    <ul>
    <li><a href="#unauthenticatedProxy">Unauthenticated Proxy</a></li>
    <li><a href="#authenticatedProxy">Authenticated Proxy</a></li>
    </ul>
    </li>
    </li>
    </ul>
    </li>
    <li><a href="#privacy">Privacy</a></li>
    <li><a href="#license">License</a></li>
  </ol>
</details>

<!--TABLE of contents //-->
<br>
<hr>
<h2 id="Prerequisites">Prerequisites </h2>
<ul>
<li> Internet connection </li>
<li> Python 3.6+ </li>
<li> Chrome or Firefox browser installed on your machine </li>
</ul>
<hr>
<h2 id="Installation"> Installation </h2>
<h3 id="sourceInstallation">Installing from the source</h3>
<p>Download the source code or clone it with:</p>

```
git clone https://github.com/shaikhsajid1111/twitter-scraper-selenium
```

<p>Open a terminal inside the downloaded folder and run:</p>

<br>

```
python3 setup.py install
```

<h3 id="pypiInstallation">
Installing with <a href="https://pypi.org">PyPI</a>
</h3>

```
pip3 install twitter-scraper-selenium
```

<hr>
<h2 id="Usage">
Usage</h2>
<h3 id="availableFunction">Available Functions in this Package - Summary</h3>
<div>
<table>
<thead>
<tr>
<td>Function Name</td>
<td>Function Description</td>
<td>Scraping Method</td>
<td>Scraping Speed</td>
</tr>
</thead>
<tr>
<td><code>scrape_profile()</code></td>
<td>Scrapes a Twitter user's profile tweets</td>
<td>Browser Automation</td>
<td>Slow</td>
</tr>
<tr>
<td><code>scrape_keyword()</code></td>
<td>Scrapes tweets matching the provided keyword.</td>
<td>Browser Automation</td>
<td>Slow</td>
</tr>
<tr>
<td><code>scrape_topic()</code></td>
<td>Scrapes tweets by URL. It expects the URL of the topic.</td>
<td>Browser Automation</td>
<td>Slow</td>
</tr>
<tr>
<td><code>scrape_keyword_with_api()</code></td>
<td>Scrapes tweets by query/keyword. An advanced-search query can be built <a href="https://developer.twitter.com/apitools/query">here</a>.</td>
<td>HTTP Request</td>
<td>Fast</td>
</tr>
<tr>
<td><code>get_profile_details()</code></td>
<td>Scrapes a Twitter user's profile details.</td>
<td>HTTP Request</td>
<td>Fast</td>
</tr>
<tr>
<td><code>scrape_topic_with_api()</code></td>
<td>Scrapes tweets by URL. It expects the URL of the topic.</td>
<td>Browser Automation & HTTP Request</td>
<td>Fast</td>
</tr>
<tr>
<td><code>scrape_profile_with_api()</code></td>
<td>Scrapes tweets by Twitter profile username. It expects the username of the profile.</td>
<td>Browser Automation & HTTP Request</td>
<td>Fast</td>
</tr>
</table>
<p>
Note: the HTTP Request method sends requests directly to Twitter's API to fetch data, while Browser Automation visits the page and scrolls while collecting data.</p>
</div>
<br>
<hr>
<h3 id="profileDetail">To scrape Twitter profile details:</h3>
<div id="profileDetailExample">

```python
from twitter_scraper_selenium import get_profile_details

twitter_username = "TwitterAPI"
filename = "twitter_api_data"
get_profile_details(twitter_username=twitter_username, filename=filename)

```
Output:
```js
{
	"id": 6253282,
	"id_str": "6253282",
	"name": "Twitter API",
	"screen_name": "TwitterAPI",
	"location": "San Francisco, CA",
	"profile_location": null,
	"description": "The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? It's on my website.",
	"url": "https:\/\/t.co\/8IkCzCDr19",
	"entities": {
		"url": {
			"urls": [{
				"url": "https:\/\/t.co\/8IkCzCDr19",
				"expanded_url": "https:\/\/developer.twitter.com",
				"display_url": "developer.twitter.com",
				"indices": [
					0,
					23
				]
			}]
		},
		"description": {
			"urls": []
		}
	},
	"protected": false,
	"followers_count": 6133636,
	"friends_count": 12,
	"listed_count": 12936,
	"created_at": "Wed May 23 06:01:13 +0000 2007",
	"favourites_count": 31,
	"utc_offset": null,
	"time_zone": null,
	"geo_enabled": null,
	"verified": true,
	"statuses_count": 3656,
	"lang": null,
	"contributors_enabled": null,
	"is_translator": null,
	"is_translation_enabled": null,
	"profile_background_color": null,
	"profile_background_image_url": null,
	"profile_background_image_url_https": null,
	"profile_background_tile": null,
	"profile_image_url": null,
	"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/942858479592554497\/BbazLO9L_normal.jpg",
	"profile_banner_url": null,
	"profile_link_color": null,
	"profile_sidebar_border_color": null,
	"profile_sidebar_fill_color": null,
	"profile_text_color": null,
	"profile_use_background_image": null,
	"has_extended_profile": null,
	"default_profile": false,
	"default_profile_image": false,
	"following": null,
	"follow_request_sent": null,
	"notifications": null,
	"translator_type": null
}
```
</div>
<br>
<div id="profileDetailArgument">
<p><code>get_profile_details()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>twitter_username</td>
            <td>String</td>
            <td>Twitter Username</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output is stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file should be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use for scraping. For an authenticated proxy, the format is username:password@host:port.</td>
        </tr>
    </tbody>
</table>

</div>
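The proxy string described above can be assembled programmatically; a minimal sketch (the `make_proxy` helper is hypothetical, not part of this package):

```python
# Hypothetical helper: build a proxy string in the documented
# username:password@host:port format (host:port when unauthenticated).
def make_proxy(host: str, port: int, username: str = None, password: str = None) -> str:
    auth = f"{username}:{password}@" if username and password else ""
    return f"{auth}{host}:{port}"

print(make_proxy("127.0.0.1", 8080))                  # 127.0.0.1:8080
print(make_proxy("127.0.0.1", 8080, "user", "pass"))  # user:pass@127.0.0.1:8080
```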
<hr>
<br>
<div>
<h4 id="profileDetailKeys">Keys of the output</h4>
<p>Detail of each key can be found <a href="https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user">here</a>.</p>
</div>
<br>
<hr>
<h3 id="profile">To scrape profile's tweets:</h3>
<p id="profileJson">In JSON format:</p>

```python
from twitter_scraper_selenium import scrape_profile

microsoft = scrape_profile(twitter_username="microsoft", output_format="json", browser="firefox", tweets_count=10)
print(microsoft)
```
Output:
```javascript
{
  "1430938749840629773": {
    "tweet_id": "1430938749840629773",
    "username": "Microsoft",
    "name": "Microsoft",
    "profile_picture": "https://twitter.com/Microsoft/photo",
    "replies": 29,
    "retweets": 58,
    "likes": 453,
    "is_retweet": false,
    "retweet_link": "",
    "posted_time": "2021-08-26T17:02:38+00:00",
    "content": "Easy to use and efficient for all \u2013 Windows 11 is committed to an accessible future.\n\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW ",
    "hashtags": [],
    "mentions": [],
    "images": [],
    "videos": [],
    "tweet_url": "https://twitter.com/Microsoft/status/1430938749840629773",
    "link": "https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC"
  },...
}
```
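Since the result is a JSON string keyed by tweet ID, it can be loaded with the standard library; a sketch using a truncated sample shaped like the output above:

```python
import json

# Truncated sample shaped like the scrape_profile() output above.
raw = '''{"1430938749840629773": {"tweet_id": "1430938749840629773",
          "username": "Microsoft", "likes": 453, "hashtags": []}}'''
tweets = json.loads(raw)
for tweet_id, tweet in tweets.items():
    print(tweet_id, tweet["username"], tweet["likes"])
```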
<hr>
<p id="profileCSV">In CSV format:</p>

```python
from twitter_scraper_selenium import scrape_profile


scrape_profile(twitter_username="microsoft", output_format="csv", browser="firefox", tweets_count=10, filename="microsoft", directory="/home/user/Downloads")


```

Output:
<br>
<table class="table table-bordered table-hover table-condensed" style="line-height: 14px;overflow:hidden;white-space: nowrap">
<thead><tr><th title="Field #1">tweet_id</th>
<th title="Field #2">username</th>
<th title="Field #3">name</th>
<th title="Field #4">profile_picture</th>
<th title="Field #5">replies</th>
<th title="Field #6">retweets</th>
<th title="Field #7">likes</th>
<th title="Field #8">is_retweet</th>
<th title="Field #9">retweet_link</th>
<th title="Field #10">posted_time</th>
<th title="Field #11">content</th>
<th title="Field #12">hashtags</th>
<th title="Field #13">mentions</th>
<th title="Field #14">images</th>
<th title="Field #15">videos</th>
<th title="Field #16">post_url</th>
<th title="Field #17">link</th>
</tr></thead>
<tbody><tr>
<td>1430938749840629773</td>
<td>Microsoft</td>
<td>Microsoft</td>
<td>https://twitter.com/Microsoft/photo</td>
<td align="right">64</td>
<td align="right">75</td>
<td align="right">521</td>
<td>False</td>
<td> </td>
<td>2021-08-26T17:02:38+00:00</td>
<td>Easy to use and efficient for all – Windows 11 is committed to an accessible future.<br/><br/>Here&#39;s how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW </td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>https://twitter.com/Microsoft/status/1430938749840629773</td>
<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>
</tr>

</tbody>
</table>
<p>...</p>
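The CSV file has one row per tweet with the columns shown above, so it can be read back with the standard `csv` module. A sketch using an in-memory sample truncated to three of the columns:

```python
import csv
import io

# In-memory sample, truncated to three of the columns shown above.
sample = "tweet_id,username,likes\n1430938749840629773,Microsoft,521\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["username"], rows[0]["likes"])  # Microsoft 521
```

Note that `csv.DictReader` yields every field as a string; numeric columns such as `likes` need an explicit `int()` cast.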

<br><hr>
<div id="profileArgument">
<p><code>scrape_profile()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>twitter_username</td>
            <td>String</td>
            <td>Twitter username of the account</td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use for scraping. For an authenticated proxy, the format is username:password@host:port.</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of posts to scrape. Default is 10.</td>
        </tr>
        <tr>
            <td>output_format</td>
            <td>String</td>
            <td>The output format, whether JSON or CSV. Default is JSON.</td>
        </tr>
        <tr>
            <td>filename</td>
            <td>String</td>
            <td>If output_format is set to CSV, the filename parameter should be passed. If not passed, the filename will be the same as the username.</td>
        </tr>
        <tr>
            <td>directory</td>
            <td>String</td>
            <td>If output_format is set to CSV, the directory parameter may be passed. If not passed, the CSV file will be saved in the current working directory.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode. Default is <code>True</code>.</td>
        </tr>
    </tbody>
</table>

</div>
<hr>
<br>
<div id="profileOutput">
<p>Keys of the output</p>

<table>
    <thead>
        <tr>
            <td>Key</td>
            <td>Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>tweet_id</td>
            <td>String</td>
            <td>Post identifier (integer cast to a string)</td>
        </tr>
        <tr>
            <td>username</td>
            <td>String</td>
            <td>Username of the profile</td>
        </tr>
        <tr>
            <td>name</td>
            <td>String</td>
            <td>Name of the profile</td>
        </tr>
        <tr>
            <td>profile_picture</td>
            <td>String</td>
            <td>Profile Picture link</td>
        </tr>
        <tr>
            <td>replies</td>
            <td>Integer</td>
            <td>Number of replies of tweet</td>
        </tr>
        <tr>
            <td>retweets</td>
            <td>Integer</td>
            <td>Number of retweets of tweet</td>
        </tr>
        <tr>
            <td>likes</td>
            <td>Integer</td>
            <td>Number of likes of tweet</td>
        </tr>
        <tr>
            <td>is_retweet</td>
            <td>Boolean</td>
            <td>Is the tweet a retweet?</td>
        </tr>
        <tr>
            <td>retweet_link</td>
            <td>String</td>
            <td>If it is a retweet, the retweet link; otherwise an empty string</td>
        </tr>
        <tr>
            <td>posted_time</td>
            <td>String</td>
            <td>Time when tweet was posted in ISO 8601 format</td>
        </tr>
        <tr>
            <td>content</td>
            <td>String</td>
            <td>Content of the tweet as text</td>
        </tr>
        <tr>
            <td>hashtags</td>
            <td>Array</td>
            <td>Hashtags present in the tweet, if any</td>
        </tr>
        <tr>
            <td>mentions</td>
            <td>Array</td>
            <td>Mentions present in the tweet, if any</td>
        </tr>
        <tr>
            <td>images</td>
            <td>Array</td>
            <td>Image links, if any are present in the tweet</td>
        </tr>
        <tr>
            <td>videos</td>
            <td>Array</td>
            <td>Video links, if any are present in the tweet</td>
        </tr>
        <tr>
            <td>tweet_url</td>
            <td>String</td>
            <td>URL of the tweet</td>
        </tr>
        <tr>
            <td>link</td>
            <td>String</td>
            <td>Link to an external website, if any is present inside the tweet.</td>
        </tr>
    </tbody>
</table>
</div>
<br>
<hr>
<h3 id="keywordAPI">To scrape tweets using keywords with API:</h3>
<div>

```python
from twitter_scraper_selenium import scrape_keyword_with_api

query = "#gaming"
tweets_count = 10
output_filename = "gaming_hashtag_data"
scrape_keyword_with_api(query=query, tweets_count=tweets_count, output_filename=output_filename)

```
Output:
```js
{
  "1583821467732480001": {
    "tweet_url" : "https://twitter.com/yakubblackbeard/status/1583821467732480001",
    "tweet_details":{
      ...
    },
    "user_details":{
      ...
    }
  }, ...
}
```
</div>
<br>
<div id="scrape_keyword_with_apiArgs">
<p><code>scrape_keyword_with_api()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>query</td>
            <td>String</td>
            <td>Query to search. The query can be built from <a href="https://developer.twitter.com/apitools/query">here</a> for advanced search.</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of tweets to scrape.</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output is stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file should be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use for scraping. For an authenticated proxy, the format is username:password@host:port.</td>
        </tr>
    </tbody>
</table>

</div>
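The `query` argument accepts the same operators the advanced-search builder linked above produces (e.g. `from:`, `since:`, `until:`), so composing one is plain string joining; a sketch with illustrative operator values:

```python
# Compose an advanced search query from common Twitter search operators.
parts = ["#gaming", "from:Xbox", "since:2022-01-01", "until:2022-02-01"]
query = " ".join(parts)
print(query)  # #gaming from:Xbox since:2022-01-01 until:2022-02-01
```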
<hr>
<br>
<div>
<p id="scrape_keyword_with_apiKeys">Keys of the output:</p>
<table>
<thead>
        <tr>
            <td>Key</td>
            <td>Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
    <tr>
    <td>tweet_url</td>
    <td>String</td>
    <td>URL of the tweet.</td>
    </tr>
    <tr>
    <td>tweet_details</td>
    <td>Dictionary</td>
    <td>A dictionary containing data about the tweet. All fields available inside can be checked <a href="https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet">here</a></td>
    </tr>
    <tr>
    <td>user_details</td>
    <td>Dictionary</td>
    <td>A dictionary containing data about the tweet's author. All fields available inside can be checked <a href="https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user">here</a></td>
    </tr>
    </tbody>
</table>
</div>
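The nested `tweet_details`/`user_details` dictionaries can be flattened to whatever fields are needed; a sketch over a minimal sample shaped like the output above (the nested field names such as `full_text` and `screen_name` follow Twitter's v1 object model linked in the table):

```python
# Minimal sample shaped like the scrape_keyword_with_api() output above.
result = {
    "1583821467732480001": {
        "tweet_url": "https://twitter.com/yakubblackbeard/status/1583821467732480001",
        "tweet_details": {"full_text": "..."},
        "user_details": {"screen_name": "yakubblackbeard"},
    }
}
# Flatten each entry to just the URL and the author's screen name.
flat = [
    {"url": item["tweet_url"], "author": item["user_details"]["screen_name"]}
    for item in result.values()
]
print(flat[0]["author"])  # yakubblackbeard
```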
<br>
<br>
<hr>
<h3 id="keyword">To scrape tweets using keywords with browser automation:</h3>
<div>
<p id="keywordJson">In JSON format:</p>

```python
from twitter_scraper_selenium import scrape_keyword
# scrape 10 posts matching the keyword "india" posted between 30th and 31st August
india = scrape_keyword(keyword="india", browser="firefox", tweets_count=10,
                       output_format="json", until="2021-08-31", since="2021-08-30")
print(india)

```
Output:
```javascript
{
  "1432493306152243200": {
    "tweet_id": "1432493306152243200",
    "username": "TOICitiesNews",
    "name": "TOI Cities",
    "profile_picture": "https://twitter.com/TOICitiesNews/photo",
    "replies": 0,
    "retweets": 0,
    "likes": 0,
    "is_retweet": false,
    "posted_time": "2021-08-30T23:59:53+00:00",
    "content": "Paralympians rake in medals, India Inc showers them with rewards",
    "hashtags": [],
    "mentions": [],
    "images": [],
    "videos": [],
    "tweet_url": "https://twitter.com/TOICitiesNews/status/1432493306152243200",
    "link": "https://t.co/odmappLovL?amp=1"
  },...
}
```
</div>
<br>
<hr>
<div id="keywordCSV">
<p>In CSV format:</p>

```python
from twitter_scraper_selenium import scrape_keyword

scrape_keyword(keyword="india", browser="firefox", tweets_count=10,
               until="2021-08-31", since="2021-08-30", output_format="csv", filename="india")
```
<br>
Output:
<table class="table table-bordered table-hover table-condensed" style="line-height: 14px;overflow:hidden;white-space: nowrap">
<thead><tr><th title="Field #1">tweet_id</th>
<th title="Field #2">username</th>
<th title="Field #3">name</th>
<th title="Field #4">profile_picture</th>
<th title="Field #5">replies</th>
<th title="Field #6">retweets</th>
<th title="Field #7">likes</th>
<th title="Field #8">is_retweet</th>
<th title="Field #9">posted_time</th>
<th title="Field #10">content</th>
<th title="Field #11">hashtags</th>
<th title="Field #12">mentions</th>
<th title="Field #13">images</th>
<th title="Field #14">videos</th>
<th title="Field #15">tweet_url</th>
<th title="Field #16">link</th>
</tr></thead>
<tbody>

<tr>
<td>1432493306152243200</td>
<td>TOICitiesNews</td>
<td>TOI Cities</td>
<td>https://twitter.com/TOICitiesNews/photo</td>
<td>0</td>
<td align="right">0</td>
<td align="right">0</td>
<td>False</td>
<td>2021-08-30T23:59:53+00:00</td>
<td>Paralympians rake in medals, India Inc showers them with rewards</td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>[]</td>
<td>https://twitter.com/TOICitiesNews/status/1432493306152243200</td>
<td>https://t.co/odmappLovL?amp=1</td>
</tr>

</tbody></table>
<p> ... </p>
</div>
<hr>
<br>
<div id="keywordArgument">
<p><code>scrape_keyword()</code> arguments:</p>

<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>keyword</td>
            <td>String</td>
            <td>Keyword to search on twitter. </td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Browser to use for scraping. Only Chrome and Firefox are supported. Default is Firefox.</td>
        </tr>
        <tr>
            <td>until</td>
            <td>String</td>
            <td>Optional. End date for the search, in YYYY-MM-DD format.</td>
        </tr>
        <tr>
            <td>since</td>
            <td>String</td>
            <td>Optional. Start date for the search, in YYYY-MM-DD format.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use for scraping. For an authenticated proxy, the format is username:password@host:port.</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of posts to scrape. Default is 10.</td>
        </tr>
        <tr>
            <td>output_format</td>
            <td>String</td>
            <td>The output format, whether JSON or CSV. Default is JSON.</td>
        </tr>
        <tr>
            <td>filename</td>
            <td>String</td>
            <td>If output_format is set to CSV, the filename parameter should be passed. If not passed, the filename will be the same as the keyword.</td>
        </tr>
        <tr>
            <td>directory</td>
            <td>String</td>
            <td>If output_format is set to CSV, the directory parameter may be passed. If not passed, the CSV file will be saved in the current working directory.</td>
        </tr>
        <tr>
            <td>since_id</td>
            <td>Integer</td>
            <td>After (NOT inclusive) a specified Snowflake ID. Example <a href="https://twitter.com/search?q=since_id%3A1138872932887924737%20max_id%3A1144730280353247233%20%23nasamoontunes&src=typed_query&f=live">here</a></td>
        </tr>
        <tr>
            <td>max_id</td>
            <td>Integer</td>
            <td>At or before (inclusive) a specified Snowflake ID. Example <a href="https://twitter.com/search?q=since_id%3A1138872932887924737%20max_id%3A1144730280353247233%20%23nasamoontunes&src=typed_query&f=live">here</a></td>
        </tr>
        <tr>
            <td>within_time</td>
            <td>String</td>
            <td>Search within the last number of days, hours, minutes, or seconds. Example <code>2d, 3h, 5m, 30s</code>.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode. Default is <code>True</code>.</td>
        </tr>
    </tbody>
</table>
</div>
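`since_id` and `max_id` are Snowflake IDs, which embed a millisecond timestamp, so converting an ID to its creation time shows what time window a pair of IDs covers. A sketch using the publicly documented Snowflake layout (timestamp in the high bits, offset from the Twitter epoch of 1288834974657 ms):

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04T01:42:54.657Z

def snowflake_to_datetime(snowflake_id: int) -> datetime:
    # Shifting off the 22 low bits (worker + sequence) leaves
    # milliseconds since the Twitter epoch.
    ms = (snowflake_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# The since_id from the example search linked above:
print(snowflake_to_datetime(1138872932887924737))
```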
<hr>
<div id="keywordOutput">
<p>Keys of the output</p>

<table>
    <thead>
        <tr>
            <td>Key</td>
            <td>Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>tweet_id</td>
            <td>String</td>
            <td>Post identifier (integer cast to a string)</td>
        </tr>
        <tr>
            <td>username</td>
            <td>String</td>
            <td>Username of the profile</td>
        </tr>
        <tr>
            <td>name</td>
            <td>String</td>
            <td>Name of the profile</td>
        </tr>
        <tr>
            <td>profile_picture</td>
            <td>String</td>
            <td>Profile Picture link</td>
        </tr>
        <tr>
            <td>replies</td>
            <td>Integer</td>
            <td>Number of replies of tweet</td>
        </tr>
        <tr>
            <td>retweets</td>
            <td>Integer</td>
            <td>Number of retweets of tweet</td>
        </tr>
        <tr>
            <td>likes</td>
            <td>Integer</td>
            <td>Number of likes of tweet</td>
        </tr>
        <tr>
            <td>is_retweet</td>
            <td>Boolean</td>
            <td>Is the tweet a retweet?</td>
        </tr>
        <tr>
            <td>posted_time</td>
            <td>String</td>
            <td>Time when tweet was posted in ISO 8601 format</td>
        </tr>
        <tr>
            <td>content</td>
            <td>String</td>
            <td>Content of the tweet as text</td>
        </tr>
        <tr>
            <td>hashtags</td>
            <td>Array</td>
            <td>Hashtags present in the tweet, if any</td>
        </tr>
        <tr>
            <td>mentions</td>
            <td>Array</td>
            <td>Mentions present in the tweet, if any</td>
        </tr>
        <tr>
            <td>images</td>
            <td>Array</td>
            <td>Image links, if any are present in the tweet</td>
        </tr>
        <tr>
            <td>videos</td>
            <td>Array</td>
            <td>Video links, if any are present in the tweet</td>
        </tr>
        <tr>
            <td>tweet_url</td>
            <td>String</td>
            <td>URL of the tweet</td>
        </tr>
        <tr>
            <td>link</td>
            <td>String</td>
            <td>Link to an external website, if any is present inside the tweet.</td>
        </tr>
    </tbody>
</table>
</div>
<br>
<hr>
<br>
<h3 id="scrape_with_api">To scrape topic tweets with URL using API </h3>

```python
from twitter_scraper_selenium import scrape_topic_with_api

topic_url = 'https://twitter.com/i/topics/1468157909318045697'
scrape_topic_with_api(URL=topic_url, output_filename='solana_cryptocurrency', tweets_count=50)
```

Output:
```js
{
  "1584979408338632705": {
    "tweet_url" : "https://twitter.com/AptosBullCNFT/status/1584979408338632705",
    "tweet_details":{
      ...
    },
    "user_details":{
      ...
    }
  }, ...
}
```


<div id="scrape_topic_with_api_args">
<p><code>scrape_topic_with_api()</code> arguments: </p>


<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>URL</td>
            <td>String</td>
            <td>Twitter's Topic URL</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of tweets to scrape.</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output is stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file should be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional. Proxy to use for scraping. For an authenticated proxy, the format is username:password@host:port.</td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Browser to use for extracting the GraphQL key. Default is Firefox.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode.</td>
        </tr>
    </tbody>
</table>
</div>
<hr>

<div id="scrape_topic_with_api_args_keys"> <p>Keys of the output:</p>
  Same as <a href="#scrape_keyword_with_apiKeys">scrape_keyword_with_api</a>
</div>
<br>
<hr>

<h3 id="to-scrape-topic-tweets-with-url"> To scrape topic tweets with URL using browser automation: </h3>

```python
from twitter_scraper_selenium import scrape_topic
# scrape 10 tweets from steam deck topic on twitter
data = scrape_topic(filename="steamdeck", url='https://twitter.com/i/topics/1415728297065861123',
                     browser="firefox", tweets_count=10)
```

<div> <p>Keys of the output:</p>
  Same as <a href="#profileOutput">scrape_profile</a>
</div>
<hr>


<div id="topicArgument">
<p><code>scrape_topic()</code> arguments:</p>


| Arguments     | Argument <br> Type | Description                                                                                                                            |
|---------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| filename      | str                | Filename to write the output to.                                                                                                       |
| URL           | str                | Topic URL.                                                                                                                             |
| browser       | str                | Which browser to use for scraping. <br> Only Chrome and Firefox are supported. Default is Firefox.                                     |
| proxy         | str                | Optional. Proxy to use for scraping. <br> If the proxy requires authentication, use the format username:password@host:port.            |
| tweets_count  | int                | Number of posts to scrape. Default is 10.                                                                                              |
| output_format | str                | The output format, whether JSON or CSV. Default is JSON.                                                                               |
| directory     | str                | Directory to save the output file. Default is the current working directory.                                                           |
<br>
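The `URL` argument is expected to be a Twitter topic URL of the form `https://twitter.com/i/topics/<numeric id>`, as in the example above. As a hypothetical convenience (not part of this package), the topic id can be checked and extracted before calling `scrape_topic()`:

```python
from urllib.parse import urlparse

def topic_id(url: str) -> str:
    """Return the numeric topic id from a Twitter topic URL, or raise ValueError."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Expected path shape: /i/topics/<numeric id>
    if (parsed.netloc == "twitter.com" and parts[:2] == ["i", "topics"]
            and len(parts) == 3 and parts[2].isdigit()):
        return parts[2]
    raise ValueError(f"not a Twitter topic URL: {url}")

print(topic_id("https://twitter.com/i/topics/1415728297065861123"))  # 1415728297065861123
```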
<hr>
<div id="to-scrape-user-tweets-with-api">

<p>To scrape a profile's tweets with API:</p>

```python
from twitter_scraper_selenium import scrape_profile_with_api

scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)
```
</div>
<br>
<div id="users_api_parameter">
<p><code>scrape_profile_with_api()</code> Arguments:</p>
<table>
    <thead>
        <tr>
            <td>Argument</td>
            <td>Argument Type</td>
            <td>Description</td>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>username</td>
            <td>String</td>
            <td>Twitter's Profile username</td>
        </tr>
        <tr>
            <td>tweets_count</td>
            <td>Integer</td>
            <td>Number of tweets to scrape.</td>
        </tr>
        <tr>
            <td>output_filename</td>
            <td>String</td>
            <td>Filename where the output is stored.</td>
        </tr>
        <tr>
            <td>output_dir</td>
            <td>String</td>
            <td>Directory where the output file should be saved.</td>
        </tr>
        <tr>
            <td>proxy</td>
            <td>String</td>
            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>
        </tr>
        <tr>
            <td>browser</td>
            <td>String</td>
            <td>Which browser to use for extracting the GraphQL key. Default is Firefox.</td>
        </tr>
        <tr>
            <td>headless</td>
            <td>Boolean</td>
            <td>Whether to run the browser in headless mode.</td>
        </tr>
    </tbody>
</table>
</div>
<br>
<div id="scrape_user_with_api_args_keys"> <p>Output:<p>

```js
{
  "1608939190548598784": {
    "tweet_url" : "https://twitter.com/elonmusk/status/1608939190548598784",
    "tweet_details":{
      ...
    },
    "user_details":{
      ...
    }
  }, ...
}
```

</div>
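Each top-level key in this output is a tweet ID. A sketch of walking that structure, using a hand-written sample dictionary in place of real scraper output (the inner field names shown are placeholders), might look like:

```python
# Hypothetical sample mirroring the documented output shape of scrape_profile_with_api().
sample = {
    "1608939190548598784": {
        "tweet_url": "https://twitter.com/elonmusk/status/1608939190548598784",
        "tweet_details": {},  # tweet object fields go here
        "user_details": {},   # user object fields go here
    },
}

# Collect every tweet URL, keyed by tweet ID.
urls = {tweet_id: entry["tweet_url"] for tweet_id, entry in sample.items()}
print(urls)  # {'1608939190548598784': 'https://twitter.com/elonmusk/status/1608939190548598784'}
```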
<br>
<hr>
</div>

<h3 id="proxy"> Using scraper with proxy (http proxy) </h3>

<div id="unauthenticatedProxy">
<p>Just pass the <code>proxy</code> argument to the function.</p>

```python
from twitter_scraper_selenium import scrape_keyword

scrape_keyword(keyword="#india", browser="firefox",tweets_count=10,output="csv",filename="india",
proxy="66.115.38.247:5678") #In IP:PORT format

```
</div>

<br>
<div id="authenticatedProxy">
<p> Proxy that requires authentication: </p>

```python

from twitter_scraper_selenium import scrape_profile

microsoft_data = scrape_profile(twitter_username="microsoft", browser="chrome", tweets_count=10,
                                output_format="json", proxy="sajid:pass123@66.115.38.247:5678")  # username:password@IP:PORT
print(microsoft_data)


```

</div>
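A tiny helper (illustrative only, not part of this package) can assemble the expected proxy string from its parts, covering both the plain and the authenticated form:

```python
def make_proxy(host: str, port: int, username: str = "", password: str = "") -> str:
    """Build a proxy string in IP:PORT or username:password@IP:PORT form."""
    auth = f"{username}:{password}@" if username and password else ""
    return f"{auth}{host}:{port}"

print(make_proxy("66.115.38.247", 5678))                      # 66.115.38.247:5678
print(make_proxy("66.115.38.247", 5678, "sajid", "pass123"))  # sajid:pass123@66.115.38.247:5678
```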
<br>
<hr>
<div id="privacy">
<h2>Privacy</h2>

<p>
This scraper only scrapes public data that is available to an unauthenticated user; it is not capable of scraping anything private.
</p>
</div>
<br>
<hr>
<div id="license">
<h2>LICENSE</h2>

MIT
</div>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/shaikhsajid1111/twitter-scraper-selenium",
    "name": "twitter-scraper-selenium",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "web-scraping selenium social media twitter keyword twitter-profile twitter-keywords automation json csv twitter-hashtag hashtag",
    "author": "Sajid Shaikh",
    "author_email": "shaikhsajid3732@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/ee/30/f391ccfe52b741e0763c3e87f8e49dced32663ed1f0e5797de640eda1611/twitter_scraper_selenium-5.0.0.tar.gz",
    "platform": null,
    "description": "<h1> Twitter scraper selenium </h1>\r\n<p> Python's package to scrape Twitter's front-end easily with selenium.  </p>\r\n\r\n\r\n[![PyPI license](https://img.shields.io/pypi/l/ansicolortags.svg)](https://opensource.org/licenses/MIT) [![Python >=3.6.9](https://img.shields.io/badge/python-3.6+-blue.svg)](https://www.python.org/downloads/release/python-360/)\r\n[![Maintenance](https://img.shields.io/badge/Maintained-Yes-green.svg)](https://github.com/shaikhsajid1111/facebook_page_scraper/graphs/commit-activity)\r\n\r\n<!--TABLE of contents-->\r\n<h2> Table of Contents </h2>\r\n<details open=\"open\">\r\n  <summary>Table of Contents</summary>\r\n  <ol>\r\n    <li>\r\n      <a href=\"#getting-started\">Getting Started</a>\r\n      <ul>\r\n        <li><a href=\"#Prerequisites\">Prerequisites</a></li>\r\n        <li><a href=\"#Installation\">Installation</a>\r\n        <ul>\r\n        <li><a href=\"#sourceInstallation\">Installing from source</a></li>\r\n        <li><a href=\"#pypiInstallation\">Installing with PyPI</a></li>\r\n        </ul>\r\n        </li>\r\n      </ul>\r\n    </li>\r\n    <li><a href=\"#Usage\">Usage</a>\r\n    <ul><li><a href=\"#availableFunction\">Available Functions in this package- Summary</a></li></ul>\r\n    <ul><li><a href=\"#profileDetail\">Scraping profile's details</a>\r\n    <ul>\r\n    <li><a href=\"#profileDetailExample\">In JSON Format - Example</a></li>\r\n    <li><a href=\"#profileDetailArgument\">Function Argument</a></li>\r\n    <li><a href=\"#profileDetailKeys\">Keys of the output</a></li>\r\n    </ul>\r\n    </li></ul>\r\n    <!---->\r\n    <ul>\r\n    <li><a href=\"#profile\">Scraping profile's tweets</a>\r\n    <ul>\r\n    <li><a href=\"#profileJson\">In JSON format - Example</a></li>\r\n    <li><a href=\"#profileCSV\">In CSV format - Example</a></li>\r\n    <li><a href=\"#profileArgument\">Function Arguments</a></li>\r\n    <li><a href=\"#profileOutput\">Keys of the output data</a></li>\r\n    </ul>\r\n    
<li><a href=\"#keywordAPI\">Scraping tweets using query/keyword with API</a>\r\n    <ul>\r\n    <li><a href=\"#keywordAPI\">In JSON Format - Example</a></li>\r\n    <li><a href=\"#scrape_keyword_with_apiArgs\">Function Argument</a></li>\r\n    <li><a href=\"#scrape_keyword_with_apiKeys\">Keys of the output.</a></li>\r\n    </ul>\r\n    </li>\r\n    <li><a href=\"#keyword\">Scraping tweets using keywords with browser automation</a>\r\n    <ul>\r\n    <li><a href=\"#keywordJson\">In JSON format - Example</a></li>\r\n    <li><a href=\"#keywordCSV\">In CSV format - Example</a></li>\r\n    <li><a href=\"#keywordArgument\">Function Arguments</a></li>\r\n    <li><a href=\"#keywordOutput\">Keys of the output data</a></li>\r\n    </ul>\r\n    </li>\r\n    <li><a href=\"#scrape_with_api\">Scraping tweets using topic url with API</a></li>\r\n    <ul>\r\n    <li><a href=\"#scrape_with_api\">In JSON format -  Example</a></li>\r\n    <li><a href=\"#scrape_topic_with_api_args\">Function Arguments</a></li>\r\n    <li><a href=\"#scrape_topic_with_api_args_keys\">Keys of the output</a></li>\r\n    </ul>\r\n    <li><a href=\"#to-scrape-topic-tweets-with-url\">Scraping tweets using topic url - Example</a></li>\r\n    <ul>\r\n    <li><a href=\"#scrape_topic_with_api_args\">In JSON format -  Example</a></li>\r\n    <li><a href=\"#topicArgument\">Function Arguments</a></li>\r\n    <li><a href=\"#profileOutput\">Keys of the output:</a></li>\r\n    </ul>\r\n    <li><a href='#to-scrape-user-tweets-with-api'>Scraping user's tweet using API</a></li>\r\n    <ul>\r\n    <li><a href='#to-scrape-user-tweets-with-api'>In JSON format - Example</a></li>\r\n    <li><a href='#users_api_parameter'>Function Arguments</a></li>\r\n    <li><a href='#scrape_user_with_api_args_keys'>Keys of the output</a></li>\r\n    </ul>\r\n    <li><a href=\"#proxy\">Using scraper with proxy</a>\r\n    <ul>\r\n    <li><a href=\"#unauthenticatedProxy\">Unauthenticated Proxy</a></li>\r\n    <li><a 
href=\"#authenticatedProxy\">Authenticated Proxy</a></li>\r\n    </ul>\r\n    </li>\r\n    </li>\r\n    </ul>\r\n    </li>\r\n    <li><a href=\"#privacy\">Privacy</a></li>\r\n    <li><a href=\"#license\">License</a></li>\r\n  </ol>\r\n</details>\r\n\r\n<!--TABLE of contents //-->\r\n<br>\r\n<hr>\r\n<h2 id=\"Prerequisites\">Prerequisites </h2>\r\n<li> Internet Connection </li>\r\n<li> Python 3.6+ </li>\r\n<li> Chrome or Firefox browser installed on your machine </li>\r\n<hr>\r\n<h2 id=\"Installation\"> Installation </h2>\r\n<h3 id=\"sourceInstallation\">Installing from the source</h3>\r\n<p>Download the source code or clone it with:<p>\r\n\r\n```\r\ngit clone https://github.com/shaikhsajid1111/twitter-scraper-selenium\r\n```\r\n\r\n<p>Open terminal inside the downloaded folder:</p>\r\n\r\n<br>\r\n\r\n```\r\n python3 setup.py install\r\n```\r\n\r\n<h3 id=\"pypiInstallation\">\r\nInstalling with <a href=\"https://pypi.org\">PyPI</a>\r\n</h3>\r\n\r\n```\r\npip3 install twitter-scraper-selenium\r\n```\r\n\r\n<hr>\r\n<h2 id=\"Usage\">\r\nUsage</h2>\r\n<h3 id=\"availableFunction\">Available Function In this Package - Summary</h3>\r\n<div>\r\n<table>\r\n<thead>\r\n<tr>\r\n<td>Function Name</td>\r\n<td>Function Description</td>\r\n<td>Scraping Method</td>\r\n<td>Scraping Speed</td>\r\n</tr>\r\n</thead>\r\n<tr>\r\n<td><code>scrape_profile()</code></td>\r\n<td>Scrape's Twitter user's profile tweets</td>\r\n<td>Browser Automation</td>\r\n<td>Slow</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_keyword()</code></td>\r\n<td>Scrape's Twitter tweets using keyword provided.</td>\r\n<td>Browser Automation</td>\r\n<td>Slow</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_topic()</code></td>\r\n<td>Scrape's Twitter tweets by URL. It expects the URL of the topic.</td>\r\n<td>Browser Automation</td>\r\n<td>Slow</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_keyword_with_api()</code></td>\r\n<td>Scrape's Twitter tweets by query/keywords. 
For an advanced search, query can be built from <a href=\"https://developer.twitter.com/apitools/query\">here</a>.</td>\r\n<td>HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n<tr>\r\n<td><code>get_profile_details()</code></td>\r\n<td>Scrape's Twitter user details.</td>\r\n<td>HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_topic_with_api()</code></td>\r\n<td>Scrape's Twitter tweets by URL. It expects the URL of the topic</td>\r\n<td>Browser Automation & HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n<tr>\r\n<td><code>scrape_profile_with_api()</code></td>\r\n<td>Scrape's Twitter tweets by twitter profile username. It expects the username of the profile</td>\r\n<td>Browser Automation & HTTP Request</td>\r\n<td>Fast</td>\r\n</tr>\r\n</table>\r\n<p>\r\nNote: HTTP Request Method sends the request to Twitter's API directly for scraping data, and Browser Automation visits that page, scroll while collecting the data.</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profileDetail\">To scrape twitter profile details:</h3>\r\n<div id=\"profileDetailExample\">\r\n\r\n```python\r\nfrom twitter_scraper_selenium import get_profile_details\r\n\r\ntwitter_username = \"TwitterAPI\"\r\nfilename = \"twitter_api_data\"\r\nget_profile_details(twitter_username=twitter_username, filename=filename)\r\n\r\n```\r\nOutput:\r\n```js\r\n{\r\n\t\"id\": 6253282,\r\n\t\"id_str\": \"6253282\",\r\n\t\"name\": \"Twitter API\",\r\n\t\"screen_name\": \"TwitterAPI\",\r\n\t\"location\": \"San Francisco, CA\",\r\n\t\"profile_location\": null,\r\n\t\"description\": \"The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don't get an answer? 
It's on my website.\",\r\n\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\"entities\": {\r\n\t\t\"url\": {\r\n\t\t\t\"urls\": [{\r\n\t\t\t\t\"url\": \"https:\\/\\/t.co\\/8IkCzCDr19\",\r\n\t\t\t\t\"expanded_url\": \"https:\\/\\/developer.twitter.com\",\r\n\t\t\t\t\"display_url\": \"developer.twitter.com\",\r\n\t\t\t\t\"indices\": [\r\n\t\t\t\t\t0,\r\n\t\t\t\t\t23\r\n\t\t\t\t]\r\n\t\t\t}]\r\n\t\t},\r\n\t\t\"description\": {\r\n\t\t\t\"urls\": []\r\n\t\t}\r\n\t},\r\n\t\"protected\": false,\r\n\t\"followers_count\": 6133636,\r\n\t\"friends_count\": 12,\r\n\t\"listed_count\": 12936,\r\n\t\"created_at\": \"Wed May 23 06:01:13 +0000 2007\",\r\n\t\"favourites_count\": 31,\r\n\t\"utc_offset\": null,\r\n\t\"time_zone\": null,\r\n\t\"geo_enabled\": null,\r\n\t\"verified\": true,\r\n\t\"statuses_count\": 3656,\r\n\t\"lang\": null,\r\n\t\"contributors_enabled\": null,\r\n\t\"is_translator\": null,\r\n\t\"is_translation_enabled\": null,\r\n\t\"profile_background_color\": null,\r\n\t\"profile_background_image_url\": null,\r\n\t\"profile_background_image_url_https\": null,\r\n\t\"profile_background_tile\": null,\r\n\t\"profile_image_url\": null,\r\n\t\"profile_image_url_https\": \"https:\\/\\/pbs.twimg.com\\/profile_images\\/942858479592554497\\/BbazLO9L_normal.jpg\",\r\n\t\"profile_banner_url\": null,\r\n\t\"profile_link_color\": null,\r\n\t\"profile_sidebar_border_color\": null,\r\n\t\"profile_sidebar_fill_color\": null,\r\n\t\"profile_text_color\": null,\r\n\t\"profile_use_background_image\": null,\r\n\t\"has_extended_profile\": null,\r\n\t\"default_profile\": false,\r\n\t\"default_profile_image\": false,\r\n\t\"following\": null,\r\n\t\"follow_request_sent\": null,\r\n\t\"notifications\": null,\r\n\t\"translator_type\": null\r\n}\r\n```\r\n</div>\r\n<br>\r\n<div id=\"profileDetailArgument\">\r\n<p><code>get_profile_details()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n         
   <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>twitter_username</td>\r\n            <td>String</td>\r\n            <td>Twitter Username</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>What should be the filename where output is stored?.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>What directory output file should be saved?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div>\r\n<h4 id=\"profileDetailKeys\">Keys of the output:</p>\r\nDetail of each key can be found <a href=\"https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user\">here</a>.</h4>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"profile\">To scrape profile's tweets:</h3>\r\n<p id=\"profileJson\">In JSON format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft = scrape_profile(twitter_username=\"microsoft\",output_format=\"json\",browser=\"firefox\",tweets_count=10)\r\nprint(microsoft)\r\n```\r\nOutput:\r\n```javascript\r\n{\r\n  \"1430938749840629773\": {\r\n    \"tweet_id\": \"1430938749840629773\",\r\n    \"username\": \"Microsoft\",\r\n    \"name\": \"Microsoft\",\r\n    \"profile_picture\": \"https://twitter.com/Microsoft/photo\",\r\n    \"replies\": 29,\r\n    \"retweets\": 58,\r\n    \"likes\": 453,\r\n    \"is_retweet\": false,\r\n    \"retweet_link\": \"\",\r\n    \"posted_time\": \"2021-08-26T17:02:38+00:00\",\r\n    \"content\": \"Easy to use and efficient for all \\u2013 Windows 11 is committed to an accessible 
future.\\n\\nHere's how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW \",\r\n    \"hashtags\": [],\r\n    \"mentions\": [],\r\n    \"images\": [],\r\n    \"videos\": [],\r\n    \"tweet_url\": \"https://twitter.com/Microsoft/status/1430938749840629773\",\r\n    \"link\": \"https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC\"\r\n  },...\r\n}\r\n```\r\n<hr>\r\n<p id=\"profileCSV\">In CSV format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\n\r\nscrape_profile(twitter_username=\"microsoft\",output_format=\"csv\",browser=\"firefox\",tweets_count=10,filename=\"microsoft\",directory=\"/home/user/Downloads\")\r\n\r\n\r\n```\r\n\r\nOutput:\r\n<br>\r\n<table class=\"table table-bordered table-hover table-condensed\" style=\"line-height: 14px;overflow:hidden;white-space: nowrap\">\r\n<thead><tr><th title=\"Field #1\">tweet_id</th>\r\n<th title=\"Field #2\">username</th>\r\n<th title=\"Field #3\">name</th>\r\n<th title=\"Field #4\">profile_picture</th>\r\n<th title=\"Field #5\">replies</th>\r\n<th title=\"Field #6\">retweets</th>\r\n<th title=\"Field #7\">likes</th>\r\n<th title=\"Field #8\">is_retweet</th>\r\n<th title=\"Field #9\">retweet_link</th>\r\n<th title=\"Field #10\">posted_time</th>\r\n<th title=\"Field #11\">content</th>\r\n<th title=\"Field #12\">hashtags</th>\r\n<th title=\"Field #13\">mentions</th>\r\n<th title=\"Field #14\">images</th>\r\n<th title=\"Field #15\">videos</th>\r\n<th title=\"Field #16\">post_url</th>\r\n<th title=\"Field #17\">link</th>\r\n</tr></thead>\r\n<tbody><tr>\r\n<td>1430938749840629773</td>\r\n<td>Microsoft</td>\r\n<td>Microsoft</td>\r\n<td>https://twitter.com/Microsoft/photo</td>\r\n<td align=\"right\">64</td>\r\n<td align=\"right\">75</td>\r\n<td align=\"right\">521</td>\r\n<td>False</td>\r\n<td> </td>\r\n<td>2021-08-26T17:02:38+00:00</td>\r\n<td>Easy to use and efficient for all 
\u2013 Windows 11 is committed to an accessible future.<br/><br/>Here&#39;s how it empowers everyone to create, connect, and achieve more: https://msft.it/6009X6tbW </td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>https://twitter.com/Microsoft/status/1430938749840629773</td>\r\n<td>https://blogs.windows.com/windowsexperience/2021/07/01/whats-coming-in-windows-11-accessibility/?ocid=FY22_soc_omc_br_tw_Windows_AC</td>\r\n</tr>\r\n\r\n</tbody>\r\n</table>\r\n<p>...</p>\r\n\r\n<br><hr>\r\n<div id=\"profileArgument\">\r\n<p><code>scrape_profile()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>twitter_username</td>\r\n            <td>String</td>\r\n            <td>Twitter username of the account</td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for scraping?, Only 2 are supported Chrome and Firefox. Default is set to Firefox</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of posts to scrape. Default is 10.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_format</td>\r\n            <td>String</td>\r\n            <td>The output format, whether JSON or CSV. Default is JSON.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>filename</td>\r\n            <td>String</td>\r\n            <td>If output parameter is set to CSV, then it is necessary for filename parameter to passed. 
If not passed then the filename will be same as username passed.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>directory</td>\r\n            <td>String</td>\r\n            <td>If output_format parameter is set to CSV, then it is valid for directory parameter to be passed. If not passed then CSV file will be saved in current working directory.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run crawler headlessly?. Default is <code>True</code></td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div id=\"profileOutput\">\r\n<p>Keys of the output</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Key</td>\r\n            <td>Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>tweet_id</td>\r\n            <td>String</td>\r\n            <td>Post Identifier(integer casted inside string)</td>\r\n        </tr>\r\n        <tr>\r\n            <td>username</td>\r\n            <td>String</td>\r\n            <td>Username of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>name</td>\r\n            <td>String</td>\r\n            <td>Name of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>profile_picture</td>\r\n            <td>String</td>\r\n            <td>Profile Picture link</td>\r\n        </tr>\r\n        <tr>\r\n            <td>replies</td>\r\n            <td>Integer</td>\r\n            <td>Number of replies of tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>retweets</td>\r\n            <td>Integer</td>\r\n            <td>Number of retweets of tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>likes</td>\r\n            <td>Integer</td>\r\n            <td>Number of likes of tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>is_retweet</td>\r\n            <td>boolean</td>\r\n     
       <td>Is the tweet a retweet?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>retweet_link</td>\r\n            <td>String</td>\r\n            <td>If it is retweet, then the retweet link else it'll be empty string</td>\r\n        </tr>\r\n        <tr>\r\n            <td>posted_time</td>\r\n            <td>String</td>\r\n            <td>Time when tweet was posted in ISO 8601 format</td>\r\n        </tr>\r\n        <tr>\r\n            <td>content</td>\r\n            <td>String</td>\r\n            <td>content of tweet as text</td>\r\n        </tr>\r\n        <tr>\r\n            <td>hashtags</td>\r\n            <td>Array</td>\r\n            <td>Hashtags presents in tweet, if they're present in tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>mentions</td>\r\n            <td>Array</td>\r\n            <td>Mentions presents in tweet, if they're present in tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>images</td>\r\n            <td>Array</td>\r\n            <td>Images links, if they're present in tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>videos</td>\r\n            <td>Array</td>\r\n            <td>Videos links, if they're present in tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweet_url</td>\r\n            <td>String</td>\r\n            <td>URL of the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>link</td>\r\n            <td>String</td>\r\n            <td>If any link is present inside tweet for some external website. 
</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<hr>\r\n<h3 id=\"keywordAPI\">To scrape tweets using keywords with API:</h3>\r\n<div>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_keyword_with_api\r\n\r\nquery = \"#gaming\"\r\ntweets_count = 10\r\noutput_filename = \"gaming_hashtag_data\"\r\nscrape_keyword_with_api(query=query, tweets_count=tweets_count, output_filename=output_filename)\r\n\r\n```\r\nOutput:\r\n```js\r\n{\r\n  \"1583821467732480001\": {\r\n    \"tweet_url\" : \"https://twitter.com/yakubblackbeard/status/1583821467732480001\",\r\n    \"tweet_details\":{\r\n      ...\r\n    },\r\n    \"user_details\":{\r\n      ...\r\n    }\r\n  }, ...\r\n}\r\n```\r\n</div>\r\n<br>\r\n<div id=\"scrape_keyword_with_apiArgs\">\r\n<p><code>scrape_keyword_with_api()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>query</td>\r\n            <td>String</td>\r\n            <td>Query to search. The query can be built from <a href=\"https://developer.twitter.com/apitools/query\">here</a> for advanced search.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of tweets to scrape.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>What should be the filename where output is stored?.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>What directory output file should be saved?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. 
If the proxy is authenticated proxy then the proxy format is username:password@host:port.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n\r\n</div>\r\n<hr>\r\n<br>\r\n<div>\r\n<p id=\"scrape_keyword_with_apiKeys\">Keys of the output:</p>\r\n<table>\r\n<thead>\r\n        <tr>\r\n            <td>Key</td>\r\n            <td>Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n    <tr>\r\n    <td>tweet_url</td>\r\n    <td>String</td>\r\n    <td>URL of the tweet.</td>\r\n    </tr>\r\n    <tr>\r\n    <td>tweet_details</td>\r\n    <td>Dictionary</td>\r\n    <td>A dictionary containing the data about the tweet. All fields which will be available inside can be checked <a href=\"https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet\">here<a></td>\r\n    </tr>\r\n    <tr>\r\n    <td>user_details</td>\r\n    <td>Dictionary</td>\r\n    <td>A dictionary containing the data about the tweet owner. All fields which will be available inside can be checked <a href=\"https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user\">here<a></td>\r\n    </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<br>\r\n<hr>\r\n<h3>To scrape tweets using keywords with browser automation</h3>\r\n<div>\r\n<p id=\"keywordJson\">In JSON format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_keyword\r\n#scrape 10 posts by searching keyword \"india\" from date 30th August till date 31st August\r\nindia = scrape_keyword(keyword=\"india\", browser=\"firefox\",\r\n                      tweets_count=10,output_format=\"json\" ,until=\"2021-08-31\", since=\"2021-08-30\")\r\nprint(india)\r\n\r\n```\r\nOutput:\r\n```javascript\r\n{\r\n  \"1432493306152243200\": {\r\n    \"tweet_id\": \"1432493306152243200\",\r\n    \"username\": \"TOICitiesNews\",\r\n    \"name\": \"TOI Cities\",\r\n    \"profile_picture\": \"https://twitter.com/TOICitiesNews/photo\",\r\n    \"replies\": 0,\r\n   
 \"retweets\": 0,\r\n    \"likes\": 0,\r\n    \"is_retweet\": false,\r\n    \"posted_time\": \"2021-08-30T23:59:53+00:00\",\r\n    \"content\": \"Paralympians rake in medals, India Inc showers them with rewards\",\r\n    \"hashtags\": [],\r\n    \"mentions\": [],\r\n    \"images\": [],\r\n    \"videos\": [],\r\n    \"tweet_url\": \"https://twitter.com/TOICitiesNews/status/1432493306152243200\",\r\n    \"link\": \"https://t.co/odmappLovL?amp=1\"\r\n  },...\r\n}\r\n```\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"keywordCSV\">\r\n<p>In CSV format:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_keyword\r\n\r\nscrape_keyword(keyword=\"india\", browser=\"firefox\",\r\n                      tweets_count=10, until=\"2021-08-31\", since=\"2021-08-30\",output_format=\"csv\",filename=\"india\")\r\n```\r\n<br>\r\nOutput:\r\n<table class=\"table table-bordered table-hover table-condensed\" style=\"line-height: 14px;overflow:hidden;white-space: nowrap\">\r\n<thead><tr><th title=\"Field #1\">tweet_id</th>\r\n<th title=\"Field #2\">username</th>\r\n<th title=\"Field #3\">name</th>\r\n<th title=\"Field #4\">profile_picture</th>\r\n<th title=\"Field #5\">replies</th>\r\n<th title=\"Field #6\">retweets</th>\r\n<th title=\"Field #7\">likes</th>\r\n<th title=\"Field #8\">is_retweet</th>\r\n<th title=\"Field #9\">posted_time</th>\r\n<th title=\"Field #10\">content</th>\r\n<th title=\"Field #11\">hashtags</th>\r\n<th title=\"Field #12\">mentions</th>\r\n<th title=\"Field #13\">images</th>\r\n<th title=\"Field #14\">videos</th>\r\n<th title=\"Field #15\">tweet_url</th>\r\n<th title=\"Field #16\">link</th>\r\n</tr></thead>\r\n<tbody>\r\n\r\n<tr>\r\n<td>1432493306152243200</td>\r\n<td>TOICitiesNews</td>\r\n<td>TOI Cities</td>\r\n<td>https://twitter.com/TOICitiesNews/photo</td>\r\n<td>0</td>\r\n<td align=\"right\">0</td>\r\n<td align=\"right\">0</td>\r\n<td>False</td>\r\n<td>2021-08-30T23:59:53+00:00</td>\r\n<td>Paralympians rake in medals, India Inc showers them with 
rewards</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>[]</td>\r\n<td>https://twitter.com/TOICitiesNews/status/1432493306152243200</td>\r\n<td>https://t.co/odmappLovL?amp=1</td>\r\n</tr>\r\n\r\n</tbody></table>\r\n<p> ... </p>\r\n</div>\r\n<hr>\r\n<br>\r\n<div id=\"keywordArgument\">\r\n<p><code>scrape_keyword()</code> arguments:</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>keyword</td>\r\n            <td>String</td>\r\n            <td>Keyword to search on twitter. </td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for scraping?, Only 2 are supported Chrome and Firefox,default is set to Firefox.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>until</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, Until date for scraping, a end date from where search ends. Format for date is YYYY-MM-DD.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>since </td>\r\n            <td>String</td>\r\n            <td>Optional parameter, Since date for scraping, a past date from where to search from. Format for date is YYYY-MM-DD.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>Integer</td>\r\n            <td>Optional parameter, if user wants to use proxy for scraping. If the proxy is authenticated proxy then the proxy format is username:password@host:port</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of posts to scrape. Default is 10.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_format</td>\r\n            <td>String</td>\r\n            <td>The output format, whether JSON or CSV. 
Default is JSON.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>filename</td>\r\n            <td>String</td>\r\n            <td>If output parameter is set to CSV, then it is necessary for filename parameter to passed. If not passed then the filename will be same as keyword passed.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>directory</td>\r\n            <td>String</td>\r\n            <td>If output parameter is set to CSV, then it is valid for directory parameter to be passed. If not passed then CSV file will be saved in current working directory.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>since_id</td>\r\n            <td>Integer</td>\r\n            <td>After (NOT inclusive) a specified Snowflake ID. Example <a href=\"https://twitter.com/search?q=since_id%3A1138872932887924737%20max_id%3A1144730280353247233%20%23nasamoontunes&src=typed_query&f=live\">here</a></td>\r\n        </tr>\r\n        <tr>\r\n            <td>max_id</td>\r\n            <td>Integer</td>\r\n            <td>At or before (inclusive) a specified Snowflake ID. Example <a href=\"https://twitter.com/search?q=since_id%3A1138872932887924737%20max_id%3A1144730280353247233%20%23nasamoontunes&src=typed_query&f=live\">here</a></td>\r\n        </tr>\r\n        <tr>\r\n            <td>within_time</td>\r\n            <td>String</td>\r\n            <td>Search within the last number of days, hours, minutes, or seconds. Example <code>2d, 3h, 5m, 30s</code>.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run crawler headlessly?. 
Default is <code>True</code></td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<hr>\r\n<div id=\"keywordOutput\">\r\n<p>Keys of the output</p>\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Key</td>\r\n            <td>Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>tweet_id</td>\r\n            <td>String</td>\r\n            <td>Post identifier (integer cast to a string)</td>\r\n        </tr>\r\n        <tr>\r\n            <td>username</td>\r\n            <td>String</td>\r\n            <td>Username of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>name</td>\r\n            <td>String</td>\r\n            <td>Name of the profile</td>\r\n        </tr>\r\n        <tr>\r\n            <td>profile_picture</td>\r\n            <td>String</td>\r\n            <td>Profile picture link</td>\r\n        </tr>\r\n        <tr>\r\n            <td>replies</td>\r\n            <td>Integer</td>\r\n            <td>Number of replies to the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>retweets</td>\r\n            <td>Integer</td>\r\n            <td>Number of retweets of the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>likes</td>\r\n            <td>Integer</td>\r\n            <td>Number of likes on the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>is_retweet</td>\r\n            <td>Boolean</td>\r\n            <td>Is the tweet a retweet?</td>\r\n        </tr>\r\n        <tr>\r\n            <td>posted_time</td>\r\n            <td>String</td>\r\n            <td>Time when the tweet was posted, in ISO 8601 format</td>\r\n        </tr>\r\n        <tr>\r\n            <td>content</td>\r\n            <td>String</td>\r\n            <td>Content of the tweet as text</td>\r\n        </tr>\r\n        <tr>\r\n            <td>hashtags</td>\r\n            <td>Array</td>\r\n            <td>Hashtags present in the 
tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>mentions</td>\r\n            <td>Array</td>\r\n            <td>Mentions present in the tweet, if any</td>\r\n        </tr>\r\n        <tr>\r\n            <td>images</td>\r\n            <td>Array</td>\r\n            <td>Image links, if any are present in the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>videos</td>\r\n            <td>Array</td>\r\n            <td>Video links, if any are present in the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweet_url</td>\r\n            <td>String</td>\r\n            <td>URL of the tweet</td>\r\n        </tr>\r\n        <tr>\r\n            <td>link</td>\r\n            <td>String</td>\r\n            <td>External website link inside the tweet, if any.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<hr>\r\n<br>\r\n<h3 id=\"scrape_with_api\">To scrape topic tweets with URL using the API</h3>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_topic_with_api\r\n\r\ntopic_url = 'https://twitter.com/i/topics/1468157909318045697'\r\nscrape_topic_with_api(URL=topic_url, output_filename='solana_cryptocurrency', tweets_count=50)\r\n```\r\n\r\nOutput:\r\n```js\r\n{\r\n  \"1584979408338632705\": {\r\n    \"tweet_url\" : \"https://twitter.com/AptosBullCNFT/status/1584979408338632705\",\r\n    \"tweet_details\":{\r\n      ...\r\n    },\r\n    \"user_details\":{\r\n      ...\r\n    }\r\n  }, ...\r\n}\r\n```\r\n\r\n\r\n<div id=\"scrape_topic_with_api_args\">\r\n<p><code>scrape_topic_with_api()</code> arguments: </p>\r\n\r\n\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>URL</td>\r\n            <td>String</td>\r\n            <td>Twitter's Topic URL</td>\r\n        </tr>\r\n        <tr>\r\n            
<td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of tweets to scrape.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>Filename where the output will be stored.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>Directory where the output file should be saved.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if the user wants to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for extracting the GraphQL key. Default is Firefox.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run the browser in headless mode.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<hr>\r\n\r\n<div id=\"scrape_topic_with_api_args_keys\"> <p>Keys of the output:</p>\r\n  Same as <a href=\"#scrape_keyword_with_apiKeys\">scrape_keyword_with_api</a>\r\n</div>\r\n<br>\r\n<hr>\r\n\r\n<h3 id=\"to-scrape-topic-tweets-with-url\"> To scrape topic tweets with URL using browser automation: </h3>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_topic\r\n# scrape 10 tweets from the Steam Deck topic on Twitter\r\ndata = scrape_topic(filename=\"steamdeck\", url='https://twitter.com/i/topics/1415728297065861123',\r\n                     browser=\"firefox\", tweets_count=10)\r\n```\r\n\r\n<div id=\"scrape_topic_with_api_args_keys\"> <p>Keys of the output:</p>\r\n  Same as <a href=\"#profileOutput\">scrape_profile</a>\r\n</div>\r\n<hr>\r\n\r\n\r\n<div id=\"topicArgument\">\r\n<p><code>scrape_topic()</code> arguments:</p>\r\n\r\n\r\n| 
Arguments     | Argument <br> Type | Description                                                                                                                            |\r\n|---------------|--------------------|----------------------------------------------------------------------------------------------------------------------------------------|\r\n| filename      | str                | Filename to write the result output.                                                                                                   |\r\n| URL           | str                | Topic URL.                                                                                                                             |\r\n| browser       | str                | Which browser to use for scraping. <br> Only Chrome and Firefox are supported; default is Firefox.                                     |\r\n| proxy         | str                | If the user wants to use a proxy for scraping. <br> If the proxy requires authentication, the format is username:password@host:port    |\r\n| tweets_count  | int                | Number of posts to scrape. Default is 10.                                                                                              |\r\n| output_format | str                | The output format, whether JSON or CSV. Default is JSON.                                                                               |\r\n| directory     | str                | Directory to save the output file. Default is the current working directory.                                                           
|\r\n<br>\r\n<hr>\r\n<div id=\"to-scrape-user-tweets-with-api\">\r\n\r\n<p>To scrape a profile's tweets with the API:</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_profile_with_api\r\n\r\nscrape_profile_with_api('elonmusk', output_filename='musk', tweets_count=100)\r\n```\r\n</div>\r\n<br>\r\n<div id=\"users_api_parameter\">\r\n<p><code>scrape_profile_with_api()</code> Arguments:</p>\r\n<table>\r\n    <thead>\r\n        <tr>\r\n            <td>Argument</td>\r\n            <td>Argument Type</td>\r\n            <td>Description</td>\r\n        </tr>\r\n    </thead>\r\n    <tbody>\r\n        <tr>\r\n            <td>username</td>\r\n            <td>String</td>\r\n            <td>Twitter's profile username</td>\r\n        </tr>\r\n        <tr>\r\n            <td>tweets_count</td>\r\n            <td>Integer</td>\r\n            <td>Number of tweets to scrape.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_filename</td>\r\n            <td>String</td>\r\n            <td>Filename where the output will be stored.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>output_dir</td>\r\n            <td>String</td>\r\n            <td>Directory where the output file should be saved.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>proxy</td>\r\n            <td>String</td>\r\n            <td>Optional parameter, if the user wants to use a proxy for scraping. If the proxy requires authentication, the format is username:password@host:port.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>browser</td>\r\n            <td>String</td>\r\n            <td>Which browser to use for extracting the GraphQL key. 
Default is Firefox.</td>\r\n        </tr>\r\n        <tr>\r\n            <td>headless</td>\r\n            <td>Boolean</td>\r\n            <td>Whether to run the browser in headless mode.</td>\r\n        </tr>\r\n    </tbody>\r\n</table>\r\n</div>\r\n<br>\r\n<div id=\"scrape_user_with_api_args_keys\"> <p>Output:</p>\r\n\r\n```js\r\n{\r\n  \"1608939190548598784\": {\r\n    \"tweet_url\" : \"https://twitter.com/elonmusk/status/1608939190548598784\",\r\n    \"tweet_details\":{\r\n      ...\r\n    },\r\n    \"user_details\":{\r\n      ...\r\n    }\r\n  }, ...\r\n}\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n</div>\r\n\r\n<h3 id=\"proxy\"> Using the scraper with a proxy (HTTP proxy) </h3>\r\n\r\n<div id=\"unauthenticatedProxy\">\r\n<p>Just pass the <code>proxy</code> argument to the function.</p>\r\n\r\n```python\r\nfrom twitter_scraper_selenium import scrape_keyword\r\n\r\nscrape_keyword(keyword=\"#india\", browser=\"firefox\", tweets_count=10, output=\"csv\", filename=\"india\",\r\nproxy=\"66.115.38.247:5678\")  # In IP:PORT format\r\n\r\n```\r\n</div>\r\n\r\n<br>\r\n<div id=\"authenticatedProxy\">\r\n<p> A proxy that requires authentication: </p>\r\n\r\n```python\r\n\r\nfrom twitter_scraper_selenium import scrape_profile\r\n\r\nmicrosoft_data = scrape_profile(twitter_username=\"microsoft\", browser=\"chrome\", tweets_count=10, output=\"json\",\r\n                      proxy=\"sajid:pass123@66.115.38.247:5678\")  #  username:password@IP:PORT\r\nprint(microsoft_data)\r\n\r\n\r\n```\r\n\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"privacy\">\r\n<h2>Privacy</h2>\r\n\r\n<p>\r\nThis scraper only scrapes public data available to an unauthenticated user and does not have the capability to scrape anything private.\r\n</p>\r\n</div>\r\n<br>\r\n<hr>\r\n<div id=\"license\">\r\n<h2>LICENSE</h2>\r\n\r\nMIT\r\n</div>\r\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package to scrap twitter's front-end easily with selenium",
    "version": "5.0.0",
    "project_urls": {
        "Homepage": "https://github.com/shaikhsajid1111/twitter-scraper-selenium"
    },
    "split_keywords": [
        "web-scraping",
        "selenium",
        "social",
        "media",
        "twitter",
        "keyword",
        "twitter-profile",
        "twitter-keywords",
        "automation",
        "json",
        "csv",
        "twitter-hashtag",
        "hashtag"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2ce5a9af9e99588d34bb09e4f10931eb1b0da8d8cbc0d1e8eec363b8fc4fdc04",
                "md5": "787ae2cf387ef0753aed11a6d8b16b88",
                "sha256": "7bb2b7de70fd577111fb71428a2daf3cffd6ad85d7e2edf1b1021f710bce0837"
            },
            "downloads": -1,
            "filename": "twitter_scraper_selenium-5.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "787ae2cf387ef0753aed11a6d8b16b88",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 33309,
            "upload_time": "2023-06-04T12:14:13",
            "upload_time_iso_8601": "2023-06-04T12:14:13.224948Z",
            "url": "https://files.pythonhosted.org/packages/2c/e5/a9af9e99588d34bb09e4f10931eb1b0da8d8cbc0d1e8eec363b8fc4fdc04/twitter_scraper_selenium-5.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ee30f391ccfe52b741e0763c3e87f8e49dced32663ed1f0e5797de640eda1611",
                "md5": "53624f7d762b2187fd417b198e949ce8",
                "sha256": "99b69efd8edd956073db33edd5bf2a651e6e89f82c99edada4300d0821d07695"
            },
            "downloads": -1,
            "filename": "twitter_scraper_selenium-5.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "53624f7d762b2187fd417b198e949ce8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 34655,
            "upload_time": "2023-06-04T12:14:15",
            "upload_time_iso_8601": "2023-06-04T12:14:15.334435Z",
            "url": "https://files.pythonhosted.org/packages/ee/30/f391ccfe52b741e0763c3e87f8e49dced32663ed1f0e5797de640eda1611/twitter_scraper_selenium-5.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-04 12:14:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shaikhsajid1111",
    "github_project": "twitter-scraper-selenium",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "twitter-scraper-selenium"
}
        