# Python Quick Start
<details>
<summary><b>Python 3.6+ setup (required if not already installed)</b></summary>
This package uses [f-strings](https://cito.github.io/blog/f-strings/) (more [here](https://realpython.com/python-f-strings/)), and so requires Python 3.6+.
If you have an older version of Python, you can download Python 3.9.1 (follow links below) and follow the instructions to set up Python for your machine. If you want to install a different version, visit the [Python Downloads page](https://www.python.org/downloads/) and select the version you want.
- [macOS 64-bit installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1-macosx10.9.pkg)
- [Windows x86-64 executable installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1-amd64.exe)
- [Windows x86 executable installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1.exe)
- [Gzipped source tarball](https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz) (most useful for Linux)
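
If you are not sure which version your interpreter is, a quick check along these lines can confirm that it meets the 3.6+ requirement. This is only a sketch: the helper name `meets_minimum_version` is made up for illustration and is not part of `yt-videos-list`.

```python
import sys


def meets_minimum_version(minimum=(3, 6), current=None):
    """Return True if `current` (default: the running interpreter) is at least `minimum`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


# Deliberately avoids f-strings so the check itself also runs on older interpreters.
print(sys.version.split()[0], meets_minimum_version())
```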
</details>
<details>
<summary><b>Permissions for first run</b></summary>
This is required to make sure you can download and install the required Selenium binary dependencies.
<details>
<summary><b>On Windows: make sure you open <code>Command Prompt</code> or <code>Powershell</code> (both work) in "Run as Administrator" mode</b></summary>
- shortcut: <kbd>⊞ Win</kbd> + <kbd>X</kbd> + <kbd>A</kbd>
</details>
<details>
<summary><b>On Unix based machines (MacOS, Linux): make sure you have read and write access to <code>/usr/local/bin/</code></b></summary>
- if you're not sure, open terminal and run `sudo chown $USER /usr/local/bin/`
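
If you would rather check before changing ownership, something like the following reports whether your user can already read and write the directory. It is a minimal sketch using only the standard library; the function name `can_read_and_write` is hypothetical and not part of this package.

```python
import os


def can_read_and_write(directory="/usr/local/bin"):
    """Return True if `directory` exists and the current user can read and write it."""
    return os.path.isdir(directory) and os.access(directory, os.R_OK | os.W_OK)


print(can_read_and_write())
```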
</details>
<br>
</details>
<details>
<summary><b>Using <code>venv</code> (optional)</b></summary>
While creating a virtual environment **is not required** to use this package, creating a virtual environment is useful for avoiding dependency conflicts with other projects. If **you are sure you do not need to worry about dependency conflicts with other projects**, skip this step.
Python has ***many*** ways to set up and use a virtual environment. For simplicity, the following instructions use the `venv` module from the Python standard library. You do not need to use this particular implementation, but **other virtual environment tools are outside the scope of this project, so if you choose one, you will need to set it up on your own: there are too many variations to cover here**.
<pre>
### CREATING the virtual environment on MacOS/Linux ###
python3 -m venv ytvl-venv
source ytvl-venv/bin/activate
# python3 # enter the python shell inside this virtual environment
deactivate # exit this virtual environment
### USING the virtual environment on MacOS/Linux ###
# if ytvl-venv is in the directory you are currently in:
source ytvl-venv/bin/activate
# if ytvl-venv is NOT in the directory you are currently in:
source /absolute/path/to/ytvl-venv/bin/activate
deactivate # exit this virtual environment
</pre>
<pre>
### CREATING the virtual environment on Windows (NOT FOR git BASH) ###
python -m venv ytvl-venv
ytvl-venv\Scripts\activate
# python # enter the python shell inside this virtual environment
deactivate # exit this virtual environment
### USING the virtual environment on Windows (NOT FOR git BASH) ###
# if ytvl-venv is in the directory you are currently in:
ytvl-venv\Scripts\activate
# if ytvl-venv is NOT in the directory you are currently in:
## you may need to
## include the .ps1 extension (activate.ps1) in Powershell
## or include the .bat extension (activate.bat) in Command Prompt
\absolute\path\to\ytvl-venv\Scripts\activate
deactivate # exit this virtual environment
</pre>
<pre>
### CREATING the virtual environment on Windows (FOR git BASH) ###
python -m venv ytvl-venv
source ytvl-venv/Scripts/activate
# python # enter the python shell inside this virtual environment
deactivate # exit this virtual environment
### USING the virtual environment on Windows (FOR git BASH) ###
# if ytvl-venv is in the directory you are currently in:
source ytvl-venv/Scripts/activate
# if ytvl-venv is NOT in the directory you are currently in:
source /absolute/path/to/ytvl-venv/Scripts/activate
deactivate # exit this virtual environment
</pre>
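
To confirm from inside Python that the environment is actually active, you can compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes an environment created with `venv` as shown above (other tools may behave differently), and the function name `in_virtual_environment` is made up for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(in_virtual_environment())
```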
</details>
<details>
<summary><b>Installing the package</b></summary>
After you install Python 3.6+, confirm you have the required permissions, and (optionally) activate your virtual environment as described above, enter the following in your command line:
```shell
# if something isn't working properly, try rerunning this
# the problem may have been fixed with a newer version
pip3 install -U yt-videos-list # MacOS/Linux
pip install -U yt-videos-list # Windows
# if that doesn't work:
python3 -m pip install -U yt-videos-list # MacOS/Linux
python -m pip install -U yt-videos-list # Windows
```
</details>
<details>
<summary><b>If you're on Windows: make sure you <i>always</i> open <code>Command Prompt</code> or <code>Powershell</code> (both work) in "Run as Administrator" mode!</b></summary>
- shortcut: <kbd>⊞ Win</kbd> + <kbd>X</kbd> + <kbd>A</kbd>
- this allows `yt_videos_list` to update selenium webdriver binaries to be compatible with newer browser versions as browsers are updated (e.g. your Firefox browser updates from version 77 to version 82)
- to see the commands being run, see the `yt_videos_list/docs/dependencies.json` file
</details>
<details>
<summary><b>Running the package from the python interpreter</b></summary>
```shell
python3 # MacOS/Linux
python # Windows
```
```python
from yt_videos_list import ListCreator
my_driver = 'firefox' # SUBSTITUTE DRIVER YOU WANT (options below)
lc = ListCreator(driver=my_driver, scroll_pause_time=0.8)
lc.create_list_for(url='https://www.youtube.com/user/schafer5')
lc.create_list_for(url='https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ', log_silently=True)
# Set `log_silently` to `True` to mute program logging to the console.
# The program will log the program status and any program information
# to only the log file for the channel being scraped
# (this is useful when scraping multiple channels at once with multi-threading).
# By default, the program logs to both the log file for the channel being scraped AND the console.
# to name the file using the channel ID instead of the channel name, set file_name='id'
# this is useful when scraping multiple channels with the same name:
lc.create_list_for(url='https://www.youtube.com/channel/UCb2EYjrzI6WpNAmPZeihhag', file_name='id')
lc.create_list_for(url='https://www.youtube.com/channel/UCDzYhlGOvGqsYw8IaTKDT8g', file_name='id')
# for more details about this method:
help(lc.create_list_for)
# see the new files that were just created:
import os
os.system('ls -lt | head') # MacOS/Linux
os.system('dir /O-D | find "_videos_list"') # Windows
# for more information on using the module:
help(lc)
```
- `driver` options include:
- `'firefox'`
- `'opera'`
- `'safari'` (MacOS only)
- `'chrome'`
- `'brave'`
- `'edge'` (Windows only!)
- increase `scroll_pause_time` for laggy internet and decrease `scroll_pause_time` for fast internet
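
The "see the new files" step in the session above uses shell commands that differ between MacOS/Linux and Windows. A portable alternative could look like the sketch below. It assumes the output files contain `_videos_list` in their names (which matches the Windows `find` filter above, but would not hold if you set `file_suffix=False`), and the helper name `newest_output_files` is hypothetical.

```python
from pathlib import Path


def newest_output_files(directory=".", marker="_videos_list", limit=10):
    """Return names of files in `directory` whose name contains `marker`, newest first."""
    files = [p for p in Path(directory).iterdir() if p.is_file() and marker in p.name]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files[:limit]]


for name in newest_output_files():
    print(name)
```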
</details>
#### If you already scraped a channel and the channel uploaded a new video, simply rerun this program on that channel and this package updates your files to include the newer video(s)!
<details>
<summary><b>Scraping multiple channels from a file simultaneously with multi-threading</b></summary>
Add the URL of every channel you want to extract information from to a `txt` file, with each URL on its own line.
- example: [`channels.txt`](./channels.txt) (NOTE this is a relative link, so this ***might*** not link properly on non-GitHub hosted sites!)
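
If your channel URLs already live in a Python list, a small helper can write the file in the one-URL-per-line layout described above. This is only a sketch: `write_channels_file` is a hypothetical name, and because it is not stated here how the package treats blank lines or duplicate URLs, the sketch simply filters them out to be safe.

```python
from pathlib import Path


def write_channels_file(urls, path="channels.txt"):
    """Write one channel URL per line to `path`, skipping blanks and duplicates."""
    unique_urls = []
    for url in urls:
        url = url.strip()
        if url and url not in unique_urls:
            unique_urls.append(url)
    Path(path).write_text("".join(url + "\n" for url in unique_urls), encoding="utf-8")
    return unique_urls


if __name__ == "__main__":
    # NOTE: this overwrites any existing channels.txt in the current directory.
    write_channels_file([
        "https://www.youtube.com/user/schafer5",
        "https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ",
    ])
```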
Enter the python interpreter:
```
python3 # MacOS/Linux
python # Windows
```
```python
from yt_videos_list import ListCreator
lc = ListCreator(driver='firefox', scroll_pause_time=1.2)
lc.create_list_from(path_to_channel_urls_file='channels.txt', number_of_threads=4)
# configuring settings:
lc.create_list_from(
path_to_channel_urls_file='channels.txt',
number_of_threads=4,
min_sleep=1,
max_sleep=5,
after_n_channels_pause_for_s=(20, 10),
log_subthread_status_silently=False,
log_subthread_info_silently=False
) # defaults (keyword argument form)
lc.create_list_from('channels.txt', 4, 1, 5, (20, 10), False, False) # defaults (positional argument form)
lc.create_list_from('channels.txt', min_sleep=3, max_sleep=10) # modifying only min_sleep and max_sleep
help(lc.create_list_from) # see API method details
```
</details>
<details>
<summary><b>Explicitly downloading all Selenium dependencies</b></summary>
Ideal if you use Selenium for other projects 😎
- Make sure you already have the `yt-videos-list` package installed (follow directions above for getting set up), then run the following:
```shell
pip3 install -U yt-videos-list # MacOS/Linux: ensure latest package
python3 # MacOS/Linux: enter python interpreter
pip install -U yt-videos-list # Windows: ensure latest package
python # Windows: enter python interpreter
```
```python
from yt_videos_list.download import selenium_webdriver_dependencies
selenium_webdriver_dependencies.download_all()
```
That's all! 🤓
</details>
<details>
<summary><b>More API information</b></summary>
---
**NOTE** that you can also access all the information below from the Python interpreter by entering
```python
import yt_videos_list
help(yt_videos_list)
```
---
```python
# default options for the ListCreator instance
ListCreator(
txt=True,
csv=True,
md=True,
file_suffix=True,
all_video_data_in_memory=False,
video_data_returned=False,
video_id_only=False,
reverse_chronological=True,
headless=False,
scroll_pause_time=0.8,
driver='firefox',
cookie_consent=False,
verify_page_bottom_n_times=3,
file_buffering=-1,
)
```
You can specify a number of optional arguments when instantiating `ListCreator`. The values shown above are the defaults, but in case you want more flexibility, you can specify the:
- `driver` argument:
- Firefox (default)
- Opera
- Safari (MacOS only)
- Chrome
- Brave
- Edge (Windows only)
- `driver='firefox'`
- `driver='opera'`
- `driver='safari'`
- `driver='chrome'`
- `driver='brave'`
- `driver='edge'`
- `cookie_consent` argument:
- `False` (default) - block all cookie options if prompted by YouTube (at consent.youtube.com)
- `True` - accept all cookie options if prompted by YouTube (also at consent.youtube.com)
- `cookie_consent=False` (default) OR `cookie_consent=True`
- `txt`, `csv`, `md` file type argument:
- `True` (default) - create a file for the specified type
- `False` - do not create a file for the specified type
- `txt=True` (default) OR `txt=False`
- `csv=True` (default) OR `csv=False`
- `md=True` (default) OR `md=False`
- `file_suffix` argument:
- `True` (default) - add a file suffix to the output file name
- `ChannelName_reverse_chronological_videos_list.csv`
- `ChannelName_chronological_videos_list.csv`
- `False` - do NOT add a file suffix to the output file name
- this means if a reverse chronological file and a chronological file are made for the same channel, they will have the same name!
- `ChannelName.csv` (reverse chronological output file)
- `ChannelName.csv` (chronological output file)
-> `file_suffix=True` (default) OR `file_suffix=False`
- `all_video_data_in_memory` argument:
- `False` (default) - do not scrape the entire page
- `True` - scrape the entire page (must ALSO set the `video_data_returned` attribute to `True` to return this data!)
- `all_video_data_in_memory=False` (default) OR `all_video_data_in_memory=True`
- `video_data_returned` argument:
- `False` (default) - do not return video data collected from the current scrape job (return dummy data instead: `[[0, '', '', '']]`)
- `True` - return video data collected from the current scrape job
- if `all_video_data_in_memory` attribute set to `False`, the returned data MIGHT not be the full data, and video numbering MIGHT be incorrect
- set `all_video_data_in_memory` attribute to `True` to return ALL video data for channel (video number will then also ALWAYS be correct)
- `video_data_returned=False` (default) OR `video_data_returned=True`
- `video_id_only` argument:
- `False` (default) - include the full URL to video: `https://www.youtube.com/watch?v=ElevenChars`
- `True` - include only the identifier parameter to video: `ElevenChars`
- `video_id_only=False` (default) OR `video_id_only=True`
- `reverse_chronological` argument:
- `True` (default) - write the files in order from most recent video to the oldest video
- `False` - write the files in order from oldest video to the most recent video
- `reverse_chronological=True` (default) OR `reverse_chronological=False`
- `headless` argument:
- `False` (default) - run the driver with an open Selenium instance for viewing
- `True` - run the driver in "invisible" mode
- `headless=False` (default) OR `headless=True`
- `scroll_pause_time` argument:
- any float values greater than `0` (default `0.8`)
- The value you provide will be how long the program waits before trying to scroll the videos list page down for the channel you want to scrape. For fast internet connections, you may want to reduce the value, and for slow connections you may want to increase the value.
- `scroll_pause_time=0.8` (default)
- CAUTION: reducing this value too much will result in the program not capturing all the videos, so be careful! Experiment :)
- `verify_page_bottom_n_times` argument:
- any int values greater than `0` (defaults to `3`)
- NOTE: this argument is only used when CREATING a new file for a new channel, and is unused when UPDATING an existing file for an already scraped channel.
- The value you provide will be how many times the program needs to verify it actually reached the bottom of the page before accepting that it is the bottom and starting to write the information to the output file(s).
- For channels that have uploaded THOUSANDS of videos, increase this value to a large number that you think should be sufficient to verify the program reached the bottom of the page.
- To determine HOW large of a value you should provide, determine the length of time you'd like to wait before being reasonably sure that you reached the bottom of the page and it's not just YouTube's server trying to fetch the response from an old database entry, and divide the time you decided to wait by the `scroll_pause_time` argument.
- For example, if you want to wait 45 seconds and you set the `scroll_pause_time` value to `1.0`:
-> `your_time / scroll_pause_time`
-> `45 / 1.0`
-> `45`
-> therefore: `verify_page_bottom_n_times=45`
- For channels with only a couple hundred videos (or fewer), the default value of `verify_page_bottom_n_times=3` **should** be sufficient.
- See commit a68f8f62e5c343cbb0641125e271bb96cc4f0750 for more details.
- `file_buffering` argument:
- any `int` values greater than `0` (default `-1`, which uses the default OS setting)
- LEAVE THIS ALONE IF YOU'RE UNSURE!
- Documentation:
- https://docs.python.org/3/library/functions.html#open
- Deep dive:
- https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file
- https://stackoverflow.com/questions/10019456/usage-of-sys-stdout-flush-method
- https://stackoverflow.com/questions/230751/how-can-i-flush-the-output-of-the-print-function
- https://en.wikipedia.org/wiki/Data_buffer
- https://stackoverflow.com/questions/1450551/buffered-vs-unbuffered-io
- https://www.quora.com/What-does-flushing-files-or-Stdin-do-in-Python
- https://www.quora.com/Whats-the-difference-between-buffered-I-O-and-unbuffered-I-O
- https://stackoverflow.com/questions/8409050/unix-buffered-vs-unbuffered-i-o
- https://medium.com/@bramblexu/three-ways-to-close-buffer-for-stdout-stdin-stderr-in-python-8be694bd2737
- https://www.quora.com/In-C-what-does-buffering-I-O-or-buffered-I-O-mean
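
Returning to the `verify_page_bottom_n_times` rule of thumb above (the time you are willing to wait divided by `scroll_pause_time`), the arithmetic can be wrapped in a small helper. This is a sketch only: `verify_count_for` is a hypothetical name, and rounding up to a whole number is an assumption made here because the argument takes an `int`.

```python
import math


def verify_count_for(wait_seconds, scroll_pause_time=0.8):
    """Return wait_seconds / scroll_pause_time, rounded up to a whole number (minimum 1)."""
    if wait_seconds <= 0 or scroll_pause_time <= 0:
        raise ValueError("both values must be greater than 0")
    return max(1, math.ceil(wait_seconds / scroll_pause_time))


print(verify_count_for(45, 1.0))  # 45, matching the worked example above
print(verify_count_for(45, 0.8))  # 57 (56.25 rounded up)
```

The result can then be passed along as `ListCreator(verify_page_bottom_n_times=verify_count_for(45, 1.0), scroll_pause_time=1.0)`.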
</details>
<details>
<summary><b><code>scrapetube</code> integration</b></summary>
[`scrapetube`](https://github.com/dermasmid/scrapetube) is a much more efficient backend developer tool that loads the videos uploaded by a channel. `scrapetube` ***also*** supports loading information from playlists and searches, which `yt-videos-list` currently does not do. Integration with `scrapetube` will be available in a future `yt-videos-list` release!
To keep things backwards-compatible and maintainable, the `scrapetube` integration will be accessible through a **separate** interface that is almost identical to the `ListCreator` interface, and the original `ListCreator` interface will continue to be available and continue to receive updates. 🤓
</details>
<details>
<summary><b>Cloning and running locally</b></summary>
To install the most up-to-date version of the package, which may not yet be available in the latest release on [PyPI](https://pypi.org/project/yt-videos-list/), clone this repository and run:
```
cd yt_videos_list/python # MacOS/Linux
python3 -m pip install . # MacOS/Linux
cd yt_videos_list\python # Windows
python -m pip install . # Windows
```
To make your own changes to the `yt_videos_list` python package and run the changes locally:
```
# make changes to the codebase in the
# ===> /dev <=== directory
python3 minifier.py # MacOS/Linux
pip3 install . # MacOS/Linux
python minifier.py # Windows
pip install . # Windows
```
NOTE: make the changes to the codebase in the `yt_videos_list/python/dev` directory!!
- the code in the `yt_videos_list/python/yt-videos-list` directory is minified with
- leading indents stripped to the minimum (1 space for each nested scope)
- whitespace for padding (e.g. extra spaces to align variable assignments) stripped
- comments stripped
- as a result, the code in the `yt_videos_list/python/yt-videos-list` directory is NOT human readable, and the `yt_videos_list/python/dev` directory should be used for development instead!
- the `minifier.py` module performs all the code preprocessing and packages the code from `yt_videos_list/python/dev` into the final version seen in the `yt_videos_list/python/yt-videos-list` directory
- so running `minifier.py` ***before*** installing the local package with `pip install .` (Windows) or `pip3 install .` (MacOS/Linux) is essential!
</details>
<details>
<summary><b>Running tests</b></summary>
The tests use the custom `ThreadWithResult` subclass of `threading.Thread` provided by the `save-thread-result` package, so make sure you install that module using
```
pip3 install -U save-thread-result # MacOS/Linux
pip install -U save-thread-result # Windows
# if that doesn't work:
python3 -m pip install -U save-thread-result # MacOS/Linux
python -m pip install -U save-thread-result # Windows
```
Next, make sure you're in the `yt_videos_list/python` directory and run:
```
tests\run_tests.bat # Windows
#### Any shell on MacOS/Linux
bash tests/run_tests.sh # this works
csh tests/run_tests.sh # this works
dash tests/run_tests.sh # this works
ksh tests/run_tests.sh # this also works
tcsh tests/run_tests.sh # this works too
zsh tests/run_tests.sh # this works as well
# you can try other shells and
# they should work too, since
# there's no special syntax in
# the run_tests.sh file
```
</details>
Raw data
{
"_id": null,
"home_page": "https://github.com/slow-but-steady/yt-videos-list/tree/main/python",
"name": "yt-videos-list",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6, <4",
"maintainer_email": "",
"keywords": "YouTube videos URL scraping automation Selenium csv txt macos windows linux",
"author": "slow-but-steady",
"author_email": "yt.videos.list@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/af/a8/6c8c4b9907e72868b69bb7356545654702440d54cdaf498e5bc2343ac4e3/yt_videos_list-0.6.7.tar.gz",
"platform": null,
"description": "# Python Quick Start\n\n<details>\n <summary><b>Python 3.6+ setup (required if not already installed)</b></summary>\n\nThis package uses [f-strings](https://cito.github.io/blog/f-strings/) (more [here](https://realpython.com/python-f-strings/)), and so requires Python 3.6+.\n\nIf you have an older version of Python, you can download Python 3.9.1 (follow links below) and follow the instructions to set up Python for your machine. If you want to install a different version, visit the [Python Downloads page](https://www.python.org/downloads/) and select the version you want.\n- [macOS 64-bit installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1-macosx10.9.pkg)\n- [Windows x86-64 executable installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1-amd64.exe)\n- [Windows x86 executable installer](https://www.python.org/ftp/python/3.9.1/python-3.9.1.exe)\n- [Gzipped source tarball](https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz) (most useful for Linux)\n</details>\n\n<details>\n <summary><b>Permissions for first run</b></summary>\n\n This is required to make sure you can download and install the required Selenium binary dependencies.\n <details>\n <summary><b>On Windows: make sure you open <code>Command Prompt</code> or <code>Powershell</code> (both work) in \"Run as Administrator\" mode</b></summary>\n\n - shortcut: <kbd>\u229e Win</kbd> + <kbd>X</kbd> + <kbd>A</kbd>\n </details>\n <details>\n <summary><b>On Unix based machines (MacOS, Linux): make sure you have read and write access to <code>/usr/local/bin/</code></b></summary>\n\n - if you're not sure, open terminal and run `sudo chown $USER /usr/local/bin/`\n </details>\n<br>\n</details>\n\n<details>\n <summary><b>Using <code>venv</code> (optional)</b></summary>\n\n While creating a virtual environment **is not required** to use this package, creating a virtual environment is useful for avoiding dependency conflicts with other projects. 
If **you are sure you do not need to worry about dependency conflicts with other projects**, skip this step.\n\n Python has ***many*** ways to set up and use a virtual environment. The following instructions use the `venv` provided with the python standard library for simplicity. You do not need to use this particular implementation of a virtual environment, but virtual environments are outside of the scope of this project, so **you will need to figure out how to set up and use a different implementation of python virtual environments on your own if you choose a different implementation of a virtual environment, since there are too many different variations to cover here**.\n\n<pre>\n### CREATING the virtual environment on MacOS/Linux ###\n\npython3 -m venv ytvl-venv\nsource ytvl-venv/bin/activate\n# python3 # enter the python shell inside this virtual environment\ndeactivate # exit this virtual environment\n\n### USING the virtual environment on MacOS/Linux ###\n\n# if ytvl-venv is in the directory you are currently in:\nsource ytvl-venv/bin/activate\n\n# if ytvl-venv is NOT in the directory you are currently in:\nsource /absolute/path/to/ytvl-venv/bin/activate\n\ndeactivate # exit this virtual environment\n</pre>\n\n<pre>\n### CREATING the virtual environment on Windows (NOT FOR git BASH) ###\n\npython -m venv ytvl-venv\nytvl-venv\\Scripts\\activate\n# python # enter the python shell inside this virtual environment\ndeactivate # exit this virtual environment\n\n### USING the virtual environment on Windows (NOT FOR git BASH) ###\n\n# if ytvl-venv is in the directory you are currently in:\nytvl-venv\\Scripts\\activate\n\n# if ytvl-venv is NOT in the directory you are currently in:\n## you may need to\n## include the .ps1 extenstion (activate.ps1) in Powershell\n## or include the .bat extension (activate.bat) in Command Prompt\n\\absolute\\path\\to\\ytvl-venv\\Scripts\\activate\n\ndeactivate # exit this virtual environment\n</pre>\n\n<pre>\n### CREATING the virtual 
environment on Windows (FOR git BASH) ###\n\npython -m venv ytvl-venv\nsource ytvl-venv/Scripts/activate\n# python # enter the python shell inside this virtual environment\ndeactivate # exit this virtual environment\n\n### USING the virtual environment on Windows (FOR git BASH) ###\n\n# if ytvl-venv is in the directory you are currently in:\nsource ytvl-venv/Scripts/activate\n\n# if ytvl-venv is NOT in the directory you are currently in:\nsource /absolute/path/to/ytvl-venv/Scripts/activate\n\ndeactivate # exit this virtual environment\n</pre>\n\n</details>\n\n<details>\n <summary><b>Installing the package</b></summary>\n\nAfter you install Python 3.6+ and ensure you have the required permissions as needed and have activated your virtual environment as required (if you decide to use a virtual environment - you do not **need** to use a virtual environment, but if you choose to use `venv`, follow the instructions above), enter the following in your command line:\n```shell\n# if something isn't working properly, try rerunning this\n# the problem may have been fixed with a newer version\n\npip3 install -U yt-videos-list # MacOS/Linux\npip install -U yt-videos-list # Windows\n\n\n# if that doesn't work:\n\npython3 -m pip install -U yt-videos-list # MacOS/Linux\npython -m pip install -U yt-videos-list # Windows\n```\n</details>\n\n<details>\n <summary><b>If you're on Windows: make sure you <i>always</i> open <code>Command Prompt</code> or <code>Powershell</code> (both work) in \"Run as Administrator\" mode!</b></summary>\n\n - shortcut: <kbd>\u229e Win</kbd> + <kbd>X</kbd> + <kbd>A</kbd>\n - this allows `yt_videos_list` to update selenium webdriver binaries to be compatible with newer browser versions as browsers are updated (e.g. 
your Firefox browser updates from version 77 to version 82)\n - to see the commands being run, see the `yt_videos_list/docs/dependencies.json` file\n</details>\n\n<details>\n <summary><b>Running the package from the python interpreter</b></summary>\n\n```shell\npython3 # MacOS/Linux\npython # Windows\n```\n```python\nfrom yt_videos_list import ListCreator\n\n\nmy_driver = 'firefox' # SUBSTITUTE DRIVER YOU WANT (options below)\nlc = ListCreator(driver=my_driver, scroll_pause_time=0.8)\n\n\nlc.create_list_for(url='https://www.youtube.com/user/schafer5')\nlc.create_list_for(url='https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ', log_silently=True)\n# Set `log_silently` to `True` to mute program logging to the console.\n# The program will log the prgram status and any program information\n# to only the log file for the channel being scraped\n# (this is useful when scraping multiple channels at once with multi-threading).\n# By default, the program logs to both the log file for the channel being scraped AND the console.\n\n\n# to name the file using the channel ID instead of the channel name, set file_name='id'\n# this is useful when scraping multiple channels with the same name:\nlc.create_list_for(url='https://www.youtube.com/channel/UCb2EYjrzI6WpNAmPZeihhag', file_name='id')\nlc.create_list_for(url='https://www.youtube.com/channel/UCDzYhlGOvGqsYw8IaTKDT8g', file_name='id')\n\n# for more details about this method:\nhelp(lc.create_list_for)\n\n\n# see the new files that were just created:\nimport os\nos.system('ls -lt | head') # MacOS/Linux\nos.system('dir /O-D | find \"_videos_list\"') # Windows\n\n# for more information on using the module:\nhelp(lc)\n```\n- `driver` options include:\n - `'firefox'`\n - `'opera'`\n - `'safari'` (MacOS only)\n - `'chrome'`\n - `'brave'`\n - `'edge'` (Windows only!)\n- increase `scroll_pause_time` for laggy internet and decrease `scroll_pause_time` for fast internet\n</details>\n\n#### If you already scraped a channel and the 
channel uploaded a new video, simply rerun this program on that channel and this package updates your files to include the newer video(s)!\n\n<details>\n <summary><b>Scraping multiple channels from a file simultaneously with multi-threading</b></summary>\n\nAdd the url to every channel you want to extract information from in a `txt` file with every url placed on a new line.\n- example: [`channels.txt`](./channels.txt) (NOTE this is a relative link, so this ***might*** not link properly on non-GitHub hosted sites!)\n\nEnter the python interpreter:\n\n```\npython3 # MacOS/Linux\npython # Windows\n```\n```python\nfrom yt_videos_list import ListCreator\n\nlc = ListCreator(driver='firefox', scroll_pause_time=1.2)\nlc.create_list_from(path_to_channel_urls_file='channels.txt', number_of_threads=4)\n\n# configuring settings:\nlc.create_list_from(\n path_to_channel_urls_file='channels.txt',\n number_of_threads=4,\n min_sleep=1,\n max_sleep=5,\n after_n_channels_pause_for_s=(20, 10),\n log_subthread_status_silently=False,\n log_subthread_info_silently=False\n) # defaults (keyword argument form)\nlc.create_list_from('channels.txt', 4, 1, 5, (20, 10), False, False) # defaults (positional argument form)\nlc.create_list_from('channels.txt', min_sleep=3, max_sleep=10) # modifying only min_sleep and max_sleep\n\nhelp(lc.create_list_from) # see API method details\n```\n\n</details>\n\n<details>\n <summary><b>Explicitly downloading all Selenium dependencies</b></summary>\n\nIdeal if you use Selenium for other projects \ud83d\ude0e\n- Make sure you already have the `yt-videos-list` package installed (follow directions above for getting set up), then run the following:\n```shell\npip3 install -U yt-videos-list # MacOS/Linux: ensure latest package\npython3 # MacOS/Linux: enter python interpreter\npip install -U yt-videos-list # Windows: ensure latest package\npython # Windows: enter python interpreter\n```\n```python\nfrom yt_videos_list.download import 
selenium_webdriver_dependencies\nselenium_webdriver_dependencies.download_all()\n```\nThat's all! \ud83e\udd13\n</details>\n\n<details>\n <summary><b>More API information</b></summary>\n\n---\n**NOTE** that you can also access all the information below from the Python interpreter by entering\n```python\nimport yt_videos_list\nhelp(yt_videos_list)\n```\n\n---\n```python\n# default options for the ListCreator instance\n\nListCreator(\n txt=True,\n csv=True,\n md=True,\n file_suffix=True,\n all_video_data_in_memory=False,\n video_data_returned=False,\n video_id_only=False,\n reverse_chronological=True,\n headless=False,\n scroll_pause_time=0.8,\n driver='firefox',\n cookie_consent=False,\n verify_page_bottom_n_times=3,\n file_buffering=-1,\n )\n```\nThere are a number of optional arguments you can specify during the instantiation of the ListCreator instance. The preceding arguments are run by default, but in case you want more flexibility, you can specify the:\n- `driver` argument:\n - Firefox (default)\n - Opera\n - Safari (MacOS only)\n - Chrome\n - Brave\n - Edge (Windows only)\n - `driver='firefox'`\n - `driver='opera'`\n - `driver='safari'`\n - `driver='chrome'`\n - `driver='brave'`\n - `driver='edge'`\n- `cookie_consent` argument:\n - `False` (default) - block all cookie options if prompted by YouTube (at consent.youtube.com)\n - `True` - accept all cookie options if prompted by YouTube (also at consent.youtube.com)\n - `cookie_consent=False` (default) OR `cookie_consent=True`\n- `txt`, `csv`, `md` file type argument:\n - `True` (default) - create a file for the specified type\n - `False` - do not create a file for the specified type\n - `txt=True` (default) OR `txt=False`\n - `csv=True` (default) OR `csv=False`\n - ` md=True` (default) OR ` md=False`\n- `file_suffix` argument:\n - `True` (default) - add a file suffix to the output file name\n - `ChannelName_reverse_chronological_videos_list.csv`\n - `ChannelName_chronological_videos_list.csv`\n - `False` - do 
NOT add a file suffix to the output file name\n - this means if a reverse chronological file and a chronological file is made for the same channel, they will have the same name!\n - `ChannelName.csv` (reverse chronological output file)\n - `ChannelName.csv` (chronological output file)\n -> `file_suffix=True` (default) OR `file_suffix=False`\n- `all_video_data_in_memory` argument:\n - `False` (default) - do not scrape the entire page\n - `True` - scrape the entire page (must ALSO set the `video_data_returned` attribute to `True` to return this data!)\n - `all_video_data_in_memory=False` (default) OR `all_video_data_in_memory=True`\n- `video_data_returned` argument:\n - `False` (default) - do not return video data collected from the current scrape job (return dummy data instead: `[[0, '', '', '']]`)\n - `True` - return video data collected from the current scrape job\n - if `all_video_data_in_memory` attribute set to `False`, the returned data MIGHT not be the full data, and video numbering MIGHT be incorrect\n - set `all_video_data_in_memory` attribute to `True` to return ALL video data for channel (video number will then also ALWAYS be correct)\n - `video_data_returned=False` (default) OR `video_data_returned=True`\n- `video_id_only` argument:\n - `False` (default) - include the full URL to video: `https://www.youtube.com/watch?v=ElevenChars`\n - `True` - include only the identifier parameter to video: `ElevenChars`\n - `video_id_only=False` (default) OR `video_id_only=True`\n- `reverse_chronological` argument:\n - `True` (default) - write the files in order from most recent video to the oldest video\n - `False` - write the files in order from oldest video to the most recent video\n - `reverse_chronological=True` (default) OR `reverse_chronological=False`\n- `headless` argument:\n - `False` (default) - run the driver with an open Selenium instance for viewing\n - `True` - run the driver in \"invisible\" mode\n - `headless=False` (default) OR `headless=True`\n- 
`scroll_pause_time` argument:\n - any float value greater than `0` (default `0.8`)\n - The value you provide will be how long the program waits before trying to scroll the videos list page down for the channel you want to scrape. For fast internet connections, you may want to reduce the value, and for slow connections you may want to increase the value.\n - `scroll_pause_time=0.8` (default)\n - CAUTION: reducing this value too much will result in the program not capturing all the videos, so be careful! Experiment :)\n- `verify_page_bottom_n_times` argument:\n - any int value greater than `0` (defaults to `3`)\n - NOTE: this argument is only used when CREATING a new file for a new channel, and is unused when UPDATING an existing file for an already scraped channel.\n - The value you provide will be how many times the program needs to verify it actually reached the bottom of the page before accepting it is the bottom of the page, and starting to write the information to the output file(s).\n - For channels that have uploaded THOUSANDS of videos, increase this value to a large number that you think should be sufficient to verify the program reached the bottom of the page.\n - To determine HOW large of a value you should provide, determine the length of time you'd like to wait before being reasonably sure that you reached the bottom of the page and it's not just YouTube's server trying to fetch the response from an old database entry, and divide the time you decided to wait by the `scroll_pause_time` argument.\n - For example, if you want to wait 45 seconds and you set the `scroll_pause_time` value to `1.0`:\n -> `your_time / scroll_pause_time`\n -> `45 / 1.0`\n -> `45`\n -> therefore: `verify_page_bottom_n_times=45`\n - For channels with only a couple hundred videos (or fewer), the default value of `verify_page_bottom_n_times=3` **should** be sufficient.\n - See commit a68f8f62e5c343cbb0641125e271bb96cc4f0750 for more details.\n- `file_buffering` argument:\n - any 
`int` values greater than `0` (default `-1`, which uses the default OS setting)\n - LEAVE THIS ALONE IF YOU'RE UNSURE!\n - Documentation:\n - https://docs.python.org/3/library/functions.html#open\n - Deep dive:\n - https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file\n - https://stackoverflow.com/questions/10019456/usage-of-sys-stdout-flush-method\n - https://stackoverflow.com/questions/230751/how-can-i-flush-the-output-of-the-print-function\n - https://en.wikipedia.org/wiki/Data_buffer\n - https://stackoverflow.com/questions/1450551/buffered-vs-unbuffered-io\n - https://www.quora.com/What-does-flushing-files-or-Stdin-do-in-Python\n - https://www.quora.com/Whats-the-difference-between-buffered-I-O-and-unbuffered-I-O\n - https://stackoverflow.com/questions/8409050/unix-buffered-vs-unbuffered-i-o\n - https://medium.com/@bramblexu/three-ways-to-close-buffer-for-stdout-stdin-stderr-in-python-8be694bd2737\n - https://www.quora.com/In-C-what-does-buffering-I-O-or-buffered-I-O-mean\n\n</details>\n\n<details>\n<summary><b><code>scrapetube</code> integration</b></summary>\n\n[`scrapetube`](https://github.com/dermasmid/scrapetube) is a much more efficient backend developer tool that loads the videos uploaded by a channel. This package ***also*** supports loading information from playlists and searches, which `yt-videos-list` currently does not do. Integration with `scrapetube` will be available in a future `yt-videos-list` release!\n\nTo keep things backwards-compatible and maintainable, the `scrapetube` integration will be accessible through an almost identical, **separate** interface as the `ListCreator` interface, and the original `ListCreator` interface will continue to be available and continue to receive updates. 
\ud83e\udd13\n\n</details>\n\n<details>\n<summary><b>Cloning and running locally</b></summary>\n\nTo install the most up-to-date version of the package, which may not yet be available in the latest release on [PyPI](https://pypi.org/project/yt-videos-list/), clone this repository and run:\n```\ncd yt_videos_list/python # MacOS/Linux\npython3 -m pip install . # MacOS/Linux\n\ncd yt_videos_list\\python # Windows\npython -m pip install . # Windows\n```\nTo make your own changes to the `yt_videos_list` Python package and run the changes locally:\n```\n# make changes to the codebase in the\n# ===> /dev <=== directory\npython3 minifier.py # MacOS/Linux\npip3 install . # MacOS/Linux\n\npython minifier.py # Windows\npip install . # Windows\n```\nNOTE: make the changes to the codebase in the `yt_videos_list/python/dev` directory!!\n - the code in the `yt_videos_list/python/yt-videos-list` directory is minified with\n - leading indents stripped to the minimum (1 space for each nested scope)\n - whitespace for padding (e.g. 
extra spaces to align variable assignments) stripped\n - comments stripped\n - as a result, the code in the `yt_videos_list/python/yt-videos-list` directory is NOT human readable, and the `yt_videos_list/python/dev` directory should be used for development instead!\n - the `minifier.py` module performs all the code preprocessing and packages the code from `yt_videos_list/python/dev` into the final version seen in the `yt_videos_list/python/yt-videos-list` directory\n - so running `minifier.py` ***before*** installing the local package with `pip install .` (Windows) or `pip3 install .` (MacOS/Linux) is essential!\n</details>\n\n<details>\n<summary><b>Running tests</b></summary>\n\nThe tests use the custom `ThreadWithResult` subclass of `threading.Thread` provided by the `save-thread-result` package, so make sure you install that package first:\n```\npip3 install -U save-thread-result # MacOS/Linux\npip install -U save-thread-result # Windows\n\n# if that doesn't work:\n\npython3 -m pip install -U save-thread-result # MacOS/Linux\npython -m pip install -U save-thread-result # Windows\n```\n\nThen, from the `yt_videos_list/python` directory, run:\n```\ntests\\run_tests.bat # Windows\n#### Any shell on MacOS/Linux\nbash tests/run_tests.sh # this works\ncsh tests/run_tests.sh # this works\ndash tests/run_tests.sh # this works\nksh tests/run_tests.sh # this also works\ntcsh tests/run_tests.sh # this works too\nzsh tests/run_tests.sh # this works as well\n# you can try other shells and\n# they should work too, since\n# there's no special syntax in\n# the run_tests.sh file\n```\n</details>\n\n\n",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "YouTube bot to make a YouTube videos list (including all video titles and URLs uploaded by a channel) with end-to-end web scraping - no API tokens required. \ud83c\udf1f Star this repo if you found it useful! \ud83c\udf1f",
"version": "0.6.7",
"project_urls": {
"Bug Reports": "https://github.com/slow-but-steady/yt-videos-list/issues",
"Homepage": "https://github.com/slow-but-steady/yt-videos-list/tree/main/python",
"PyPi Funding": "https://donate.pypi.org",
"Source": "https://github.com/slow-but-steady/yt-videos-list/tree/main/python"
},
"split_keywords": [
"youtube",
"videos",
"url",
"scraping",
"automation",
"selenium",
"csv",
"txt",
"macos",
"windows",
"linux"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "29e40f97dfcaca31cec3b5d7a956feda053b5477475d36b03bf43f36b54975b1",
"md5": "d79d824c6812a999e231888b8afb77a1",
"sha256": "6a317c24759047571faac6d9fe5a35e91619785715800c613a36104c369e5c3f"
},
"downloads": -1,
"filename": "yt_videos_list-0.6.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "d79d824c6812a999e231888b8afb77a1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6, <4",
"size": 40608,
"upload_time": "2023-11-11T04:05:00",
"upload_time_iso_8601": "2023-11-11T04:05:00.011870Z",
"url": "https://files.pythonhosted.org/packages/29/e4/0f97dfcaca31cec3b5d7a956feda053b5477475d36b03bf43f36b54975b1/yt_videos_list-0.6.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "afa86c8c4b9907e72868b69bb7356545654702440d54cdaf498e5bc2343ac4e3",
"md5": "0aa7a97303a41c2b37b70a8c47fc6f40",
"sha256": "25277c5dd4b96f58eed901cddcaeefa4473f07f80a396be37fefee13a8a6fa94"
},
"downloads": -1,
"filename": "yt_videos_list-0.6.7.tar.gz",
"has_sig": false,
"md5_digest": "0aa7a97303a41c2b37b70a8c47fc6f40",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6, <4",
"size": 43072,
"upload_time": "2023-11-11T04:05:03",
"upload_time_iso_8601": "2023-11-11T04:05:03.324533Z",
"url": "https://files.pythonhosted.org/packages/af/a8/6c8c4b9907e72868b69bb7356545654702440d54cdaf498e5bc2343ac4e3/yt_videos_list-0.6.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-11 04:05:03",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "slow-but-steady",
"github_project": "yt-videos-list",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "yt-videos-list"
}