Name | yt-fts JSON |
Version |
0.1.57
JSON |
| download |
home_page | None |
Summary | Search all of a YouTube channel from the command line |
upload_time | 2024-09-06 09:56:24 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.8 |
license | This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. For more information, please refer to <https://unlicense.org> |
keywords |
youtube
subtitles
search
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# yt-fts - YouTube Full Text Search
`yt-fts` is a command line program that uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to scrape all of a YouTube
channels subtitles and load them into a sqlite database that is searchable from the command line. It allows you to
query a channel for specific key word or phrase and will generate time stamped YouTube urls to
the video containing the keyword.
It also supports semantic search via the [OpenAI embeddings API](https://beta.openai.com/docs/api-reference/) using [chromadb](https://github.com/chroma-core/chroma).
- [Blog Post](https://notjoemartinez.com/blog/youtube_full_text_search/)
- [LLM/RAG Chat Bot](#llm-chat-bot)
- [Video Summaries](#summarize)
- [Semantic Search](#vsearch-semantic-search)
- [CHANGELOG](CHANGELOG.md)
https://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14
## Installation
pip
```bash
pip install yt-fts
```
## `download`
Download subtitles for a channel.
Takes a channel url as an argument. Specify the number of jobs to parallelize the download with the `--jobs` flag.
Use the `--cookies-from-browser` to use cookies from your browser in the requests, will help if you're getting errors
that request you to sign in. You can also run the `update` command several times to gradually get more videos into the database.
```bash
yt-fts download --jobs 5 "https://www.youtube.com/@3blue1brown"
yt-fts download --cookies-from-browser firefox "https://www.youtube.com/@3blue1brown"
```
## `list`
List saved channels.
The (ss) next to the channel name indicates that the channel has semantic search enabled.
```bash
yt-fts list
```
```
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID ┃ Name ┃ Count ┃ Channel ID ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1 │ ChessPage1 (ss) │ 19 │ UCO2QPmnJFjdvJ6ch-pe27dQ │
│ 2 │ 3Blue1Brown │ 127 │ UCYO_jab_esuFRV4b17AJtAw │
│ 3 │ george hotz archive │ 410 │ UCwgKmJM4ZJQRJ-U5NjvR2dg │
│ 4 │ The Tim Dillon Show │ 288 │ UC4woSp8ITBoYDmjkukhEhxg │
│ 5 │ Academy of Ideas (ss) │ 190 │ UCiRiQGCHGjDLT9FQXFW0I3A │
└────┴───────────────────────┴───────┴──────────────────────────┘
```
## `search` (Full Text Search)
Full text search for a string in saved channels.
- The search string does not have to be a word for word and match
- Search strings are limited to 40 characters.
```bash
# search in all channels
yt-fts search "[search query]"
# search in channel
yt-fts search "[search query]" --channel "[channel name or id]"
# search in specific video
yt-fts search "[search query]" --video-id "[video id]"
# limit results
yt-fts search "[search query]" --limit "[number of results]" --channel "[channel name or id]"
# export results to csv
yt-fts search "[search query]" --export --channel "[channel name or id]"
```
Advanced Search Syntax:
The search string supports sqlite [Enhanced Query Syntax](https://www.sqlite.org/fts3.html#full_text_index_queries).
which includes things like [prefix queries](https://www.sqlite.org/fts3.html#termprefix) which you can use to match parts of a word.
```bash
# AND search
yt-fts search "knife AND Malibu" --channel "The Tim Dillon Show"
# OR SEARCH
yt-fts search "knife OR Malibu" --channel "The Tim Dillon Show"
# wild cards
yt-fts search "rea* kni* Mali*" --channel "The Tim Dillon Show"
```
# Semantic Search and RAG
You can enable semantic search for a channel by using the `mbeddings` command.
This requires an OpenAI API key set in the environment variable `OPENAI_API_KEY`, or
you can pass the key with the `--openai-api-key` flag.
## `embeddings`
Fetches OpenAI embeddings for specified channel
```bash
# make sure openAI key is set
# export OPENAI_API_KEY="[yourOpenAIKey]"
yt-fts embeddings --channel "3Blue1Brown"
# specify time interval in seconds to split text by default is 30
# the larger the interval the more accurate the llm response
# but semantic search will have more text for you to read.
yt-fts embeddings --interval 60 --channel "3Blue1Brown"
```
After the embeddings are saved you will see a `(ss)` next to the channel name when you
list channels, and you will be able to use the `vsearch` command for that channel.
## `llm` (Chat Bot)
Starts interactive chat session with `gpt-4o` OpenAI model using
the semantic search results of your initial prompt as the context
to answer questions. If it can't answer your question, it has a
mechanism to update the context by running targeted query based
off the conversation. The channel must have semantic search enabled.
```bash
yt-fts llm --channel "3Blue1Brown" "How does back propagation work?"
```
## `summarize`
Summarizes a YouTube video transcript, providing time stamped URLS.
Requires a valid YouTube video URL or video ID as argument. If the
trancript is not in the database it will try to scrape it.
```bash
yt-fts summarize "https://www.youtube.com/watch?v=9-Jl0dxWQs8"
# or
yt-fts summarize "9-Jl0dxWQs8"
```
output:
```
In this video, 3Blue1Brown explores how large language models (LLMs) like GPT-3
might store facts within their vast...
1 Introduction to Fact Storage in LLMs:
• The video starts by questioning how LLMs store specific facts and
introduces the idea that these facts might be stored in a particular part of the
network known as multi-layer perceptrons (MLPs).
• 0:00
2 Overview of Transformers and MLPs:
• Provides a refresher on transformers and explains that the video will focus
```
## `vsearch` (Semantic Search)
`vsearch` is for "Vector search". This requires that you enable semantic
search for a channel with `embeddings`. It has the same options as
`search` but output will be sorted by similarity to the search string and
the default return limit is 10.
```bash
# search by channel name
yt-fts vsearch "[search query]" --channel "[channel name or id]"
# search in specific video
yt-fts vsearch "[search query]" --video-id "[video id]"
# limit results
yt-fts vsearch "[search query]" --limit "[number of results]" --channel "[channel name or id]"
# export results to csv
yt-fts vsearch "[search query]" --export --channel "[channel name or id]"
```
## How To
**Export search results:**
For both the `search` and `vsearch` commands you can export the results to a csv file with
the `--export` flag. and it will save the results to a csv file in the current directory.
```bash
yt-fts search "life in the big city" --export
yt-fts vsearch "existing in large metropolaten center" --export
```
**Delete a channel:**
You can delete a channel with the `delete` command.
```bash
yt-fts delete --channel "3Blue1Brown"
```
**Update a channel:**
The update command currently only works for full text search and will not update the
semantic search embeddings.
```bash
yt-fts update --channel "3Blue1Brown"
```
**Export all of a channel's transcript:**
This command will create a directory in current working directory with the YouTube
channel id of the specified channel.
```bash
# Export to vtt
yt-fts export --channel "[id/name]" --format "[vtt/txt]"
```
Raw data
{
"_id": null,
"home_page": null,
"name": "yt-fts",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "youtube, subtitles, search",
"author": null,
"author_email": "NotJoeMartinez <notjoemartinez@protonmail.com>",
"download_url": "https://files.pythonhosted.org/packages/a0/0b/63d66bbe3d717409061b1444603f3b24065b54b71ad28c03766088da4a11/yt_fts-0.1.57.tar.gz",
"platform": null,
"description": "# yt-fts - YouTube Full Text Search \n`yt-fts` is a command line program that uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to scrape all of a YouTube \nchannels subtitles and load them into a sqlite database that is searchable from the command line. It allows you to\nquery a channel for specific key word or phrase and will generate time stamped YouTube urls to\nthe video containing the keyword. \n\nIt also supports semantic search via the [OpenAI embeddings API](https://beta.openai.com/docs/api-reference/) using [chromadb](https://github.com/chroma-core/chroma).\n\n- [Blog Post](https://notjoemartinez.com/blog/youtube_full_text_search/)\n- [LLM/RAG Chat Bot](#llm-chat-bot)\n- [Video Summaries](#summarize)\n- [Semantic Search](#vsearch-semantic-search)\n- [CHANGELOG](CHANGELOG.md)\n\nhttps://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14\n\n## Installation \n\npip \n\n```bash\npip install yt-fts\n```\n\n## `download`\nDownload subtitles for a channel. \n\nTakes a channel url as an argument. Specify the number of jobs to parallelize the download with the `--jobs` flag. \nUse the `--cookies-from-browser` to use cookies from your browser in the requests, will help if you're getting errors \nthat request you to sign in. You can also run the `update` command several times to gradually get more videos into the database. \n\n```bash\nyt-fts download --jobs 5 \"https://www.youtube.com/@3blue1brown\"\nyt-fts download --cookies-from-browser firefox \"https://www.youtube.com/@3blue1brown\"\n```\n\n## `list`\nList saved channels.\n\nThe (ss) next to the channel name indicates that the channel has semantic search enabled. \n\n```bash\nyt-fts list\n```\n\n```\n\u250f\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 ID \u2503 Name \u2503 Count \u2503 Channel ID \u2503\n\u2521\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n\u2502 1 \u2502 ChessPage1 (ss) \u2502 19 \u2502 UCO2QPmnJFjdvJ6ch-pe27dQ \u2502\n\u2502 2 \u2502 3Blue1Brown \u2502 127 \u2502 UCYO_jab_esuFRV4b17AJtAw \u2502\n\u2502 3 \u2502 george hotz archive \u2502 410 \u2502 UCwgKmJM4ZJQRJ-U5NjvR2dg \u2502\n\u2502 4 \u2502 The Tim Dillon Show \u2502 288 \u2502 UC4woSp8ITBoYDmjkukhEhxg \u2502\n\u2502 5 \u2502 Academy of Ideas (ss) \u2502 190 \u2502 UCiRiQGCHGjDLT9FQXFW0I3A \u2502\n\u2514\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\n```\n\n## `search` (Full Text Search)\nFull text search for a string in saved channels.\n\n- The search string does not have to be a word for word and match \n- Search strings are limited to 40 characters. \n\n```bash\n# search in all channels\nyt-fts search \"[search query]\" \n\n# search in channel \nyt-fts search \"[search query]\" --channel \"[channel name or id]\" \n\n# search in specific video\nyt-fts search \"[search query]\" --video-id \"[video id]\"\n\n# limit results \nyt-fts search \"[search query]\" --limit \"[number of results]\" --channel \"[channel name or id]\"\n\n# export results to csv\nyt-fts search \"[search query]\" --export --channel \"[channel name or id]\" \n```\n\nAdvanced Search Syntax:\n\nThe search string supports sqlite [Enhanced Query Syntax](https://www.sqlite.org/fts3.html#full_text_index_queries).\nwhich includes things like [prefix queries](https://www.sqlite.org/fts3.html#termprefix) which you can use to match parts of a word. \n\n```bash\n# AND search\nyt-fts search \"knife AND Malibu\" --channel \"The Tim Dillon Show\" \n\n# OR SEARCH \nyt-fts search \"knife OR Malibu\" --channel \"The Tim Dillon Show\" \n\n# wild cards\nyt-fts search \"rea* kni* Mali*\" --channel \"The Tim Dillon Show\" \n```\n\n\n# Semantic Search and RAG\nYou can enable semantic search for a channel by using the `mbeddings` command.\nThis requires an OpenAI API key set in the environment variable `OPENAI_API_KEY`, or \nyou can pass the key with the `--openai-api-key` flag. \n\n\n## `embeddings`\nFetches OpenAI embeddings for specified channel\n```bash\n\n# make sure openAI key is set\n# export OPENAI_API_KEY=\"[yourOpenAIKey]\"\n\nyt-fts embeddings --channel \"3Blue1Brown\"\n\n# specify time interval in seconds to split text by default is 30 \n# the larger the interval the more accurate the llm response \n# but semantic search will have more text for you to read. \nyt-fts embeddings --interval 60 --channel \"3Blue1Brown\" \n```\nAfter the embeddings are saved you will see a `(ss)` next to the channel name when you \nlist channels, and you will be able to use the `vsearch` command for that channel. \n\n## `llm` (Chat Bot)\nStarts interactive chat session with `gpt-4o` OpenAI model using \nthe semantic search results of your initial prompt as the context\nto answer questions. If it can't answer your question, it has a \nmechanism to update the context by running targeted query based \noff the conversation. The channel must have semantic search enabled.\n\n```bash\nyt-fts llm --channel \"3Blue1Brown\" \"How does back propagation work?\"\n```\n\n## `summarize`\nSummarizes a YouTube video transcript, providing time stamped URLS. \nRequires a valid YouTube video URL or video ID as argument. If the \ntrancript is not in the database it will try to scrape it.\n\n```bash\nyt-fts summarize \"https://www.youtube.com/watch?v=9-Jl0dxWQs8\"\n# or\nyt-fts summarize \"9-Jl0dxWQs8\"\n```\noutput:\n```\nIn this video, 3Blue1Brown explores how large language models (LLMs) like GPT-3 \nmight store facts within their vast... \n\n 1 Introduction to Fact Storage in LLMs: \n \u2022 The video starts by questioning how LLMs store specific facts and \n introduces the idea that these facts might be stored in a particular part of the \n network known as multi-layer perceptrons (MLPs). \n \u2022 0:00 \n 2 Overview of Transformers and MLPs: \n \u2022 Provides a refresher on transformers and explains that the video will focus \n```\n\n## `vsearch` (Semantic Search)\n`vsearch` is for \"Vector search\". This requires that you enable semantic \nsearch for a channel with `embeddings`. It has the same options as \n`search` but output will be sorted by similarity to the search string and \nthe default return limit is 10. \n\n```bash\n# search by channel name\nyt-fts vsearch \"[search query]\" --channel \"[channel name or id]\"\n\n# search in specific video\nyt-fts vsearch \"[search query]\" --video-id \"[video id]\"\n\n# limit results \nyt-fts vsearch \"[search query]\" --limit \"[number of results]\" --channel \"[channel name or id]\"\n\n# export results to csv\nyt-fts vsearch \"[search query]\" --export --channel \"[channel name or id]\" \n\n```\n\n## How To\n\n**Export search results:**\n\nFor both the `search` and `vsearch` commands you can export the results to a csv file with \nthe `--export` flag. and it will save the results to a csv file in the current directory. \n```bash\nyt-fts search \"life in the big city\" --export\nyt-fts vsearch \"existing in large metropolaten center\" --export\n```\n\n**Delete a channel:**\nYou can delete a channel with the `delete` command. \n\n```bash\nyt-fts delete --channel \"3Blue1Brown\"\n```\n\n\n**Update a channel:**\nThe update command currently only works for full text search and will not update the \nsemantic search embeddings. \n\n```bash\nyt-fts update --channel \"3Blue1Brown\"\n```\n\n\n**Export all of a channel's transcript:**\n\nThis command will create a directory in current working directory with the YouTube \nchannel id of the specified channel.\n```bash\n# Export to vtt\nyt-fts export --channel \"[id/name]\" --format \"[vtt/txt]\"\n```\n",
"bugtrack_url": null,
"license": "This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. For more information, please refer to <https://unlicense.org> ",
"summary": "Search all of a YouTube channel from the command line",
"version": "0.1.57",
"project_urls": {
"Homepage": "https://github.com/NotJoeMartinez/yt-fts"
},
"split_keywords": [
"youtube",
" subtitles",
" search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5694c221610fa091f103b3d219dd6de737308c331b0050b14fc8fea10c3fad6d",
"md5": "2ddad87316f25dc77e14f9863d0a9206",
"sha256": "8e365a8d4f71ae83f89699401347d278d09b4ad9b39f8448f6397ed3601c5116"
},
"downloads": -1,
"filename": "yt_fts-0.1.57-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2ddad87316f25dc77e14f9863d0a9206",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 30849,
"upload_time": "2024-09-06T09:56:23",
"upload_time_iso_8601": "2024-09-06T09:56:23.177794Z",
"url": "https://files.pythonhosted.org/packages/56/94/c221610fa091f103b3d219dd6de737308c331b0050b14fc8fea10c3fad6d/yt_fts-0.1.57-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a00b63d66bbe3d717409061b1444603f3b24065b54b71ad28c03766088da4a11",
"md5": "c0acd6a0909f47ca27a1736fb712f79b",
"sha256": "42474cd4b9142b7c39ad034a0453761a7406ff46746218cc0078065ac2faf2a2"
},
"downloads": -1,
"filename": "yt_fts-0.1.57.tar.gz",
"has_sig": false,
"md5_digest": "c0acd6a0909f47ca27a1736fb712f79b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 30766,
"upload_time": "2024-09-06T09:56:24",
"upload_time_iso_8601": "2024-09-06T09:56:24.345154Z",
"url": "https://files.pythonhosted.org/packages/a0/0b/63d66bbe3d717409061b1444603f3b24065b54b71ad28c03766088da4a11/yt_fts-0.1.57.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-06 09:56:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "NotJoeMartinez",
"github_project": "yt-fts",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "yt-fts"
}