# preserva-tweet
## Ingest Tweets from a Twitter Export into Preservica
This library provides a Python module which will ingest a Twitter export
zip file into Preservica as individual tweets with any attached media files such as images or video.
The tweets can then be rendered directly from within Preservica.
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/carj/preserva-tweet
## Support
preserva-tweet is a third-party open source client and is not affiliated with or supported by Preservica Ltd.
Preservica Ltd. does not provide support for use of the library.
Bug reports can be raised directly on GitHub.
Users of preserva-tweet should make sure they are licensed to use the Preservica REST APIs.
## License
The package is available as open source under the terms of the Apache License 2.0.
## Installation
preserva-tweet is available from the Python Package Index (PyPI):

https://pypi.org/project/preserva-tweet/

To install preserva-tweet, run this command in your terminal of choice:

    $ pip install preserva-tweet
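
To confirm that the install succeeded, one option is to ask Python which version of the distribution it can see. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical and not part of preserva-tweet.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="preserva-tweet"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if pip installed the package, otherwise None
print(installed_version())
```
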
## Downloading your Twitter Archive
### Step 1
Log in to your X account and open the Settings and Privacy panel.
Go to the “Your Account” tab and select “Download an Archive of Your Data.”
### Step 2
For security purposes, you’ll need to re-enter your password. You’ll also need to provide a verification code.
### Step 3
Once you’ve successfully completed these steps, you’ll see an option to request your archive.
Click the “Request Archive” button to begin processing.
### Step 4
The button will change to “Requesting Archive” and you’ll see a notice that your request is pending.
Now it’s time to wait. It can take 24 hours for the export to be ready.
### Step 5
When your archive is ready to download, you’ll get both an email in your inbox and a notification in your X account.
Since Twitter archives are only available for a limited time, pay attention to the expiration date.
## Ingesting Tweets
To run the module, specify the location of the Twitter export using the -a or --archive flag.
The parent Preservica collection for the tweets must be specified as a UUID using the -c or --collection flag.

preserva-tweet uses the pyPreservica Python library for ingesting content. This means that preserva-tweet can use the
same authentication methods as pyPreservica for reading Preservica credentials. See:
https://pypreservica.readthedocs.io/en/latest/intro.html#authentication

    $ python -m preserva-tweet -a twitter-2024-10-17.zip -c a7ad52e3-2cb3-4cb5-af2a-3ab08829a2a8

```
usage: preserva-tweet [-h] -a ARCHIVE -c COLLECTION [-v] [-d] [-u USERNAME] [-p PASSWORD] [-s SERVER] [-t SECURITY_TAG] [--validate]

Ingest a Twitter Account History Export into Preservica

options:
  -h, --help            show this help message and exit
  -a ARCHIVE, --archive ARCHIVE
                        Twitter export ZIP archive path
  -c COLLECTION, --collection COLLECTION
                        The Preservica parent collection uuid
  -v, --verbose         Print information as tweets are ingested
  -d, --dry-run         process the twitter export without ingesting
  -u USERNAME, --username USERNAME
                        Your Preservica username if not using credentials.properties
  -p PASSWORD, --password PASSWORD
                        Your Preservica password if not using credentials.properties
  -s SERVER, --server SERVER
                        Your Preservica server domain name if not using credentials.properties
  -t SECURITY_TAG, --security-tag SECURITY_TAG
                        The Preservica security tag of the ingested tweets (default is "open")
  --validate            Validate the twitter ingest to check for missing tweets
```
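
If the ingest is launched from a script rather than typed by hand, it can help to check the collection reference before starting. The sketch below builds the same command line as the example above and rejects a malformed UUID early. The function name `build_ingest_command` is hypothetical; only the flags listed in the usage text are used, and nothing is executed until the list is passed to `subprocess.run`.

```python
import sys
import uuid


def build_ingest_command(archive, collection, security_tag=None, dry_run=False, verbose=False):
    """Return the argument list for one preserva-tweet run."""
    # uuid.UUID raises ValueError if the collection reference is not a well-formed UUID
    collection_ref = str(uuid.UUID(collection))
    cmd = [sys.executable, "-m", "preserva-tweet", "-a", str(archive), "-c", collection_ref]
    if security_tag is not None:
        cmd += ["-t", security_tag]
    if dry_run:
        cmd.append("-d")  # process the export without ingesting
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_ingest_command(
    "twitter-2024-10-17.zip",
    "a7ad52e3-2cb3-4cb5-af2a-3ab08829a2a8",
    dry_run=True,
)
print(" ".join(cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Starting with `dry_run=True` is one way to confirm that the export can be read before any content is sent to Preservica.
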
## Notes
The preserva-tweet program needs an internet connection to run. Most of the images and videos are fetched from
the ZIP archive, but some assets, such as thumbnails for the videos, are fetched directly from the Twitter servers.
For large Twitter accounts, the export will come as multiple ZIP files. Just run the program once for each ZIP file.
preserva-tweet will not ingest the same tweet twice if the script is re-run against the same ZIP file.
This also means you can always do a new export
in the future and re-run the program to add in any new tweets.
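
For an export split across several ZIP files, the "once for each ZIP file" step can be scripted. This is a minimal sketch, assuming all parts sit in one directory and end in `.zip`; the function name `ingest_all` and its `runner` parameter are hypothetical and exist only so the loop can be tried without launching anything.

```python
import subprocess
import sys
from pathlib import Path


def ingest_all(export_dir, collection, runner=subprocess.run):
    """Run preserva-tweet once per ZIP file in export_dir, in file-name order."""
    archives = sorted(Path(export_dir).glob("*.zip"))
    for archive in archives:
        cmd = [sys.executable, "-m", "preserva-tweet", "-a", str(archive), "-c", collection]
        # check=True raises CalledProcessError and stops the loop if a run fails
        runner(cmd, check=True)
    return archives


# Example call (directory name is a placeholder):
# ingest_all("twitter-exports", "a7ad52e3-2cb3-4cb5-af2a-3ab08829a2a8")
```

Because tweets that are already in Preservica are not ingested again, re-running the loop after a failure should be safe.
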
For customers using the Preservica NewGen interface, preserva-tweet will create a custom metadata group to store tweet metadata.
## Validate Mode
preserva-tweet has a validation mode, which is enabled using the --validate flag.
This will check that each tweet within the ZIP archive has been ingested into Preservica. This mode can be run after
the main ingest and will provide details of any tweets which were not ingested successfully.
Lists of the ingested and non-ingested tweets are written to CSV files.

The validate mode can be run using:

    $ python -m preserva-tweet -a twitter-2024-10-17.zip --validate

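
Conceptually, this kind of check is a set difference between the tweet IDs in the export and the IDs already held in Preservica. The sketch below illustrates that idea only and is not how preserva-tweet itself is implemented. It assumes the export keeps its tweets in a `data/tweets.js` member written as a JavaScript assignment (`window.YTD.tweets.part0 = [...]`) with entries shaped like `{"tweet": {"id_str": "..."}}`; both function names are hypothetical.

```python
import json
import zipfile


def tweet_ids_in_archive(archive, member="data/tweets.js"):
    """Return the set of tweet IDs in a Twitter export ZIP (layout is an assumption)."""
    with zipfile.ZipFile(archive) as zf:
        text = zf.read(member).decode("utf-8")
    # Skip the JavaScript assignment prefix and parse the JSON array that follows
    payload = text[text.index("["):]
    return {entry["tweet"]["id_str"] for entry in json.loads(payload)}


def missing_tweets(archive_ids, ingested_ids):
    """Return the IDs present in the export but not in the ingested set."""
    return sorted(set(archive_ids) - set(ingested_ids))
```

The set of ingested IDs would have to come from Preservica itself, for example from a search or report; that step is left out here because it depends on how the tweets are indexed.
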
Raw data
{
"_id": null,
"home_page": "https://github.com/carj/preserva-tweet",
"name": "preserva-tweet",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Preservica API Preservation Twitter",
"author": "James Carr",
"author_email": "drjamescarr@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/b7/82/92291de6e4c0c55c925566ed4bf38c1536cee0e6ec3fe2206cba9dec02a9/preserva_tweet-0.5.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Python module for ingesting Twitter exports into Preservica",
"version": "0.5.0",
"project_urls": {
"Discussion Forum": "https://github.com/carj/preserva-tweet",
"Documentation": "https://github.com/carj/preserva-tweet",
"Homepage": "https://github.com/carj/preserva-tweet",
"Source": "https://github.com/carj/preserva-tweet"
},
"split_keywords": [
"preservica",
"api",
"preservation",
"twitter"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "48f0f81344abf51551ea3166e193525d6aa6698d85f47cfcb92e288e17fb026b",
"md5": "5979a29856cbcac897033bfe8eb0b802",
"sha256": "9573603f354a3661fa492211e45a277ba9dbd7c3044afe160e2bec1da07e5a90"
},
"downloads": -1,
"filename": "preserva_tweet-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5979a29856cbcac897033bfe8eb0b802",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 14684,
"upload_time": "2024-10-25T08:51:14",
"upload_time_iso_8601": "2024-10-25T08:51:14.268722Z",
"url": "https://files.pythonhosted.org/packages/48/f0/f81344abf51551ea3166e193525d6aa6698d85f47cfcb92e288e17fb026b/preserva_tweet-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b78292291de6e4c0c55c925566ed4bf38c1536cee0e6ec3fe2206cba9dec02a9",
"md5": "c27a7d126cbc4076bb02b2f2675b4e48",
"sha256": "1771e5a3b17a4412e6ed17398f7d6cce34b7cd7e62b2c69663fd9c0b98a88578"
},
"downloads": -1,
"filename": "preserva_tweet-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "c27a7d126cbc4076bb02b2f2675b4e48",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 16270,
"upload_time": "2024-10-25T08:51:15",
"upload_time_iso_8601": "2024-10-25T08:51:15.584643Z",
"url": "https://files.pythonhosted.org/packages/b7/82/92291de6e4c0c55c925566ed4bf38c1536cee0e6ec3fe2206cba9dec02a9/preserva_tweet-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-25 08:51:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "carj",
"github_project": "preserva-tweet",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "preserva-tweet"
}