Personal backups: The geek way
Living in the digital era is awesome. Everyone has smartphones, iPads, GoPros and the same old computer. We all have email, all sorts of documents and especially photos and videos (can you keep up with an Instagram feed??).
Actually, based on a random live counter, the whole internet is 10,961,628 Petabytes big as of the second I started this paragraph. That means the average person carries roughly 1.5 Terabytes in his pocket.
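For the curious, here is a quick sanity check of that number. The 7.5 billion head count is my own assumption for the 2017 world population (it is not something the counter reports), and per_person_tb is just a helper name made up for this sketch:

```shell
# Terabytes per person, given a total size in petabytes and a head count.
# Uses decimal units: 1 PB = 1000 TB.
per_person_tb() {
    awk -v pb="$1" -v people="$2" 'BEGIN { printf "%.2f\n", pb * 1000 / people }'
}

# 7500000000 is an assumed world population, not a figure from the counter.
per_person_tb 10961628 7500000000   # prints 1.46
```

Close enough to 1.5 Terabytes for a back-of-the-envelope figure.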
Great! Now everyone can just buy whatever 2 TB drive is available in the store and we can all save our data in our pocket drive. Problem is: sometimes stuff gets broken. Who doesn’t have that friend that lost his entire life because his pocket drive went nuts for good?
Moving faster: almost everyone I know uses services like Dropbox, iCloud Storage, Google Drive or OneDrive. And that’s great! Especially because the latter two actually offer a decent amount of storage for free.
While this is a huge step to avoid losing your data, there are a few caveats:
- If you have a lot of data (let’s say 100GB and up) it starts getting expensive
- Those providers have raw access to your data
- People set small passwords for convenience and get hacked
Because of this I decided to start uploading all my data encrypted to a cloud storage provider. While I’m not ditching my pocket 2TB drive nor Google Drive, now I have an encrypted replica of both personal and sensitive data.
A few providers I know: Amazon, Google, Microsoft, OVH and Backblaze. I chose Backblaze to start with, mainly due to their good prices. You can even have them ship a drive to your house.
The technical bits
When I was looking for a tool to upload my data to the cloud, I had 3 main requirements:
- Support for multiple cloud storage providers
- Incremental backups
- Encrypted backups via PGP (nice to have)
That’s how I found Duplicity, and I even got a bonus feature: it’s incredibly easy to use! You can get it on your Mac using Homebrew:
$ brew install duplicity
or Linux using your package manager:
$ apt-get install duplicity # Debian based
$ yum install duplicity # CentOS based
Sorry Windows folks, this is the geek way!
For reference, the version I have installed now is 0.7.12.
I decided to create two buckets, one for documents and another for photos and video. That’s because documents are always changing and media is not, so I can set different Lifecycle Rules. You’ll probably want to keep all versions of documents, but that is not necessarily true with media (unless your job is to produce and edit media).
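A minimal sketch of creating the two buckets from the command line. It assumes the classic b2 command-line tool (the same one used for the bucket listing in this post) and that you have already run b2 authorize-account; newer releases of the tool may spell the command differently, so check b2 --help. The function name and the bucket names are placeholders of mine, and B2 bucket names have to be unique across all accounts, hence the prefix:

```shell
# create_backup_buckets PREFIX
# Creates PREFIX-documents and PREFIX-media as private buckets.
# Assumes the classic b2 CLI syntax: b2 create-bucket <name> allPrivate
create_backup_buckets() {
    prefix=$1
    for kind in documents media; do
        b2 create-bucket "${prefix}-${kind}" allPrivate || return 1
    done
}

# Usage (hypothetical prefix):
#   create_backup_buckets my-unique-prefix
```

The Lifecycle Rules themselves can then be set per bucket in the Backblaze web interface.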
Generating PGP key with Keybase
To manage PGP keys I use GPGTools on Mac, but GnuPG will work on both Mac and Linux. If you’re not familiar with PGP, follow this tutorial from Red Hat to create your key.
There is also a recent tool which is still invite-only, but it’s definitely bringing a shiny face to managing keys and encryption: Keybase.
After installing the Keybase app, you can generate your new key with these simple commands:
$ keybase pgp gen
Enter your real name, which will be publicly visible in your new key: My Name
Enter a public email address for your key: my@email.com
Enter another email address (or <enter> when done):
Push an encrypted copy of your new secret key to the Keybase.io server? [Y/n] n
▶ INFO PGP User ID: My Name <my@email.com> [primary]
▶ INFO Generating primary key (4096 bits)
▶ INFO Generating encryption subkey (4096 bits)
▶ INFO Generated new PGP key:
▶ INFO user: My Name <my@email.com>
▶ INFO 4096-bit RSA key, ID 6A3D610F79008975, created 2017-07-03
▶ INFO Exported new key to the local GPG keychain
Backup
Now to the good part, let’s upload a folder to Backblaze:
$ duplicity --encrypt-sign-key=<your-key-id> --encrypt-key=<your-key-id> <folder-to-backup> b2://<your-account-id>@<bucket-name>/<destination-folder> --log-file=duplicity_$(date +"%Y%m%d%H%M%S").log
In the example above, I encrypt and sign the data with my key and send my folder-to-backup to my bucket-name inside the destination-folder. This also produces a log file named something like duplicity_20170622013652.log.
You’ll be prompted for your Backblaze Application Key, and if your PGP key requires a passphrase, you’ll be prompted for it too, once for encryption and once for signing.
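Incremental backups were one of my requirements, so here is a minimal sketch of how they come into play. As far as I understand duplicity, when you don’t ask explicitly for a full or incremental backup, it uploads only the changes if it finds an earlier backup at the target. The --full-if-older-than option and the collection-status action are from the duplicity man page; the function names and the KEY_ID and B2_ACCOUNT_ID variables are placeholders of mine. I only pass --encrypt-sign-key here, which the man page describes as a shortcut for setting both the encryption and the signing key:

```shell
# backup_to_b2 SRC BUCKET DEST
# KEY_ID and B2_ACCOUNT_ID are placeholders to set in your environment.
backup_to_b2() {
    src=$1 bucket=$2 dest=$3
    # With no explicit "full" or "incremental" action, duplicity sends only
    # the changes when an earlier backup already exists at the target.
    # --full-if-older-than 30D starts a fresh full backup once the last
    # full one is more than 30 days old, so incremental chains stay short.
    duplicity --full-if-older-than 30D \
        --encrypt-sign-key="$KEY_ID" \
        "$src" "b2://${B2_ACCOUNT_ID}@${bucket}/${dest}"
}

# show_backups BUCKET DEST
# Lists the full and incremental backup sets stored at the target.
show_backups() {
    duplicity collection-status "b2://${B2_ACCOUNT_ID}@$1/$2"
}
```

Run backup_to_b2 twice on the same folder and show_backups should report one full set followed by an incremental one.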
With a bit of magic from the b2 CLI tool and jq, let’s list the files in our bucket:
$ b2 list-file-names <bucket-name> | jq -r '.files[] | [.fileName ,.size] | @tsv'
destination-folder/duplicity-full-signatures.20170622T015944Z.sigtar.gpg 3156104
destination-folder/duplicity-full.20170622T015944Z.manifest.gpg 1570
destination-folder/duplicity-full.20170622T015944Z.vol1.difftar.gpg 209787961
destination-folder/duplicity-full.20170622T015944Z.vol2.difftar.gpg 209770656
destination-folder/duplicity-full.20170622T015944Z.vol3.difftar.gpg 117115477
Here I want you to notice the manifest file, and that the sizes of the vol* files are something like 200MB each, except the last one.
Try to download and open the manifest file (you’ll have to decrypt it):
Hostname your-computer-hostname
Localdir 20170622
Volume 1:
StartingPath .
EndingPath file1.pdf 11
Hash SHA1 6abebce21621499d4cb63ab05fd87ee845eb2a97
Volume 2:
StartingPath file1.pdf 12
EndingPath file4.docx 373
Hash SHA1 86c01b4cb69e3ab04750b4066165222790362e38
Volume 3:
StartingPath file4.docx 374
EndingPath file7.xlsx
Hash SHA1 10c534901e759a1de3ab021dc09d4cb692ea033e
Filelist 7
new file1.pdf
new file2.pdf
new file3.pdf
new file4.docx
new file5.docx
new file6.docx
new file7.xlsx
As you can see, this is a list mapping all your files to each volume. This means that if you need to download a file smaller than 200MB, you’ll download a maximum of 400MB, instead of the whole thing!
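The 400MB figure comes from a simple bound: a file can start in the middle of one volume and end in the next. Here is that arithmetic as a small sketch, under the assumption that the file size is measured the way it ends up inside the volumes (it ignores compression and archive overhead, so treat it as a rough upper limit). max_download_mb is a helper name I made up; the volume size itself can be changed with duplicity’s --volsize option, given in MB:

```shell
# max_download_mb FILE_MB [VOLSIZE_MB]
# Rough upper bound on the data fetched to restore a single file:
# the file touches at most ceil(file / volsize) + 1 volumes.
max_download_mb() {
    file_mb=$1
    volsize_mb=${2:-200}   # 200 matches the volume sizes in the listing above
    echo $(( ((file_mb + volsize_mb - 1) / volsize_mb + 1) * volsize_mb ))
}

max_download_mb 150      # prints 400
max_download_mb 150 50   # prints 200, e.g. after backing up with --volsize 50
```

Smaller volumes mean less to download for a single-file restore, at the price of more files in the bucket.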
Of course you won’t need to download the manifest file every time you want to do an ls of your files; duplicity has that feature built in:
$ duplicity list-current-files b2://<your-account-id>@<bucket-name>/<destination-folder>
Password for '<your-account-id>@B2':
Local and Remote metadata are synchronised, no sync needed.
Last full backup date: Thu Jun 22 02:59:44 2017
Thu Jun 22 02:02:44 2017 .
Thu Jun 22 01:57:40 2017 file1.pdf
Thu Jun 22 01:57:41 2017 file2.pdf
Thu Jun 22 01:57:43 2017 file3.pdf
Thu Jun 22 01:57:47 2017 file4.docx
Thu Jun 22 01:57:49 2017 file5.docx
Thu Jun 22 01:57:50 2017 file6.docx
Thu Jun 22 01:57:51 2017 file7.xlsx
Restore
We have our files backed up in Backblaze, now let’s try to restore them:
$ duplicity restore b2://<your-account-id>@<bucket-name>/<destination-folder> <restore_folder>
Or if you want to restore a single file:
$ duplicity restore --file-to-restore file3.pdf b2://<your-account-id>@<bucket-name>/<destination-folder> file3.pdf
After this you’ll have your file3.pdf in the folder you were at, just like that!
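Since the backups are incremental, you can also go back in time. A minimal sketch, assuming the --time option from the duplicity man page (it accepts intervals such as 3D for three days ago, as well as dates). restore_as_of and B2_ACCOUNT_ID are placeholder names of mine:

```shell
# restore_as_of WHEN FILE BUCKET DEST OUT
# Restores FILE as it was at time WHEN into the local path OUT.
restore_as_of() {
    when=$1 file=$2 bucket=$3 dest=$4 out=$5
    duplicity restore --time "$when" --file-to-restore "$file" \
        "b2://${B2_ACCOUNT_ID}@${bucket}/${dest}" "$out"
}

# Usage (the version of file3.pdf from three days ago):
#   restore_as_of 3D file3.pdf <bucket-name> <destination-folder> file3.old.pdf
```

Pick an output path that doesn’t exist yet; in my experience duplicity refuses to overwrite existing files unless you force it.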
Some ideas
Feel free to check out the Duplicity man page for more information. It has a lot of features, and once you get your backups up and running, you can tweak the example commands I gave as much as you want.
If you also want to back up data you have on a server, one idea is to create a PGP key that belongs to the server, and use it to sign the backup, while using your personal PGP public key to encrypt it. Use cron for periodic backups!
Note: Backblaze doesn’t really support users and ACLs, so I’d create a different account for automated backups.
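A minimal sketch of that server idea. The --sign-key and --encrypt-key options are from the duplicity man page; everything else is an assumption of mine: the function name, the variable names, and the script path in the crontab line. I am also assuming that FTP_PASSWORD, the variable duplicity reads backend passwords from, covers the B2 Application Key so nothing prompts, and that the server key has no passphrase (otherwise the man page points to SIGN_PASSPHRASE):

```shell
# server_backup SRC BUCKET DEST
# SERVER_KEY_ID   : key that lives on the server, used only to sign
# PERSONAL_KEY_ID : your own public key, used to encrypt
# B2_ACCOUNT_ID   : the separate Backblaze account for automated backups
# FTP_PASSWORD    : assumed to carry the B2 Application Key (see above),
#                   must be exported in the environment of the cron job
server_backup() {
    src=$1 bucket=$2 dest=$3
    duplicity --sign-key "$SERVER_KEY_ID" --encrypt-key "$PERSONAL_KEY_ID" \
        "$src" "b2://${B2_ACCOUNT_ID}@${bucket}/${dest}"
}

# Example crontab entry (hypothetical script path), every night at 03:00:
#   0 3 * * * /usr/local/bin/server-backup.sh
```

The nice part of this split is that the server only holds your public key, so a compromised server can create new backups but cannot decrypt the old ones.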
Now get backing up those petabytes!