A look at our data storage methods
Hey there, Daniel here. Among other things, I help out with some of the sysadmin-type stuff around here.
This article is a bit different to what we usually post - it’s a behind-the-scenes look at how we handle our data storage needs and make sure our collection is backed up, so that nothing is ever lost or corrupted.
When the TalonBrave.info file archive first came around, the collection wasn’t that huge. The local copies of everything lived on some external USB hard drives, got manually FTP’d into place on the web server and got backed up, uh, sometimes.
As the collection grew, the stack of external drives got replaced with a little home NAS unit, and then we finally moved to a more “serious” solution - a server running TrueNAS (formerly known as FreeNAS). The best part of this is that TrueNAS uses ZFS, which has filesystem-level mirroring and integrity checking. That means it should be close to impossible for anything in the collection to be silently corrupted by a hardware failure, and the server should be able to recover from a single disk failure without any data loss (and it has, once, so far).
In addition, around the time we migrated to the new NAS, we also put a proper backup plan into place. Everything on the server is regularly dumped to… uh, tape. Yeah, not my favourite storage medium either. It’s historically slow and ridiculously expensive for what you get… but, assuming you buy a second-hand drive, it still (just about) makes sense when you’re storing multiple copies of multiple terabytes of data, potentially kept on the shelf for several years between uses.
I had lots of crazy ideas about how to do our backups at the time. They ranged from writing a program that would span all the individual files from the archive across a collection of Blu-ray M-Discs and maintain a database of what versions of what files were on what disc, to a collection of hard drives, to various solutions around S3. In the end I settled on tapes for simplicity and cost. Yes, even tapes wound up being cost effective (by a huge margin) compared to M-Disc and Amazon S3. Hard drives would probably come in at a similar cost to the 4th generation LTO tapes we’re currently using, but they’re inherently a bit more fragile than tapes should be.
The backup format is about as simple as it can get - tar. POSIX (“pax”) tar archives specifically. In the worst-case scenario I don’t want to be messing around trying to recover from a blank slate and finding that some fancy (or proprietary) backup software has changed under my feet or added any unexpected difficulties to restoring the backups.
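To give a feel for how little machinery a pax tar archive needs, here is a minimal sketch using Python’s standard tarfile module. It is not our actual backup script: the function names and paths are made up for illustration, and it writes to an ordinary file rather than a tape device.

```python
import tarfile
from pathlib import Path


def write_pax_archive(source_dir: str, archive_path: str) -> None:
    """Write source_dir into an uncompressed POSIX.1-2001 (pax) tar archive."""
    source = Path(source_dir)
    # PAX_FORMAT selects the POSIX pax variant explicitly, independent of
    # whatever default format the installed Python version uses.
    with tarfile.open(archive_path, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # arcname keeps paths inside the archive relative to the top folder.
        tar.add(source, arcname=source.name)


def list_archive(archive_path: str) -> list[str]:
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, mode="r") as tar:
        return tar.getnames()
```

Calling write_pax_archive("collection", "backup.tar") and then list_archive("backup.tar") would list the folder and everything under it. The point is that any stock tar implementation can read the result, which is exactly what you want when restoring from a blank slate.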
Every 6 months or so I do a full dump of the system, and in the interim I take differential backups. Our backup script also produces full text listings of the files/directories written to tape and the checksums of each file. If there is ever any doubt about the integrity/correctness of something, we can go back and see when/if it was ever changed, or find something that was unintentionally deleted and know which tape to recover it from, without me having to index dozens of tapes at the time.
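As a rough illustration of the idea (again, not our real script), the sketch below builds a checksum listing for a directory tree and compares it against the listing from the last full dump to work out what a differential backup would need to contain. The function names are hypothetical, and SHA-256 is an assumption made for the example. Real backup tools often decide what changed from timestamps instead of checksums.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    manifest: dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large files never sit fully in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(base).as_posix()] = digest.hexdigest()
    return manifest


def differential(full: dict[str, str],
                 current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare against the last FULL dump, not the previous differential.

    Returns (files that are new or changed, files that have been deleted).
    """
    changed = sorted(p for p, h in current.items() if full.get(p) != h)
    deleted = sorted(p for p in full if p not in current)
    return changed, deleted
```

Because every differential is measured against the last full dump, a restore only ever needs two sets of tapes: the full dump and the most recent differential. Keeping the listings as plain text also means they can be searched with nothing fancier than grep.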
The physical backup tapes are stored offsite, individually labelled with the set they belong to, and also catalogued in a LibreOffice Base database (chosen to be accessible even if we have to start from a blank slate).
With all of that in mind, I want to remind everyone that the best way to preserve something is to share it. Don’t assume nothing will ever happen to us, TalonBrave.info or any particular thing on the site - if you think it needs preserving, then make a copy!
I hope this brief glimpse into the exciting world of copying data around was interesting. If there’s anything else you’re curious about, then do go ahead and reach out to us.