Nick Sellen

Data storage and backup

11 January 2013
I'm evolving my data storage policy (personal/work/everything). The first step is categorising my data according to usage patterns. The dimensions I need to think about are storage size, access requirements, ability to replace the data if lost, and security sensitivity. The situation is more complex than simple checkboxes can represent, and as a result the table below is a bit wordy.
| Data | Size | Access requirements | Replaceability | Security |
|---|---|---|---|---|
| Contacts | Small | Kept on my phone and online (and ideally on laptop too) | Should be kept | Should be secure |
| Calendar appointments | Small | On phone and online | I don't use these much so not so critical | Should be secure |
| Notes | Small | Would like these on my phone, laptop, and online | Important to keep | Should be reasonably secure |
| Project documents | Small/Medium | Access from laptop is important, online access would be useful | Important to keep at least for duration of project, less so afterwards | Should be secure |
| Personal documents | Small/Medium | Access from laptop is important, online access would be useful | Should be kept | Should be secure |
| Emails | Small/Medium | Accessible online, having them on my laptop is useful | Should be kept | Should be secure |
| Source code | Small/Medium | Ideally all version controlled using git with a public or private remote | Assuming I keep the remote up to date, totally dispensable | Project dependent, but mostly not too critical |
| Application specific data | Small/Medium/Large | Very dependent on the application (e.g. I use Adobe Lightroom extensively, which stores metadata about my photo collections) | Often important to keep | Reasonably important |
| Computer system backup | Large | A nearby hard disk seems the only viable option to get suitable transfer speed (using Time Machine) | Important to keep | Reasonably important |
| Photos | Large | Need some on my laptop (having downloaded from memory stick), some on nearby drives, and maybe others more distantly archived | Important to keep | Not very important |
| Films | Large | A few with my laptop, the rest can be dumped elsewhere | Easily replaceable, no tears will be shed | Does not matter |
| Music | Large | Small selection on phone, larger selection on laptop, even larger selection easily available, massive vault not easily available | Could be replaced technically, but it would be expensive and time consuming | Does not matter |

Rules

Data storage has a few purposes and so I want to be clear what I'm trying to achieve. For bulky data I want to keep I need:
  • a minimum of two physical hard disks to prevent a single disk failure losing all data
  • a minimum of two physical locations to prevent theft, fire, or other disasters causing all data to be lost
  • easy regular access to keep copies of data up to date without much hassle (for many people a home storage server would suffice, but as I don't have a home this isn't so good)
For easily replaceable bulky data (i.e. films) one hard disk is enough. For small data storage in an online system I would generally assume that it is likely to stay there and be resilient to single disk failures.
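The rules above are simple enough to check mechanically. Here is a minimal sketch in Python, assuming each copy of a data set is described by the disk it lives on and the place that disk is kept. The names `Copy` and `meets_rules`, and the disk and location labels, are made up for illustration and are not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    disk: str      # hypothetical label for one physical disk
    location: str  # hypothetical label for one physical location


def meets_rules(copies, replaceable=False):
    """Check a list of copies against the rules above.

    Data I want to keep needs at least two disks in at least two
    locations; easily replaceable data only needs one disk.
    """
    disks = {c.disk for c in copies}
    locations = {c.location for c in copies}
    if replaceable:
        return len(disks) >= 1
    return len(disks) >= 2 and len(locations) >= 2


# Illustrative data only: two disks, but both travel with me.
photos = [Copy("laptop-disk", "with me"), Copy("usb-drive", "with me")]
films = [Copy("usb-drive", "with me")]
```

With these made-up examples, `meets_rules(photos)` fails because both disks are in the same place, while `meets_rules(films, replaceable=True)` passes with a single disk.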

Online bulk storage

Up to now I've only been using small amounts of online storage (source code, a few bits in Dropbox, some notes with Simplenote) and I haven't been making use of any bulk online storage, preferring a nearby hard disk (although that hasn't satisfied my criterion of at least two physical locations). I've got to a point where the storage space on my laptop is severely constrained (mostly due to music and photos) so I need a better solution - I can't just move the data to a nearby hard disk as it would end up on only one disk. I do have a Netgear ReadyNAS with 1TB space, but nowhere to put it that I will be close to regularly enough. My conclusion is that I need to go cloud. My constraints however are:
  • no reliance on a company that might disappear (sorry startups)
  • no special storage format - some services use generic storage (such as Amazon S3) but do something funky with the data meaning you either need them to retrieve it, or at least understand how they store it
  • sync capability
  • a way to access the data files as files (rather than combined archives)
  • ideally a way to bulk import/export
I've been looking at various online services and decided:
| Thing | Description | Conclusion |
|---|---|---|
| Mozy | Amazon S3 backed online storage | I'm afraid I just can't trust that a company called "Mozy" is going to be around as long as I want my data to be, and there is no indication I'd be able to get it from S3 directly. It also looks a bit too corporatey for my liking - resulting in distrust from me. |
| Arq | OS X application that sends data to Amazon S3 | It stores data in a funky format (to make data transfer more efficient, particularly for lots of small files) and I don't want that. Looks nice otherwise though. |
| Amazon CloudDrive | Friendly interface for storing data in S3 including a nice OS X client | A great feature is that you can store unlimited music files on the paid plans (at the time of writing). However the service is not orientated towards synchronising data - unless I left the files in the "upload" folder it would try to resend all the files again. The online music player is also not yet available outside the USA. I can imagine some of these things changing in the future though. |
| Dropbox | Online data syncing storage | I love Dropbox and use it a lot - the free account gives you 2GB, which is enough for the kind of files I use it for (random documents). However the prices creep up rapidly for larger storage amounts, making it not a viable option for bulkier data. The service is more aimed at a smaller collection of documents you use day-to-day. |
| Tarsnap | "Online backups for the truly paranoid". Client tool to encrypt and send data to Amazon S3 | Interesting idea - very secure. However I'd probably lose the key. It also seems to rely too much on one person and I don't trust I'd be able to get it back without complication after he's dead (you even have to compile the software yourself). I also want to access the files themselves rather than archives of files. Might suit you though. |
| rsync + S3 mount | Just normal rsync used in conjunction with a mounted Amazon S3 bucket | I like rsync and S3 so why not together! Well, rsync works best when it has local access to the file. When sending files remotely you would normally have rsync at the remote end to co-ordinate things. By mounting S3, rsync thinks the bucket is actually local and so performs suboptimally - i.e. it's sloooow. |
| Duplicity | rsync-like tool sending to Amazon S3 | Well, looks interesting but it puts things into tar archives, which I don't want. |
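
The rsync + S3 mount problem suggests one possible way round it: decide what to upload by comparing file metadata only, so nothing has to be read back through a slow mount. The sketch below is my own rough illustration of that idea, not how rsync or any of the services above work internally. It assumes the remote side can hand back a listing of path to (size, modification time), and `local_manifest` and `plan_uploads` are made-up names. Files stay as individual files, which fits my constraints.

```python
import os


def local_manifest(root):
    """Map each file path (relative to root) to its (size, mtime) pair."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            # Use forward slashes so paths look the same on every platform.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = (info.st_size, int(info.st_mtime))
    return manifest


def plan_uploads(local, remote):
    """Return the sorted paths that need sending.

    A file is sent if it is missing from the remote listing, if its
    size differs, or if the local copy has a newer modification time.
    `remote` is assumed to be a dict shaped like the local manifest.
    """
    to_upload = []
    for path, (size, mtime) in local.items():
        if path not in remote:
            to_upload.append(path)
            continue
        remote_size, remote_mtime = remote[path]
        if size != remote_size or mtime > remote_mtime:
            to_upload.append(path)
    return sorted(to_upload)
```

Comparing size and modification time is cheap but can miss a file whose content changed without either value changing, so this is a trade-off rather than a guarantee.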
