Data storage and backup

11 January 2013

I'm evolving my data storage policy (personal/work/everything). The first step is categorising my data according to usage patterns. The dimensions I need to think about are storage size, access requirements, ability to replace the data if lost, and security sensitivity. The situation is more complex than simple checkboxes can represent, and as a result the table below is a bit wordy.

Data	Size	Access requirements	Replaceability	Security
Contacts	Small	Kept on my phone and online (and ideally on laptop too)	Should be kept	Should be secure
Calendar appointments	Small	On phone and online	I don't use these much so not so critical	Should be secure
Notes	Small	Would like these on my phone, laptop, and online	Important to keep	Should be reasonably secure
Project documents	Small/Medium	Accessible from laptop is important, online access would be useful	Important to keep at least for duration of project, less so afterwards	Should be secure
Personal documents	Small/Medium	Accessible from laptop is important, online access would be useful	Should be kept	Should be secure
Emails	Small/Medium	Accessible online, having them on my laptop is useful	Should be kept	Should be secure
Source code	Small/Medium	Ideally all version controlled using git with a public or private remote	Assuming I keep the remote up to date, totally dispensable	Project dependent, but mostly not too critical
Application specific data	Small/Medium/Large	Very dependent on the application (e.g. I use Adobe Lightroom extensively which stores meta-data about my photo collections)	Often important to keep	Reasonably important
Computer system backup	Large	A nearby hard disk seems the only viable option to get suitable transfer speed (using Time Machine)	Important to keep	Reasonably important
Photos	Large	Need some on my laptop (having downloaded from memory stick), some on nearby drives, and maybe others more distantly archived	Important to keep	Not very important
Films	Large	A few with my laptop, the rest can be dumped elsewhere	Easily replaceable, no tears will be lost	Does not matter
Music	Large	Small selection on phone, larger selection on laptop, even larger selection easily available, massive vault not easily available	Could be replaced technically, but it would be expensive and time consuming	Does not matter

Rules

Data storage has a few purposes and so I want to be clear what I'm trying to achieve. For bulky data I want to keep I need:

a minimum of two physical hard disks to prevent a single disk failure losing all data
a minimum of two physical locations to prevent theft, fire, or other disasters causing all data to be lost
easy regular access to keep copies of data up to date without much hassle (for many people a home storage server would suffice, but as I don't have a home this isn't so good)

For easily replaceable bulky data (i.e. films) one hard disk is enough. For small data storage in an online system I would generally assume that it is likely to stay there and be resilient to single disk failures.
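
As a rough illustration, the rules above could be checked against an inventory of where each copy lives. This is only a sketch: the `Copy` and `Dataset` classes, the `rule_violations` function, and the disk and location labels are all made up for the example, and it ignores the "easy regular access" rule, which is hard to express in code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Copy:
    disk: str      # hypothetical label for a physical disk
    location: str  # hypothetical label for a physical location


@dataclass
class Dataset:
    name: str
    replaceable: bool  # True for easily replaceable data such as films
    copies: list = field(default_factory=list)


def rule_violations(dataset):
    """Return a list of rules the dataset currently breaks (empty if none)."""
    disks = {copy.disk for copy in dataset.copies}
    locations = {copy.location for copy in dataset.copies}
    problems = []
    if dataset.replaceable:
        # Easily replaceable bulky data: one hard disk is enough.
        if len(disks) < 1:
            problems.append("needs at least one disk")
        return problems
    if len(disks) < 2:
        problems.append("needs at least two physical disks")
    if len(locations) < 2:
        problems.append("needs at least two physical locations")
    return problems


# Example inventory (invented): two disks, but both in the same place.
photos = Dataset("Photos", False, [Copy("laptop", "with me"), Copy("usb-drive", "with me")])
films = Dataset("Films", True, [Copy("usb-drive", "with me")])

for dataset in (photos, films):
    print(dataset.name, rule_violations(dataset) or "ok")
```

With this invented inventory the photos fail the two locations rule even though they sit on two disks, which is roughly the situation a nearby hard disk leaves me in.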

Online bulk storage

Up to now I've only been using small amounts of online storage (source code, a few bits in Dropbox, some notes with Simplenote) and I haven't been making use of any bulk online storage, preferring a nearby hard disk (although that hasn't satisfied my two physical locations rule). I've got to a point where the storage space on my laptop is severely constrained (mostly due to music and photos) so I need a better solution. I can't just move the data to a nearby hard disk, as it would end up on only one disk. I do have a Netgear ReadyNAS with 1TB of space, but nowhere to put it that I will be close to regularly enough. My conclusion is that I need to go cloud. My constraints, however, are:

no reliance on a company that might disappear (sorry startups)
no special storage format - some services use generic storage (such as Amazon S3) but do something funky with the data meaning you either need them to retrieve it, or at least understand how they store it
sync capability
a way to access the data files as files (rather than combined archives)
ideally a way to bulk import/export

I've been looking at various online services and decided:

Thing	Description	Conclusion
Mozy	Amazon S3 backed online storage	I'm afraid I just can't trust a company called "Mozy" is going to be around as long as I want my data to be and there is no indication I'd be able to get it from S3 directly. It also looks a bit too corporatey for my liking - resulting in distrust from me.
Arq	OSX application that sends data to Amazon S3	It stores data in a funky format (to make data transfer more efficient, particularly for lots of small files) and I don't want that. Looks nice otherwise though.
Amazon CloudDrive	Friendly interface for storing data in S3 including a nice OSX client	Great feature is that you can store unlimited music files on the paid plans (at the time of writing), however the service is not orientated towards synchronising data - unless I left the files in the "upload" folder it would try and resend all the files again. The online music player is also not yet available outside the USA. I can imagine some of these things changing in the future though.
Dropbox	Online data syncing storage	I love Dropbox and use it a lot - the free account gives you 2GB which is enough for the kind of files I use it for (random documents). However the prices creep up rapidly for larger storage amounts, making it not a viable option for bulkier data. The service is more aimed at a smaller collection of documents you use day-to-day.
Tarsnap	"Online backups for the truly paranoid". Client tool to encrypt and send data to Amazon S3	Interesting idea - very secure. However I'd probably lose the key. It also seems to rely too much on one person and I don't trust I'd be able to get it back without complication after he's dead (you even have to compile the software yourself). I also want to access the files themselves rather than archives of files. Might suit you though.
rsync + S3 mount	Just normal rsync used in conjunction with a mounted Amazon S3 bucket	I like rsync and S3 so why not together! Well, rsync works best when it has local access to the file. When sending files remotely you would normally have rsync at the remote end to co-ordinate things. By mounting S3, rsync thinks it is actually local and so performs suboptimally - i.e. it's sloooow.
Duplicity	rsync-like tool sending to Amazon S3	Well, looks interesting but it puts things into tar archives which I don't want.
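
To make the "sync capability" and "files as files" constraints a bit more concrete, here is a minimal sketch of the kind of sync I have in mind: compare a local folder against a listing of what is already stored remotely, and work out which individual files need sending, with one remote object per file and no archives. Everything here is an assumption for illustration: the function names are invented, SHA-256 is just one possible way to detect changes, and the remote listing is taken to be a plain dict of path to digest (how you would fetch that from a particular storage service is not shown).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def plan_sync(local, remote):
    """Compare two path -> digest dicts.

    Returns (upload, delete): paths that are new or changed locally,
    and paths that exist remotely but no longer exist locally.
    """
    upload = sorted(key for key, digest in local.items() if remote.get(key) != digest)
    delete = sorted(key for key in remote if key not in local)
    return upload, delete


# Hand-written manifests standing in for a real folder and a real remote listing.
local = {"photos/2012/a.jpg": "aaa", "music/b.mp3": "bbb"}
remote = {"photos/2012/a.jpg": "aaa", "music/b.mp3": "old", "films/c.avi": "ccc"}
upload, delete = plan_sync(local, remote)
print("upload:", upload)
print("delete:", delete)
```

In real use `local` would come from `local_manifest("/some/folder")`. The point of the design is that each file stays retrievable on its own under its own path, so I would not be dependent on any one tool to get the data back.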
