I'm evolving my data storage policy (personal/work/everything), and the first step is categorising my data according to usage patterns. The dimensions I need to think about are storage size, access requirements, ability to replace the data if lost, and security sensitivity.
The situation is more complex than simple checkboxes can represent and as a result the table below is a bit wordy.
Data | Size | Access requirements | Replaceability | Security |
Contacts | Small | Kept on my phone and online (and ideally on laptop too) | Should be kept | Should be secure |
Calendar appointments | Small | On phone and online | I don't use these much so not so critical | Should be secure |
Notes | Small | Would like these on my phone, laptop, and online | Important to keep | Should be reasonably secure |
Project documents | Small/Medium | Accessible from laptop is important, online access would be useful | Important to keep at least for duration of project, less so afterwards | Should be secure |
Personal documents | Small/Medium | Accessible from laptop is important, online access would be useful | Should be kept | Should be secure |
Emails | Small/Medium | Accessible online, having them on my laptop is useful | Should be kept | Should be secure |
Source code | Small/Medium | Ideally all version controlled using git with a public or private remote | Assuming I keep the remote up to date, totally dispensable | Project dependent, but mostly not too critical |
Application specific data | Small/Medium/Large | Very dependent on the application (e.g. I use Adobe Lightroom extensively which stores meta-data about my photo collections) | Often important to keep | Reasonably important |
Computer system backup | Large | A nearby hard disk seems the only viable option to get suitable transfer speed (using Time Machine) | Important to keep | Reasonably important |
Photos | Large | Need some on my laptop (having downloaded from memory stick), some on nearby drives, and maybe others more distantly archived | Important to keep | Not very important |
Films | Large | A few with my laptop, the rest can be dumped elsewhere | Easily replaceable, no tears will be shed | Does not matter |
Music | Large | Small selection on phone, larger selection on laptop, even larger selection easily available, massive vault not easily available | Could be replaced technically, but it would be expensive and time consuming | Does not matter |
Rules
Data storage has a few purposes, so I want to be clear about what I'm trying to achieve. For bulky data I want to keep I need:
- a minimum of two physical hard disks to prevent a single disk failure losing all data
- a minimum of two physical locations to prevent theft, fire, or other disasters causing all data to be lost
- easy regular access to keep copies of data up to date without much hassle (for many people a home storage server would suffice, but as I don't have a home this isn't so good)
For easily replaceable bulky data (i.e. films) one hard disk is enough. For small data stored in an online system I would generally assume that it is likely to stay there and be resilient to single disk failures.
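The first two rules are mechanical enough to write down as a check. The following is only a rough sketch: the `Copy` class, the `meets_rules` function, and the disk and location labels are all made up for illustration, and the third rule (easy regular access) is a matter of habit that it does not try to capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a data set (labels are hypothetical)."""
    disk: str      # which physical disk holds the copy
    location: str  # which physical place that disk lives in


def meets_rules(copies, replaceable=False):
    """Check a list of copies against the disk and location rules."""
    disks = {c.disk for c in copies}
    locations = {c.location for c in copies}
    if replaceable:
        # Easily replaceable bulky data (films): one disk is enough.
        return len(disks) >= 1
    # Bulky data I want to keep: two disks AND two locations.
    return len(disks) >= 2 and len(locations) >= 2


# Two disks that travel together still count as a single location.
photos = [Copy("laptop", "with me"), Copy("usb-drive", "with me")]
films = [Copy("usb-drive", "with me")]

print(meets_rules(photos))                  # False
print(meets_rules(films, replaceable=True)) # True
```

The photos example fails for the reason described above: a nearby hard disk gives a second disk but not a second location.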
Online bulk storage
Up to now I've only been using small amounts of online storage (source code, a few bits in Dropbox, some notes with Simplenote) and I haven't been making use of any bulk online storage, preferring a nearby hard disk (although that hasn't satisfied my criterion of at least two physical locations).
I've got to a point where the storage space on my laptop is severely constrained (mostly due to music and photos) so I need a better solution - I can't use a nearby hard disk to put it on as it would end up on only one disk. I do have a Netgear ReadyNAS with 1TB space, but nowhere to put it that I will be close to regularly enough.
My conclusion is that I need to go cloud. My constraints, however, are:
- no reliance on a company that might disappear (sorry startups)
- no special storage format - some services use generic storage (such as Amazon S3) but do something funky with the data meaning you either need them to retrieve it, or at least understand how they store it
- sync capability
- a way to access the data files as files (rather than combined archives)
- ideally a way to bulk import/export
I've been looking at various online services and decided:
Thing | Description | Conclusion |
Mozy | Amazon S3 backed online storage | I'm afraid I just can't trust that a company called "Mozy" is going to be around as long as I want my data to be, and there is no indication I'd be able to get it from S3 directly. It also looks a bit too corporatey for my liking - resulting in distrust from me. |
Arq | OSX application that sends data to Amazon S3 | It stores data in a funky format (to make data transfer more efficient, particularly for lots of small files) and I don't want that. Looks nice otherwise though. |
Amazon CloudDrive | Friendly interface for storing data in S3 including a nice OSX client | A great feature is that you can store unlimited music files on the paid plans (at the time of writing), however the service is not orientated towards synchronising data - unless I left the files in the "upload" folder it would try and resend all the files again. The online music player is also not yet available outside the USA. I can imagine some of these things changing in the future though. |
Dropbox | Online data syncing storage | I love Dropbox and use it a lot - the free account gives you 2GB, which is enough for the kind of files I use it for (random documents). However the prices creep up rapidly for larger storage amounts, making it not a viable option for bulkier data. The service is more aimed at a smaller collection of documents you use day-to-day. |
Tarsnap | "Online backups for the truly paranoid". Client tool to encrypt and send data to Amazon S3 | Interesting idea - very secure. However I'd probably lose the key. It also seems to rely too much on one person and I don't trust I'd be able to get it back without complication after he's dead (you even have to compile the software yourself). I also want to access the files themselves rather than archives of files. Might suit you though. |
rsync + S3 mount | Just normal rsync used in conjunction with a mounted Amazon S3 bucket | I like rsync and S3 so why not together! Well, rsync works best when it has local access to the file. When sending files remotely you would normally have rsync at the remote end to co-ordinate things. By mounting S3, rsync thinks it is actually local and so performs suboptimally - i.e. it's sloooow. |
Duplicity | rsync-like tool sending to Amazon S3 | Well, looks interesting but it puts things into tar archives which I don't want. |
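The conclusions above can be lined up against the constraint list. This is only a sketch of one reading: the constraint labels are shorthand, the names `CONSTRAINTS`, `FAILED`, `OTHER`, and `verdict` are made up, and a service is only marked as failing a constraint where the conclusion says so (nothing here was tested against the services themselves).

```python
# Shorthand labels for the five constraints listed earlier.
CONSTRAINTS = (
    "company likely to last",
    "no special storage format",
    "sync capability",
    "files accessible as files",
    "bulk import/export",
)

# Constraints each service fails, taken only from the Conclusion column.
# Mozy's "can't get it from S3 directly" is read here as a format problem.
FAILED = {
    "Mozy": {"company likely to last", "no special storage format"},
    "Arq": {"no special storage format"},
    "Amazon CloudDrive": {"sync capability"},
    "Dropbox": set(),
    "Tarsnap": {"company likely to last", "files accessible as files"},
    "rsync + S3 mount": set(),
    "Duplicity": {"files accessible as files"},
}

# Objections that fall outside the constraint list.
OTHER = {
    "Dropbox": "too expensive for bulky data",
    "rsync + S3 mount": "too slow",
}


def verdict(service):
    """Summarise why a service was turned down."""
    failed = [c for c in CONSTRAINTS if c in FAILED[service]]
    if failed:
        return "fails: " + ", ".join(failed)
    if service in OTHER:
        return "no constraint cited, but " + OTHER[service]
    return "no objection recorded"


for name in FAILED:
    print(f"{name}: {verdict(name)}")
```

Laid out like this, every option is ruled out either by a listed constraint or by cost or speed.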