A while back I talked about a company called Diomede, who had a new way of storing data… I won't go into too much detail here (check the old post), but basically they offered three ways of storing your data: online, nearline and offline… online is accessible instantly, nearline is available within 5 minutes of requesting it, and offline within 4 hours…

This got me thinking, and again I mention a lot of this in the old post, but now, after playing with ZFS for a bit, I have been thinking a lot more about it… so I am going to explain the idea here using a photography workflow.

  • Photos are taken and stored on CF cards.
  • They are ingested into the main workstation, onto a RAID 1 array, for tagging and processing. This is online storage.
  • Next they are transferred to my ZFS server (which at the moment I don't have, but am thinking about…). This is nearline storage, but with GigE and fast disks it is nearly as fast as the RAID 1 array… the online and nearline storage arrays should be pretty much in sync with each other, so any change on one must be replicated to the other.
  • Nightly, an incremental ZFS snapshot is taken and uploaded to Amazon S3 or CloudFiles… if very little has changed, this will be a quick upload. If a lot of photos have been added, then, well, it's going to take a while. S3 and CloudFiles are the offline storage.
  • Finally, once a month, a full ZFS snapshot should be uploaded.
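The nightly and monthly steps above can be sketched roughly like this. It is only a sketch under a few assumptions: the dataset name `tank/photos`, the date-based snapshot names and the "full send on the 1st of the month" rule are all made up for illustration, and the script only builds the `zfs snapshot` and `zfs send` commands rather than running them. The actual upload to S3 or CloudFiles is left out; the output of `zfs send` would be piped into whatever uploader ends up being used.

```python
from datetime import date

DATASET = "tank/photos"  # hypothetical pool/filesystem name


def snapshot_name(dataset, day):
    """Name snapshots after the day they were taken, e.g. tank/photos@2009-06-02."""
    return f"{dataset}@{day.isoformat()}"


def plan_backup(dataset, today, previous_day=None):
    """Return the zfs commands (as argv lists) for today's backup.

    Assumption: a full stream is sent on the 1st of the month, or when
    there is no earlier snapshot to diff against; otherwise only the
    changes since the previous snapshot are sent.
    """
    new_snap = snapshot_name(dataset, today)
    commands = [["zfs", "snapshot", new_snap]]
    if today.day == 1 or previous_day is None:
        # Monthly (or first ever) backup: send the whole snapshot.
        commands.append(["zfs", "send", new_snap])
    else:
        # Nightly backup: send only what changed since the last snapshot.
        old_snap = snapshot_name(dataset, previous_day)
        commands.append(["zfs", "send", "-i", old_snap, new_snap])
    return commands


if __name__ == "__main__":
    # Example dates only; nothing is executed, the commands are just printed.
    for cmd in plan_backup(DATASET, date(2009, 6, 2), date(2009, 6, 1)):
        print(" ".join(cmd))
```

Building the commands as lists keeps the logic easy to check before anything touches a real pool; they could later be handed to `subprocess.run` as they are.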

The biggest problem with this whole thing is bandwidth, and I have said this multiple times before… in theory you could use some sort of compression, but in a quick test with 32 files (16 CR2 files from my Canon EOS 5D Mk II and 16 XMP sidecars, weighing in at a total of 384 MB), WinZip 12 saved me only 12 MB… that's about 3% smaller. On a 113-hour upload it would save you about 3.5 hours, which is not a great saving…
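To make the arithmetic explicit, here is a small sketch of the calculation. It uses only the figures from the test above (384 MB in, 12 MB saved, a 113-hour upload) and assumes that upload time scales linearly with file size; the function name is made up for illustration.

```python
def compression_saving(original_mb, saved_mb, upload_hours):
    """Return (fraction saved, hours saved), assuming upload time is
    proportional to the amount of data sent."""
    fraction = saved_mb / original_mb
    return fraction, upload_hours * fraction


if __name__ == "__main__":
    fraction, hours = compression_saving(384, 12, 113)
    print(f"{fraction:.1%} smaller, about {hours:.1f} hours saved")
    # prints: 3.1% smaller, about 3.5 hours saved
```

RAW files are already compressed by the camera, which is likely why a general-purpose zip gains so little here.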

Anyway, I have the hardware pretty much here… a couple of 1 TB HDDs and some smaller disks, plus a mobo, RAM, proc, etc. Once I build this (first I have to get my Home Server data off the drives…) I will start posting info.