Data Storage – archiving and protecting your important digital works


In today’s world of digital everything, the growth of data continues to outpace our ability to back it up or archive it with ease.

How do we effectively keep our data moving forward for generations?

If there is one goal of this document (should you decide not to read further), it is to make you think about whether you have your important digital data backed up. Is the backup recent (from the last few weeks)? If not, back it up now; don’t procrastinate.

We all have data that we can do “without”; ironically, it is our most important data that is often overlooked (our family photos, legal documents, etc). Unfortunately the adage that you don’t know what you have lost until it’s too late is all too applicable, and it pertains to both data backups and data archiving, which are two different things.

As a technologist with 20+ years of industry experience, I am starting to ponder the question of how we attain long term data availability and archiving in the context of “how as a regular user I will still be able to retrieve my imagery/writings 20 years from now”.  I am looking at data backup for the near term (immediate out to 18 months) and data archiving retrieval (2 to 15 years +).

Backing up data is nothing new and pay forward data migration is nothing new either.

When I started my career in the early 90’s I worked alongside a department that was dedicated to migrating old seismic data from ancient magnetic tape (1970s-80s) to newer magnetic tape (DLT at the time).  The staff had the task of recovering what was already 20 year old data and migrating it onto a “modern” format; modern being the tech of the early 90’s.  There were many challenges both technically and process wise; and while technology has advanced greatly, many of these challenges remain today.

Fast forward 20 years to the 2010’s and people are migrating their legacy data to new formats. For example: audio Compact Disc conversion to playable formats such as AAC, MP3, and FLAC for modern music players, tablets, phones, etc.  This is effectively pay forward migration… and to a certain degree a form of backup and archiving (the CDs become the archives, the audio files the new content with multiple devices being the active backup).

The question is, are you doing this with your other data? Your PDFs? Your photos? All the digital information that you create?

The problems and hurdles to solve:

  • Not enough capacity on traditional backup devices to meet the rapid disk growth for photos and video
  • Not enough network capacity to back up large volumes of non-static data to the cloud on a regular basis in an economical way or practical time frame.
  • A general misunderstanding or complete lack of awareness by regular end users on how, when or why they should back up and archive their data before it is lost.
  • A general misunderstanding of the difference between backups and archiving.

Available options:

  • The “cloud” (on-line backup)
  • Traditional optical media (DVD, Blu-ray, M-DISC, yes even compact disc)
  • Tape (ancient but still viable to a degree)
  • Additional dedicated hard disks, lots of external hard disks
  • RAID Arrays + File parity (or variations of this)

Paranoia, value to you vs. value of investment

I am personally very paranoid about the photographic imagery that I produce, especially my “digital negatives”; my other digital data… not so much.  As a result I go to great lengths to protect my digital imagery and invest far less in my less important data.

A good data strategy demands that you evaluate your types of data and then provide a valuation on that data.  If you honestly don’t give a crap about your digital data then there is very little reason for you to invest in technologies, or more importantly the time required to manage your data on those technologies and solutions.

Obviously on the other hand, if you are paranoid about your data surviving not just this week or month, but years into the future, then continue reading and I will outline some areas where you may want to invest and focus.

General Strategy

There are some absolutes and fuzzy areas with data storage; albeit your final outcome is the same: Protect the data, make it retrievable and most importantly, usable at some point in the future.

In order to achieve this you need to accomplish the following:

  • protect against erasure: physically, accidentally or by unforeseen circumstances (theft, fire, disaster)
  • have a minimum of 1 additional copy of the information on a different storage system (although I argue at least 3 copies are required)
  • understand and classify your data: its value to you and the retention time frame you wish to keep it
  • have a solid understanding of the risks of the technologies you choose to implement
  • maintain and evolve your strategy continuously

Moving forward…

Protect against erasure: physically, accidentally or by unforeseen circumstances

Erasure happens in various ways.

Hard drives fail, file systems go corrupt, computers get stolen and, while no one honestly wants to admit it, sometimes we as end users have an “oops moment” and do something to delete the data that we are working on (perhaps being tired, or your “cat jumped on the keyboard and hit the delete key by accident”).

Regardless of the trigger, data gets deleted which doesn’t help our long term archiving goals.  Protecting against erasure is a combination of recent secondary copies (backups) and best practices when working with your data to avoid accidents.

Additional copies

As outlined in the point above, one method to avoid total loss due to erasure is to have additional copies of your data on physically separate devices/media.  The downside? You have to keep these copies in temporal sync: the longer you wait to update all copies, the more likely your newest data will be at risk and your additional copies stale.

Why distinct media? Having multiple copies on the same drive may protect you from the empty trash scenario, however it does nothing to protect you against hardware failure.  Multiple copies, multiple media.

Understand and classify your data

Doing all of this additional work can be resource consuming (time, money, bandwidth, electricity, etc)… so understand your data and only invest in the data that is of value to you.

Traditionally, backup strategies made “full backups” of everything at regular intervals; in a modern world, why bother? Back up only the long term data you care about, at intervals that ensure you have a backup copy recent enough to allow you to use the data should something go wrong.

For example: if you store a digital photo and never edit it, why back it up repeatedly on the same media? (not to be confused with multiple physical copies for archiving).
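As a rough sketch of that idea (not a replacement for rsync or robocopy), the Python below copies a file only when it is missing from the backup or looks changed. The function name `copy_new_or_changed` and the "size differs or source is newer" rule are my own assumptions for illustration; real tools offer stricter comparisons.

```python
import shutil
from pathlib import Path


def copy_new_or_changed(source: Path, backup: Path) -> list[Path]:
    """Copy files from source to backup only if missing or changed there.

    A file counts as "changed" when its size differs or the source has a
    newer modification time. Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and s.st_mtime <= d.st_mtime:
                continue  # looks unchanged since the last run, skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(rel)
    return copied
```

Run it twice in a row against the same folders and the second run should copy nothing, which is the whole point: the photo you never edit is written to a given medium once.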

Verify your data integrity

An often overlooked step, even in the enterprise, is the verification of your backups.  Blind backup and archiving is easy to do; verifying that your data was not corrupted in transport to the archive/backup medium takes a bit more effort, but it ensures you have a good copy of your data.  This is where many people fail.

It only takes a quick sampling of your data (open a few files) to ensure that the integrity of your files is not compromised; and pay close attention to those error logs when burning discs, or copying the data to another disk/cloud storage.  I don’t know how many times I have seen people, both personally and professionally, have a smile turn to a frantic frown when they realize their backup is no good.
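If you want something more thorough than opening a few files, one common approach is to compare checksums of the original and the copy. The sketch below assumes the backup is reachable as an ordinary folder and uses SHA-256 from Python's standard library; `sha256_of` and `verify_backup` are hypothetical names chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"mismatch: {rel}")
    return problems
```

Note the limit of this check: it only tells you the copy matches the source today. If the source itself is already corrupt, both sides will agree happily.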

Solid understanding of the risk for a given archive media

Many people I talk to have a considerable misunderstanding of the risks of the various storage technologies.  Many believe that optical media is foolproof or that hard disks never fail.  I can’t count the number of times I have been standing in a store listening to “experts” incorrectly explaining a technology to a poor consumer while blowing smoke right up the consumer’s arse. *sigh*, seriously this is a personal pet peeve for me.

I have used nearly every type of storage media personally over the years, from magnetic tape, hard disk, magneto-optical to optical mediums and they all have one thing in common.

THEY ALL FAIL at some point.

The following table outlines some of the various options you have for storage and their risk.

[Table 6: storage options and their risks (table missing from the source)]

A note about cloud storage.

As a technologist I have been dealing with the “cloud” for over 10 years in one form or another and there is no end of cloud backup solutions out there waiting to take your dollars and bandwidth.  The cloud is nothing new, from some of the first on-line storage systems in 1999/2000 to the too numerous to count systems in the 2010’s.

The largest challenge and concern I have with cloud storage is that you are never truly in control of “their” storage media or their true redundancy, and that once you stop paying you generally lose your data.  If you are like me, with more than 12TB of archive data currently, the months it takes to back up this data to a solution such as CrashPlan are not worth the potential loss due to a credit card not having the correct expiry date.

Don’t get me wrong, cloud drives such as Dropbox, Google Drive, and Microsoft’s and Apple’s offerings have their place, and I, like millions of others, use these services for convenience.  Services in the cloud have the benefit of being nearly always available: a great place to store your itinerary and mobile photos while traveling, for example (protects against loss when your mobile device is stolen).

All I am advocating for is that one should carefully evaluate the financial, bandwidth and temporal cost vs. the risk of losing your data.  Remember, even Google doesn’t warranty your free Gmail, albeit I have never lost a single piece of data in the many years I have been using the service; it’s all about your comfort level at playing the odds.
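To put a number on the temporal cost, here is a back-of-the-envelope calculation. The 12TB figure is mine from above; the 10 Mbit/s upload speed is purely an assumed value for illustration (substitute your own), and the estimate ignores protocol overhead, throttling and interruptions, so real transfers will be slower.

```python
def upload_days(data_terabytes: float, upload_mbit_per_s: float) -> float:
    """Estimate days needed to upload a data set at a sustained speed.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits.
    """
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (upload_mbit_per_s * 1e6)
    return seconds / 86_400  # seconds per day


# 12 TB over an assumed 10 Mbit/s uplink, running around the clock
print(round(upload_days(12, 10), 1))  # prints 111.1
```

That is close to four months of saturating the connection before the first full copy even exists, which is why I treat the cloud as a convenience layer and not as my archive.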

Maintain and evolve your strategy continuously

Everything in technology becomes stale at some point.

If true long term storage (measured in years or decades) is important to you, then maintaining and evolving your strategy is an absolute requirement.  How I store and manage my digital photographs today (Q4 2013) is very different from how I managed them in 2010 let alone when I first got started with high end digital imaging in 2005.

While I still have all the data, and most of it has remained relatively unchanged in terms of JPG and CR2 raw file formats, the data has been stored on at least a dozen different hard disk setups over the years, as well as various optical media (CD first, then DVD, now Dual Layer 50GB Blu-ray).

Constant evaluation, review, maintenance and evolution is a must.

My strategy (as of 2013)

All of this post has led up to this.  If you skipped right to this via the Table of Contents, I strongly urge you to go back and read the rest above!

Rob’s Data Storage and Archive Strategy =

Dual RAID arrays + Off-site Disk + Optical Media off-site + File Parity Calculations on primary disk array

A long way of saying I have it backed up nearly a dozen ways from Sunday. I am paranoid about my data because I have had every one of these formats lose data at some point in its lifespan.

[Table 7: details of the strategy above (table missing from the source)]

The method you choose and the level of complexity you choose to invest in your archiving strategy is ultimately up to you.  If you are saying to yourself “crap I don’t have any backups” then at least I got you thinking about protecting your data, and if you already have a solid strategy then excellent.

Take the time to consider your data, value it, archive it and you will thank yourself in the future.  Family digital photos come to mind as the obvious data to archive (you hear about this on the news all the time; someone stole the laptop that had the wedding photos on it, etc)… however any file that has value to you is worth copying somewhere else.



Tool-set

Just a few free tools you can use to help you with data storage, archiving and integrity.

File Synchronization Tools

rsync (Linux, OSX, Windows) – available on most UN*X operating systems, available for download on Windows

robocopy (Windows) – included with Windows 7 and 8

Parity Tools

par2cmdline (Linux)

QuickPar / MultiPar (Windows)

MacPAR deLuxe (OSX)
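If you have never seen file parity in action, the toy example below may help. Be aware that it is a deliberate simplification: the PAR2 tools listed here use Reed-Solomon coding, which can repair multiple damaged blocks, whereas this sketch uses plain XOR parity (the idea behind RAID 5) and can rebuild only one lost block. The function names are hypothetical.

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together into one parity block."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing block from the survivors plus the parity."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving exactly the bytes of the block that was lost.
    return xor_parity(surviving + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(blocks)
# Pretend the middle block was lost to a bad sector
print(rebuild_missing([blocks[0], blocks[2]], parity))  # prints b'efgh'
```

The trade-off is the same one the real tools make: you spend a little extra storage on recovery data so that a small amount of damage does not cost you the whole file.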

RAID Controllers or external NAS devices

There are many to choose from, I am personally a fan of Areca.

External NAS devices such as Drobo, QNAP and Synology are also popular and offer various RAID setups.  I have used all three and generally prefer Synology over the others; in my specific case I have set up my own NAS with a spare Areca controller for specific feature sets not available in the commercial products.


Originally Posted: December 5, 2013
Updated: December 5, 2013
