Data Storage — archiving and protecting your important digital works

In today’s world of digital everything, the growth of data continues to outpace our ability to back it up or archive it with ease.

How do we effec­tive­ly keep our data mov­ing for­ward for generations?

If there is one goal of this document (should you decide not to read further), it is to make you think about whether your important digital data is backed up. Is the backup recent (made within the last few weeks)? If not, back it up now; don’t procrastinate.

We all have data that we can do “without”; ironically, it is our most important data that is often overlooked (our family photos, legal documents, etc.). Unfortunately, the adage that you don’t know what you have lost until it’s too late is all too applicable, and it pertains to both data backups and data archiving, which are different things.

As a tech­nol­o­gist with 20+ years of indus­try expe­ri­ence, I am start­ing to pon­der the ques­tion of how we attain long term data avail­abil­i­ty and archiv­ing in the con­text of “how as a reg­u­lar user I will still be able to retrieve my imagery/writings 20 years from now”.  I am look­ing at data back­up for the near term (imme­di­ate out to 18 months) and data archiv­ing retrieval (2 to 15 years +).

Back­ing up data is noth­ing new and pay for­ward data migra­tion is noth­ing new either.

When I started my career in the early 1990s, I worked alongside a department that was dedicated to migrating old seismic data from ancient magnetic tape (1970s-80s) to newer magnetic tape (DLT at the time).  The staff had the task of recovering data that was already 20 years old and migrating it onto a “modern” format, modern being the tech of the early 1990s.  There were many challenges, both technical and process-related, and while technology has advanced greatly, many of these challenges remain today.

Fast forward 20 years to the 2010s and people are migrating their legacy data to new formats. For example: converting audio Compact Discs to playable formats such as AAC, MP3, and FLAC for modern music players, tablets, phones, etc.  This is effectively pay forward migration… and to a certain degree a form of backup and archiving (the CDs become the archives, the audio files the new content, with multiple devices being the active backup).

The question is, are you doing this with your other data? Your PDFs? Your photos? All the digital information that you create?

The problems and hurdles to solve:

  • Not enough capacity on traditional backup devices to keep up with the rapid growth of photo and video data.
  • Not enough network capacity to back up large volumes of non-static data to the cloud on a regular basis in an economical way or practical time frame.
  • A general misunderstanding, or complete lack of awareness, among regular end users of how, when or why they should back up and archive their data before it is lost.
  • A general misunderstanding of the difference between backups and archiving.

Available options:

  • The “cloud” (on-line backup)
  • Traditional optical media (DVD, Blu-ray, M-DISC, and yes, even compact disc)
  • Tape (ancient but still viable to a degree)
  • Addi­tion­al ded­i­cat­ed hard disks, lots of exter­nal hard disks
  • RAID Arrays + File par­i­ty (or vari­a­tions of this)

Paranoia, value to you vs. value of investment

I am personally very paranoid about the photographic imagery that I produce, especially my “digital negatives”; my other digital data… not so much.  As a result I go to great lengths to protect my digital imagery and invest far less in my less important data.

A good data strat­e­gy demands that you eval­u­ate your types of data and then pro­vide a val­u­a­tion on that data.  If you hon­est­ly don’t give a crap about your dig­i­tal data then there is very lit­tle rea­son for you to invest in tech­nolo­gies, or more impor­tant­ly the time required to man­age your data on those tech­nolo­gies and solutions.

Obvi­ous­ly on the oth­er hand, if you are para­noid about your data sur­viv­ing not just this week or month, but years into the future, then con­tin­ue read­ing and I will out­line some areas where you may want to invest and focus.

General Strategy

There are some absolutes and fuzzy areas with data stor­age; albeit your final out­come is the same: Pro­tect the data, make it retriev­able and most impor­tant­ly, usable at some point in the future.

In order to achieve this you need to accom­plish the following:

  • pro­tect against era­sure: phys­i­cal­ly, acci­den­tal­ly or by unfore­seen cir­cum­stances (theft, fire, disaster)
  • have a minimum of 1 additional copy of the information on a different storage system (although I argue that at least 3 copies are required)
  • understand and classify your data: its value to you and how long you wish to keep it
  • have a solid understanding of the risks of the technologies you choose to implement
  • main­tain and evolve your strat­e­gy continuously

Mov­ing forward…

Protect against erasure: physically, accidentally or by unforeseen circumstances

Era­sure hap­pens in var­i­ous ways.

Hard drives fail, file systems become corrupt, computers get stolen and, while no one honestly wants to admit it, sometimes we as end users have an “oops moment” and do something to delete the data that we are working on (perhaps because we were tired, or because the “cat jumped on the keyboard and hit the delete key by accident”).

Regard­less of the trig­ger, data gets delet­ed which does­n’t help our long term archiv­ing goals.  Pro­tect­ing against era­sure is a com­bi­na­tion of recent sec­ondary copies (back­ups) and best prac­tices when work­ing with your data to avoid accidents.

Additional copies

As outlined in the point above, one way to avoid total loss due to erasure is to have additional copies of your data on physically separate devices/media.  The downside? You have to keep these copies in sync over time: the longer you wait to update all copies, the more stale they become and the more of your newest data is at risk.

Why distinct media? Having multiple copies on the same drive may protect you from the empty trash scenario; however, it does nothing to protect you against hardware failure.  Multiple copies, multiple media.

Understand and classify your data

Doing all of this addi­tion­al work can be resource con­sum­ing (time, mon­ey, band­width, elec­tric­i­ty, etc)… so under­stand your data and only invest in the data that is of val­ue to you.

Traditionally, backup strategies made “full backups” of everything at regular intervals; however, in a modern world, why bother? Back up only the long term data you care about, at intervals that ensure you have a copy recent enough to be usable should something go wrong.

For exam­ple: if you store a dig­i­tal pho­to and nev­er edit it, why back it up repeat­ed­ly on the same media? (not to be con­fused with mul­ti­ple phys­i­cal copies for archiving).
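To make the idea of copying only what has changed concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the workflow used in this post: the function names `needs_copy` and `incremental_backup` are hypothetical, and a file is treated as "changed" when its size differs or its modification time is newer than the existing copy. Tools such as rsync and robocopy do this far more thoroughly.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Return True if dst is missing, or differs from src in size or age."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Whole-second comparison avoids false positives on file systems
    # that store timestamps with coarser precision.
    return s.st_size != d.st_size or int(s.st_mtime) > int(d.st_mtime)


def incremental_backup(source: Path, backup: Path) -> list:
    """Copy only new or changed files; return the relative paths copied."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

Calling `incremental_backup(Path("photos"), Path("/mnt/backup/photos"))` (both paths are placeholders) a second time without touching the source should copy nothing, which is the point: an unedited photo is written to a given backup medium once.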

Verify your data integrity

An often overlooked step, even in the enterprise, is the verification of your backups.  Blind backup and archiving is easy to do; verifying that your data was not corrupted in transport to the archive/backup medium takes a bit more effort, but it ensures you have a good copy of your data.  This is where many people fail.

It only takes a quick sampling of your data (open a few files) to check that the integrity of your files is not compromised; also pay close attention to the error logs when burning discs or copying the data to another disk or cloud storage.  I don’t know how many times I have seen people, both personally and professionally, have a smile turn to a frantic frown when they realize their backup is no good.
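Opening a few files is a spot check; a checksum comparison covers every file. The following Python sketch shows one way this could be done, assuming the backup mirrors the source folder layout. The names `sha256_of` and `verify_backup` are made up for this illustration, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> dict:
    """Compare every file under source with its copy under backup."""
    report = {"ok": [], "missing": [], "corrupt": []}
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        copy = backup / rel
        if not copy.is_file():
            report["missing"].append(rel)
        elif sha256_of(src) != sha256_of(copy):
            report["corrupt"].append(rel)
        else:
            report["ok"].append(rel)
    return report
```

Anything listed under `missing` or `corrupt` is a file whose backup copy cannot be trusted. Note that this only proves the two copies match at the moment of the check; it says nothing about whether the source itself was already damaged.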

Solid understanding of the risk for a given archive media

Many people I talk to have a considerable misunderstanding of the risks of the various storage technologies.  Many believe that optical media is foolproof or that hard disks never fail.  I can’t count the number of times I have stood in a store listening to “experts” incorrectly explain a technology to a poor consumer while blowing smoke right up the consumer’s arse. *sigh*, seriously this is a personal pet peeve of mine.

I have personally used nearly every type of storage media over the years, from magnetic tape and hard disk to magneto-optical and optical media, and they all have one thing in common.

THEY ALL FAIL at some point.

The fol­low­ing table out­lines some of the var­i­ous options you have for stor­age and their risk.

[table “6” not found /]

A note about cloud storage.

As a technologist I have been dealing with the “cloud” for over 10 years in one form or another, and there is no end of cloud backup solutions out there waiting to take your dollars and bandwidth.  The cloud is nothing new, from some of the first on-line storage systems in 1999/2000 to the too-numerous-to-count systems of the 2010s.

The largest challenge and concern I have with cloud storage is that you are never truly in control of “their” storage media or their true redundancy, and that once you stop paying you generally lose your data.  If you are like me, with more than 12TB of archive data currently, the months it takes to back up this data to a solution such as CrashPlan are not worth the potential loss due to a credit card not having the correct expiry date.
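The “months” figure is easy to sanity-check with some arithmetic. The Python sketch below estimates the upload time for 12TB; the upload rates are hypothetical examples rather than measurements, and the calculation assumes a sustained rate with no protocol overhead, throttling or interruptions, so real transfers will take longer.

```python
def upload_days(data_terabytes: float, upload_mbit_per_s: float) -> float:
    """Days needed to upload the data at a sustained rate, ignoring overhead."""
    bits = data_terabytes * 1e12 * 8            # decimal terabytes to bits
    seconds = bits / (upload_mbit_per_s * 1e6)  # Mbit/s to bit/s
    return seconds / 86_400                     # seconds in a day


# Hypothetical sustained upload rates in Mbit/s.
for rate in (5, 20, 100):
    print(f"{rate:>3} Mbit/s: {upload_days(12, rate):6.1f} days")
```

At an assumed 5 Mbit/s the initial upload works out to roughly 222 days, a little over seven months of saturating the connection around the clock.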

Don’t get me wrong, cloud drives such as Dropbox, Google Drive, and Microsoft’s and Apple’s offerings have their place, and I, like millions of others, use these services for convenience.  Services in the cloud have the benefit of being nearly always available; they are a great place to store your itinerary and mobile photos while traveling, for example (this protects against loss when your mobile device is stolen).

All I am advocating is that one should carefully evaluate the financial, bandwidth and temporal cost vs. the risk of losing your data.  Remember, even Google doesn’t warranty your free Gmail, albeit I have never lost a single piece of data in the many years I have been using the service; it’s all about your comfort level at playing the odds.

Maintain and evolve your strategy continuously

Every­thing in tech­nol­o­gy becomes stale at some point.

If true long term stor­age (mea­sured in years or decades) is impor­tant to you, then main­tain­ing and evolv­ing your strat­e­gy is an absolute require­ment.  How I store and man­age my dig­i­tal pho­tographs today (Q4 2013) is very dif­fer­ent from how I man­aged them in 2010 let alone when I first got start­ed with high end dig­i­tal imag­ing in 2005.

While I still have all the data, and most of it has remained rel­a­tive­ly unchanged in terms of JPG and CR2 raw file for­mats, the data has been stored on at least a dozen dif­fer­ent hard disk setups over the years, as well as var­i­ous opti­cal media (CD first, then DVD, now Dual Lay­er 50GB Blu-ray).

Constant evaluation, review, maintenance and evolution are a must.

My strategy (as of 2013)

All of this post has led up to this.  If you skipped right to this via the Table of Contents, I strongly urge you to go back and read the rest above!

Rob’s Data Stor­age and Archive Strategy =

Dual RAID arrays + Off-site Disk + Opti­cal Media off-site + File Par­i­ty Cal­cu­la­tions on pri­ma­ry disk array

A long way of saying I have it backed up nearly a dozen ways from Sunday. I am paranoid about my data because I have had every one of these formats lose data at some point in its lifespan.

[table “7” not found /]

The method you choose and the level of complexity you choose to invest in your archiving strategy are ultimately up to you.  If you are saying to yourself “crap, I don’t have any backups”, then at least I got you thinking about protecting your data, and if you already have a solid strategy, then excellent.

Take the time to consider your data, value it, archive it and you will thank yourself in the future.  Family digital photos come to mind as the obvious data to archive (you hear about this on the news all the time; someone stole the laptop that had the wedding photos on it, etc.)… however, any file that has value to you is worth copying somewhere else.

Tool-set

Just a few free tools you can use to help you with data storage, archiving and integrity.

File Synchronization Tools

rsync (Lin­ux, OSX, Win­dows) — avail­able on most UN*X oper­at­ing sys­tems, avail­able for down­load on Windows

robo­copy (Win­dows) — includ­ed with Win­dows 7 and 8

Parity Tools

par2cmdline (Lin­ux)

Quick­Par / Mul­ti­par (Win­dows)

Mac­par Deluxe (OSX)
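For readers wondering what “file parity” buys you, the Python sketch below illustrates the simplest form of the idea: a single XOR parity block, the same principle used by RAID 5. This is a deliberately simplified stand-in; the PAR2 tools listed above use Reed-Solomon codes, which are more capable and can repair several damaged blocks at once. The helper `xor_blocks` and the sample data are invented for this example.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-sized blocks standing in for pieces of a file.
data = [b"PHOTO-01", b"PHOTO-02", b"PHOTO-03"]
parity = xor_blocks(data)

# Pretend the second block was lost or corrupted. XOR-ing the surviving
# blocks with the parity block reproduces it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

One parity block can rebuild any single missing block, but not two; recovering from larger losses is what the extra recovery blocks in a PAR2 set are for.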

RAID Controllers or external NAS devices

There are many to choose from; I am personally a fan of Areca.

External NAS devices such as Drobo, QNAP and Synology are also popular and offer various RAID setups.  I have used all of them and generally prefer Synology over the others; in my specific case I have set up my own NAS with a spare Areca controller for specific feature sets not available in the commercial products.

Orig­i­nal­ly Post­ed: Decem­ber 5, 2013
Updat­ed: August 4, 2021

