Dear all,
I suspect that a lot of us rely on CDs for data archiving. My
experience of trying to restore archived data is that these are not
as reliable as I had expected.
The following is my experience; I'd be interested to hear if anyone
has a similar, or indeed contrasting, story.
The reason I'm posting this is that there may be many people with a
similar backup strategy who have not yet needed to restore data. This
problem came to light when a graduating student wanted to look at
some of his old data. He had generated a lot of data over the years,
some of which I had archived to two CDs for him. When he tried to
reinstate the data on the NMR system, he found that he could not read
either disk. Fortunately I have a backup server with all data
belonging to current users. But this led me to investigate the CDs I
have burned over the last three years or so.
Of approximately 170 disks that I have burned, perhaps 5% are at
least partially unreadable. This has caused me to rethink my backup
policy. Of course the vast majority of data will never be needed, but
there is little point in having unreliable backups.
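For anyone wanting to audit their own disks before a restore is
actually needed, a check along these lines could be scripted. This is
only a minimal sketch in Python, assuming the CD is already mounted as
an ordinary directory; the function name find_unreadable is made up
for illustration. It simply tries to read every file to the end and
lists those that raise an I/O error.

```python
import os

def find_unreadable(root, chunk=1 << 20):
    """Try to read every file under root; return the paths that fail.

    root is assumed to be the mount point of the disk being checked.
    """
    failed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # Read to end of file in 1 MiB chunks, discarding the data.
                    while fh.read(chunk):
                        pass
            except OSError:
                # Covers read errors reported by the drive as well as
                # files that cannot be opened at all.
                failed.append(path)
    return failed
```

An empty list only shows that every file could be read on that drive
on that day; it does not show that the contents match what was
originally written.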
Several of my CDs have been recoverable by trying different CD
drives, or by extracting the entire CD image file instead of
individual data files.
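The image-extraction approach might be sketched roughly as follows.
This is an illustration in Python, not the tool I actually used: it
assumes the caller has opened the source (a CD device or a partial
image) and the destination as binary file objects and already knows
the total size in bytes. The name salvage_image and the 2048-byte
sector size (the data size of a standard CD-ROM data sector) are
assumptions of the sketch. Sectors that cannot be read are zero-filled
so the rest of the image stays at the right offsets.

```python
SECTOR = 2048  # assumed data size of one CD-ROM sector, in bytes

def salvage_image(src, dst, total_size, sector=SECTOR):
    """Copy src to dst sector by sector, zero-filling unreadable sectors.

    src and dst are binary file objects opened by the caller.
    Returns the byte offsets of the sectors that could not be read in full.
    """
    bad = []
    for offset in range(0, total_size, sector):
        want = min(sector, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            data = b""
        if len(data) != want:
            # Keep whatever was read and pad the remainder with zeros.
            bad.append(offset)
            data = data + bytes(want - len(data))
        dst.seek(offset)
        dst.write(data)
    return bad
```

The returned offsets indicate which parts of the image are suspect, so
that files lying entirely in good regions can still be trusted.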
The CDs have been sitting on a shelf in my office in jewel cases, so
have had no external damage. A possible error I made was using
Neato/Fellowes sticky labels on the disks. Some of these have started
to come off the disks, which potentially could lead to balance and
spinning problems. Attempting to remove a label risks damage to the
recorded surface.
I have no evidence that the labels adversely affect the disks in any
other way, but have found web sites which claim the glue can interact
with the metal film. Fellowes denied that this was a problem when I
emailed them. Unfortunately I used labels on virtually all my disks
and so cannot compare with and without.
I have also bought blank disks in bulk, relatively inexpensively, and
so do not have experience of CDRs from many different manufacturers.
So how best to change policies to safeguard data?
My facility is fortunately in a single lab and is configured with a
data server mounted on all the instruments, so all data is already
centralized. There are currently 230 active users with a disk quota
of 500Mbytes each. The quota serves both as a warning that data has
accumulated and should be safeguarded, and as a guard against anyone
filling the disk by accidentally generating very large or numerous
files. Many
users never reach this amount of data, but some acquire more.
Currently the main server drive is 146Gbytes and is synchronized
nightly with a second 146Gbyte drive as the main backup. A second
backup is generated monthly, spread over two 50Gbyte drives.
From about 2000 to 2003 the server used the 50Gbyte drive pair, which
I then replaced with the current 146Gbyte pair.
For future backups I plan to simply compress old data and leave it on
the backup servers. Compressing with tar and gzip results in about a
50% space saving for Bruker data. I reserve the primary server drive for
current users to minimize the duration of the nightly backup. I will
still burn CDs, but will not rely on these as the primary backup.
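The compression step could be scripted in several ways; the following
is a minimal sketch using Python's standard tarfile module rather than
the tar and gzip commands themselves. The function names and paths are
hypothetical. It writes a gzip-compressed tar archive of one data
directory, reads every member back to confirm the archive is readable,
and reports the space saving actually achieved, which will depend on
the data.

```python
import os
import tarfile

def dir_size(root):
    """Total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def archive_dataset(src_dir, archive_path):
    """Create a .tar.gz of src_dir and read it back as a basic check.

    archive_path should lie outside src_dir.
    Returns (original_bytes, archive_bytes, fractional_saving).
    """
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top)
    # Re-read every file in the archive to confirm it can be decompressed.
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                fh = tar.extractfile(member)
                while fh.read(1 << 20):
                    pass
    original = dir_size(src_dir)
    packed = os.path.getsize(archive_path)
    saving = 1 - packed / original if original else 0.0
    return original, packed, saving
```

The original directory should only be removed once the read-back has
completed without error and the archive has been copied to the backup
drives.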
When the current server drives become close to being full, I expect
that it will be economical to implement a more capacious system,
perhaps a RAID array.
best wishes,
Phil.
--
Dr Phil Dennison
NMR Facility Director              (949)824-6010 (office)
Department of Chemistry            (949)824-5649 (lab)
University of California, Irvine   (949)824-8571 (fax)
Irvine, CA 92697-2025              dennison_at_uci.edu
USA
Received on Mon Oct 25 2004 - 09:09:54 MST