October 15, 1996
Managing Mass-Storage Monsters
If there's one maxim that holds true, it's "data expands to fill the
space available." It seems that no matter how much hard-disk space you
have, people will find lots of creative ways to fill it. This unfortunate
truism leads to an eternal search for more and better ways of maximizing
storage options.
This search doesn't have to lead to your continually buying more and
larger disks. Treat it instead as a sign that you need better storage
management strategies and procedures. Instead of trying to fight fire
with fire, focus on building a comprehensive, multilevel strategy that
will provide for growth. Finding a way to manage the problem will yield
much more satisfaction than constantly trying to fight the symptoms.
Of course, this often is easier said than done. Although it's easy
to pay lip service to the desire for better mass-storage facilities,
it's often difficult to take the time and energy required to architect
a flexible solution to the problems. Even if you are able to piece together
an effective strategy, getting management to pay for something so esoteric
can be difficult. Additionally, there's the implementation, followed
by the routine maintenance and management, which can be boring as hell,
to say the least.
We can't buy these products for you nor can we help you personally
with the day-to-day management tasks, but we can help you design your
strategic solution. In addition, we will offer various tips we've collected
over the years.
Hard Drives Are the Root of All Evil
Let's face facts: The more drives you have, the harder your life is.
You have to back them up. You increase your exposure to the negative
effects of downtime that eventual failure is sure to bring. You also
can spend lots of money, even if drives are cheap on a one-off basis.
Instead of buying more drives, perhaps you should be buying fewer of
them. We don't mean you should consolidate many small devices into a
few large ones (though this is often a good idea), but you should find
a way to minimize the amount of front-line magnetic storage available
on your network. If data expands to fill the space available, then the
corollary is that you can minimize the amount of "necessary" data
if you also reduce the amount of available storage.
This just isn't true, of course, but it does frame our most fundamental
position, which is that data needs to be prioritized before it can be
managed effectively. The best way to prioritize is to eliminate. Do you
really need a hard drive on each PC, or could you get by with a better-managed
server-based storage plan? If you manage applications and user data more
efficiently, you will reap more rewards in stability, which means fewer
disasters.
There are opposing arguments that say it is better to spread your risk
by putting local drives in every system, but this doesn't hold up in
the long term. Although it's true that if the server crashes all PCs
attached to it also are knocked out, you can minimize these risks if
the system is architected correctly. Having drives on each system means
(eventually) you will have failures on each system that will cost you
much more in labor than a single widespread failure would. In addition,
it's easier to prevent a single failure than it is to prevent hundreds
of them.
When you design a centralized storage mechanism, it's important to
recognize that there are several types of disk I/O generated. At one
extreme, there are small files that are loaded frequently, calling for
fast random access. At the other end, there are large files that are
loaded infrequently, but demand large throughput when they are loaded.
For the small files, what you want more than anything is fast seek
times. The disk will need to shoot from one corner to the other quickly,
since several simultaneous requests for small pieces of data are likely
to occur. Throughput is irrelevant, since a large pipe won't be filled
by these short bursts. These types of files generally are word-processing
documents, spreadsheets and many of the most common applications. Since
these files are accessed often, you can store many of them together on
the same disk without having much of an impact on performance--assuming
you purchased very fast drives. By using disks that support fast seek
times, the many requests will be satisfied quickly, allowing for good
overall performance.
For large databases and sequential files, however, the opposite is
true. These files tend to be large blocks of data that are not opened
and closed quickly, but instead are loaded once or twice a day and then
searched heavily. If someone needs a report or a query executed, you
need maximum throughput, since a quick return of all the data will
make the entire operation faster. You don't care about seek time because
multiple random requests aren't as likely as raw reads of huge chunks
of data.
Which method is suitable for you? Both are, undoubtedly. Your best
bet is to set up two separate disk systems, each optimized for its specific
purpose.
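To see why the two workloads pull in opposite directions, here is a rough back-of-the-envelope sketch. It assumes a simple model in which each request costs one seek plus the time to transfer the data. The function names and the drive figures (10 ms seek, 5 MB per second) are hypothetical, chosen only for illustration.

```python
def request_time_ms(size_kb, seek_ms, throughput_mb_s):
    """Estimate one disk request as seek time plus transfer time, in milliseconds."""
    transfer_ms = (size_kb / 1024.0) / throughput_mb_s * 1000.0
    return seek_ms + transfer_ms


def seek_share(size_kb, seek_ms, throughput_mb_s):
    """Fraction of the request time spent seeking rather than transferring."""
    return seek_ms / request_time_ms(size_kb, seek_ms, throughput_mb_s)


# Hypothetical drive: 10 ms average seek, 5 MB/s sustained transfer.
small_doc = seek_share(size_kb=20, seek_ms=10, throughput_mb_s=5)
big_table = seek_share(size_kb=200 * 1024, seek_ms=10, throughput_mb_s=5)

print(f"20 KB document: {small_doc:.0%} of the request is seek time")
print(f"200 MB database: {big_table:.2%} of the request is seek time")
```

Under these assumed figures, seeking accounts for roughly 70 percent of the time spent fetching the small document and a tiny fraction of 1 percent for the large database load, which is why the first disk system should be bought for seek time and the second for throughput.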
Fault Tolerance
Once you've defined your distribution of media, you'll want to ensure
it doesn't go down or, if it does crash, you'll want to minimize the
impact on end users. Your best bet is to create your disk farms using
Redundant Arrays of Inexpensive Disks (RAID) Level 5 arrays, and then
mirror them using RAID 1. This makes for a nearly indestructible setup (as long
as you have mirrored servers and power supplies as well).
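The price of that protection is raw capacity. The sketch below assumes two identical RAID 5 arrays, one mirroring the other, and uses the standard RAID 5 rule that one disk's worth of space in each array goes to parity. It also shows the XOR trick that lets RAID 5 rebuild a lost block. The disk counts and sizes are hypothetical.

```python
from functools import reduce


def raid5_usable_gb(disks, disk_gb):
    """Usable space in one RAID 5 array: one disk's worth is consumed by parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def mirrored_raid5(disks_per_array, disk_gb):
    """Return (usable_gb, raw_gb) for two identical RAID 5 arrays mirrored with RAID 1."""
    usable = raid5_usable_gb(disks_per_array, disk_gb)
    raw = 2 * disks_per_array * disk_gb
    return usable, raw


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, the way RAID 5 computes parity."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical configuration: five 4 GB disks in each array.
usable, raw = mirrored_raid5(disks_per_array=5, disk_gb=4)
print(f"{usable} GB usable out of {raw} GB purchased")

# Lose the middle block, then rebuild it from the survivors plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

In this example you buy 40 GB of disk to get 16 GB of protected space, which is the number to show management when you ask for the budget.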
This sort of fault tolerance used to be expensive, but with today's
disk prices it's no longer a forbidding proposition. In fact, many of
today's offerings provide this level of functionality and more in standard
configurations. Many even include options such as dynamic volume resizing,
firmware-based management programs and high-performance dedicated processors.
Some of these systems almost make a dedicated file server obsolete.
Beyond the simple RAID management tools, you should also look for monitoring
and alert capabilities, which allow you to fix any problems that might
occur. Among these alert capabilities, the ability to send Simple Network
Management Protocol (SNMP) traps allows the RAID device to send alerts
to a central SNMP console. Even better, look for a RAID subsystem that
will take advantage of server-based software to send e-mail or pager
alerts whenever alarms are sounded.
Alternative Media
If your LAN is like most, hard-disk storage isn't the only mass-storage
medium that needs managing. Including alternative media devices is an
important part of proper planning and management.
One of the most common devices found on a LAN is a CD-ROM drive. With
the development of operating systems that consume hundreds of megabytes
of disk space, CD-ROM-based distribution is an inevitable event, even
on the smallest of LANs. But these little devices can cause management
nightmares. As the number of CDs proliferates, you are faced with two
options, neither of which is pleasant: You can constantly swap CDs in
and out of drives as users request/demand, or you can put more drives
and controllers into your already overcrowded servers.
There are other options. One is to buy a few multigigabyte hard disks
and copy the contents of the CDs onto the drives. Two benefits that come
from this solution are less manual effort and vastly improved response
times. Another option is to use robotic CD disc changers. Some of these
devices take discs in slots built into the changer, with a separate arm
that picks the appropriate disc. Other systems use moving multislot cartridges
and rotating disc holders like those found in musical jukeboxes. For
most robotic systems, the average time to find and position a disc for
writing or reading is six seconds.
Backing It Up
Although the goal behind developing a strategic storage management
plan is to simplify your life, having many different types of storage
on hand makes for a fairly complex environment. However, this is still a far
easier environment to manage than one with many more disk systems scattered
about, as in most desktop-centric environments.
For example, backing up three or four fully mirrored RAID 5 arrays
is easier than backing up 50 different workstations. For one thing, you're
likely to have faster backup I/O channels on a server than you will on
a PC workstation via shared Ethernet, so it will probably take less time
to back up a gigabyte locally than 100 MB remotely. Also, you can increase
the aggregate throughput of your backups by adding multiple tape drives,
allowing drives to run simultaneously.
Some offerings support up to eight drives running concurrently, bringing
the aggregate throughput up to 15 GB per hour, assuming you connect the
drives directly to the server and run the software on the server. These
monsters usually combine parallel backups with streaming, increasing
throughput by continuously spinning the tapes. If you need to back up
multiple servers, you might consider a dedicated backup server with an
automatic tape changer.
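A quick way to size the backup window is to divide the data by the combined drive speed. The sketch below assumes throughput scales linearly with the number of drives, which real hardware may not quite deliver. The per-drive rate of 1.875 GB per hour is simply the 15 GB per hour aggregate figure divided by eight drives, and the 60 GB data set is hypothetical.

```python
def backup_hours(data_gb, drives, per_drive_gb_per_hour):
    """Estimate the backup window, assuming drives run in parallel and scale linearly."""
    if drives < 1 or per_drive_gb_per_hour <= 0:
        raise ValueError("need at least one drive with a positive transfer rate")
    aggregate_gb_per_hour = drives * per_drive_gb_per_hour
    return data_gb / aggregate_gb_per_hour


PER_DRIVE = 15 / 8  # GB per hour, derived from 15 GB/hour across eight drives

# Hypothetical 60 GB of server data, backed up with different drive counts.
for drive_count in (1, 2, 4, 8):
    hours = backup_hours(60, drive_count, PER_DRIVE)
    print(f"{drive_count} drive(s): {hours:.1f} hours")
```

With these numbers, 60 GB takes about four hours on eight drives and 16 hours on two, so the drive count decides whether the job fits into a single night.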
Part of your overall backup solution will depend on the tape format
you choose to use. Quarter-inch cartridge (QIC) used to be the standard
choice just a few years ago, but over the past few years newer technologies
like 4-mm digital audio tape (DAT) and 8-mm helical scan drives have
become more popular. Newer entries like digital linear tape (DLT) offer
even greater capacities and speeds, and even QIC is making a comeback
with new, more flexible formats. Each of these different media offers
advantages, but only one should be used across your organization. Consistency
of media will make your life much easier, especially in times of crisis.
Other features to look for include flexible scheduling, alert notification
tools, 24-hour support and frequent updates and patch fixes. If your
backup vendor is falling down in any of these areas, re-evaluate your
choice before something unrecoverable happens.
Hierarchical Storage Management
Hard disks, CD-ROMs, WORM and tape drives are all fairly static forms
of storage--data doesn't get moved from one to the other unless a human
operator specifically orders the move. However, with the recent rise in
popularity of a variety of hierarchical storage management (HSM) offerings,
these migrations can occur dynamically, using just a bit of human prompting.
HSM products allow you to migrate infrequently used files to cheaper,
larger and slower mass-storage devices, leaving small "stub" files in
their place. The data is moved back to primary storage only when the
file is requested, eliminating expensive front-line storage without actually
getting rid of the data.
The way the migration occurs is determined by the administrator and
can generally be set according to length of time, type of file, amount
of available space and other criteria. As usual, there is a wide range
of products and implementations to choose from, but the criteria are pretty
simple. The product needs to be completely customizable in terms of origin
and destination: You may want to move data from a disk on server "A" to
a magneto-optical disk on server "B." It also needs to support complete,
invisible file recall, regardless of the client OS. It also helps if
the product is integrated with your existing backup solution, or offers
a suitable companion product so that the two are aware of each other's
functions. Finally, it needs to support all of the standard alert mechanisms,
so any alarm conditions that occur can be relayed to staff members for
immediate attention.
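The migration criteria described above can be sketched as a simple selection rule. This is not how any particular HSM product works. It is a minimal illustration, assuming the administrator sets an idle-time threshold, a list of file types to leave alone and a free-space target, and that migrating a file frees its full size (the small stub is ignored). The FileInfo record, the function name and the sample files are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_mb: float
    days_since_access: int


def select_for_migration(files, min_idle_days, skip_suffixes, free_mb, target_free_mb):
    """Pick idle files, longest-idle first, until the free-space target is reached."""
    # Only files idle long enough, and not of an excluded type, may migrate.
    candidates = [
        f for f in files
        if f.days_since_access >= min_idle_days
        and not f.path.lower().endswith(tuple(skip_suffixes))
    ]
    candidates.sort(key=lambda f: f.days_since_access, reverse=True)

    chosen = []
    for f in candidates:
        if free_mb >= target_free_mb:
            break  # enough primary storage has been reclaimed
        chosen.append(f)
        free_mb += f.size_mb  # assumption: stub size is negligible
    return chosen


# Hypothetical volume contents.
volume = [
    FileInfo("a.doc", 100, 200),
    FileInfo("b.exe", 500, 400),
    FileInfo("c.xls", 50, 10),
    FileInfo("d.dbf", 300, 365),
]
picked = select_for_migration(volume, 90, [".exe"], free_mb=50, target_free_mb=400)
print([f.path for f in picked])
```

Here the program file is protected by type, the recently used spreadsheet is protected by age, and the two stale data files are moved off to reach the free-space target.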
Working up an effective storage management strategy can be an intense
effort, involving a variety of employees and vendors. But the one-time
costs incurred from the effort and equipment purchase will likely be
offset by avoiding the higher management expenses incurred in running a poorly
designed storage management environment. Also, it can be easy to garner management
support for these efforts, once you've outlined the reduced level of
exposure to your organization, not to mention the benefits of the reduced
operating expenses.
Written by Eric A. Hall.
Copyright © 1996 CMP Media, Inc. Used with permission.