RAID for Web Hosting
RAID, which originally stood for Redundant Arrays of Inexpensive Disks, is
a concept introduced by Patterson, Gibson and Katz of the University
of California, Berkeley in 1987 in the paper "A Case for
Redundant Arrays of Inexpensive Disks (RAID)". The basic idea of
RAID was to combine multiple small, inexpensive disk drives into
an array that appears to the computer as a single
logical storage unit or drive, and that offers performance exceeding that
of a Single Large Expensive Disk (SLED).
Its
meaning has since evolved, within the computing industry, into Redundant Array of Independent Disks.
RAID's various designs all pursue two key
goals: increased data reliability and increased input/output
performance.
Redundancy
is achieved by storing data on multiple hard disks, and increased
performance by allowing input/output operations to overlap in a
balanced way. Because a redundant array can keep running after a single
drive fails, the effective mean time between failures (MTBF) of the array
as a whole goes up, so fault tolerance is increased.
The Berkeley
paper described five types of array architecture (RAID levels 1 through
5), each providing disk fault tolerance and offering different trade-offs
in features and performance. This list has since been expanded to
nine, to include RAID levels 6, 7, 10, and 53. In this article, however,
only the most popular ones are discussed, along with the non-redundant
RAID-0.
* RAID-0 (striped disks) - Its main purpose is to improve speed, and it requires two or
more physical drives. It does this through ‘striping’: an algorithm
breaks each file into smaller pieces, called stripes, the
size of which is defined by the user. Each drive then receives one
or more of these stripes to complete the writing process,
thus decreasing the time required to write the file. The same is
true in reverse for reading, as all drives in the array read at
the same time. RAID-0 provides no redundancy, so the failure of any one
drive loses the data in the array.
* RAID-1 (mirrored disks) - Its main purpose is data safety, and it also requires
at least two drives. Here, mirroring is done, meaning that data
is duplicated and written to two drives in an array. Fault tolerance
is its special feature: if either drive fails, no data
is lost. It offers little in terms of performance, though. When reading
data, the controller fetches it from whichever drive is less busy, but
when writing there is overhead, as the controller must duplicate
the data it is sent before passing it along to the drives.
* RAID-5 (striped disks with parity) - Requires at least three disks, and arrays of five are common.
It is best for multi-user systems in which performance
is not critical or which do few write operations. Here, you get
the speed of striping together with redundancy comparable to mirroring: in a
three-disk array, for each set of stripes two of the disks receive data
and the third receives the parity information computed from them.
The assignment of data stripes and parity among the disks rotates
constantly, which avoids the random-write bottleneck of a single
dedicated drive receiving all the parity information. RAID-5 is often
run on hardware RAID controllers, which carry a special chip to
compute the parity; either way, there is overhead due to the parity
calculation and writing.
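The striping and parity ideas described above can be sketched in a few lines of Python. This is an illustrative model only, not how any real RAID controller or driver is implemented: the drives are plain Python lists, and the function names (`stripe`, `unstripe`, `xor_blocks`) are invented for this example. It assumes XOR parity, the usual choice for RAID-5, which the list above does not spell out.

```python
from functools import reduce


def stripe(data: bytes, n_drives: int, stripe_size: int) -> list[list[bytes]]:
    """RAID-0 style: deal fixed-size stripes round-robin across n_drives."""
    drives: list[list[bytes]] = [[] for _ in range(n_drives)]
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        drives[index % n_drives].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Read the stripes back in the same round-robin order."""
    pieces = []
    for row in range(max(len(d) for d in drives)):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (assumed RAID-5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # RAID-0: a 12-byte "file" split into 4-byte stripes over two drives.
    drives = stripe(b"ABCDEFGHIJKL", n_drives=2, stripe_size=4)
    print(drives)
    print(unstripe(drives))

    # RAID-5 idea: two data stripes plus one parity stripe.
    d0, d1 = b"ABCD", b"EFGH"
    parity = xor_blocks(d0, d1)
    # If the drive holding d1 fails, XOR of the survivors rebuilds it.
    print(xor_blocks(d0, parity) == d1)
```

In a RAID-1 mirror there is nothing to compute: every write simply goes to both drives, which is why it appears here only by contrast.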
There
are two possible approaches to RAID:
+ Hardware RAID – the RAID subsystem is managed independently
of the host and presents to the host only a single disk per RAID
array. Such systems are highly fault tolerant and come in two types:
controller-based RAID and external SCSI RAID.
+ Software RAID – occupies host system memory, consumes CPU cycles,
is OS dependent, and has performance that depends directly on CPU performance
and load. Examples are the MD driver in the Linux kernel; Solstice DiskSuite and Veritas Volume Manager for Solaris; and
Adaptec's AAA-RAID controllers.
* parity -- (from the Latin paritas: equal or equivalent) refers
to a technique of checking whether data has been lost or written
over when it's moved from one place in storage to another or when
transmitted between computers.
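As a small illustration of parity checking, the sketch below computes a single parity bit for a byte and uses it to detect a flipped bit. It assumes an even-parity scheme, and the helper names `parity_bit` and `check` are made up for this example; the definition above does not prescribe either.

```python
def parity_bit(byte: int) -> int:
    """Even parity: return 1 if the byte has an odd number of 1 bits, else 0."""
    return bin(byte).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    original = 0b10110010          # four 1 bits
    stored = parity_bit(original)  # parity saved when the data was written
    damaged = original ^ 0b100     # one bit flipped in storage or transit
    print(check(original, stored))
    print(check(damaged, stored))
```

A single parity bit can only reveal that an odd number of bits changed; it cannot say which bit, and it misses an even number of flips.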