Mailing List Archive

Mailing List: techdiver

Date: Wed, 19 Jul 95 22:06:09 EDT
From: Jeff Kell <JEFF@UT*.UT*.ED*>
Organization: University of Tennessee at Chattanooga
Subject: Re: Draeger Atlantis
To: Marc Dufour <mdufour@CA*.OR*>, techdiver@terra.net

On Wed, 19 Jul 1995 18:59:45 -0500 you said:
>At 12:09 PM 7/19/95, Frank Deutschmann wrote:
>>Pretty close, but you are ignoring the (additional) complexity of any
>>hw/sw used to link the three computers.  (Yes, this is a nit, the end result
>>is still that the reliability is improved!)
>
>  Some time around 10 years ago, some computer manufacturer (was it HP?)
>introduced "self-healing" memory chips. That is, those could correct by
>themselves most manufacturing defects. The chips were much more complex than
>normal chips, but the wafer yield increased something like 300%, thus making
>them cheaper to manufacture than the "normal" chips...

At last, I can give a qualified technical contribution to this list :-)

HP was at least one of the first manufacturers to employ memory arrays with
polynomial error correcting codes.  If you understand the concept of parity,
common with terminal emulation, you know that parity detects a single-bit
error and can raise an error condition (it knows an error has occurred, and
what it does about it is up to the application).  If you extrapolate parity
logic one step further by allowing multiple bits for error correction, you
can devise algorithms which can not only detect single-bit errors, but can
further correct single-bit errors (and certain multiple-bit errors).  In the
original HP ECC memory arrays (this had NOTHING to do with chip design) data
is stored in 16-bit words with five bits of ECC.  Using this scheme with an
intelligent memory controller (to compare the data with the ECC field) any
single-bit errors are corrected on the fly and the CPU never sees a data
error.  The HP hardware also "logs" errors so that faulty chips can be
isolated and replaced.
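
For anyone who wants to see the idea in motion, here is a minimal sketch of
single-error correction on a 16-bit word with five check bits.  It assumes
the textbook Hamming layout (check bits at positions 1, 2, 4, 8 and 16 of a
21-bit codeword); the actual code HP used is not described above and may
well differ.  The names encode, decode and syndrome are made up for this
illustration.

```python
# Illustrative Hamming(21,16) code: 16 data bits + 5 check bits.
# Assumption: textbook layout, NOT necessarily HP's actual scheme.
CHECK_POSITIONS = (1, 2, 4, 8, 16)   # powers of two hold the check bits
DATA_POSITIONS = tuple(p for p in range(1, 22) if p not in CHECK_POSITIONS)


def syndrome(bits):
    """XOR together the positions of every 1 bit (bits: position -> 0/1)."""
    s = 0
    for pos, bit in bits.items():
        if bit:
            s ^= pos
    return s


def encode(word):
    """Turn a 16-bit integer into a 21-bit codeword (dict position -> bit)."""
    if not 0 <= word < (1 << 16):
        raise ValueError("word must fit in 16 bits")
    bits = {p: 0 for p in range(1, 22)}
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (word >> i) & 1
    s = syndrome(bits)
    # Choose the check bits so that the syndrome of the full codeword is zero.
    for pos in CHECK_POSITIONS:
        bits[pos] = 1 if s & pos else 0
    return bits


def decode(bits):
    """Return (word, bad_position); bad_position is 0 when nothing was wrong."""
    bits = dict(bits)                 # work on a copy, leave "memory" alone
    s = syndrome(bits)
    if s:
        if s > 21:
            # Not a valid position: more than one bit must have flipped.
            raise ValueError("uncorrectable multi-bit error")
        bits[s] ^= 1                  # a single flipped bit sits at position s
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        word |= bits[pos] << i
    return word, s


stored = encode(0xBEEF)
stored[7] ^= 1                        # simulate one faulty memory chip
word, bad = decode(stored)
print(hex(word), "recovered; flipped bit was at position", bad)
```

The nonzero syndrome doubles as the "log" entry: it names the bit position
(and so, with one chip per bit, the chip) that misbehaved.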

This is not new whizz-bang technology; it was present in the HP-3000 Series
II computer systems dating back to 1976 (maybe 1975).  Considering that most
other computers of the era either ignored errors and suffered the
consequences, or shut themselves down upon detecting a parity error, it was
a giant step forward in reliability at a nominal cost: just over 30% added
cost in chips (21 chips per word versus 16).
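
The chip count can be checked with a few lines of arithmetic.  This sketch
assumes a plain single-error-correcting Hamming code, for which r check bits
can cover k data bits when 2**r >= k + r + 1; check_bits_needed is a made-up
helper name.

```python
def check_bits_needed(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


data_bits = 16
r = check_bits_needed(data_bits)          # 5 for a 16-bit word
added_cost = r / data_bits                # extra chips relative to data chips
share_of_array = r / (data_bits + r)      # fraction of all chips holding ECC
print(f"{r} check bits: {added_cost:.1%} more chips, "
      f"{share_of_array:.1%} of the whole array")
```

Five extra chips on top of sixteen is about 31% more hardware, while the
check bits make up a little under a quarter of the full 21-chip word.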

ECC technology has been expanded to bus architectures as well, but thus far
is not employed (that I know of) in serial transmission media (RS-232, LAN),
where the check bits would cost roughly a quarter of the bandwidth (5 bits
in every 21 under the scheme above).  Only parallel transmission media
(e.g., bus architectures) benefit from ECC.

We now return you to your regularly-scheduled techdiver forum :-)

[\] Jeff Kell <jeff@ut*.ut*.ed*>
