Say again‽‽

Authors: Bjarne Hansen

Communication errors occur in many situations. We may miss hearing a spouse’s request to “buy milk” over the crying of an infant. A phone ringing during a presentation may cause the speaker to stutter. Or a medicine expected to expire on 07-08-12 actually expires on 12-07-08, because the date was written in one format and read in another.

Here are some reasons for misheard communications:

  • Ambient interference and noise
  • Overloaded processor (biological brain or silicon CPU)
  • Incorrect message format

We have many ways of overcoming communication errors. Some are informal, used in everyday interactions. We can repeat a message (referred to by some as nagging), so that if portions are garbled or drowned out, the whole message can still be reconstructed. We can speak louder to overcome noise, or we can turn off the distraction, such as a phone.

More formal methods also exist, such as:

– Standardized message formats. See, for example, “Date Format Quandary for FDA UDI for Medical Devices”, Aug 2012.

– Reduced susceptibility to outside interference. IEC 60601-1-2:2007, Medical electrical equipment, Collateral standard: Electromagnetic compatibility – Requirements and tests, for example, specifies the degree to which medical devices must be immune to external radiated and conducted interference.

– Design for adequate message processing power. IEC 62304:2006, Medical device software – Software life cycle processes, para 5.2.2, for example, requires defining the computing environment under which software will run.

– Embed error detection and correction in the message itself.

It is this last method that I want to discuss. Imagine an electronic communication between a medical treatment device and its handheld controller. Data passed between the two might include treatment duration and dose – vital parameters – yet a faulty connector might cause one of them to be corrupted. How can we prevent a bad parameter from being used?

A fundamental verification the receiving treatment device should perform is a boundary check, that is, a sanity check on values before they are used. For example, it might be known during product design that a duration shorter than 1 second or longer than 100 seconds would be invalid, so the device should confirm that the received duration falls within the expected bounds and take appropriate action if it doesn’t. Boundary checks do not add length to the transmitted message, do not require any effort by the transmitter, and generally require minimal effort by the receiver.
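A boundary check of this kind can be sketched in a few lines of Python. The 1 to 100 second limits come from the example above; the function name `validate_duration` and the choice to raise an exception as the "appropriate action" are assumptions made only for illustration.

```python
# Limits taken from the example in the text; a real device would take
# them from its design requirements.
MIN_DURATION_S = 1
MAX_DURATION_S = 100


def validate_duration(duration_s: int) -> int:
    """Return the duration unchanged if it is within bounds, else raise."""
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        # Hypothetical "appropriate action": refuse the value outright.
        raise ValueError(
            f"duration {duration_s} s is outside "
            f"{MIN_DURATION_S}..{MAX_DURATION_S} s"
        )
    return duration_s


if __name__ == "__main__":
    print(validate_duration(30))        # within bounds, accepted
    try:
        validate_duration(250)          # e.g. a corrupted value
    except ValueError as err:
        print("rejected:", err)
```

Note what the sketch cannot do: a corrupted duration that still lands between 1 and 100 passes the check unnoticed.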

More advanced verifications involve adding content to the message itself. These can detect errors missed by boundary checks, but they require more effort on the part of both the transmitter and the receiver. A useful and commonly used method is a checksum, which is a number calculated by the transmitter from the message content and sent along with the message. The receiver calculates its own checksum from the message it heard and compares it to the received checksum. If they do not match, some portion of the message was corrupted.

In the device example above, a checksum algorithm might be to add the duration and the dose numbers together. If the receiver sees that (dose + duration) does not equal the transmitted checksum, then it can conclude there was an error in the message. The error could have been in the dose, the duration, or the checksum itself. It is important to note that there are always some errors not detectable with a given algorithm. For example, an error that increased the dose by some number and at the same time decreased the duration by the same number would not be detected by this simple algorithm. More advanced error detection algorithms can be used to reduce the chance that an error will go undetected.
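The dose-plus-duration checksum, including its blind spot, can be sketched as follows. The function names, the tuple layout, and the sample values (60 and 20) are assumptions chosen for illustration, not part of any real device protocol.

```python
from typing import Tuple

Message = Tuple[int, int, int]  # (duration, dose, checksum)


def make_message(duration: int, dose: int) -> Message:
    """Transmitter side: append the additive checksum to the two values."""
    return (duration, dose, duration + dose)


def is_valid(message: Message) -> bool:
    """Receiver side: recompute the checksum and compare with the one sent."""
    duration, dose, checksum = message
    return duration + dose == checksum


if __name__ == "__main__":
    sent = make_message(duration=60, dose=20)   # (60, 20, 80)
    print(is_valid(sent))                       # True: message intact

    corrupted = (60, 28, 80)                    # dose altered in transit
    print(is_valid(corrupted))                  # False: mismatch detected

    # Offsetting errors: dose +5 and duration -5. The sum is still 80,
    # so this corruption goes undetected.
    offsetting = (55, 25, 80)
    print(is_valid(offsetting))                 # True: error missed
```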

The Cyclic Redundancy Check (CRC) is one such algorithm, and it can provide increased error detection strength. CRCs are used widely, such as in Internet communications, optical media (CD, DVD, Blu-ray), and bank machine transactions. The CRC algorithm treats the bits of a message as the coefficients of a polynomial and divides it by a fixed generator polynomial; the remainder of the division is the check value that is appended to the message. Selecting an optimum polynomial depends on several factors: the expected message lengths; the nature of the expected errors (burst vs. random errors, and the length of the errors); and your requirements for the percentage of errors detected.
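To make the division concrete, here is a minimal bitwise sketch of a 16-bit CRC. The parameters (polynomial 0x1021, initial value 0xFFFF, commonly catalogued as CRC-16/CCITT-FALSE) are picked only because they are widely documented, not as a recommendation for any device. The one-byte encoding of duration and dose and the helper names `frame` and `check` are likewise assumptions for illustration.

```python
def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16, processing each byte most significant bit first."""
    crc = init
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: XOR in the generator polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(duration: int, dose: int) -> bytes:
    """Transmitter: two one-byte values (0..255) followed by a 2-byte CRC."""
    payload = bytes([duration, dose])
    return payload + crc16(payload).to_bytes(2, "big")


def check(message: bytes) -> bool:
    """Receiver: recompute the CRC over the payload and compare."""
    payload, received = message[:-2], message[-2:]
    return crc16(payload).to_bytes(2, "big") == received


if __name__ == "__main__":
    good = frame(duration=60, dose=20)
    print(check(good))                        # True: message intact

    # The offsetting error that slips past the additive checksum
    # (duration -5, dose +5), with the original CRC left in place.
    offsetting = bytes([55, 25]) + good[2:]
    print(check(offsetting))                  # False: the CRC catches it
```

In this sketch the offsetting error is caught because the altered bits all fall within a span shorter than 16 bits, and a 16-bit CRC whose polynomial has a nonzero constant term detects every such burst. Longer or differently shaped errors can still escape, which is why the choice of polynomial matters.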

Much work has gone into evaluating polynomials; see, for instance, Koopman, P. & Chakravarty, T., “Cyclic Redundancy Code (CRC) Polynomial Selection For Embedded Networks,” Intl Conf on Dependable Systems and Networks (DSN04), June 2004. Selecting the wrong CRC polynomial has consequences ranging from bloated messages to missed errors that needed to be detected.

The checksum and CRC methods of error detection are useful for dealing with accidental alteration of a message. Detecting intentional message tampering, such as by a malicious attacker, is more complex, however. Consider a radio link between a patient monitor and a display unit. An attacker with a radio transmitter can create a false message with incorrect data and append a valid CRC, because the CRC algorithm involves no secret. The receiving display will not know the difference between the false message and a legitimate one. Protecting against intentional errors requires cryptographic authentication, which is more complex than detecting common unintentional errors on communication channels. Whether this level of sophistication is needed in a particular medical device will be determined during preparation of the risk analysis.
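One common form of cryptographic authentication is a keyed message authentication code. The sketch below uses HMAC-SHA256 from Python's standard library and assumes the monitor and display already share a secret key. The key value, message text, and function names are hypothetical, and key provisioning and protection against replayed messages are deliberately left out.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of a SHA-256 digest in bytes


def sign(key: bytes, payload: bytes) -> bytes:
    """Sender: append an HMAC-SHA256 tag computed with the shared key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, message: bytes) -> bool:
    """Receiver: recompute the tag and compare it in constant time."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


if __name__ == "__main__":
    shared_key = b"demo key only, never hard-code a real one"

    genuine = sign(shared_key, b"heart_rate=72")
    print(verify(shared_key, genuine))    # accepted: tag matches

    # An attacker who does not know the key cannot produce a matching tag,
    # unlike a CRC, which anyone can compute.
    forged = sign(b"attacker's guess", b"heart_rate=30")
    print(verify(shared_key, forged))     # rejected: tag does not match
```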

Hopefully this makes sense.  If my message is unclear, let’s communicate some more.