Fault Tolerant State Machine Design

by kaushal_buch on 2007-04-21
source:

Fault Tolerant State Machine Design

Kaushal D. Buch

Abstract

Fault tolerant state machines are essential for implementation of highly reliable hardware in noisy or fault triggerable environments. Due to the external noise/energy the state machines may enter into a false or invalid state, which may lead to malfunctioning of the entire hardware. This can be prevented by using different encoding techniques for state registers of FSM as well as by use of some recovery techniques that can be incorporated in FSM. The paper describes the state machine architecture, which is followed by various encoding techniques. Later, techniques for recovery of state machine from an invalid state to valid known state are shown.

These fault tolerant state machines are widely used in space designs. Fault tolerant machines are very useful in space hardware realized on FPGA, as they are very useful in combating Single Event Upsets (SEU).

Introduction

Finite State machines are very useful in digital design and are very often used for controlling the process or for implementation of definite sequence of process. Normally they have two basic types of digital elements – sequential and combinational. The sequential elements are those, which are used to change the state of the machine synchronously. The combinational logic forms the logical part of the state machine and is normally used for deciding the transition from one state to another based on certain conditions. It is also used to generate the output at each stage, if any. The basic FSM is as shown in figure 1.

Fundamentally, there are two types of FSM – Moore and Mealy. These are based on the mechanisms on which the output is generated. If the output is a function of only present state then it is called a Moore machine whereas if the output is a function of present state as well as input, then it is called a Mealy machine. Both these types of state machines are shown in figure 2.

The merits and demerits of both these state machines are discussed later in the paper. Also another type of state machine known as the Letcher state machine has been discussed. The state register encoding techniques are discussed subsequently and the ability to detect single bit errors in case of Hamming decoding is shown. The comparison between binary, one-hot, hamming –2 and hamming –3 decoding is also stated.

Figure 1: Basic State Machine

The error recovery in the case of state machines is described later, which is a crucial role of a fault tolerant state machine in order to function flawlessly.

Types of Finite State Machines

Three types of finite state machines are discussed here – Moore, Mealy and Letcher.

Moore Machine

The output of this state machine depends on the present state of the sequential element. As shown in figure 2, there is a combinational and a sequential block forming the basis of this machine. The outputs change synchronously with respect to clock and state transition.

Figure 2: Moore State Machine

From figure 2, it can be concluded that the next state Sn, is a function of current state Sp, and the current input X, but the outputs Y is a function of the current state Sp. Here Y and Sn can be expressed as follows:

Sn(t) = f (Sp(t), X(t))
Y(t) = h (Sp(t))

Mealy Machine

The output of this state machine depends on the present state of the sequential element and the input to the state machine. As shown in figure 3, there is a combinational and a sequential block forming the basis of this machine. The outputs change asynchronously with respect to input transition.

Figure 3: Mealy State Machine

As shown in figure 3, the next state Sn, is a function of Sp, and the current inputs X and the outputs Y is a function of the current state Sp and the current inputs X. In other words Y and Sn can be expressed as follows:

Sn(t) = f( Sp(t),X(t))
Y(t) = h (Sp(t), X(t))

Letcher State Machine

The letcher state machine comprises of three types of latches – clock signal driven, edge triggered and parallel entry shift registers, and a combinational logic module. The three latches are input latch, output latch and feedback latch. The basic Letcher state machine is as shown in figure 4.

Figure 4: Letcher State Machine

The input latch’s output signals and present state drive the combinational logic to determine the next logic state. The current state and the output of the input latch determine the input of the output latch. The output of the Letcher state machine is inherently synchronized by an outside supplied clock signal, which can be altered upon demand. Here the synchronous output signals take effect at the next clocking event.

The next state Sn, is a function of current state Sp, and the current inputs X. And the outputs Y is a function of the current state Sp and the current inputs X. Thus Y and Sn are expressed as follows:

Sn(t+x) = f(Sp(t), X(t))
Y(t+x) = h(Sp(t), X(t))

Metastability issues in State Machines

The Metastability issues occur to set-up or hold time violation of both the flip-flops, which form the sequential part of the state machine. These effects are more prone at higher frequencies. Due to this effect, the output of the flip-flop is non-predictable. These effects are there in Moore and Mealy state machines. Letcher state machine is not affected by this issue is it has a level triggered device as sequential element. The Mealy state machine is more noise prone as the output of the machine depends on the input and hence if input changes (due to glitch etc.) the output will also change.

The probability that the metastable state will occur is given by MTBF (Mean Time Between Failures). MTBF for a typical D-Flip Flop can be given by

MTBF = 1/ (A*fc*fd*exp(-B*Ts))

where fc = clock frequency
fd = frequency of asynchronous data rate
Ts = Clock Period or Propagation Delay or Setup Time.
A and B are Metastability constants that vary with construction and technology and system parameters.

It is clear from the above equation that the MTBF largely depends on fc and fd and hence if the frequency is higher or due to asynchronous inputs the probability of occurrence of Metastability is higher.

State Machine Encoding Styles

Binary

Normally this encoding scheme is used for encoding the states in state machines. The scheme is simple and implements a sequence of encoding i.e. a simple sequence of states. This kind of encoding is more error prone but requires less number of registers for encoding as compared to the other schemes discussed earlier.

One-Hot

The sequence of states is such that only one bit changes during one transition i.e. 0001, 0010, 0100 and 1000. Thus two bits change while transition from one state to another. Hence it has capacity to detect single bit error. The drawback in this kind of state machine is that number of memory elements required is equal to the number of states in the state machine and also it has more number of unused states.

Hamming 2 code

The difference between present state and next state of the state machine is two hamming units. This is useful for detection of single bit errors. This can be seen by the equation -

Hamming distance d = t + 1

Where t = number of error detection bits

The disadvantage in this type of encoding is the number of memory elements required which is equal to the number of states in the state machine.

Hamming 3 code

The difference between present state and next state of the state machine is three hamming units. This is useful for detection and correction of single bit errors. This can be seen by the equation -

Hamming distance d = 2t + 1

Where t = number of error correction bits

The disadvantage in this type of encoding is the number of memory elements required which is equal to the number of states in the state machine.

Error Recovery in Fault Tolerant State Machines

The state machine can be recovered from error by following ways:

Each invalid state can have a flag set, which indicates that the state machine has entered in a wrong state and needs to be transitioned into a valid state. Similarly, each state can have a sequence of states from which this state can be reached. Hence if this present state is not reached from those states mentioned in that sequence then it will detect that error has occurred and will try to come in known state.

Similarly after every certain clock cycles, a counter must check the state register to see if it is in the valid state or not. If not it should bring the state machine in a known state. It is functionality similar to that of a watchdog timer.

Conclusion

Various state machine types and the encoding techniques were discussed. It can be concluded that encoding techniques should be chosen based on the error correction capability required. Hamming 3 codes suit best for single bit error correction and hence can be used for space designs, which are susceptible to Single Event Upsets (SEU). Also the use of state machines with their merits and demerits are shown and hence the state machine should be implemented considering points. The error recovery techniques are helpful in case of multi-bit error or unavailability of failure detection capability and hence were discussed subsequently in the paper.

References

1. Michael Forbes, Jian Xiong, Chinmay Athanikar and Raghavendra Jayaram, “Finite State Engines”, Department of Electrical Engineering, University of Tulsa.
2. Melanie Berg, “A Simplified Approach to Fault Tolerant State Machine”, Ball Aerospace and Technologies Corp.
3. Gary Burke and Stephanie Taft, “Fault Tolerant State Machines”, Jet Propulsion Laboratory , California Institute of Technology, Pasadena, CA.

Browse

Tools

Fault Tolerant State Machine Design