# Soft Error Rate Estimation of Digital Circuits in the Presence of Multiple Event Transients (METs)

Mahdi Fazeli† Seyed Nematollah Ahmadian† Seyed Ghassem Miremadi† Hossein Asadi† Mehdi B. Tahoori‡ †Department of Computer Engineering, Sharif University of Technology, Tehran, Iran Email: {m\_fazeli,ahmadian}@ce.sharif.edu, {miremadi,asadi}@sharif.edu ‡Chair of Dependable Nano Computing, Karlsruhe Institute of Technology, Germany Email: mehdi.tahoori@kit.edu

Abstract—In this paper, we present a very fast and accurate technique to estimate the soft error rate of digital circuits in the presence of Multiple Event Transients (METs). In the proposed technique, called Multiple Event Probability Propagation (MEPP), a four-value logic and probability set are used to accurately propagate the effects of multiple erroneous values (transients) due to METs to the outputs and obtain soft error rate. MEPP considers a unified treatment of all three masking mechanisms i.e., logical, electrical, and timing, while propagating the transient glitches. Experimental results through comparisons with statistical fault injection confirm accuracy (only 2.5% difference) and speed-up (10,000X faster) of MEPP.

## I. INTRODUCTION

With continuous technology down-scaling, *CMOS* technology has become extremely sensitive to radiation-induced transient errors. Until recently, *Single Event Upsets* (SEUs) in latches and *flip-flops* (FFs) and *Single Event Transients* (SETs) in the combinational logic cores of a circuit were regarded as the main effects of particle strikes. However, with smaller device geometries, a high-energy particle can affect two or more adjacent nodes in a circuit, resulting in *Multiple Event Transients* (METs) in the combinational components and *Multiple Bit Upsets* (MBUs) in the sequential components [1], [2], [3], [4].

An SET or MET may result in a soft error if it is captured by the circuit bistables. Combinational logic has been less susceptible to soft errors than memory structures because of three masking factors: *logical masking*, *electrical masking*, and *timing masking*. SETs or METs occurring in the combinational logic may be logically masked and never reach the inputs of the bistables (*logical masking*). Also, as an SET or MET propagates through logic gates, its magnitude may be attenuated due to the electrical properties of the gates (*electrical masking*). Even if an SET or MET with a large enough magnitude is propagated to a bistable input, it may not arrive within the latching window of the bistable (*timing masking*).

Accurate soft error modeling is an essential step in the design of highly reliable digital systems with minimal performance and power penalty. Using such a model, one can identify the most susceptible components in the circuit for soft error hardening.

Previously proposed SER estimation approaches mainly focus on the single event upset and single transient fault models [5], [6], [7], [8], [9]. There is also some work on modeling the effect of MBUs in memory elements [1], [2], [3], [4]. There is little work investigating the effect of METs in combinational logic. These methods, however, are mostly based on simulation using fault injection either at the circuit level [10], [11] or at the device level [12] and are very time-consuming, especially for large circuits. The method presented in [13] addresses the modeling of METs in combinational logic, in which transient pulses are encoded and propagated at the gate level using *Binary Decision Diagrams* (BDDs) and *Algebraic Decision Diagrams* (ADDs). However, the worst-case complexity of the BDD encoding of logic functions is exponential in the number of logic gates [14]. Moreover, to propagate METs through multiple paths, it is required to keep track of several glitches. Therefore, as compared to SET modeling, propagation of METs in this method requires much more information, especially for reconvergent paths.

978-3-9810801-7-9/DATE11/©2011 EDAA

This paper presents an analytical *Multiple Event Probability Propagation* (MEPP) technique to quickly and accurately compute the SER of a digital circuit in the presence of METs. In the proposed technique, a four-value logic and probability system is used to propagate multiple transient glitches to the *Primary Outputs* (POs) and FFs. In the MEPP technique, possible physically adjacent gates are identified as the fault sites using the gate-level netlist of the circuit. The forward cones of the fault sites are then extracted and topologically sorted. Finally, the error probabilities of the transient glitches are propagated through the topologically sorted list from the fault sites to the reachable FFs and POs using the proposed four-value probability system, the propagation rules, and an erroneous glitch propagation approach.

We have validated the proposed technique against a reference model built on *Statistical Fault Injection* (SFI) using *Monte-Carlo* (MC) timing-accurate simulation. Our results show that the difference between the SER values obtained with the proposed technique and with SFI is about 2.5%, while the proposed technique is orders of magnitude faster.

The rest of this paper is organized as follows. Sec. II proposes our SER modeling in digital circuits. Sec. III presents experimental results. Finally, Sec. IV concludes the paper.

## II. PROPOSED SER MODELING IN DIGITAL CIRCUITS

In this section, we explain the proposed technique (MEPP) to estimate the SER of a digital circuit in the presence of METs. The MEPP technique is based on a proposed four-value probability system as well as a static timing analysis method to propagate all glitches produced by an MET event from fault sites to reachable *FFs* and *POs*. Briefly, the MEPP technique consists of three main steps: 1) fault site identification; 2) fault generation and propagation; and 3) failure probability computation. Before explaining these three steps in the subsequent subsections, we first describe the motivation of this work.

Fig. 1. An example of a reconvergent (a) and a convergent path (b)

### A. Motivation

A four-value probability system has been proposed in [15] to propagate the error probabilities of an SET event towards POs and FFs. In this representation, given a single error site, each net can be in one of the four states (values): 0, 1, a,  $\bar{a}$ . A set of probabilities ( $P_0$ ,  $P_1$ ,  $P_a$ , and  $P_{\bar{a}}$ ) is associated with these four states (0, 1, a,  $\bar{a}$ ), respectively. A net or signal has value of a ( $\bar{a}$ ), if it has an erroneous value with the same (opposite) polarity as the SET (fault) site. A set of propagation rules for elementary gates was also developed to obtain these probabilities at the output of a gate, based on the corresponding probabilities at the gate inputs. By considering different polarities of errors, error propagation through reconvergent fanouts can be accurately computed.

However, in the MET error model, in addition to reconvergent paths (Fig.1.a), errors from multiple sources can be combined through convergent paths (Fig.1.b). Fig.1.a shows an example in which a transient glitch due to an SET event is propagated through a reconvergent path. In this case, an a-event is initially added at the output of the faulty gate. Based on the input values of the gates along the propagation path, a and  $\bar{a}$  events reach two inputs of the reconvergent gate. Since the reconvergent gate is an XOR gate, the output would be 1 and the error is masked. In this case, there is no need to know the initial value of the inputs, i.e., it is only required to store the polarity of the propagated glitches.

Now consider using this four-value probability system to propagate multiple transient glitches due to an MET event through convergent paths. Consider the example given in Fig.1.b. In this example, using this probability system, the output of the convergent gate cannot be determined since the polarities of the propagated values are not related to each other. Note that the propagated  $a/\bar{a}$  probabilities come from different sources, and an event with the same polarity as another signal on a convergent path does not necessarily carry the same erroneous value. Consequently, the four-value probability system presented in [15] fails to model the propagation of multiple errors through convergent paths.

To model the propagation of multiple transient glitches, we propose a new four-value probability system. In the proposed system, each net in the circuit, at a given snapshot, takes one of the following four values:

- 0: a net has logic value of 0.
- 1: a net has logic value of 1.
- $0^e$: a net has an initial logic value of 1 but, due to a particle strike, it contains an erroneous value of 0.
- $1^e$: a net has an initial logic value of 0 but, due to a particle strike, it contains an erroneous value of 1.

Fig. 2. Four value Logic in a two-input AND gate

Fig. 2 shows these input values for a two-input AND gate and the corresponding output. Based on these four values, we associate a set of probabilities  $P_0$ ,  $P_1$ ,  $P_{0^e}$ , and  $P_{1^e}$  with each net in the circuit. These probabilities, denoted by  $P_{1^e}(U_i)$ ,  $P_{0^e}(U_i)$ ,  $P_1(U_i)$ , and  $P_0(U_i)$ , are defined as follows:

- $P_{1^e}(U_i)$  and  $P_{0^e}(U_i)$  are defined as the probabilities of the output of node  $U_i$  being  $1^e$  and  $0^e$ , respectively.
- $P_1(U_i)$  and  $P_0(U_i)$  are the probabilities of the output of node  $U_i$  being 1 and 0, respectively. In this case, the error is masked and not propagated.

The main concept in the MEPP technique is obtaining these probabilities for all gates in the circuit, for a given set of multiple error sites. This is done by calculating these probabilities at the output of a gate based on corresponding probabilities at the gate inputs. Therefore, in one topological traversal of the netlist, these probabilities for all nets can be generated. As an example, for an *n*-input *AND* gate, the probability that the gate output has a value of 1 is equal to the probability that all its inputs are equal to 1. This can be computed as follows:

$$P_1(out)_{AND} = \prod_{i=1}^n P_1(X_i) \tag{1}$$

The output of an n-input AND would have a value of  $1^e$   $(0^e)$  if each of its inputs contains a value of 1 or  $1^e$  (1 or  $0^e)$  except the case in which all of its inputs have a value of 1. Therefore, the probability that the gate output has a value of  $1^e$   $(0^e)$  can be computed as follows:

$$P_{1^e}(out)_{AND} = \prod_{i=1}^{n} \left[ P_1(X_i) + P_{1^e}(X_i) \right] - P_1(out)_{AND} \tag{2}$$

$$P_{0^e}(out)_{AND} = \prod_{i=1}^{n} \left[ P_1(X_i) + P_{0^e}(X_i) \right] - P_1(out)_{AND} \tag{3}$$

Finally, the probability that the output of an n-input AND has a value of 0 can be computed as follows:

$$P_0(out)_{AND} = 1 - \left[ P_1(out)_{AND} + P_{1^e}(out)_{AND} + P_{0^e}(out)_{AND} \right] \tag{4}$$

Using a similar deduction, the four probabilities for the output of an n-input OR gate can be extracted as follows:

$$P_0(out)_{OR} = \prod_{i=1}^n P_0(X_i) \tag{5}$$

$$P_{1^e}(out)_{OR} = \prod_{i=1}^{n} \left[ P_0(X_i) + P_{1^e}(X_i) \right] - P_0(out)_{OR} \tag{6}$$

$$P_{0^e}(out)_{OR} = \prod_{i=1}^{n} \left[ P_0(X_i) + P_{0^e}(X_i) \right] - P_0(out)_{OR} \tag{7}$$

$$P_1(out)_{OR} = 1 - \left[ P_0(out)_{OR} + P_{1^e}(out)_{OR} + P_{0^e}(out)_{OR} \right] \tag{8}$$

Fig. 3. Transient glitch propagation in a convergent path using MEPP

Unlike the method presented in [15], the proposed four-value probability system and the formulations presented in Equation 1 through Equation 8 can accurately handle the propagation of errors from multiple error sites. To better understand how the MEPP technique addresses the propagation of multiple transient glitches through convergent paths, consider the example depicted in Fig. 3. In this example, an MET event generates two erroneous values ( $1^e$  and  $0^e$ ) at two fault sites. These two erroneous glitches and the corresponding probabilities are propagated through convergent paths according to Equation 1 through Equation 8. The propagated probabilities are shown in Fig. 3. Note that, in general, an MET event can produce glitches at more than two fault sites. Since the polarity of the propagated glitches is considered in Equation 1 through Equation 8, any number of errors can be propagated using the MEPP technique.
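The propagation rules for AND and OR gates can be sketched in a few lines of Python. This is only an illustrative sketch of Equation 1 through Equation 8, not the authors' implementation; the `NetProb` container and the function names are assumptions introduced here, and gate inputs are treated as independent, as the product form of the equations implies.

```python
from dataclasses import dataclass
from math import isclose, prod


@dataclass(frozen=True)
class NetProb:
    """Four-value probability set of one net (hypothetical container)."""
    p0: float   # P_0:     net is 0, no error
    p1: float   # P_1:     net is 1, no error
    p0e: float  # P_{0^e}: net was 1, erroneously 0
    p1e: float  # P_{1^e}: net was 0, erroneously 1


def and_gate(inputs):
    """Output probabilities of an n-input AND gate (Equations 1-4)."""
    p1 = prod(x.p1 for x in inputs)                  # Eq. 1
    p1e = prod(x.p1 + x.p1e for x in inputs) - p1    # Eq. 2
    p0e = prod(x.p1 + x.p0e for x in inputs) - p1    # Eq. 3
    p0 = 1.0 - (p1 + p1e + p0e)                      # Eq. 4
    return NetProb(p0, p1, p0e, p1e)


def or_gate(inputs):
    """Output probabilities of an n-input OR gate (Equations 5-8)."""
    p0 = prod(x.p0 for x in inputs)                  # Eq. 5
    p1e = prod(x.p0 + x.p1e for x in inputs) - p0    # Eq. 6
    p0e = prod(x.p0 + x.p0e for x in inputs) - p0    # Eq. 7
    p1 = 1.0 - (p0 + p1e + p0e)                      # Eq. 8
    return NetProb(p0, p1, p0e, p1e)


# Two struck nets whose fault-free value is 1 with probability 0.5:
# each is 0^e or 1^e with probability 0.5 while the glitch is present.
site = NetProb(p0=0.0, p1=0.0, p0e=0.5, p1e=0.5)
print(and_gate([site, site]))
print(or_gate([site, site]))
```

In the two-input AND case above, the output is erroneous only when both glitches have the same polarity; when one input is $1^e$ and the other is $0^e$, the output is 0 both before and after the strike, so the two errors cancel and the case falls into $P_0$.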

### B. MET Fault Site Identification

Generally, there are three different mechanisms that can result in an MET event. In the first mechanism, two particles simultaneously strike the outputs of two different gates. It has been observed that the probability of this mechanism is very low and can be neglected [16]. In the second mechanism, one incident nucleon produces two or more secondary charged particles, each of which can produce a transient glitch. Finally, the third mechanism occurs when a high-energy particle passes through two or more physically adjacent gates and produces two or more transient glitches.

By neglecting the first MET mechanism, we can conclude that an MET event occurs at the outputs of physically adjacent gates. Therefore, the first step in SER estimation in the presence of METs is to identify the adjacent cells as the fault sites, i.e., the gates that can be simultaneously affected by a particle strike. This is not possible before extracting the layout of a circuit. However, using some information in the gate-level netlist of a circuit, one can predict the adjacent gates with an acceptable accuracy. In this paper, we have assumed that two or more gates are adjacent if they fall into one of the following categories:

Fig. 4. Candidate fault sites for METs

- 1) a gate and its fan-ins (GFI)
- 2) a gate and its fan-outs (GFO)
- 3) common fan-ins of a gate (CFI)
- 4) common fan-outs of a gate (CFO)

Examples of fault sites are shown in Fig. 4. Note that a more accurate fault site selection for MET SER estimation would require post-synthesis layout information, which is not typically available in the early design steps.
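The four categories can be enumerated directly from a fan-in description of the netlist. The sketch below is one possible reading of the categories, restricted to pairs of gates and listed from the point of view of each gate: GFI and GFO pair the gate with one of its fan-in or fan-out gates, while CFI and CFO pair two fan-ins or two fan-outs of the same gate. The function name and the dictionary-based netlist format are assumptions made for illustration.

```python
from itertools import combinations


def fault_site_pairs(fanins):
    """Candidate MET fault-site pairs per gate.

    fanins maps each gate name to the names driving its inputs. Names that
    are not keys of fanins are treated as primary inputs and never become
    fault sites (an assumption of this sketch).
    """
    fanouts = {gate: [] for gate in fanins}
    for gate, drivers in fanins.items():
        for driver in drivers:
            if driver in fanouts:
                fanouts[driver].append(gate)

    sites = {}
    for gate, drivers in fanins.items():
        gate_drivers = sorted(d for d in set(drivers) if d in fanins)
        loads = sorted(set(fanouts[gate]))
        sites[gate] = {
            "GFI": [(gate, d) for d in gate_drivers],    # gate and a fan-in
            "GFO": [(gate, l) for l in loads],           # gate and a fan-out
            "CFI": list(combinations(gate_drivers, 2)),  # two fan-ins of gate
            "CFO": list(combinations(loads, 2)),         # two fan-outs of gate
        }
    return sites


# Hypothetical four-gate netlist; x1..x3 are primary inputs.
netlist = {"A": ["x1", "x2"], "B": ["x2", "x3"], "C": ["A", "B"], "D": ["A"]}
for gate, categories in fault_site_pairs(netlist).items():
    print(gate, categories)
```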

### C. MET Fault Generation and Propagation

In Section II-A, it was shown that the proposed four-value probability system can efficiently model logical masking when propagating erroneous values from multiple fault sites through logic gates. In this subsection, we extend the proposed technique to also model the timing and electrical masking of MET events by using a static timing analysis of transient glitches similar to the method presented in [15]. In the proposed static timing analysis, two events are added to the event list of each fault site at  $T_0$  and  $T_0 + W$ , where  $T_0$  is the time of the MET incident and W is the transient pulse width. The net value is equal to  $0^e$  or  $1^e$  at  $T_0$  and is equal to 1 or 0 at  $T_0 + W$ ; that is, at  $T_0 + W$  the net returns to the value it held before the particle strike. To propagate the transient glitches from the fault sites to reachable POs or FFs, the gates and nets of a circuit are categorized into on-path and off-path gates or nets. On-path gates or nets are those located within the forward cone of a fault site, i.e., the nets and gates located on the structural paths between the fault site and the POs or FFs. The remaining nets or gates are defined as off-path.

Propagation of a transient glitch to POs or FFs depends on the logic value of the off-path signals. A straightforward technique to propagate events is performing logic simulations to examine whether the transient glitch would propagate to POs or FFs. However, logic simulation is very time consuming and is not tractable for very large size circuits. Instead, Signal Probabilities (SPs) can be utilized for off-path nets and gates. Signal probability of a node is defined as the probability that the node has the logic value of 1.
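As an illustration of how signal probabilities might be obtained, the sketch below estimates them by random simulation over a topologically ordered gate list. It is only a sketch under stated assumptions: the netlist format, the gate set, and the choice of independent, equiprobable primary inputs are all introduced here and are not taken from the paper.

```python
import random

# Boolean functions of the supported gate types (an assumed, minimal set).
GATE_FUNCS = {
    "AND": all,
    "OR": any,
    "NOT": lambda bits: not bits[0],
    "XOR": lambda bits: sum(bits) % 2 == 1,
}


def signal_probabilities(netlist, primary_inputs, samples=10000, seed=0):
    """Estimate P(net = 1) for every gate output.

    netlist is a list of (output, gate_type, input_names) tuples given in
    topological order, so every input is evaluated before it is used.
    """
    rng = random.Random(seed)
    ones = {out: 0 for out, _, _ in netlist}
    for _ in range(samples):
        # Assumption: each primary input is 1 with probability 0.5.
        value = {pi: rng.random() < 0.5 for pi in primary_inputs}
        for out, kind, ins in netlist:
            value[out] = bool(GATE_FUNCS[kind]([value[i] for i in ins]))
            ones[out] += value[out]
    return {out: count / samples for out, count in ones.items()}


# Hypothetical two-gate circuit: n1 = a AND b, n2 = NOT n1.
sp = signal_probabilities([("n1", "AND", ["a", "b"]), ("n2", "NOT", ["n1"])],
                          primary_inputs=["a", "b"])
print(sp)
```

The estimates are computed once per circuit and then reused for every fault-site pair, which is why the cost of this step is reported separately from the propagation itself.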

While traversing the circuit netlist, SP is used for off-path nets and the error propagation probability rules are utilized for on-path nets. To accelerate the event propagation process, transient glitches are propagated through the forward cone of the fault site instead of traversing the entire netlist. It is quite probable that the forward cones of multiple fault sites overlap each other. Fig. 5 shows the forward cones of two fault sites (A and B) in a sample circuit. As can be seen, the forward cones of these two gates have common gates and nets. In the MEPP technique, the gate lists in the forward cones of the fault sites are extracted and then sorted using a *topological sort* algorithm. Then, the sorted gate lists of the fault sites are merged and finally topologically sorted again.

Fig. 5. Forward cones of two faulty gates in a sample circuit

| Event at Time $t$     | Event at Time $t+w$ |
|-----------------------|---------------------|
| $P_1(x) = 0$          | $P_1(x) = sp$       |
| $P_{1^e}(x) = 1 - sp$ | $P_{1^e}(x) = 0$    |
| $P_{0^e}(x) = sp$     | $P_{0^e}(x) = 0$    |
| $P_0(x) = 0$          | $P_0(x) = 1 - sp$   |

TABLE I THE INITIAL AND ERRONEOUS VALUES OF A FAULT SITE AT TIME $t$ AND $t+w$
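A compact way to obtain the merged propagation order is sketched below. Note that it departs slightly from the description above: instead of sorting each cone, merging, and sorting again, it takes the union of the forward cones and sorts once, which should yield an equally valid topological order. The function names and the fan-out dictionary are assumptions for illustration; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def forward_cone(fanouts, site):
    """All gates reachable from site (inclusive) by following fan-out edges."""
    cone, stack = {site}, [site]
    while stack:
        for nxt in fanouts.get(stack.pop(), ()):
            if nxt not in cone:
                cone.add(nxt)
                stack.append(nxt)
    return cone


def merged_propagation_order(fanouts, sites):
    """Topologically ordered union of the forward cones of all fault sites."""
    on_path = set().union(*(forward_cone(fanouts, s) for s in sites))
    # TopologicalSorter expects {node: predecessors}; only on-path edges
    # matter, since off-path nets are represented by signal probabilities.
    preds = {gate: set() for gate in on_path}
    for gate in on_path:
        for nxt in fanouts.get(gate, ()):
            preds[nxt].add(gate)
    return list(TopologicalSorter(preds).static_order())


# Hypothetical circuit with overlapping cones for fault sites A and B.
fanouts = {"A": ["C"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}
print(merged_propagation_order(fanouts, ["A", "B"]))
```

Processing the gates in this order guarantees that every on-path input of a gate has its event list ready before the gate itself is evaluated, including gates shared by both cones.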

To account for the electrical attenuation of the transient glitches, we use a technique similar to the one presented in [17], [18]. This electrical masking model is applied to transient glitches while they are being propagated along the paths. Electrical masking can affect the width of the propagated glitches, as the propagated timed events can be modified due to changes in the output rise and fall times.

### D. Soft Error Rate Computation

During the propagation of MET events, due to the existence of convergent and reconvergent paths in the circuit and the electrical attenuation of the gates, the waveforms propagated to the POs or FFs may be wider or narrower than the initial transient glitches (MET events). In addition, since the transient glitches propagate through convergent and reconvergent paths in the circuit, they are no longer a simple glitch like the initial MET event; rather, they may consist of several burst pulses with different widths.

The last step in the proposed technique is to measure the pulse width of the propagated waveforms. Consider an MET event that strikes the outputs of two adjacent gates,  $G_l$  and  $G_m$ . Let us also assume that the MET event is propagated along the combinational logic of a circuit and reaches FFs and/or POs. Now, the event lists of all FFs and POs are available. The expected pulse width of the propagated waveform to a primary output  $PO_i$  can be computed according to Equation 9. Similarly, the expected propagated pulse width to a flip-flop  $FF_i$  can be computed according to Equation 10.

$$E_{PO_i}^{PW}(G_l, G_m) = \sum_{j \in EventList(PO_i)} (P_{0^e}^j + P_{1^e}^j)(T_{j+1} - T_j) \tag{9}$$

$$E_{FF_i}^{PW}(G_l, G_m) = \sum_{j \in EventList(FF_i)} (P_{0^e}^j + P_{1^e}^j)(T_{j+1} - T_j) \tag{10}$$

In these two equations,  $P_{0^e}^j$  ( $P_{1^e}^j$ ) is the probability that event j has the value of  $0^e$  ( $1^e$ ) and  $T_j$  is the time of  $j^{th}$  event of the event list. After computing the expected width for the propagated waveform in each PO and FF, we measure the latching probability of the propagated waveform caused by transient glitches at the outputs of  $G_l$  and  $G_m$  using Equation 11 and Equation 12.

$$LP_{PO_i}(G_l, G_m) = \frac{T_h + E_{PO_i}^{PW} + T_s}{T_{CLK}} \tag{11}$$

$$LP_{FF_i}(G_l, G_m) = \frac{T_h + E_{FF_i}^{PW} + T_s}{T_{CLK}} \tag{12}$$

$T_h$  and  $T_s$  are the hold and setup times of the flip-flops, respectively. Throughout this paper, we assume that the primary outputs of a circuit are also connected to flip-flops. Therefore, there are two types of FFs in a typical circuit: FFs that hold the state of the circuit and FFs that contain the output values. Once latching probabilities are computed for all FFs and POs, the circuit failure probability due to an MET at the outputs of  $G_l$  and  $G_m$  can be computed using Equation 13:

$$FP(G_l, G_m) = 1 - \prod_{i \in FFs \ and \ POs} (1 - LP_{FF_i, \ PO_i}(G_l, G_m)) \tag{13}$$

Finally, the overall failure probability of the circuit is the average of the Failure Probabilities (FPs) computed for all pairs of fault sites. The overall SER of the circuit can then be computed as the product of the overall failure probability of the circuit and the raw error rate of the circuit. The raw error rates are technology dependent and can be extracted using either library characterization and circuit-level SPICE simulations or an experimental study on library gates. The raw error rates are typically expressed in terms of Failures In Time (FIT). One FIT equals one failure in a billion hours of circuit operation. It should be noted that in this paper we investigate SER analysis of combinational logic and compute the propagation probability of METs from logic gates to FFs. A subsequent step to obtain system-level SER is to investigate the propagation probability of the errors captured in FFs to observable system outputs, which is beyond the scope of this paper. The overall flow of the MEPP technique is shown in Algorithm 1.
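The chain from event lists to SER can be sketched as follows. This is an illustrative sketch of Equation 9 through Equation 13 and of the averaging step described above, not the authors' code. The event-list format (time-sorted tuples of time, $P_{0^e}$, $P_{1^e}$) and all function names are assumptions; the last event of a list has no successor and therefore contributes no width.

```python
from math import isclose, prod


def expected_pulse_width(events):
    """Equations 9 and 10. events: time-sorted (T_j, P_0e_j, P_1e_j) tuples."""
    return sum((p0e + p1e) * (t_next - t)
               for (t, p0e, p1e), (t_next, _, _) in zip(events, events[1:]))


def latching_probability(epw, t_hold, t_setup, t_clk):
    """Equations 11 and 12."""
    return (t_hold + epw + t_setup) / t_clk


def failure_probability(latching_probs):
    """Equation 13: at least one FF or PO latches the erroneous waveform."""
    return 1.0 - prod(1.0 - lp for lp in latching_probs)


def circuit_ser(pair_failure_probs, raw_error_rate_fit):
    """Average FP over all fault-site pairs, scaled by the raw error rate."""
    average_fp = sum(pair_failure_probs) / len(pair_failure_probs)
    return average_fp * raw_error_rate_fit


# Hypothetical waveform at one output: erroneous with probability 0.5
# between t = 0 and t = 1, error-free afterwards.
epw = expected_pulse_width([(0.0, 0.25, 0.25), (1.0, 0.0, 0.0)])
lp = latching_probability(epw, t_hold=1.0, t_setup=1.0, t_clk=100.0)
print(epw, lp, failure_probability([lp, lp]))
```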

Fig. 6 shows an example which illustrates how we use the MEPP technique to propagate two transient glitches caused by an MET event from two fault sites to the reachable *POs* and *FFs*. For the sake of simplicity, we have used a simple delay model in this example (gate delays: OR=5, AND=5, XOR=8, and NOT=3). Assume an energetic particle strikes the circuit and produces an MET event resulting in two transient glitches with a width of one unit of time at gates *B* and *D*. Initially, two events at times 0 and 1 are put at the outputs of gates *B* and *D* as the fault sites according to Table I. These two event pairs are then propagated through the forward cones of gates B and D, i.e., logic gates E, F, H, I, and J, according to the propagation rules presented in Equation 1 through Equation 8. The error propagation probabilities for all logic gates in the forward cone of gates B and D are shown in Fig. 6. Once all timed events are propagated to the inputs of FFs and POs, the expected widths of the propagated waveforms at the inputs of all FFs and POs are computed according to Equation 9 and Equation 10 for the fault site pair, i.e., gates B and D. As can be seen, gate I is a convergent gate and gate J is a reconvergent gate in this example.

Fig. 6. A simple example of an MET event propagation using the proposed technique

```
Algorithm: SER estimation flow in Presence of METs
    dsgnet:  Design Netlist
    libcell: Technology Library Cells
    pce:     Particle Charge Energy

Read_Technology_Library_Cells(libcell);
Characterize_Library_Cells(libcell);
(w1, w2) = Calculated_MET_Pulse_Widths(pce, libcell);
gatelist = Extract_Netlist_Adjacency_List(dsgnet);
Compute_SP_MC_Simulation(gatelist);
for each gate G_i in gatelist do
    Fault_Site(G_i) ← Extract possible multiple fault sites of G_i
end
for each pair of (G_j, G_k) in Fault_Site do
    List(G_j) ← Extract & sort on-path gates reachable from G_j
    List(G_k) ← Extract & sort on-path gates reachable from G_k
    List(G_jk) ← Merge & sort List(G_j) and List(G_k)
    Event_List(G_j) ← Add_Event(P(1) = 0, P(1^e) = 1 - SP_{G_j},
                                P(0^e) = SP_{G_j}, P(0) = 0, time = t);
    Event_List(G_j) ← Add_Event(P(1) = SP_{G_j}, P(1^e) = 0,
                                P(0^e) = 0, P(0) = 1 - SP_{G_j}, time = t + w1);
    Event_List(G_k) ← Add_Event(P(1) = 0, P(1^e) = 1 - SP_{G_k},
                                P(0^e) = SP_{G_k}, P(0) = 0, time = t);
    Event_List(G_k) ← Add_Event(P(1) = SP_{G_k}, P(1^e) = 0,
                                P(0^e) = 0, P(0) = 1 - SP_{G_k}, time = t + w2);
    for each gate G_l in List(G_jk) do
        Propagate_Events(G_l);
        Compute_MET_Electrical_Masking(G_l);
    end
    for each primary output PO_m in gatelist do
        Compute LP_{PO_m}(G_j, G_k); // using Equation 11
    end
    for each flip-flop FF_n in gatelist do
        Compute LP_{FF_n}(G_j, G_k); // using Equation 12
    end
    Compute FP(G_j, G_k); // using Equation 13
end
```

**Algorithm 1**: MET SER Estimation

## III. EXPERIMENTAL RESULTS

Here we evaluate the accuracy and the runtime of the proposed technique as compared with a reference model. The reference model is a *Statistical Fault Injection* (SFI) approach using Monte-Carlo simulations. In the SFI approach, transient glitches with the given widths are injected at the outputs of each pair of fault sites at a random time during the clock period. Using timing-accurate simulations, the propagation of glitches to *POs* or *FFs* was recorded. SFI terminates when the accuracy of the estimated SER falls within a pre-defined confidence interval (in our experiments, the variance was 2% and the confidence level was 99%). Note that all three derating factors (logical, electrical, and timing derating) were incorporated in the SFI approach.
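A termination rule of this kind can be sketched with a normal-approximation confidence interval on the estimated failure probability. The sketch below is only one plausible reading and not the paper's procedure: it interprets the 2% figure as the allowed half-width of a 99% interval, which is an assumption, and `trial` is a hypothetical stand-in for one timing-accurate injection run that returns whether a failure was observed.

```python
import random
from math import sqrt


def monte_carlo_failure_probability(trial, half_width=0.02, z=2.576,
                                    min_trials=1000, max_trials=1_000_000):
    """Run trial() until the confidence interval is narrow enough.

    z = 2.576 is the two-sided 99% quantile of the standard normal
    distribution. min_trials guards against stopping on an early run of
    identical outcomes (an added safeguard of this sketch).
    Returns (estimated probability, number of trials used).
    """
    failures = 0
    for n in range(1, max_trials + 1):
        failures += bool(trial())
        if n >= min_trials:
            p = failures / n
            if z * sqrt(p * (1.0 - p) / n) <= half_width:
                return p, n
    return failures / max_trials, max_trials


# Stand-in experiment: a Bernoulli source with failure probability 0.3.
rng = random.Random(1)
estimate, used = monte_carlo_failure_probability(lambda: rng.random() < 0.3)
print(estimate, used)
```

Because the number of required trials grows with the inverse square of the half-width, and each trial is a full timing simulation, this kind of loop is what makes SFI so much slower than a single analytical pass.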

MEPP and SFI have been implemented and applied to ISCAS89 benchmark circuits. All simulations have been conducted on a cluster equipped with 32 Intel XEON© 5500 processors and 32GB main memory. In our experiments, we assumed that an MET event causes two different transient glitches at the output of two adjacent gates. Nevertheless, as mentioned in Sec. II-A, the proposed technique is capable of accurately handling any number of multiple transients.

Fig. 7 shows the accuracy of the proposed technique versus the SFI approach for double transient glitches with the duration of 80ps. Here, the *inaccuracy* is defined as the difference between the failure probability values obtained by MEPP and SFI. The difference between the failure probabilities obtained by MEPP and the SFI approach is, on average, less than 0.025. One source of inaccuracy is that the effect of signal cross-correlations is not considered in signal probabilities in presence of errors. However, the experimental results confirm that neglecting signal probability cross-correlations has a negligible impact on the overall accuracy of MEPP.

The run-times for SFI, MEPP, and SP computation are shown in Fig. 8. Note that the Y-axis is logarithmic. As shown in this figure, MEPP is about four orders of magnitude faster than SFI. SFI was completely intractable for larger circuits. For example, the SER estimation of the largest ISCAS'89 circuits (e.g., *s38417*) using SFI was not completed even in 30 days of runtime (not shown in Fig. 8), while the runtime of MEPP for the largest ISCAS'89 circuits is less than a few minutes.

Fig. 7. Accuracy of MEPP vs. SFI for transient glitch widths of 80ps

Fig. 8. Runtimes of MEPP, SP computation, and SFI

Fig. 9 shows the contribution of the four categories of fault sites (discussed in Section II-B) to the overall SER of a circuit. As can be seen, the cases in which a gate and its fan-outs (GFOs) or fan-ins (GFIs) are selected as the fault sites contribute the most to the overall SER of the circuits (more than 92%). This is because, when a GFO or a GFI is selected as a fault site, the two produced glitches propagate through the same path with a delay difference equal to the gate delay. Intuitively, the two glitches based on GFI or GFO have less of a cancellation effect on each other than those generated by common fan-ins (CFIs) or common fan-outs (CFOs) of a gate. The conclusion from this analysis is that considering only GFIs and GFOs as possible multiple error sites for MET analysis yields an almost 70% reduction in the number of candidate multiple error sites (and in the overall runtime) with minimal impact on accuracy.

## IV. CONCLUSIONS

In this paper, we proposed a very fast and accurate analytical technique for SER estimation of digital circuits in the presence of METs. The proposed technique has three main features: 1) it does not require fault injections or logic simulations; 2) all three masking factors are considered; 3) the effects of multiple error propagation in convergent paths as well as single error propagation through reconvergent paths are accurately modeled. The simulation results, through comparison with statistical fault injection, confirm the accuracy (2.5% difference) and speed-up (four orders of magnitude) of the proposed technique.

Fig. 9. The contribution of each fault site in the overall circuit SER

## REFERENCES

- [1] P. Reviriego, J. A. Maestro, and C. Cervantes, "Reliability analysis of memories suffering multiple bit upsets," *IEEE Transactions on Device and Materials Reliability*, vol. 7, no. 4, December 2007.
- [2] D. Giot, P. Roche, G. Gasiot, and R. Harboe-Sørensen, "Multiple-bit upset analysis in 90 nm srams: Heavy ions testing and 3d simulations," *IEEE Transactions on Nuclear Science*, vol. 54, no. 4, August 2007.
- [3] J. A. Maestro and P. Reviriego, "Study of the effects of mbus on the reliability of a 150 nm sram device," in *Proceedings of the International Design Automation Conference (DAC)*, June 2008, pp. 930–935.
- [4] D. Falguère and S. Petit, "A statistical method to extract mbu without scrambling information," *IEEE Transactions on Nuclear Science*, vol. 54, no. 4, August 2007.
- [5] R. Rao, K. Chopra, D. Blaauw, and D. Sylvester, "Computing the soft error rate of a combinational logic circuit using parameterized descriptors," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 26, no. 3, pp. 468–479, March 2007.
- [6] N. Seifert and N. Tam, "Timing vulnerability factors of sequentials," *IEEE Transactions on Device and Materials Reliability*, vol. 4, no. 3, pp. 516–522, September 2004.
- [7] Q. Zhou and K. Mohanram, "Cost-effective radiation hardening technique for combinational logic," in *Proceedings of the IEEE International Conference on Computer Aided Design (ICCAD)*, November 2004, pp. 100–106.
- [8] N. Miskov-Zivanov and D. Marculescu, "Circuit reliability analysis using symbolic techniques," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 25, no. 12, pp. 2638–2649, December 2006.
- [9] B. Zhang, W. Wang, and M. Orshansky, "Faser: Fast analysis of soft error susceptibility for cell-based designs," in *Proceedings of the 7th International Symposium on Quality Electronic Design (ISQED)*, March 2006.
- [10] C. Rusu, A. Bougerol, L. Anghel, C. Weulerse, N. Buard, S. Benhammadi, N. Renaud, G. Hubert, F. Wrobel, and R. Gaillard, "Multiple event transient induced by nuclear reactions in cmos logic cells," in *Proceedings of International On-Line Testing Symposium (IOLTS)*, July 2007, pp. 137–145.
- [11] R. C. Martin and N. M. Ghoniem, "The size effect of ion charge tracks on single-event multiple-bit upset," *IEEE Transactions on Nuclear Science*, vol. 34, no. 6, December 1987.
- [12] D. Rossi, M. Omana, F. Toma, and C. Metra, "Multiple transient faults in logic: An issue for next generation ics?" in *Proceedings of International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT)*, October 2005, pp. 352–360.
- [13] N. Miskov-Zivanov and D. Marculescu, "A systematic approach to modeling and analysis of transient faults in logic circuits," in *Proceedings of 10th Int'l Symposium on Quality Electronic Design (ISQED'09)*, February 2009, pp. 408–413.
- [14] R. Bryant, "Graph-based algorithms for boolean function manipulation," *IEEE Transactions on Computers*, vol. 35, pp. 677–691, 1986.
- [15] H. Asadi and M. B. Tahoori, "Soft error derating computation in sequential circuits," in *Proceedings of the IEEE International Conference on Computer Aided Design (ICCAD)*, November 2006.
- [16] F. Wrobel, J.-M. Palau, M.-C. Calvet, O. Bersillon, and H. Duarte, "Simulation of nucleon-induced nuclear reactions in a simplified sram structure: Scaling effects on seu and mbu cross sections," *IEEE Transactions on Nuclear Science*, vol. 48, no. 6, December 2001.
- [17] M. Fazeli, S. Miremadi, H. Asadi, and S. Ahmadian, "A fast and accurate multi-cycle soft error rate estimation approach to resilient embedded systems design," in *Proceedings of the International Conference on Dependable Systems and Networks (DSN)*, June-July 2010.
- [18] R. Rajaraman, J. Kim, N. Vijaykrishnan, Y. Xie, and M. Irwin, "Seat-la: A soft error analysis tool for combinational logic," in *Proceedings of the 19th International Conference on VLSI Design*, January 2006.