# Clock-Gating-Aware Low Launch WSA Test Pattern Generation for At-Speed Testing

Yi-Tsung Lin<sup>1</sup>, Jiun-Lang Huang<sup>1</sup>, and Xiaoqing Wen<sup>2</sup>

<sup>1</sup>Graduate Institute of Electronics Engineering
National Taiwan University, Taipei 106, Taiwan

<sup>2</sup>Department of Computer Science and Electronics
Kyushu Institute of Technology, Iizuka 820-8502, Japan
email: jlhuang@ntu.edu.tw

Abstract—Capture power management has become a necessity to avoid at-speed testing yield loss, especially for modern complex and low power designs. This paper proposes a test pattern generation methodology that utilizes the available clock-gating mechanism, a popular low power design technique, to reduce the capture cycle weighted switching activity (WSA) for at-speed testing. Compared to previous techniques that consider clock-gating, a very significant test power reduction is achieved without severe test pattern inflation.

Keywords-test pattern generation, clock-gating, test power reduction, at-speed testing

#### I. INTRODUCTION

It is known that the signal switching activity during manufacturing testing can be more than twice that during the functional mode [8]. The excessive switching activity may damage the circuit under test (CUT) or cause the CUT to malfunction during test application, which leads to test-incurred yield loss.

This work concerns the negative impact of excessive switching activity on the capture cycles during at-speed testing. At-speed testing in general utilizes the two-pattern approach: the first pattern sets the circuit state and the second pattern activates the desired transition at the fault site. The fault is detected if the transition fails to propagate to the target flip-flop(s) within the functional clock period. Figure 1 depicts the timing diagram of the launch-on-capture (LOC) at-speed scan testing scheme. The interval between the rising edges of the two capture cycles, $C_1$ and $C_2$, corresponds to the functional clock cycle, called the launch cycle hereafter. If the transition launched at $C_1$ does not propagate to the target flip-flop(s) before $C_2$, the chip under test is classified as faulty.

# A. Yield Loss due to Excessive Launch-Cycle IR-Drop

The LOC scheme suffers yield loss caused by the power supply noise in the launch cycle. If the power network synthesis process fails to consider the excessive switching activity during test application, the power network IR-drop during the launch cycle may be so large that the resulting extra delay [15] causes a good CUT to malfunction and fail the test. [7], [5] reported that, in a 130 nm ASIC design running at 150 MHz clock frequency, some circuits pass the transition fault test only if the supply voltage is above 1.55 V; otherwise, they fail.

Fig. 1. The launch-on-capture (LOC) at-speed testing scheme.

Assessing the possible yield loss associated with a test pattern due to excessive switching activity is non-trivial, if possible at all. First, the whole chip IR-drop profile with respect to a test pattern depends on the spatial and temporal distribution of the switching activity as well as the power grid structure [13], [14]. Second, even if the spatial and temporal IR-drop profile is available, deriving the resulting path delay is still difficult.

As a tradeoff between computation efficiency and estimation accuracy, this work utilizes the launch cycle weighted switching activity (WSA), called *launch WSA* hereafter for convenience, associated with a test pattern to assess its potential of causing yield loss. WSA in a clock cycle is defined as follows.

$$WSA = \sum_{i} s_i \times w_i \tag{1}$$

where

$$s_i = \begin{cases} 1 & \text{if signal } i \text{ switches} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

and $w_i$, the weight of signal $i$, equals $i$'s fanout size plus one.
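As an illustration of Eq. (1) and (2), the following minimal sketch computes the WSA of one clock cycle from the signal values before and after the clock edge. The function name, the dictionary-based netlist representation, and the three-signal example are assumptions made for this illustration; they are not part of the original tool flow.

```python
from typing import Mapping


def weighted_switching_activity(
    before: Mapping[str, int],
    after: Mapping[str, int],
    fanout: Mapping[str, int],
) -> int:
    """Return the WSA of one clock cycle following Eq. (1) and (2)."""
    wsa = 0
    for signal, old_value in before.items():
        s_i = 1 if after[signal] != old_value else 0  # Eq. (2)
        w_i = fanout[signal] + 1  # weight = fanout size plus one
        wsa += s_i * w_i  # one term of the sum in Eq. (1)
    return wsa


# Hypothetical three-signal example: "a" and "c" switch, "b" does not.
before = {"a": 0, "b": 1, "c": 0}
after = {"a": 1, "b": 1, "c": 1}
fanout = {"a": 2, "b": 3, "c": 0}
launch_wsa = weighted_switching_activity(before, after, fanout)
print(launch_wsa)  # 4 = (2 + 1) + (0 + 1)
```

In this example, signal "b" contributes nothing because it does not switch, even though it has the largest fanout.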

#### B. Related Works

Many launch WSA reduction techniques have been proposed. They can be roughly categorized into three classes: X-filling, power-aware ATPG, and partial capture.

- 1) X-Filling: Given a set of partially specified test patterns, X-filling techniques fill the X bits to minimize the difference between the two at-speed patterns; this reduces the launch WSA [6], [12], [11]. If the given test set is fully specified, test relaxation techniques [2], [4] can be applied to uncover the X bits in the test vectors without degrading fault coverage. X-filling techniques incur neither test inflation nor circuit modification; however, the achievable test power reduction depends on the original test set.
- 2) Power-Aware ATPG: A power-aware ATPG integrates the launch WSA constraint into its decision-making mechanism [10]. Its advantage over X-filling is that it explores a larger search space and has the potential of finding the optimal solution. In general, power-aware ATPGs are effective in lowering launch WSA but often cause high test inflation. Note that, in a power-aware ATPG, the random-fill stage prior to fault dropping is often replaced with low launch WSA X-filling.
- 3) Partial Capture: Partial capture techniques reduce launch WSA by capturing only a fraction of the test response at a time [9]. These approaches often require circuit modification to enhance fault coverage and lower test inflation.

Recently, techniques that utilize gated clocks to facilitate launch WSA reduction have been proposed [3], [1]. The two-stage CTX scheme presented in [3] belongs to the X-filling category. In stage one, CTX aims to deactivate as many clock control signals as possible; this reduces the number of flip-flops that capture the test response. In stage two, CTX reduces the number of flip-flops that have signal transitions during the launch cycle. CTX incurs no test set inflation because it only modifies the given fully specified test patterns. Without fault coverage loss, CTX achieves around 30% test power reduction. The drawback is the long fault simulation time needed to retain fault coverage. Furthermore, the solution space is limited by the given test set.

The technique in [1] is a power-aware ATPG approach. It associates with each clock control signal a default pattern, which is a cube that contains the care bits required to deactivate the clock control signal. During test generation, ATPG merges the default pattern with the test pattern whenever possible.

#### C. The Proposed Low Launch WSA TPG Methodology

This paper presents a test pattern generation (TPG) methodology that utilizes the clock-gating mechanism to reduce launch WSA. To improve launch WSA reduction without incurring too much test inflation, the proposed technique introduces two test generation stages.

- 1) Gated-Clock-Intact TPG: In this TPG stage, faults are detected without activating or deactivating any clock control signals. The idea is to detect as many faults as possible before using the clock-gating mechanism to reduce launch WSA, which tends to cause test inflation.
- 2) FF-Activation Reluctant TPG: This stage prefers faults whose fault effects can be captured in already activated flip-flops. During test generation, clock controls are enabled only when this is necessary to detect the target fault. The goal is to detect faults without unnecessarily activating new flip-flops.

The proposed technique then utilizes X-filling techniques to (1) deactivate as many clock controls as possible, and (2) reduce the number of flip-flop transitions during the launch cycle.

## D. Contributions

The main contributions of this work are as follows.

- It introduces the gated-clock-intact TPG stage. By leaving all clock controls unspecified, this stage lowers test inflation while helping retain the launch WSA reduction quality.
- It proposes to use the FF-activation reluctant TPG strategy to detect faults with as few flip-flops activated as possible. This significantly improves the launch WSA reduction.

Experimental results on the larger ITC'99 and IWLS'05 benchmark circuits show that the proposed technique outperforms previous techniques in launch WSA reduction without incurring severe test inflation.

## E. Paper Organization

The paper organization is as follows. Section II gives the necessary background of this work. Section III and Section IV describe the basic and enhanced flows of the proposed methodology, respectively, and present the experimental results. Finally, Section V concludes this work.

## II. PRELIMINARIES

## A. Clock-Gating

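This section defines the terms used in the rest of the paper. An *active* flip-flop is a flip-flop whose clock is enabled; a *deactivated* flip-flop is a flip-flop whose clock is disabled. A *clock control* is the signal that enables or disables a gated clock, and the flip-flops that share the same clock control form a *group*. Enabling (disabling) a clock control activates (deactivates) all the flip-flops in its group.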

# B. Low Capture Power X-Filling

#### C. TPG Model in the Presence of Gated Clock

#### III. BASIC LOW LAUNCH WSA TPG METHODOLOGY

To better understand the proposed TPG methodology, this section describes a simplified version, called the "basic flow."



Fig. 2. The basic flow.

The basic flow consists of the proposed "FF-activation reluctant TPG" stage and the following X-filling stages.

The basic flow is designed to utilize the clock-gating mechanism to reduce the launch WSA without paying too much attention to the test inflation issue. As the experimental results show, it achieves significant launch WSA reduction but suffers test inflation.

# A. Overview of the Basic Flow

Figure 2 illustrates the basic flow. It starts with the "FF-Activation Reluctant TPG (FAR-TPG)" stage which intends to detect as many faults as possible while at the same time activating as few flip-flops as possible. Note that FAR-TPG limits activation but not deactivation of flip-flops because the latter helps reduce launch WSA.

The remaining unspecified clock controls are processed in the "FF-Deactivation" and "FF-Silencing" stages. The idea is similar to CTX [3]—the former justifies as many clock controls to zero as possible; the latter minimizes flip-flop transitions.

In each iteration, the basic flow generates a fully specified test pattern. It then performs fault dropping to detect more faults.

# B. FF-Activation Reluctant TPG

This is the main test generation procedure of the basic flow; it aims to detect as many faults as possible without enabling too many flip-flops.

Details of the FF-activation reluctant TPG (FAR-TPG) are shown in Figure 3. First, a primary fault is selected and then targeted by a regular TPG; this may activate or deactivate some clock control signals. If the percentage of activated flip-flops has reached a preset threshold, k%, the flow exits FAR-TPG.

To limit the number of activated flip-flops, FAR-TPG selects the secondary faults from the fanin cones of currently activated flip-flops and primary outputs (POs) because detecting other faults is unlikely without activating more flip-flops. Secondary faults are targeted by the "FF-activationless TPG." The FF-activationless TPG tries to propagate the fault effect(s) to activated flip-flops without activating any more clock controls. If FF-activationless TPG succeeds, the (inner) loop is repeated; otherwise, this fault is targeted by the regular TPG, which will activate some clock control signal(s).

Fig. 3. The basic flow details.

Note that FAR-TPG sets no constraint on flip-flop deactivation.
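The control structure of FAR-TPG described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the three callbacks (`regular_tpg`, `activationless_tpg`, `in_active_cone`) are hypothetical stand-ins for the actual test generators and cone analysis, the toy model treats every flip-flop as its own clock-gating group, and flip-flop deactivation is not modeled because FAR-TPG does not constrain it.

```python
from typing import Callable, List, Optional, Set, Tuple


def far_tpg(
    primary: str,
    candidates: List[str],
    regular_tpg: Callable[[str, Set[str]], Optional[Set[str]]],
    activationless_tpg: Callable[[str, Set[str]], bool],
    in_active_cone: Callable[[str, Set[str]], bool],
    num_ffs: int,
    k_percent: float,
) -> Tuple[List[str], Set[str]]:
    """Return (detected faults, activated flip-flops) for one test cube."""
    active: Set[str] = set()
    detected: List[str] = []

    # The primary fault is targeted by the regular TPG, which may activate flip-flops.
    newly_active = regular_tpg(primary, active)
    if newly_active is None:  # primary fault could not be detected
        return detected, active
    active |= newly_active
    detected.append(primary)

    remaining = [f for f in candidates if f != primary]
    # Exit once the share of activated flip-flops reaches the threshold k%.
    while 100.0 * len(active) / num_ffs < k_percent:
        # Secondary faults come from the fanin cones of activated flip-flops.
        secondary = next((f for f in remaining if in_active_cone(f, active)), None)
        if secondary is None:
            break
        remaining.remove(secondary)
        if activationless_tpg(secondary, active):
            detected.append(secondary)  # detected without new activation
            continue
        # Fall back to the regular TPG, which may activate more flip-flops.
        newly_active = regular_tpg(secondary, active)
        if newly_active is not None:
            active |= newly_active
            detected.append(secondary)
    return detected, active


# Toy model (hypothetical data): flip-flops that can observe each fault.
observers = {"f1": {"ff0"}, "f2": {"ff0"}, "f3": {"ff0", "ff1"}, "f4": {"ff2"}}
conflicting = {"f3"}  # faults that cannot reach an already active flip-flop


def toy_in_cone(fault: str, active: Set[str]) -> bool:
    return bool(observers[fault] & active)


def toy_activationless(fault: str, active: Set[str]) -> bool:
    return bool(observers[fault] & active) and fault not in conflicting


def toy_regular(fault: str, active: Set[str]) -> Optional[Set[str]]:
    inactive = sorted(observers[fault] - active)
    if inactive:
        return {inactive[0]}  # activate one more observing flip-flop
    return None if fault in conflicting else set()


detected, active = far_tpg(
    "f1", ["f1", "f2", "f3", "f4"],
    toy_regular, toy_activationless, toy_in_cone,
    num_ffs=4, k_percent=75.0,
)
print(detected, sorted(active))  # ['f1', 'f2', 'f3'] ['ff0', 'ff1']
```

In the toy run, fault f2 is detected without any new activation, f3 forces the activation of ff1, and f4 is left for a later pattern because it lies outside the fanin cones of the activated flip-flops.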

# C. FF-Deactivation

This stage (see Figure 3 for details) deactivates as many flip-flops as possible to boost launch WSA reduction.

In each iteration, the largest unspecified group $G$ is identified. Then, its clock control signal $EN\_G$ is justified to zero. This process continues until there are no more untried unspecified groups.
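The greedy loop of this stage can be sketched as below. The sketch assumes a hypothetical callback `justify_to_zero` that attempts to justify the clock control of a group to zero on the current test cube and reports success; the group names and sizes are made up for illustration.

```python
from typing import Callable, Dict, List, Set


def ff_deactivation(
    group_sizes: Dict[str, int],
    unspecified: Set[str],
    justify_to_zero: Callable[[str], bool],
) -> List[str]:
    """Try the unspecified groups from largest to smallest; return those deactivated."""
    deactivated: List[str] = []
    untried = set(unspecified)
    while untried:
        # Pick the largest untried group (ties broken by name for determinism).
        group = max(untried, key=lambda name: (group_sizes[name], name))
        untried.remove(group)
        if justify_to_zero(group):  # clock control of the group justified to 0
            deactivated.append(group)
    return deactivated


# Hypothetical example: justification fails for G3 only.
group_sizes = {"G1": 32, "G2": 8, "G3": 16}
result = ff_deactivation(group_sizes, {"G1", "G2", "G3"}, lambda g: g != "G3")
print(result)  # ['G1', 'G2']
```

Trying the largest group first reflects the intent of deactivating as many flip-flops as possible; a real implementation would also have to track the care bits that each successful justification adds to the test cube.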

# D. FF-Silencing

This step applies JP-fill [11] to the partially specified test cube to reduce the number of transition flip-flops during the launch cycle.

# E. Basic Flow Experimental Results

The benchmark circuits include the bigger ones from ITC'99 and IWLS'05. Commercial tools are utilized to synthesize clock gating circuitry. There is a fine/coarse grain option for gated clock insertion. The former is much faster in terms of synthesis time.

1) Benchmark Circuit Statistics: Table I lists the benchmark circuits. The ".fine" and ".coarse" extensions denote the fine and coarse grain options, respectively. For the ITC'99 benchmark circuits, the fine and coarse grain options return the same result. For the IWLS'05 benchmark circuits, both the fine and coarse grain options are applied.

TABLE I
BENCHMARK CIRCUIT STATISTICS

|                | # gate  | # FF   | Gated-FF % | # group | avg grp<br>size |
|----------------|---------|--------|------------|---------|-----------------|
| b15.fine       | 21,434  | 415    | 100        | 31      | 13.38           |
| b17.fine       | 22,908  | 415    | 100        | 31      | 13.38           |
| b21.fine       | 15,362  | 214    | 99.06      | 8       | 26.50           |
| netcard        | 280,323 | 11,873 | -          | -       | -               |
| netcard.coarse | 282,461 | 12,224 | 20.23      | 256     | 9.66            |
| netcard.fine   | 281,983 | 12,040 | 95.00      | 1024    | 11.17           |
| leon3mp        | 202,985 | 15,073 | -          | -       | -               |
| leon3mp.coarse | 203,744 | 15,435 | 33.50      | 512     | 10.10           |
| leon3mp.fine   | 203,431 | 15,649 | 97.48      | 1024    | 14.90           |

Columns 2 and 3 list the numbers of gates and flip-flops, respectively. Clock-gating synthesis increases both the gate and flip-flop counts. Column 4 is the percentage of flip-flops controlled by gated clocks. With the fine grain option, this percentage exceeds 95%; with the coarse grain option, it ranges from 20 to 35%. Column 5 lists the number of clock-gating groups. The average number of flip-flops per group is listed in column 6. Not shown in the table, the maximum and minimum group sizes are 32 and 4, respectively, for all but the original circuits.
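The columns of Table I are related: the average group size should equal the number of gated flip-flops divided by the number of groups. The short check below recomputes column 6 from columns 3 to 5 under that assumption; the values are transcribed from Table I, and small deviations are expected because the listed percentages are rounded.

```python
# (number of flip-flops, gated-FF %, number of groups, listed average group size)
table_one = {
    "b15.fine": (415, 100.0, 31, 13.38),
    "b21.fine": (214, 99.06, 8, 26.50),
    "netcard.coarse": (12224, 20.23, 256, 9.66),
    "netcard.fine": (12040, 95.00, 1024, 11.17),
    "leon3mp.coarse": (15435, 33.50, 512, 10.10),
    "leon3mp.fine": (15649, 97.48, 1024, 14.90),
}

deviations = {}
for name, (num_ff, gated_pct, num_groups, listed_avg) in table_one.items():
    # Gated flip-flops divided by the number of clock-gating groups.
    derived_avg = num_ff * gated_pct / 100.0 / num_groups
    deviations[name] = abs(derived_avg - listed_avg)
    print(f"{name}: derived {derived_avg:.2f}, listed {listed_avg:.2f}")
```

For every circuit the derived value agrees with the listed one to within 0.02 flip-flops per group.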

- 2) Test Generation Results: Table II shows the test generation results. Four test generation methodologies are compared.
  - FAN-ATPG: This is the baseline ATPG without any launch WSA consideration.
  - CTX\*: This is implemented according to [3] for comparison; it uses the FAN-ATPG as the underlying test generation engine.
  - default pattern\*: This is implemented according to [1] for comparison; it uses the FAN-ATPG as the underlying test generation engine.
  - basic flow: This is the proposed basic flow.

For the baseline FAN-ATPG, the fault coverage, the pattern count, and the peak WSA are listed. For the other three methodologies, the fault coverage, the test inflation percentage, and the peak WSA as a percentage of the baseline are shown. (Test inflation for CTX\* is always 0 and not shown.) As the table shows, CTX\* reduces peak WSA by 25 to 31% without incurring any test inflation. "default pattern\*" raises the peak WSA reduction to between 38 and 60%; however, it also incurs significant test inflation (24 to 68%).

The basic flow further improves the peak WSA reduction to between 68 and 76%. In terms of test inflation, the result is unacceptable for the ITC'99 circuits: from 81 to 100%. Test inflation looks more reasonable (around 28%) for the IWLS'05 circuits, which is close to the default pattern\* results.
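Because Table II reports peak WSA as a percentage of the FAN-ATPG baseline, the peak WSA reduction of a methodology is 100 minus the listed pWSA value. The snippet below derives the reduction range of each methodology from the pWSA (%) columns of Table II; the variable and function names are chosen for this illustration only.

```python
# pWSA (%) columns of Table II, relative to the FAN-ATPG baseline.
pwsa_percent = {
    "CTX*": [74.8, 68.7, 72.3],
    "default pattern*": [50.8, 54.8, 62.5, 40.5, 50.9],
    "basic flow": [29.9, 24.2, 27.7, 28.2, 31.9],
}


def reduction_range(values):
    """Return the (min, max) peak WSA reduction in percent."""
    reductions = [round(100.0 - v, 1) for v in values]
    return min(reductions), max(reductions)


ranges = {method: reduction_range(v) for method, v in pwsa_percent.items()}
for method, (low, high) in ranges.items():
    print(f"{method}: {low} to {high} % reduction")
```

The derived ranges are 25.2 to 31.3% for CTX\*, 37.5 to 59.5% for default pattern\*, and 68.1 to 75.8% for the basic flow.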



Fig. 4. The proposed flow.

TABLE III
EXPERIMENTAL RESULTS OF THE ENHANCED FLOW

|                | basic flow       |             | enhanced flow |                  |             |
|----------------|------------------|-------------|---------------|------------------|-------------|
|                | inflation<br>(%) | pWSA<br>(%) | F.C.<br>(%)   | inflation<br>(%) | pWSA<br>(%) |
| b15.fine       | 81.9             | 29.9        | 85.97         | 35.9             | 29.4        |
| b17.fine       | 100.1            | 24.2        | 85.30         | 36.1             | 25.1        |
| b21.fine       | 94.4             | 27.7        | 81.12         | 12.2             | 27.5        |
| netcard.coarse | 28.5             | 56.6        | 99.99         | 28.5             | 56.6        |
| netcard.fine   | 27.7             | 28.2        | 99.99         | 8.6              | 28.4        |
| leon3mp.coarse | 29.0             | 57.6        | 99.99         | 29.0             | 57.6        |
| leon3mp.fine   | 28.5             | 31.9        | 99.99         | 14.5             | 31.8        |

#### IV. PROPOSED LOW LAUNCH WSA TPG METHODOLOGY

While the basic flow achieves very high peak launch WSA reduction, it sometimes incurs unacceptable test inflation. The reason is that the basic flow in Figure 2 pays little attention to test inflation management.

To alleviate the test inflation problem, the proposed low launch WSA TPG methodology (called the "enhanced flow" hereafter) introduces a new gated-clock-intact (GC-intact) TPG stage to the basic flow.

The GC-Intact TPG is depicted in Figure 4; it is also based on the dynamic compaction flow. In each iteration, it tries to detect as many faults as possible without activating or deactivating any clock control signal.

#### A. Experimental Results

The experimental results of the enhanced flow are shown in Table III. Test inflation and peak WSA results for the basic flow are also listed for ease of comparison.

Compared to the basic flow, the enhanced flow significantly reduces the test inflation of the fine grain circuits, from between 27.7 and 100.1% to between 8.6 and 36.1%; the results for the two coarse grain circuits are unchanged. At the same time, the peak launch WSA performance remains almost the same. This validates the effectiveness of the GC-intact TPG stage.


TABLE II
BASIC FLOW EXPERIMENTAL RESULTS

|              | FAN         |                  |         | CTX\*       |                  | default pattern\* |                  |             | basic flow  |                  |             |
|--------------|-------------|------------------|---------|-------------|------------------|-------------|------------------|-------------|-------------|------------------|-------------|
|              | F.C.<br>(%) | pattern<br>count | pWSA    | F.C.<br>(%) | pWSA<br>(%)      | F.C.<br>(%) | inflation<br>(%) | pWSA<br>(%) | F.C.<br>(%) | inflation<br>(%) | pWSA<br>(%) |
| b15.fine     | 85.39       | 576              | 20,543  | 84.76       | 74.8             | 85.63       | 68.1             | 50.8        | 85.97       | 81.9             | 29.9        |
| b17.fine     | 85.78       | 684              | 24,138  | 84.27       | 68.7             | 85.19       | 65.9             | 54.8        | 85.30       | 100.1            | 24.2        |
| b21.fine     | 81.94       | 484              | 19,264  | 81.63       | 72.3             | 81.83       | 54.3             | 62.5        | 81.12       | 94.4             | 27.7        |
| netcard.fine | 99.99       | 50,986           | 203,465 | -           | -                | 99.99       | 26.5             | 40.5        | 99.99       | 27.7             | 28.2        |
| leon3mp.fine | 99.99       | 31,497           | 237,461 | -           | -                | 99.99       | 24.0             | 50.9        | 99.99       | 28.5             | 31.9        |

#### V. CONCLUSION

This paper presented a low launch WSA test pattern generation methodology for at-speed testing. By introducing the "GC-intact" and "FF-activation reluctant" test pattern generation stages, the proposed methodology achieves very high launch WSA reduction with acceptable test inflation. Future work includes (1) CPU time improvement, and (2) extension to handle circuits with multiple clock domains.

#### REFERENCES

- [1] K. Chakravadhanula, V. Chickermane, B. Keller, P. Gallagher, and P. Narang. Capture power reduction using clock gating aware test generation. In *International Test Conference*, paper 4.3, 2009.
- [2] A. El-Maleh and K. Al-Utaibi. An efficient test relaxation technique for synchronous sequential circuits. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 23(6):933–940, June 2004.
- [3] H. Furukawa, X. Wen, K. Miyase, Y. Yamato, S. Kajihara, P. Girard, L.-T. Wang, and M. Tehranipoor. CTX: A clock-gating-based test relaxation and X-filling scheme for reducing yield loss risk in at-speed scan testing. In *Asian Test Symposium*, pages 397–402, 2008.
- [4] K. Miyase and S. Kajihara. XID: Don't care identification of test patterns for combinational circuits. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 23(2):321–326, February 2004.
- [5] S. Ravi. Power-aware test: Challenges and solutions. In *International Test Conference*, pages 1–10, 2007.
- [6] S. Remersaro, X. Lin, Z. Zhang, S. M. Reddy, I. Pomeranz, and J. Rajski. Preferred Fill: A scalable method to reduce capture power for scan based designs. In *International Test Conference*, paper 32.2, 2006.
- [7] J. Saxena, K. M. Butler, V. B. Jayaram, S. Kundu, N. V. Arvind, P. Sreeprakash, and M. Hachinger. A case study of IR-drop in structured at-speed testing. In *International Test Conference*, pages 1098–1104, 2003.
- [8] L.-T. Wang, C. E. Stroud, and N. A. Touba, editors. *System on Chip Test Architectures*. Morgan Kaufmann Publishers, 2008.
- [9] S. Wang and W. Wei. A technique to reduce peak current and average power dissipation in scan designs by limited capture. In *Asian and South Pacific Design Automation Conference*, pages 810–816, 2007.
- [10] X. Wen, S. Kajihara, K. Miyase, T. Suzuki, K. K. Saluja, L.-T. Wang, K. S. Abdel-Hafez, and K. Kinoshita. A new ATPG method for efficient capture power reduction during scan testing. In *VLSI Test Symposium*, pages 58–63, 2006.
- [11] X. Wen, K. Miyase, S. Kajihara, T. Suzuki, Y. Yamato, P. Girard, Y. Ohsumi, and L.-T. Wang. A novel scheme to reduce power supply noise for high-quality at-speed scan testing. In *International Test Conference*, paper 25.1, 2007.
- [12] X. Wen, K. Miyase, T. Suzuki, S. Kajihara, Y. Ohsumi, and K. K. Saluja. Critical-path-aware X-filling for effective IR-drop reduction in at-speed scan testing. In *Design Automation Conference*, pages 527–532, 2007.
- [13] M.-F. Wu, H.-C. Pan, J.-L. Huang, K.-H. Tsai, T.-H. Wang, and W.-T. Cheng. Improved weight assignment for logic switching activity during at-speed test pattern generation. In *Asian and South Pacific Design Automation Conference*, pages 493–498, 2010.
- [14] M.-F. Wu, K.-H. Tsai, W.-T. Cheng, H.-C. Pan, J.-L. Huang, and A. Kifli. A scalable quantitative measure of IR-drop effects for scan pattern generation. In *International Conference on Computer-Aided Design*, pages 162–167, 2010.
- [15] T. Yoshida and M. Watari. MD-Scan method for low power scan testing. In *Asian Test Symposium*, pages 80–85, 2002.