Cover Story
An Automated Method for Overlay Sample Plan Optimization
Xuemei Chen, Moshe E. Preil, KLA-Tencor Corporation
Mathilde Le Goff-Dussable, Mireille Maenhoudt, IMEC, Leuven, Belgium
In this paper, we present an automated method for selecting optimal overlay sampling plans based on a systematic evaluation of the spatial variation components of overlay errors, overlay prediction errors, sampling confidence, and yield loss due to inadequate sampling. Generalized nested ANOVA and clustering analysis are used to quantify the major components of overlay variation in terms of stepper-related systematic variances, systematic variances of residuals, and random variances at the wafer, field, and site levels. Analysis programs have been developed to automatically evaluate sampling plans with different numbers of fields and layouts, and to identify the optimum plan for effective excursion detection and stepper/scanner control. For each sample plan, the overlay prediction error relative to the full-wafer sample is calculated, and its sampling confidence is estimated using robust tests. The relative yield loss risk due to inadequate sampling is quantified and compared with the cost of sampling to determine a cost-optimal sampling plan. The methodology is applied to overlay data of CMP processed wafers. The different spatial variation characteristics of oxide and metal CMP processes are compared, and proper sampling strategies are recommended. The robustness of the recommended sample plans was validated over time. The sample plan optimization program successfully detected process change while maintaining accurate and robust stepper/scanner control.

Introduction
Yield Management Solutions, Fall 2001

Shrinking design rules and increasing process complexity have imposed tighter tolerances on overlay control. The number of transistors on a single wafer has increased by more than a factor of four due to increasing wafer size and shrinking feature sizes. In addition, the effects of process non-uniformity from deposition and polishing have become a significant part of the total overlay budget. As a result, accurate characterization and effective reduction of the variation components of overlay errors, especially spatial variation across a wafer, is essential to achieving the maximum net good dice per wafer [1], and hence yield. Adequate and cost-effective spatial sampling is therefore required to detect process excursions and provide a confident assessment of the systematic and random components of overlay errors for effective process control. With the increased number of data points of interest and growing process complexity, a systematic and automatic sampling optimization approach is necessary. In this paper, we describe an automated method for overlay spatial sampling plan optimization based on spatial variation analysis, overlay prediction error minimization, sample confidence tests, and yield modeling. The optimized sampling plan
achieves a balance between the following objectives in overlay control:

• It selects fields that minimize overlay prediction errors while maintaining adequate sampling confidence for lot disposition.

• It quantifies the major components of overlay variations in terms of variance components, stepper/scanner correction parameters, and spatial signatures of interfield residuals.

• It quantifies the impact of sample plans on yield risk and cost reduction.

• It is robust enough to detect process changes over time while maintaining accurate stepper/scanner control.

In the following sections, we present the strategies and analysis modules used to achieve the above goals, and validate the methodology with applications to overlay data of CMP processed wafers.

Overlay field selection strategy
The diagram in Figure 1 summarizes the inputs, analysis modules (with sub-modules), and outputs of the automated sample plan optimization program. Overlay data is collected using a KLA-Tencor 5xxx overlay metrology tool for three to five lots, five to seven wafers per lot, with every field measured at a given layer on a specific product from a stable process flow. Spatial variation analysis is applied to the full-wafer data to provide a comprehensive characterization of the overlay variance components and process signatures. Such decomposition of overlay errors into sources of variance provides guidelines for selecting fields that reduce overlay prediction errors and are least affected by process-induced nonlinear errors. The full-sample overlay measurements are then used as reference data
for evaluations of overlay prediction errors and sampling confidence for each sub-sampling plan, as specified in a text file. Finally, the yield modeling module estimates the risk and cost impacts of each sampling plan. The program iteratively applies these analysis modules to the sub-sample plans and identifies the optimal sample plan that achieves minimal overlay prediction errors, sufficient sampling confidence, and minimum yield loss. A summary chart is then generated, indicating the key metrics used in the optimization of sampling plans with different numbers of fields and spatial layouts.
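The core of this evaluation loop, fitting stepper/scanner correctibles from a sub-sample and comparing the resulting predictions against the full-wafer reference, can be sketched as follows. This is an illustrative reconstruction, not the actual program: the simple translation-plus-linear interfield model, the synthetic 9x9 field grid, and the function names are all assumptions.

```python
import numpy as np

def fit_correctibles(xy, errors):
    """Least-squares fit of a simple interfield model per axis:
    e = t + a*x + b*y (translation plus linear terms).
    A stand-in for the standard overlay correction models."""
    A = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    coef, *_ = np.linalg.lstsq(A, errors, rcond=None)
    return coef  # shape (3, 2): one column per overlay axis

def max_prediction_error(xy, errors, sample_idx):
    """Max overlay prediction error of a sample plan, estimated as
    mean + 3 sigma of the per-field difference between the
    full-wafer and sub-sample model predictions."""
    A = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    full_pred = A @ fit_correctibles(xy, errors)
    sub_pred = A @ fit_correctibles(xy[sample_idx], errors[sample_idx])
    d = np.linalg.norm(full_pred - sub_pred, axis=1)
    return d.mean() + 3.0 * d.std()

# Synthetic 9x9 field grid with a linear signature plus noise (toy numbers, nm)
rng = np.random.default_rng(0)
xy = np.array([(x, y) for x in range(-4, 5) for y in range(-4, 5)], float)
errors = 5.0 + 0.8 * xy + rng.normal(0.0, 1.0, (len(xy), 2))

corner_plan = [0, 8, 72, 80]  # a four-field plan: corner fields only
print(max_prediction_error(xy, errors, corner_plan) >= 0.0)  # prints True
```

A real implementation would use the standard overlay models in the KLA-Tencor analysis software; the point here is only the full-wafer-versus-sub-sample comparison and the mean-plus-three-sigma metric.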
Spatial variation analysis of overlay errors

As the major objectives of overlay sampling are excursion detection and variation reduction through proper stepper/scanner correction, a comprehensive understanding of the sources of variation in the baseline process is essential. Table 1 summarizes the typical sources of overlay variation from a physical point of view. As shown, overlay variation exhibits itself in several dimensions (systematic vs. random; spatial vs. temporal) at a number of different scales (lot-to-lot, wafer-to-wafer, field-to-field, and site-to-site). Proper decomposition of the measured variations into these meaningful components enables us to allocate the sampling and process control efforts more appropriately. Specifically, systematic variations can be reduced or compensated for by applying proper stepper/scanner matching and correction and by improving process uniformity, whereas random variations can be reduced by timely detection of excursions at the appropriate time-space scale and by reducing the sources of uncertainty accordingly. Building on these concepts, we have developed a "generalized nested ANOVA" model for overlay to effectively quantify the source components of overlay variation as tabulated in Table 1. Compared to conventional nested ANOVA, the generalized method is effective in decomposing what might otherwise be taken as random noise with large variance into separate systematic and random contributions at specific scales.

The spatial variation analysis module applies the generalized nested ANOVA to both the raw overlay data and the residuals after stepper/scanner correction. First, the total systematic and random components in the raw overlay data are separated at the site-to-site, field-to-field, and wafer-to-wafer levels. Then a spatial regression model (commonly known as the stepper correction model) is fitted to the raw data to remove the systematic variations due to stage and lens distortions in the exposure system. This results
Figure 1. Input/output structure and analysis modules of the sample plan optimization program.
Table 1. Decomposition of sources of overlay variations into time-space and systematic-random components at different scales.
in residuals that contain systematic variations induced by process nonuniformity, other systematic variations not accounted for by the regression model, and random variations. The generalized nested ANOVA is then applied to the residuals to assess the remaining systematic field-to-field and site-to-site variations; the former are characteristic of the process signatures, while the latter are indicative of the lens and reticle signatures. Combining the results of this two-step generalized ANOVA yields a complete decomposition of the spatial variations of overlay errors. An example is shown in Figure 2a. (As the data used in this example are from a single lot, no systematic wafer-to-wafer variance can be calculated in this case.) As indicated by the figure, after removal of the systematic stepper errors, a large portion of the systematic field-to-field variance remains, reflecting the spatial characteristics of the process layer, as shown by the vector plot of interfield residuals in Figure 2b. The process signatures are useful not only for process diagnosis, but also for selecting sample field locations that are least biased by nonlinear process effects, hence reducing the overlay prediction errors calculated in the prediction error evaluation module. Figures 2b and 2c illustrate a clustering analysis of the interfield residuals. In Figure 2c, the cumulative probability curve of the interfield residuals is plotted.
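The first-level idea, separating a repeatable (systematic) field signature from random field-to-field noise, can be illustrated with a minimal sketch. This is not the paper's generalized nested ANOVA, which also treats the wafer and site levels and the post-correction residuals; the array shapes, function name, and toy numbers are assumptions.

```python
import numpy as np

def field_level_decomposition(e):
    """e: array of shape (n_wafers, n_fields) holding one overlay component.
    Splits field-to-field variation into a systematic part (the per-field
    mean signature, repeatable across wafers) and a random part (what is
    left after removing that signature)."""
    field_mean = e.mean(axis=0)        # repeatable spatial signature
    systematic = field_mean.var()      # systematic field-to-field variance
    random_ = (e - field_mean).var()   # residual (random) variance
    return systematic, random_

rng = np.random.default_rng(1)
signature = np.linspace(-3.0, 3.0, 25)                  # fixed field signature (nm)
data = signature + rng.normal(0.0, 0.5, (5, 25))        # 5 wafers, 25 fields
sys_var, rnd_var = field_level_decomposition(data)
print(sys_var > rnd_var)   # the signature dominates the noise here: True
```

Applying the same split once to the raw data and once to the residuals of the stepper correction model reproduces the two-step structure described above.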
Figure 2a. Spatial variation decomposition of overlay data.
Figure 2b. Spatial signatures of interfield residuals.
The curve has three distinct slopes, indicating a multi-modal distribution of the interfield residuals. The transition points in the cumulative probability curve separate the fields into clusters, which form spatial zones in the vector plot, as indicated by the color codes in Figure 2b. Fields in zones 1 and 2 are less affected by nonlinear process effects, while fields in zone 3 are most affected by the process nonuniformity. Including fields from zone 3 would bias the estimates of stepper/scanner correctibles, and such fields should be avoided in a sampling plan that aims for minimum overlay prediction errors.

Figure 2c. Clustering analysis of interfield residuals.

Our analysis showed that in a stable process, field-to-field variation is the major variance component of overlay errors. As indicated in Figure 2a, it is significantly larger than the wafer-to-wafer and site-to-site variations. This forms the basis for focusing the overlay sampling optimization at the field-to-field level, i.e., determining the optimal number of fields and spatial layout for overlay sampling. However, if the wafer-to-wafer or lot-to-lot variations are significantly larger, as in an unstable process, it is necessary to understand the root cause and pattern of such variations and to focus the sampling efforts on reducing variations over time. Nevertheless, the spatial variation analysis method presented in this study can still be used in such situations to assess the variation components and the changes in the spatial signatures of the process, and remains a useful tool for process diagnostics.
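The zoning of fields by the break points of the cumulative residual-magnitude curve (Figure 2c) can be approximated with a short sketch. Here the zone boundaries are taken to be the largest gaps in the sorted magnitudes; this gap heuristic is an assumption standing in for the paper's actual clustering procedure.

```python
import numpy as np

def zone_fields(residual_mag, n_zones=3):
    """Assign each field to a zone (1 = smallest residuals) by cutting
    the sorted residual magnitudes at the n_zones-1 largest gaps,
    mimicking the slope breaks of the cumulative probability curve."""
    order = np.argsort(residual_mag)
    gaps = np.diff(residual_mag[order])
    # positions (in sorted order) of the largest gaps become zone boundaries
    breaks = np.sort(np.argsort(gaps)[-(n_zones - 1):]) + 1
    zones = np.empty(len(residual_mag), int)
    for z, chunk in enumerate(np.split(order, breaks)):
        zones[chunk] = z + 1
    return zones

# Toy magnitudes with three clear modes (nm)
mags = np.array([0.5, 0.6, 0.7, 2.0, 2.1, 2.2, 6.0, 6.5])
print(zone_fields(mags).tolist())  # -> [1, 1, 1, 2, 2, 2, 3, 3]
```

Fields landing in the highest zone would then be excluded from candidate sample plans, as recommended above.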
Overlay prediction errors evaluation

Ideally, the most accurate stepper correction would be obtained by sampling every field on the wafer; this, however, is not realistic. In this study, we seek the sub-sampling plans that best approximate the full-wafer-based correction. First, the full-wafer overlay data is modeled to produce a reference estimate of the true stepper/scanner correctibles. Subsets of the data are extracted to represent various sampling plans according to a sample plan specification. Each sub-sample data set is modeled to generate the sub-sample estimates of stepper/scanner correctibles (all of the modeling is done using standard overlay models contained in the KLA-Tencor overlay analysis software). These estimates are then used to predict overlay errors at every site on the wafer, and the difference between the predicted overlay errors based on the full-wafer and sub-sample model estimates is referred to as the overlay prediction error. The maximum overlay prediction error across a wafer is estimated by adding the mean prediction error and three sigma of the residuals based on each sample plan.

In Figure 3, the effects of sampling plans on overlay prediction errors are shown in a summary chart. In this chart, the maximum overlay prediction errors relative to the full-wafer sample are plotted for different sampling plans. The x-axis lists the different sampling plans evaluated, with the first point being the full-wafer sample, which has a prediction error of zero. From left to right, the maximum prediction errors for sample plans with increasing numbers of fields are plotted. For plans with the same number of fields, different field locations are also evaluated, and the layouts that yield the best and worst prediction errors are highlighted with their field location maps and other decision metrics. As can be seen, as the number of fields increases, the overlay prediction errors converge to that of the full-wafer sample. By increasing the number of fields from four to 12, the overlay prediction error can be reduced by more than half. In addition, there is larger variation in the prediction errors with respect to field locations for smaller numbers of fields than for larger numbers of fields. Even though it is possible to find a sample plan that gives small prediction errors with fewer fields, such a plan would be more susceptible to variations at the field locations used. As shown in later sections, such a plan may not meet the other criteria used in the sample plan optimization, and may have insufficient sampling confidence and robustness with respect to process change.

Figure 3. Summary chart of sampling plan optimization.

Besides the maximum overlay prediction errors, the summary chart in Figure 3 also indicates the other metrics used in the sample plan evaluation: p-values of robust tests and estimated yield loss. As discussed later in the paper, for the example data, a sampling plan with eight or more fields, including fields from the edges and center, would be recommended to achieve better than one percent relative yield loss at an overlay tolerance of 50 nm.

The effectiveness of variance reduction based on each sample plan can be assessed by examining the residuals resulting from the stepper/scanner correction. In Figure 4, we plot the three-sigma values of residuals across wafers for each sample plan. As shown, selecting fields that minimize the overlay prediction errors yields residual distributions comparable to the full-wafer fit. The overlay data used in this example exhibit higher variations in the Y direction than in the X direction. Optimal sampling plans should minimize the residual distributions in both directions; minimizing the magnitudes of the overlay vectors, as was done in this study, effectively achieves this. The total prediction errors of sampling plans can be attributed to errors in estimating individual stepper correction parameters, as shown in Figure 5.

Figure 4. Residuals resulting from sample plan optimization.

Sampling Confidence Tests

Lot disposition decisions are usually based on an evaluation of the sample overlay distributions. It is, therefore, important that the sample data be representative of the full-wafer overlay. In other words, the probability distributions of the sample data and the full-wafer data should not be significantly different at a desired confidence level. We use robust (non-parametric) tests to ensure that the optimal sample plan provides sufficient confidence for lot disposition. Median tests compare the centers of the sample and full-wafer distributions; dispersion tests compare the spreads of the two distributions. Neither test assumes a normal distribution for the data being compared, which makes them suitable for overlay data with substantial systematic variances. If the p-value of such a test is less than 0.05, the null hypothesis that the two samples are the same can be rejected at the 95 percent confidence level. An optimal plan should satisfy both tests. Example results of dispersion tests applied to different sampling plans are shown in Figure 6. At the 95 percent confidence level, any plan that falls below the horizontal line is unacceptable, meaning it has a significantly different probability distribution than the full-wafer data set. The p-values of the robust tests are also indicated in the summary chart shown in Figure 3, and are combined with the overlay prediction errors in the optimization program to select plans that accomplish the objectives of exposure tool control and lot disposition.

Yield implications of sampling plans
The impact of sampling plans on yield is twofold. On one hand, cost-optimal sampling plans that effectively detect variance excursions can reduce the material at risk (yield loss) and unnecessary rework (opportunity cost). On the other hand, adequate spatial sampling provides accurate characterization of the systematic variation components; hence it improves the feedback control of the processes and enhances yield. As discussed before, the sample plan optimization program selects fields that minimize the overlay prediction errors relative to full-wafer sampling. Overlay prediction errors due to inadequate sampling result in inadequate stepper correction and thus higher overlay errors. The yield loss due to inadequate spatial sampling of overlay can be estimated as in Figure 7a. Here we define net yield loss due to overlay as the average percentage of sites across a wafer that have overlay errors exceeding the design tolerance. Using a full-wafer overlay data set, we apply various sampling plans and calculate the stepper correction parameters based on each sampled data set. The cumulative probability function of overlay errors across wafers is then calculated after applying the stepper correction based on each sample plan. As shown in Figure 7a, for a given overlay tolerance, the stepper correction based on the full-wafer sample plan results in the lowest yield loss, whereas the overlay distribution without any stepper correction has the highest yield loss. Any sub-sampled plan (e.g., Sample Plan i) results in a net yield loss between these two bounds. The difference between the yield loss of the full-wafer sample plan and that of the sub-sample plan, denoted as relative yield loss in Figure 7a, is indicative of the yield loss due to inadequate sampling.

Figure 5. Overlay prediction errors attributed to errors in estimating individual correction coefficients based on each sample plan.

Figure 6. Dispersion tests of sampling plans.

Based on the above assumptions, we calculate the relative yield losses as a function of overlay sampling plans and tolerances. The relationships are shown in Figure 7b. As design rules shrink, the yield loss due to inadequate sampling increases significantly, and more sample fields are required to meet tighter overlay tolerances. In this example, for an RMS overlay tolerance of 85 nm or greater, all sample plans can achieve a relative yield loss of better than two percent. However, as the overlay tolerance shrinks, the difference in yield loss between full-wafer and sub-sample plans increases sharply. Only those sampling plans with 12 fields or more can achieve a relative yield loss of less than two percent at tighter overlay tolerances. Plans with fewer fields, for example four-field plans, result in insufficient stepper correction, and the resulting overlay errors can only meet an overlay tolerance of greater than 70 nm.
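The relative yield loss metric follows directly from its definition: the fraction of sites whose post-correction overlay magnitude exceeds the tolerance, compared between the full-wafer and sub-sample corrections. The sketch below assumes illustrative Gaussian residual distributions (the sub-sample correction leaving a slightly wider spread), not fab data.

```python
import numpy as np

def yield_loss(overlay_mag, tol):
    """Net yield loss: fraction of sites whose overlay error exceeds tol."""
    return float(np.mean(overlay_mag > tol))

def relative_yield_loss(resid_full, resid_sub, tol):
    """Extra yield loss incurred by correcting from a sub-sample instead
    of the full wafer; both inputs are post-correction overlay
    magnitudes at every site."""
    return yield_loss(resid_sub, tol) - yield_loss(resid_full, tol)

rng = np.random.default_rng(2)
# Hypothetical post-correction magnitudes (nm): the sub-sample plan
# leaves a somewhat wider residual distribution than the full-wafer fit.
full = np.abs(rng.normal(0.0, 15.0, 10_000))
sub = np.abs(rng.normal(0.0, 20.0, 10_000))
for tol in (50, 70, 85):  # tightening the tolerance widens the gap
    print(tol, round(relative_yield_loss(full, sub, tol), 4))
```

Sweeping the tolerance as in the loop reproduces the qualitative behavior of Figure 7b: at loose tolerances the plans are nearly indistinguishable, while at tight tolerances the sub-sample plan pays a growing yield penalty.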
Figure 7a. Model of yield loss due to inadequate overlay sampling.
With this yield model, we can also relate the estimated yield loss to overlay prediction errors, as shown in Figure 8. Yield loss increases exponentially as overlay prediction errors increase due to insufficient sampling. This implies that, for an overlay RMS tolerance of 50 nm, a sampling plan with a prediction error of no more than 10 nm is necessary to achieve a yield loss of less than one percent. This can only be achieved by sampling eight or more fields, as the summary chart in Figure 3 suggests. In this way, the yield loss reduction achieved by better sampling is quantified, and a cost-effective sampling plan can be identified by weighing the increased yield loss risk against the cost of increased sampling (using Figures 8 and 3). It is worth noting that a one-percent reduction in yield loss (material at risk) can translate into significant financial returns. For example, if we assume that a fab has 7,000 wafer starts per week, with a $5,000 value for each wafer, and a 40 nm overlay budget, then a one-percent yield loss reduction has a revenue potential of roughly $19 million per year. As design rules shrink, the yield benefits of effective sample planning can be much higher.

On the other hand, the overlay metrology cost of ownership (COO) is one of the lowest in a wafer fab, owing to low capital costs, low operational expenses, and high wafer throughput. At a cost per wafer pass of about $0.75, overlay is among the lowest fab expense items. In addition, the extra time to results (measurement and analysis) required to double the sampling size from 10 to 20 fields is less than two minutes per wafer. Once a wafer is in an overlay metrology system, the fab should sample enough to make confident overlay control decisions, with minimal yield loss impact due to inadequate sampling. With the automated, systematic approach developed in this study, we can quantify the various decision variables and optimize the sampling strategy to meet tighter design rules with lower yield loss risk.

Figure 7b. Impacts of sampling plans increase as overlay tolerance shrinks.

Figure 8. Estimated yield impact of overlay prediction error.

Fab results

Optimizing overlay sample plans for CMP processed layers

Using the analysis modules described above, we evaluated the sampling strategies for CMP processed wafers. Full-wafer overlay data was collected for the same product at several layers to compare the effects of oxide CMP and metal CMP. An ASML PAS5500/300 stepper was used for the experiment, and the ASM run model in the KLA-Tencor KLASS 4 software was used to estimate the stepper correction models. Spatial variation analysis and overlay prediction error evaluation results are shown in Figure 9. As can be inferred, the major differences between oxide CMP and metal CMP include:

• With metal CMP, the proportion of site-to-site variance relative to field-to-field variance is much higher than for the oxide CMP layer, and the variances have more random components than systematic components (whereas other layers show more systematic variances).

• Intrafield modeling errors for metal CMP are higher than for oxide CMP, and are more sensitive to sample field locations.

• Interfield residuals of metal CMP are more symmetrically distributed across the wafer, with radial variation, whereas oxide CMP exhibits a localized pattern.

• With fewer-field sample plans, there is larger variation in overlay prediction accuracy with respect to sampling plans for metal CMP than for oxide CMP processed wafers. Sampling plans with more fields are required for metal CMP processed wafers to achieve the same overlay prediction errors as oxide CMP.

Figure 9. Overlay sample plan optimization for oxide and tungsten CMP processed wafers.

This case study validated the benefits of the sample plan optimization approach in characterizing variation components and identifying sampling strategies based on spatial process characteristics.

Robustness of sampling plans

As the sample plan optimization method developed in this study is based on full-wafer measurements at one point in time, it is important to evaluate the robustness of the spatial sampling plans over time, in terms of stepper control accuracy and process change detection. We measured split lots over time to assess (1) whether the best sample plans with different numbers of fields identified initially could maintain small overlay prediction errors, and (2) whether the spatial variation analysis could properly detect any process change. In Figure 10, the standard deviations of the overlay prediction errors for split lots processed over a year are plotted for the best sample plans with various numbers of fields. As the results suggest, stepper corrections based on sample plans with fewer fields are more susceptible to being biased by process variations at the sampled field locations. The robustness of optimal sampling plans improves as the number of fields increases, and a minimum number of fields must be measured to assure the robustness of sample plans in the long term. Spatial variation analysis, as shown in Figure 11, indicates that the method is effective in detecting and characterizing process changes. Two lots, lot A and lot B, processed before and after a process improvement, were analyzed. The variance decomposition indicates significant reductions in the systematic variances of residuals and in the random variances, reflecting the effects of the process improvement. The change in systematic variances attributed to stepper errors is relatively small, which indicates a stable stepper control during
the experiment period. The spatial variation decomposition method developed in this study properly separates the systematic and random contributions from the stepper and other processes, and is a key building block for an effective sample plan strategy.

Conclusion
Through quantitative analyses and modeling, we demonstrated that more effective sample planning is a necessity for a fab to meet tighter design rules and achieve robust stepper control with reduced material at risk. We have developed an automatic and systematic approach to identify the optimal sample plan, with the proper number of fields and spatial layout, based on a comprehensive evaluation of spatial variation components, overlay prediction errors, sampling confidence, and relative yield loss due to inadequate sampling. The methodology proved to be effective and robust
over time in detecting process change and maintaining accurate stepper control.

References

1. W.H. Arnold and J. Greeneich, "Impact of Stepper Overlay on Advanced Design Rules," OCG Microlithography Seminar Proceedings, pp. 87-105, 1993.

2. R. Elliott, R.K. Nurani, D. Gudmundsson, M. Preil, R. Nasongkhla, and J.G. Shanthikumar, "Critical Dimension Sample Planning for sub-0.25 micron Processes," Proceedings of the Advanced Semiconductor Manufacturing Conference and Workshop, pp. 139-142, September 1999.

A version of this article was originally presented at the SPIE Conference, February 25 - March 2, 2001, Santa Clara, California, USA, as: Chen, X., Preil, M., Le Goff-Dussable, M., Maenhoudt, M., "An Automated Method for Overlay Sample Plan Optimization Based on Spatial Variation Modeling," Metrology, Inspection, and Process Control for Microlithography XV, SPIE 4344-31, 2001.
Figure 10. Robustness of overlay prediction errors of optimal sampling plans.

Figure 11. Spatial variation analyses of lots indicating process improvement.