Application of Maximum Entropy Method to Semiconductor Engineering

The maximum entropy method (MEM) is widely used in research fields such as linguistics, meteorology, physics, and chemistry. Recently, MEM application has become a subject of interest in the semiconductor engineering field, in which devices utilize very thin films composed of many materials. For thin film fabrication, it is essential to thoroughly understand atomic-scale structures, internal fixed charges, and bulk/interface traps, and many experimental techniques have been developed for evaluating these. However, the difficulty in interpreting the data they provide prevents the improvement of device fabrication processes. As a candidate for a very practical data analyzing technique, MEM is a promising approach to solve this problem. In this paper, we review the application of MEM to thin films used in semiconductor engineering. The method provides interesting and important information that cannot be obtained with conventional methods. This paper explains its theoretical background, important points for practical use, and application results.


Introduction
The maximum entropy method (MEM), which initially attracted attention from its use in astronomical image restoration, is now used in various research fields such as linguistics, meteorology, and data processing. It has also begun to be applied to spectroscopic data analysis in physics and chemistry. Its use in interpreting X-ray diffraction data has had a particularly significant impact. In this application area, Collins first used MEM to visualize electron density images [1]. Later, the method was widely used to extract very detailed structural information on many compounds, e.g., metallofullerene compounds [2][3][4]. It was also used to reveal hydrogen and oxygen positions [5,6] and even to determine very large protein structures [7]. It should be noted that such structural information cannot be obtained by using the conventional Fourier transform method. This is because this method needs infinite data length, while actual data is finite and furthermore contains noise. This results in ghost peaks and anomalous negative electron density. On the other hand, MEM is tolerant to noise and provides only positive electron density, which enables us to determine the precise position of light elements such as hydrogen. In other MEM application examples, it has been used to extract the decay constant distribution from fluorimetry [8] and to analyze X-ray scattering [9,10], neutron diffraction [11,12] and scattering [13,14]. In such applications it provides very detailed structural information. Recently, it was used to analyze X-ray reflectivity spectra [15]. Ueda et al. showed that MEM can determine film thicknesses more precisely than the conventional Fourier transform method. These studies prove the validity of MEM as a tool to solve the inverse problems that often prevent precise data interpretation.
The application of MEM to semiconductor engineering has also been a subject of great interest. Although not many reports have been issued in this regard, MEM is gradually becoming popular as a powerful analyzing tool. The most popular application in this field may be the extraction of the depth distribution of atoms inside ultrathin films [16][17][18][19][20]. In using MEM to analyze angle resolved X-ray photoelectron spectroscopy (ARXPS) data, Chang et al. reconstructed nitrogen depth distribution in nitrided silicon oxide films (SiON) and concluded that MEM enables nondestructive determination of depth distribution [19]. Moreover, MEM made it possible to determine even the chemical-state-resolved depth distribution [19,20]. This information cannot be obtained using other conventional methods such as medium energy ion scattering or secondary ion mass spectroscopy, which evidences the advantages of MEM. Since the equivalent oxide thickness for semiconductor devices is less than 1 nm, even subtle variance in chemical composition is important for understanding electrical characteristics. In addition, for implementing the new materials called high-k dielectrics, e.g., hafnium oxide, the depth distribution of atoms in the oxide becomes more crucial because a high-k oxide is likely to react with the underlying Si substrate, resulting in silicate layers being formed between the high-k oxide and Si [21]. Although the silicate layers are very thin (several nm at most), they significantly affect properties such as leakage current, traps, interface state, and fixed charges. Therefore, nondestructive and highly sensitive structural determination is becoming increasingly important. For this reason, MEM as an ARXPS data analyzing tool is certain to become more popular and important in the future.
On the other hand, there have been few reports on MEM usage other than for ARXPS data analysis [22][23][24][25]. Semiconductor device characteristics are affected not only by specific atom or chemical species distributions, but also by traps, interface states, and fixed charges present inside films and/or at interfaces between the gate insulator and substrate. For example, they can induce threshold voltage shifts, e.g., negative bias temperature instability and hot carrier injection, or leakage current, e.g., stress-induced leakage current and trap assisted tunneling, through carrier capture/emission events. These events prevent normal operation of devices and it is crucial to take them into account in device development. On the other hand, traps are utilized to accumulate carriers as memory in charge trap flash memory devices. Thus, they are linked more directly to device characteristics than depth distribution and detailed information about them is strongly required.
Several techniques are generally used to measure electrical characteristics, since these characteristics are too subtle to observe with the usual spectroscopic detection techniques such as XPS. These techniques include thermally stimulated current (TSC), isochronal annealing (IA), and thickness-dependent CV (TDCV). However, their data are difficult to interpret because the analysis amounts to an inverse problem that cannot be solved analytically.
Very recently, MEM was applied to TSC data to determine the density ($N_t$) and energy level ($E_t$) of traps in SiON thin films [22,23]. The results evidenced that MEM provides detailed $N_t(E_t)$ distribution and even its compositional dependence. It was also used to determine the distribution in the gate insulator in metal-oxide-semiconductor field effect transistors (MOSFETs) [24]. The obtained results indicate that more traps are present in smaller gate size MOSFETs, which implies that the gate edges are likely to be damaged during fabrication processes. These results are very informative for improving device fabrication processes and understanding trap physics. A scheme has also been developed for applying MEM to IA and TDCV data analysis to acquire information about fixed charges and interface states as well as traps [25].
This paper describes MEM application to semiconductor engineering techniques, e.g., ARXPS, TSC, IA, and TDCV. The method's theory, validity, and important points for practical use are explained. Application results obtained for simple metal-oxide-semiconductor (MOS) systems are also presented.

General Description of MEM
MEM is a method to reconstruct original data from discrete and finite experimental data sets with noise. For most engineers, the practical equations and the important points for using them may be more useful than the detailed theoretical background (Bayesian theory). Therefore, I explain them through a brief theoretical examination. The spectroscopic data of measurement techniques used in the semiconductor engineering field, such as ARXPS, TSC, IA, and TDCV, is expressed by the equation

$$I^{\mathrm{exp.}}_n = \sum_i M_{n,i}\, y_i \tag{1}$$

where $I^{\mathrm{exp.}}_n$, $M_{n,i}$, and $y_i$ are the measured data, a known function including experimental conditions, and the original data one wishes to obtain, respectively. Generally, $I^{\mathrm{exp.}}_n$ is discrete and finite, and includes noise. For an ARXPS example, $I^{\mathrm{exp.}}_n$, $M_{n,i}$, and $y_i$ correspond to the measured angle dependent signal intensity, a function including the X-ray incident angle and experimental conditions such as the sensitivity factor, and the depth distribution of atoms. Here, $n$ and $i$ index the data point and the distance from the surface of a sample. Since Equation (1) generally cannot be solved, an assumption is required to extract $y_i$, e.g., $y_i$ can be approximated by a linear function $y_i = a x_i + b$, where $a$ and $b$ are constants and $x_i$ is the depth from the surface. In this case, since $M_{n,i}$ is a known function, $a$ and $b$ can be determined from the curve fitting procedure, which provides the depth distribution $y_i$. This fitting method is widely used since it is a very easy and time-saving procedure; however, it requires prior knowledge about the depth distribution function, which is a great disadvantage since it is difficult to determine it in advance. Similarly, when $y_i$ is approximated by Gaussian functions, one must determine in advance how many peaks are involved. Thus, the conventional curve fitting method requires several assumptions, which means that an accurate $y_i$ cannot be obtained without the corresponding prior knowledge and that the obtained results are significantly dominated by the assumptions. Of course, since no prior knowledge regarding $y_i$ is usually given, it is also difficult to interpret the obtained results in many cases. Besides, the noise in measured data also prevents precise analysis because there are numerous model structures that reproduce experimental data containing noise. In principle, the number of such structures is infinite, and users have to rely upon their own experience or intuition to judge whether the reconstructed structures are precise, even when satisfactory fitting results are obtained.
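As a means to solve this problem, MEM has stimulated interest since it is a method to estimate the original data element $y_i$ from the discrete and finite data set $I^{\mathrm{exp.}}_n$ including noise. Note that MEM estimates rather than solves. A great advantage of MEM is that one requires no prior knowledge about $y_i$, i.e., it is model-free. In addition, MEM is tolerant to noise. I will here briefly describe the principle and the minimum set of equations for MEM. First, $y_i$ must satisfy the constraints [26]

$$\sum_i y_i = 1 \tag{2}$$

and

$$y_i > 0. \tag{3}$$

Furthermore, the functions

$$\chi^2 = \sum_n \frac{\left(I^{\mathrm{exp.}}_n - I^{\mathrm{MEM}}_n\right)^2}{\sigma_n^2} \tag{4}$$

and

$$S = -\sum_i y_i \ln\frac{y_i}{m_i} \tag{5}$$

are defined, where $\sigma_n^2$ is the measurement error of $I^{\mathrm{exp.}}_n$, $I^{\mathrm{MEM}}_n$ is the result calculated using Equation (1) and $y_i$, and $m_i$ is a default function that satisfies the constraints

$$\sum_i m_i = 1 \tag{6}$$

and

$$m_i > 0. \tag{7}$$

In addition, the free energy $F$ is given by

$$F = \frac{\chi^2}{2} - \alpha S \tag{8}$$

where $\alpha$ is the Lagrange parameter determined by the classical MEM condition. A practical calculation procedure is as follows.
(1) $\alpha$ is given.
(2) Calculate the $y_i$ that minimizes $F$ for the given $\alpha$.
(3) Calculate $\alpha'$ from the classical MEM condition, which involves the eigenvalues $\lambda_i$ of a matrix built from $y_i$ and the curvature of $\chi^2$.
(4) If $\alpha$ is equal to $\alpha'$, the calculated $y_i$ is the most probable $y_i$. If not, the calculations are repeated with another $\alpha$.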
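For orientation, the classical MEM condition used in step (3) is usually written in the MEM literature in the form below. This is the standard textbook form and is given here as an assumption; the normalization and the exact matrix used in this paper may differ.

$$\alpha' = -\frac{1}{2S}\sum_i \frac{\lambda_i}{\alpha + \lambda_i}, \qquad \Lambda_{ij} = \sqrt{y_i}\,\frac{\partial^2 (\chi^2/2)}{\partial y_i\,\partial y_j}\,\sqrt{y_j},$$

where $\lambda_i$ are the eigenvalues of $\Lambda$. The sum counts the number of "good" measurements, i.e., the directions in which the data constrain $y_i$ more strongly than the entropy does, and self-consistency is reached when $\alpha' = \alpha$.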
For determining $\alpha$, the historical condition [26] and optimization methods [20] are also available, although the former significantly overestimates the experimental data and the latter needs a long computing time. Although the validity of the classical MEM condition is still under debate, I have used it in this paper. When $\alpha$ is large, the entropy term $S$ is overestimated, which results in an oversmoothed $y_i$. Conversely, $S$ is underestimated for smaller $\alpha$. In particular, the MEM calculation with $\alpha = 0$ corresponds to the least-squares fitting method.
In principle, the default parameter $m_i$ in Equation (5) is arbitrary as long as it satisfies Equations (6) and (7), but the obtained results might depend somewhat on $m_i$ in some cases. Therefore, when MEM is used, $m_i$ has to be examined for each experiment. This point is very problematic for practical use and will be explained later.
One more problem is Equation (2). In many cases, $y_i$ is not a probability density function but a physical quantity, which means that Equation (2) is not satisfied. This suggests that MEM application requires a different approach depending on the experiment. I will next describe the approaches for the ARXPS, TSC, IA, and TDCV cases from this viewpoint.
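The fixed-$\alpha$ minimization at the core of the procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine used in this paper: it assumes the standard forms $\chi^2 = \sum_n (I^{\mathrm{exp.}}_n - I^{\mathrm{MEM}}_n)^2/\sigma^2$, $S = -\sum_i y_i \ln(y_i/m_i)$, and $F = \chi^2/2 - \alpha S$, uses a single $\sigma$ for all data points, and minimizes $F$ with an exponentiated-gradient update chosen here only for simplicity. The function names and the toy attenuation kernel are hypothetical.

```python
import math

def forward(M, y):
    """I_MEM[n] = sum_i M[n][i] * y[i], the linear model of Equation (1)."""
    return [sum(row[i] * y[i] for i in range(len(y))) for row in M]

def free_energy(M, I_exp, sigma, m, alpha, y):
    """F = chi2/2 - alpha*S with S = -sum y ln(y/m) (assumed standard forms)."""
    I_mem = forward(M, y)
    chi2 = sum((a - b) ** 2 for a, b in zip(I_exp, I_mem)) / sigma ** 2
    S = -sum(yi * math.log(yi / mi) for yi, mi in zip(y, m))
    return 0.5 * chi2 - alpha * S

def mem_fixed_alpha(M, I_exp, sigma, m, alpha, n_iter=3000):
    """Minimize F for a given alpha while keeping y positive and normalized."""
    K = len(m)
    # Step size from a bound on the curvature of chi2/2 plus that of -alpha*S.
    L = max(abs(sum(row[i] * row[j] for row in M))
            for i in range(K) for j in range(K)) / sigma ** 2
    eta = 1.0 / (L + alpha)
    y = list(m)  # start from the default function
    for _ in range(n_iter):
        I_mem = forward(M, y)
        resid = [(a - b) / sigma ** 2 for a, b in zip(I_mem, I_exp)]
        # Gradient of F; the constant "+alpha" term is dropped because it
        # cancels in the normalization below.
        grad = [sum(M[n][i] * resid[n] for n in range(len(M)))
                + alpha * math.log(y[i] / m[i]) for i in range(K)]
        w = [y[i] * math.exp(-eta * grad[i]) for i in range(K)]
        total = sum(w)
        y = [wi / total for wi in w]  # enforces sum(y) = 1 and y > 0
    return y

# Toy problem (all numbers are illustrative assumptions).
depths = [0.5 * i for i in range(10)]        # nm
angles = [10.0 * n for n in range(8)]        # degrees, 0 to 70
lam = 3.0                                    # nm, hypothetical attenuation length
M = [[math.exp(-t / (lam * math.cos(math.radians(a)))) for t in depths]
     for a in angles]
y_true = [0.02, 0.05, 0.20, 0.35, 0.20, 0.10, 0.04, 0.02, 0.01, 0.01]
I_exp = forward(M, y_true)                   # noise-free synthetic data
m = [0.1] * 10                               # uniform default function
sigma = 0.01
alpha = 1.0
y_mem = mem_fixed_alpha(M, I_exp, sigma, m, alpha)
```

Repeating the call with a much larger `alpha` keeps the result close to `m` (the oversmoothed limit), while a very small `alpha` approaches a constrained least-squares fit. A full calculation would wrap this routine in the search for the self-consistent $\alpha$.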

Approach for ARXPS
The depth distribution (atomic concentration) is included in ARXPS data.When one considers the structure in Figure 1, ARXPS signal intensity I A (θ) for atom A is expressed as [18] where K, S A , L A (α), and λ A are the constant, the relative sensitivity factor for the atom A, the asymmetry term depending on the angle α between X-ray and photoelectron emission direction, and the inelastic mean free path for A. θ corresponds to the angle between the surface normal and the photoelectron emission direction.y A (t) means the concentration of A at the depth t.From this, the apparent concentration at θ for A becomes The MEM equations are expressed as and y A (t) can be easily calculated since it satisfies the constraint (Equation ( 2)).The y A (t) that minimizes F is the most probable depth distribution, y A (t).
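The way a depth distribution maps onto angle-dependent ARXPS data can be illustrated with a short forward calculation. The sketch below assumes the usual exponential attenuation, $I_A(\theta) \propto S_A \int y_A(t)\,\exp[-t/(\lambda_A\cos\theta)]\,dt$, and assumes that the prefactors $K$ and $L_A(\alpha)$ are the same for all elements so that they cancel in the apparent concentration. The function name, the layer structure, and the attenuation lengths (taken in nm) are illustrative choices, not values or code from this paper.

```python
import math

def arxps_apparent_concentration(profiles, imfp, theta_deg, dt):
    """Apparent concentration of each element at emission angle theta.

    profiles: element -> list of concentrations per depth slice (surface first)
    imfp:     element -> inelastic mean free path (same length unit as dt)
    """
    cos_t = math.cos(math.radians(theta_deg))
    signal = {}
    for element, y in profiles.items():
        lam = imfp[element]
        # Sensitivity-corrected intensity: attenuated sum over slice midpoints.
        signal[element] = sum(
            y_k * math.exp(-(k + 0.5) * dt / (lam * cos_t)) * dt
            for k, y_k in enumerate(y)
        )
    total = sum(signal.values())
    return {element: s / total for element, s in signal.items()}

# Hypothetical structure: 4 nm SiO2 on Si, with 10% N in the last 1 nm of the film.
dt = 0.1                                   # nm per slice
profiles = {
    "O":  [2 / 3] * 30 + [2 / 3 - 0.1] * 10 + [0.0] * 200,
    "N":  [0.0] * 30 + [0.1] * 10 + [0.0] * 200,
    "Si": [1 / 3] * 40 + [1.0] * 200,
}
imfp = {"O": 2.8, "N": 3.15, "Si": 3.8}    # nm, illustrative values

c0 = arxps_apparent_concentration(profiles, imfp, 0.0, dt)
c70 = arxps_apparent_concentration(profiles, imfp, 70.0, dt)
```

At grazing emission the film dominates the signal, so the apparent O concentration rises with $\theta$ while the buried N contribution falls. This angle dependence is the information that MEM inverts to recover $y_A(t)$.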

Approach for TSC
TSC observes the emission of carriers captured by traps located inside thin films or near interfaces, and provides trap density-energy level distribution N t (E t ).In TSC experiments, traps are filled with carriers and then the sample is heated at a constant rate.When the thermal energy becomes equal to the trap energy level (E t ), the trapped carriers are emitted and the external current is observed.The trap density (N t ) and E t correspond to the current peak intensity and peak temperature.When a TSC peak originates from the trap with single E t , it can be easily analyzed by using a curve fitting procedure, while a peak composed of traps with several distributed E t is difficult to analyze without any assumptions regarding the number and/or distribution of E t .In this case, the model-free MEM is very useful.The basic TSC equation is [22,23] where q, ν, k, and β are the elemental charge, the attempt-to-frequency, the Boltzmann constant, and the heating rate, i.e., the absolute temperature T of a sample is T 0 + β × time.The aim of TSC is to determine N t (E t ).Equation ( 16) has the same shape as Equation ( 1); however, N t (E t ) is not a probability density function and does not satisfy Equation (2).In order to overcome this difficulty, I TSC (T ) and N t (E t ) must be normalized with the normalized constant P as the next equations [22,23] and Usually, P cannot be determined because P is the integrated value of the MEM calculation result.However, P in TSC can be easily obtained prior to the MEM calculations because P in Equation ( 18) is equal to the amount of all trapped carriers as This makes it easy to apply MEM to TSC.The basic MEM equations can be expressed as and Once N t (E t ) is determined, N t (E t ) can be obtained using P .These are the equations required for applying MEM to TSC.
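A forward calculation helps to see how trap levels map onto TSC peaks and why the normalization constant is available in advance. The sketch below assumes first-order emission kinetics without retrapping, $I(T) = q\,\nu \sum N_t \exp(-E_t/kT)\,\exp[-(\nu/\beta)\int_{T_0}^{T}\exp(-E_t/kT')\,dT']$, with all traps filled at $T_0$; geometric factors are omitted and $N_t$ is taken as a number of carriers. The function name and the trap parameters are hypothetical.

```python
import math

K_B = 8.617333e-5      # Boltzmann constant, eV/K
Q = 1.602177e-19       # elementary charge, C

def tsc_current(levels, nu, beta, temps):
    """TSC current (A) on the temperature grid temps (K).

    levels: list of (E_t in eV, N_t in carriers); nu in 1/s; beta in K/s.
    Assumes first-order emission, no retrapping, traps full at temps[0].
    """
    current = [0.0] * len(temps)
    for e_t, n_t in levels:
        integral = 0.0                              # integral of emission rate over T
        prev_rate = nu * math.exp(-e_t / (K_B * temps[0]))
        for j, temp in enumerate(temps):
            rate = nu * math.exp(-e_t / (K_B * temp))
            if j > 0:
                integral += 0.5 * (rate + prev_rate) * (temp - temps[j - 1])
            occupancy = math.exp(-integral / beta)  # fraction still trapped
            current[j] += Q * n_t * rate * occupancy
            prev_rate = rate
    return current

def peak_temperature(current, temps):
    return temps[max(range(len(temps)), key=lambda j: current[j])]

temps = [300.0 + 0.5 * j for j in range(801)]       # 300 K to 700 K
beta = 20.0 / 60.0                                  # 20 K/min in K/s
nu = 1e11                                           # 1/s
n_t = 1e10                                          # hypothetical number of trapped carriers

i_shallow = tsc_current([(1.0, n_t)], nu, beta, temps)
i_deep = tsc_current([(1.3, n_t)], nu, beta, temps)
t_peak_shallow = peak_temperature(i_shallow, temps)
t_peak_deep = peak_temperature(i_deep, temps)

# Total emitted charge: time integral of the current, with dt = dT / beta.
charge = sum(0.5 * (i_shallow[j] + i_shallow[j - 1]) * (temps[j] - temps[j - 1])
             for j in range(1, len(temps))) / beta
```

Deeper traps emit at higher temperature, and the integrated current returns the total trapped charge regardless of how $N_t(E_t)$ is distributed. This is the property that allows $P$ to be fixed from the measured spectrum before the MEM calculation.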

Approach for IA
As described above, TSC can determine trap density and energy, N t (E t ).However, the energy region that TSC can measure is limited by the equipment.The conventional TSC can detect only bulk traps with E t below ≈ 1.5 eV (this corresponds to 350 • C) and cannot investigate the interface states (D it ) located at the interface between a gate insulator and underlying substrate [22,23].The IA technique can be used to overcome these difficulties though it is easy and simple, and requires no complex equipment or measurement system.In usual IA experiments, annealing for a constant period (δt) and data measurements are repeated as shown in Figure 2. In the present study, the data corresponds to annealing temperature (T ) dependent midgap voltage shift ∆V mg (T ) reflecting the amount of bulk trapped carriers, or to the T dependent interface state amount ∆D it (T ) (note that under the midgap voltage condition, the contribution of interface states to ∆V mg is minimized; however, it cannot be completely neglected).With increasing T , ∆V mg (T ) and ∆D it (T ) decrease because of the emission of trapped carriers and the recovery of interface states.I explain ∆V mg (T ) below; however, ∆D it (T ) can be treated in the same way.The aim of IA experiments is to determine the emission (recovery for interface state cases) activation energy distribution of all traps dominating ∆V mg , i.e., ∆V mgi (E a ).Note that E a becomes equal to E t in trap cases.The emission probability at temperature T , e p (E a , T ), is From this, the equation is obtained, where ∆V mg (E a , t, T n ) means ∆V mg from trap with E a during IA at T n for t.This leads to From this, the basic IA equation can be obtained [25] as As in the TSC case, ∆V mgi (E a ) can be normalized as where ∆V 0 mgi is the initial ∆V mg just after the trapping event and is directly determined by experiments.The equation is obtained, which leads to and These are the equations for applying MEM to IA.
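The IA forward model can be sketched in the same spirit. The example below assumes first-order kinetics with emission probability $e_p(E_a, T) = \nu\exp(-E_a/kT)$, so that a component with activation energy $E_a$ is multiplied by $\exp[-e_p(E_a, T_n)\,\delta t]$ at each annealing step, and that $\Delta V_{mg}(T_n)$ is the sum of the surviving components. The function name and the two-level distribution are hypothetical.

```python
import math

K_B = 8.617333e-5   # Boltzmann constant, eV/K

def ia_midgap_shift(components, nu, dt_anneal, anneal_temps):
    """Midgap voltage shift remaining after each isochronal annealing step.

    components:   list of (E_a in eV, initial contribution to dV_mg in V)
    nu:           attempt frequency (1/s)
    dt_anneal:    annealing period per step (s)
    anneal_temps: annealing temperatures in ascending order (K)
    """
    surviving = [dv for _, dv in components]
    result = []
    for temp in anneal_temps:
        for idx, (e_a, _) in enumerate(components):
            rate = nu * math.exp(-e_a / (K_B * temp))     # emission probability
            surviving[idx] *= math.exp(-rate * dt_anneal)  # fraction left after this step
        result.append(sum(surviving))
    return result

anneal_temps = [300.0 + 20.0 * n for n in range(16)]       # 300 K to 600 K, step 20 K
components = [(1.0, 0.3), (1.4, 0.2)]                      # hypothetical (E_a, dV) pairs
dv_mg = ia_midgap_shift(components, nu=1e11, dt_anneal=600.0,
                        anneal_temps=anneal_temps)
```

Each activation energy produces a step in $\Delta V_{mg}(T)$ whose width is comparable to the temperature increment, which is why sharp features in the distribution are captured by only a few data points.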

Approach for TDCV
This case is more problematic than the above cases because the normalization constant cannot be determined before performing the MEM calculations, and another approach is required. Let us assume a metal/SiO$_2$/Si system for the example below. The aim of TDCV is to determine the spatial distribution of fixed charges in SiO$_2$. First, the initial SiO$_2$ thickness is divided into small pieces with thickness $k$. When the film is thinned to thickness $t$, e.g., by using the wet etching method, $t/k$ is the number of pieces involved (Figure 3).
When the number n (n = 1, 2, . . ., t/k) is introduced, the midgap voltage shift due to fixed charges ∆V mg (t) can be calculated using the distance from the SiO 2 /Si interface, x: Here, ρ(x) and P are defined by the equation In addition, the χ 2 /N term is introduced: The ρ(kn) that satisfies the next equation is the most probable ρ(kn) .
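The dependence of the midgap voltage shift on film thickness can be illustrated with a small forward calculation. The sketch assumes the standard electrostatic result that a fixed charge at distance $x$ from the SiO$_2$/Si interface, in a film of thickness $t$ with the gate on top, contributes in proportion to its distance $(t - x)$ from the gate, so that $|\Delta V_{mg}(t)| = (1/\varepsilon_{ox}) \sum_n (t - x_n)\,\rho(x_n)\,k$. Only the magnitude is computed, the slice midpoint is used for $x_n$, and the function name and charge densities are hypothetical.

```python
EPS_OX = 3.9 * 8.854e-12   # permittivity of SiO2, F/m

def tdcv_shift(rho, k, t):
    """Magnitude of the midgap voltage shift (V) for a film thinned to thickness t.

    rho: charge density (C/m^3) per slice, counted from the SiO2/Si interface
    k:   slice thickness (m)
    t:   remaining film thickness (m); slices beyond t have been etched away
    """
    n_slices = int(round(t / k))
    total = 0.0
    for n in range(n_slices):
        x = (n + 0.5) * k                 # slice midpoint, measured from the interface
        total += (t - x) * rho[n] * k     # weight = distance from the gate
    return total / EPS_OX

k = 0.1e-9                                # 0.1 nm slices
uniform = [1.0e4] * 100                   # hypothetical uniform charge, C/m^3
interface = [1.0e6] + [0.0] * 99          # hypothetical charge in the first slice only

ratio_uniform = tdcv_shift(uniform, k, 10e-9) / tdcv_shift(uniform, k, 5e-9)
ratio_interface = tdcv_shift(interface, k, 10e-9) / tdcv_shift(interface, k, 5e-9)
```

Doubling the thickness roughly doubles the shift for charge localized at the interface and quadruples it for a uniform distribution. The thickness dependence therefore encodes where the charge sits, which is the information MEM extracts from the TDCV data.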

Numerical Calculation
This section details MEM numerical calculation results. Particular attention is paid to the number of measured data points and noise because they are very important elements for experiments. In addition, as described above, the assumed default function, $m$, might affect the results and thus has to be considered. The computing time also largely depends on $m$ and the users might need to choose the appropriate $m$ before performing the MEM calculations. Therefore, $m$ was also examined.

ARXPS Case
The assumed structure for the ARXPS calculation is an SiO$_2$ film on an Si substrate, with C and N distributed inside the film as shown in Figure 4a. The values of λ for C, N, Si and O are 3.37, 3.15, 3.80 and 2.80 Å [27], and an Al Kα X-ray source was assumed. Generally, thin film thickness in semiconductor engineering is known in advance or can be easily determined by using spectroscopic ellipsometry, X-ray reflectivity, or XPS methods, and the Si concentration of the substrate is 100%. In this case, it is enough to perform MEM calculations under the assumption that the region beyond the film thickness is uniform, Si = 100%, which reduces the computing time. Therefore, the composition in the region deeper than 5 nm from the surface was fixed to be Si 100%. Note that even when this assumption is not used, the obtained results coincide within a few percent (figures not shown), though the computing time varies significantly.
First, the $m_A(t)$ (A = O, Si, N, and C) dependence was examined. The ARXPS signal intensities calculated as a function of θ (0, 10, 20, ..., and 70°; 8 points in total) without noise are shown in Figure 4b. The MEM calculations were performed for two choices of $m_A(t)$: in the first, $m_A(t)$ is set to the atomic percent determined from the intensity ratio at θ = 0°, i.e., C, N, Si, and O = 1%, 3%, 57%, and 39%; in the second, $m_A(t)$ is equal and uniform for all elements, that is, 0.25. The MEM calculations reproduce well the calculated ARXPS intensities and the assumed depth distributions, as shown in Figure 4b-d. In this study little $m_A(t)$ dependence is present, but it might appear when using other calculation routines and/or other assumed depth distributions. It may also be observed if a significantly different $m_A(t)$ is used. To confirm this, I carried out the MEM calculations for different assumed atom distribution systems with various $m_A(t)$, but only negligible dependence was observed. From these results, I think that $m_A(t)$ is to a considerable extent arbitrary. On the other hand, the computing time depends significantly on $m_A(t)$, and the $m_A(t)$ determined by the values at θ = 0° led to the shortest computing time among the choices that were tried. I therefore used these values in the calculations below. If structural data determined by using methods such as Rutherford backscattering is available, one can use it as $m_A(t)$, i.e., as prior knowledge, leading to more precise and faster determination. Intuitively, it would appear that more data points give more precise results, i.e., MEM can reproduce the assumed distribution better. However, the dependence on the number of data points is insignificant. Even with only five data points, the MEM results reproduce the assumed distribution relatively well. I think that measuring the data points at eight angles is sufficient in general cases. Livesey et al. pointed out that the data measured at angles above 70° has negligible impact on the MEM calculations [16]. They also pointed out that the absolute ARXPS signal intensity becomes considerably weaker at such angles. Therefore, in many cases it is sufficient to measure the eight data points at angles up to 70°. Next, the MEM tolerance to noise was examined. The MEM calculation results with ARXPS signals including artificial Gaussian noise (0%, 5%, 10%) are shown in Figure 6. It should be noted that even in the 10% case, the overall shapes of the assumed distribution are reproduced, though they become somewhat distorted. This demonstrates the validity of MEM and is very helpful for practical use.

TSC Case
For this case, β = 20 K/min and $\nu = 10^{11}$/s were assumed [22][23][24]. The assumed $N_t(E_t)$ distribution is shown in Figure 7a (peaks A, B, and C). The corresponding TSC spectrum calculated with Equation (16) is plotted in Figure 7b. The MEM calculation results are plotted in Figure 7a,b. A constant $m(E_t)$ was used in this calculation. The MEM results reproduce the assumed $N_t(E_t)$ distribution well, which demonstrates that MEM can be applied to an arbitrary $E_t$ distribution. The MEM calculations were also performed with exponential or Gaussian $m(E_t)$ and almost the same results were obtained [24]. However, the computing time depended significantly on $m(E_t)$. It becomes shorter when $m(E_t)$ is closer to the assumed $N_t(E_t)$. Thus, for practical use, one should choose $m(E_t)$ carefully. Fortunately, the appropriate $m(E_t)$ can be obtained from Equations (41) and (42). The number of data points in TSC spectra is usually sufficient, typically more than 1000, and one does not need to be concerned about this point for MEM calculations.
Regarding noise, the MEM calculation is slightly affected by noise in $I_{TSC}(T)$. The MEM calculation results with artificial Gaussian noise are shown in Figure 8. For larger noise, the peaks become gradually distorted. Moreover, a ghost shoulder and peak (indicated by arrows) appear. However, their effect is very slight. Note that this is an extreme case and that the usual $I_{TSC}(T)$ is not so noisy (see [22,23]) except for the region where $I_{TSC}(T)$ is extremely low (on the order of $10^{-15}$ A). Therefore, in practical use, noise is expected to have much less impact on the MEM calculation.

IA Case
For this case, the $\Delta V_{mgi}(E_t)$ distribution shown in Figure 9a and $\nu = 10^{11}$/s were assumed [22][23][24]. From this distribution, the corresponding $\Delta V_{mg}(T)$ values were calculated as shown in Figure 9b. Here calculations were performed with ∆T = 20 K and δt = 600 s. The MEM calculation results obtained with a constant $m(E_a)$ are shown in Figure 9a,b. Exponential and Gaussian $m(E_a)$ were also examined to check the $m(E_a)$ dependence. However, the obtained results remain almost unchanged, which means that the MEM calculations are hardly affected by the assumed $m(E_a)$. Unfortunately, unlike the TSC case, no appropriate $m(E_a)$ is obtainable prior to the MEM calculations and it is difficult to even predict the $\Delta V_{mgi}(E_a)$ curves from the experimental data, $\Delta V_{mg}(T)$. Therefore, there was no choice but to use the constant $m(E_a)$, although the computing time might be longer than in the case where an appropriate $m(E_a)$ (closer to $\Delta V_{mgi}(E_t)$) is used.
With respect to the obtained $\Delta V_{mgi}(E_t)$ shown in Figure 9a, the overall shape is reproduced but there is a large deviation between the assumed distribution and the MEM results. The intensity ratio between peaks is different and the peaks are smoothed, especially peaks A and B. This is because ∆T is somewhat large and the fine structures are not reflected sufficiently in $\Delta V_{mg}(T)$. The ∆T value is limited by the temperature control precision of the annealing furnace. As a result, the number of data points is limited and the information in $\Delta V_{mg}(T)$ is averaged in comparison with the above TSC case, which degrades the precision of the MEM result. Indeed, when more (fewer) data points are assumed, i.e., ∆T is 10 K (30 K), the reproducibility is slightly improved (degraded), as shown in Figure 9c. Therefore, the number of data points has to be taken into consideration when MEM is used. It should be noted that although its accuracy is lower than that of TSC, IA has three distinct advantages: it can be applied to devices, it can observe very deep traps whereas TSC can detect only traps with $E_t$ up to ≈ 1.5 eV, and it can determine $D_{iti}(E_a)$.
Regarding noise, I examined the noise dependence of the MEM calculations and the results are shown in Figure 9d. The noise significantly affects the obtained spectra, especially peak A. This peak is very sharp and its recovery occurs in a very narrow temperature range, i.e., $\Delta V_{mg}(T)$ drops very sharply with temperature. As a result, the drop is contained in only a few data points and is susceptible to noise. This even shifts the position of peak A. Similarly, peak B is considerably distorted. Meanwhile, the broad peak C is less susceptible to noise, as expected from the above argument. This means that one should obtain data with a high signal-to-noise ratio for the MEM calculations.

TDCV Case
I assumed an SiO$_2$ (10 nm)/Si system with the three fixed charge distributions, ρ(t), shown in Figure 10a-c. The $\Delta V_{mg}(t)$ calculated from each ρ(t) and the corresponding MEM results are plotted in Figure 10a-d. The $m(kn)$ value is difficult to deduce before the MEM calculation. If $\Delta V_{mg}(t)$ is proportional to the thickness, the distribution is localized at the interface. On the other hand, the distribution is uniform inside the film when $\Delta V_{mg}(t)$ is proportional to the square of the thickness. From these, one can roughly deduce $m(kn)$; however, this estimation is insufficient in many cases. Therefore, a constant $m(kn)$ value was assumed, although the computing time might be longer. Since this assumption seems to work well (see the results for the 10 point case), one can conclude that the constant $m(kn)$ can be used when deducing its value is difficult. In addition, the number of the corresponding $\Delta V_{mg}(t)$ data points was set to 10, 5, and 3 to confirm the dependence of the MEM calculation accuracy on the number of data points. The obtained MEM results are plotted in Figure 10a-c. It would intuitively seem that fewer data points would give less accurate results, as expected from the IA case. This was found to be the case; the MEM results depend significantly on the number of data points. When it is 10, MEM reproduces the assumed distributions well, while a slight (significant) deviation is seen for the 5 (3) point case. Therefore, one can conclude that when MEM is applied to TDCV, at least 10 data points are required to obtain satisfactory results.
Noise was found to have a considerable impact on the MEM calculation results (Figure 11). In particular, a large deviation from the assumed ρ(t) appears when the noise exceeds 5%. The peak becomes distorted and even a ghost shoulder and peak are present in the 10% case. However, noise is not a concern in general cases, since the CV measurement error is less than 5%.
In concluding this section, I can state that MEM can analyze the data accurately without any prior knowledge regarding the results. In the next section, I will explain an example of application results to assist readers' understanding of the validity of MEM.
Figure 11. Noise dependence of the MEM calculation results for TDCV (constant $m(kn)$ and the 10 point case). The black line is the assumed ρ(t). The red, blue, and green lines correspond to the artificially introduced Gaussian noise = 3%, 5%, and 10% cases, respectively.

Experiments and MEM Calculations
For experiments, SiO$_2$/Si wafers contaminated with carbon were prepared. Cleaned Si(100) wafers were left in air for $t_l$ = 0, 1, 3, or 10 weeks, during which carbon contaminants adsorbed onto the wafers. Then, the wafers were cut into small pieces (1 × 1 cm$^2$) and annealed in O$_2$ ambient at 1100 °C until 10 nm of thermal SiO$_2$ was grown. For ARXPS, the SiO$_2$ was thinned to 4 nm (the thickness was monitored with a spectroscopic ellipsometer). ARXPS spectra for C 1s, O 1s, and Si 2p were measured with θ = 0, 10, 20, ..., 70°. Monochromatized Al Kα was the X-ray source. Before measurements, the sample surfaces were cleaned with H$_2$SO$_4$ + H$_2$O$_2$ solution to remove surface carbon adsorbates. The integrated peak intensity as a function of θ was determined by peak fitting using a Gaussian function after subtracting the Shirley background. For the MEM calculations, $m_A(t)$ were determined from the measurement results at θ = 0°. For $\lambda_C$, $\lambda_{Si}$, and $\lambda_O$, 3.37, 3.80, and 2.80 Å were used, respectively [27].
For TSC measurement, all the contaminated 10 nm SiO$_2$ films were cleaned with H$_2$SO$_4$ + H$_2$O$_2$ solution. Immediately following the cleaning, aluminum (99.99% purity) electrodes 300 µm in diameter were deposited onto the sample surface by vacuum evaporation to form a metal oxide semiconductor (MOS) structure. In addition, gold back contact electrodes were fabricated. The MOS sample, mounted on the sample holder designed for TSC, was inserted into a chamber, the atmosphere of which was replaced with pure He gas. The avalanche injection technique was used to selectively inject holes into SiO$_2$. The MOS temperature was then increased up to 350 °C with a constant heating rate β = 20 K/min. During heating, the external current due to the emission of trapped holes was detected as the TSC signal $I_{TSC}(T)$. The current sensitivity was ≈ 5 × 10$^{-15}$ A. For the MEM calculations, ν was assumed to be $10^{11}$/s [22][23][24]. Although ν must be determined by other methods to obtain the absolute value of $E_t$ precisely [24,29], the assumed value is sufficient for our purpose of distinguishing $E_t$, because $N_t(E_t)$ merely shifts by about 0.1 eV toward higher $E_t$ when ν is larger by one order [24]. The $m(E_t)$ value was calculated from $I_{TSC}(T)$ using Equations (41) and (42).
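The size of this shift can be checked with a short estimate. The derivation assumes first-order emission kinetics without retrapping, which is an assumption made here for illustration. For a single trap level, the TSC current peaks at the temperature $T_m$ where

$$\frac{\beta E_t}{k T_m^2} = \nu \exp\!\left(-\frac{E_t}{k T_m}\right) \quad\Longrightarrow\quad E_t = k T_m \ln\!\left(\frac{\nu k T_m^2}{\beta E_t}\right).$$

For a peak observed at a fixed $T_m$, increasing ν by a factor of 10 therefore raises the deduced $E_t$ by approximately

$$\Delta E_t \approx k T_m \ln 10,$$

neglecting the weak logarithmic dependence on $E_t$ itself. With $k = 8.617\times10^{-5}$ eV/K this gives about 0.08 eV at $T_m$ = 400 K and about 0.12 eV at $T_m$ = 600 K, consistent with a shift of about 0.1 eV per decade of ν.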
For IA measurements, the same samples as those for the TSC case were used.After avalanche injection, IA was performed following the temperature and ∆V mg (T ) (∆D it (T )) measurement sequence shown in Figure 12.For T ≤300 K, the values were measured at T , while the measurements were performed at 300 K for T ≥300 K to avoid damage that might be imposed by high temperature measurements.The IA temperature step ∆T and period δt were set to be 20 K and 600 s.The heating rate was 20 K/min.All the IA measurements were carried out under N 2 ambient.The ∆V mg (T ) (∆D it (T )) values were recorded using an LCR meter.This ∆V mg (T ) data includes the contributions of the trapped charges and interface states (although the latter are minimized at midgap voltage, a few of them can be included), while ∆D it (T ) is determined only by the interface state density.For ν and m(E a ) for MEM calculations, 10 11 /s [22][23][24] and the constant m(E a ) were used.In order to reduce experimental noise, the same measurements were repeated four times and the data were averaged.
The same contaminated samples were also used for TDCV measurements. An HF solution was used to thin the SiO₂ to several thicknesses t. Then, MOS capacitor structures were formed. The V_mg(t) for each sample was recorded with an LCR meter. The ΔV_mg(t) was determined from the difference in V_mg(t) between non-contaminated and contaminated samples. All the measurements were performed at room temperature. A constant m(kn) was used for MEM calculations.
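To make the link between a fixed-charge distribution and the measured voltage shift concrete, the following sketch evaluates the standard MOS electrostatic relation ΔV(t) = −(1/ε_ox) ∫₀ᵗ (t − z) ρ(z) dz. It is an assumption that this form, its sign convention, and the coordinate z (distance from the Si/SiO₂ interface, with the gate at z = t) match the TDCV equations used in this paper; the function name and all numerical values are hypothetical.

```python
import math

EPS_OX = 3.9 * 8.854e-12  # SiO2 permittivity in F/m (relative permittivity 3.9 assumed)


def delta_v_mg(thickness_m, rho, n_steps=200):
    """Voltage shift (V) for an oxide thinned to thickness_m.

    rho(z) is the fixed-charge density in C/m^3 at distance z (m) from the
    Si/SiO2 interface; the gate is assumed to sit at z = thickness_m.
    """
    dz = thickness_m / n_steps
    total = 0.0
    for i in range(n_steps):
        z0, z1 = i * dz, (i + 1) * dz
        f0 = (thickness_m - z0) * rho(z0)
        f1 = (thickness_m - z1) * rho(z1)
        total += 0.5 * (f0 + f1) * dz  # trapezoidal rule
    return -total / EPS_OX


# Hypothetical charge profile decaying away from the interface (2 nm decay length)
def example_rho(z):
    return 1.6e4 * math.exp(-z / 2e-9)


for t_nm in (2, 4, 6, 8, 10):
    print(f"t = {t_nm} nm -> delta V = {delta_v_mg(t_nm * 1e-9, example_rho):.4f} V")
```

Recovering ρ from a handful of such ΔV_mg(t) values is the inverse problem that MEM is used to solve in this work; the sketch covers only the forward direction.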

Results and Discussion
The MEM calculation results for ARXPS are shown in Figure 13 (only the C depth distribution is shown). Clearly, the depth distributions depend on t_l. Most carbon is located near the interface, and the amount of carbon increases for longer t_l. These carbon atoms are undoubtedly due to the carbon contamination. Note that the C present near the surface is due to surface C adsorbates and is not discussed in this paper.
In the TSC curves in Figure 14a, peaks A and C decrease for longer t_l. In addition, new B peaks are observed in samples with longer t_l, which implies that the carbon contamination generates new traps. The TSC peaks are too broad to analyze using a curve fitting method. The MEM calculation results are shown in Figure 14b. Several peaks can be clearly observed at around 0.8, 1.1, and 1.3 eV. The t_l = 0 sample without the carbon contamination shows no 1.1 eV peak, while the 0.8 and 1.3 eV peaks are clearly present, which indicates that the 1.1 eV peak originates from the carbon, while the others are intrinsic SiO₂ traps. From [30,31], the A and C peaks can be assigned to E_δ- and E_γ-centers (oxygen deficiency), respectively. The A and C peak intensities decrease for longer t_l, which implies that the oxygen deficiencies are passivated by the carbon. Therefore, the dependence of the A and C peaks on the carbon is different from that of B. In addition, for samples with t_l ≥ 1 week, the B peak appears at almost the same E_t. This may mean that the chemical bonding state around the carbon atoms is independent of t_l, although the detailed atomic bonding structure cannot be determined.

As described above, although TSC provides unique information regarding N_t and E_t, it can detect traps only up to 1.5 eV. This problem can be overcome using the IA technique. The ΔV_mg(T) plot in Figure 15a shows that ΔV_mg(T) begins to decrease at around 300 K. Since this temperature is close to that at which the lowest TSC energy peak appears, this decrease may originate from the emission of holes from the same trap (A peak in TSC). In addition, the ΔV_mg(T) shapes depend on t_l. It is difficult to interpret this dependence directly, although it implies additional detrapping and/or the recovery of interface states. ΔV_mg(T) settles at almost 0 after IA at 700 K. This results from the complete detrapping and/or the recovery of interface states. In other words, all the E_a of traps and/or interface states can be detected by the IA technique. The MEM results are shown in Figure 15b, where several peaks are clearly extracted. By comparison with the TSC peaks in Figure 14, the 0.8, 1.1, and 1.3 eV peaks can be attributed to the detrapping of holes captured during the avalanche stresses. Therefore, the other peaks at 1.2 and 1.7 eV originate from other traps or the recovery of interface states. For more detailed investigation, ΔD_it(T) was analyzed and the results are shown in Figure 15c,d. Note that although ΔD_iti(E_a) is the interface state density and its unit is /cm², we use the unit V to make the comparison with ΔV_mgi(E_a) (in units of V) easier. In the obtained ΔD_iti(E_a), some peaks are observed at around 1.2 eV. Therefore, the 1.2 eV peak in Figure 15b corresponds to the recovery of the interface states formed by the avalanche stresses. On the other hand, the fact that no ΔD_iti(E_a) peaks appear at 1.7 eV suggests that the 1.7 eV peak can be attributed to a hole trap that could not be observed by TSC. Both the 1.2 and 1.7 eV peaks increase for longer t_l, which indicates that the carbon contamination accelerates the formation of interface states and hole traps.
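As a rough cross-check of the energy window probed by IA, the following sketch assumes simple first-order thermal emission with rate ν exp(−E_a / k_B T) and estimates the activation energy whose emission time equals the annealing period δt at a given temperature. This is a textbook estimate, not necessarily the relation used in the MEM calculations, and the function name is hypothetical.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def annealing_energy(temp_k, nu_hz=1e11, period_s=600.0):
    """Activation energy (eV) whose emission time 1/(nu*exp(-E/kT)) equals period_s.

    Assumption: first-order emission kinetics with a temperature-independent nu.
    """
    return K_B * temp_k * math.log(nu_hz * period_s)


# Energies reached at a few annealing temperatures (nu = 1e11 /s, delta t = 600 s)
for temp in (300.0, 500.0, 700.0):
    print(f"{temp:.0f} K -> {annealing_energy(temp):.2f} eV")
```

Under these assumptions, annealing at 300 K corresponds to roughly 0.8 eV and annealing at 700 K to roughly 1.9 eV, which is consistent with the onset of the ΔV_mg(T) decrease near the lowest TSC peak and with the detection of the 1.7 eV peak.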
The TDCV experiment results are described next. Figure 16a shows the TDCV curves. It should be noted that ΔV_mg(t) is larger for longer t_l. This indicates that more carbon contamination yields more fixed charges. The ΔV_mg(t) value decreases as the films are thinned; however, it is difficult to extract the distribution of the fixed charges in these films from the curves.
Figure 16b shows the MEM results. As suggested by ΔV_mg(t), the amount of fixed charges increases considerably with more carbon contamination, i.e., for longer t_l. This indicates that carbon produces fixed charges. The distribution ρ(t) decreases toward the bulk, which implies that carbon diffused into SiO₂ during oxidation. These four experiments and MEM calculations provide information that can be easily and intuitively understood. The carbon species adsorbed on the surface were incorporated diffusively into the SiO₂ film during the thermal oxidation process. This diffusion results in the carbon distribution shown in Figure 13. In addition, the incorporated carbon yields new traps and simultaneously passivates the intrinsic traps, the E_γ- and E_δ-centers. Moreover, the carbon accelerates the formation of interface states and also forms fixed charges. Note that the distribution of the fixed charges resembles the carbon atom distribution (both are localized near the interface) although their amounts differ by many orders of magnitude, which means that only a part of the incorporated carbon atoms forms fixed charges. These are the effects of carbon contamination on the structural and electrical properties of MOS structures. Unfortunately, this study yielded no information regarding the precise atomic structure of traps, fixed charges, and interface states. However, it is evident that MEM is a very powerful tool for semiconductor engineering and provides crucial information that cannot be extracted with conventional methods. Furthermore, MEM can be easily applied to the analysis of other experimental data that involve the inverse problems described by Equation (1). In this respect, rapid growth of MEM application to engineering fields as well as to physics and chemistry can be expected.

Conclusions
This paper describes a brief theoretical background and application results of MEM data processing from the viewpoint of its application to semiconductor engineering. MEM was applied to the analysis of ARXPS, TSC, IA, and TDCV data, which provided evidence that it can extract crucial information regarding atom distribution, traps, fixed charges, and interface states. These data are of great importance for device fabrication processes and the understanding of their physics. MEM is very simple and can be applied easily to other data that involve inverse problems. It can be expected that MEM's importance in the engineering field will continue to grow, and that the method will become a standard analysis tool.

Figure 1. A schematic picture of a layer structure. Here t denotes the distance from the surface and e⁻ denotes photoelectrons emitted by X-ray absorption.

Figure 2. A standard annealing temperature and measurement scheme used for IA experiments. The lines and circles indicate the sample temperature and the measurement points, respectively. ΔT and δt correspond to the temperature step and the annealing period at each temperature.

Figure 3. A schematic picture of a layer structure. Here k and t denote the thickness of each layer and the whole film thickness, respectively.

Figure 4. The m_A(t) dependence of the MEM calculation results for ARXPS (θ = 0, 10, 20, ..., 70° and the no-noise case). (a): Assumed depth distribution of C, N, Si, and O atoms for ARXPS-MEM calculations. (b): Normalized ARXPS intensity as a function of θ calculated from Equations (11), (12), and the depth distribution shown in (a) (symbols). The lines are the MEM calculation results with m_A(t) determined by θ = 0° values. (c) and (d): Reconstructed depth distribution from ARXPS normalized intensity (symbols in (b)) using MEM with m_A(t) determined by θ = 0° values (c) and with m_A(t) = 0.25 for all atoms (d).

Figure 8. Noise dependence of the MEM calculation results for TSC (constant m(E_t) case). From bottom to top: assumed N_t(E_t) and MEM results for artificially introduced Gaussian noise of 0%, 5%, and 10%. The arrows indicate a ghost shoulder and peak.

Figure 9. The dependence of the MEM calculation results on the number of points (i.e., ΔT) and noise for the IA (constant m(E_a) case). (a): Assumed ΔV_mgi(E_t) (lower) and the MEM results with ΔT = 20 K and without noise (upper). (b): Calculated ΔV_mg(T) from Equation (26) and (a) with ΔT = 20 K and without noise (symbols). The line corresponds to the MEM result. (c): From bottom to top, assumed ΔV_mgi(E_t) and the MEM results without noise and with ΔT = 30, 20, and 10 K, respectively. (d): From bottom to top, assumed ΔV_mgi(E_t) and the MEM results with ΔT = 20 K and with artificially introduced Gaussian noise of 0%, 5%, 8%, and 10%, respectively.

Figure 10. Assumed fixed charge distribution and the dependence of the MEM calculation results for TDCV on the number of points (constant m(kn) and the no-noise case). (a)-(c): The black lines are the assumed ρ(t). The colored lines are the MEM calculated results with the number of points = 10, 5, and 3. (d): The circles, triangles, and squares are the calculated ΔV_mg(t) for the assumed distribution ρ(t) in (a)-(c), respectively. The lines are the corresponding MEM calculation results.

Figure 12.The IA temperature and measurement scheme for contaminated MOS samples.

Figure 13. The t_l dependence of carbon distribution in contaminated MOS samples determined from ARXPS and MEM calculations.

Figure 14. The t_l dependence of I_TSC(T) for contaminated MOS samples (a) and N_t(E_t) calculated with MEM (b).

Figure 15. The t_l dependence of ΔV_mg(T) (a) and ΔD_it(T) (c) in contaminated MOS samples. The ΔV_mgi(E_a) (b) and ΔD_iti(E_a) (d) are the corresponding MEM calculation results.

Figure 16. The t_l dependence of ΔV_mg(t) for contaminated MOS samples (a) and ρ(t) calculated with MEM (b).
These equations give the N_t(E_t) shown in Figure 7a. They cannot reproduce all the peaks because they directly convert I_TSC(T) to N_t(E_t), so the obtained N_t(E_t) simply follows the shape of I_TSC(T). The broad peak C is reproduced very well but the A and B peaks are not. From this, one can conclude that although Equations (41) and (42) are easy to use, they reproduce only widely distributed E_t. However, the result can be used as m(E_t) to reduce the computing time.

Figure 7. Assumed N_t(E_t), I_TSC(T) and MEM results. (a): From bottom to top, assumed N_t(E_t), MEM reconstructed results, and N_t(E_t) calculated from Equations (41), (42) and I_TSC(T). (b): I_TSC(T) calculated from the assumed N_t(E_t) (lower) and the MEM result (upper). MEM was performed without noise and with constant m(E_t).