A Level Set Method for Infrared Image Segmentation Using Global and Local Information

: Infrared image segmentation plays a signiﬁcant role in many burgeoning applications of remote sensing, such as environmental monitoring, trafﬁc surveillance, air navigation and so on. However, the precision is limited due to the blurred edge, low contrast and intensity inhomogeneity caused by infrared imaging. To overcome these challenges, a level set method using global and local information is proposed in this paper. In our method, a hybrid signed pressure function is constructed by fusing a global term and a local term adaptively. The global term is represented by the global average intensity, which effectively accelerates the evolution when the evolving curve is far away from the object. The local term is represented by a multi-feature-based signed driving force, which accurately guides the curve to approach the real boundary when it is near the object. Then, the two terms are integrated via an adaptive weight matrix calculated based on the range value of each pixel. Under the framework of geodesic active contour model, a new level set formula is obtained by substituting the proposed signed pressure function for the edge stopping function. In addition, a Gaussian convolution is applied to regularize the level set function for the purpose of avoiding the computationally expensive re-initialization. By iteration, the object of interest can be segmented when the level set function converges. Both qualitative and quantitative experiments verify that our method outperforms other state-of-the-art level set methods in terms of accuracy and robustness with the initial contour being set randomly.


Introduction
Infrared (IR) imaging system that passively receives IR radiation (760 nm-1 mm) and converts the invisible rays into images has been extensively applied in remote sensing [1,2]. Infrared (IR) image segmentation is one of the most important techniques in IR systems and plays a fundamental role in many modern remote sensing applications, e.g., unmanned aerial vehicle navigation, pedestrian surveillance, space warning and oil leakage detection [3]. However, the accuracy and robustness of IR image segmentation is hard to be guaranteed due to the inherent drawbacks of IR imaging itself, including the blurred edge, low target/background contrast, local inhomogeneity Since most of the conventional level set methods only draw upon the intensity feature, it is easy to get incomplete contours and large quantities of false targets when they are directly applied to process IR images with remarkable intensity inhomogeneity and blurred edges. To solve these challenging problems mentioned above, a new level set method combing global and local information is presented in this paper. First of all, we propose to construct a novel SPF by integrating a global term and a local term together. Specifically speaking, the global term is calculated by the global average intensities inside and outside of the evolving curve. In addition, the local term is constructed by the signed driving force that is associated with four local statistical features: gradient, entropy, standard deviation and filtered difference. Please note that both the direction and magnitude of the driving force are determined by the distances between the feature vector and the two mean multi-feature vectors inside and outside the contour. Then, we use an adaptive weight matrix calculated based on the range value of each pixel to fuse the global and local terms as a hybrid SPF. Next, the level set formula is formed under the framework of GAC model by substituting the edge stopping function with our proposed SPF. Since the presented level set formula takes both global intensity and local multi-features into account, the curve can evolve fast when it is far away from the real boundary meanwhile the evolution is strictly controlled when it is near the object of interest. Besides, a Gaussian convolution is introduced to regularize the level set function so that the re-initialization with high-computational amount can be avoided. By iteration, the curve will evolve to the target boundary until the level set function converges. Figure 1 shows the complete flow chart of our work.
In conclusion, the main contributions and advantages of this study can be summarized as the following aspects: i.
a new SPF integrating both global-intensity-based information and local-multi-feature-based information via an adaptive weight matrix is developed; ii.
four statistical features are dynamically weighted by range ratio to form the driving force which is utilized to form the local term of SPF; iii. a level set formula which is able to cope with the weak edge and intensity inhomogeneity is constructed using the SPF proposed; iv.
the segmentation results are not obviously influenced by the initialization of contour. The remainder of this paper is organized as follows: in Section 2, the related work in regard to the GAC framework is introduced; in Section 3, the detailed theories of the proposed level set method are depicted; in Section 4, comparisons with other state-of-the-art methods are conducted to verify the superiority of our method; finally, a conclusion is drawn in Section 5.

Related Work about Geodesic Active Contour (GAC) Model
Suppose that Ω ⊂ 2 is a two-dimensional image domain and I : Ω → + is a given single channel IR image. Let C(p) = (x(p), y(p)), p ∈ [0, 1] be a differentiable parameterized curve in Ω. The GAC model is aimed to minimize the following energy functional: E GAC (C(p)) = 1 0 g(|∇I(C(p))|) C (p) dp (1) where, ∇ is a gradient operator; g(·) is the edge stopping function that can be understood as an edge detection function, and its definition is given as where, C (P) represents the arc length of the curve C; G σ * I means convolving the IR image I using a two-dimensional Gaussian kernel G σ = 1 2πσ 2 exp(− x 2 +y 2 2σ 2 ) whose standard deviation is σ. Obviously, lim x→∞ g(x) = 0 satisfies the requirement that the edge stopping function gets the minimum when the curve stops on the boundary; on the contrary, Equation (2) tends to be 1 when the curve is on non-edge regions.
By virtue of the variational theory [31], the Euler-Langrange equation of Equation (1) can be obtained as Equation (3).
where, C t means the derived function of C; κ and → N represent the curvature of the contour and the inward normal to the curve, respectively. In addition, a constant velocity term α is employed to increase the propagation speed and Equation (3) can be re-written as: Furthermore, by representing the closed curve C with the zero level set of the level set function φ, the corresponding level set formula can be deduced as follows: ∂φ ∂t = g · |∇φ| · div ∇φ |∇φ| + α + ∇g · ∇φ (5) where, α is also called as the balloon force and it controls the contour expanding or shrinking; div ∇φ |∇φ| stands for the curvature term. Please note that the contour evolution only depends on the edge information, indicating that the segmentation result of GAC model is easy to be affected by noise and fall into a local minimum when the initial contour is far from the desired object [10]. Motivated by this key point, we consider substituting its edge stopping function with a specific term that contains the global and local information simultaneously. By this means, the new level set model is able to be robust to the adverse interferences existing in IR images.

Theory
In this section, the theories regarding the proposed level set method are illustrated at length. In Section 3.1, the way of constructing the SPF is introduced; in Section 3.2, we further depict the implement of this method; at last, a summary of the whole procedure is revealed in Section 3.3.

Signed Pressure Fucntion Integrating Global and Local Information
As is mentioned in Section 2, one of the key techniques in this study is to design a new SPF containing a global term and a local term to replace the edge stopping function under the GAC framework. According to the previous works about SPF [7,8,32], there are two basic design principles to be noticed: (i) the defined SPF should be in the range [−1, 1]; (ii) the essence of SPF is to modulate the signs of the pressure forces inside and outside the image region. By the guidance of SPF, the contour expands when it is inside the target of interest and shrinks when it is outside the target to be extracted.

Design of the Global Term
The global term of SPF is supposed to make sense when the evolving curve is far away from the object and it provides a powerful force to drive the curve to approach the real boundary with a fast speed. Based on this consideration, the global term needs to focus on the average intensities inside and outside of the curve [7]. Thus, the following definition is adopted: where, c 1 and c 2 are two parameters related to the level set function φ and they stand for the average intensities inside and outside the contour. Equation (6) modulates the signs of the pressure forces inside and outside the region of interest so that the contour shrinks when outside the object and expands when inside the object. The mathematical expressions of c 1 and c 2 are shown as Equations (7) and (8).
where, H(·) is the Heaviside function whose regularized version is selected as: where, arctan(·) is the inverse tangent function and ε 1 is a control factor. H(x) can be understood as an approximation of the condition:

Design of the Local Term
To provide a more precise evolution direction when the curve is in the regions with intensity inhomogeneity or weak edge, local features need to be taken into consideration in the level set method. In this paper, four statistical features: gradient υ 1 (x, y), local entropy υ 2 (x, y), local standard deviation υ 3 (x, y) and filtered difference υ 4 (x, y), that can reflect the local property of each pixel are selected to compose a feature vector υ(x, y) = (υ 1 (x, y), υ 2 (x, y), υ 3 (x, y), υ 4 (x, y)) T . In addition, each feature is defined as Equations (10)- (13). υ 1 (x, y) = (I(x, y) − I(x + 1, y)) 2 + (I(x, y) − I(x, y − 1)) 2 (10) where, W m (x, y) is an m × m sized local window set around the center pixel (x, y); L is the maximal gray level of the input IR image (L = 256 for an 8-bit image); p(t) = n t n means the probability of gray level t in W m (x, y), where n t is the number of pixels whose gray level is t and n = m × m is the total pixel number in the local window; µ = 1 n ∑ (s,t)∈W m (x,y) I(s, t) represents the average intensity in W m (x, y); ℵ m is an m × m sized mean filter and ℵ m * I(x, y) means a mean filtering convolution. Here, a further explanation of Equations (10)-(13) is given: υ 1 (x, y) is the vector sum of horizontal and vertical gradients, and it reflects the comprehensive gradient of (x, y); υ 2 (x, y) and υ 3 (x, y) stand for the local entropy and local standard deviation of (x, y) respectively, and they are utilized to measure the local texture of each pixel; υ 4 (x, y) is the absolute difference between the original intensity and the intensity processed by mean filtering, and this filtered difference is commonly applied to reflect the local roughness [8].
Next, we argue that the four features in each feature vector should possess different weights according to the degree of inhomogeneity when they are further utilized to calculate the driving force. In light of this consideration, we employ a metric called range ratio ρ i [33] to be the adaptive coefficient, and Equation (14) below shows the definition.
where, υ imax (x, y) and υ imin (x, y) denote the maximal and minimal feature values in the local window W m (x, y); max(υ i ) is the maximal pixel value of the whole feature map; ρ max (x, y) = max(ρ 1 (x, y), ρ 2 (x, y), ρ 3 (x, y), ρ 4 (x, y)) stands for the maximal range ratio among the four features and it is utilized for normalization. Thus, a feature weight vector ρ(x, y) = can be obtained for each pixel. Then, let us focus on studying the construction of driving force which is closely related to the afore-proposed local multi-features. We notice that in LBF model [21], the local intensity information inside and outside the contour is embedded via a Gaussian kernel function used in the energy functional, which is revealed as Equations (15) and (16).
where, f in (x, y) and f out (x, y) are two values that predict the average intensities inside and outside the contour; K σ e is a Gaussian kernel function whose standard deviation is σ e ; M in (φ) = 1 − H(φ) and M out (φ) = H(φ), where H(·) is the Heaviside function introduced in Equation (9). For a further understanding of Equations (15,16), f in (x, y) and f out (x, y) are weighted averages of the intensities in a neighborhood of (x,y), whose size is proportional to the scale parameter σ e [21].
Referring to LBF model, we also embed gradient, local entropy, local standard deviation and filtered difference to our model with a Gaussian kernel function K σ e [34,35]. For further emphasizing the local property, the mean operator ℵ m (·) that computes the mean value in a m × m sized local window W m (x, y) is also used to smooth the embedded local feature for each pixel. By this means, each pixel has two mean multi-feature vectors inside and outside the contour. This procedure is mathematically expressed by Equations (17) and (18) and is intuitively depicted by Figure 2.
where, ψ in (x, y) and ψ out (x, y) are the two mean multi-feature vectors inside and outside the contour, and they are composed of four single-feature values denoted as ψ in i (x, y) and ψ out i (x, y) (i = 1, 2, 3, 4) respectively.  (15,16), in f (x, y) and out f (x, y) are weighted averages of the intensities in a neighborhood of (x,y), whose size is proportional to the scale parameter e σ [21].
Referring to LBF model, we also embed gradient, local entropy, local standard deviation and filtered difference to our model with a Gaussian kernel function e σ K [34,35]. For further emphasizing the local property, the mean operator  m () that computes the mean value in a  mm sized local window m W (x, y) is also used to smooth the embedded local feature for each pixel. By this means, each pixel has two mean multi-feature vectors inside and outside the contour. This procedure is mathematically expressed by Equations (17) and (18) and is intuitively depicted by Figure 2.   K (x, y) M φ(x, y) Furthermore, the distances between υ(x, y), ψ in (x, y) and ψ out (x, y) that measure the similarities are computed using the feature weight vector ρ(x, y) as Equations (19) and (20).
L out (x, y) = ρ 1 (x, y) · υ 1 (x, y) − ψ out 1 (x, y) + ρ 2 (x, y) · υ 2 (x, y) − ψ out 2 (x, y) where, |·| represents calculating the absolute value. In essence, L in (x, y) and L out (x, y) are the weighted sums of the four feature distances, and they indicate the evolving direction of the current pixel (x, y). Accordingly, the magnitudes of the driving forces from the two directions can be determined as follows: where, F in (x, y), F out (x, y) ∈ (0, 1] denote the internal and external driven forces, respectively; η is a constant parameter. By comparing L in (x, y) and L out (x, y), the driving force spf L (x, y), which is also regarded as the local term of SPF, are computed as Equation (23).
Please note that the driving force indicates both the evolution direction and magnitude for each pixel on the curve.

Combination of Global and Local Terms
After computing the global and local terms using the procedures introduced in Sections 3.1.1 and 3.1.2, we need to fuse the two terms adaptively and construct a complete SPF. That is to say, the SPF can be written as: where, ω(x, y) is the adaptive weighted coefficient of each pixel, and we call ω as the adaptive weight matrix in this paper. Equation (24) indicates that whether the SPF in our method is dominated by the global information or the local information is determined by ω. As is emphasized above, the intensities in the regions far away from the object vary slowly. As a result, it is unsuitable and unnecessary to use local features to describe these image blocks, i.e., the weight of spf G (x, y) should increase in these regions. On the contrary, the intensities tend to change drastically, and c 1 as well as c 2 computed by Equations (7) and (8) may deviate from the real average intensities inside and outside the contour when the curve is in the regions near the boundary of object. In this case, the weight of spf L (x, y) should increase accordingly. To sum up, the global term plays a dominant role in the smooth regions while the local term becomes extremely significant in the regions with remarkable intensity inhomogeneity.
Similar to the concept of range ratio used in Equation (14), we propose to calculate the range matrix ζ that measures the local roughness as Equation (25).
where, max(·) and min(·) means computing the maximum and minimum in W m (x, y). To some degree, ζ(x, y) can be understood as the local contrast of (x,y). Next, we need to find an appropriate mapping function Φ to establish a mapping relationship between the range matrix ζ and the adaptive weight matrix ω. Two basic requirements that the mapping function needs to meet are reported below: i.
it should be a non-negative and monotonically increasing function; ii. Inspired by the previous work about IR small target enhancement [36], we fortunately find that the standard sigmoid function whose mathematical form is Sg(x) = 1 1+e −x can meet the above two points (see Figure 3a). However, we notice that there is a transition interval in Sg(x) that occupies a large percentage of the whole interval. We argue that the width of this transition interval should be strictly controlled, i.e., ζ 1 corresponding to the central point A and ζ 2 corresponding to the lower inflection point B should be carefully selected (see Figure 3b), because once the relatively smooth area with a small ζ is dominated by the local term of SPF, the evolution of curve will be extremely slow and is easy to drop into local minima. Thus, some modifications in regard to the scale and phase terms are necessary. Suppose that the final mapping function based on the standard sigmoid function is expressed as follows: where, a and b represent the scale and phase parameters, respectively. As is shown in Figure 3b, these two parameters are determined by the two key points A = (ζ 1 , 1 2 ) and B = (ζ 2 , ε 2 ), where ε 2 is a constant close to the lower bound 0. By substituting A and B into Equation (26), where, ζ 1 = λ · max(ζ) and ζ 2 = 0.7 × ζ 1 in this paper; λ ∈ (0, 1) is a constant which is set by experience. By this stage, the adaptive weight matrix can be worked out as    ω(x, y) Φ ζ(x, y) and the final SPF can be constructed as Equation (24). By this stage, the adaptive weight matrix can be worked out as ω(x, y) = Φ(ζ(x, y)) and the final SPF can be constructed as Equation (24).

Implementation
Although we have addressed the problem of constructing the SPF fusing global and local information, there are still three aspects that should be noticed in the implementation of the whole algorithm: (i) the initialization of level set function; (ii) the construction of level set formula; (iii) the evolution of level set function.

Initialization of Level Set Function
In our method, the level set function is initialized with a random positive constant using the following formulation: where, τ is a positive constant which is set randomly; Ω 0 is a subset of the image domain Ω 0 and ∂Ω 0 denotes the boundary of Ω 0 . With regards to ∂Ω 0 , it can be set randomly when the level set function is partly controlled by a global term [20]. That is to say, the initial location, shape and size of ∂Ω 0 will not influence the curve evolution obviously in our method. For convenience, we set ∂Ω 0 as an s 0 × s 0 sized square which is located in the center of the input image for all the experiments.
In Section 4, we will further demonstrate that the initialization of the level set function does not influence the segmentation results by experiments.

Construction of Level Set Formula
In this paper, the edge stopping function of the GAC model is replaced by the proposed hybrid SPF for the purpose of dealing with intensity inhomogeneity and blurred edge. Substituting the edge stop function g(·) in Equation (5) with the proposed SPF, our level set formula can be expressed using the following formulation: In Equation (29), the curvature-based term |∇φ| · div ∇φ |∇φ| is utilized to regularize the level set function φ. Since φ is a signed distance function that satisfies the condition |∇φ| = 1, the regularized term can be expressed by a Laplacian of φ. As pointed out in the theory of scale-space [37], the evolution of a function with its Laplacian equals to a Gaussian kernel filtering the initial condition of the function [7]. Hence, a Gaussian filter K σ r whose standard deviation is σ r can be used to regularize φ in this study. In this case, the regularize term div ∇φ |∇φ| can be removed from Equation (29). Also, the term ∇spf · ∇φ is unnecessary because the proposed method uses the statistical information of regions that has a large capture range and capacity of anti-edge leakage [7,8]. Lastly, the level set formula of our method can be written as:

Evolution of Level Set Function
Since the level set formula ∂φ ∂t can be regarded as the derivative between φ t and φ t−1 , the level set function φ t in the t-th iteration can be updated as: , this step is alternative, i.e., it is necessary if we want to selectively segment the desired object; otherwise, it is unnecessary.
Moreover, as introduced in Section 3.2.2, a Gaussian kernel filtering is employed to regularize the new level set function φ t as: where, K σ r is the Gaussian kernel utilized whose standard deviation is σ r ; the kernel size is 2 × round( 2 × σ r ) + 1, where round(·) is a rounding operator. By iteration, the evolution of φ t can be regarded as convergent when the current φ t meets the following condition: where, · means calculating the Frobenius norm, δ is a convergence threshold. In fact, Equation (33) indicates that the level set function does not change obviously any further. Finally, the zero-level set φ((x, y), t) = 0 is selected as the resulting contour ∂Ω * .

Summary of the Proposed Method
To highlight the complete implementation of our method, the main procedures are summarized in Algorithm 1 below according to all the afore-introduced theories.

Algorithm 1 Level set method using global and local information
Input: an IR image I 1. Initialization: initialize the level set function φ to be a binary function using Equation (28). 2. While not convergence do 3. for each pixel do 4. Calculate the global term spf G of SPF using Equation (6) 5. Calculate the local term spf L of SPF using Equation (23) 6. Calculate the adaptive weight matrix ζ using Equation (25) 7. Combing spf G and spf L to construct SPF using Equation (24) 8. end 9. Construct the level set formulation ∂φ ∂t according to Equation The resulting contour ∂Ω * : φ((x, y), t) = 0.

Experiment and Discussion
In this section, experiments based on real IR test images and relevant discussions are made. In Section 4.1, the value settings of all the parameters utilized in our method are introduced at first; in Section 4.2, comparisons in terms of segmentation results, segmentation precision, running efficiency

Parameter Setting
Here, all the parameters involved, as well as the corresponding meanings and their default values are listed in Table 1.
The value of the lower inflection point of Φ(x) 0.00001 α The balloon force of level set formula 400 τ The constant utilized for initializing the contour 1 s 0 The side length of initial contour ∂Ω 0 7 σ r The standard deviation of the Gaussian filter used for regularizing level set function 2.5 δ The convergence threshold 0.03

Parameter Description
According to the previous analysis [21], if ε 1 is too small, the energy functional would fall into a local minimum meanwhile the final contour location may also drift from the ground truth if ε 1 is too large. Thus, ε 1 is set as 1.5 which has been proved to be an appropriate choice [7]. The local windows used for computing local entropy, standard deviation, filtered difference, range ratio and the range matrix are uniformly fixed as m × m sized. The window size is usually 5 × 5 to 9 × 9, and we set m = 5 in the simulations. Besides, the standard deviation of the Gaussian filter used for embedding local features is set as σ e = 3.0 which is referred to the related analysis made by Li et al. [21]. Through their analysis based on the results for different values of σ e , this parameter has little influence on the segmentation result. η and λ are two unique parameters used in our study. It can be seen from Equations (21) and (22) that η affects the magnitude of the driving force to some extent, and through our repeated experiments, η = 1.5 can achieve a robust and satisfactory performance. λ decides the central point of Φ(x) and also determines the width of the transition interval of Φ(x) indirectly. Strictly speaking, λ varies for different images. In this paper, we set λ = 0.3 as a default value which is selected by experience. More analyses about η and λ are presented in Section 4.1.2. ε 2 is a constant that represents the value of the lower inflection point of Φ(x), and it will not obviously affect the output of Φ(x) as long as it is close to 0 (usually smaller than 0.0001) [38]. The balloon force α is a crucial parameter that directly determines the precision of segmentation and the convergence rate of the level set function. A larger α leads to a fast convergence, but it may also generate leakage of boundary. Vice versa. To keep a balance, we set α = 400 in this study. The constant τ used for contour initialization only needs to meet the requirement that τ > 0, so we set τ = 1 according to Zhang's choice [7]. s 0 denotes the size of initial contour ∂Ω 0 and it is set as s 0 = 7 in our experiments. As a matter of fact, this parameter has little influence on the segmentation results, which is demonstrated in Section 4.2.4. What is more, the standard deviation σ r of the Gaussian filter used for regularizing level set function is also quite important. On the one hand, if the standard deviation is too small, the presented model will be sensitive to noise; on the other hand, false extraction of object and leakage of boundary are easy to happen if it is too large. The empirical value of σ r is in the range [1.5, 3.0], and it is set as r = 2.5 in our model. Lastly, the convergence threshold δ is used as a stop condition of the curve evolution and Equation (33) indicates that the level set function φ t tends to be constant at t-th iteration. Based on this consideration, δ should be a constant close to 0 and we find that the resulting curve has no remarkable changes any more when the rate of change of φ is smaller than 3%.

Sensitivity Analysis
In this section, we focus on discussing the sensitivities of two significant parameters η and λ so as to obtain their suitable settings. As mentioned in Section 4.1.2, the default values of other parameter are selected according to the previous studies, so we do not discuss them again.
Here, F value [39,40] is employed as a measuring metric for sensitivity analysis. The define of F value is given as follows: where, γ is a harmonic coefficient and γ = 1 in this paper; P and R denote the precision rate and recall rate respectively, which are defined as: where, A R represents the area extracted by the level set method; A S represents the object region provided by the ground truth; A C represents the correctly segmented region, i.e., For an intuitive illustration, Figure 4 provides a schematic diagram to show relationship between A R , A S and A C . Obviously, a larger F value indicates a more accuracy segmentation result, and F is in the In this section, we focus on discussing the sensitivities of two significant parameters η and λ so as to obtain their suitable settings. As mentioned in Section 4.1.2, the default values of other parameter are selected according to the previous studies, so we do not discuss them again.
Here, F value [39,40] is employed as a measuring metric for sensitivity analysis. The define of F value is given as follows: (34) where, γ is a harmonic coefficient and  γ 1 in this paper; P and R denote the precision rate and recall rate respectively, which are defined as: where, R A represents the area extracted by the level set method; S A represents the object region provided by the ground truth; C A represents the correctly segmented region, i.e.,  For an intuitive illustration, Figure 4 provides a schematic diagram to show relationship between R A , S A and C A . Obviously, a larger F value indicates a more accuracy segmentation result, and F is in the range [0,1] . We consider that an appropriate parameter setting of η or λ need to get higher F values in most of the practical IR scenes. Here, 6 typical IR images shown in Figure 5 are utilized to make the sensitivity analysis, in which we continuously change the values of η or λ while other parameters are fixed with the default values listed in Table 1. We consider that an appropriate parameter setting of η or λ need to get higher F values in most of the practical IR scenes. Here, 6 typical IR images shown in Figure 5 are utilized to make the sensitivity analysis, in which we continuously change the values of η or λ while other parameters are fixed with the default values listed in Table 1. A statistic of F values with different η is implemented. η changes from 0.1 to 5 (the step length is chosen as 0.1) and the presented algorithm is conducted to produce a resulting segmentation image for each η , after which the corresponding F value is calculated. For λ , the same procedure is adopted to obtain the statistic of F values. In our simulation, λ varies from 0.01 to 1 and the step length is set as 0.02. By this means, we can get 50 groups of data for each parameter and the following two cartograms drawn in Figure 6 reveal the sensitivities of η and λ respectively.
As for η , it does influence the segmentation accuracy significantly since the η -sensitivity curves shown in Figure 6a undergo striking fluctuations in most scenes. However, we find that the F values can keep in high levels when η is in the range (0.75, 2) and the statistical results are relatively stable. To this end, we adopt  η 1.5 for all the experiments.
It can be seen from Figure 6b that the λ -sensitivity curves increase rapidly before  λ 0.1 and level off after  λ 0.4 . As far as we are concerned, the level set function is dominated by the local term when  λ 0.1 while it is dominated by the global term when  λ 0.4 . To keep a balance, we set  λ 0.3 in this paper.
We would like to state that the specific values of the parameters involved in this study may still need to be adjusted in different cases so as to get more accuracy segmentation results (especially η , λ and r σ ), and the values provided in Section 4.1 are just default values. A statistic of F values with different η is implemented. η changes from 0.1 to 5 (the step length is chosen as 0.1) and the presented algorithm is conducted to produce a resulting segmentation image for each η, after which the corresponding F value is calculated. For λ, the same procedure is adopted to obtain the statistic of F values. In our simulation, λ varies from 0.01 to 1 and the step length is set as 0.02. By this means, we can get 50 groups of data for each parameter and the following two cartograms drawn in Figure 6 reveal the sensitivities of η and λ respectively.
As for η, it does influence the segmentation accuracy significantly since the η-sensitivity curves shown in Figure 6a undergo striking fluctuations in most scenes. However, we find that the F values can keep in high levels when η is in the range (0.75, 2) and the statistical results are relatively stable. To this end, we adopt η = 1.5 for all the experiments.
It can be seen from Figure 6b that the λ-sensitivity curves increase rapidly before λ ≈ 0.1 and level off after λ ≈ 0.4. As far as we are concerned, the level set function is dominated by the local term when λ < 0.1 while it is dominated by the global term when λ > 0.4. To keep a balance, we set λ = 0.3 in this paper.
We would like to state that the specific values of the parameters involved in this study may still need to be adjusted in different cases so as to get more accuracy segmentation results (especially η, λ and σ r ), and the values provided in Section 4.1 are just default values.

Comparative Experiment
In this part, 6 state-of-the-arts level set methods: GAC [9], CV [15], SBGFRLS [20], LBF [21], ILFE [22], Cao's model [8] and an interactive segmentation method: MSRM [28] that all have been discussed in Section 1 are selected to make both qualitative and quantitative comparisons with the method proposed in this paper. In Section 4.2.1, the level-set-based algorithms are tested with a uniform initial contour while MSRM is conducted with the object and background markers being added artificially. Then, two metrics indicating the segmentation precision are employed to evaluate the segmentation results in Section 4.

Comparative Experiment
In this part, 6 state-of-the-arts level set methods: GAC [9], CV [15], SBGFRLS [20], LBF [21], ILFE [22], Cao's model [8] and an interactive segmentation method: MSRM [28] that all have been discussed in Section 1 are selected to make both qualitative and quantitative comparisons with the method proposed in this paper. In Section 4.2.1, the level-set-based algorithms are tested with a uniform initial contour while MSRM is conducted with the object and background markers being added artificially. Then, two metrics indicating the segmentation precision are employed to evaluate the segmentation results in Section 4.2.2. Next, comparisons in terms of running time and iterations are made in Section 4.2.3. Finally, an additional experiment to test the influence of contour initialization is implemented in Section 4.2.4.

Segmentation Result
In this section, apart from 30 IR images (part of them can be downloaded from Ref. [41]), 5 natural images [42] and 5 remote sensing images [43] are also adopted to be the database in order to verify that the proposed algorithm is general. Figures 7-9 report the segmentation results of IR images, natural images and remote sensing images respectively. The IR images utilized can be classified into two categories: single-object-images

Segmentation Result
In this section, apart from 30 IR images (part of them can be downloaded from Ref. [41]), 5 natural images [42] and 5 remote sensing images [43] are also adopted to be the database in order to verify that the proposed algorithm is general. Figures 7-9 report the segmentation results of IR images, natural images and remote sensing images respectively. The IR images utilized can be classified into two categories: single-object-images and multiple-object-images, and they all contain blurred object boundaries and a degree of local intensity inhomogeneity. In particular, test images, like IR.10 and IR.11 in Figure 7a, are contaminated by noises and the disturbances of false alarm (e.g., edges of the cart and laboratory furniture) are quite obvious. The target/background contrasts of the natural images and remote sensing images are higher and the details, especially the edges and textures, are richer.

Comparison of Segmentation Accuracy
To further prove the superiority of our method, we would like to compare the precision of segmentation results through a quantitative way. First of all, the segmentation results shown in Figures 7-9 are transferred to binary maps according to the corresponding final level set functions. Figures 10-12 display the binary maps as well as the ground truth, respectively.
As is introduced in Equation (34), F value is extensively employed to measure the segmentation accuracy in the field of object detection. Here, Table 2 lists the corresponding F values of all the binary segmentation maps in Figures 10-12 and the average F value of each algorithm is given in the last row. The top two F values in each group are marked with bold values.
Apart from F values that indicate the overlapping rate, another metric called boundary precision (BP) that evaluates ability of precise boundary locating [27] is also applied to make quantitative comparisons in this section. Below, the definition of BP is given as:

Comparison of Segmentation Accuracy
To further prove the superiority of our method, we would like to compare the precision of segmentation results through a quantitative way. First of all, the segmentation results shown in Figures 7-9 are transferred to binary maps according to the corresponding final level set functions. Figures 10-12 display the binary maps as well as the ground truth, respectively.
As is introduced in Equation (34), F value is extensively employed to measure the segmentation accuracy in the field of object detection. Here, Table 2 lists the corresponding F values of all the binary segmentation maps in Figures 10-12 and the average F value of each algorithm is given in the last row. The top two F values in each group are marked with bold values.
Apart from F values that indicate the overlapping rate, another metric called boundary precision (BP) that evaluates ability of precise boundary locating [27] is also applied to make quantitative comparisons in this section. Below, the definition of BP is given as: For a fair comparison, we use a uniform initial contour to define the binary level set function for each test image, which can be seen from the blue rectangle in the first columns of Figures 7-9. Also, the initial object and background markers of MSRM, which are marked in green and blue, are shown in the second columns. In addition, the 3-8 columns correspond to the segmentation results of GAC, CV, SBGFRLS, LBF, ILFE, Cao's model, MSRM and our method, respectively. Please note that the final evolving curves are marked in red.
As is introduced above, GAC model is highly dependent on the initialization of contour, and it can only get complete segmentation results when the initial contour contains the whole object of interest. In this comparative experiment, the initial contour is set in the image center and most of the objects are partly covered. As a result, the contours evolved by GAC model all converge to local minima, which leads to the seriously incomplete segmentations. More seriously, the curve totally disappears if the initial contour is far away from the object (see IR. (20,23) in Figure 7c and RS. 3 in Figure 9c).
Chen-Vese (CV) model draws upon the statistical intensity information to evolve the curve and is able to achieve satisfactory performances in binary phase segmentation. Basically speaking, all the objects are extracted by CV model, but boundary leakage occurs in those inhomogeneous areas due to the fact that this model is established with the assumption that image intensities in each region always maintain constant. For example, as can be seen from IR. (1,7,11,14) in Figure 7d, some striking holes on the clothes are mistakenly generated in the final segmentation results. Moreover, CV model is quite sensitive to the disturbance of noise, which can be demonstrated by the fake targets on the cart and ground in IR. (10,11) in Figure 7d. Besides, we find that the convergence of CV model is quite unstable, e.g., IR. (18,20,26,29) in Figure 7d, which we infer may be associated with the contour initialization.
In regard to SBGFRLS model, it also does well with IR images with homogeneous intensities, e.g., IR. (18,20,26,29) in Figure 7e and natural images, but its performance turns to be bad when dealing with inhomogeneous cases. From our experiments, we can clearly find that incompleteness of segmentation always occurs near the borders between the object and the ground, the cloth parts of humans in IR images (see IR. (3,(5)(6)(7)(8)14) in Figure 7e) and the lands in remote sensing images (see RS. (1,2,4) in Figure 9e), because the intensity distributions of these regions are quite uneven, which causes severe interferences for the expansion or shrink of the contour.
Additionally, it is obvious that LBF and ILFE get a large number of false targets when coping with images with rich texture information. On the one hand, many edges belonging to the background are recognized as the real boundaries in IR. (10,11) in Figures 7f and 7g; on the other hand, the real object is divided into too many blocks, which absolutely cannot satisfy the visual requirement of image segmentation. It should be noticed that the initial contour has an outstanding influence on the results of ILFE, since there is a remarkable false target with a relatively large area around the position of the initial contour in IR. (20,21,26,27) in Figure 7g.
Cao's model takes full advantages of global and local intensity information, so it can successfully address the weak edges and local intensity inhomogeneities. As a whole, Cao's method is able to extract the objects in all the test images, although there are still some incompleteness existing in the final contours, especially in the feet and clothes parts, which can be seen from IR. (3, 5-8, 13, 14) in Figure 7h. We notice that its performances in remote sensing images are not satisfactory enough and there are several false segmentations existing in RS. (1,2) in Figure 9h, while the segmentation results are almost perfect in natural images. From our point of view, this is mainly because that the local term of Cao' model that only takes intensity information into account is still not robust to complex texture patterns.
Maximal similarity-based region merging (MSRM) method is an interactive image segmentation that guides the region merging with the aid of two kinds of markers. Although the fore-and-background markers are correctly provided, we notice that the object boundaries obtained still deviate from the ground truth in some cases, especially in IR images, such as IR. (1-3, 10-11, 18, 27) in Figure 7i. Intuitively, those inaccurate boundaries are lack of smoothness and many background image patches are mistakenly classified into the foreground class. In our opinion, there are two main reasons that lead to the segmentation errors: (i) the rough segmentation using mean shift or super-pixel is always inaccurate in gray images; (ii) IR images are lack of color and texture information, so the similarities between each region are not convincing.
Fortunately, our method obtains the most complete and precise segmentation results among the 8 algorithms owing to its strategy of integrating the average intensity-based global information and multi-feature-based local information. As can be seen from IR. (1)(2)(3)28) in Figure 7j, the feet are perfectly picked out and the problem of boundary leakage does not occur in our method. Also, the interferences of noise and edge in IR. (10,11) in Figure 7j do not take any effects on the final results. Thus, we argue that the superiority of the proposed method is proved by this comparative experiment. On the other hand, our method is somewhat weak at dealing with tiny objects, e.g., the ox horn in N. 2 in Figure 8j and the small circle-shape land in RS. 5 in Figure 9j are lost, and the fluorescent lamp in IR. 18 is mistakenly recognized as the object.
Last but not least, all these methods, except GAC model, is capable of partitioning multiple objects in a single image simultaneously, which can be verified by the results shown in IR. (25)(26)(27)(28)(29)(30)

Comparison of Segmentation Accuracy
To further prove the superiority of our method, we would like to compare the precision of segmentation results through a quantitative way. First of all, the segmentation results shown in Figures 7-9 are transferred to binary maps according to the corresponding final level set functions. Figures 10-12 display the binary maps as well as the ground truth, respectively.        As is introduced in Equation (34), F value is extensively employed to measure the segmentation accuracy in the field of object detection. Here, Table 2 lists the corresponding F values of all the binary segmentation maps in Figures 10-12 and the average F value of each algorithm is given in the last row. The top two F values in each group are marked with bold values.
Apart from F values that indicate the overlapping rate, another metric called boundary precision (BP) that evaluates ability of precise boundary locating [27] is also applied to make quantitative comparisons in this section. Below, the definition of BP is given as: where, u and v are the pixels on the result boundary and ground-truth boundary; #u and #v denote the number of pixels on the corresponding boundaries; for each pixel u, min v d(u, v) means the minimal distance between u and the pixels on the ground-truth boundary; similarly, for each pixel v, min u d(v, u) means the minimal distance between v and the pixels on the result boundary. Obviously, a higher BP refers to a more accuracy boundary locating.  [44] and the ground-truth boundaries. Table 3 reports the BP value of each test group clearly.        As is revealed from Table 2, our method achieves the largest F values on average (96.04%) while GAC model always gets the worst F values in all the test images. GAC model is an edge-based level set method, so its segmentation results are poor if the initial contour cannot totally contain the real object. In our case, the initial contour is set as a rectangle located in the image center which partly covers the object. Hence, the segmentation precision is doomed to be very low. LBF model and ILFE model excessively concentrate on the local intensity feature, but they ignore the global intensity instead. Although this strategy can extract the details in a cautious way, it may result in too many segmented blocks in the object region and cause large quantities of false targets in the background. That is the main reason their F values basically maintain around 50-60%. SBGFRLS model is able to partition the objects well, but it cannot do in addressing the details perfectly. When the evolving curve approaches the boundary, the further evolution becomes inaccuracy due to the fact that the regions near the boundary are always inhomogeneous, which does not accord with the assumption of this model. That is to say, the average intensities inside and outside the contour estimated by Equations (7) and (8) are distorted. As a result, it is difficult for SBGFRLS to achieve F values greater than 95%. CV model's robustness to local intensity inhomogeneities is stronger than SBGFRLS model, but the it cannot avoid causing false segmentations inside the object. This drawback is reflected by those minor holes on the human bodies. Also, the disturbances of noise decrease the F values of CV model to some extent. The F values of Cao's model are around 92% which are approximately on the same level of SBGFRLS model, but the average value is decreased by its weak performance in a small part of the test images, such as IR. (18,26) in Figure 10. To further analyze, we consider that it may be the inaccuracy of its weight function used to integrate the global and local terms that results in the striking boundary leakage, indicating that its robustness needs to be improved further. With the help of makers, MSRM achieves relatively high F values, even higher than our method in certain test images. We think it is quite reasonable because the additional makers do provide much useful information for MSRM to distinguish between objects and backgrounds. The average F value of our method is about 4.5% higher than the second place (SBGFRLS), and our method gets the top largest F values in almost all the database. Through observing the binary segmentation results, we find that our method possesses an outstanding ability to overcome the problem of boundary leakage and it is much less sensitive to the noise, which guarantees its high segmentation precision. However, we still need to point out that the segmentation errors of the proposed method are mainly derived from the miss of groove parts (e.g., the inner thigh). The overall distribution of BP value depicted in Table 3 is almost the same as F value, i.e., our proposed level set method generally outperforms other comparing algorithms in terms of boundary precision (0.8575 on average) while the BPs of GAC model are far smaller than others (only 0.0219 on average). This phenomenon matches the statistical result of F values quite well, proving that the conclusions derived from these two metrics are both convincing. It is interesting that MSRM often gets the top two best BPs among the IR image database, but SBGFRLS models takes the second place on average. This phenomenon indicates that SBGFRLS model is more robust than MSRM method and it can keep in a relatively high boundary locating level as well. CV, LBF and ILFE absolutely do not have strong boundary locating abilities according to Table 3. For CV model, the weak and unstable convergence decreases it overall BPs to a great extent while the large quantities of false boundaries caused by LBF and ILFE models weaken their BPs sharply. The performance of Cao's model is moderate among all the 8 algorithms. There are no severe segmentation errors, but the BPs cannot be improved further since the leakage of boundary is still not solved.

Comparison of Running Time
As is commonly known, running speed is also a significant factor in evaluating an algorithm. In this section, we test the number of iterations (except MSRM) and the execution time respectively, which are listed in Tables 4 and 5.
With respect to the number of iterations, it can be clearly observed from Table 4 that our method has the fewest iterations in most of the test images (fewer than 50 times), which demonstrates its best convergence among all the other methods. We consider that its outstanding convergence is mainly owing to the fact that the average intensity-based global term accelerates the evolving speed meanwhile the local multi-feature-based term avoids the energy functional falls into local minima. SBGFRLS, LBF and Cao's models also achieve relatively satisfactory iteration times which maintain around 50 times on the whole. However, we notice that the convergences of SBGFRLS model and Cao's method become poor when dealing with IR. 10 that contains several edge disturbances. This phenomenon indicates that these two models would still drop into local minima when facing the complex scenes. When compared with the afore-discussed methods, CV and ILFE models turn to be relatively weak in iterations. Since CV model does not contain any local intensity information, the evolution of curve may slow down in inhomogeneous regions. For ILFE model, the Laplacian fitting term may have an adverse effect on the convergence also. Lastly, GAC model is the worst one whose iterations are more than 10 times than the proposed method. As has been discussed above, its curve evolution is easy to stagnate around the initial contour if the initialization itself is set inappropriately.
When considering the whole execution time, it can be seen from Table 5 that those level set methods that only involve global intensity information, e.g., GAC, CV and ILFE, get relatively fast running speeds. Among them, although GAC and CV suffer from large numbers of iterations, their whole running time is still shorter than SBGFRLS and Cao's model owing to its high efficiency of the single iteration. The total running time of LBF is also fine, but it is because of its few iterations, rather than the single iteration efficiency. MSRM gets the fastest execution speed among all the 8 algorithms, because this method does not have the problem of convergence and the user's interaction time is not included in our test. Our method achieves the moderate performance in this comparison, meaning that the running efficiency is not the best. As far as we are concerned, it is the procedures of calculating the multi-features and constructing the driving forces inside and outside the contour that increase the running time. Thus, it is of great necessity for us to adopt the parallel processors, e.g., FPGA and GPU, and optimize the codes further. will result in different evolving curves. As is shown in contour 2, 3 of Figure 17, the head part of 'man' image will be segmented into fewer blocks by LBF and a striking false contour around the head part will be generated by ILFE when the initial contour is circle. In addition, as we can see in contour 2, 3 of Figure 16 and contour 2, 4 of Figure17, the curves of CV model stop evolving if it is initialized in a region suffering from intensity inhomogeneity seriously. Accordingly, its Fs and BPs both occupy wide ranges, varying from 0.34-0.96 and 0.06-0.97 respectively.   keep in a low level in terms of the two metrics, we still notice that different shapes of initial curve will result in different evolving curves. As is shown in contour 2, 3 of Figure 17, the head part of 'man' image will be segmented into fewer blocks by LBF and a striking false contour around the head part will be generated by ILFE when the initial contour is circle. In addition, as we can see in contour 2, 3 of Figure 16 and contour 2, 4 of Figure17, the curves of CV model stop evolving if it is initialized in a region suffering from intensity inhomogeneity seriously. Accordingly, its Fs and BPs both occupy wide ranges, varying from 0.34-0.96 and 0.06-0.97 respectively.               Here, three key points: position, size and shape, are considered by us. For both Figures 16 and 17, the initial positions of contours can be classified as inside the object, outside the object, partly inside the object; the initial sizes of contours can be classified as 7 × 7 and 15 × 15; the initial shapes can be classified as square and circle.
Intuitively speaking, the initialization of contour has little effect on SBGFRLS, Cao's model and the proposed method according to our experiment. As is reported in Tables 6 and 7, the Fs and BPs of our method and Cao's model are still superior to other comparing algorithms while those of SBGFRLS are moderate. In general, the utilized metrics of the three methods do not undergo remarkable fluctuations. By contrast, the segmentation results of GAC model are seriously affected by the initial contours. On the one hand, it is highly possible for the contour to converge towards the position where its initial condition locates; besides, it is even possible to be absorbed completely if the initial contour has no overlaps with the object. Although the performances of LBF and ILFE still keep in a low level in terms of the two metrics, we still notice that different shapes of initial curve will result in different evolving curves. As is shown in contour 2, 3 of Figure 17, the head part of 'man' image will be segmented into fewer blocks by LBF and a striking false contour around the head part will be generated by ILFE when the initial contour is circle. In addition, as we can see in contour 2, 3 of Figure 16 and contour 2, 4 of Figure 17, the curves of CV model stop evolving if it is initialized in a region suffering from intensity inhomogeneity seriously. Accordingly, its Fs and BPs both occupy wide ranges, varying from 0.34-0.96 and 0.06-0.97 respectively.

Methdology
In this paper, a new level set method fusing average-intensity-based global information and multi-feature-based local information is proposed to segment IR images. Different from the conventional level set methods that only consider single intensity feature, the presented model constructs a hybrid SPF made up of a global term and a local term. The global term is calculated based on the average intensities inside and outside the contour while the local term is represented by the driving force computed by four weighted local features. To keep a balance, the two terms are combined via an adaptive weight matrix. By substituting the edge stopping function in GAC model with the proposed SPF, the level set formula is constructed and the level set function is re-initialized by a Gaussian filtering. By iteration, the final contour of object can be obtained and the object is thus extracted. Both qualitative and quantitative experiments verify that our method outperforms other state-of-the-art level set methods in terms of segmentation accuracy, convergence and robustness to contour initialization.

Applicability and Limitation to Remote Sensing
IR imaging belongs to one of the significant branches of remote sensing and IR image segmentation is extensively applied in many remote sensing applications, e.g., intelligent urban surveillance, unmanned aerial vehicles, pedestrian detection, etc. Although the proposed level set method is aimed to cope with the blurred boundary, low contrast and intensity inhomogeneity in IR image, it can also be directly applied to process remote sensing images when they are converted to single channel images, which has been demonstrated in Section 4.2.1. However, the color and spectrum information of remote sensing images are not fully used when they are processed using our method, indicating that the inherent advantages are ignored. On the other hand, we find that remote sensing images usually contain rich texture information, but the proposed algorithm is somewhat weak at coping with the tiny objects, especially the small circular holes existing in the land. We infer it is the Gaussian filtering used for regularizing the level set function that removes these details.

Future Work
In the future work, we plan to further optimize the codes of our algorithm and try to transplant the Matlab codes to parallel hardware, e.g., FPGA and GPU, so that the whole computational time can be reduced greatly and the real-time running can be expected. Besides, we notice that the target of interest in IR image can be seen as a salient object, so its visual saliency may be exploited for further investigation.