Combining Multiple Algorithms for Road Network Tracking from Multiple Source Remotely Sensed Imagery: a Practical System and Performance Evaluation

In light of the increasing availability of commercial high-resolution imaging sensors, automatic interpretation tools are needed to extract road features. Currently, many approaches for road extraction are available, but it is acknowledged that there is no single method that would be successful in extracting all types of roads from any remotely sensed imagery. In this paper, a novel classification of roads is proposed, based on both the roads' geometrical, radiometric properties and the characteristics of the sensors. Subsequently, a general road tracking framework is proposed, and one or more suitable road trackers are designed or combined for each type of roads. Extensive experiments are performed to extract roads from aerial/satellite imagery, and the results show that a combination strategy can automatically extract more than 60% of the total roads from very high resolution imagery such as QuickBird and DMC images, with a time-saving of approximately 20%, and acceptable spatial accuracy. It is proven that a combination of multiple algorithms is more reliable, more efficient and more robust for extracting road networks from multiple-source remotely sensed imagery than the individual algorithms.


Introduction
The increasing availability of commercial high-resolution satellite imaging sensors such as SPOT5, IKONOS, QuickBird and TerraSAR, requires the availability of suitable automatic interpretation tools to extract and identify cartographic features, especially in rapidly changing urban areas. Roads are one of the most important linear cartographic features. Particularly, extraction of road networks from remotely sensed imagery is not only meaningful for cartography and topography [1], but also significant for various applications of geodata such as automatically aligning two spatial datasets [2] or automated vehicle navigation [3]. Therefore, research on the automatic extraction of road networks from remotely sensed imagery has been a topical research theme in the various fields of photogrammetry, remote sensing, geographic information systems, pattern recognition, and computer vision. As a result, many strategies, methodologies and algorithms for road network extraction have been presented since the 1970s, which have achieved varying degrees of success [4]. According to the level of automation, the techniques for road extraction with the aid of computer vision can be coarsely classified into automatic and semi-automatic approaches.
The automatic methods attempt to seek an analysis and interpretation of the image similar to that of a human operator. Nevatia and Babu [5] utilized an edge detection method to identify ribbon roads with lateral and parallel anti-edges. Radon transform was employed to locate the roadsides and to measure the width of a road [6]. Due to the variation of the complexity of image contents, the above low level edge detection methods are insufficient to extract the road features with high completeness, correctness, quality and accuracy. Therefore, more high level techniques have been developed. For example, angular texture signatures can make use of the characteristics of road's texture, and this is utilized to find candidate road centerline points [7] or to discriminate road surfaces from the parking lots [8]. Mathematical morphology is not only useful to dilate and enhance the road surfaces and roadsides [9], but also helpful to extract the skeletons of ribbon roads [10]. Segmentation and classification methods are popular due to the fact that they could utilize road's radiometric, geometrical, topological, and elevation characteristics to help finding road networks [11,12]. Heipke et al. [13] utilized a multi-scale strategy to extract global road network structures initially at a low resolution and detailed substructures later at a high resolution. Multi-view approaches not only can reconstruct 3D models, but also can utilize the multi-cues from multiple source images [14,15]. Rulebased approaches use reasoning methods to deal with the problems of segment alignment and fragmentation, as well as enable bottoms-up processing to link the fragmented primitives into a road network [16]. Statistical inference methods were also used to model the road linking process as a geometric-stochastic model [17], an active testing model [18], a MRF-based model [19], or a Gibbs point process [20]. Another category of automatic approaches is the use of existing information or knowledge to guide road extraction [21]. Currently, the tendency is that more and more methodologies are based upon hybrid strategies. For example, profile analysis, rule-based linking and model-based verification are combined together to detect, trace and link the road segments to form a road network [22]; Hu et al. [2] combined a spoke wheel operator, used to detect road surfaces, and a toe-finding algorithm, utilized to determine the road direction, to trace roads; multi-resolution and object-oriented fuzzy analysis is integrated to extract cartographic features [23]; and a novel combination strategy was adopted by Peng et al. [24] who incorporated an outdated GIS digital map, multi-scale analysis, a phase field model and a higher order active contour to extract roads from very high resolution (VHR) images. Despite the fact that much work on automatic approaches for road extraction has taken place, the desired high level of automation could not be achieved yet [25]. The main problem of a fully automatic approach is that it needs some strict hypothesis of road characteristics, but road properties vary considerably with ground sampling distances (GSD), road types, and densities of surrounding objects, light conditions etc. Therefore, the quality of automatic extraction is usually insufficient for practical applications.
On the other hand, semi-automatic methodologies are considered to be a good compromise between the fast computing speed of a computer and the efficient interpretation skills of a human operator [1], and quite a number of promising approaches for semi-automatic road extraction have been proposed so far. Optimal search methods, which are often realized by dynamic programming [26] or snakes [27], are frequently applied to find or determine an optimal trajectory between manually selected seed points. In these models, geometric and radiometric characteristics of roads are integrated by a cost function or an 'energy' function. Then the road extraction is equivalent to seeking the global energy minimum. However, it is hard to define the reasonable 'energy' function for each image. Angular texture signature (ATS) has also been improved to semi-automatically track road axes. Lin et al. [28] and Shen et al. [29] proposed the mean value and the entropy besides existing variance to measure the texture. As a result, ATS is capable of tracing the bright ribbon roads from SPOT imagery or tracing the dark ribbon roads from VHR COSMO SAR imagery. Minimum cost to follow a path was proposed by Shukla et al. [30]. Essentially, the minimum cost is similar to the ATS in the sense that it takes variance as a measure of texture. Another practical methodology is template matching. McKeown and Denlinger [31] introduced a semi-automatic road tracker based on profile correlations. This road tracker starts from some road seed points. The profile matching technique compares a reference profile with the road profile at a pixel predicted to be on the road. The differences between the two profiles are measured by identifying two geometric parameters (shift and width) and two radiometric parameters (brightness and contrast). These parameters are estimated by minimizing the squared sum of the gray value differences between the profiles. Actually, the subsequent approaches essentially follow the same scheme as the above method. Vosselman and Knecht [1] improved the profile matching with a Kalman filter, Baumgartner et al. [25] also developed a human-computer interactive prototype system based on the above method. Zhou et al. [32] used two profiles, one orthogonal to the road direction and the other parallel to the road direction, to enhance the robustness of the tracker and applied extended Kalman filter and particle filter to solve profile matching problems for road tracking, and a prototype system for semi-automatic road extraction is also developed. Kim et al. [33] utilized a rectangular template instead of profile to track ribbon roads by least squares template matching from VHR satellite images. Zhao et al. [34] used rectangular template matching on the classified imagery and proposed another prototype. However, there are no reports on their applicability on VHR images finer than 0.2 m/pixel, and most of the road trackers mentioned above don't operate well when they encounter irregular geometric deformations and radiometric changes due to the appearance of road junctions, material changes, occlusion from cars, shadows, lane markings etc. As a result, consecutive failure of the road tracker not only loses efficiency, but also increases the work required of the human operators.
After reviewing the existing work on road extraction, it was realized that ATS, profile matching and rectangular template matching are practical to extract road networks. However, it is acknowledged that there is no single method which is successful in extracting all types of roads from any remotely sensed imagery. Moreover, it is believed that a number of techniques developed for different classes of roads and for different kinds of imagery will lead to a many-branched solutions for road extraction which will be effective for a wide range of road types. In this paper, roads are classified into the ones with lane markings and the ones without lane markings on VHR remotely sensed imagery whose GSDs range from 2.5 per pixel to 0.2 m per pixel. Furthermore, two road trackers are proposed to extract the roads. Particularly, the profiles and rectangular templates are combined together to form an interlaced template, while the ATS [28,29] is further improved by a parallelepiped classifier, named parallelepiped angular texture signature (PATS). Correspondingly, a general road tracking framework is proposed. Then least squares interlaced template matching is applied to extract ribbon road networks with salient road markings, while the profile matching, rectangular template matching, and PATS are combined to extract ribbon roads without salient lane markings. In general, a practical system, combining multiple road trackers, is designed to extract the roads from VHR images in this paper.
The remaining sections of this paper are organized as follows. In Section 2, a general scheme of road tracking is introduced, and the interlaced template matching and the PATS are proposed. In Section 3, experiments are described and the performances of the road trackers are evaluated and discussed. In Section 4, some conclusions are reached.

The general framework
In the semi-automatic plotting system, a human operator is required in the annotation process where computer algorithms are utilized to assist the plotter performing measurement tasks. From the user's point of view, the procedure is as follows: the operator first identifies a short segment of a road which serves as initialization for an automatic tracking, and then the tracker algorithms are launched and automatically trace the road axis for as long as possible. Whenever the internal evaluation of the algorithms indicates that the tracker might have lost the road centerline, it needs intervention of the user. Then the operator has to confirm the tracker or he/she must edit the extracted road and put the tracker back the road again. Particularly, the semi-automatic tracking is divided into the following steps: 1) Initialization by three seed points. Similar to McKeown's strategy [31], in this paper the initialization is also accomplished by manual selected seed points. However, we make an improvement in that a three consecutive mouse clicks strategy is adopted to obtain the starting location, direction, width of the road, and the step size as well. This three seeds method is feasible for most ribbon roads, and it is accomplished as follows [see Figure 1(a)]: the human operator enters a road segment with two consecutive mouse clicks on A' and B with the axis joining the points defining one road sideline A'B, which indicates road direction arctangent(A'B), then the following third click on C, on the other roadside, defines the width w of the road. w is equal to the distance between the point C and the line A'B. As a result, the above three points can also derive a rectangle A'B'B"A" with width w and length sign L . And the direction of A'B' is equal to arctangent(A'B) while C is located on the side B"A". Then a start point A of the road is derived from the middle point of A' A". In this paper, sign L is equal to a coefficient, sign C , multiplying by the road width w . . This toe-finding approach is divided into two separate steps. First, the dominant peaks are found. Second, valleys between sequential peaks are found. In the algorithm, there are three constant coefficients, 1  , 2  , and 3  , and they are selected based on our experiments. We set 1  = 0.25, 2  = 0.8, and 3  = w /8, i.e., if the distance of two peaks is less than w /8, then merge the two peaks. As long as any lane marking found, generate a reference interlaced template described in Section 2.2, and then interlaced template matching is triggered. If there are no markings, the standard deviation of intensity rectangle A'B'B"A" is calculated. In this circumstance: 2) Acquire the next road axis point. If the interlaced template matching, profile matching or template matching is launched, the system then predicts next most possible position of the road axis according to the following equation: are the coordinates of next template point; When the reference template is convolved within part of the image by Equation (1), a number of target templates are generated. The one which has the minimal squared sum of gray value differences between reference template and target template is taken as the optimal response. With this optimal response, the next road axis is located.
Alternatively, if PATS is triggered, the operation is performed as follows: revolve the current road axis point p and calculate the PATS values; take the direction of the significant maximum which has a minimal inclination with the current road direction current  as the direction of next road axis point; and get the next road axis point by Equation (1) where rotating  =0°, S =0.
3) Validate the above optimal point. Once the above obtained point is added into the road trajectory, check whether any stopping criterion is fulfilled as follows:  the change of the directions of two adjacent road segments is larger than predefined threshold T ;  approaching an extracted road or border of the image;  the minimal squared sum of gray value differences between the reference template and the target template surpass 1 T for the interlaced template matching, profile matching or template matching;  compactness of PATS polygon [8] is larger than 2 T .
If any of these conditions is encountered, exit the tracking procedure and go to Step 4). Otherwise, go to Step 2) again. 4) Stop the trace. If no rule can be made to continue the tracking procedure, the system will stop tracking, report the reason, and offer an appropriate choice of user interaction. The user can then modify the traced path with the aid of common GIS-functionalities, manually digitize complex roads, update the reference template (occurrence of change of the number of lanes, or significant change of spectral characteristics due to different ages, construction materials, illumination angles, etc.), or restart the tracking process from the next specified location.
In this paper, for each test image, the four state-of-the-art road algorithms, such as the profile matching, rectangular template matching, interlaced template matching, PATS, and their combination strategy were employed, respectively, to trace the road network to compare their performances. In the tracking procedure, for each type of road, if a road tracker frequently fails, it suggests the road tracker is not suitable for tracing this type of road, and then another type of road is tried. Moreover, manual digitization was adopted to create reference data to verify the performances of the road trackers. Note that in the process of full manual annotation, the manual clicks should be as accurate as possible, i.e., the selected points should be on the road axes. Furthermore, the digitized roads should be smooth, i.e., abrupt changes in directions should be avoided and no zigzags should occur.

The interlaced template matching
In fact, the salient lane markings are usually less frequently disturbed by the occlusions of cars or the shadows of colonnade than other parts of road surfaces [35] on VHR imagery. This meaningful phenomenon is neglected by most of existing research about semi-automatic road tracking. In this case, if templates of lane markings rather than the template of a segment of whole rectangular road surface are utilized in template matching, it may be more effective and robust. Considering that the above salient road markings are often discontinuous, it is a tradeoff between a rectangular template of road surface and some templates of salient lane markings. Consequently, we can take a novel template, i.e., an interlaced template which is composed of two parts: some cross-section profiles (i.e., each is a typical intensity profile perpendicular and symmetrical to the road axis) and some rectangular templates of road markings (i.e., some intensity rectangles whose width are as wide as lane markings).
Normally, the intensity of pure road surface except the disturbing features tends to be constant, and so are the intensities of road markings. This characteristic can be formulated as follows: Equation (3) guarantees that the number of pixels in the profiles is equal to the one of pixels of lane markings, and the equality has significant effect on determining the threshold 1 T . In this paper, 1 T is equal to variance of the profiles in the reference interlaced template plus variance of the markings in the reference interlaced template. In this sense, 1 T is adaptive in our paper.

PATS
A texture measure is described in [7]. Practically, it is hard to populate ATS if it still takes variance as a measure of texture in complex scene where even intensities of one feature differ greatly. Fortunately, it could still function on a classified thematic image. As mentioned above, the initialization information derives a rectangle A'B'B"A". And then for each band, calculate the mean M and standard deviation  of the gray values of pixels in rectangle A'B'B"A" respectively, then perform parallelepiped classification using ± as limits on the raw image. Subsequently, set the pixels of road subclass as 1, meanwhile set the pixels of any other subclass as 0, and then the mean of the rotating template is feasible too [see Figure 2 Figure 2(a) displays an ATS with  =5°, Figure   1(c) shows the improved PATS values related to Figure 2(a). The direction of the significant maximum which has a minimal inclination with road direction  is taken as the direction of next road axis point.
To find the relationship between the shape of the PATS polygon and corresponding pixel types, we plot the PATS values around the pixel under consideration with corresponding direction and link the last point to the first point [6]. The resulting polygon is called the PATS polygon, and Figure 2(d) shows the calculated PATS for pixel p with the PATS polygons. If the road has a good contrast with its surrounding objects, the polygon usually looks like an ellipse or  -shape, or a circle in other cases. The compactness of PATS is defined as the compactness of the PATS polygon using Equation (4) where A and P are the area and perimeter of the PATS polygon, respectively. It is employed to check whether the shape of the PATS polygon looks like a circle. A circle-like PATS polygon usually indicates that the tracker is no longer fit for tracking the road ahead. Note that our program will calculate the compactness of PATS at regular intervals to verify whether the PATS is still suitable for tracing a road.

Experiments and Performance Evaluation
A prototype system based on the proposed framework was developed to verify the algorithms. The hardware we used is a Dell Precision workstation T5400, with an Intel Xeon 2.50-GHz plus 2.49-GHz CPU and 3-GB RAM. It should be noted that since our system is still in the simulation stage, the program is implemented in a single thread mode. Performance should be improved if it is implemented as a multi-thread application taking full advantage of the features of multi-core processors.  (1), (2) and (3) are fused from their panchromatic band and multispectral bands by smoothing filter-based intensity modulation algorithm [36], and the geographic area for the images in (1), (2) and (3) is located at Tai'an, Shandong Province, P.R. China. Additionally, the SAR image is located at He'fei, Anhui Province, P.R. China, while the coverage area of the DMC image is within Denver, Colorado, USA. Note that the noises include the occlusions of vehicles, shadows of buildings, occlusions of trees, road junctions or crossings, and speckles. Particularly, "v" denotes the occlusions of vehicles, "b" denotes the shadows of buildings, "c" denotes the occlusions of colonnades, "j" denotes the road junctions or crossings, "s" denotes the speckles, and "n" denotes no salient obstacles on the road surfaces.

Data collection
The above five images contain the most frequent road types such as national highways and railroads, intrastate highways, ring roads, streets for local transportation, and rural roads. Furthermore, each image contains different road structures such as curves, ramps, straight roads, junctions and bridges. It also includes various road conditions such as occlusions of vehicles and shadows from trees or high buildings. The detailed characteristics of roads refer to Table 1 and Figure 8. Note that, in Table 1, the contrast means the differences between roads and other features measured by three qualitative levels: low contrast roads refer to these whose digital numbers are similar to the ones of surrounding objects while the boundaries are still discernable, mean contrast roads refer to these whose digital numbers are slightly different from the ones of surrounding objects while the boundaries are discernable and high contrast roads refer to these whose digital numbers are quite different from the ones of surrounding objects while the boundaries are discernable; the length of the roads are also measured by three qualitative levels: short one refers to that whose length is no larger than 1.0 km, mean one refers to that whose length is larger than 1.0 km but no longer than 3.0 km and long one refers to that whose length is larger than 3.0 km; the average curvature indicates the smoothness of the road measured by the levels: short one refers to that whose average curvature is no larger than 0.3, mean one refers to that whose average curvature is larger than 0.3 but no larger than 1.5 and high one refers to that whose average curvature is larger than 1.5; and the above thresholds are set based on our experience.
Based on our extensive experiments, the first threshold 1 I , namely the standard deviation of intensity rectangle A'B'B"A", is set to 10.0; while the second threshold 2 I , namely the standard deviation of intensity rectangle A'B'B''A", is set to 20.0. For profile matching, the coefficient profile C to calculate the width of a profile is set to 2.0, the coefficient steplength C to calculate the length of step size is set to 0.5, and the changed angle T for two adjacent road segments is set to 10°. For template matching, the profile C is set to 1.0, the coefficient sign C to calculate height of the template is set to 2.0, steplength C is set to 0.8, and T is set to 10°. For PATS algorithm, profile C is set to 1.0, sign C is set to 2.0, steplength C is set to 0.8, T is set to 10°, and 2 T is set to 0.8. For interlaced template matching, profile C is set to 1.0, sign C is set to 0.5, steplength C is set to 0.4, T is set to 10°, and the width sign W of a lane marking is set to 3 pixels.

Evaluation criteria
Zhou et al. [32] proposed that the criteria used to evaluate a semi-automatic system include correctness, completeness, efficiency and accuracy. In our system, when the tracking is running after initialization, if the program detects a possible tracking problem or a tracking failure, it returns control back to the human operator. The operator supervises the road changes, diagnoses the failure reasons, deletes the incorrect parts, and restarts the tracker or inputs a new correct segment which refreshes the state model of the tracker. In this way, the correctness of tracking is guaranteed. Therefore, the completeness, efficiency and accuracy are employed to evaluate the performances of our road trackers. Particularly, the completeness is defined as the ratio of length of roads tracked by the computer with the aid of the plotter to the total length of reference data. The efficiency efficiency C is measured by time saving that can be obtained by the following formula: plotting. Geometric accuracy is evaluated as the mean square error between the road tracker's result and the reference data. Note that the length of roads is measured in number of pixels, which is convenient to compare the performances of the same tracker on different resolutions, and the total time cost includes the computation time and interaction time with the human operator. Table 2 shows the statistical results of the performances while Figures 3, 4, 5, 6 and 7 show the raw images and the final road segments extracted from the images (1), (2), (3), (4) and (5) illustrated in Section 3.1 respectively. The profile matching method is capable of extracting the homogenous highways from the above SPOT5, IKONOS and QuickBird images with different efficiency and geometrical accuracy. But, in the SPOT5 image, even the highways are blurry due to their narrow width and low contrasts to other surrounding objects [see Figure 8(a)], which leads to frequent failures, low efficiency and large RMSE. Fortunately, this algorithm can also work on the main streets which have a sparse traffic and shadow on IKONOS image. However, profile matching is not robust compared to the other trackers, and frequent stops lead to low efficiency in spite of its low computing magnitude for per movement.  Note that road types include the ones in Table 1. Particularly, "1" denotes the national highways, "2" denotes the intrastate highways, "3" denotes the railroads, "4" denotes the avenue, "5" denotes the lane.

Experimental results and performance evaluation
The rectangular template approach can extract the railroads and main streets besides the highways with acceptable spatial accuracy. According to the statistical results, the highest efficiency, approximately 15% of time saving, is achieved on IKONOS image. Note that the low efficiency on the SAR image is partially blamed to the noisy speckles, and mostly induced by the straightness of the roads which is in favor of manual drawing. On the other hand, the maximum completeness is achieved on the QuickBird image with 11% of time saving compared to fully manual interpretation. Unfortunately, dense traffics and shadows make this algorithm inapplicable on the DMC image. In general, this method performs better than the profile matching approach on SPOT5 image, IKONOS image and SAR image. The PATS has the ability to extract various types of roads, but, at the same time, its performance varies considerably with images. The highest completeness occurs on the SAR image; the highest efficiency appears on the QuickBird image; and the smallest RMSE arises on the DMC image. On the other side, the lowest efficiency occurs on the SAR image, and the lowest completeness and largest RMSE appear on the SPOT5 image. As a mater of fact, the completeness and the geometrical accuracy of PATS are roughly equivalent to template matching method on the IKONOS and QuickBird images, compared to its relatively lower efficiency than the latter. Additionally, the PATS has higher completeness, lower efficiency and larger RMSE than the interlaced template matching algorithm for the DMC image test. Interlaced template matching is proposed to extract the roads with lane markings or continuous centerlines such as highways. In fact, it not only can trace its preferred roads, but it also can extract the roads with strips of vegetations or bar-shaped features such as rails on the railroads, which is an unexpected bonus. Moreover, it is not sensitive to the dense occlusions of vehicles and tree shadows, which results in a fast tracking speed with high geometrical accuracy when it operates. In fact, its unpleasing performance on the SAR image further suggests that existing step-by-step tracking is more preponderant for smooth and curving long ribbon roads than for straight roads.
The combination strategy can extract most of the roads from the above images except the SAR image with acceptable efficiency and spatial accuracy. Particularly, the time saving reaches approximately 21% for the DMC image and approximately 33% for the QuickBird image, which can be transferred into an enhanced economic productivity in the future. Considering the potential of performance optimization of the algorithms and implementation as multiple-thread program, the result is promising.

Discussion
As can be seen, each road tracker has its preferred types of roads. For example, highways are favored by the interlaced template matching method, even though the surfaces of the highways may be seriously occluded by vehicles; while avenues with a few tree shadows may be traced better by the PATS on the DMC image. In fact, the combination strategy is aimed to extract each road with an optimal road tracker, which indicates its advantages from the above experiment results. However, the road trackers mentioned in this paper is not an exhaustive list. For instance, a new road tracker is needed for the fast extraction of the straight roads, and new approaches are demanded to detect and depict the various types of road junctions or crossings, which will produce a more advanced combination strategy than current one.
On the other hand, the performance of a road tracker is greatly influenced by the GSD of an image. For example, profile matching method has a higher efficiency than the interlaced template matching one in extraction of the highways from the IKONOS image, but it is not a case for the QuickBird image, because the highways on the former image has less vehicles than the latter. Unfortunately, the profile matching approach is not suitable to extract the highway from the DMC image at all, which has a very dense traffic on road surfaces [see Figure 8(e)]. A similar phenomenon occurs for the rectangular template matching algorithm. In this sense, in this paper each road tracker has its preferred GSD. Particularly, the GSD from 1 m pixelto 0.61 m pixelis favored for profile matching, from 1 m pixelto 0.3 m pixelfor rectangular template matching, from 1 m pixelto 0.2 m pixelfor PATS, from 1 m pixelto 0.2 m pixelfor interlaced template matching method.
Additionally, complex road scenes can cause frequent tracking failures, requiring more human interactions, which significantly decreases the tracker's efficiency. As for the IKONOS image, the profile matching can track approximately 51 pixels per second, while the interlaced template matching can follow approximately 65 pixels per second. However, theoretically the speed of former should be higher than the latter considering both of the complexity of the algorithm and computing magnitude. As a mater of fact, the low efficiency is deduced by the irregular appearances of the vehicles, shadow of trees and building, junctions or crossings, etc.
Despite the consideration of decreasing the influences of the above noises for interlace template matching algorithm and PATS, large shadows of high buildings and junctions are still barriers for road tracking, especially in urban areas. In addition, currently there is no method which can more efficiently extract the straight ribbon roads with many disturbances than manual plotting, which is also one of the major reasons that decrease the overall efficiency of the combination strategy.

Conclusions
This paper presents a semi-automatic system for road tracking. The system adopts a new combination strategy to extract the road networks. Particularly, the interlaced template matching, the profile matching, the rectangular template matching, and the PATS methods are adaptively selected based on the initialization information. At the same time, a human operator is retained in the tracking process to supervise the extracted trajectory, to response to the program's prompts. Extensive experiments are performed to extract roads from aerial/satellite imagery including optical imagery and Synthetic Aperture Radar (SAR) data. The results show that the combination of multiple road trackers can more reliably and more efficiently extract most of the main roads than each individual tracker, which has significant practical applications. The current limits of our scheme are that it can't deal well with road junctions or crossings, and shadows of high buildings, and it needs a novel method to extract straight ribbon roads. Future work will also include the optimization of the algorithms to speed up the calculations.