1. Introduction
A positioning system determines the location of an object within a defined space [1]. While the global positioning system (GPS) is the most well-known and significant advancement in navigation, its accuracy and reliability are compromised by factors such as multi-path propagation and signal attenuation [2]. Recently, three major methods for positioning objects have been discussed to ensure the safety of advanced and intelligent applications. The first is the sensor-fusion method, which combines data from multiple sensors to estimate the position. The second is the device-assisted method, which relies on ground sensors to determine the position and trajectory. The third is the vision-based method, in which geometric features are analyzed to identify the position [1,2]. Notably, the latter two methods can also be regarded as sensor-fusion methods.
To address the limitations of the GPS, we leveraged onboard cameras by employing the vision-based method. In vision-based positioning, convolutional neural networks (CNNs) are used to match geometric features [1]. However, existing work does not consider the impact of seasonal lighting variations, even though such variations can reduce accuracy.
Hence, we designed a new method for positioning a drone that accounts for seasonal lighting variations by incorporating critical components into a CNN. We introduced CNN-based network blocks [3] and used an orthophotomap to train models and assess their performance in terms of accuracy, computing load, and so forth [1]. An orthophotomap is an image produced by merging orthophotos taken by drones or satellites [1]. Using it increases the accuracy of drone positioning.
The developed method differs from traditional supervised learning, in which training and testing data are mutually exclusive. In the vision-based approach, drones cannot position themselves without pre-existing map data. Therefore, we augmented the original training data to generate testing data with seasonal lighting variations, which differ from the original training and testing data.
In this article, Section 2 presents the essential background information. Section 3 presents the concept of the developed model, detailing its designs and improvements. Section 4 shows the experimental results and related statistics. Finally, the last section concludes this study with potential directions for future research.
2. Background Knowledge
2.1. Data Augmentation with Lighting Variations
Data augmentation is utilized in machine learning (ML) to increase the diversity of a training dataset without collecting new data. It applies various transformations to the existing dataset, generating new versions of the original data to improve the ability of the ML model to generalize to unseen data and to reduce overfitting. In this study, we introduced the conversion between global solar radiation and illuminance in Taiwan [4,5] to augment the original training data and generate testing data with lighting variations. We derived the brightness differences and a corresponding formula for lighting variations in the experimental area [4,5] and adjusted the brightness of the orthophotomap to generate new testing data.
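As an illustrative sketch (not the authors' implementation), the brightness adjustment can be performed with Pillow; the seasonal factors below are hypothetical placeholders, whereas the actual values would follow from the irradiance-to-illuminance conversion for the experimental area [4,5]:

```python
from PIL import Image, ImageEnhance

# Hypothetical seasonal brightness factors relative to autumn (= 1.0);
# the real values would come from the irradiance-to-illuminance
# conversion for the experimental area [4,5].
SEASON_BRIGHTNESS = {"spring": 1.05, "summer": 1.15, "autumn": 1.00, "winter": 0.85}

def augment_for_season(tile_path: str, season: str) -> Image.Image:
    """Return a brightness-adjusted copy of a small orthophotomap tile."""
    tile = Image.open(tile_path).convert("RGB")
    return ImageEnhance.Brightness(tile).enhance(SEASON_BRIGHTNESS[season])

# Example: generate the winter variant of one tile (hypothetical path).
# winter_tile = augment_for_season("tiles/area01_sub042.png", "winter")
```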
2.2. Datasets
A large orthophotomap was used in the experiment. Firstly, small orthophotomaps were created from it as the original training data. Since, in the vision-based approach, drones cannot position themselves without pre-existing map data, mutually exclusive testing data could not be used; the original testing data were therefore identical to the original training data. Next, we augmented the original training data to generate additional testing data with seasonal lighting variations. The augmented data were used for testing.
Figure 1 shows the relationship between the original training data, the testing data, and the new testing data with lighting variations. It illustrates a scenario where a drone can position itself in a specific area.
2.3. CNN
The convolutional architecture is inspired by the structure of the animal visual cortex, where individual neurons respond to stimuli within their respective receptive fields [1]. Since receptive fields partially overlap, they collectively cover the entire visual field, effectively performing convolution operations. Convolutional layers are paired with pooling layers that condense the outputs of the previous layer into a single output for the next layer, which significantly reduces the number of weights and the computational complexity. Incorporating one or more convolutional and pooling layers in the early stages of a neural network benefits both regression and classification in traditional architectures [1,6,7].
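For illustration, a minimal convolution-plus-pooling stage in PyTorch (a generic sketch, not the developed model) shows how pooling condenses the convolutional outputs:

```python
import torch
import torch.nn as nn

# A minimal convolution + pooling stage: the pooling layer condenses each
# 2x2 neighborhood of the convolutional feature map into a single output,
# quartering the spatial resolution passed to the next layer.
stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),  # 300x300 -> 150x150
)

x = torch.randn(1, 3, 300, 300)  # one RGB tile at the paper's resolution
print(stage(x).shape)            # torch.Size([1, 16, 150, 150])
```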
2.4. Regression and Classification
Regression and classification are supervised learning tasks that use labeled datasets. The critical difference between them is the type of problem they address: regression predicts continuous values, while classification assigns data to discrete labels. In this study, the positioning problem is treated as a classification problem rather than a regression one. The sliding window technique [8] is applied with a small stride so that the precision of classification can approach that of regression, which is an important and interesting aspect of this study [1,6,7].
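To make this concrete, the sketch below (assuming a row-major grid layout, together with the 400-class, 0.5 m setup described in Section 4) maps a predicted class back to a continuous position, so the quantization error is bounded by the stride:

```python
# Map a predicted class index back to map coordinates. With a 20x20 grid
# of sub-areas (400 classes, as in the experiment) and a 0.5 m stride,
# the error of "classification as positioning" is at most half a stride
# per axis. The row-major grid layout is an assumption for illustration.
GRID_SIZE = 20   # 20 x 20 = 400 sub-areas
STRIDE_M = 0.5   # window step in meters

def class_to_position(class_idx: int) -> tuple[float, float]:
    """Return the (x, y) position of the sub-area for a predicted class."""
    row, col = divmod(class_idx, GRID_SIZE)
    return (col * STRIDE_M, row * STRIDE_M)

print(class_to_position(187))  # class 187 -> (3.5, 4.5)
```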
3. Learning and Testing
The pseudocode of the developed vision-based approach is shown in Algorithm 1. First, a large orthophotomap is inputted as the sole source (line 01). Then, several variables are defined: stride s is the distance the window moves at each step (line 02), original-training-data is a set of small orthophotomaps used as the original training data (line 03), original-testing-data is a set of small orthophotomaps used as the original testing data (line 04), new-testing-data is a set of small orthophotomaps used as the new testing data (line 05), and model is our machine learning model (line 06).
Algorithm 1: Vision-based approach developed in this study.
input:
  (01) map: a large orthophotomap
variables:
  (02) s: stride, the distance the window moves at each step
  (03) original-training-data: small orthophotomaps as original training data
  (04) original-testing-data: small orthophotomaps as original testing data
  (05) new-testing-data: small orthophotomaps as new testing data
  (06) model: our machine learning model
initial phase:
  (07) original-training-data ← use sliding window technique with stride s to divide map
  (08) original-testing-data ← original-training-data
  (09) new-testing-data ← augment original-testing-data by lighting variations
training phase:
  (10) train model via original-training-data
evaluating phase:
  (11) evaluate model via original-testing-data and new-testing-data
In the initial phase, the sliding window with stride s was used to obtain the original training data by dividing the large orthophotomap (line 07), original-testing-data was set to original-training-data (line 08), and new-testing-data was obtained by augmenting the original testing data based on the concept of lighting variations (line 09). In the training phase, the model was trained using the original training data (line 10). In the evaluating phase, the model was evaluated using the original and new testing data (line 11). These steps support the scenario in which a drone positions itself in a specific area even under lighting variations.
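A hedged Python sketch of the initial phase (lines 07–09) follows; the tile size, pixel stride, and brightness factor are placeholder assumptions rather than the authors' implementation details:

```python
from PIL import Image, ImageEnhance

def sliding_window_divide(map_path: str, window_px: int, stride_px: int):
    """Line 07: divide the large orthophotomap into small orthophotomaps."""
    big = Image.open(map_path).convert("RGB")
    tiles = []
    for top in range(0, big.height - window_px + 1, stride_px):
        for left in range(0, big.width - window_px + 1, stride_px):
            tiles.append(big.crop((left, top, left + window_px, top + window_px)))
    return tiles

# Lines 07-09 of Algorithm 1; window/stride in pixels and the brightness
# factor 0.85 are placeholders, not the authors' actual values.
original_training_data = sliding_window_divide("map.png", window_px=300, stride_px=50)
original_testing_data = original_training_data
new_testing_data = [ImageEnhance.Brightness(t).enhance(0.85)
                    for t in original_testing_data]
```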
4. Results and Discussions
Figure 2 shows instances of small orthophotomaps (classifications) augmented for the four seasons. The original orthophotomaps were captured in autumn, and the others were generated by augmentation.
Sixteen areas with different features were considered in the experiment (Figure 3).
We divided each experimental area into 400 sub-areas (classifications). The resolution of the small orthophotomaps in the dataset was 300 × 300 pixels, and the distance the window moved at each step was 0.5 m. Through data augmentation, a drone could position itself in a specific area under lighting variations. We used four critical blocks in the developed CNN model, i.e., an inception block, a dense block, an MCDN block, and a residual block [3]. The dropout rate was 0.6, cross-entropy was used as the loss function, the learning rate was 0.0001, and the Adam optimizer was used with a weight decay of 10⁻³.
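In PyTorch, the reported hyperparameters correspond roughly to the following sketch; the placeholder model stands in for the developed CNN, which is not reproduced here:

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model standing in for the developed CNN (not the real one);
# it includes the reported dropout rate of 0.6 and 400 output classes.
model = nn.Sequential(nn.Flatten(),
                      nn.Dropout(p=0.6),
                      nn.Linear(3 * 300 * 300, 400))

criterion = nn.CrossEntropyLoss()           # cross-entropy loss
optimizer = optim.Adam(model.parameters(),  # Adam optimizer
                       lr=1e-4,             # learning rate 0.0001
                       weight_decay=1e-3)   # weight decay 10^-3
```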
An experimental computer with an Intel Core i7-11370H CPU, 16 GB of DDR4-3200 RAM, and an Nvidia GeForce RTX 3070 (8 GB) GPU was used. PyCharm [9] served as the integrated development environment (IDE), and PyTorch [10] as the application programming interface (API).
In the experiment, we tested the CNN model with each of the four critical blocks. The operation with the residual concept showed the best results, so, for simplicity, it was used in the remaining experiments (Figure 4).
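For reference, a minimal residual-style block in PyTorch illustrates the residual concept; this is a generic sketch, not the exact block design of [3]:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: the skip connection adds the input back
    to the convolutional output, preserving information across layers."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # addition block: skip connection
```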
Table 1 shows the positioning accuracy for the 16 sub-areas. For image classification, we conducted feature extraction on the high-dimensional image data. The CNN model with residual-like connections and addition blocks preserved information across layers.
Table 1 also presents the results for the different sub-areas across the four seasons. Most sub-areas maintained high performance in all seasons, while performance decreased in winter, with sub-area 15 showing the largest reduction. Apart from these seasonal fluctuations, all areas remained stable. Overall, the developed method performed about 1% worse in winter, indicating that the model has difficulty positioning drones in environments with lower brightness.
5. Conclusions and Future Works
We addressed the challenge of visually positioning a drone under seasonal lighting variations using a CNN model. The model performed consistently, although its accuracy decreased under the lower brightness of winter, highlighting the difficulty of positioning under low light intensities. Future work should improve the accuracy and applicability of the visual drone positioning system and incorporate additional environmental factors so that it performs well even under low light intensities and in large areas.
Author Contributions
Methodology, C.-C.C.; software, B.-Y.L., B.-R.C. and P.-T.W.; validation, C.-C.C., B.-Y.L., B.-R.C. and P.-T.W.; formal analysis, C.-C.C., B.-Y.L., B.-R.C. and P.-T.W.; writing—original draft preparation, C.-C.C.; writing—review and editing, C.-C.C.; funding acquisition, C.-C.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the National Science and Technology Council, Taiwan, under grant 112-2221-E-035-062-MY3.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data will be made available on reasonable request.
Acknowledgments
We extend our gratitude to SINJIE INSTRUMENT CO., LTD. for providing us with free access to the orthophotomap, which was instrumental in conducting our experiments.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Chang, C.-C.; Lin, D.-T.; Ikema, Y.; Ooi, Y.-M. Drone visual positioning algorithm with convolutional neural network. In Proceedings of the 2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan, 27–29 October 2023.
- Chang, C.-C.; Tsai, J.; Lu, P.-C.; Lai, C.-A. Accuracy improvement of autonomous straight take-off, flying forward, and landing of a drone with deep reinforcement learning. Int. J. Comput. Intell. Syst. 2020, 13, 914–919.
- Lee, Y.; Jun, D.; Kim, B.-G.; Lee, H. Enhanced single image super resolution method using lightweight multi-scale channel dense network. Sensors 2021, 21, 3351.
- Michael, P.R.; Johnston, D.E.; Moreno, W. A conversion guide: Solar irradiance and lux illuminance. J. Meas. Eng. 2020, 8, 153–166.
- Hsieh, T.-E.; Fraincas, B.; Chang, K.-C. Generation of a typical meteorological year for global solar radiation in Taiwan. Energies 2023, 16, 2986.
- Hwang, K. Cloud Computing for Machine Learning and Cognitive Applications; The MIT Press: Cambridge, MA, USA, 2017.
- Hwang, K.; Chen, M. Big-Data Analytics for Cloud, IoT and Cognitive Computing; John Wiley & Sons: Hoboken, NJ, USA, 2017.
- Window Sliding Technique. GeeksforGeeks. Available online: https://www.geeksforgeeks.org/window-sliding-technique/ (accessed on 1 January 2025).
- PyCharm: The Python IDE for Professional Developers by JetBrains. Available online: https://www.jetbrains.com/pycharm/ (accessed on 1 January 2025).
- PyTorch. Available online: https://pytorch.org/ (accessed on 1 January 2025).