# Using Bidirectional Long-Term Memory Neural Network for Trajectory Prediction of Large Inner Wheel Routes


## Abstract


## 1. Introduction

## 2. Related Work

### 2.1. Object Detection

#### 2.1.1. MobileNet

MobileNet replaces standard convolution with depthwise separable convolution. For an input of width $W_{in}$, height $H_{in}$ and $M$ channels convolved with $N$ kernels of size k × k, the output has width $W_{out}$, height $H_{out}$ and $N$ channels, and the total computation of a standard convolution is shown in Equation (1). Depthwise separable convolution splits this into a per-channel k × k depthwise convolution and a 1 × 1 × N pointwise convolution, whose calculated amount is shown in Equation (3); the combined result has the same output as a standard convolution. The computation is thereby reduced to $1/N + 1/k^2$ of the standard amount, so larger and more numerous convolution kernels save proportionally more. For example, for 512 × 512 × 50 input data with ten 5 × 5 convolution kernels, depthwise separable convolution requires only 1/10 + 1/25 = 0.14 of the computation of a standard convolution.
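The computation saving can be checked numerically. The sketch below (an illustration, not code from the paper; the function names are my own) counts multiplications for both forms and reproduces the 1/10 + 1/25 = 0.14 ratio for the example of ten 5 × 5 kernels:

```python
def standard_conv_cost(w_out, h_out, m, n, k):
    """Multiplication count of a standard convolution:
    N kernels of size k x k applied across M input channels."""
    return w_out * h_out * m * n * k * k

def separable_conv_cost(w_out, h_out, m, n, k):
    """Depthwise (one k x k kernel per input channel) plus
    pointwise (1 x 1 x N) convolution."""
    depthwise = w_out * h_out * m * k * k
    pointwise = w_out * h_out * m * n
    return depthwise + pointwise

# Paper's example: 512 x 512 x 50 input, ten 5 x 5 kernels
ratio = (separable_conv_cost(512, 512, 50, 10, 5)
         / standard_conv_cost(512, 512, 50, 10, 5))
print(ratio)  # 1/10 + 1/25 = 0.14
```

The ratio simplifies to $1/N + 1/k^2$ regardless of the spatial input size, which is why only the kernel size and the number of kernels matter.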

#### 2.1.2. You Only Look Once (YOLO)

#### 2.1.3. Faster Region-Based CNN, Faster R-CNN

#### 2.1.4. Single Shot Multi-Box Detector

### 2.2. Basic LSTM Encoder–Decoder Trajectory Prediction Method

The distribution over trajectories is conditioned on the states $s_t$ and past actions $u$, where $Z(\theta)$ is a normalization function.

### 2.3. RNN

### 2.4. LSTM

An LSTM cell has three gates, the input gate ($i_t$), the output gate ($o_t$) and the forget gate ($f_t$), together with a memory cell. The gates control which messages are added to and removed from the memory unit, so these three switches protect and control the memory state. Each gate has its own weights and is weighted according to the input data; after the calculation, each switch decides whether it is on or off.

The forget gate is computed as

$$f_t = \sigma\left(W_f \cdot \left[h_{t-1}, x_t\right] + b_f\right)$$

where $h_{t-1}$ is the output at time t − 1, $x_t$ is the input at time t and $b_f$ is the bias value. Converted through the sigmoid layer, the result lies between 0 and 1: a value of 1 means fully retained, a value of 0 means completely discarded, and the forget gate's output is denoted $f_t$.

The input gate $i_t$ determines through a sigmoid layer which values need to be updated, and ${\tilde{C}}_{t}$ creates a vector of new candidate values through a tanh layer:

$$i_t = \sigma\left(W_i \cdot \left[h_{t-1}, x_t\right] + b_i\right), \qquad {\tilde{C}}_{t} = \tanh\left(W_c \cdot \left[h_{t-1}, x_t\right] + b_c\right)$$

These two results update the state in Equation (10): $W_i$ and $W_c$ are the weight matrices, the old state $C_{t-1}$ is multiplied by $f_t$ to determine which messages to forget, and the new candidate values $i_t \cdot {\tilde{C}}_{t}$ are added to give the new state $C_t$; $b_i$ and $b_c$ are the bias values.

$$C_t = f_t \cdot C_{t-1} + i_t \cdot {\tilde{C}}_{t}$$

Finally, the output gate $o_t$ determines what to output from the memory unit. The state of the memory unit is passed through a tanh layer to obtain a value between −1 and 1, which is multiplied by $o_t$ to give the output $h_t$, where $b_o$ is the bias value:

$$o_t = \sigma\left(W_o \cdot \left[h_{t-1}, x_t\right] + b_o\right), \qquad h_t = o_t \cdot \tanh\left(C_t\right)$$
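The gate computations above can be sketched as a single LSTM time step. This NumPy version is a minimal illustration of the standard equations, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step; every gate acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: 1 keeps, 0 discards
    i_t = sigmoid(W_i @ z + b_i)          # input gate: which values to update
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell values
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state C_t
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)              # hidden output, bounded in (-1, 1)
    return h_t, c_t
```

With hidden size H and input size D, each weight matrix has shape (H, H + D); the final tanh keeps every component of $h_t$ inside (−1, 1).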

### 2.5. Bidirectional RNN (BRNN)

## 3. System Architecture

### 3.1. Data Collection

- Object detection

- Object tracking

Given the coordinates $(x_1, y_1), (x_2, y_2), \ldots, (x_t, y_t)$, this paper uses the Euclidean distance, the length of the shortest straight line between two points, to compare coordinates across frames: d is the Euclidean distance between the coordinate point $(x_t, y_t)$ of the current frame and $({x}_{t-1}, {y}_{t-1})$ of the previous frame. For the center of each bounding box appearing in the image, the nearest previous bounding-box center is found, and a threshold is applied to the calculated distance. If the distance is greater than 30 pixels, the result is a newly detected object, and a new list is created for it to store data from subsequent frames for recording and comparison. If the distance is less than 30 pixels, the detection is the same object, and the result is appended to the established list. If a target disappears and its list is no longer updated, the data recorded in the list are exported and the list is deleted.

### 3.2. Object Detection Model

### 3.3. Trajectory Prediction Models

Given the pixel coordinates $F(x_t, y_t)$ of the front of the vehicle and $W(x_t, y_t)$ of the wheel in the original data, the vectors between the coordinates are calculated separately to find significant outliers; the outliers are removed, and the average of the two points is used as the value of that coordinate.

$X_1, X_2, \ldots, X_t$ represent the input data of each time step, $h_1, h_2, \ldots, h_t$ the outputs of one Bi-LSTM layer at each time step, and t the input time.
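The preprocessing step, dropping frames with outlying front-wheel difference vectors and averaging the two points, can be sketched as below. The z-score outlier test is my own assumption for illustration; the paper does not specify its outlier criterion:

```python
import numpy as np

def clean_midpoints(front, wheel, z_thresh=2.5):
    """Combine per-frame front and wheel pixel coordinates: drop frames
    whose front-wheel distance is a z-score outlier (an assumed rule),
    then use the mean of the two points as that frame's coordinate."""
    front = np.asarray(front, dtype=float)   # shape (T, 2)
    wheel = np.asarray(wheel, dtype=float)   # shape (T, 2)
    dist = np.linalg.norm(front - wheel, axis=1)
    std = dist.std()
    if std == 0:
        keep = np.ones(len(dist), dtype=bool)
    else:
        keep = np.abs(dist - dist.mean()) / std <= z_thresh
    mid = (front + wheel) / 2.0              # average of the two points
    return mid[keep]
```

The result is the 2-dimensional (x, y) input sequence used by the trajectory models; the 4-dimensional variant would instead keep (fx, fy, wx, wy) per frame.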

## 4. Results

### Training Results of the Recurrent Neural Network Prediction Models

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- World Health Organization. Global Status Report on Road Safety 2018: Summary; World Health Organization: Geneva, Switzerland, 2018.
- Caliendo, C.; De Guglielmo, M.L.; Guida, M. Comparison and analysis of road tunnel traffic accident frequencies and rates using random-parameter models. J. Transp. Saf. Secur. **2016**, 8, 177–195.
- Wu, Y.; Abdel-Aty, M.; Lee, J. Crash risk analysis during fog conditions using real-time traffic data. Accid. Anal. Prev. **2018**, 114, 4–11.
- Kopelias, P.; Papadimitriou, F.; Papandreou, K.; Prevedouros, P. Urban freeway crash analysis: Geometric, operational, and weather effects on crash number and severity. Transp. Res. Rec. **2015**, 9, 123–131.
- Sharma, K.P.; Poonia, R.C.; Sunda, S. Accurate real-time location map matching algorithm for large scale trajectory data. In Proceedings of the 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 29–31 August 2018; pp. 646–651.
- Chen, Q.H.; Huang, H.; Li, Y.; Lee, J.; Long, K.J.; Gu, R.; Zhai, X. Modeling accident risks in different lane-changing behavioral patterns. Anal. Methods Accid. Res. **2021**, 30, 100159.
- Road Traffic Safety Committee. Road Traffic Accident (within 30 Days)—According to the Type of Driving by the First Party. Available online: https://stat.motc.gov.tw/mocdb/stmain.jsp?sys=220&ym=10501&ymt=10912&kind=21&type=1&funid=b330704&cycle=41&outmode=0&compmode=0&outkind=1&fldlst=111&codspc0=0,2,4,1,&rdm=pqbiczld (accessed on 10 August 2021).
- Park, S.H.; Kim, B.; Kang, C.M.; Chung, C.C.; Choi, J.W. Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, China, 26–30 June 2018; pp. 1672–1678.
- Altché, F.; de La Fortelle, A. An LSTM network for highway trajectory prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 353–359.
- Xue, H.; Huynh, D.Q.; Reynolds, M. Bi-Prediction: Pedestrian trajectory prediction based on bidirectional LSTM classification. In Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, NSW, Australia, 29 November–1 December 2017; pp. 1–8.
- Xia, C.; Weng, C.Y.; Zhang, Y.; Chen, I.M. Vision-based measurement and prediction of object trajectory for robotic manipulation in dynamic and uncertain scenarios. IEEE Trans. Instrum. Meas. **2020**, 69, 8939–8952.
- Ip, A.; Irio, L.; Oliveira, R. Vehicle trajectory prediction based on LSTM recurrent neural networks. In Proceedings of the 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), Helsinki, Finland, 25–28 April 2021; pp. 1–5.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv **2017**, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–21 June 2018; pp. 4510–4520.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv **2018**, arXiv:1804.02767.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H. YOLOv4: Optimal speed and accuracy of object detection. In Proceedings of the Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. **2015**, 28, 91–99.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. arXiv **2016**, arXiv:1512.02325.
- Kitani, K.M.; Ziebart, B.D.; Bagnell, J.A.; Hebert, M. Activity forecasting. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012; pp. 201–214.
- Finn, C.; Goodfellow, I.; Levine, S. Unsupervised learning for physical interaction through video prediction. Adv. Neural Inf. Process. Syst. **2016**, 29, 64–72.
- Lotter, W.; Kreiman, G.; Cox, D. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning. arXiv **2016**, arXiv:1605.08104.
- Luc, P.; Neverova, N.; Couprie, C.; Verbeek, J.; LeCun, Y. Predicting Deeper into the Future of Semantic Segmentation. arXiv **2017**, arXiv:1703.07684.
- Huang, S.; Li, X.; Zhang, Z.; He, Z.; Wu, F.; Liu, W.; Tang, J. Deep learning driven visual path prediction from a single image. IEEE Trans. Image Process. **2016**, 25, 5892–5904.
- Kim, B.; Kang, C.M.; Kim, J.; Lee, S.H.; Chung, C.C.; Choi, J.W. Probabilistic Vehicle Trajectory Prediction over Occupancy Grid Map via Recurrent Neural Network. arXiv **2017**, arXiv:1704.07049.
- Xing, Y.; Lv, C.; Cao, D. Personalized vehicle trajectory prediction based on joint time-series modeling for connected vehicles. IEEE Trans. Veh. Technol. **2020**, 69, 1341–1352.
- Bandara, K.; Bergmeir, C.; Hewamalage, H. LSTM-MSNet: Leveraging forecasts on sets of related time series with multiple seasonal patterns. IEEE Trans. Neural Netw. Learn. Syst. **2021**, 32, 1586–1599.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
- Schuster, M.; Paliwal, K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. **1997**, 45, 2673–2681.
- Tzutalin. LabelImg. GitHub. Available online: https://github.com/tzutalin/labelImg (accessed on 10 August 2021).

**Figure 1.** Statistics on the number of large car traffic accidents [7].

**Figure 2.** Faster R-CNN model architecture diagram (Ref. [8]).

| Directions for Improvement | Methods |
| --- | --- |
| More accurate detection | Batch normalization; high-resolution classifier; convolution with anchor boxes; dimension clusters; direct location prediction; fine-grained features; multi-scale training |
| Faster detection | Darknet-19; training for classifier; training for detection |

| Architecture and Method | YOLOv3 | YOLOv4 | Improvement Advantages |
| --- | --- | --- | --- |
| Prediction (head) | YOLO | YOLO | |
| Data augmentation | Pixel-wise adjustments | Mosaic | Stitching 4 images into one reduces the required mini-batch size by 4× and improves detection of small objects |
| Backbone | Darknet53 | CSPDarknet53 | Reduces the number of parameters and the amount of computation and improves accuracy |
| Normalization | Batch Normalization (BN) | Cross mini-Batch Normalization (CmBN) | Better suited to smaller batch-size settings |
| Neck | FPN | FPN + PAN + SPP | Combines local features with global feature messages to obtain rich semantic feature maps |
| Regularization | Dropout | DropBlock | Avoids disturbance between neighboring neurons |
| Activations | Leaky ReLU | Leaky ReLU and Mish | The Mish function makes the gradient smoother and more stable |

| Name | Parameter Value |
| --- | --- |
| Training image size | 512 × 512 |
| Number of training images | 3389 |
| Number of classes | 3 |
| Learning rate | 0.003 |
| Maximum number of training steps | 15,000 |
| Number of training samples | 3389 |
| Number of test samples | 847 |

| Model Name | Image Size | Truck AP | Front AP | Wheel AP | Average Accuracy (mAP) | FPS (GTX 1080 Ti 8 G) |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet152 Faster R-CNN | 512 × 512 | 72% | 83% | 61% | 72% | 6 |
| MobileNetV2 SSD | 512 × 512 | 85% | 90% | 77% | 84% | 39 |
| YOLOv3 | 512 × 512 | 90% | 89% | 85% | 88% | 49 |
| YOLOv4 | 512 × 512 | 98% | 97% | 96% | 97% | 55 |

| Name | Parameter Value |
| --- | --- |
| Input dimension | Two-dimensional or four-dimensional |
| Predicted length | 10 trajectory points predicting the next 10; 10 trajectory points predicting the next 20 |
| Maximum number of training steps | 12,000 |
| Number of training samples | 312 |
| Number of test samples | 31 |

| Item | Training Model | Input Data Dimension | Number of Predictions (N-M) (N Data Predict M Data) | Number of Generations | Accuracy |
| --- | --- | --- | --- | --- | --- |
| 1 | Stacked LSTM | 4-dimensional (fx, fy, wx, wy) | 10-10 | 12,000 | 52.30% |
| 2 | Stacked LSTM | 4-dimensional (fx, fy, wx, wy) | 10-20 | 12,000 | 49.30% |
| 3 | Stacked LSTM | 2-dimensional (x, y) | 10-10 | 12,000 | 86.43% |
| 4 | Stacked LSTM | 2-dimensional (x, y) | 10-20 | 12,000 | 71.75% |
| 5 | Bi-LSTM | 4-dimensional (fx, fy, wx, wy) | 10-10 | 12,000 | 62.17% |
| 6 | Bi-LSTM | 4-dimensional (fx, fy, wx, wy) | 10-20 | 12,000 | 59.81% |
| 7 | Bi-LSTM | 2-dimensional (x, y) | 10-10 | 12,000 | 87.57% |
| 8 | Bi-LSTM | 2-dimensional (x, y) | 10-20 | 12,000 | 68.85% |
| 9 | Stacked Bi-LSTM | 4-dimensional (fx, fy, wx, wy) | 10-10 | 12,000 | 76.03% |
| 10 | Stacked Bi-LSTM | 4-dimensional (fx, fy, wx, wy) | 10-20 | 12,000 | 61.12% |
| 11 | Stacked Bi-LSTM | 2-dimensional (x, y) | 10-10 | 12,000 | 87.77% |
| 12 | Stacked Bi-LSTM | 2-dimensional (x, y) | 10-20 | 12,000 | 75.29% |

| Item | Training Model | Number of Predictions (N-M) (N Data Predict M Data) | MAD | MSE | MAPE |
| --- | --- | --- | --- | --- | --- |
| 1 | Stacked LSTM | 10-10 | 27.5799 | 756.8014 | 5.5205 |
| 2 | Stacked LSTM | 10-20 | 32.6421 | 1103.1558 | 6.0711 |
| 3 | Bi-LSTM | 10-10 | 23.3785 | 493.8061 | 4.7476 |
| 4 | Bi-LSTM | 10-20 | 40.0393 | 1593.683 | 7.6167 |
| 5 | Stacked Bi-LSTM | 10-10 | 14.4397 | 242 | 3.0556 |
| 6 | Stacked Bi-LSTM | 10-20 | 22.3255 | 643 | 4.3622 |
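The error measures in the table above, MAD, MSE and MAPE, can be computed as follows. These are the standard definitions; the paper does not list its formulas, so treat the exact forms (in particular MAPE expressed in percent) as an assumption:

```python
import numpy as np

def mad(y_true, y_pred):
    """Mean absolute deviation between predicted and true values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

Lower values are better for all three, which matches the table: the Stacked Bi-LSTM rows have the smallest MAD, MSE and MAPE.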


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Horng, G.-J.; Huang, Y.-C.; Yin, Z.-X.
Using Bidirectional Long-Term Memory Neural Network for Trajectory Prediction of Large Inner Wheel Routes. *Sustainability* **2022**, *14*, 5935.
https://doi.org/10.3390/su14105935
