Electronics
  • Article
  • Open Access

17 October 2021

Improving Deep Object Detection Algorithms for Game Scenes

1 Department of Human Centric Intelligent Information, Sangmyung University, Seoul 03016, Korea
2 Division of SW Convergence, Sangmyung University, Seoul 03016, Korea
3 Department of Computer Science, Sangmyung University, Seoul 03016, Korea
* Authors to whom correspondence should be addressed.
This article belongs to the Special Issue Applications of Computer Vision in Interactive Environments

Abstract

The advancement and popularity of computer games make game scene analysis one of the most interesting research topics in the computer vision community. Among the various computer vision techniques, we employ object detection algorithms for the analysis, since they can both recognize and localize objects in a scene. However, applying existing object detection algorithms to game scenes does not guarantee the desired performance, since the algorithms are trained on datasets collected from the real world. To achieve the desired performance on game scenes, we built a dataset of collected game scenes and retrained object detection algorithms that had been pre-trained on real-world datasets. We selected five object detection algorithms, namely YOLOv3, Faster R-CNN, SSD, FPN and EfficientDet, and eight games from various genres including first-person shooting, role-playing, sports, and driving. PascalVOC and MS COCO were employed for the pre-training of the object detection algorithms. We demonstrate the improvement that comes from our strategy in two aspects: recognition, measured using mean average precision (mAP), and localization, measured using intersection over union (IoU).

1. Introduction

Computer games have been one of the most popular applications for all generations since the dawn of the computing age. Recent progress in computer hardware and software has produced computer games of high quality. Nowadays, e-sports, in which professional players compete in highly popular games such as Starcraft and League of Legends (LoL) while millions of people watch, have become some of the most popular sports. Consequently, e-sports have become one of the most popular types of content on various media channels, including YouTube and TikTok. From these trends, analyzing game scenes by recognizing and localizing objects in the scenes has become an interesting research topic.
Among the many computer vision algorithms, including object recognition, object detection, localization, and segmentation, several are candidates for analyzing game scenes. Analyzing game scenes requires both recognizing and localizing the objects in a scene. Therefore, we select object detection algorithms for analyzing game scenes, since they can identify thousands of object categories and draw bounding boxes around objects in real time. At this point, we have a question in relation to applying object detection algorithms to game scenes: “Can object detection algorithms trained on real scenes be applied to game scenes?”
Detecting objects in game scenes is not a straightforward problem that can be resolved by simply applying existing object detection algorithms. Recent progress in computing hardware and software techniques has brought diverse, visually pleasing rendering styles to computer games. Some games are rendered in a photorealistic style, while others are rendered in a cartoon style. Furthermore, varied depictions of a game scene with different colors and tones give each game a distinctive visual style. Some cartoon-based games deform their characters and objects to match the original cartoons. Therefore, detecting various objects across diverse games is challenging.
Existing deep-learning-based object detection algorithms show satisfactory detection performance for images captured from the real world. We selected five of the most widely-used deep object detection algorithms: YOLOv3 [], Faster R-CNN [], SSD [], FPN [] and EfficientDet []. We also prepared two frequently used datasets, PascalVOC [,] and MS COCO [], for training the object detection algorithms. We examined these algorithms in recognizing objects in game scenes.
We aimed to improve the object recognition performance of these algorithms by retraining them using game scenes. We prepared eight games covering various genres, such as first-person shooting, racing, sports, and role-playing. Two of the selected games present cartoon-styled scenes. We excluded games with non-real objects: in many fantasy games, for example, dragons, orcs, and other non-existent characters appear, and existing object detection algorithms are not trained to detect them.
We also tested a data augmentation scheme that produces cartoon-styled images for the images in frequently used datasets. Several widely used image abstraction and cartoon-styled rendering algorithms were employed for the augmentation process. We retrained the algorithms using the augmented images and measured their performances.
To prove that the performance of the object detection algorithms improved with game scene datasets, we performed the comparison for two cases: one comparing PascalVOC against PascalVOC with game scenes, and the other comparing MS COCO against MS COCO with game scenes. For each case, the five object detection algorithms were pre-trained with the frequently used dataset and their performance was measured. We then retrained the algorithms with game scenes and measured the performance again. These performances were compared to test our hypothesis that object detection algorithms trained with a public dataset plus game scenes outperform algorithms trained only with the public dataset.
We compared the pre-trained and retrained algorithms in terms of two metrics: mean average precision (mAP) and intersection over union (IoU). We examined the accuracy of recognizing objects with mAP and the accuracy of localizing objects with IoU. From this comparison, we could determine whether the existing object detection algorithms could be used for game scenes. Furthermore, we could also determine whether the object detection algorithms retrained with game scenes showed a significant difference from the pre-trained object detection algorithms.
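For reference, IoU measures the overlap between a predicted and a ground-truth bounding box. A minimal sketch of the computation, assuming boxes in (x1, y1, x2, y2) corner format (the paper does not specify a box format):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0 and disjoint boxes score 0.0, so higher IoU indicates more accurate localization.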
The contributions of this study are summarized as follows:
  • We built a dataset of game scenes collected from eight games.
  • We presented a framework for improving the performance of object detection algorithms on game scenes by retraining them using game scene datasets.
  • We tested whether the augmented images using image abstraction and stylization schemes can improve the performance of the object detection algorithms on game scenes.
This study is organized as follows. Section 2 briefly explains deep-learning-based object detection algorithms and presents several works on object detection techniques in computer games. We elaborate on how we selected object detection algorithms and games in Section 3. In Section 4, we explain how we trained the algorithms and present the resulting figures. In Section 5, we analyze the results and answer our research questions (RQs). Finally, we conclude and suggest future directions in Section 6.

3. Collecting Materials

3.1. Selected Deep Object Detection Algorithms

We found many excellent deep object detection algorithms in the recent literature. Among these algorithms, we selected the most highly cited ones: YOLO [,,], R-CNN [,,], and SSD []. Among the various versions of the YOLO algorithms, we selected YOLOv3 [], an incremental improvement of YOLO9000 [], which can detect more than 9000 object categories. For the R-CNN algorithms, we selected Faster R-CNN [], the most advanced version of the R-CNN family. Since these highly cited algorithms are relatively mature, we additionally selected more recent algorithms, FPN [] and EfficientDet []. Therefore, we compared five deep object detection algorithms in our study: YOLOv3, Faster R-CNN, SSD, FPN and EfficientDet. The architectures of these algorithms are compared in Figure 1.
Figure 1. The architectures of the deep object detection algorithms used in this study.

3.2. Selected Games

We had three strategies for selecting the games in our study. The first was to select games across various game genres; we therefore referred to Wikipedia [] and sampled genres including action, adventure, role-playing, simulation, strategy, and sports. The second was to exclude games with objects that existing object detection algorithms cannot recognize. Many role-playing games include fantasy creatures such as dragons, wyverns, titans, or orcs, which existing algorithms do not recognize. We also excluded strategy games, since they include weapons such as tanks, machine guns, and jet fighters that are not recognized. The third was to sample both photo-realistically rendered games and cartoon-rendered games. Although most games are rendered photo-realistically, some employ cartoon-styled rendering for a distinctive look; games whose original stories are based on cartoons, in particular, tend to preserve cartoon-styled rendering. We therefore sampled cartoon-rendered games to test how well the selected algorithms detect cartoon-styled objects.
We selected games from these genres as evenly as possible. For action and adventure, we selected 7 Days to Die [], Left 4 Dead 2 [] and Gangstar New Orleans []. For simulation, we selected Sims4 [], Animal Crossing [], and Doraemon []. For sports and racing, we selected FIFA 20 [] and Asphalt 8 []. Among these games, Animal Crossing and Doraemon are rendered in a cartoon style. Figure 2 shows illustrations of the selected games.
Figure 2. Eight games we selected for our study.

4. Training and Results

4.1. Training

We retrained the existing object detection algorithms using two datasets: PascalVOC and game scenes. We sampled 800 game scenes: 100 scenes from each of the 8 games we selected. We augmented the sampled game scenes using various schemes: flipping, rotation, hue control, and tone control. By combining these augmentation schemes, we built more than 10,000 game scenes for retraining the selected algorithms.
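The augmentation schemes above can be sketched roughly as follows. This is a simplified illustration under our own assumptions (e.g. hue control approximated by cycling the RGB channels, tone control by a constant darkening factor), not the exact transforms used in the study:

```python
import numpy as np

def augment(image, mode):
    """Apply one of the augmentation schemes to an H x W x 3 uint8 image.

    The scheme implementations here are simplified stand-ins."""
    if mode == "flip":      # horizontal flip
        return image[:, ::-1, :]
    if mode == "rotate":    # 90-degree rotation
        return np.rot90(image)
    if mode == "hue":       # crude hue shift: cycle the RGB channels
        return image[:, :, [1, 2, 0]]
    if mode == "tone":      # tone control: darken by a constant factor
        return (image.astype(np.float32) * 0.8).astype(np.uint8)
    raise ValueError(f"unknown mode: {mode}")
```

Applying several such transforms, each with varying parameters, multiplies a few hundred source scenes into thousands of training images.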
We trained and tested the algorithms on a personal computer with an Intel Core i7 CPU and an NVIDIA RTX 2080 GPU. The time required for retraining the algorithms is presented in Table 1.
Table 1. Time required for retraining the algorithms (hrs).

4.2. Results

Result images comparing the pre-trained and retrained algorithms on sampled scenes from the eight games are presented in Appendix A. We present our results in three parts: recognition performance measured by mAP, localization performance measured by IoU, and various statistics. We measured mAP, IoU, and statistics including average IoU, precision, recall, F1 score and accuracy for the five object detection algorithms with the two datasets.

4.2.1. Measuring and Comparing Recognition Performance Using mAP

In Table 2, we compare mAP values for the five algorithms between the Pascal VOC dataset and the Pascal VOC dataset with game scenes. We show the same comparison on the MS COCO dataset in Table 3. In Figure 3, we illustrate the comparisons presented in Table 2 and Table 3.
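mAP averages, over all object classes, the average precision (AP) computed from each class's ranked detections. The following is a simplified sketch of AP with all-point interpolation; the exact PascalVOC and MS COCO protocols differ in details such as interpolation and IoU thresholds:

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """AP for one class via all-point interpolation of precision-recall.

    `scores` are detection confidences, `is_true_positive` flags whether each
    detection matched a ground-truth box (e.g. IoU >= 0.5), and `num_gt` is
    the number of ground-truth objects."""
    order = np.argsort(scores)[::-1]            # rank detections by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Make precision monotonically decreasing, then integrate over recall.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

mAP is then the mean of these per-class AP values.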
Table 2. The comparison of mAPs for each game. We compared five object detection algorithms pre-trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 3. The comparison of mAPs for each game. We compared five object detection algorithms pre-trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
Figure 3. mAPs from five object detection algorithms trained by different datasets are compared. In the left column, blue bars denote mAPs from those models trained using PascalVOC only and red bars are for PascalVOC + game scenes. In the right column, blue bars denote mAPs from those models trained using MS COCO only and red bars are for MS COCO + game scenes.

4.2.2. Measuring and Comparing Localization Performance Using IoU

In Table 4, we compare IoU values of the five algorithms between the Pascal VOC dataset and the Pascal VOC dataset with game scenes. We show the same comparison on the MS COCO dataset in Table 5. In Figure 4, we illustrate the comparisons presented in Table 4 and Table 5.
Table 4. The comparison of IoUs for each game. We compared five object detection algorithms pre-trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 5. The comparison of IoUs for each game. We compared five object detection algorithms pre-trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
Figure 4. IoUs from five object detection algorithms trained by different datasets are compared. In the left column, blue bars denote IoUs from those models trained using PascalVOC only and red bars are for PascalVOC + game scenes. In the right column, blue bars denote IoUs from those models trained using MS COCO only and red bars are for MS COCO + game scenes.

4.2.3. Measuring and Comparing Various Statistics

In Table 6 and Table 7, we estimate the average IoU, precision, recall, F1 score and accuracy of the five algorithms for the Pascal VOC dataset and the MS COCO dataset. In Figure 5, we illustrate the comparisons presented in Table 6 and Table 7.
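The statistics in these tables follow the standard definitions from detection counts. A minimal sketch (the paper does not specify how true negatives are counted for detection, so `tn` defaults to 0 here):

```python
def detection_statistics(tp, fp, fn, tn=0):
    """Precision, recall, F1 and accuracy from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn) if tp + fp + fn + tn else 0.0
    return precision, recall, f1, accuracy
```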
Table 6. The statistics. We compared the algorithms trained by PascalVOC and retrained by PascalVOC with game scenes for five object detection algorithms. Note that PascalVOC is abbreviated as Pascal in the table.
Table 7. The statistics. We compared the algorithms trained by MS COCO and retrained by MS COCO with game scenes for five object detection algorithms. Note that MS COCO is abbreviated as MS in the table.
Figure 5. Mean IoU, precision, recall, F1 score and accuracy are compared between two different datasets. In the left column, blue bars denote the values from those models trained using PascalVOC only and red bars are for PascalVOC + game scenes. In the right column, blue bars denote the values from those models trained using MS COCO only and red bars are for MS COCO + game scenes.

5. Analysis

To prove our claim that the object detection algorithms retrained with game scenes show better performance than the object detection algorithms trained only with existing datasets such as Pascal VOC and MS COCO, we asked the following research questions (RQs).
RQ1. Does our strategy to retrain existing object detection algorithms with game scenes improve mAP?
RQ2. Does our strategy to retrain existing object detection algorithms with game scenes improve IoU?

5.1. Analysis of mAP Improvement

To answer RQ1, we compared and analyzed the mAP values in Table 2 and Table 3, which compare the object detection algorithms trained only with the existing datasets against those retrained with game scenes. An overall observation reveals that the retrained algorithms show better mAP than the pre-trained algorithms in 61 of the 80 cases. For further analysis, we performed a t-test and measured the effect size using Cohen’s d.

5.1.1. t-Test

Table 8 compares the p values for the five algorithms trained by PascalVOC and retrained by PascalVOC + game scenes. From the p values, we found that the results from three of the five algorithms differ significantly at p < 0.05. The results from EfficientDet differ significantly even at p < 0.01.
Table 8. p values for the mAPs from five object detection algorithms. We compared the algorithms trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 9 compares the p values for the five algorithms trained by MS COCO and retrained by MS COCO + game scenes. From the p values, we found that the results from four of the five algorithms differ significantly at p < 0.05.
Table 9. p values for the mAPs from five object detection algorithms. We compared the algorithms trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
From these results, seven of the ten cases exhibit significant differences at p < 0.05.
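The p values in these tables come from paired t-tests over the per-game mAP values before and after retraining (the pairing by game is our assumption). A numpy-only sketch of the test statistic; the p value then follows from the Student's t distribution with n − 1 degrees of freedom, e.g. via `scipy.stats.ttest_rel`:

```python
import numpy as np

def paired_t_statistic(before, after):
    """t statistic of a paired t-test on per-game metric values."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = len(d)
    # Mean difference divided by the standard error of the differences.
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))
```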

5.1.2. Cohen’s d

We also measured the effect size using Cohen’s d value for the mAP values and present the results in Table 10 and Table 11.
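Cohen's d scales the difference of the group means by a pooled standard deviation; values above 0.8 are conventionally read as a large effect. A minimal sketch:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d between two samples using a pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = (((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                  / (nx + ny - 2))
    return (y.mean() - x.mean()) / np.sqrt(pooled_var)
```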
Table 10. Cohen’s d values for mAPs from five object detection algorithms. We compared the algorithms trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 11. Cohen’s d values for mAPs from five object detection algorithms. We compared the algorithms trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
Since four Cohen’s d values in Table 10 are greater than 0.8, we conclude that the effect size of retraining the algorithms using game scenes is large for four algorithms.
We also present the Cohen’s d values measured from the MS COCO dataset in Table 11, where four values are greater than 0.8. We therefore conclude that the effect size is large for four algorithms on MS COCO as well.

5.2. Analysis on the Improvement of IoU

To answer RQ2, we compared and analyzed the IoU values in Table 4 and Table 5, which compare the object detection algorithms trained only with existing datasets against those retrained with game scenes. From these values, we found that the retrained algorithms show better IoU in 68 of the 80 cases. For further analysis, we performed a t-test and measured the effect size using Cohen’s d.

5.2.1. t-Test

Table 12 compares the p values for the five algorithms trained by PascalVOC and PascalVOC + game scenes. From the p values, we found that the results from all five algorithms differ significantly at p < 0.05. Therefore, our strategy of retraining the algorithms with game scenes yields a significant improvement in localization.
Table 12. p values for the IoUs from five object detection algorithms. We compared the algorithms trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 13 compares the p values for the five algorithms trained by MS COCO and MS COCO + game scenes. From the p values, we found that the results from three algorithms differ significantly at p < 0.05.
Table 13. p values for the IoUs from five object detection algorithms. We compared the algorithms trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
From these results, eight of the ten cases show significantly different results at p < 0.05.

5.2.2. Cohen’s d

We also measured the effect size using Cohen’s d value for the IoU values and present the results in Table 14 and Table 15.
Table 14. Cohen’s d values for IoUs from five object detection algorithms. We compared the algorithms trained by PascalVOC and retrained by PascalVOC with game scenes. Note that PascalVOC is abbreviated as Pascal in the table.
Table 15. Cohen’s d values for IoUs from five object detection algorithms. We compared the algorithms trained by MS COCO and retrained by MS COCO with game scenes. Note that MS COCO is abbreviated as MS in the table.
Since four Cohen’s d values in Table 14 are greater than 0.8, we conclude that the effect size of retraining the algorithms using game scenes is large for four algorithms.
We also present the Cohen’s d values measured from the MS COCO dataset in Table 15, where three values are greater than 0.8. We therefore conclude that the effect size is large for three algorithms on MS COCO.
In summary, mAP improved in 61 of 80 cases and IoU in 68 of 80 cases. In the t-tests at p < 0.05, 7 of 10 cases showed a statistically significant improvement for mAP and 8 of 10 for IoU. In the effect-size measurements, 8 of 10 cases showed a large effect size for mAP and 7 of 10 for IoU. Therefore, we answer the research questions affirmatively: the object detection algorithms retrained with game scenes show improved mAP and IoU compared with the algorithms trained only with public datasets such as PascalVOC and MS COCO.

5.3. Training with Augmented Dataset

An interesting approach for improving the performance of object detection algorithms on game scenes is to employ augmented images from datasets such as Pascal VOC or MS COCO. In several studies, intentionally transformed images were generated and used to train pedestrian detectors [,]. In our approach, stylization schemes are employed to render images in the style of game scenes. The stylization schemes we employ are flow-based image abstraction with coherent lines [], color abstraction using bilateral filters [], and deep cartoon-styled rendering [].
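The cited stylization schemes are considerably more sophisticated; purely as a toy illustration of the color-abstraction idea, each channel can be quantized to a few levels (a crude stand-in, not any of the referenced methods):

```python
import numpy as np

def posterize(image, levels=4):
    """Crude color abstraction by quantizing each channel to a few levels.

    A toy stand-in for the cited stylization schemes, which use bilateral
    filtering, coherent lines, and deep image-to-image translation."""
    image = np.asarray(image, dtype=np.float32)
    step = 256.0 / levels
    # Snap each value to the center of its quantization bin.
    return (np.floor(image / step) * step + step / 2).astype(np.uint8)
```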
In our approach, we augmented 3000 images by applying three stylization schemes [,,] and retrained object detection algorithms. Some of the augmented images are suggested in Figure 6. In Table 16, we present a comparison between mAP values from pre-trained algorithms and mAP values from retraining with the augmented images. We tested this approach for the Pascal VOC dataset.
Figure 6. The augmented images from Pascal VOC: (a) shows sampled images from the Pascal VOC dataset, (b) is produced by flow-based image abstraction with coherent lines [], (c) is produced by color abstraction using a bilateral filter [] and (d) is produced by deep cartoon-styled rendering [].
Table 16. The comparison of mAPs of the object detection algorithms trained with Pascal VOC and retrained using augmented images.
Among the eight games used in the experiment, the scenes from Doraemon show styles similar to the augmented images. It is interesting to note that this approach shows somewhat improved results on the scenes from Doraemon; for the other game scenes, we could not observe an improvement. Figure 7 illustrates the comparison of three approaches: (i) trained with Pascal VOC, (ii) retrained with augmentation and (iii) retrained with game scenes.
Figure 7. Comparison of three approaches: trained only with Pascal VOC, retrained with augmented images and retrained with game scenes. The red rectangle shows the comparison of mAPs on scenes from Doraemon, which show the greatest improvement. The blue rectangle shows the comparison of average mAPs.

6. Conclusions and Future Work

This study showed that object detection algorithms retrained using game scenes achieve improved performance compared with algorithms trained only with public datasets. Pascal VOC and MS COCO, two of the most frequently used datasets, were employed for our study. We tested our approach on five widely used object detection algorithms, YOLOv3, SSD, Faster R-CNN, FPN and EfficientDet, and on eight games from various genres. We estimated mAP for the pre-trained and retrained algorithms to show that object recognition accuracy improves, and IoU to show that localization accuracy improves. We also tested data augmentation schemes for our purpose, which yielded only limited improvements that depend on the style of the game scenes.
We have two further research directions. One is to establish a dedicated dataset of game scenes to improve the performance of existing object detection algorithms, which we aim to extend with various non-existent characters such as dragons, elves or orcs. The other is to modify the structure of the object detection algorithms to optimize them for game scenes.

Author Contributions

Conceptualization, M.J. and K.M.; methodology, H.Y.; software, M.J.; validation, H.Y. and K.M.; formal analysis, K.M.; investigation, M.J.; resources, M.J.; data curation, M.J.; writing—original draft preparation, H.Y.; writing—review and editing, K.M.; visualization, K.M.; supervision, K.M.; project administration, K.M.; funding acquisition, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by Sangmyung Univ. Research Fund 2019.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In Appendix A, we present eight figures (Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7 and Figure A8) that sample the results for eight games by five important object detection algorithms: YOLOv3, SSD, Faster R-CNN, FPN and EfficientDet.
Figure A1. Comparison of the bounding box detection on game 7 Days to Die. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A2. Comparison of the bounding box detection on game Sims. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A3. Comparison of the bounding box detection on game Animal Crossing. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A4. Comparison of the bounding box detection on game Asphalt 8. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A5. Comparison of the bounding box detection on game FIFA20. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A6. Comparison of the bounding box detection on game Doraemon. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A7. Comparison of the bounding box detection on game Left4Dead2. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.
Figure A8. Comparison of the bounding box detection on game Gangstar, New Orleans. The left column is the result from the models trained by PASCAL VOC and the right column is the result from the models trained by PASCAL VOC and game scenes.

References

  1. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  2. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A. SSD: Single Shot MultiBox Detector. In Proceedings of the ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  4. Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  5. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
  6. Everingham, M.; van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
  7. Everingham, M.; Eslami, S.A.; van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]
  8. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C. Microsoft COCO: Common Objects in Context. In Proceedings of the ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  9. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
  10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  11. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  12. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  13. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Utsumi, O.; Miura, K.; Ide, I.; Sakai, S.; Tanaka, H. An object detection method for describing soccer games from video. In Proceedings of the IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, 26–29 August 2002; pp. 45–48. [Google Scholar]
  16. Chen, Z.; Yi, D. The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI. arXiv 2017, arXiv:1702.05663. [Google Scholar]
  17. Sundareson, P. Parallel image pre-processing for in-game object classification. In Proceedings of the IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Bengaluru, India, 5–7 October 2017; pp. 115–116. [Google Scholar]
  18. Venkatesh, A. Object Tracking in Games Using Convolutional Neural Networks. Master’s Thesis, California Polytechnic State University, San Luis Obispo, CA, USA, 2018. [Google Scholar]
  19. Liu, S.; Zheng, B.; Zhao, Y.; Guo, B. Game robot’s vision based on faster R-CNN. In Proceedings of the Chinese Automation Congress (CAC) 2018, Xi’an, China, 30 November–2 December 2018; pp. 2472–2476. [Google Scholar]
  20. Chen, Y.; Huang, W.; He, S.; Sun, Y. A Long-time multi-object tracking method for football game analysis. In Proceedings of the Photonics & Electromagnetics Research Symposium-Fall 2019, Xiamen, China, 17–20 December 2019; pp. 440–442. [Google Scholar]
  21. Tolmacheva, A.; Ogurcov, D.; Dorrer, M. Puck tracking system for aerohockey game with YOLO2. J. Phys. Conf. Ser. 2019, 1399. [Google Scholar] [CrossRef] [Green Version]
  22. Yao, W.; Sun, Z.; Chen, X. Understanding video content: Efficient hero detection and recognition for the game Honor of Kings. arXiv 2019, arXiv:1907.07854. [Google Scholar]
  23. Spijkerman, R.; van der Haar, D. Video footage highlight detection in Formula 1 through vehicle recognition with faster R-CNN trained on game footage. In Proceedings of the International Conference on Computer Vision and Graphics 2020, Warsaw, Poland, 14–16 September 2020; pp. 176–187. [Google Scholar]
  24. Kim, K.; Kim, S.; Shchur, D. A UAS-based work zone safety monitoring system by integrating internal traffic control plan (ITCP) and automated object detection in game engine environment. Autom. Constr. 2021, 128. [Google Scholar] [CrossRef]
  25. YOLO in Game Object Detection. 2019. Available online: https://forum.unity.com/threads/yolo-in-game-object-detection-deep-learning.643240/ (accessed on 10 March 2019).
  26. List of Video Game Genres. Available online: https://en.wikipedia.org/wiki/List_of_video_game_genres (accessed on 17 September 2021).
  27. 7 Days to Die. Available online: https://7daystodie.com/ (accessed on 5 August 2013).
  28. Left 4 Dead 2. Available online: https://www.l4d.com/blog/ (accessed on 17 November 2009).
  29. Gangstar New Orleans. Available online: https://www.gameloft.com/en/game/gangstar-new-orleans/ (accessed on 7 February 2017).
  30. Sims4. Available online: https://www.ea.com/games/the-sims/the-sims-4 (accessed on 2 September 2014).
  31. Animal Crossing. Available online: https://animal-crossing.com/ (accessed on 14 April 2001).
  32. Doraemon. Available online: https://store.steampowered.com/app/965230/DORAEMON_STORY_OF_SEASONS/ (accessed on 13 June 2019).
  33. Asphalt 8. Available online: https://www.gameloft.com/asphalt8/ (accessed on 22 August 2013).
  34. FIFA 20. Available online: https://www.ea.com/games/fifa/fifa-20 (accessed on 24 September 2019).
  35. Huang, S.; Ramanan, D. Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2243–2252. [Google Scholar]
  36. Chan, Z.; Ouyang, W.; Liu, T.; Tao, D. A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection. Int. J. Comput. Vis. 2021, 129, 1121–1138. [Google Scholar] [CrossRef]
  37. Kang, H.; Lee, S.; Chui, C. Flow-based image abstraction. IEEE Trans. Vis. Comp. Graph. 2009, 15, 62–76. [Google Scholar] [CrossRef] [PubMed]
  38. Winnemoller, H.; Olsen, S.; Gooch, B. Real-time video abstraction. ACM Trans. Graph. 2006, 25, 1221–1226. [Google Scholar] [CrossRef]
  39. Kim, J.; Kim, M.; Kang, H.; Lee, K. U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
