Performance Evaluation of Ground AR Anchor with WebXR Device API

Recently, the development of 3D graphics technology has led to various technologies being combined with reality, where a new form of reality is defined or studied; these are typically named by combining the name of the technology with "reality". Representative examples include Augmented Reality, Virtual Reality, Mixed Reality, and eXtended Reality (XR). In particular, research on XR in the web environment is being actively conducted. The Web eXtended Reality Device Application Programming Interface (WebXR Device API), released in 2018, allows instant deployment of XR services to any XR platform, requiring only an active web browser. However, the currently released tentative version has poor stability. Therefore, in this study, the performance of the WebXR Device API is evaluated using three experiments. A camera trajectory experiment is analyzed against ground truth: we measured the standard deviation between the ground truth and WebXR along the X, Y, and Z axes. A difference image experiment is conducted for the front, left, and right directions, yielding a visible difference image between each pair of ground truth and WebXR images, a small mean absolute error, and a high match rate. In the experiment measuring the 3D rendering speed, a frame rate close to real time is obtained.


Introduction
Web eXtended Reality (WebXR) is a standard used to support the rendering of a 3D scene with appropriate hardware, either to present a virtual environment in a web environment or to add graphic images to the real environment. In other words, it is used to provide extended reality in a web environment. In the past, extensive studies on supporting virtual environments in a web environment have been performed, such as the WebVR application programming interface (API) and the WebAR API [1,2]. However, because these APIs were developed to serve one particular kind of virtual environment, supporting other virtual environments was difficult and modifications were time-consuming. According to Mozilla, some browsers may still support the WebVR API, but it has already been removed from the relevant web standards, is in the process of being removed, or is only being maintained for compatibility purposes [3]. Therefore, this paper used the new WebXR Device API, which subsumes these APIs, and evaluated its performance through experiments.
The World Wide Web Consortium (W3C) group released the Web eXtended Reality Device Application Programming Interface (WebXR Device API) in 2018. This interface allows instant deployment of extended reality services to any extended reality platform, requiring only an active web browser. In addition, both augmented reality and virtual reality can be supported with minimal modifications. Moreover, large downloads are not required, and users can immediately experience virtual environments on the website. For these reasons, research on this interface is actively underway. However, the current version is unstable, which can cause a deviation in the anchor position when the measuring device is moved, reducing the stability of the anchor. When this issue occurs, the 3D model appears in a location other than the one specified by the user. As a result, users can be less immersed in the AR environment. Therefore, the need for research on interface performance evaluation has emerged [4].
In this study, we evaluated the performance of the WebXR Device API. It is important to determine whether users can sufficiently use the API as it stands, or whether additional development is necessary and worth the developer's time.
The experiments were conducted in two different ground environments, after loading a 3D model through the WebXR Device API in a mobile web environment. The first experiment measures the camera trajectory in the web environment when the device moves after the 3D model is loaded. In the second experiment, after the 3D model is loaded, a real model with the same size and shape as the 3D model is created and placed in the same position, and accuracy is measured through the difference image, mean absolute error, and match rate. The third experiment measures the 3D model rendering speed.
The structure of this paper is as follows. In Section 2, the background theory is explained. In Section 3, the experimental environment and experimental methods for performance evaluation are described. In Section 4, the experimental results and discussions of a performance evaluation using the proposed method are presented. Finally, Section 5 presents conclusions, limitations, and future research plans.

Background Theory
Section 2 describes the definitions of the various virtual environments, the types of each environment, and their differences. In addition, the contents of the virtual environment used in this paper are explained. Section 2.1 describes eXtended Reality, including the differences among Virtual Reality, Mixed Reality, and Augmented Reality. Section 2.2 describes Augmented Reality. Section 2.3 describes the anchor generation method for ground detection used in this paper. Section 2.4 describes the WebXR Device API.

eXtended Reality
eXtended Reality (XR) refers to human-machine interaction using all real and virtual environments created by computer technology and wearable devices. XR is a concept that encompasses the entire spectrum of the reality-virtuality continuum introduced by Paul Milgram, from "complete reality" to "complete virtuality". Figure 1 shows the reality-virtuality continuum. Virtual Reality (VR) is a completely digital environment and Augmented Reality (AR) is close to a completely real environment, with Mixed Reality (MR) in between. In contrast, XR is defined as a concept that includes all virtual environments [5,6].

Augmented Reality
AR is one of the types of virtual environments. It is a computer graphic technique that combines virtual objects or information with the real environment such that the object appears to exist in the real environment. AR can additionally provide information that is difficult to obtain or is intended to be transmitted with a special purpose by synthesizing virtual objects in a real environment. Unlike VR, which presupposes a completely virtual environment, AR provides augmentation of information based on reality and aims to further increase the effect of reality by combining virtual objects with the real environment. In addition, AR serves to provide additional information necessary for the actual environment. By projecting virtual information on an image that a user is viewing, the distinction between the real environment and the virtual environment is blurred. According to Ronald Azuma, AR is defined as a combination of real and virtual images, enabling real-time interaction in a three-dimensional space [7][8][9][10].

Using Anchor for Ground Detection
To provide AR at the desired location, holding an accurate anchor in space is crucial. An anchor is a reference point, appearing at a position where a virtual object floats on a plane, that can be easily indicated in a spatial coordinate system. When an object needs orientation in an AR implementation, the anchor includes the x, y, and z coordinates in space and the rotation value about each axis. The methods for holding an anchor are largely separated into four types. The first is the Image Anchor, which recognizes an image as a marker. The next is the Ground Anchor, which recognizes the floor surface. The third is the Point Cloud Anchor, which is implemented based on the characteristics of a point cloud. The last is the Object Anchor, which is generated by object recognition. Figure 2 shows a flow chart of how the ground is recognized in a virtual environment. While the camera is looking at the real environment, if the camera is moved, point cloud information about the real environment is extracted using a Simultaneous Localization And Mapping (SLAM) algorithm. The extracted point cloud is then sampled by the RANdom SAmple Consensus (RANSAC) algorithm, and a mesh is generated for the sampled point cloud. Thereafter, a normal vector orthogonal to each mesh is calculated, and the dot product of the normal vectors of adjacent meshes is computed to check whether they share a consistent normal direction. Grounding is checked by specifying a threshold for the dot products of the normal vectors. Finally, the values identified as the ground are projected onto the screen [11][12][13][14][15][16][17].
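The normal-vector consistency check at the end of this pipeline can be sketched as follows. This is a minimal illustration under our own naming; the SLAM and RANSAC stages are omitted, and the threshold value is an assumption, not the API's actual implementation:

```javascript
// Dot product of two 3D vectors represented as [x, y, z] arrays.
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Returns true when every adjacent pair of (unit) mesh normals agrees
// within the threshold, i.e. the sampled meshes form one consistent
// ground plane. The 0.95 default threshold is illustrative.
function isConsistentGround(normals, threshold = 0.95) {
  for (let i = 1; i < normals.length; i++) {
    if (dot(normals[i - 1], normals[i]) < threshold) return false;
  }
  return true;
}
```

A set of upward-pointing normals passes the check, while a normal pointing sideways (e.g. a wall mesh) fails it, so only level surfaces are accepted as ground.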

WebXR Device API
WebXR Device API was developed to serve XR in a web environment. It was released by the W3C group in 2018 and is still under development; the latest versions of the WebXR Anchor Module and WebXR Augmented Reality Module used in this study were released in June 2020. Previously, separate APIs were developed for WebVR [18,19] and WebAR [20]; moreover, WebVR was designed to support only VR. However, the WebXR Device API supports both VR and AR on the web. Support for the AR function was added by the WebXR Augmented Reality module; later, the development of the WebVR API was stopped.
WebXR has been extensively compared with the Khronos Group's Open eXtended Reality (OpenXR) [21] Software Development Kit (SDK). OpenXR is an open, royalty-free standard for accessing VR and AR platforms and devices. It was developed by a working group managed by the Khronos Group consortium. A provisional version of the standard was released in 2019, allowing developers to provide feedback, and OpenXR 1.0 followed in July 2019. OpenXR handles the same basic functions as the WebXR Device API, but for native applications. Therefore, WebXR and OpenXR have a relationship similar to that between the Web Graphics Library (WebGL) [22,23] and the Open Graphics Library (OpenGL) [24]. Both WebGL and OpenGL are open standards published and managed by the Khronos Group, and WebGL was developed as a form of the OpenGL ES API translated into JavaScript. In contrast, while OpenXR is published and managed by the Khronos Group, WebXR is published and managed by the W3C group. The two are separate APIs developed by different standards organizations, and many shared concepts are expressed in different ways. However, when building an XR service, it is possible to implement the functions of WebXR using OpenXR.

Materials and Methods
Section 3 describes the materials and methods used in this paper. Section 3.1 describes the hit-test method to create the ground anchor. Section 3.2 describes the setup for the experiment. Section 3.3 describes the experimental environment. This includes the device used in the experiment, web browser, 3D model, etc., and Section 3.4 describes the experimental methods.

Hit-Test
Hit-test (also called hit detection, picking, or pick correlation) is the process of determining whether a cursor controlled by a user intersects a specific graphic object (point, line, or surface) drawn on the screen in computer graphics programming. There are various types of cursors, such as a mouse cursor, a touch-screen interface, or a touch point. Hit-testing can be performed on the movement or activation of a specific pointing device. The order of the hit-test used in this paper is shown in Figure 3a. After the user accesses the web page, it checks whether the browser supports WebXR. If WebXR is not supported, the page is terminated; otherwise, the web page is activated and the AR button for accessing the immersive web environment is enabled. Thereafter, when the AR button event occurs, the immersive web starts. After entering the immersive web, the ground is recognized and a ground anchor is created; as shown in Figure 3b, the ground is found using the normal vector. This process is repeated until an anchor is created. When a Hit event occurs on the generated ground anchor, it checks whether the 3D model is loaded in the current immersive web page. If a 3D model exists, the 3D model is made invisible through the Hit event; in the opposite case, the 3D model is visualized. This is performed every time a Hit event occurs. When the user requests termination of the immersive web page, the web page is closed.
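The flow in Figure 3a can be sketched with the WebXR Device API roughly as follows. This is a simplified outline, not the paper's actual code: rendering, model loading, and error handling are omitted, and the callback name is our own:

```javascript
// Sketch of the hit-test flow: support check -> immersive session ->
// hit-test source -> anchor creation and model toggling on select (tap).
async function startImmersiveAR(onHitAnchor) {
  // 1. Check whether the browser supports immersive AR; otherwise fall back.
  if (!navigator.xr || !(await navigator.xr.isSessionSupported('immersive-ar'))) {
    return null;
  }
  // 2. Start the immersive session with hit-test and anchor support.
  const session = await navigator.xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test', 'anchors'],
  });
  // 3. Create a hit-test source that casts rays from the viewer pose.
  const viewerSpace = await session.requestReferenceSpace('viewer');
  const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });

  // 4. On each select (tap) event, anchor the first ground hit; the caller's
  //    callback then toggles the 3D model's visibility, as in Figure 3a.
  session.addEventListener('select', (event) => {
    const results = event.frame.getHitTestResults(hitTestSource);
    if (results.length > 0) {
      results[0].createAnchor().then((anchor) => onHitAnchor(anchor));
    }
  });
  return session;
}
```

The `hit-test` and `anchors` feature descriptors correspond to the WebXR Hit Test and WebXR Anchors modules used in this study.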

Experimental Setup
Section 3.2 describes the setup for the experiment. Section 3.2.1 describes the software setup, including the ground anchor designation method and the construction of the ground environment. Section 3.2.2 describes the hardware setup, including how the 3D model was created and how the measurement device was arranged.

Software Setup
In this paper, the experiment was conducted by modifying the hit-test supported by the WebXR Device API: the location of the initially designated ground anchor was saved in order to synchronize its position with that of the real model and to load the 3D model at the hit-test location. The method of designating a ground anchor is explained in Figure 3c. This method is also called ray casting: the "rayOrigin" is designated using a virtual ray at the moment of initial access to the immersive web. When the device is then moved, the "rayDirection" is designated using a new virtual ray, and the intersection point where the two rays meet is designated as the ground anchor. Figure 4a,b shows the ground environment created for the difference image extraction experiment.
It consists of a square with a side of 2 m. An arbitrary location a, where the object is placed, is selected 1.5 m away from position 1. Because real users of a WebXR environment require a certain range of space, a square environment was constructed. In addition, the performance of the ground anchor may differ depending on the texture of the floor surface, the reflection of light, and shadows during the experiment. Therefore, we used a floor with strong light reflection, shown in (a), and a floor covered with drawing paper, shown in (b), which unified the texture and provided weak reflection. The performance evaluation was conducted by running the same experiments in both environments (a) and (b).
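The intersection step of the ray-casting method described above can be sketched as the closest approach of two rays in 3D. The function name, the array representation, and the midpoint fallback (used when the rays do not exactly intersect) are our own illustration, not the paper's implementation:

```javascript
// Given two rays p1 + s*u and p2 + t*v (origins and direction vectors as
// [x, y, z] arrays), return the midpoint of their closest approach, which
// is the exact intersection when the rays meet. Returns null for parallel rays.
function closestPointBetweenRays(p1, u, p2, v) {
  const sub = (a, b) => a.map((x, i) => x - b[i]);
  const add = (a, b) => a.map((x, i) => x + b[i]);
  const scale = (a, s) => a.map((x) => x * s);
  const dot = (a, b) => a.reduce((acc, x, i) => acc + x * b[i], 0);

  const w = sub(p1, p2);
  const a = dot(u, u), b = dot(u, v), c = dot(v, v);
  const d = dot(u, w), e = dot(v, w);
  const denom = a * c - b * b;
  if (Math.abs(denom) < 1e-9) return null; // parallel rays: no anchor point
  const s = (b * e - c * d) / denom;
  const t = (a * e - b * d) / denom;
  const q1 = add(p1, scale(u, s)); // closest point on the first ray
  const q2 = add(p2, scale(v, t)); // closest point on the second ray
  return scale(add(q1, q2), 0.5);  // midpoint = designated ground anchor
}
```

With the first ray standing in for "rayOrigin" and the second for "rayDirection", the returned point plays the role of the designated ground anchor.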

Hardware Setup
In order to measure the accuracy of the 3D model loaded at the position of the ground anchor in the WebXR environment, a 3D model with the same shape as the real model was created. Figure 4c shows the 3D model created for the experiment; Figure 4d shows the actual model. Each model is a cube with a side of 15 cm. The checkered pattern on each side is composed of square tiles with a side of 3.75 cm. The errors between the developed 3D model and the real model were compared in the WebXR environment. In the case of a flat-patterned model, differences on the surface of the object do not appear when the difference image is extracted; only differences on the outline are detected. In contrast, for a checkered pattern, differences on the surface as well as the outline can be checked; that is, more detailed difference values can be obtained. Therefore, in this study, the performance evaluation was conducted using a checkered pattern model to extract the difference image of the object more accurately. Figure 4e shows the arrangement of the device used in the experiment. The device is placed at positions 1, 2, and 3, as shown in Figure 4a,b. Measurements were obtained from a height of 62 cm above the ground. Figure 4f shows the tilt of the device arranged as in Figure 4e. The inclination was measured with a level meter using the gyroscope sensor provided as standard with the device. The X-axis was parallel to the floor and the Y-axis was inclined 70° from the floor. The experiment was carried out by loading the object at position a. The hit-test module of the WebXR API augments the 3D model after the ground anchor is specified; therefore, the device was inclined toward a, as shown in Figure 4a. Table 1 presents a description of the experimental environment.
When using the Google Chrome browser, measurement and debugging were performed using the Google development tool "inspect", which supports a debugging environment. To test the accuracy of the ground anchor and the loaded model, the hit-test module of the WebXR Device API was modified and used.

Experimental Method
Section 3.4 describes the experimental method. The experiment consists of three parts, each explained in its own subsection: the WebXR camera trajectory, the checkered pattern for the difference image, and frames per second.

WebXR Camera Trajectory
The camera trajectory refers to the change in movement along the X, Y, and Z axes during an arbitrary measurement time, using the device's sensors. In this paper, a head-mounted display (HMD) with a head tracker is regarded as the ground truth. After loading a 3D model in WebXR, the device and the ground truth were fixed together and moved at the same time, and the amount of change according to the movement was measured. The difference in the camera trajectory was expressed as the standard deviation relative to the ground truth. Equation (1) explains how the standard deviation is obtained: X_i is the i-th measured value, X̄ is the average value, n is the number of measured values, v is the variance, and σ is the standard deviation.
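The standard deviation computation described for Equation (1) can be sketched as follows; a minimal implementation following the stated definitions (X_i, mean, variance v, sigma), with the per-frame differencing between trajectories as our own assumption about how the series are compared:

```javascript
// Population standard deviation per the definitions for Equation (1).
function stddev(samples) {
  const n = samples.length;
  const mean = samples.reduce((acc, x) => acc + x, 0) / n;        // X-bar
  const v = samples.reduce((acc, x) => acc + (x - mean) ** 2, 0) / n; // variance
  return Math.sqrt(v);                                             // sigma
}

// Standard deviation of the per-frame difference between the ground truth
// trajectory and the WebXR trajectory along one axis.
function trajectoryStddev(groundTruth, webxr) {
  const diffs = groundTruth.map((g, i) => g - webxr[i]);
  return stddev(diffs);
}
```

Running `trajectoryStddev` once per axis yields the per-axis values reported in Table 2.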

WebXR with Checkered Pattern
The method of obtaining the difference image from the captured images is explained in Equation (2). The image capturing the actual model is denoted as R, the image capturing the 3D model is denoted as V, and the image containing only the background is denoted as B.
Equation (3) is used to obtain the MAE of the difference image. For each image, the absolute error between the measured value and the real value is obtained for every pixel; all the absolute errors are summed, and the average is calculated by dividing the sum by the number of pixels. x is the X-axis pixel coordinate of the image, y is the Y-axis pixel coordinate, W is the width, and H is the height.
In addition, the pixel value at coordinates x and y of the image capturing the real object is R(x, y), that of the image capturing the 3D model is V(x, y), and that of the image capturing only the background is B(x, y). Here, R(x, y) is called the "real value" and V(x, y) the "measured value".
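The per-pixel MAE described for Equation (3) can be sketched as follows; a minimal version assuming flat grayscale pixel arrays, which is our own simplification of the RGB images used in the paper:

```javascript
// Mean absolute error between real-model pixels R(x, y) and 3D-model
// pixels V(x, y) over a W x H image, per the definitions for Equation (3).
// Images are flat arrays indexed as y * W + x.
function meanAbsoluteError(R, V, W, H) {
  let sum = 0;
  for (let y = 0; y < H; y++) {
    for (let x = 0; x < W; x++) {
      const i = y * W + x;
      sum += Math.abs(R[i] - V[i]); // |real value - measured value|
    }
  }
  return sum / (W * H);
}
```

With 8-bit pixel values the result falls between 0 and 255, matching the range stated for the MAE in this study.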

Frames Per Second (FPS)
Equation (5) was used to obtain the FPS. The start time and end time of the measurement were recorded, and the absolute value of their difference was taken as the elapsed time; the number of frames rendered during the measurement was then divided by this elapsed time. T_S is the start time and T_E is the end time (the unit of T is ms), and F is the number of frames.
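The FPS measurement can be sketched as the frame count divided by the elapsed time, assuming timestamps in milliseconds as in the text; the function name is our own:

```javascript
// FPS from a start time T_S and end time T_E (both in milliseconds)
// and the number of frames F rendered during that interval.
function framesPerSecond(tStartMs, tEndMs, frameCount) {
  const elapsedSec = Math.abs(tEndMs - tStartMs) / 1000;
  return frameCount / elapsedSec;
}
```

In practice the timestamps would come from the `performance.now()` values of the first and last rendered frames.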

WebXR Camera Trajectory
In order to experiment with the camera trajectory along each axis, a comparative experiment between ground truth and WebXR was performed for each of the X, Y, and Z axes. An HMD that included a head tracker was selected for the ground truth experiment, which showed high accuracy in terms of camera trajectory because the head tracker performed the function of tracking the display device. It has been used in several studies due to its high accuracy within the sensor range [25][26][27].
After loading the 3D model at the same position as the head tracker recognized by the ground truth, the device was moved 50 cm in each direction along the X-axis (left and right), the Y-axis (up and down), and the Z-axis (front and back). For each axis, the difference between the camera trajectories of the ground truth and WebXR was analyzed. Figure 5a shows the result of the experiment evaluating the accuracy of the camera trajectory along the X-axis. The experiment was conducted for about 520 frames. The highest point of the ground truth was 0.3189; at the same frame, the value for the ground with strong reflection was 0.4624 and the value for the ground with weak reflection was 0.3819. The lowest point of the ground truth was −0.2251; at the same frame, the value for the ground with strong reflection was −0.4284 and the value for the ground with weak reflection was −0.2336. Figure 5b shows the result for the Y-axis. The experiment was conducted for about 1220 frames. The highest point of the ground truth was 2.2366; at the same frame, the value for the ground with strong reflection was 2.1113 and the value for the ground with weak reflection was 2.1609. The lowest point of the ground truth was 1.8774; at the same frame, the value for the ground with strong reflection was 1.8199 and the value for the ground with weak reflection was 1.8485. Figure 5c shows the result for the Z-axis. The experiment was conducted for about 1230 frames. The highest point of the ground truth was 1.4055; at the same frame, the value for the ground with strong reflection was 1.5649 and the value for the ground with weak reflection was 1.5525.
The lowest point of the ground truth was 1.1606; at the same frame, the value for the ground with strong reflection was 1.1614 and the value for the ground with weak reflection was 1.1551. As a result of the experiment, comparing the ground truth and WebXR for each axis confirmed a difference in the camera trajectory relative to an arbitrarily designated anchor. Table 2 presents the standard deviation of the difference between the ground truth (GT) and the camera trajectory of WebXR. The standard deviation for the ground with strong reflection (SR) on the X-axis is 0.73 cm, and that for the ground with weak reflection (WR) is 0.21 cm. On the Y-axis, the standard deviations are 0.32 cm for strong reflection and 0.38 cm for weak reflection. On the Z-axis, the standard deviations are 0.55 cm for strong reflection and 0.51 cm for weak reflection. Figure 6 shows images captured through the device after loading the actual model and the 3D model in the WebXR environment. The images were captured when moving to the left (position 2) and right (position 3) from the front (position 1). When the object was loaded based on the anchor (a) shown in Figure 4a,b, the difference between the positions of the real model and the 3D model was checked. To calculate the anchor and model-loading accuracy of the WebXR Device API, the image capturing only the real model was considered the ground truth.
The captured images had a resolution of 1440 × 3088. To calculate the difference image, an evaluation was performed using MATLAB software (Ver. R2020a). Because most pixels had the same value in both images, the background of the extracted difference image appears black (RGB #000000), while the parts where the difference between the ground truth and the WebXR image is non-zero appear as non-black pixels. The result derived using Equation (2) is expressed through the difference images in Figure 7, which shows the difference images measured at positions 1, 2, and 3. The experimental results show the difference between the ground truth and WebXR images, which reflects the accuracy of the WebXR anchor. To obtain the mean absolute error (MAE) of the images captured at positions 1, 2, and 3, the background is removed. MAE is the amount of error in the measured value; it is the difference between the "measured value" and the "real value". In this study, the real value was designated as the ground truth and the measured value as WebXR. MAE represents a qualitative measure of the accuracy of a statistical estimation: the lower the number, the higher the accuracy. In this study, the MAE is a value between 0 and 255. After obtaining the images of the real model and the 3D model, the background is removed to calculate the MAE of the model. Figure 8 shows the images with the background removed in each direction. A binary image is created from each background-removed image, and the MAE is then calculated. Table 3 shows the mean of the difference image. The experimental results are rounded off at the fifth decimal place. On the weak reflection ground, the experiments conducted for positions 1, 2, and 3 yielded mean differences of 0.1926, 0.7154, and 0.9611, respectively.
In addition, when normalized as percentages, results of 0.0756%, 0.2805%, and 0.3770% were confirmed. On the strong reflection ground, the experiments conducted for positions 1, 2, and 3 yielded mean differences of 0.2833, 1.3664, and 1.1958, respectively; normalized as percentages, these correspond to 0.1110%, 0.5358%, and 0.4689%. Figure 9 shows the mask images of the intersection and union of the real-model and 3D-model images in each direction, used to obtain the match rate between regions. In this paper, only a part of the mask images obtained through the match rate experiment is shown in Figure 9. The number of pixels in the non-black (RGB #000000) part of each image is extracted, and the match rate between regions is calculated as the ratio of the number of pixels in the intersection image to the number of pixels in the union image. Table 4 shows the match rates between the intersection and union images. The total number of pixels in each image was 4,446,720. On the weak reflection ground, the numbers of pixels in the intersection images are 57,513, 69,283, and 78,231 for positions 1, 2, and 3, respectively; the numbers of pixels in the union images are 62,113, 77,936, and 88,145, yielding match rates of 92.6%, 88.9%, and 88.8%. On the strong reflection ground, the numbers of pixels in the intersection images are 56,826, 62,567, and 73,255 for positions 1, 2, and 3, respectively; the numbers of pixels in the union images are 61,346, 90,720, and 90,963, yielding match rates of 92.6%, 69.0%, and 80.5%.
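The match rate (IoU) computation used for Table 4 can be sketched as follows; a minimal version assuming flat grayscale mask arrays, where 0 stands for black (#000000):

```javascript
// Count the non-black pixels in a flat grayscale mask array.
function countNonBlack(pixels) {
  return pixels.filter((p) => p !== 0).length;
}

// Match rate (IoU) as a percentage: non-black pixel count of the
// intersection image divided by that of the union image.
function matchRatePercent(intersectionImg, unionImg) {
  return (countNonBlack(intersectionImg) / countNonBlack(unionImg)) * 100;
}
```

Applied to the counts above (e.g. 57,513 intersection pixels over 62,113 union pixels), this ratio reproduces the reported 92.6% match rate.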

3D Model Rendering FPS Evaluation in WebXR
FPS represents the rate at which frames change per second. In general, 15 frames are used for animation, 24 for movies, 30 for television, and 60 for broadcasts such as sports. The human eye can process a variable frame rate and recognizes more than 8 continuous FPS as natural motion; real-time performance can be said to be guaranteed when the FPS is 30 or more. If real-time performance is guaranteed, users can use the service naturally. Therefore, FPS measurement was performed to check whether XR guarantees real-time performance in a web environment. The FPS experiment was conducted after loading a 3D model in the web environment. Figure 10 shows the result of measuring the FPS after loading the 3D model in the WebXR environment while moving from position 1 to positions 2 and 3, as shown in Figure 4a,b. The experiment was conducted for 174 s. The X-axis is the time (s), and the Y-axis is the number of frames rendered per second. The results confirmed that an average of 29.7 FPS was processed.

Discussion
The experiments in this paper were conducted with the ground environment, the 3D model to be loaded, and the measurement device fixed. To compare performance according to differences in the ground environment, the experiments were divided between the ground with strong reflection and the ground with weak reflection, and performance differences were analyzed by conducting the same experiments in both environments.
In the case of the camera trajectory experiment, a 3D model was loaded in the WebXR environment with the HMD used as the ground truth and the mobile device fixed to it, and the fixed devices were simultaneously moved along the X, Y, and Z axes. Experiments were conducted for each axis to accurately capture the amount of change when moving in 3D space relative to the anchor. With the HMD used as the ground truth, the head tracking function allows the amount of movement to be checked more accurately when the device is moved. Accordingly, an anchor was designated and a 3D model was loaded on the mobile device at the same location as the head tracker. Even in this setup, if the accuracy of the anchor is poor, a difference in the amount of movement between the mobile device and the ground truth appears when the device is moved after loading the 3D model. To clearly confirm this difference, the standard deviation of the movement amount was calculated.
In the case of the difference image experiment, the device was moved to the left and right while the horizontal alignment and viewpoint were fixed. The positions to each side were designated at an angle relative to the straight line connecting the front position and the anchor. In each direction, images of the ground alone, images with only the 3D model loaded, images with only the real model placed, and images with both the real and 3D models were captured. If the anchor's accuracy is poor, the difference image of the captured images contains non-black (#000000) pixels. In addition, reliability evaluation methods used in the image processing field were employed to improve the reliability of the performance evaluation. The mean absolute error and IoU were used to express the results mathematically and to obtain the error per pixel and per region, respectively.
Among the methods used, the mean absolute error was used to check the mean error of the pixel values of the difference image. The normalized value of the mean absolute error is equivalent to the result obtained with the mean absolute percentage error (MAPE) method. Additionally, IoU was used to check the match rate for the areas of the difference image. In the image processing field, there are various methods for evaluating error rates and match rates, such as MAE, mean squared error (MSE), MAPE, and IoU. Among them, the MAE is the most intuitive, because the calculation is performed directly between pixels and does not alter the values of the difference image. IoU measures the accuracy between corresponding regions by dividing the area of the intersection of the two regions by the area of their union; therefore, it was used to confirm the match rate between regions.
In the case of the FPS experiment, after loading the 3D model in the WebXR environment, the experiment was conducted to measure the rendering speed. In addition, to improve the consistency of the experiment, the speed at which the 3D model is rendered was measured during the difference image experiment.

Conclusions
The purpose of this paper was to evaluate the performance of the WebXR Device API, which was released to provide XR services in a web environment. The interface was recently released, and the currently available tentative version has poor stability; therefore, a performance evaluation study was conducted to support smooth service. For the performance evaluation, we produced a 3D model and a floor environment, and accessed extended reality in a web environment through a web server built for the experiment. After that, the 3D model was loaded and the experiments proceeded. The first experiment examined the camera trajectory when the camera moves along the X, Y, and Z axes. The second measured the difference image, mean absolute error, and IoU by placing a real model with the same size and shape as the loaded 3D model in the same position. The third experiment measured the 3D model rendering speed. In the first and second experiments, we specified the ground truth and analyzed the comparison with WebXR.
Since the proposed performance evaluation method focuses on expressing a virtual environment in a web environment, we performed an experiment by fixing the measurement equipment and ground environment and limiting the 3D model. However, in the case of real users, various devices, ground environments, and 3D models can be used. In future studies, the performance evaluation methods in various types of experimental environments will be studied. Finally, the experimental results of this study are expected to be the basis for the technical performance evaluation method of the WebXR interface in the future.