1. Introduction
Virtual reality (VR) imposes unusually tight mathematical and systems constraints. Dynamics must be physically plausible. Spatial estimation must be accurate. Rendering must be photometrically consistent. Interaction must be responsive. All of these must run within millisecond budgets at 90–120 Hz to sustain presence and avoid cybersickness. Even small latency spikes or implausible physics can break immersion. VR is therefore a stringent testbed for methods spanning differential equations, optimization on manifolds, real-time rendering, and human–computer interaction.
Classical foundations remain indispensable. Rigid-body dynamics and constraint methods [1,2] govern motion and contact; illumination and projection models [3,4,5] determine image formation; quaternions and inverse kinematics (IK) on the Special Euclidean group SE(3) [6,7,8,9] enable natural manipulation and embodiment. Recent advances augment rather than replace these tools: NeRFs [10] and real-time variants [11,12] learn geometry and appearance for high-quality view synthesis; differentiable rendering [13] exposes image-level gradients for joint calibration; Physics-Informed Neural Networks (PINNs) [14] embed physical laws in learning.
1.1. Scope, Perspective, and Positioning
We focus on mathematical models that enable real-time VR on consumer hardware (90–120 Hz). The survey is system-oriented. It is organized around four pillars:
- (i) Physics-based simulation: integration, constraints, collision, learned surrogates.
- (ii) Spatial representation and geometric transforms: quaternions, Lie groups, differentiable transforms, manifold optimization.
- (iii) Real-time rendering: Phong/Bidirectional Reflectance Distribution Function (BRDF) models, GPU pipelines, neural/differentiable rendering, perceptual strategies.
- (iv) Interaction modeling: sensing/pose, IK, hand/body tracking, constraint-based manipulation, haptics, comfort-aware control.
Authoritative surveys exist for 3D user interfaces [15], classical rendering/geometry [4,5], inverse kinematics [9], and rigid-body physics [2]. Recent surveys address neural, differentiable, and foveated rendering [10,11,12,13,16,17], but typically treat these topics in isolation or outside a VR systems perspective. Our positioning reads physics, tracking/Simultaneous Localization and Mapping (SLAM), rendering, and interaction together, framed by VR-specific constraints (quality–latency–memory trade-offs, motion-to-photon latency, reprojection error, Simulator Sickness Questionnaire (SSQ) scores) and by cross-module interfaces (e.g., tracking feeding differentiable rendering and avatar controllers).
1.2. Contributions and Methodology
This article is a survey; contributions are curatorial and integrative:
- (i) We organize stability-oriented practices for VR physics. We clarify when semi-implicit/symplectic integrators are preferable, how constraint projection maintains plausibility under contact, and where perceptually gated adaptivity can safely reduce cost. We also situate differentiable simulators within this toolbox.
- (ii) We connect literatures on manifold optimization, SLAM, differentiable image formation, and neural scene fields. We present an end-to-end view for tracking, calibration, and reconstruction, including a taxonomy and typical failure modes for avatars and hand–object interaction.
- (iii) We review hybrid and neural rendering for dual-eye 90/120 Hz VR. We distill rules of thumb for the quality–latency–memory trade-off (e.g., radiance caching, network factorization, perceptual scheduling).
- (iv) We align methods in physics, rendering, and interaction (IK, haptics, personalized avatars) with user-facing and system metrics. We link algorithm families to time-to-grasp, slip rate, embodiment, Absolute Trajectory Error (ATE)/Relative Pose Error (RPE), reprojection error, and motion-to-photon latency.
- (v) We compile checklists and a consolidated benchmark table (Appendix A). We outline a near-term (1–2 year) agenda with concrete goals: low numerical energy drift on standard scenes, average reprojection error below a fixed pixel threshold, end-to-end motion-to-photon latency within the display budget, and a 20% or greater reduction in SSQ under standardized protocols. To ground the discussion in practical device constraints, Appendix A.1 reports a headset specification summary (Table A1).
To provide a high-level roadmap, Figure 1 summarizes the simplified VR system pipeline considered in this survey (hardware → tracking → physics/IK → rendering → display) and maps each block to the corresponding sections and representative equations. In particular, we later illustrate the numerical stability of the physics update in Figure 2 and detail the rendering pipeline in Figure 3.
Selection criteria. We surveyed 2020–2025 work in SIGGRAPH, TOG, Eurographics, IEEE VR, TVCG, Virtual Reality, CHI, CVPR, ICCV, ECCV, NeurIPS, and ICRA, prioritizing: (i) real-time constraints (≥90 Hz or ≤20 ms), (ii) explicit mathematical formulations, (iii) quantitative validation, and (iv) integration/deployment considerations. Foundational references are included where they clarify principles or baselines.
1.3. Reader’s Guide and Conventions
Reader’s guide. Section 2 covers numerical integration, constraint satisfaction, collision detection/response, and stability–performance trade-offs. Section 3 treats coordinate frames, rotation representations, differentiable transforms, manifold optimization, and pose-graph estimation. Section 4 spans classical to neural/differentiable rendering and perceptual strategies. Section 5 reviews sensing, IK, haptics, and comfort-aware control. Section 6 sets near-term targets; Section 7 synthesizes implications.
Notation and conventions. Bold letters denote vectors/matrices; SO(3)/SE(3) denote rotations/rigid motions; unit quaternions represent orientation. We use standard reprojection/trajectory metrics (ATE/RPE), SSQ for comfort, and per-eye throughput (resolution × refresh) for system reporting.
2. Physics-Based Simulation in VR
VR immersion depends not only on visual fidelity but also on physically consistent responses under tight frame budgets. This section formalizes equations of motion and constraints, compares time integration schemes under accuracy-stability-structure trade-offs, summarizes contact/friction as non-smooth dynamics, and details perceptual/system-aware adaptivity and learning-augmented simulators relevant to real-time deployment.
Table 1 provides an at-a-glance summary of modeling priorities across representative VR application domains. The ratings indicate relative emphasis (1–5) in terms of typical system requirements rather than a universal ranking. These priorities guide the focus of the subsequent sections on tracking, physics/IK, and rendering, where we detail the corresponding mathematical formulations.
2.1. VR Strengths and Limitations (at a Glance)
Strengths
- Perceptual realism: physically consistent responses increase presence and make user action outcomes predictable.
- Interaction stability: constraint-aware solvers reduce jitter/failure during grasp/manipulation [2].
- Composability: shared rigid/soft formulations unify haptics, avatars, and environment logic.
Limitations
- Strict frame budgets: solver work must fit within ≈11.1 ms @ 90 Hz (8.3 ms @ 120 Hz); physics LOD/scheduling are mandatory [18,19].
- Stability–accuracy trade-off: semi-implicit/projection schemes preserve stability but bias trajectories; higher-order non-symplectic methods are costlier [2,20].
- Contact/constraint overhead: non-smooth friction/contact induces LCP/SOCP solves; under-resolved contacts cause visible interpenetration/haptic artifacts [21,22].
- Latency coupling: physics–tracking–render clocks interact; motion-to-photon and prediction errors degrade comfort [5].
2.2. Equations of Motion and Constraints
This subsection states the equations of motion for a rigid multibody system and how equality/inequality constraints enter the model. We fix notation for the state $(q, v)$, separate the model terms $(M, C, \tau)$, and introduce Lagrange multipliers $\lambda$ for constraints; these definitions will be used throughout the section.
Classical physically based modeling formulations established the rigid-body dynamics and constraint foundations that still underpin real-time physics engines used in modern VR [23].
Let generalized coordinates $q \in \mathbb{R}^n$ and velocities $v = \dot q \in \mathbb{R}^n$. In the absence of constraints, the equations of motion read
$$ M(q)\,\dot v + C(q, v)\,v = \tau_{\text{ext}}, \qquad (1) $$
with mass matrix M, Coriolis/gyroscopic term C, and external forces $\tau_{\text{ext}}$. Intuitively, Equation (1) is the n-DOF generalization of Newton’s law: $M(q)$ maps accelerations to forces, while $C(q, v)\,v$ gathers velocity-dependent inertial effects that become prominent in coupled or fast motions.
Holonomic constraints $g(q) = 0$ (e.g., joints) and unilateral nonpenetration $\phi(q) \ge 0$ (contacts) impose algebraic conditions. Introducing Lagrange multipliers $\lambda$ for the constraints yields the constrained system
$$ M(q)\,\dot v = \tau_{\text{ext}} - C(q, v)\,v + J(q)^{\top}\lambda, $$
where J is the constraint Jacobian. Constraint drift can be reduced by Baumgarte stabilization (enforcing $\ddot g + 2\alpha\,\dot g + \beta^2 g = 0$ in the integrator) or by post-stabilization, i.e., projecting $q$ back to the constraint manifold after each step [1,2].
A common post-stabilization step applies a minimal correction that projects the tentative configuration $\tilde q$ back onto the constraint manifold:
$$ q \leftarrow \tilde q + \Delta q, \qquad \Delta q = -J^{\top}\big(J J^{\top}\big)^{-1} g(\tilde q), $$
where $J = J(\tilde q)$.
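As a concrete sketch of the projection above, the following minimal NumPy code applies a few Gauss–Newton post-stabilization sweeps; the unit-circle distance constraint is our own illustrative stand-in for a joint, not a formulation from the surveyed engines:

```python
import numpy as np

def post_stabilize(q, g, J, iters=3):
    """Project a tentative configuration q back onto the manifold g(q) = 0
    using the minimal correction dq = -J^T (J J^T)^{-1} g(q)."""
    for _ in range(iters):  # a few Gauss-Newton sweeps suffice in practice
        r = np.atleast_1d(g(q))
        Jq = np.atleast_2d(J(q))
        dq = -Jq.T @ np.linalg.solve(Jq @ Jq.T, r)
        q = q + dq
    return q

# Illustrative constraint: a point mass on a unit circle (a distance joint).
g = lambda q: np.array([q @ q - 1.0])
J = lambda q: 2.0 * q.reshape(1, -1)

q_tilde = np.array([1.05, 0.02])          # configuration drifted after a step
q_proj = post_stabilize(q_tilde, g, J)
print(np.linalg.norm(q_proj), g(q_proj))  # radius ~1, residual ~0
```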
2.3. Time Integration: Accuracy, Stability, and Structure
This subsection contrasts commonly used time-stepping schemes for interactive VR physics and graphics. We focus on how order of accuracy, stability, and structure preservation trade off under real-time budgets. Let $h$ be the fixed timestep and let $a_k$ denote the instantaneous acceleration implied by forces and constraints at step $k$.
Explicit Euler. The explicit Euler integration is formulated as follows:
$$ v_{k+1} = v_k + h\,a_k, \qquad x_{k+1} = x_k + h\,v_k. $$
It is first-order accurate and cheap, but it tends to drift in energy over long horizons.
Symplectic (semi-implicit) Euler. The symplectic Euler integration is formulated as:
$$ v_{k+1} = v_k + h\,a_k, \qquad x_{k+1} = x_k + h\,v_{k+1}. \qquad (4) $$
This scheme preserves a discrete symplectic structure and often exhibits bounded long-horizon energy error for Hamiltonian systems. To illustrate the long-horizon stability of Equation (4), Figure 2 compares the relative energy error of a simple pendulum using explicit Euler, symplectic (semi-implicit) Euler (Equation (4)), and RK4 under the same step size $h$.
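A minimal NumPy sketch of the pendulum comparison behind Figure 2 (step size, horizon, and initial angle are our own illustrative choices): the same pendulum is stepped with explicit and symplectic Euler while tracking the worst relative energy error.

```python
import numpy as np

g_over_l = 9.81   # pendulum with unit length; illustrative parameters
h, steps = 0.01, 20000
energy = lambda th, om: 0.5 * om**2 - g_over_l * np.cos(th)

def run(symplectic):
    th, om = 1.0, 0.0
    E0, worst = energy(th, om), 0.0
    for _ in range(steps):
        a = -g_over_l * np.sin(th)
        # position update uses the NEW velocity (symplectic) or the OLD one (explicit)
        th = th + h * (om + h * a if symplectic else om)
        om += h * a
        worst = max(worst, abs(energy(th, om) - E0) / abs(E0))
    return worst

print("explicit Euler   max |dE/E0|:", run(False))  # grows without bound
print("symplectic Euler max |dE/E0|:", run(True))   # stays small and bounded
```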
Position Verlet. This simple update gives the position Verlet method:
$$ x_{k+1} = 2 x_k - x_{k-1} + h^2\,a_k. $$
It is second-order accurate and symplectic for conservative forces and is widely used for cloth and soft-constraint models.
High-order non-symplectic (RK4). Classical RK4 is fourth-order accurate per step but non-symplectic and costlier; it suits small, nonstiff subsystems (e.g., camera oscillators) when local accuracy outweighs structure preservation [20].
In practice, semi-implicit updates are often coupled with constraint projection methods such as position-based dynamics (PBD) and extended position-based dynamics (XPBD) to maintain stability under contact bursts [2,24,25].
2.4. Contact, Friction, and Non-Smooth Dynamics
This subsection summarizes the discrete-time treatment of rigid contact and Coulomb friction used in real-time VR simulators. We adopt an impulse- and velocity-level view, which handles intermittent contact, sticking, and sliding, and naturally leads to complementarity or cone-projection problems.
Normal contact. Let $g_N$ denote the end-of-step normal gap (non-penetration requires $g_N \ge 0$) and let $\lambda_N$ be the normal impulse delivered over the step. Normal contact is enforced by the complementarity condition
$$ 0 \le g_N \;\perp\; \lambda_N \ge 0, $$
meaning either a positive gap with zero impulse, or zero gap with a compressive impulse [1,21].
Tangential friction. With coefficient $\mu$, the Coulomb cone is
$$ \|\lambda_T\| \le \mu\,\lambda_N, $$
with sticking if the post-impact tangential velocity lies strictly inside the cone, and sliding on the boundary with $\lambda_T$ opposite the sliding direction (maximum dissipation).
Discrete formulations and solvers. Approximating the circular cone by a friction pyramid yields a linear complementarity problem (LCP); keeping the cone gives a second-order cone program (SOCP). Common real-time solvers include projected Gauss–Seidel and cone projection (PGS/CP), proximal or alternating direction method of multipliers (ADMM) variants, and interior-point methods.
Proximity queries. Accurate contact generation requires robust distance queries and feature pairs. For rigid models, the Gilbert–Johnson–Keerthi (GJK) algorithm with Expanding Polytope Algorithm (EPA) refinement and bounding volume hierarchies (BVHs) are standard; for deformables, signed distance fields or volumetric proxies are widely used [26,27,28].
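To make the complementarity conditions concrete, here is a minimal projected Gauss–Seidel sketch for the frictionless normal-impulse LCP $0 \le \lambda \perp A\lambda + b \ge 0$; the random system stands in for a real contact Jacobian and is purely illustrative:

```python
import numpy as np

def pgs_contacts(A, b, iters=50):
    """Projected Gauss-Seidel for 0 <= lam  _|_  A lam + b >= 0,
    where A = J M^{-1} J^T (effective contact mass) and b stacks
    the predicted post-step normal velocities."""
    lam = np.zeros_like(b)
    for _ in range(iters):
        for i in range(len(b)):
            # residual velocity at contact i from all OTHER impulses
            r = b[i] + A[i] @ lam - A[i, i] * lam[i]
            lam[i] = max(0.0, -r / A[i, i])   # clamp: contacts only push
    return lam

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 6))     # 4 contacts, 6 DOF (illustrative)
A = J @ J.T + 1e-3 * np.eye(4)      # SPD effective mass (unit inertia)
b = rng.standard_normal(4)
lam = pgs_contacts(A, b)
w = A @ lam + b
print(lam.round(4), w.round(4))     # lam >= 0, w >= 0, lam * w ~ 0
```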
2.5. Perceptual- and System-Aware Adaptivity
This subsection formalizes adaptivity for real-time VR as four ingredients: error-controlled timestepping, contact-aware substepping, gaze-driven physics LOD, and frame-budget scheduling. The goal is to meet a per-frame budget of B (e.g., ≈11.1 ms at 90 Hz) without perceptible artifacts [2,18,19,22].
- (i) Error-controlled timestepping. Let the integrator have local order p, and let $\varepsilon_k$ be an estimate of the local truncation error at step k (from an embedded pair or a residual). For a user or device tolerance $\mathrm{tol}$,
$$ h_{k+1} = h_k \,\min\!\Big(\gamma_{\max},\, \max\big(\gamma_{\min},\, (\mathrm{tol}/\varepsilon_k)^{1/(p+1)}\big)\Big), $$
with clamps $0 < \gamma_{\min} < 1 < \gamma_{\max}$ to prevent step rejection cascades. This keeps integration error below a perceptual threshold while exploiting slack when motion is slow.
- (ii) Contact-aware substepping. For a candidate contact with normal gap $g$ and closing speed $v_n^-$, choose the number of substeps m inside the frame step $h$ so that the predicted closure per substep is a fraction $\eta \in (0, 1)$ of the gap:
$$ m = \Big\lceil \frac{h\, v_n^-}{\eta\, g + \epsilon} \Big\rceil, $$
with small $\epsilon > 0$ to avoid division by zero. This reduces penetrations without globally shrinking $h$ during brief contact bursts.
- (iii) Gaze-driven physics LOD. Let $\Pi$ be the rendering projection and let the screen-space speed $\dot s_i = \|(\partial \Pi / \partial x)\,\dot x_i\|$ be evaluated at object i. To bound screen-space motion by an eccentricity-dependent tolerance $\delta(e_i)$ (from eye-tracking), require
$$ \dot s_i\, \Delta t_i \le \delta(e_i), $$
which yields a per-object update interval
$$ \Delta t_i = \delta(e_i) / \dot s_i. $$
Equivalently, define a perceptual weight $w_i$ (e.g., a contrast-sensitivity falloff with eccentricity) and set per-object iteration counts or update rates that decrease with eccentricity [19].
- (iv) Frame-budget scheduling. Let $\{T_i\}$ be tasks in the current frame (constraint batches, cloth solver iterations, background rigid updates, etc.). Each task has cost $c_i$ and perceptual utility $u_i$ (higher near the fovea, for large screen-space motion, or for audio/haptic coupling). We allocate work by
$$ \max_{x_i \in \{0,1\}} \sum_i u_i\, x_i \quad \text{s.t.} \quad \sum_i c_i\, x_i \le B, $$
a knapsack-style problem. A greedy policy using the ratio $u_i / c_i$ is effective in real time; engine backends (e.g., DOTS, Chaos, Flex) realize this by prioritizing foreground constraints and deferring low-utility tasks to worker queues [2] (see the sketch after this list).
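A minimal sketch of the greedy ratio policy just described; the task names, costs, and utilities are illustrative placeholders rather than measurements from any engine:

```python
def schedule(tasks, budget_ms):
    """Greedy knapsack by utility/cost ratio: run the most perceptually
    valuable physics work that fits the per-frame budget B."""
    chosen, spent = [], 0.0
    for name, cost, utility in sorted(tasks, key=lambda t: t[2] / t[1], reverse=True):
        if spent + cost <= budget_ms:
            chosen.append(name)
            spent += cost
    return chosen, spent

# Illustrative frame: costs in ms, utilities from foveation/saliency weights.
tasks = [("hand-contact constraints", 1.2, 10.0),
         ("foreground cloth solve",   2.5,  6.0),
         ("background rigid batch",   1.0,  1.5),
         ("debris particles",         3.0,  0.5)]
print(schedule(tasks, budget_ms=4.0))
# -> (['hand-contact constraints', 'foreground cloth solve'], 3.7)
```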
Clock Alignment
Let $\Delta_k = t_{\text{vsync}} - t_k$ be the interval from the current simulation time to the next vsync. We reduce perceived latency by predicting the presented state to the next vsync time $t_{\text{vsync}}$:
$$ \hat x(t_{\text{vsync}}) = x_k + \Delta_k\, v_k + \tfrac{1}{2}\,\Delta_k^2\, a_k, $$
and by choosing $h_k$ via the error controller in (i) so that jitter in solver time does not spill over the budget B.
2.6. Learning- and Gradient-Based Simulation
Differentiable physics layers embed simulation inside gradient-based pipelines for system identification, control, and inverse problems; recent differentiable solvers that explicitly handle contact (and, when available, frictional effects) improve gradient quality for learning-based control and inverse problems [29]. Such advances make physically grounded optimization more practical within real-time VR pipelines.
Moreover, contact is often regularized (compliance/soft constraints) to ensure stable gradients [30,31]. Physics-Informed Neural Networks incorporate PDE residuals into the loss, serving as surrogates for expensive substeps (e.g., local solves in reduced FEM) or for environment response fields in digital twins [14,32,33]. These hybrids can reduce substep counts while meeting comfort constraints when validated against baselines.
Deployment in real-time VR must additionally account for tail latency and out-of-distribution failures; practical guardrails and fallback strategies are discussed in Section 6.8.
2.7. Practical Guidance and Trade-Offs
Based on Table 2 and the preceding discussion, several practical guidelines emerge for real-time VR: structure-preserving (symplectic) schemes are generally preferable for long-horizon stability; adaptive substepping near impacts mitigates penetrations without shrinking the global step; per-frame constraint projection helps maintain feasibility under contacts; and perceptual gating reduces workload by lowering update rates for off-gaze or low-saliency bodies [18,19].
To ground the above dual-rate scheduling principles in a concrete VR medical scenario, we summarize a representative deployment in Box 1 (see Appendix A.2 for full details).
Box 1. Illustrative Deployment: Medical VR Surgical Simulator
Context: A neurosurgery training system requires haptic feedback at 1 kHz while rendering at 90 Hz on Quest 3 (Table A1).
Mathematical solution:
Physics (Section 2): Reduced-order FEM with 50 modal basis functions (Equation (1)). Modal reduction: 10K tetrahedral elements → 50 modes, achieving a 0.8 ms update at 90 Hz.
Haptics (Section 5.4): Passivity controller (Equations (66) and (67)) with adaptive damping maintains port passivity over 5-min procedures, preventing instability under variable rendering latency.
Rendering (Section 4): Per-vertex shader skinning from modal weights; budget allocation: 3 ms physics + 6 ms rendering + 2 ms margin = 11 ms at 90 Hz.
Measured outcomes (n surgeons, 20 procedures):
| Metric | Target | Achieved | Ref. |
|---|---|---|---|
| Haptic rate | 1 kHz | 950–1000 Hz | Section 5.4 |
| Visual rate (P95) | 90 Hz | 88–90 Hz | Section 6 |
| Force fidelity | <10% error | 8.1% RMSE | Equation (68) |
Key lesson: Modal reduction (Section 2.6) is essential—full FEM cannot meet dual refresh rates. Decoupled visual/haptic threads (Section 5.4) prevent mutual interference. See Appendix A.2 for full deployment details including design rationale and measured performance breakdown.
Table 3 summarizes the core formulations, practical implications, and canonical references introduced in Section 2 under real-time VR constraints.
3. Spatial Representation and Geometric Transformations in VR
Classical spatial modeling—coordinate frame definitions, transform chains, and rotation operators—remains the backbone of VR systems. What has changed since 2020 is how these primitives are organized, differentiated, and learned in modern pipelines. This section reviews the minimal classical tools needed for VR, then focuses on recent trends that make these tools robust and learnable at scale.
3.1. Coordinate Frames and Transform Chains
This subsection recalls the minimal coordinate-frame conventions and transform chains used throughout VR pipelines. We focus on composing and inverting rigid transforms for head, hand, and world frames, and on bookkeeping that avoids drift in long chains.
Scene graphs in VR arrange objects by hierarchical frames; correctness depends on composing translation, rotation, and scale with numerically stable conventions. Practical recipes for transform order, handedness, and matrix conditioning are well-documented in graphics texts and VR geometry primers [5,34]. In practice, systems favor right-handed world frames with explicit documentation of pre-/post-multiplication and column-/row-major storage to avoid ambiguity during engine integration [34].
3.2. Rotations: From Euler Angles to Quaternions and Lie Groups
We review rotation representations with an eye to interactive stability: Euler angles are compact but suffer from singularities, while quaternions and Lie-group updates provide smooth composition and interpolation for head, hand, and camera control. We outline when each is preferable in real-time systems and prepare the notation used later.
Euler angles are compact but suffer from gimbal singularities; quaternions support smooth interpolation (SLERP) and numerically robust composition for head, hand, and camera control [6,7]. Surveys on 3D orientation (attitude) representations compare rotation matrices, quaternion/Euler-parameter forms, and exponential maps, and frame rotations as elements of the Lie group SO(3) with minimal updates in its tangent space [35].
This Lie-group view is advantageous for VR tracking and control because it enforces orthogonality and unit norm by construction, avoids parameter singularities and redundancy, yields consistent Jacobians for EKF/SLAM/ICP, and reduces long-chain drift via manifold retraction [5].
Consequently, modern systems favor quaternion updates or Lie-algebra increments ($\delta\boldsymbol{\theta} \in \mathfrak{so}(3)$) within SO(3) and SE(3) to maintain stability at interactive rates [5,6].
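A minimal sketch of this increment pattern (NumPy; the w-first quaternion convention and the constant gyro-like increment are our own illustrative choices): a small tangent-space rotation vector is mapped to a unit quaternion and composed, with renormalization as the retraction.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product, w-first convention."""
    aw, ax, ay, az = a; bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def exp_so3(dtheta):
    """Map a small rotation vector (tangent increment) to a unit quaternion."""
    angle = np.linalg.norm(dtheta)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = dtheta / angle
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

q = np.array([1.0, 0.0, 0.0, 0.0])          # identity orientation
for _ in range(1000):                        # integrate gyro-like increments
    q = quat_mul(q, exp_so3(np.array([0.0, 0.0, 1e-3])))
    q /= np.linalg.norm(q)                   # retraction: stay on the manifold
print(q)  # ~[cos(0.5), 0, 0, sin(0.5)]: a 1 rad yaw, unit norm by construction
```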
3.3. Differentiable Transforms and Inverse Graphics
Modern VR stacks expose camera and object transforms to gradient-based optimization. This subsection formalizes differentiable parameterizations for extrinsics/intrinsics and shows how they plug into photometric objectives without breaking geometric constraints.
A major shift is differentiable transforms that permit gradients to flow through rendering and geometry. Early differentiable rendering frameworks established approximate yet useful Jacobians for camera and lighting parameters [13]. Neural implicit scene models (e.g., NeRF and its fast variants) embed pose and calibration variables into optimization, enabling view synthesis and on-the-fly relighting for VR telepresence and scene capture [10,12]. These pipelines rely on stable pose/warp parameterizations (quaternions, axis–angle) so that gradient-based solvers remain well-conditioned during joint optimization of geometry and appearance [10,13].
3.4. Learning on Manifolds and Non-Euclidean Domains
Many geometry spaces in VR are non-Euclidean (poses on SO(3)/SE(3), shapes on low-dimensional manifolds). We summarize practical manifold-aware tools used for tracking, retargeting, and avatar personalization, emphasizing what matters at interactive rates.
Geometric deep learning extends classical signal processing to meshes, graphs, and point clouds—data domains common in VR avatars, environments, and props. Convolutional operators on manifolds support curvature-aware feature extraction for registration and morphable modeling [36]; recent surveys generalize these ideas to groups, graphs, and gauges with principled handling of symmetry and coordinate choices [37]. In human-centric VR, kinematic consistency is enforced by putting learning inside a model-based loop (e.g., SMPL/SMPL-X fitting), improving 3D pose/shape reconstruction from monocular or sparse sensors [38,39].
3.5. Pose Graphs, SLAM, and Online Estimation for VR
We introduce pose-graph optimization for keeping content world-locked: nodes are 6-DoF poses and edges are relative measurements with covariances. We then give the standard SE(3) objective and explain how these estimators stabilize mixed-reality overlays in real time.
Interactive VR scenes benefit from real-time pose-graph optimization and differentiable tracking to keep virtual content world-locked (anchored in a fixed world frame so objects do not move with the user’s head), rather than head-locked (attached to the display frame). A pose graph is a sparse graph whose nodes are 6-DoF poses and whose edges encode relative-pose measurements with covariances.
A standard objective uses Lie-group residuals via the logarithm map:
$$ \min_{\{T_i\}} \sum_{(i,j)\in\mathcal{E}} \rho\!\Big( \big\| \log\big( \hat T_{ij}^{-1}\, T_i^{-1}\, T_j \big)^{\vee} \big\|^2_{\Sigma_{ij}^{-1}} \Big). \qquad (15) $$
Equation (15) minimizes weighted relative-pose residuals over edges $(i,j)$ by mapping the group discrepancy $\hat T_{ij}^{-1} T_i^{-1} T_j$ to a 6D tangent-space error via $\log(\cdot)^{\vee}$, scaling it by the uncertainty $\Sigma_{ij}^{-1}$, and robustifying with $\rho$. In VR, this is used to suppress drift and keep anchors world-locked by enforcing global consistency across many local motion constraints (e.g., loop closures and multi-sensor factors).
Recent deep SLAM systems couple learned features with geometric bundle adjustment and robust updates, yielding accurate trajectories under fast motion and low texture [40]. These estimators integrate with differentiable rendering and neural scene fields (Section 3 and Section 4), enabling joint refinement of poses, intrinsics/extrinsics, scene fields, and rendering parameters at runtime.
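A minimal sketch of the Equation (15) residual for a single edge (NumPy; restricted to the SO(3) rotation block for brevity, so the log map yields a 3-vector rather than the full 6D twist):

```python
import numpy as np

def log_so3(R):
    """Rotation matrix -> rotation vector (the log(.)^vee map on SO(3))."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-9:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta / (2.0 * np.sin(theta)) * w

def edge_residual(Ri, Rj, Rij_meas, info):
    """Weighted tangent-space error for one pose-graph edge (rotation part)."""
    r = log_so3(Rij_meas.T @ Ri.T @ Rj)   # group discrepancy -> tangent vector
    return r @ info @ r                    # squared Mahalanobis residual

def rotz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

Ri, Rj = rotz(0.1), rotz(0.35)
print(edge_residual(Ri, Rj, rotz(0.25), np.eye(3)))  # consistent edge -> ~0
print(edge_residual(Ri, Rj, rotz(0.10), np.eye(3)))  # 0.15 rad error -> ~0.0225
```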
Table 4 summarizes the main spatial representations and differentiable image-formation tools from Section 3, emphasizing their practical impact on VR tracking, calibration, and avatar control.
4. Real-Time Rendering and Mathematical Models in VR
Rendering is the terminal stage of the VR pipeline that translates a mathematically defined scene into the visual stimuli perceived by the user. Unlike offline graphics, VR imposes strict constraints on latency, frame rate, and stability, since delays or visual artifacts can degrade presence or induce discomfort. This section organizes the mathematics behind real-time rendering and highlights how recent developments extend classical models [4,5].
4.1. Classical Illumination Models and Shading
This subsection summarizes core shading models and shows how the rendering integral is evaluated efficiently on the GPU.
4.1.1. Phong Shading
The Phong model decomposes shading into ambient, diffuse, and specular terms [3]:
$$ I = k_a\, i_a + k_d\, (L \cdot N)\, i_d + k_s\, (R \cdot V)^{\alpha}\, i_s, $$
where $k_a, k_d, k_s$ are material coefficients, L is the light direction, N the surface normal, R the reflection of L about N, V the view direction, and $\alpha$ the shininess exponent.
4.1.2. Rendering Equation (Physically Based Rendering)
Physically based rendering (PBR) adopts energy-conserving microfacet BRDFs [41]:
$$ L_o(p, \omega_o) = \int_{\Omega} f_r(p, \omega_i, \omega_o)\, L_i(p, \omega_i)\, (n \cdot \omega_i)\, d\omega_i. $$
Here $L_o(p, \omega_o)$ is the outgoing radiance toward $\omega_o$ at point p with normal n, $L_i(p, \omega_i)$ is incident radiance from $\omega_i$ over the hemisphere $\Omega$, $f_r$ is the BRDF, and $(n \cdot \omega_i)$ is the cosine term.
4.1.3. Monte Carlo Evaluation of the Hemisphere Integral
On the GPU, the integral is approximated by Monte Carlo with importance sampling [42]:
$$ L_o \approx \frac{1}{N} \sum_{k=1}^{N} \frac{f_r(p, \omega_k, \omega_o)\, L_i(p, \omega_k)\, (n \cdot \omega_k)}{p(\omega_k)}, $$
where samples $\omega_k$ are drawn from a proposal density $p$ matched to the integrand. Practical choices: cosine-weighted sampling for diffuse lobes and sampling the microfacet normal distribution (e.g., GGX/Trowbridge–Reitz) for glossy lobes [43]. For scenes with both emitters and glossy BRDFs, multiple importance sampling blends light- and BRDF-based proposals to reduce variance [42].
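A minimal sketch of the estimator for a Lambertian surface under constant incident radiance (albedo and radiance values are illustrative). With cosine-weighted sampling the cosine and pdf terms cancel, so the estimator hits the closed-form answer, albedo × L_i, illustrating a proposal matched to the integrand:

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine_sample_hemisphere(n_samples):
    """Sample directions with pdf p(w) = cos(theta)/pi (local frame, +z = normal)."""
    u1, u2 = rng.random(n_samples), rng.random(n_samples)
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    return np.stack([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)], axis=1)

albedo, L_i = 0.7, 2.0                  # Lambertian BRDF f_r = albedo/pi; uniform sky
w = cosine_sample_hemisphere(100_000)
cos_t = w[:, 2]
f_r = albedo / np.pi
pdf = cos_t / np.pi
L_o = np.mean(f_r * L_i * cos_t / pdf)  # cos/pdf cancel up to the pi factors
print(L_o, "vs analytic", albedo * L_i)
```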
4.1.4. Real-Time Approximations in Engines
Typical pipelines combine (i) next-event estimation for direct lighting, (ii) a few BRDF-importance samples for specular response, and (iii) denoising or temporal accumulation for residual noise. Preintegrated approximations (e.g., split-sum environment BRDF via small LUTs for geometry and Fresnel terms) amortize cost to near constant time in PBR engines [4,5].
4.2. Camera Models and Projection Matrices
This subsection formalizes the pinhole and off-axis (asymmetric) stereo cameras used in VR, derives per-eye projection from headset geometry, and gives the math for lens predistortion and timewarp.
4.2.1. Symmetric Perspective
With intrinsics $(f_x, f_y)$, principal point $(c_x, c_y)$, and a symmetric frustum, the projection matrix is
$$ P = \begin{pmatrix} \frac{2n}{w} & 0 & 0 & 0 \\ 0 & \frac{2n}{h} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}, $$
where $(n, f)$ are near/far distances and $(w, h)$ are the near-plane width/height.
4.2.2. Off-Axis (Asymmetric) per-Eye Frustum
With near-plane bounds $(l, r, b, t)$ in camera space,
$$ P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}. $$
Place the screen/lens image plane at distance d with physical rectangle $[x_0, x_1] \times [y_0, y_1]$ (meters). Then the bounds scale as
$$ l = \tfrac{n}{d}\,x_0, \quad r = \tfrac{n}{d}\,x_1, \quad b = \tfrac{n}{d}\,y_0, \quad t = \tfrac{n}{d}\,y_1. $$
If the per-eye horizontal optical center shift is $s$ (IPD and lens center), with panel half-width $W/2$, then
$$ l = \tfrac{n}{d}\big({-\tfrac{W}{2}} \pm s\big), \qquad r = \tfrac{n}{d}\big(\tfrac{W}{2} \pm s\big), $$
where the sign + is for the right eye and − for the left eye.
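A minimal sketch constructing the off-axis matrix (NumPy, OpenGL clip conventions). The tangent-based half-angle bounds mirror how OpenXR-style runtimes report per-eye FOV; the specific numbers are illustrative, not measured from a device:

```python
import numpy as np

def frustum(l, r, b, t, n, f):
    """Off-axis (asymmetric) perspective projection, OpenGL clip conventions."""
    return np.array([
        [2*n/(r-l), 0,          (r+l)/(r-l),  0           ],
        [0,         2*n/(t-b),  (t+b)/(t-b),  0           ],
        [0,         0,         -(f+n)/(f-n), -2*f*n/(f-n) ],
        [0,         0,         -1,            0           ]])

# Per-eye half-angle tangents (left/right/down/up), as a runtime might report.
tan_l, tan_r, tan_d, tan_u = -1.1, 0.9, -1.0, 1.0   # asymmetric: lens offset
n, f = 0.1, 100.0
P = frustum(n*tan_l, n*tan_r, n*tan_d, n*tan_u, n, f)

X = np.array([0.0, 0.0, -1.0, 1.0])   # a point 1 m ahead on the view axis
clip = P @ X
print(clip[:3] / clip[3])  # x != 0: the asymmetric frustum shifts the center off-axis
```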
4.2.3. Lens Distortion (Predistortion for VR)
Let $(x, y)$ be normalized image-plane coordinates and $r^2 = x^2 + y^2$. The Brown–Conrady model writes [44,45]
$$ x_d = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2), $$
$$ y_d = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y. $$
VR runtimes render to a predistorted image using this mapping (or its inverse via a LUT) so that the post-lens appearance is undistorted [5].
4.2.4. Timewarp (Orientation-Only and Depth-Aware)
Let K be the intrinsics, $R_{\text{render}}$ the head rotation used for rendering, and $R_{\text{latest}}$ the most recent rotation. Orientation-only timewarp applies the homography
$$ H = K\, R_{\text{latest}}\, R_{\text{render}}^{\top}\, K^{-1} $$
to each homogeneous pixel $u$; $u' \sim H u$ samples the rendered color buffer. With per-pixel depth $z(u)$, reconstruct $X = z(u)\, K^{-1} u$, apply the relative pose $(\Delta R, \Delta t)$, and reproject:
$$ u' \sim K\, (\Delta R\, X + \Delta t). $$
Both warps reduce apparent motion-to-photon latency; use the same per-eye projection in the original render and the warp.
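A minimal sketch of the orientation-only warp (NumPy; the pinhole K and the small yaw are illustrative). It builds $H = K R_{\text{latest}} R_{\text{render}}^{\top} K^{-1}$, with rotations taken as world-to-camera, and maps a rendered pixel to its updated position:

```python
import numpy as np

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # illustrative pinhole

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

R_render = rot_y(0.00)   # world-to-camera rotation used when rendering
R_latest = rot_y(0.01)   # most recent head-tracker rotation (~0.57 deg yaw)

# Homography taking a pixel in the rendered frame to the re-projected frame.
H = K @ R_latest @ R_render.T @ np.linalg.inv(K)

u = np.array([320.0, 240.0, 1.0])   # principal point in the rendered image
u_warp = H @ u
u_warp /= u_warp[2]
print(u_warp[:2])  # shifted by ~ f * tan(0.01) ~ 5 px to compensate head motion
```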
4.3. GPU Pipelines and Shader Mathematics
This subsection isolates shader-side math that most directly affects VR image quality: perspective-correct interpolation, depth mapping/precision, normal-space transforms, and texture filtering with MIP selection.
4.3.1. Perspective-Correct Interpolation
Let a triangle have screen-space barycentrics $(\beta_0, \beta_1, \beta_2)$ with $\sum_i \beta_i = 1$ and clip-space depths $w_i$ at vertices $i = 0, 1, 2$. For an attribute a,
$$ a = \frac{\sum_i \beta_i\, a_i / w_i}{\sum_i \beta_i / w_i}, $$
which avoids the distortion of affine (screen-linear) interpolation [4,46].
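A minimal sketch contrasting affine and perspective-correct interpolation for one attribute at a triangle's screen-space barycentric midpoint; the vertex attribute and depth values are illustrative:

```python
import numpy as np

def interp_affine(bary, attr):
    return bary @ attr

def interp_perspective(bary, attr, w):
    """Perspective-correct: interpolate a/w and 1/w, then divide."""
    return (bary @ (attr / w)) / (bary @ (1.0 / w))

bary = np.array([1/3, 1/3, 1/3])          # screen-space barycentric midpoint
attr = np.array([0.0, 0.5, 1.0])          # e.g., a texture coordinate
w    = np.array([1.0, 5.0, 10.0])         # clip-space w (depth) per vertex

print(interp_affine(bary, attr))          # 0.5   (screen-linear, distorted)
print(interp_perspective(bary, attr, w))  # ~0.15: nearer vertices weigh more
```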
4.3.2. Depth Mapping and Precision
With eye-space depth $z_e$ and near/far $(n, f)$, the projection in Section 4.2 gives the normalized device depth
$$ z_{\text{ndc}} = \frac{f+n}{f-n} + \frac{2 f n}{(f-n)\, z_e} $$
(OpenGL sign conventions, $z_e < 0$). Since $z_{\text{ndc}}$ varies as $1/z_e$, precision concentrates near n; pushing n outward reduces z-fighting in VR [4].
4.3.3. Normals: Inverse-Transpose and TBN
For a linear object transform with upper-left 3×3 block A, normals transform by the inverse-transpose:
$$ n' = (A^{-1})^{\top} n. $$
With normal maps, decode the tangent-space normal $n_t$ and map via the orthonormal basis $TBN = [\,T \;\; B \;\; N\,]$:
$$ n = TBN\, n_t. $$
This eliminates shading skew under non-uniform scales and deformations [4].
4.3.4. Texture Filtering and MIP Selection
Given screen-space differentials $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, $\partial v/\partial y$ (in texel units), GPUs choose the MIP level
$$ \lambda = \log_2 \max\!\big( \|(\partial u/\partial x,\, \partial v/\partial x)\|,\; \|(\partial u/\partial y,\, \partial v/\partial y)\| \big), $$
then apply bilinear/trilinear sampling; anisotropic/Elliptical Weighted Average (EWA) filters approximate the projected footprint's ellipse for grazing angles [46,47].
4.3.5. Linear Color and Fast Fresnel
Lighting is computed in linear space; encode to display with the standard RGB (sRGB) transfer function only at output. A common GPU Fresnel approximation is Schlick's
$$ F(\theta) = F_0 + (1 - F_0)\,(1 - \cos\theta)^5, $$
which closely tracks conductor/dielectric behavior at a fraction of the cost [48,49].
4.4. Neural and Differentiable Rendering: Models and Math
Neural/differentiable rendering provides learnable scene representations and differentiable image formation suitable for VR.
4.4.1. Neural Radiance and Density Fields
A scene is represented by a network
$$ F_\Theta : (x, d) \mapsto (\sigma, c), $$
where $x \in \mathbb{R}^3$ (meters), $d \in \mathbb{S}^2$ (unit view direction), $\sigma \ge 0$ has units $m^{-1}$ (attenuation per unit length), and $c$ is RGB; $\Theta$ are MLP weights [10]. Range conventions: $\sigma$ typically uses softplus to enforce nonnegativity; c uses sigmoid to stay in gamut.
4.4.2. Frequency Encodings (Positional/SIREN)
To mitigate spectral bias and resolve detail, inputs are lifted by positional encodings
$$ \gamma(x) = \big( \sin(2^0 \pi x), \cos(2^0 \pi x), \ldots, \sin(2^{L-1} \pi x), \cos(2^{L-1} \pi x) \big) $$
applied per coordinate; the view direction d uses fewer bands due to smoother view dependence [10,50]. An alternative uses periodic activations (SIREN), $\phi(z) = \sin(\omega_0 z)$, which supplies a Fourier-like basis intrinsically [51]. Bandwidth note: larger L or $\omega_0$ increases representable frequencies but raises sampling demands.
4.4.3. Differentiable Volume Rendering
Along a camera ray $r(t) = o + t\,d$,
$$ C(r) = \int_{t_n}^{t_f} T(t)\, \sigma(r(t))\, c(r(t), d)\, dt, \qquad T(t) = \exp\!\Big( -\!\int_{t_n}^{t} \sigma(r(s))\, ds \Big), $$
where $T(t)$ is transmittance (survival probability to depth t). With stratified samples $t_i$ and spacings $\delta_i = t_{i+1} - t_i$,
$$ \hat C = \sum_i T_i\, \alpha_i\, c_i, \qquad T_i = \prod_{j<i} (1 - \alpha_j), \qquad \alpha_i = 1 - e^{-\sigma_i \delta_i}. \qquad (36) $$
In Equation (36), $T_i$ is transmittance and $\alpha_i$ is opacity, so $w_i = T_i\,\alpha_i$ yields front-to-back compositing along the ray [10]. Intuitively, higher density $\sigma_i$ increases opacity and attenuates contributions from later samples, which is the key mechanism enabling differentiable view synthesis in real-time VR pipelines.
Gradients: $\partial \alpha_i / \partial \sigma_i = \delta_i\, e^{-\sigma_i \delta_i}$; $\partial \hat C / \partial c_i = T_i\, \alpha_i$; autodiff handles the dependence of later terms ($T_i$) on earlier $\sigma_j$.
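A minimal NumPy sketch of Equation (36) along one ray; the densities and colors are illustrative constants standing in for MLP outputs:

```python
import numpy as np

def composite(sigma, color, delta):
    """Front-to-back compositing: w_i = T_i * alpha_i (Equation (36))."""
    alpha = 1.0 - np.exp(-sigma * delta)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # transmittance
    w = T * alpha
    return w @ color, w

t = np.linspace(2.0, 6.0, 64)                      # samples along one ray
delta = np.diff(t, append=t[-1] + (t[1] - t[0]))   # sample spacings (meters)
sigma = np.where((t > 3.5) & (t < 4.5), 8.0, 0.0)  # dense slab in [3.5, 4.5] m
color = np.tile([0.2, 0.6, 0.9], (64, 1))          # constant RGB for clarity

C, w = composite(sigma, color, delta)
print(C, w.sum())   # ~slab color; accumulated opacity ~1 (the slab is opaque)
```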
4.4.4. Differentiable Rasterization
(i) Forward model.
Vertices map to camera space $X_c = R X + t$ and to pixels via the pinhole projection
$$ u = \pi(X_c) = \Big( f_x \tfrac{X_c}{Z_c} + c_x,\;\; f_y \tfrac{Y_c}{Z_c} + c_y \Big). \qquad (38) $$
Shading produces color $\hat I(u)$ (BRDF/texture/lighting). The per-image loss is
$$ \mathcal{L} = \sum_{u} \rho\big( \hat I(u) - I(u) \big), \qquad (39) $$
with robust penalty $\rho$ [13]. Purpose: Equations (38) and (39) define the forward pass and objective.
(ii) Projection Jacobians (pose). With $X_c = (X, Y, Z)$, the image Jacobian is
$$ \frac{\partial u}{\partial X_c} = \begin{pmatrix} f_x/Z & 0 & -f_x X/Z^2 \\ 0 & f_y/Z & -f_y Y/Z^2 \end{pmatrix}. \qquad (40) $$
For an SE(3) twist $\xi = (\omega, \nu)$,
$$ \frac{\partial X_c}{\partial \xi} = \big[\, -[X_c]_\times \;\; I_3 \,\big]. $$
Manifold-consistent updates keep $T \in SE(3)$:
$$ T \leftarrow \exp(\hat\xi)\, T. \qquad (42) $$
Purpose: Equations (40) and (42) give pose sensitivities and a stable pose update [13].
(iii) Shape and intrinsics derivatives. Linear shape bases $X(\beta) = \bar X + B\beta$ give
$$ \frac{\partial u}{\partial \beta} = \frac{\partial u}{\partial X_c}\, R\, B. \qquad (43) $$
Intrinsics gradients are
$$ \frac{\partial u}{\partial (f_x, f_y, c_x, c_y)} = \begin{pmatrix} X/Z & 0 & 1 & 0 \\ 0 & Y/Z & 0 & 1 \end{pmatrix}. \qquad (44) $$
Purpose: Equations (43) and (44) provide shape/camera sensitivities.
(iv) Smooth visibility/occlusion. Soft coverage (edge SDF $s(u)$) and a soft z-buffer give nonzero gradients near silhouettes:
$$ A(u) = \mathrm{sigmoid}\big( s(u)/\tau \big), \qquad (45) $$
$$ w_k(u) = \frac{A_k(u)\, \exp(-z_k(u)/\gamma)}{\sum_j A_j(u)\, \exp(-z_j(u)/\gamma)}. \qquad (46) $$
Blended color:
$$ \hat I(u) = \sum_k w_k(u)\, c_k(u). \qquad (47) $$
Purpose: Equations (45)–(47) relax discrete visibility for differentiability (cf. OpenDR [13]).
(v) Shading derivatives (chain rule). For texture sampling with UVs $(s(u), t(u))$,
$$ \frac{\partial \hat I}{\partial \theta} = \frac{\partial\, \mathrm{tex}}{\partial s} \frac{\partial s}{\partial \theta} + \frac{\partial\, \mathrm{tex}}{\partial t} \frac{\partial t}{\partial \theta}, $$
and analogously for lighting and material parameters. Note: include the TBN transform for normal maps and any material channels in the chain.
(vi) Loss-gradient assembly. With residual $r(u) = \hat I(u) - I(u)$ and weights $\psi(u) = \rho'(r(u))$,
$$ \frac{\partial \mathcal{L}}{\partial \theta} = \sum_u \psi(u)^{\top} \frac{\partial \hat I(u)}{\partial \theta} $$
for $\theta \in \{\xi, \beta, K, \text{materials}\}$. Note: $\partial \hat I / \partial \theta$ accumulates coverage and depth terms from Equations (45) and (46).
(vii) Stability practices (concise checklist).
- Coarse-to-fine schedules: solve pose/intrinsics first, then shape/appearance.
- Clamp softness: choose $\tau, \gamma$ to avoid vanishing/exploding gradients at edges.
- Use robust penalties and multi-view priors for regularization.
- For avatars, pair with learned face/body models for stronger priors [38,39,52].
4.4.5. Training Objective and Regularization
Given calibrated multi-view targets $\{I_k\}$,
$$ \min_{\Theta} \sum_k \sum_{r \in \mathcal{R}_k} \big\| \hat C(r; \Theta) - I_k(r) \big\|^2 + \lambda\, \mathcal{R}(\Theta), $$
where common regularizers include sparsity on $\sigma$ and distortion/smoothness on accumulated density to reduce floaters [10]. Differentiability: the loss backpropagates through sampling, transmittance, and network layers by the chain rule.
4.4.6. Acceleration for Interactive VR
To approach dual-eye 90/120 Hz, systems (i) factorize fields into many tiny MLPs for parallel evaluation [11], (ii) amortize multi-bounce lighting via neural radiance caching inside real-time path tracing [12], and (iii) reduce memory pressure in large scenes through streamable, memory-efficient radiance field representations that support partitioning and on-demand streaming [53]. Budgeting tip: couple per-eye foveation with coarser sampling/offline caches in the periphery to respect VR frame budgets. In practice, learning-based accelerations should be paired with runtime monitoring and classical fallbacks to remain robust under latency spikes and out-of-distribution content (Section 6.8).
4.5. Hybrid and Perceptual Rendering for VR
VR must sustain high frame rates with limited budgets. Perceptual strategies modulate computation where the human visual system is most sensitive. Foveated rendering concentrates resolution and shading near the gaze point, guided by eye tracking; recent surveys organize algorithms and hardware support [16,17,19]. Temporal reprojection/upsampling and dynamic level-of-detail further trade accuracy for stability.
Hybrid renderers mix rasterization (for primary visibility and nearby geometry) with selective ray/path tracing (for glossy reflections, soft shadows, GI), allocating work to regions that most impact perceived quality. Perceptual error metrics and attention-aware policies help decide when to refine shading or geometry [18,22]. Together, these trends show a shift from purely deterministic shading to hybrid, learned, and perceptually guided rendering, tailored to the stringent constraints of immersive VR.
Section 4 reviewed how classical shading, camera and projection models, GPU math, and newer neural or differentiable methods are combined into practical VR rendering pipelines. Table 5 summarizes the core ideas and their impact on VR systems.
Practical Takeaways
- Do all shading computations in linear space, and apply the sRGB transfer function only once at the final color write.
- Use the same per-eye off-axis projection in both the main render and the timewarp/spacewarp paths to avoid extra distortion.
- Push the near plane outward as far as content allows, and transform normals with the inverse-transpose of the model matrix to reduce depth and shading artifacts.
- In NeRF-style or neural field training, make physical units explicit (for example, treat density as "per meter") and monitor the accumulated opacity weights that actually scale image residuals and gradients.
- For 90/120 Hz stereo, allocate resolution and shading budget to the fovea, and rely on cheaper sampling, models, or caches in the periphery.
5. Interaction Modeling and User Interface Dynamics
User interaction in VR is a closed-loop process spanning sensing, state estimation, kinematic/physics modeling, rendering, and human motor response. This section consolidates foundational models for pose and manipulation, then organizes advances from the past five years (2021–2025) that fuse learning-based perception, differentiable optimization, and perceptual constraints into the interaction loop.
5.1. Foundations: Sensing, Pose, and Kinematic Mapping
This subsection formalizes the sensing→pose pipeline and the kinematic mapping used to drive hands, tools, and avatars in VR under real-time constraints. Contemporary VR interaction follows the classic 3D UI taxonomy (selection, manipulation, travel, system control) [15]. Rigid-body poses evolve on SE(3); orientations are commonly represented by unit quaternions for numerically stable interpolation and composition [7,34,35]. End-effector goals (e.g., a hand or tool tip) are mapped to joint angles via FK/IK with joint limits and comfort costs [8].
8].
Foundational work on modeling and controlling virtual humans provides enduring principles for avatar kinematics, posture control, and animation pipelines, complementing the FK/IK formulations used in VR embodiment [
54].
Social VR further couples embodiment with multisensory feedback. For example, even simple collision haptics during human–virtual human interaction can measurably influence presence and user experience, motivating careful feedback design alongside stability constraints [
55]. In practice, low-latency controllers rely on Jacobian-based updates.
- (i) Problem setup and notation. Given a task-space target $x^* \in \mathbb{R}^m$ and forward kinematics $f(\theta)$ with joint angles $\theta \in \mathbb{R}^n$, define the task error
$$ e(\theta) = x^* - f(\theta). $$
Here $x$ denotes position/pose, and m is the number of task degrees of freedom.
- (ii) Geometric Jacobian. With the geometric Jacobian $J(\theta) = \partial f / \partial \theta$, small variations satisfy the linear approximation
$$ \delta x \approx J(\theta)\, \delta\theta. $$
In real time, J is updated numerically or from analytic expressions.
- (iii) DLS (damped least-squares) update. To remain stable near singularities, use the damped update
$$ \Delta\theta = J^{\top} \big( J J^{\top} + \lambda^2 I \big)^{-1} e. $$
Here the damping term $\lambda^2 I$ regularizes the inverse near singular configurations by keeping $J J^{\top} + \lambda^2 I$ well-conditioned, so $\Delta\theta$ remains bounded even when J is ill-conditioned. In VR hand/avatar control, this trades accuracy for stability under tight frame budgets, and $\lambda$ is typically tuned to avoid jitter while maintaining responsive convergence (see the sketch after this list).
- (iv) Task weighting and conditioning. If task components have different priorities, introduce a positive-definite weight matrix W:
$$ \Delta\theta = \big( J^{\top} W J + \lambda^2 I \big)^{-1} J^{\top} W\, e, $$
which improves conditioning and allows per-axis tolerances.
- (v) Secondary objectives via null space. To address posture comfort or joint-limit avoidance without disturbing the primary task,
$$ \Delta\theta = J^{\dagger} e + \big( I - J^{\dagger} J \big)\, z, \qquad z = -\alpha\, \nabla h(\theta), $$
where $h$ is a secondary cost and $J^{\dagger}$ the pseudoinverse.
- (vi) Lightweight alternatives. When matrix solves are too expensive, use the Jacobian-transpose heuristic
$$ \Delta\theta = \alpha\, J^{\top} e, $$
or cyclic coordinate descent (CCD). These are approximations but very fast.
- (vii) Closing the loop (tracking and full-body drive). Visual–inertial (inside-out) tracking estimates head/hand 6DoF poses; SLAM back-ends stabilize world anchors over long interactions [40]. For detailed hands/faces, learned regressors provide strong priors for retargeting and animation [39]. Empirically, controller input can outperform vision-only hand tracking on some tasks, motivating context-aware switching policies [56].
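A minimal sketch of the DLS update from item (iii) on a planar 3-link arm (link lengths, damping, and the target are our own illustrative choices):

```python
import numpy as np

L = np.array([0.30, 0.25, 0.20])   # link lengths (m), illustrative arm

def fk(theta):
    """Planar 3-link forward kinematics: joint angles -> end-effector (x, y)."""
    a = np.cumsum(theta)
    return np.array([np.sum(L * np.cos(a)), np.sum(L * np.sin(a))])

def jacobian(theta):
    a = np.cumsum(theta)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(L[i:] * np.sin(a[i:]))
        J[1, i] =  np.sum(L[i:] * np.cos(a[i:]))
    return J

def dls_step(theta, target, lam=0.05):
    e = target - fk(theta)
    J = jacobian(theta)
    # Damped least-squares: bounded even near singular (outstretched) poses.
    return theta + J.T @ np.linalg.solve(J @ J.T + lam**2 * np.eye(2), e)

theta = np.array([0.3, 0.3, 0.3])
target = np.array([0.45, 0.35])
for _ in range(50):
    theta = dls_step(theta, target)
print(fk(theta), "->", target)     # converges to the reachable target
```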
5.2. Hand, Body, and Object Interaction
Direct manipulation (grasp, pinch, poke) hinges on accurate articulated tracking with temporal smoothing and constraint projection to suppress jitter near contact [57]. Whole-body interaction blends controller priors, optical cues, and IK to maintain reachability and avoid self-collision in tight spaces [58]. Robust contact and proximity tests rely on efficient collision detection, broad-phase culling, and convex distance queries (e.g., GJK), which also inform haptic proxies and constraint solvers [26,28].
A growing trend is controller-less tracking from headset-mounted egocentric cameras, where self-occlusion and motion blur become dominant failure modes. Large-scale egocentric datasets enable such body/hand tracking under realistic occlusions, strengthening avatar embodiment pipelines [59].
5.3. Constraint-Based Manipulation and Differentiable Calibration
This subsection formalizes contact handling for VR manipulation at real-time rates using three complementary views (impulse/velocity, convex QP, and position-level projection), and then shows how differentiable sensing/graphics enable online calibration.
5.3.1. Impulse-/Velocity-Level Contact Resolution
Let $(q, v)$ be configuration and velocity, M the generalized mass, J the contact Jacobian, and $\tau$ the external forces. Over a step $h$,
$$ M\,(v^{+} - v^{-}) = h\,\tau + J^{\top}\lambda, $$
where $\lambda$ are contact impulses. With normal gap $\phi(q)$ (signed distance), nonpenetration is enforced by complementarity
$$ 0 \le \phi \;\perp\; \lambda_N \ge 0, $$
and tangential (Coulomb) friction by a cone $\|\lambda_T\| \le \mu\,\lambda_N$ or a friction pyramid. A velocity-level variant uses
$$ 0 \le J_N v^{+} + b \;\perp\; \lambda_N \ge 0, $$
with $b$ collecting Baumgarte/restitution terms. These yield a mixed complementarity problem (MCP) in $(v^{+}, \lambda)$ (or in $\lambda$ alone) [2,21].
5.3.2. Convex QP Time Stepping
A common real-time scheme solves a convex projection in velocity space:
$$ \min_{v^{+}} \; \tfrac{1}{2}\,(v^{+} - v^{\text{free}})^{\top} M\, (v^{+} - v^{\text{free}}) \quad \text{s.t.} \quad J_N v^{+} + b \ge 0, \qquad (60) $$
where $v^{\text{free}} = v^{-} + h\, M^{-1} \tau$. Stationarity gives the KKT condition $M (v^{+} - v^{\text{free}}) = J_N^{\top} \lambda_N$; together with the inequalities and the complementarity $\lambda_N^{\top} (J_N v^{+} + b) = 0$, (60) is equivalent to an LCP when a friction pyramid is used, or to an SOCP with the circular cone $\|\lambda_T\| \le \mu\,\lambda_N$. This framing unifies projected Gauss–Seidel/cone-projection and interior-point solvers under one objective [2].
5.3.3. Geometric (Position-Level) Projection
Position-level methods correct constraint violations directly:
$$ q \leftarrow q + \Delta q \quad \text{such that} \quad C(q + \Delta q) \approx 0, $$
where $C(q)$ encodes joints/contacts. Linearizing,
$$ \Delta q = -M^{-1} J^{\top} \big( J M^{-1} J^{\top} \big)^{-1} C(q), $$
with $J = \partial C / \partial q$. In practice, constraints are processed iteratively (Gauss–Seidel) within the frame budget; small compliance or damping in C regularizes contact jitter [2].
5.3.4. Differentiable Calibration and Identification
Online calibration estimates unknown frames/offsets (e.g., controller/tool pose offsets or contact frames) by minimizing visual reprojection or photometric residuals through differentiable graphics/kinematics. With calibrated intrinsics K, 3D features $X_i$, image points $u_i$, and pose $T(\xi)$,
$$ \min_{\xi} \sum_i \rho\big( \| \pi(K,\, T(\xi)\, X_i) - u_i \|^2 \big), $$
where $\xi$ parameterizes the pose increment and $\pi$ is projection. Updates are applied on the manifold,
$$ T \leftarrow \exp(\hat\xi)\, T, $$
which preserves group structure and improves stability [13]. For contact-rich skills, differentiable physics adds residuals on contact consistency (e.g., nonpenetration, frictional stick/slide), enabling recovery of latent states/forces from video or tactile surrogates and improving data efficiency in policy learning [30].
5.4. Haptics and Force-Feedback for Action–Perception Coupling
Haptic channels tighten the loop by reflecting contact and stiffness cues. Lightweight devices and exoskeletons typically use PID/impedance targets with passivity-minded gains to remain stable under network and rendering latency; classic PHANToM-style designs and control envelopes are well documented [60,61]. A canonical proxy-based approach is the constraint-based god-object formulation, which enforces non-penetration while maintaining stable force feedback [62]. In soft-tissue or tool-mediated tasks, reduced-order FEM or proxy models support high-rate (kHz) force updates while the visual path runs at display refresh, sustaining stability and realism [63].
Discrete-Time Passivity (PO/PC). Let $(f_k, v_k)$ be the device-port force/velocity at sample k and T the servo period. A passivity observer tracks energy
$$ E_k = T \sum_{j=0}^{k} f_j^{\,e}\, v_j, \qquad (65) $$
where $f^{\,e}$ is the virtual environment force. If $E_k < 0$ (incipient non-passivity), a passivity controller injects adaptive damping
$$ f_k = f_k^{\,e} + \beta_k\, v_k, \qquad \beta_k = \begin{cases} -E_k / (T\, v_k^2), & E_k < 0, \\ 0, & \text{otherwise}. \end{cases} \qquad (66) $$
Equation (65) monitors the net energy exchanged at the device port; if $E_k < 0$, the coupled loop is injecting energy (non-passive), which can trigger unstable oscillations under delay or stiff virtual contacts. Equation (66) restores passivity by adding the minimum adaptive damping $\beta_k$ needed to dissipate the excess energy, making kHz haptic updates stable while visuals run at lower refresh rates. This guarantees $E_k \ge 0$ and restores passivity while minimally perturbing the nominal interaction. Energy-tank variants enforce $E_k \ge E_{\min}$ to budget aggressive transients (tool impacts) before throttling feedback. These controllers are compatible with the proxy/FEM pipelines above and preserve stability across rendering-latency fluctuations [60,61,63]. Comparative studies further show that visuo-haptic versus visual-only feedback can systematically change presence, underscoring that modeling priorities depend on the dominant feedback channel [64].
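A minimal sketch of the PO/PC loop from Equations (65) and (66); the delayed virtual spring is a deliberately destabilizing stand-in for a real delayed/stiff contact, all parameters are illustrative, and the implicit damping application is our own choice for numerical robustness, not a production controller:

```python
T, m, d_dev = 1e-3, 0.1, 0.5     # 1 kHz servo; device mass and intrinsic damping
k, delay = 500.0, 10             # virtual spring rendered with 10 ms delay

def run(use_pc, steps=5000):
    x, v, W = 0.01, 0.0, 0.0     # device state and observed port energy
    hist, peak = [x] * delay, 0.0
    for _ in range(steps):
        f_e = -k * hist[0]                    # delayed spring: an ACTIVE port
        hist = hist[1:] + [x]
        W += T * (-f_e) * v                   # passivity observer, Eq. (65)
        beta = -W / (T * v * v) if (use_pc and W < 0 and abs(v) > 1e-9) else 0.0
        # implicit application of the Eq. (66) damping keeps the update stable
        v = (v + T * (f_e - d_dev * v) / m) / (1.0 + T * beta / m)
        W += T * beta * v * v                 # energy absorbed by the controller
        x += T * v
        peak = max(peak, abs(x))
    return peak

print("PC off: peak |x| =", run(False))   # delay makes the loop active -> grows
print("PC on : peak |x| =", run(True))    # adaptive damping keeps it bounded
```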
5.5. Perception-Aware Interaction: Foveation, Workload, and Cybersickness
Perceptual limits bound what must be simulated for believable action. Eye-tracked foveation concentrates shading and sampling near the gaze point, freeing budget for physics/IK during action onsets; algorithmic trade-offs and system design are summarized in recent surveys [16,17,19]. Perceptual LOD throttles distant/occluded dynamics without degrading agency. For comfort, ML-based cybersickness predictors leverage demographics and behavioral/physiological signals; personalized cybersickness prediction models can improve forecasting accuracy at the individual level, enabling comfort-aware adaptation policies in real time [65]. These models can drive online mitigation (e.g., locomotion gains, vignette strength) during interaction [66,67,68].
5.6. Learning-Augmented Interfaces
Interfaces increasingly fuse geometric cores with learned scene/function representations. NeRF-style encodings assist occlusion-aware queries and grasp target prediction; factorized fields accelerate spatial lookups for interaction [10,11]. Graph/mesh operators enable spatial reasoning over layouts (affordances, collision margins) in shared spaces [36,37]. End-to-end policies co-train perception and control through differentiable kinematics/rendering losses for low-drift hand–object tasks [39]. Personalized, animatable avatars reduce retargeting error and improve ownership in multi-user scenes [31,52].
Emerging LLM-driven agents in social VR suggest a path toward more natural dialogue and interactive behavior; however, they introduce new constraints on latency, safety, and evaluation [69]. These trends motivate future benchmarking that jointly measures interaction quality and real-time system performance.
Despite their promise, learning-augmented interfaces should be paired with deployment guardrails (latency margins, validation, fallbacks) to remain robust in diverse real-time VR settings (Section 6.8).
5.7. Robustness, Latency, and System Practices
Interaction quality depends on stable 6DoF state, contact resolution, and motion-to-photon latency. Practical engines prioritize foreground constraints, defer background dynamics, and align sensing/physics/render clocks to minimize jitter [15]. Physics choices (semi-implicit/symplectic steps for energy behavior, adaptive substepping for contact bursts) directly influence grasp stability and selection accuracy [2].
Section 5 collected the main ingredients for physically grounded, yet comfortable VR interaction: manifold-aware pose updates, IK for arm and hand control, contact and constraint handling, online calibration, and perceptual allocation of computation and haptics. The main design rules are:
- (i) Maintain poses on SE(3) with quaternion/Lie updates; fuse visual–inertial sensing to stabilize world-locked anchors.
- (ii) Map end-effector goals with DLS/transpose/CCD under joint limits; use null-space terms to encourage comfort and good posture.
- (iii) Resolve contacts via impulse/QP or position-level projection, iterating within the frame budget and regularizing to suppress jitter.
- (iv) Calibrate online through differentiable kinematics and rendering with manifold-consistent pose updates.
- (v) Allocate work perceptually (eye-tracked foveation, perceptual LOD); keep haptics passive under latency using passivity observer/controller (PO/PC) style damping.
As a compact reference for reporting and reproducible evaluation, Table 6 summarizes the interaction metrics used throughout Section 5.
Reporting Metrics
Notation: $\hat x$ denotes a prediction and $x$ a reference, with error $e = \hat x - x$; ⊙ denotes the Hadamard product; $\pi(K, \cdot)$ is the projection with intrinsics K; $d_{SO(3)}$ is the geodesic norm on SO(3).
6. Future Direction of Mathematical Modeling in VR
This section charts near-term (1–2 year) opportunities for VR that are (i) geometry-aware: numerics and estimation on appropriate manifolds with constraint/contact handling [2,5,21], (ii) perception-aware: budgets and error bounded by human sensitivity via foveation and comfort models [16,17,19,22], and (iii) learning-augmented: neural modules used where they offer the largest cost–benefit while preserving structure [10,11,12,13,14,40]. Subsections detail: physics and time stepping (Section 6.1), spatial estimation/SLAM (Section 6.2), neural/differentiable rendering (Section 6.3), avatars and IK (Section 6.4), haptics and collision (Section 6.5), and cybersickness-aware scheduling (Section 6.6). Common metrics/targets and a concise reporting template appear in Section 6.7.
6.1. Learning-Augmented Physics and Stable Time Stepping
6.1.1. Structure-Preserving Time Stepping
Let $(q, p)$ be generalized coordinates and momenta with Hamiltonian $H(q, p)$ and holonomic constraints $g(q) = 0$ with Jacobian $G = \partial g / \partial q$. A symplectic Euler step with multipliers $\lambda$ is
$$ p_{k+1} = p_k - h\,\nabla_q H(q_k, p_{k+1}) + h\, G(q_k)^{\top} \lambda_{k+1}, \qquad q_{k+1} = q_k + h\, M^{-1} p_{k+1}, \qquad g(q_{k+1}) = 0, $$
which preserves a discrete symplectic form and yields bounded long-horizon energy drift when constraint residuals are controlled [2,20]. Here M is the mass matrix and $\lambda_{k+1}$ enforces $g(q_{k+1}) = 0$.
6.1.2. Learning Insertion Without Breaking Structure
A learned module predicts parameters
and a conservative correction
so that
, replacing
while keeping the step symplectic. If only a black-box force
is available, project it to a conservative field via a Poisson solve
so the update still derives from a potential [
14,
32,
33].
6.1.3. Nonsmooth Contact and Friction
Normal complementarity and frictional cones (or pyramids) act on impulses $\lambda$:
$$ 0 \le \phi(q) \;\perp\; \lambda_N \ge 0, \qquad \|\lambda_T\| \le \mu\, \lambda_N, $$
yielding an MCP/LCP/SOCP depending on the friction model [2,21]. Here $\phi$ is the signed gap, $\mu$ the friction coefficient, and $\lambda_N, \lambda_T$ the normal/tangential impulses.
6.1.4. Stability and What to Report
If constraint residuals remain bounded and constraints are projected each step, discrete Grönwall bounds give energy-drift estimates of the form
$$ \big| H(q_k, p_k) - H(q_0, p_0) \big| \le C\,(\epsilon_g + h^2), $$
so energy drift is controlled by residuals [2,20]. Report energy drift (%), max nonpenetration/equality violation, and fallback/substep rate as in Section 6.7 [18,22].
6.2. Spatial Estimation on Manifolds and Differentiable SLAM
6.2.1. Factor-Graph Objective on SE(3)
Given relative pose measurements $\hat T_{ij}$ and per-pixel matches, estimate poses $\{T_i\}$ and depths/structure D by
$$ \min_{\{T_i\},\, D} \sum_{(i,j)} \rho\Big( \big\| \log\big( \hat T_{ij}^{-1}\, T_i^{-1}\, T_j \big)^{\vee} \big\|^2_{\Sigma_{ij}^{-1}} \Big) + \sum_k \rho\big( \| \pi(T, D) - u_k \|^2 \big), $$
where $\log(\cdot)^{\vee}$ is the Lie log on SE(3) and $\rho$ a robustifier. This unifies geometric (between-frames) and photometric (reprojection) terms [40].
6.2.2. On-Manifold Updates and Gauge
Optimize with on-manifold increments $T \leftarrow \exp(\hat\xi)\, T$ to remain on SE(3). Fix one pose (and global scale for monocular) to remove gauge freedoms; otherwise the normal equations are rank-deficient [5,40].
6.2.3. Metrics and Alignment Choices
Use ATE/RPE and per-frame reprojection RMSE with the alignment stated explicitly (Sim(3) for monocular; SE(3) for stereo/RGB-D), following Section 6.7 [5,40].
6.3. Neural and Differentiable Rendering for Real-Time XR
6.3.1. Cached Estimators and Perceptual Budgeting
For per-pixel radiance, combine a fresh unbiased estimate $\hat L$ and a cached running estimate $\bar L$ by
$$ L_\alpha = \alpha\, \hat L + (1 - \alpha)\, \bar L, $$
whose MSE decomposes into bias $(1-\alpha)^2 b^2$ and variance $\alpha^2 \sigma^2$. Choose $\alpha$ by minimizing samples under a CSF-derived weight $w$ such that the weighted error stays below threshold [16,19]. Neural radiance caching and tiny-MLP factorization supply $\bar L$ at VR rates [10,11,12].
6.3.2. Foveated Sampling as a Constrained Program
Let $s(u)$ be samples per pixel (spp) and $w(u)$ the gaze-weight. Solve
$$ \min_{s(\cdot)} \sum_u s(u) \quad \text{s.t.} \quad w(u)\, \mathrm{err}\big(s(u)\big) \le \epsilon \;\; \forall u, $$
to allocate samples where the HVS is most sensitive [16,19].
6.4. Avatars, IK, and Differentiable Bodies
6.4.1. Shape–Pose Estimation with Physics Priors
Estimate shape $\beta$, pose $\theta$, and global transform T by
$$ \min_{\beta,\, \theta,\, T} \sum_i \big\| \pi\big(K,\, T\, v_i(\beta, \theta)\big) - u_i \big\|^2 + \lambda_{\text{pen}}\, P(\beta, \theta) + R(\beta, \theta), $$
where P penalizes interpenetration and R encodes learned priors. This couples differentiable IK/rendering for stable personalization [38,39,52,70]. Here $v_i(\beta, \theta)$ is the skinned vertex, $\theta$ the joint angles, and $u_i$ the image points.
6.4.2. Identifiability and Sensing
Local identifiability requires the stacked Jacobian w.r.t. $(\beta, \theta, T)$ to have full column rank after gauge fixing; with HMD+controllers, regularization via R and contact terms reduces ambiguity [39,52].
6.5. Haptics, Collision, and Contact Modeling
6.5.1. Discrete Passivity with PO/PC
With sample period T and port variables $(f_k, v_k)$, passivity requires
$$ E_k = T \sum_{j=0}^{k} f_j\, v_j \ge 0 \quad \forall k. $$
A passivity controller injects adaptive damping to restore $E_k \ge 0$ while minimally altering the virtual interaction [60,61,63]. Here $f^{\,e}$ is the virtual environment force and $E_k$ the observed stored energy.
6.5.2. Deterministic Collision Bounds
For convex shapes, GJK/EPA yields the separation distance (or penetration depth) with BVH-based LOD. A bounded per-query cost can be guaranteed by bounding node radii, giving predictable haptic-step timing and stable force updates [26,28].
6.6. Cybersickness-Aware Scheduling and Perceptual Control
Predict-and-Allocate (MPC View)
Let a learned proxy $\hat s_t = g_\psi(z_t)$ predict instantaneous sickness from signals $z_t$ (gaze, motion, frame-time, head kinematics). Allocate budgets $b_t$ (foveation level, LOD, solver iterations) by
$$ \min_{b_{t:t+H}} \sum_{\tau = t}^{t+H} \hat s_\tau(b_\tau) \quad \text{s.t.} \quad c(b_\tau) \le B, \;\; \text{MTP}(b_\tau) \le \text{MTP}_{\max}, $$
with horizon-H MPC and device budget B. Train $g_\psi$ on SSQ proxies and behavioral/physiological features [66,67,68] and gate foveation/LOD as in [16,17]. Here MTP is motion-to-photon latency, constrained alongside the missed-frame ratio.
6.7.1. Guiding Principles
- (i) Preserve geometric structure in integration and estimation (symplectic/semi-implicit updates; on-manifold pose updates).
- (ii) Use learned priors/modules to cut cost, while keeping constraints/guarantees in the outer loop.
- (iii) Balance quality and performance with perceptual limits (eye-tracked/foveated budgets; just-noticeable thresholds).
To facilitate reproducible evaluation and cross-paper comparison, we summarize a concise set of common VR metrics in Table 7.
6.7.2. Definitions and Formulas
Notation: $\hat x$ denotes an estimate and $x$ ground truth, with error $e = \hat x - x$; $\pi(K, \cdot)$ is the projection with intrinsics K; RMSE denotes root-mean-square error; Sim(3) or SE(3) alignment registers estimates to ground truth.
Rendering and Systems
Report MTP as median and 95th percentile (ms) from sensor timestamp to photons-on-panel.
6.7.3. Reporting Template (Concise)
- (i) Task/dataset and total budget (per-eye resolution/refresh, frame budget).
- (ii) Metrics from Table 7, with window sizes and alignment choices.
- (iii) Foveation policy and salient system settings (e.g., reprojection, timewarp).
- (iv) Statistical summaries (median/p90/p99 with n and confidence intervals).
- (v) Ablations isolating learned vs. structure-preserving components.
6.8. Learning-Based Components: Practical Limitations Under Real-Time VR Constraints
6.8.1. Computational Overhead and Latency Variability
Learning-based components can reduce modeling burden or enable new capabilities, but they also introduce inference overhead and latency variability that are difficult to reconcile with strict VR frame budgets. Neural rendering and learned surrogates typically add extra stages (network evaluation, feature fetching, cache management) whose costs are not always amortized across frames, especially when scene content, view-dependent effects, or contact states change rapidly. Even when the median cost appears acceptable, occasional spikes caused by content-dependent workloads, thermal throttling, driver/runtime effects, or concurrent processes can trigger missed deadlines and visible judder. This is particularly problematic in stereo rendering, where small scheduling slips can accumulate into motion-to-photon latency increases and discomfort. Consequently, evaluation should emphasize tail behavior (e.g., percentile frame times) and include explicit fallback policies that preserve interaction continuity when learned components exceed budget.
6.8.2. Generalization and Out-of-Distribution Failures
A second limitation is robustness under distribution shift. Models trained on curated datasets can fail under novel lighting, motion patterns, occlusions, sensor noise, or uncommon user morphologies. In tracking and world-locking pipelines, such failures manifest as drift, sudden pose jumps, or loss of tracking, which directly undermines spatial stability and can produce strong discomfort. Similarly, learned avatar or body/hand models may degrade for users whose proportions or motion styles are underrepresented in training data, increasing retargeting error and reducing embodiment. Learned physics surrogates face comparable risks: contact-rich edge cases (multi-contact, high-velocity impacts, or frictional stick–slip transitions) can violate learned assumptions and yield non-physical behavior. For real-time VR deployment, these failure modes argue for hybrid designs in which learned outputs are continuously validated by geometric/physical consistency checks and are automatically reverted to classical estimators/solvers when confidence drops.
6.8.3. Deployment Guardrails and Recommended Practice
From a deployment perspective, the most reliable pattern is to treat learning as an augmentation layer rather than a single point of failure. Classical pipelines remain attractive because they provide predictable runtime, clearer debugging signals, and graceful degradation modes (reduced quality rather than catastrophic loss). Learning is most appropriate when its scope is well-bounded (e.g., accelerating a specific query, providing priors for optimization, or improving personalization) and when runtime monitors can detect anomalies early. Practically, systems should (i) profile performance under worst-case content and device states, (ii) reserve explicit margins for variability, (iii) implement fallback paths that maintain tracking stability and interaction safety, and (iv) report these guardrails as part of reproducibility. These considerations complement the broader limitations discussed later (Section 7.4) and motivate hybrid architectures that combine learned priors with geometric and physical constraints.
7. Conclusions
7.1. Synthesis: Toward Geometry-Aware, Perception-Aware, and Learning-Augmented VR
This survey has argued that the clearest path for real-time VR is a synthesis of three complementary lenses: geometry-aware models that preserve structure in dynamics and pose; perception-aware scheduling that allocates computation where it matters for users; and learning-augmented components that accelerate or calibrate core loops without breaking invariants. In practice, constraint-consistent updates and semi-implicit/symplectic stepping support physically plausible motion [2], perceptual tolerances gate adaptive effort across physics and rendering [16,17,22], and differentiable layers couple estimation and inverse problems to image formation and scene priors [10,13].
7.2. Domain-Specific Advances and Challenges
Physics. Progress stems from stable time stepping; data-driven parameter identification and projection-based constraints further improve robustness. Nonsmooth frictional contact remains the main obstacle for differentiable pipelines and learned controllers [21].
Estimation. On-manifold optimization fused with differentiable rendering unifies tracking and calibration, reducing drift and improving relocalization under fast motion [13,40].
Rendering. Hybrid rasterization–neural methods are pushing dual-eye 90/120 Hz; representative techniques include radiance caching and factorized tiny networks, which maintain photometric fidelity [10,11,12].
Embodiment and Haptics. Implicit neural avatars with differentiable IK reduce retargeting error from sparse sensors [39,52,70]. Passivity-aware impedance control and reduced-order models sustain kHz haptics alongside visual refresh; device/scene co-design and stability margins under latency remain open challenges [60,61,63]. Comfort-aware controllers that close the loop with cybersickness predictors are promising but need broader validation [66,67,68,71].
7.3. Methodological Recommendations
To make advances comparable and cumulative, we recommend a common reporting battery. For physics, include long-horizon energy drift, momentum/constraint violations, and contact stability traces [2]. For estimation, report ATE/RPE and per-frame reprojection RMSE with defined train/test splits and motion profiles [13,40]. For rendering/systems, provide frame-time percentiles (P50/P90/P99), motion-to-photon latency, and binocular synchronization, alongside image metrics; for interaction/comfort, include standardized cybersickness outcomes with pre-registered thresholds and time-to-discomfort analyses, plus ablations of perception-informed policies [16,17,66,67]. Where learning is used, ablate the surrogate’s role, characterize failure modes, and disclose memory/bandwidth profiles relevant to headsets.
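As an illustration of the estimation metrics above, the sketch below computes translational ATE and RPE for time-synchronized trajectories. Rigid alignment (e.g., via the Umeyama method) and the rotational RPE component are omitted for brevity, and the frame gap `delta` is a placeholder.

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """Absolute Trajectory Error: RMSE of per-frame position error,
    assuming both (N, 3) trajectories share a common frame."""
    errors = np.linalg.norm(gt_xyz - est_xyz, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

def rpe_rmse(gt_xyz, est_xyz, delta=30):
    """Relative Pose Error (translational part) over a fixed frame gap,
    which isolates local drift from global alignment error."""
    d_gt = gt_xyz[delta:] - gt_xyz[:-delta]
    d_est = est_xyz[delta:] - est_xyz[:-delta]
    errors = np.linalg.norm(d_gt - d_est, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```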
7.4. Limitations and Future Outlook
Three cross-cutting limitations persist. First, memory/bandwidth ceilings constrain neural scene/shape representations in large or dynamic environments. Second, visibility- and shading-dependent gradients bias differentiable pipelines, complicating robust calibration and inverse problems [13]. Third, comfort models trained on narrow demographics or content may not generalize across tasks and devices [67,68,71,72,73,74,75,76].
Near-term opportunities include:
- (i) Structure-preserving, differentiable solvers for contact and friction that remain stable with learned surrogates [21].
- (ii) Tightly coupled on-manifold estimation with hybrid neural rendering for joint pose–layout inference at VR frame rates [11,12,40].
- (iii) Personalized embodiment and comfort-aware control that adapt in situ while honoring safety margins [52,66,70].
Carefully combining structure, perception, and learning offers a scalable route to stable and deeply immersive VR.
Author Contributions
Conceptualization, J.L. and K.H.L.; methodology, J.L.; formal analysis, J.L.; investigation, J.L.; resources, J.L. and Y.-H.K.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, Y.-H.K. and K.H.L.; visualization, J.L.; supervision, K.H.L.; project administration, K.H.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. This work is a comprehensive survey of existing literature. All cited references are publicly available through their respective publishers or repositories.
Acknowledgments
This research was supported by a Research Grant from Kwangwoon University in 2025.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| ADMM | Alternating Direction Method of Multipliers |
| AR | Augmented Reality |
| ATE | Absolute Trajectory Error |
| BRDF | Bidirectional Reflectance Distribution Function |
| BVH | Bounding Volume Hierarchy |
| CCD | Cyclic Coordinate Descent |
| CSF | Contrast Sensitivity Function |
| DLS | Damped Least Squares |
| DOF | Degrees of Freedom |
| EPA | Expanding Polytope Algorithm |
| EWA | Elliptical Weighted Average |
| FEM | Finite Element Method |
| FK | Forward Kinematics |
| GJK | Gilbert-Johnson-Keerthi (algorithm) |
| GPU | Graphics Processing Unit |
| IK | Inverse Kinematics |
| IPD | Interpupillary Distance |
| JND | Just-Noticeable Difference |
| KKT | Karush-Kuhn-Tucker |
| LCP | Linear Complementarity Problem |
| LOD | Level of Detail |
| LUT | Look-Up Table |
| MCP | Mixed Complementarity Problem |
| MLP | Multi-Layer Perceptron |
| MPC | Model Predictive Control |
| NeRF | Neural Radiance Field |
| PBD | Position-Based Dynamics |
| PBR | Physically Based Rendering |
| PGS | Projected Gauss-Seidel |
| PID | Proportional-Integral-Derivative |
| PINN | Physics-Informed Neural Network |
| QP | Quadratic Programming |
| RK4 | Fourth-Order Runge-Kutta |
| RPE | Relative Pose Error |
| SE(3) | Special Euclidean Group (3D) |
| SLAM | Simultaneous Localization and Mapping |
| SLERP | Spherical Linear Interpolation |
| SO(3) | Special Orthogonal Group (3D) |
| SOCP | Second-Order Cone Program |
| SSQ | Simulator Sickness Questionnaire |
| VR | Virtual Reality |
| XPBD | Extended Position-Based Dynamics |
| XR | Extended Reality |
Appendix A
Appendix A.1. VR Headset Specifications
Table A1 summarizes key specifications for representative consumer VR headsets (2024–2025). These constraints inform the frame budgets, resolution targets, and memory limitations discussed throughout Section 2, Section 3, Section 4, Section 5 and Section 6. For instance, Quest 3’s 8 GB shared memory and 11.1 ms frame budget at 90 Hz directly shape adaptive LOD strategies (Section 2.5) and neural rendering trade-offs (Section 4.4).
Table A1. VR headset comparison (manufacturer specifications). Device weight, compute, and display specifications used as practical constraints for VR system design. Data compiled from manufacturer documentation and independent benchmarks (2024–2025).
| Device | Weight (g) | Chipset | RAM | Display | Resolution (per Eye) | Refresh (Hz) |
|---|---|---|---|---|---|---|
| Meta Quest 3 | 515 | Snapdragon XR2 Gen2 | 8 GB | Fast-switch LCD | 2064 × 2208 | 72–120 |
| Apple Vision Pro | 750–800 | Apple M5/R1 | 16 GB | micro-OLED | ∼3660 × 3200 | 90–120 |
| PlayStation VR2 | 560 | Host-dependent | – | OLED | 2000 × 2040 | 120 |
Appendix A.2. Representative Deployment: Medical VR Surgical Simulator
To ground the mathematical formulations in Section 2, Section 3, Section 4 and Section 5 within a realistic deployment context, we outline a representative case study that integrates physics-based simulation, real-time rendering, and haptic interaction under the device constraints of Table A1.
Appendix A.2.1. System Requirements
A neurosurgery training simulator for resident education demands:
- Haptic fidelity: 1 kHz force feedback for realistic tool–tissue interaction during cutting and suturing.
- Visual rendering: Stereo 90 Hz on Quest 3 (Table A1: 8 GB RAM, 2064 × 2208 per eye).
- Physical realism: Soft-tissue deformation with sub-mm accuracy at contact points.
- Clinical validation: Surgeon acceptance rating above 4.0/5 for training efficacy (cf. Table A2).
Appendix A.2.2. Mathematical Solution Stack
Soft-tissue dynamics follow the constrained equations of motion (Equation (1)), instantiated in reduced modal form as

$$\tilde{M}\ddot{q} + \tilde{K}q = f_{\text{ext}} + J^{\top}\lambda,$$

where $q$ are modal coordinates, $\tilde{M}$ is the modal mass matrix, $\tilde{K}$ is the modal stiffness matrix, and $\lambda$ are contact constraint forces. The key enabler is modal reduction:
- Full FEM discretization: 10,000 tetrahedral elements with 30,000 DOF.
- Modal basis: 50 dominant eigenmodes (99.2% energy capture).
- Computational cost: 0.8 ms per update at 90 Hz (visual thread), 0.1 ms at 1 kHz (haptic thread).

Time integration uses symplectic Euler (Equation (5)) with per-frame constraint projection (Section 2.2) to maintain stability under rapid tool motions.
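The following is a minimal sketch of symplectic Euler stepping on a diagonal modal system; the modal masses, stiffnesses, and damping are illustrative stand-ins for the case-study pipeline, and the contact-constraint projection is omitted.

```python
import numpy as np

def step_modal(q, qd, inv_m, k, d, f_ext, dt):
    """One symplectic Euler step in modal coordinates:
    update velocity from current forces, then position from the new velocity."""
    qdd = inv_m * (f_ext - d * qd - k * q)   # diagonal (decoupled) modal dynamics
    qd = qd + dt * qdd
    q = q + dt * qd
    return q, qd

n_modes = 50                               # as in the case study
q, qd = np.zeros(n_modes), np.zeros(n_modes)
inv_m = np.ones(n_modes)                   # inverse modal masses (unit, illustrative)
k = np.linspace(10.0, 500.0, n_modes)      # illustrative modal stiffnesses
d = 0.02 * np.sqrt(k)                      # light modal damping (assumed)
for _ in range(90):                        # one second of visual-thread updates
    q, qd = step_modal(q, qd, inv_m, k, d, f_ext=np.zeros(n_modes), dt=1.0 / 90.0)
```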
High-rate force feedback employs a passivity observer/controller (Equations (66) and (67)), with adaptive damping $\beta$ injected when the observed port energy becomes negative, preventing energy generation under latency. Virtual tool–tissue contact uses a constraint-based god-object (proxy) formulation [62], with the rendered force proportional to the tool–proxy separation.
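The sketch below illustrates a discrete passivity observer with adaptive damping in the spirit of Equations (66) and (67); the sign conventions, damping cap, and variable names are assumptions rather than the exact forms used in the text.

```python
def passivity_step(E_obs, f_cmd, v, dt, beta_max=50.0):
    """Accumulate observed port energy; when the port would generate
    energy (E_obs < 0), inject just enough damping to dissipate the deficit."""
    E_obs += dt * f_cmd * v               # energy flowing through the haptic port
    beta = 0.0
    if E_obs < 0.0 and abs(v) > 1e-9:
        beta = min(-E_obs / (dt * v * v), beta_max)
        E_obs += dt * beta * v * v        # credit the dissipated energy
    return E_obs, f_cmd - beta * v        # damped force command
```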
Deformable mesh rendering allocates the 11.1 ms frame budget (90 Hz) as follows:
- Geometry update (0.5 ms): GPU vertex skinning from modal weights via compute shader.
- Physics simulation (2.5 ms): Modal integration + contact constraint projection.
- Shading (5.5 ms): PBR with preintegrated environment BRDF (Section 4.1).
- Blood flow effects (1.5 ms): GPU particle system + screen-space fluid rendering.
- Margin (1.1 ms): Reserved for OS/runtime jitter.
Perceptual LOD (Section 2.5) reduces mesh resolution for peripheral tissues (beyond a gaze-eccentricity threshold) from 5000 to 1500 triangles, saving ∼1.2 ms without perceptible quality loss.
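A minimal selection rule might look like the sketch below; the 20° threshold is a placeholder (the case study does not state its value), while the triangle budgets follow the numbers above.

```python
import math

def eccentricity_deg(gaze_dir, object_dir):
    """Angle (degrees) between the gaze ray and the object direction,
    both given as unit vectors."""
    c = max(-1.0, min(1.0, sum(g * o for g, o in zip(gaze_dir, object_dir))))
    return math.degrees(math.acos(c))

def triangle_budget(ecc_deg, threshold_deg=20.0, full=5000, reduced=1500):
    """Full-resolution mesh near the gaze, reduced LOD in the periphery."""
    return full if ecc_deg <= threshold_deg else reduced
```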
Appendix A.2.3. Measured Performance
Validation with 12 neurosurgery residents over 20 procedures each:
Table A2. Performance summary for a medical VR surgical simulator. Metric definitions follow Table 7 and Equations (89)–(96).

| Metric | Target | Achieved | Reference |
|---|---|---|---|
| Haptic update rate | 1000 Hz | 950–1000 Hz | Section 5.4, Equation (66) |
| Visual frame rate (P95) | 90 Hz | 88–90 Hz | Table 7, Equation (94) |
| Force fidelity (RMSE) | <10% | 8.1% | Table 7, Equation (68) |
| Contact infeasibility | <0.5 mm | 0.3 mm | Equation (90) |
| Surgeon realism rating | >4.0/5 |  | User study (n = 12) |
| Energy drift (5 min) | <5% | 2.3% | Equation (89) |
Appendix A.2.4. Design Lessons and Trade-Offs
1. Modal reduction is essential. Full FEM at 90 Hz would require 6.5–8.0 ms per update, roughly three times the 2.5 ms physics allocation. The 50-mode approximation introduces only a small RMS error in contact forces (the retained modes capture 99.2% of the energy) while meeting real-time constraints.
2. Decoupled visual/haptic threads. Running haptics at 1 kHz on a separate CPU core (Section 5.4) prevents visual render spikes from degrading force-feedback stability. Shared state (the modal coordinates $q$) uses lock-free double buffering (see the sketch after this list).
3. Perceptual adaptation under load. When frame time approaches budget (e.g., during complex tool maneuvers), the system dynamically reduces peripheral mesh LOD (Section 2.5, Equations (11) and (12)), prioritizing haptic continuity over visual fidelity where users are less sensitive.
4. Constraint projection over penalty forces. Position-level constraint projection (Equation (3)) proved more stable than penalty-based contact under rapid tool insertion/withdrawal, reducing interpenetration artifacts from 1.2 mm (penalty, fixed contact stiffness) to 0.3 mm (projection, 3 iterations).
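A sketch of the double-buffering pattern from lesson 2 follows, using Python threads for clarity: Python needs a lock around the index swap, whereas a native implementation would flip an atomic index to remain truly lock-free.

```python
import threading

class DoubleBuffer:
    """Two-slot buffer: the physics thread writes the back slot and flips;
    the haptic thread always reads a complete, consistent front slot."""

    def __init__(self, initial):
        self._slots = [list(initial), list(initial)]
        self._front = 0
        self._swap = threading.Lock()

    def publish(self, values):
        back = 1 - self._front
        self._slots[back][:] = values    # fill the back slot off the read path
        with self._swap:
            self._front = back           # an atomic index flip in native code

    def read(self):
        with self._swap:
            return list(self._slots[self._front])
```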
Appendix A.2.5. Connections to Survey Content
This deployment directly instantiates principles from:
- Section 2.3: Symplectic Euler (Equation (5)) maintains bounded energy error despite 5-min horizons.
- Section 2.5: Gaze-driven LOD (Equations (11) and (12)) allocates computation perceptually.
- Section 5.4: Passivity theory (Equations (66) and (67)) guarantees stability under variable rendering latency.
- Table A1: Quest 3’s 8 GB RAM and thermal limits necessitate aggressive LOD and modal reduction.

The system demonstrates that careful integration of structure-preserving numerics, perceptual scheduling, and constraint-based manipulation (Section 2, Section 3, Section 4 and Section 5) can achieve clinical-grade training fidelity within consumer VR constraints.
References
- Baraff, D. An Introduction to Physically Based Modeling: Rigid Body Dynamics. SIGGRAPH ’97 Course Notes, 1997. Available online: https://www.cs.cmu.edu/~baraff/sigcourse/notesd1.pdf (accessed on 31 December 2025).
- Erleben, K. Stable, Robust, and Versatile Physics for Computer Animation. Ph.D. Thesis, University of Copenhagen, Copenhagen, Denmark, 2005.
- Phong, B.T. Illumination for Computer Generated Pictures. Commun. ACM 1975, 18, 311–317.
- Möller, T.; Haines, E. Real-Time Rendering; A K Peters: Natick, MA, USA, 1999.
- LaValle, S. Virtual Reality, Chapter 3: Geometric Foundations; UIUC, 2017. Available online: https://lavalle.pl/vr/vrch3.pdf (accessed on 31 December 2025).
- Hanson, A.J. Visualizing Quaternions; Morgan Kaufmann/Elsevier: Amsterdam, The Netherlands, 2006.
- Shoemake, K. Animating Rotation with Quaternion Curves. ACM SIGGRAPH Comput. Graph. 1985, 19, 245–254.
- Craig, J. Introduction to Robotics: Mechanics and Control, 4th ed.; Pearson: Upper Saddle River, NJ, USA, 2018.
- Aristidou, A.; Lasenby, J.; Chrysanthou, Y.; Shamir, A. Inverse Kinematics Techniques in Computer Graphics: A Survey. Comput. Graph. Forum 2018, 37, 35–58.
- Mildenhall, B.; Srinivasan, P.; Tancik, M.; Barron, J.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun. ACM 2022, 65, 99–106.
- Reiser, C.; Peng, S.; Liao, Y.; Geiger, A. KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs. In Proceedings of the ICCV, Montreal, QC, Canada, 11–17 October 2021; pp. 15617–15626.
- Müller, T.; Rousselle, F.; Novák, J.; Keller, A. Real-Time Neural Radiance Caching for Path Tracing. ACM Trans. Graph. 2021, 40, 36.
- Loper, M.; Black, M.J. OpenDR: An Approximate Differentiable Renderer. In Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 106–121.
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. J. Comput. Phys. 2019, 378, 686–707.
- LaViola, J.; Kruijff, E.; McMahan, R.; Bowman, D.; Poupyrev, I. 3D User Interfaces: Theory and Practice, 2nd ed.; Addison-Wesley: Boston, MA, USA, 2017.
- Wang, L.; Shi, X.; Liu, Y. Foveated Rendering: A State-of-the-Art Survey. Comput. Vis. Media 2023, 9, 195–228.
- Mohanto, B.; Subramanyam, A.V.; Hassan, E.A. An Integrative View of Foveated Rendering. Comput. Graph. 2022, 102, 64–88.
- O’Sullivan, C.; Howlett, S.; McDonnell, R.; Morvan, Y.; O’Conor, K. Perceptually Adaptive Graphics. In Eurographics 2004—State of the Art Reports (STARs); Schlick, C., Purgathofer, W., Eds.; Eurographics Association: Aire-la-Ville, Switzerland, 2004; pp. 141–164.
- Weier, M.; Zender, H.; Wimmer, M. A Survey of Foveated Rendering. Comput. Graph. 2015, 53, 137–147.
- Eberly, D.H. Game Physics, 2nd ed.; Morgan Kaufmann: Burlington, MA, USA, 2010.
- Glocker, C. Set-Valued Force Laws: Dynamics of Non-Smooth Systems; Springer: Berlin/Heidelberg, Germany, 2001.
- Yeh, T.Y.; Reinman, G.; Patel, S.J.; Colicchio, B.; Faloutsos, P. Fool Me Twice? Exploring and Exploiting Error Tolerance in Physics-Based Animation. ACM Trans. Graph. 2008, 27, 82.
- Witkin, A.; Baraff, D. Physically Based Modeling: Principles and Practice; SIGGRAPH ’97 Course Notes, 1997. Available online: https://www.cs.cmu.edu/~baraff/sigcourse/ (accessed on 31 December 2025).
- Witkin, A.; Kass, M. An Introduction to Physically Based Modeling; SIGGRAPH Course Notes; ACM SIGGRAPH: New York, NY, USA, 1988.
- Jakobsen, T. Advanced Character Physics. In Proceedings of the Game Developers Conference (GDC 2001); GDC Press: San Jose, CA, USA, 2001; pp. 383–401.
- Gilbert, E.G.; Johnson, D.W.; Keerthi, S.S. A Fast Procedure for Computing the Distance Between Complex Objects in Three-Dimensional Space. IEEE J. Robot. Autom. 1988, 4, 193–203.
- Teschner, M.; Kimmerle, S.; Heidelberger, B.; Zachmann, G.; Raghupathi, L.; Fuhrmann, A.; Cani, M.-P.; Faure, F.; Magnenat-Thalmann, N.; Strasser, W. Collision Detection for Deformable Objects. Comput. Graph. Forum 2005, 24, 61–81.
- Jiménez, P.; Thomas, F.; Torras, C. 3D Collision Detection: A Survey. Comput. Graph. 2001, 25, 269–285.
- Huang, Z.; Colli Tozoni, D.; Gjoka, A.; Ferguson, Z.; Schneider, T.; Panozzo, D.; Zorin, D. Differentiable Solver for Time-Dependent Deformation Problems with Contact. ACM Trans. Graph. 2024, 43, 31:1–31:30.
- Zhu, X.; Ke, J.; Xu, Z.; Sun, Z.; Bai, B.; Lv, J.; Liu, Q.; Zeng, Y.; Ye, Q.; Lu, C.; et al. Diff-LfD: Contact-aware Model-based Learning from Visual Demonstration for Robotic Manipulation via Differentiable Physics-based Simulation and Rendering. In Proceedings of the 7th Annual Conference on Robot Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 41183–41203.
- Si, Z.; Zhang, G.; Ben, Q.; Romero, B.; Xian, Z.; Liu, C.; Gan, C. DIFFTACTILE: Physics-based Differentiable Tactile Simulation. arXiv 2024, arXiv:2403.11111.
- Go, M.-S.; Lim, J.H.; Lee, S. Physics-informed Neural Network-based Surrogate Model for a Virtual Thermal Sensor with Real-time Simulation. Int. J. Heat Mass Transf. 2023, 214, 124392.
- Yang, S.; Kim, H.; Hong, Y.; Yee, K.; Maulik, R.; Kang, N. Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective. arXiv 2024, arXiv:2401.08667.
- Vince, J. Rotation Transforms for Computer Graphics; Springer: London, UK, 2011.
- Shuster, M.D. A Survey of Attitude Representations. J. Astronaut. Sci. 1993, 41, 439–517.
- Masci, J.; Boscaini, D.; Bronstein, M.; Vandergheynst, P. Geodesic Convolutional Neural Networks on Riemannian Manifolds. In Proceedings of the ICCV, Santiago, Chile, 11–18 December 2015; pp. 832–840.
- Bronstein, M.M.; Bruna, J.; Cohen, T.; Velickovic, P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv 2021, arXiv:2104.13478.
- Kolotouros, N.; Pavlakos, G.; Black, M.J.; Daniilidis, K. Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5092–5102.
- Feng, Y.; Feng, H.; Black, M.J.; Hilliges, O. Learning Detailed 3D Face Shape and Expression from a Single Image. ACM Trans. Graph. 2021, 40, 88.
- Teed, Z.; Deng, J. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. In Proceedings of NeurIPS (Virtual Event); Curran Associates, Inc.: Red Hook, NY, USA, 2021; pp. 29775–29789.
- Kajiya, J.T. The Rendering Equation. In SIGGRAPH ’86: Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques; ACM: New York, NY, USA, 1986; pp. 143–150.
- Veach, E. Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1997.
- Walter, B.; Marschner, S.R.; Li, H.; Torrance, K.E. Microfacet Models for Refraction through Rough Surfaces. In Rendering Techniques 2007: Proceedings of the 18th Eurographics Symposium on Rendering; Eurographics Association: Aire-la-Ville, Switzerland, 2007; pp. 195–206.
- Brown, D.C. Decentering Distortion of Lenses. Photogramm. Eng. 1966, 32, 444–462.
- Zhang, Z. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
- Heckbert, P.S. Fundamentals of Texture Mapping and Image Warping. Master’s Thesis, University of California, Berkeley, CA, USA, 1989.
- Williams, L. Pyramidal Parametrics. In SIGGRAPH ’83: Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques; ACM: New York, NY, USA, 1983; pp. 1–11.
- Schlick, C. An Inexpensive BRDF Model for Physically-based Rendering. Comput. Graph. Forum 1994, 13, 233–246.
- Poynton, C. Digital Video and HD: Algorithms and Interfaces, 2nd ed.; Morgan Kaufmann: Burlington, MA, USA, 2012.
- Tancik, M.; Srinivasan, P.P.; Mildenhall, B.; Fridovich-Keil, S.; Raghavan, N.; Singhal, U.; Ramamoorthi, R.; Barron, J.T.; Ng, R. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020); Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 7537–7547.
- Sitzmann, V.; Martel, J.N.P.; Bergman, A.W.; Lindell, D.B.; Wetzstein, G. Implicit Neural Representations with Periodic Activation Functions. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020); Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 7462–7473.
- Peng, S.; Xu, Z.; Dong, J.; Wang, Q.; Shuai, Q.; Bao, H.; Zhou, X. Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4147–4159.
- Duckworth, D.; Hedman, P.; Reiser, C.; Zhizhin, P.; Thibert, J.-F.; Lučić, M.; Szeliski, R.; Barron, J.T. SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration. ACM Trans. Graph. 2024, 43, 63:1–63:13.
- Badler, N.I.; Phillips, C.B.; Webber, B.L. Simulating Humans: Computer Graphics, Animation, and Control; Oxford University Press: New York, NY, USA, 1993.
- Krogmeier, C.; Mousas, C.; Whittinghill, D. Human, Virtual Human, Bump! A Preliminary Study on Haptic Feedback. In Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Osaka, Japan, 23–27 March 2019; pp. 1032–1033.
- Steed, A.; Lai, J. Comparison of Hand Tracking-based and Controller-based Interaction in a Consumer Virtual Reality Game. Virtual Real. 2025, 29, 120.
- Huang, L.; Zhang, B.; Guo, Z.; Xiao, Y.; Cao, Z.; Yuan, J. Survey on Depth and RGB Image-based 3D Hand Shape and Pose Estimation. Virtual Real. Intell. Hardw. 2021, 3, 207–234.
- Tevatia, G.; Schaal, S. Inverse Kinematics for Humanoid Robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), San Francisco, CA, USA, 24–28 April 2000; pp. 294–299.
- Zhao, A.; Tang, C.; Wang, L.; Li, Y.; Dave, M.; Tao, L.; Twigg, C.D.; Wang, R.Y. EgoBody3M: Egocentric Body Tracking on a VR Headset using a Diverse Dataset. In Computer Vision–ECCV 2024; Springer: Cham, Switzerland, 2024; pp. 375–392.
- Massie, T.H.; Salisbury, J.K. The PHANToM Haptic Interface: A Novel Design for Interactive Mechanical Simulation. In Proceedings of the ASME Winter Annual Meeting, Haptic Interfaces for Virtual Environment and Teleoperator Systems, Chicago, IL, USA, 6–11 November 1994; DSC-Vol. 55-1, pp. 295–299.
- Salisbury, J.K.; Conti, F.; Barbagli, F. The PHANToM Haptic Interface: A Device for Probing Virtual Objects. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Minneapolis, MN, USA, 22–28 April 1996; pp. 3185–3190. Available online: https://api.semanticscholar.org/CorpusID:14915458 (accessed on 17 December 2025).
- Zilles, C.B.; Salisbury, J.K. A Constraint-Based God-Object Method for Haptic Display. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Pittsburgh, PA, USA, 5–9 August 1995; pp. 146–151.
- Berkley, J.; Turkiyyah, G.; Berg, D.; Ganter, M.; Weghorst, S. Real-time Finite Element Modeling for Surgery Simulation: An Application to Virtual Suturing. IEEE Trans. Vis. Comput. Graph. 2004, 10, 314–325.
- Gibbs, J.K.; Gillies, M.; Pan, X. A Comparison of the Effects of Haptic and Visual Feedback on Presence in Virtual Reality. Int. J. Hum.-Comput. Stud. 2022, 157, 102717.
- Tasnim, U.; Islam, R.; Desai, K.; Quarles, J. Investigating Personalization Techniques for Improved Cybersickness Prediction in Virtual Reality Environments. IEEE Trans. Vis. Comput. Graph. 2024, 30, 2368–2378.
- Hadadi, A.; Guillet, C.; Chardonnet, J.-R.; Langovoy, M.; Wang, Y.; Ovtcharova, J. Prediction of Cybersickness in Virtual Environments Using Topological Data Analysis and Machine Learning. Front. Virtual Real. 2022, 3, 973236.
- Ramaseri-Chandra, A.N.; Reza, H. Predicting Cybersickness Using Machine Learning and Demographic Data in Virtual Reality. Electronics 2024, 13, 1313.
- Qi, C.; Ding, D.; Chen, H.; Cao, Z.; Zhang, W. CPNet: Real-Time Cybersickness Prediction without Physiological Sensors for Cybersickness Mitigation. ACM Trans. Sens. Netw. 2025.
- Wan, H.; Zhang, J.; Suria, A.A.; Yao, B.; Wang, D.; Coady, Y.; Prpa, M. Building LLM-based AI Agents in Social Virtual Reality. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24); Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–7.
- Dong, Z.; Guo, C.; Song, J.; Chen, X.; Geiger, A.; Hilliges, O. PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 19–24 June 2022; pp. 11099–11108.
- Long, Y.; Wang, T.; Liu, X.; Li, Y.; Tao, D. Toward Accurate Cybersickness Prediction in Virtual Reality: A Physiological Modeling Approach. Sensors 2025, 25, 5828.
- Hecker, C. Physics, The Next Frontier. Game Developer Magazine, October/November 1996. pp. 12–20. Available online: https://www.chrishecker.com/images/d/df/Gdmphys1.pdf (accessed on 17 December 2025).
- Sung, N.-J.; Ma, J.; Hor, K.; Kim, T.; Va, H.; Choi, Y.-J.; Hong, M. Real-Time Physics Simulation Method for XR Application. Computers 2025, 14, 17.
- Patry, M.; Ricard, J.; Bellavance, F. Cybersickness, Arousal, and Affect in Virtual Reality: Effects of Field of View, Gender, and Age. Virtual Real. 2023, 27, 2631–2646.
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal, 7–12 October 2012; pp. 573–580.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.