Search Results (50)

Search Parameters:
Keywords = instruction-level parallelism

9 pages, 836 KB  
Communication
Test–Retest Reliability of Single-Arm Closed Kinetic Chain Upper Extremity Stability Test
by Andy Waldhelm, Mareli Klopper, Matthew Paul Gonzalez, Stephanie Flynn, Edward Austin and Ron Masri
J. Funct. Morphol. Kinesiol. 2026, 11(1), 46; https://doi.org/10.3390/jfmk11010046 - 21 Jan 2026
Viewed by 291
Abstract
Background: The original Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) is a simple assessment tool but does not account for individual differences in hand starting position and fails to provide information on limb asymmetries. The purpose of the study is to evaluate the test–retest reliability of a new single-arm CKCUEST as well as the reliability of the limb symmetry index (LSI). This version normalizes the test based on the participant’s arm length and allows for the assessment of limb symmetry since it is performed one arm at a time. Methods: Twelve healthy young adults provided both verbal and written consent to participate. Participants were excluded if they had sustained an injury in the past three months requiring medical attention and/or resulting in decreased activity for more than three days. Testing was conducted in the push-up position with participants’ thumbs placed parallel and at a distance equal to the length of their dominant arm (measured from the acromion to the tip of the middle finger), and feet positioned shoulder-width apart. Participants were instructed to keep the testing hand stable on the floor while the opposite hand reached across the body to touch the stationary hand and then return to the starting position marked with athletic tape. The goal was to complete as many touches as possible in 15 s, with each touch counted only if the participant touched the stationary hand, returned to the starting position, and maintained the shoulder-width stance. The average number of touches from the three trials was used for analysis. Intraclass Correlation Coefficients (ICC(3,1)) were computed to determine test–retest reliability. Results: Test–retest reliability of the single-arm CKCUEST individual tests was good to excellent. The ICC(3,1) was 0.88 (95% CI: 0.74–0.95) for all tests, 0.89 (95% CI: 0.66–0.96) for the dominant arm, and 0.93 (95% CI: 0.78–0.98) for the non-dominant arm. 
In contrast, the reliability of the Limb Symmetry Index (LSI) was questionable, showing substantial variability with an ICC(3,1) of 0.53 (95% CI: −0.03–0.83) between Day 1 and Day 2, despite similar mean values (Day 1: 93.6 ± 8.46; Day 2: 94.8 ± 5.77). The Kappa coefficient suggested a substantial level of agreement for the direction of the asymmetry (preferred limb) (Kappa coefficient = 0.62). Conclusions: The new single-arm CKCUEST, which personalizes the hand starting position and measures limb symmetry, demonstrates high reliability among healthy young adults. Full article
(This article belongs to the Section Kinesiology and Biomechanics)
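The abstract above reports LSI means near 94% alongside questionable LSI reliability. As a point of reference, a limb symmetry index is commonly computed as the ratio of the two limbs' scores; the paper's exact formula is not given in this listing, so both the convention and the touch counts below are assumptions:

```python
def limb_symmetry_index(dominant: float, non_dominant: float) -> float:
    """LSI as a percentage. One common convention divides the
    non-dominant score by the dominant score; the paper's exact
    formula is not stated in this listing, so this is illustrative."""
    return 100.0 * non_dominant / dominant

# Hypothetical average touches over three 15 s trials per arm.
lsi = limb_symmetry_index(dominant=20.0, non_dominant=18.7)  # ~93.5
```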

17 pages, 2889 KB  
Technical Note
Increasing Computational Efficiency of a River Ice Model to Help Investigate the Impact of Ice Booms on Ice Covers Formed in a Regulated River
by Karl-Erich Lindenschmidt, Mojtaba Jandaghian, Saber Ansari, Denise Sudom, Sergio Gomez, Stephany Valarezo Plaza, Amir Ali Khan, Thomas Puestow and Seok-Bum Ko
Water 2026, 18(2), 218; https://doi.org/10.3390/w18020218 - 14 Jan 2026
Viewed by 339
Abstract
The formation and stability of river ice covers in regulated waterways are critical for uninterrupted hydro-electric operations. This study investigates the modelling of ice cover development in the Beauharnois Canal along the St. Lawrence River with the presence and absence of ice booms. Ice booms are deployed in this canal to promote the rapid formation of a stable ice cover during freezing events, minimizing disruptions to dam operations. Remote sensing data were used to assess the spatial extent and temporal evolution of an ice cover and to calibrate the river ice model RIVICE. The model was applied to simulate ice formation for the 2019–2020 ice season, first for the canal with a series of three ice booms and then rerun under a scenario without booms. Comparative analysis reveals that the presence of ice booms facilitates the development of a relatively thinner and more uniform ice cover. In contrast, the absence of booms leads to thicker ice accumulations and increased risk of ice jamming, which could impact water management and hydroelectric generation operations. Computational efficiencies of the RIVICE model were also sought. RIVICE was originally compiled with a Fortran 77 compiler, which restricted modern optimization techniques. Recompiling with NVFortran significantly improved performance through advanced instruction scheduling, cache management, and automatic loop analysis, even without explicit optimization flags. Enabling optimization further accelerated execution, albeit marginally, reducing redundant operations and memory traffic while preserving numerical integrity. Tests across varying ice cross-sectional spacings confirmed that NVFortran reduced runtimes by roughly an order of magnitude compared to the original model. 
A test GPU (Graphics Processing Unit) version was able to run the data interpolation routines on the GPU, but frequent data transfers between the CPU (Central Processing Unit) and GPU caused by shared memory blocks and fixed-size arrays made it slower than the original CPU version. Achieving efficient GPU execution would require substantial code restructuring to eliminate global states, adopt persistent data regions, and parallelize at higher level loops, or alternatively, rewriting in a GPU-friendly language to fully exploit modern architectures. Full article
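The transfer-bound behaviour described above, where host-device copies swamp the GPU's compute advantage, can be illustrated with a back-of-envelope cost model. All numbers below are hypothetical, not measurements from the paper:

```python
def port_runtime(cpu_compute_s: float, gpu_speedup: float,
                 n_transfers: int, transfer_s: float) -> float:
    """Total time for a naive GPU port: accelerated compute plus
    per-call host<->device traffic (hypothetical cost model)."""
    return cpu_compute_s / gpu_speedup + n_transfers * transfer_s

cpu_only = 10.0  # seconds for the CPU-only run (hypothetical)

# Frequent small copies (shared memory blocks, fixed-size arrays)
# versus data kept resident on the device in persistent regions.
naive      = port_runtime(10.0, 20.0, 200_000, 100e-6)  # 0.5 + 20.0 s
persistent = port_runtime(10.0, 20.0, 10,      100e-6)  # 0.5 + 0.001 s
```

Even a 20x compute speedup is erased when every kernel launch pays transfer latency, which is why the abstract calls for persistent data regions and higher-level parallelism.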

19 pages, 16366 KB  
Article
A Supplementary Damping Control of D-STATCOM for Alleviating SSO in Photovoltaic Generation Integrated into Weak AC Grid
by Qichao Chen, Nan Wei, Zhidong Wang, Zhi An, Peng Tao and Yiqi Liu
Energies 2026, 19(1), 234; https://doi.org/10.3390/en19010234 - 31 Dec 2025
Viewed by 316
Abstract
The interaction between the Photovoltaic station and the weak grid can easily trigger sub- or super-synchronous oscillation (SSO). In this article, the equivalent impedance model of the photovoltaic grid-connected system is built, and the mechanism of SSO is analyzed based on the global admittance criterion (GA). To mitigate the SSO, a Distribution Static Synchronous Compensator (D-STATCOM) supplementary damping control (SDC) strategy is proposed, which uses a three-parameter notch filter to extract the sub- or super-synchronous harmonic component without a phase shift. The component is superimposed on the modulated wave of the D-STATCOM through the gain link to obtain the modulation instruction. At the sub- or super-synchronous frequency, the D-STATCOM can be equivalent to the parallel impedance in the system and play a role in suppressing the sub- or super-synchronous oscillation. Compared to the complex combination filters in the traditional SDC, which require phase compensation and have poor adaptability, the three-parameter notch filter used in this SDC does not need a phase compensation stage and can effectively cope with the presence of oscillation frequencies on both sides of the fundamental frequency with a simpler design. Simulation results prove that the proposed scheme effectively improves the stability of photovoltaic generation under different short-circuit ratios, irradiance levels, and fault conditions. The proposed solution can be applied to photovoltaic generation equipped with D-STATCOM. Full article
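The key property claimed for the three-parameter notch filter is extraction of the sub-/super-synchronous component without a phase shift. As an illustrative stand-in (not the paper's filter), a frequency-domain mask has exactly zero phase distortion, since retained bins keep their original phases:

```python
import numpy as np

def extract_component(x, fs, f_lo, f_hi):
    """Extract one frequency band from x with zero phase shift via an
    ideal FFT-domain mask; an illustrative stand-in for the paper's
    three-parameter notch filter, not its actual design."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    mask = (f >= f_lo) & (f <= f_hi)
    return np.fft.irfft(X * mask, n=len(x))

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
# 50 Hz fundamental plus a hypothetical 25 Hz sub-synchronous component.
x = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 25 * t)
sso = extract_component(x, fs, 10.0, 40.0)  # recovers the 25 Hz part
```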

22 pages, 3408 KB  
Article
A High-Performance Branch Control Mechanism for GPGPU Based on RISC-V Architecture
by Yao Cheng, Yi Man and Xinbing Zhou
Electronics 2026, 15(1), 125; https://doi.org/10.3390/electronics15010125 - 26 Dec 2025
Viewed by 499
Abstract
General-Purpose Graphics Processing Units (GPGPUs) rely on warp scheduling and control flow management to organize parallel thread execution, making efficient control flow mechanisms essential for modern GPGPU design. Currently, the mainstream RISC-V GPGPU Vortex adopts the Single Instruction Multiple Threads (SIMT) stack control mechanism. This approach introduces high complexity and performance overhead, becoming a major limitation for further improving control efficiency. To address this issue, this paper proposes a thread-mask-based branch control mechanism for the RISC-V architecture. The mechanism introduces explicit mask primitives at the Instruction Set Architecture (ISA) level and directly manages the active status of threads within a warp through logical operations, enabling branch execution without jumps and thus reducing the overhead of the original control flow mechanism. Unlike traditional thread mask mechanisms in GPUs, our design centers on RISC-V and realizes co-optimization at both the ISA and microarchitecture levels. The mechanism was modeled and validated on Vortex SimX. Experimental results show that, compared with the Vortex SIMT stack mechanism, the proposed approach maintains correct control semantics while reducing branch execution cycles by an average of 31% and up to 40%, providing a new approach for RISC-V GPGPU control flow optimization. Full article
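The thread-mask idea can be sketched as a tiny warp simulator: both sides of a branch execute under complementary masks, with no per-thread jumps and no reconvergence stack. This is an illustrative model of the concept, not the paper's actual ISA primitives:

```python
def warp_execute(cond, then_op, else_op, values):
    """Mask-based divergence handling for one warp: compute an explicit
    per-thread mask, run the 'then' side under it and the 'else' side
    under its complement (illustrative model, not Vortex's ISA)."""
    mask = [cond(v) for v in values]        # explicit mask primitive
    out = list(values)
    for i, active in enumerate(mask):       # 'then' side under mask
        if active:
            out[i] = then_op(out[i])
    for i, active in enumerate(mask):       # 'else' side under ~mask
        if not active:
            out[i] = else_op(out[i])
    return out

# A 4-thread warp: even lanes halve, odd lanes apply 3v + 1.
res = warp_execute(lambda v: v % 2 == 0,
                   lambda v: v // 2,
                   lambda v: 3 * v + 1,
                   [1, 2, 3, 4])
```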

24 pages, 2445 KB  
Systematic Review
From Practice to Reflection: A Systematic Review of Mechanisms Driving Metacognition and SRL in Music
by Yinghui Wang, Mengqi Zhang, Huasen Zhang, Xin Shan and Xiaofei Du
J. Intell. 2025, 13(12), 162; https://doi.org/10.3390/jintelligence13120162 - 9 Dec 2025
Viewed by 1516
Abstract
Metacognition and self-regulated learning (SRL) are widely recognized as key mechanisms for academic achievement and skill development, yet in music education they have rarely been examined through explicit instructional interventions to enable causal testing and effect evaluation. To address this gap, this study followed PRISMA guidelines and conducted a systematic review of 31 studies (including seven for meta-analysis) to identify intervention types and mechanisms, and to quantify their overall effects and moderating factors. Results indicate the following: (1) the intervention ecology is grounded in structured learning support (SLS), frequently combined with strategy teaching (ST) or technology-enhanced interventions (TEI), with full integration concentrated at the university level. (2) The mechanisms operate primarily along four pathways: structure facilitates a “plan–practice–reflection” loop, strategy instruction makes tacit experience explicit, technological feedback provides a third-person perspective, and teacher support stabilizes motivation. (3) The meta-analysis revealed a significant positive medium effect overall. (4) Intervention structure moderated outcomes, though not as a single or stable determinant. (5) Effects followed a U-shaped pattern across educational stages, strongest in secondary school, followed by university, and weaker in preschool and primary. Future research should employ proximal, task-aligned measures, conduct parallel multi-indicator assessments within the same stage, and expand evidence for multi-mechanism integration in primary and secondary school contexts. Experimental designs manipulating levels of SLS are needed to test whether ST + TEI remain effective under low-structure conditions, thereby identifying the minimum structural threshold. Extending samples to informal and professional music learners would further enhance robustness and generalizability. Full article

14 pages, 450 KB  
Article
Reverse Engineering the Branch Target Buffer Organizations on Apple M2
by Taehee Kim and Hyunwoo Choi
Electronics 2025, 14(23), 4686; https://doi.org/10.3390/electronics14234686 - 27 Nov 2025
Viewed by 924
Abstract
Modern high-performance processors employ sophisticated branch prediction mechanisms to minimize control hazards and maximize instruction-level parallelism. A core component of this mechanism is the Branch Target Buffer (BTB), a critical hardware structure responsible for storing branch target addresses and enabling rapid fetch redirection. While the BTB has been extensively studied in x86 architectures, its internal behavior and organization on ARM-based Apple Silicon remain largely unexplored. In this work, we present an empirical reverse engineering study of the BTB implementation on Apple Silicon, with a focus on the M2 processor. By leveraging targeted microbenchmarks, we characterize key parameters such as BTB size, set index bits, and associativity. Based on our empirical analysis, we estimate that the M2 BTB comprises approximately 2 K entries, employs nine set index bits, and features four-way associativity. This work provides the first systematic public dissection of the BTB on Apple Silicon and lays the groundwork for further architectural exploration and tooling development within this closed ecosystem. Full article
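The three estimated parameters are internally consistent: nine set index bits select among 512 sets, and four ways per set yield the reported ~2 K entries. A quick check:

```python
# Consistency check of the BTB parameters estimated in the abstract.
set_index_bits = 9
ways = 4                      # four-way associativity

sets = 2 ** set_index_bits    # 9 index bits -> 512 sets
entries = sets * ways         # 512 * 4 = 2048 ~ "approximately 2 K"
```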

36 pages, 2028 KB  
Article
Perspectives of Women and Men Students and Faculty on Conceptual and Quantitative Problem-Solving in Physics from Introductory to Graduate Levels
by Apekshya Ghimire and Chandralekha Singh
Educ. Sci. 2025, 15(12), 1602; https://doi.org/10.3390/educsci15121602 - 27 Nov 2025
Viewed by 712
Abstract
Developing expertise in physics requires appropriate integration and assimilation of physics and mathematics. Instructors and students often describe physics courses in terms of their emphasis on conceptual and quantitative problem-solving. For example, they may argue that a course emphasizes primarily conceptual over quantitative problem-solving or may emphasize equally on both depending on instructional context and assessment design. In this study, we investigated how students and instructors across different levels of physics instruction perceive the roles and development of conceptual and quantitative problem-solving in student learning and expertise development. Using departmental surveys administered at the beginning and end of each semester, we collected both Likert-scale and open-ended responses from students enrolled in introductory, upper-level undergraduate and graduate physics courses. These surveys assessed students’ self-perceived skills, preferences, and perceptions of instructors and course emphasis. To complement student perspectives, we conducted interviews with instructors, using parallel questions adapted to reflect instructional goals and expectations. Our findings highlight patterns in how students and instructors prioritize conceptual and quantitative problem-solving across course levels, as well as alignment and misalignment between student and instructor perspectives. Also, although the questions were framed around conceptual versus quantitative problem-solving, we do not view them as mutually exclusive; rather we seek to understand perceived course emphasis and student expertise development from student and instructor points of view in a language commonly used in physics. These results can help shape teaching, course design, and assessment practices to better support the development of expert-like problem-solving skills in students in physics and related disciplines. Full article

17 pages, 1783 KB  
Article
MOOC Dropout Prediction via a Dilated Convolutional Attention Network with Lie Group Features
by Yinxu Liu, Chengjun Xu, Desheng Yang and Yuncheng Shen
Informatics 2025, 12(4), 127; https://doi.org/10.3390/informatics12040127 - 21 Nov 2025
Cited by 1 | Viewed by 1161
Abstract
Massive open online courses (MOOCs) represent an innovative online learning paradigm that has garnered considerable popularity in recent years, attracting a multitude of learners to MOOC platforms due to their accessible and adaptable instructional structure. However, the elevated dropout rate in current MOOCs limits their advancement. Current dropout prediction models predominantly employ fixed-size convolutional kernels for feature extraction, which insufficiently address temporal dependencies and consequently demonstrate specific limitations. We propose a Lie Group-based feature context-local fusion attention model for predicting dropout in MOOCs. This model initially extracts shallow features using Lie Group machine learning techniques and subsequently integrates multiple parallel dilated convolutional modules to acquire high-level semantic representations. We design an attention mechanism that integrates contextual and local features, effectively capturing the temporal dependencies in the study behaviors of learners. We performed multiple experiments on the XuetangX dataset to evaluate the model’s efficacy. The results show that our method attains a precision score of 0.910, exceeding the previous state-of-the-art approach by 3.3%. Full article
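The advantage of dilated convolutions over fixed-size kernels for temporal dependencies comes from their enlarged receptive field, which grows with the dilation rates. The kernel size and dilation rates below are assumptions for illustration, not the paper's configuration:

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stacked dilated 1-D convolutions (stride 1):
    r = 1 + sum over layers of (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# A kernel of 3 with dilation rates 1, 2, 4 covers 15 time steps,
# versus only 7 for three ordinary (dilation-1) layers.
rf = receptive_field(3, [1, 2, 4])
```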

24 pages, 8957 KB  
Article
Utilizing VR Technology in Foundational Welding Skill Development
by Nuri Furkan Koçak, Ali Saygın and Fuat Türk
Appl. Sci. 2025, 15(22), 12331; https://doi.org/10.3390/app152212331 - 20 Nov 2025
Cited by 1 | Viewed by 1281
Abstract
Traditional approaches to welder training demand substantial investments in equipment, consumable materials, and workshop facilities, while also exposing novice learners to considerable safety risks. This study investigates the effectiveness of a virtual reality (VR)-based welding training system developed with Unity for the Meta Quest 2 platform, designed to deliver safe and immersive instruction in fundamental welding techniques. A total of twenty participants with no prior welding experience completed structured VR training sessions over two weeks. The program focused on developing competencies in welding machine operation (including start-up procedures and parameter adjustments), controlling shielding gas flow, and accurately regulating torch-to-workpiece distance, torch angle, and travel speed. Real-time feedback was integrated into the system to support accurate control and positioning of the welding torch. Quantitative assessments demonstrated significant improvements in both technical proficiency and trainee confidence and anxiety levels. Knowledge test scores increased from 45.3 to 85.1, while machine adjustment accuracy rose from 28.7 to 92.3. In parallel, participant confidence levels increased substantially, and anxiety scores decreased from 4.0–4.5 to 1.1–1.5 on standardized scales. These findings provide experimental evidence that VR-based training can enhance fundamental welding education by offering a safe, repeatable, and effective practice environment that simultaneously improves technical performance, strengthens learner confidence, and reduces training-related anxiety. Full article
(This article belongs to the Special Issue Recent Advances and Application of Virtual Reality)

30 pages, 15481 KB  
Article
Effects of 12 Weeks of Chromium, Phyllanthus emblica Fruit Extract, and Shilajit Supplementation on Markers of Cardiometabolic Health, Fitness, and Weight Loss in Men and Women with Risk Factors to Metabolic Syndrome Initiating an Exercise and Diet Intervention: A Randomized Double-Blind, Placebo-Controlled Trial
by Victoria Martinez, Kay McAngus, Broderick L. Dickerson, Megan Leonard, Elena Chavez, Jisun Chun, Megan Lewis, Dante Xing, Drew E. Gonzalez, Choongsung Yoo, Joungbo Ko, Heather Rhodes, Hudson Lee, Ryan J. Sowinski, Christopher J. Rasmussen and Richard B. Kreider
Nutrients 2025, 17(12), 2042; https://doi.org/10.3390/nu17122042 - 19 Jun 2025
Viewed by 11851
Abstract
Background: Exercise and nutritional interventions are often recommended to help manage risk related to metabolic syndrome (MetSyn). The co-ingestion of Phyllanthus emblica (PE) with trivalent chromium (Cr) has been purported to improve the bioavailability of chromium and enhance endothelial function, reduce platelet aggregation, and help manage blood glucose as well as lipid levels. Shilajit (SJ) has been reported to have anti-inflammatory, adaptogenic, immunomodulatory, and lipid-lowering properties. This study evaluated whether dietary supplementation with Cr, PE, and SJ, or PE alone, during an exercise and diet intervention may help individuals with risk factors to MetSyn experience greater benefits. Methods: In total, 166 sedentary men and women with at least two markers of metabolic syndrome participated in a randomized, placebo-controlled, parallel-arm, and repeated-measure intervention study, of which 109 completed the study (48.6 ± 10 yrs., 34.2 ± 6 kg/m2, 41.3 ± 7% fat). All volunteers participated in a 12-week exercise program (supervised resistance and endurance exercise 3 days/week with walking 10,000 steps/day on non-training days) and were instructed to reduce energy intake by −5 kcals/kg/d. Participants were matched by age, sex, BMI, and body mass for the double-blind and randomized supplementation of a placebo (PLA), 500 mg of PE (PE-500), 1000 mg/d of PE (PE-1000), 400 µg of trivalent chromium (Cr) with 6 mg of PE and 6 mg of SJ (Cr-400), or 800 µg of trivalent chromium with 12 mg of PE and 12 mg of SJ (Cr-800) once a day for 12 weeks. Data were obtained at 0, 6, and 12 weeks of supplementation, and analyzed using general linear model multivariate and univariate analyses with repeated measures, pairwise comparisons, and mean changes from the baseline with 95% confidence intervals (CIs). 
Results: Compared to PLA responses, there was some evidence (p < 0.05 or approaching significance, p > 0.05 to p < 0.10) that PE and/or Cr with PE and SJ supplementation improved pulse wave velocity, flow-mediated dilation, platelet aggregation, insulin sensitivity, and blood lipid profiles while promoting more optimal changes in body composition, strength, and aerobic capacity. Differences among groups were more consistently seen at 6 weeks rather than 12 weeks. While some benefits were seen at both dosages, greater benefits were more consistently observed with PE-1000 and Cr-800 ingestion. Conclusions: The results suggest that PE and Cr with PE and SJ supplementation may enhance some exercise- and diet-induced changes in markers of health in overweight individuals with at least two risk factors to MetSyn. Registered clinical trial #NCT06641596. Full article
(This article belongs to the Section Phytochemicals and Human Health)

31 pages, 1396 KB  
Article
Can Correct and Incorrect Worked Examples Supersede Worked Examples and Problem-Solving on Learning Linear Equations? An Examination from Cognitive Load and Motivation Perspectives
by Bing Hiong Ngu, Ouhao Chen, Huy P. Phan, Hasbee Usop and Philip Nuli Anding
Educ. Sci. 2025, 15(4), 504; https://doi.org/10.3390/educsci15040504 - 17 Apr 2025
Cited by 2 | Viewed by 4684
Abstract
Research has advocated for the use of incorrect worked examples targeting specific conceptual barriers to enhance learning. From the perspective of cognitive load theory, we examined the relationship between instructional efficiency (correct and incorrect worked examples [CICWEs] vs. worked examples [WEs] vs. problem-solving [PS]), levels of expertise (low vs. high), and belief in achievement best (realistic vs. optimal) in learning linear equations across two experiments (N = 43 vs. N = 68). In the CICWE group, students compared an incorrect step in the incorrect worked example with the parallel correct step in the correct worked example and justified why the step was wrong. The WE group completed multiple worked example–equation pairs, while the PS group solved equivalent linear equations independently. As hypothesized, the WE group outperformed the PS group for low prior knowledge students, while the reverse occurred for high prior knowledge students, demonstrating the expertise reversal effect. In contrast, the CICWE group did not outperform either the PS or WE group. A student’s indication of optimal best, reflecting what is known as the ‘realistic–optimal achievement bests dichotomy’, aligns with his or her belief in their ability to perform tasks of varying complexity (simple task vs. complex task). Regarding the belief in achieving optimal best as an outcome of instructional manipulation, for low prior knowledge students, there were no significant differences across groups on either the realistic best or optimal best subscales. However, for high prior knowledge students, the groups differed significantly on the optimal best subscale, but not on the realistic best subscale. Importantly, the mental effort invested during learning was unrelated to students’ belief in achieving their optimal best. Full article

11 pages, 1974 KB  
Proceeding Paper
Chip Design of Multithreaded and Pipelined RISC-V Microcontroller Unit
by Mao-Hsu Yen, Yih-Hsia Lin, Tzu-Feng Lin, Yu-Hui Chen, Yuan-Fu Ku and Chien-Ting Kao
Eng. Proc. 2025, 89(1), 31; https://doi.org/10.3390/engproc2025089031 - 28 Feb 2025
Viewed by 2192
Abstract
Multithreading is widely used in microcontroller unit (MCU) chips. Multithreaded hardware is composed of multiple identical single threads and provides instructions to different threads. Using the concept of thread-level parallelism (TLP), pauses are compensated for during single-thread operation to increase the throughput at the same unit. The principle of pipelined management is to use instruction-level parallelism (ILP) to split the MCU into multiple stages. When an instruction is given in a certain stage, other instructions are provided to operate in other idle stages and improve their execution efficiency. Based on the four-thread and pipelined RISC-V MCU architecture, we analyzed the instruction types of three benchmarks, i.e., Coremark, SHA, and Dijkstra. A total of 94% of the instructions use the arithmetic logic unit (ALU). Based on the executable four-thread architecture, we developed two to four RISC-V architectures with different numbers of ALUs and a dispatch algorithm. This architecture allows for the simultaneous delivery of multiple instructions, enabling parallel processing of instructions and increasing efficiency. Compared to the traditional RISC-V architecture with only one ALU, the test results showed that the instructions per clock (IPCs) of RISC-V architectures with two, three, and four ALUs increased efficiency by 76, 128.9, and 154.3%, while the area increased by 12, 22.3, and 32.6% and the static power consumption increased by 5.1, 9.2, and 13.3%. The results showed a significant improvement in performance with only a slight increase in the area. Due to the limited area of chips, a two-thread microcontroller architecture was used for the IC design and tape-out. TSMC’s 180nm process with a chip area of 1190 × 1190 μm at 133 MHz was used in this study. Full article
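The reported gains can be combined into a rough performance-per-area figure, which supports the abstract's claim of a significant performance improvement for only a slight increase in area:

```python
# Relative gains for 2, 3, and 4 ALUs versus the single-ALU baseline,
# taken from the figures reported in the abstract.
ipc_gain  = {2: 0.76, 3: 1.289, 4: 1.543}
area_gain = {2: 0.12, 3: 0.223, 4: 0.326}

# IPC per unit area relative to the baseline (= 1.0).
perf_per_area = {n: (1 + ipc_gain[n]) / (1 + area_gain[n])
                 for n in ipc_gain}
# Every configuration beats the baseline, and the ratio keeps improving
# (though with diminishing returns from 3 to 4 ALUs).
```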

16 pages, 696 KB  
Article
Optimizing Lattice Basis Reduction Algorithm on ARM V8 Processors
by Ronghui Cao, Julong Wang, Liming Zheng, Jincheng Zhou, Haodong Wang, Tiaojie Xiao and Chunye Gong
Appl. Sci. 2025, 15(4), 2021; https://doi.org/10.3390/app15042021 - 14 Feb 2025
Viewed by 1459
Abstract
The LLL (Lenstra–Lenstra–Lovász) algorithm is an important method for lattice basis reduction and has broad applications in computer algebra, cryptography, number theory, and combinatorial optimization. However, current LLL algorithms face challenges such as inadequate adaptation to domestic supercomputers and low efficiency. To enhance the efficiency of the LLL algorithm in practical applications, this research focuses on parallel optimization of the LLL_FP (LLL double-precision floating-point type) algorithm from the NTL library on the domestic Tianhe supercomputer using the Phytium ARM V8 processor. The optimization begins with the vectorization of the Gram–Schmidt coefficient calculation and row transformation using the SIMD instruction set of the Phytium chip, which significantly improves computational efficiency. Further assembly-level optimization fully utilizes the low-level instructions of the Phytium processor and further increases execution speed. In terms of memory access, data prefetch techniques were employed to load necessary data in advance of computation, reducing cache misses and accelerating data processing. To further enhance performance, loop unrolling was applied to the core loop, allowing more operations per loop iteration. Experimental results show that the optimized LLL_FP algorithm achieves up to a 42% performance improvement, with a minimum improvement of 34% and an average improvement of 38% in single-core efficiency compared to the serial LLL_FP algorithm. This study provides a more efficient solution for large-scale lattice basis reduction and demonstrates the potential of the LLL algorithm in ARM V8 high-performance computing environments. Full article
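The two kernels the abstract says were vectorized — the Gram–Schmidt coefficient calculation and the size-reduction row transformation — can be sketched in plain Python as a serial reference. This only illustrates the mathematics; the paper's actual optimization replaces these loops with Phytium SIMD intrinsics, prefetching, and unrolling, and the function names here are illustrative.

```python
def gs_coefficient(b_i, b_j_star):
    """Gram–Schmidt coefficient mu_{i,j} = <b_i, b_j*> / <b_j*, b_j*>.
    The two dot-product loops are what SIMD vectorization accelerates."""
    num = sum(x * y for x, y in zip(b_i, b_j_star))
    den = sum(y * y for y in b_j_star)
    return num / den

def size_reduce_row(b_i, b_j, mu_ij):
    """LLL size-reduction row transformation: b_i <- b_i - round(mu)*b_j,
    applied elementwise across the row (the vectorized inner loop)."""
    r = round(mu_ij)
    return [x - r * y for x, y in zip(b_i, b_j)]
```

In LLL_FP these loops run over double-precision rows, so each iteration is an independent fused multiply-add — exactly the access pattern that maps well onto 128-bit ARM V8 NEON lanes.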
(This article belongs to the Special Issue Parallel Computing and Grid Computing: Technologies and Applications)

21 pages, 4242 KB  
Article
A Learning Emotion Recognition Model Based on Feature Fusion of Photoplethysmography and Video Signal
by Xiaoliang Zhu, Zili He, Chuanyong Wang, Zhicheng Dai and Liang Zhao
Appl. Sci. 2024, 14(24), 11594; https://doi.org/10.3390/app142411594 - 12 Dec 2024
Viewed by 3544
Abstract
The ability to recognize learning emotions facilitates the timely detection of students’ difficulties during the learning process, supports teachers in modifying instructional strategies, and allows for personalized student assistance. The detection of learning emotions through the capture of convenient, non-intrusive signals such as [...] Read more.
The ability to recognize learning emotions facilitates the timely detection of students’ difficulties during the learning process, supports teachers in modifying instructional strategies, and allows for personalized student assistance. Detecting learning emotions by capturing convenient, non-intrusive signals such as photoplethysmography (PPG) and video offers good practicality; however, it presents new challenges. Firstly, PPG-based emotion recognition is susceptible to external factors such as movement and lighting conditions, leading to signal quality degradation and reduced recognition accuracy. Secondly, video-based emotion recognition algorithms may lose accuracy in spontaneous scenes due to variations, occlusions, and uneven lighting conditions. Therefore, on the one hand, the performance of the two recognition methods mentioned above must be improved; on the other hand, exploiting their complementary advantages through multimodal fusion needs to be considered. To address these concerns, our work mainly includes the following: (i) the development of a temporal convolutional network model incorporating channel attention to overcome PPG-based emotion recognition challenges; (ii) the introduction of a network model that integrates multi-scale spatiotemporal features to address the challenges of emotion recognition in spontaneous environmental videos; (iii) an exploration of a dual-mode fusion approach, along with an improvement of the model-level fusion scheme within a parallel connection attention aggregation network. Experimental comparisons demonstrate the efficacy of the proposed methods, particularly the bimodal fusion, which substantially enhances the accuracy of learning emotion recognition, reaching 95.75%. Full article
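The paper's bimodal scheme is a model-level fusion inside a parallel-connection attention aggregation network; as a much simpler point of reference, decision-level fusion of the two branches' per-class probabilities can be sketched as below. The weighting, function names, and class labels are assumptions for illustration, not the paper's method.

```python
def fuse_decisions(ppg_probs, video_probs, w_ppg=0.4):
    """Weighted decision-level fusion of per-class probabilities from a
    PPG branch and a video branch, renormalized to sum to 1."""
    fused = [w_ppg * p + (1.0 - w_ppg) * q
             for p, q in zip(ppg_probs, video_probs)]
    total = sum(fused)
    return [f / total for f in fused]

def predict(fused_probs, labels):
    # pick the label with the highest fused probability
    return max(zip(fused_probs, labels))[1]
```

Model-level fusion, by contrast, combines intermediate feature maps before classification, which is what lets the attention aggregation network learn which modality to trust per sample.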
(This article belongs to the Section Computing and Artificial Intelligence)

16 pages, 5420 KB  
Article
Realizing the Calculation of a Fully Normalized Associated Legendre Function Based on an FPGA
by Yuxiang Fang, Qingbin Wang and Yichao Yang
Sensors 2024, 24(22), 7262; https://doi.org/10.3390/s24227262 - 13 Nov 2024
Viewed by 1740
Abstract
A large number of fully normalized associated Legendre function (fnALF) calculations are required to compute Earth’s gravity field elements using ultra high-order gravity field coefficient models. In the surveying and mapping industry, researchers typically rely on CPU-based systems for these calculations, which leads [...] Read more.
A large number of fully normalized associated Legendre function (fnALF) calculations are required to compute Earth’s gravity field elements using ultra-high-order gravity field coefficient models. In the surveying and mapping industry, researchers typically rely on CPU-based systems for these calculations, which limits execution speed and power efficiency. Although modern CPUs improve instruction execution efficiency through instruction-level parallelism, the constraints of a shared memory architecture impose further limitations on execution speed and power efficiency. This results in exponential increases in computation time as demand rises, alongside high power consumption. In this article, we present a new computational implementation of an fnALF based on the ZYNQ platform. We design a task-parallel “pipeline” architecture that converts the original serial logic into a more efficient hardware implementation, and we utilize a redundant calculation layer to handle repetitive coefficient computations separately. The experimental results demonstrate that our system achieved accurate and rapid calculations. For a single geocentric residual latitude, we measured the computation times for spherical harmonic coefficient degrees of 360, 720, and 1080 to be 0.155922 s, 0.520950 s, and 1.401609 s, respectively. For multiple geocentric residual latitudes, our design generally yielded efficiency gains of over three times those of the MATLAB R2020b implementation. Additionally, our calculated results were used to determine the geoid height in the field with an error of less than ±0.1 m, confirming the reliability of our computations. Full article
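The recurrence the FPGA pipeline accelerates is the standard forward-column recursion for fully normalized associated Legendre functions. A serial Python sketch of that recursion is below, for reference only: it follows the common geodetic (Holmes–Featherstone-style) coefficient form, not the paper's hardware design, and the function name is illustrative. A useful correctness check is the identity that, for each degree n, the squared values over all orders sum to 2n + 1.

```python
import math

def fnalf(nmax, theta):
    """Fully normalized associated Legendre functions P_{n,m}(cos theta)
    up to degree/order nmax, via the forward-column recursion.
    theta is the colatitude in radians; returns a {(n, m): value} dict."""
    t, u = math.cos(theta), math.sin(theta)
    P = {(0, 0): 1.0, (1, 0): math.sqrt(3.0) * t, (1, 1): math.sqrt(3.0) * u}
    # sectoral seeds: P_{m,m} = u * sqrt((2m+1)/(2m)) * P_{m-1,m-1}
    for m in range(2, nmax + 1):
        P[(m, m)] = u * math.sqrt((2 * m + 1) / (2 * m)) * P[(m - 1, m - 1)]
    # non-sectoral terms: P_{n,m} = a*t*P_{n-1,m} - b*P_{n-2,m}
    for m in range(0, nmax):
        for n in range(m + 1, nmax + 1):
            if (n, m) in P:
                continue
            a = math.sqrt((2 * n - 1) * (2 * n + 1) / ((n - m) * (n + m)))
            b = math.sqrt((2 * n + 1) * (n + m - 1) * (n - m - 1)
                          / ((n - m) * (n + m) * (2 * n - 3)))
            P[(n, m)] = a * t * P[(n - 1, m)] - b * P.get((n - 2, m), 0.0)
    return P
```

Because each P_{n,m} depends only on the two previous degrees of the same order m, the columns are independent — which is why the serial chain maps naturally onto the task-parallel pipeline described above, with the degree-dependent coefficients a and b handled once in a separate (redundant) calculation layer.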
(This article belongs to the Section Physical Sensors)
