This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Customer Baseline Credibility in Constrained Reinforcement Learning for Incentive-Based Demand Response
by Jiyong Li * and Kaiyue Wang
Department of Electrical Engineering, Guangxi University, Nanning 530004, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(13), 3986; https://doi.org/10.3390/s26133986 (registering DOI)
Submission received: 19 May 2026 / Revised: 14 June 2026 / Accepted: 15 June 2026 / Published: 23 June 2026
Abstract
Incentive-based demand response is an important flexibility resource for power systems with high renewable energy penetration. However, practical incentive allocation depends not only on flexible capacity and user response uncertainty, but also on the credibility of customer baseline load (CBL), which directly affects response measurement, verification, and incentive settlement. To address this issue, this paper proposes a constrained reinforcement learning method with customer baseline credibility for dynamic resource allocation in incentive-based demand response. Based on user-side load measurements and demand response event records, the proposed framework evaluates user resources using flexible capacity, response reliability, response cost, and CBL credibility. The CBL credibility score reflects the measurement quality of the delivered response and is used as a pre-event allocation factor. Users are then grouped into different resource levels, and a group-level reinforcement learning agent dynamically determines incentive multipliers and response task allocation ratios. To improve feasibility, an action correction module revises raw policy outputs under budget, price, response capacity, and CBL risk constraints before implementation. Case studies are conducted using public industrial demand response measurements and open electricity-system time-series data. The results show that the proposed CBL-CRL method reduces the normalized total operating cost to 0.897, reduces the response tracking error to 0.108, and lowers CBL risk exposure to 0.087 under the normal scenario. Relative to the No-DR reference, CBL-CRL reduces the normalized total operating cost by 10.3 percent. Compared with MAPPO, the strongest learning-based baseline, CBL-CRL reduces the response tracking error by 10.7 percent and the CBL risk exposure by 40.8 percent, while maintaining the same renewable accommodation rate of 0.970.
Compared with rule-based and learning-based baselines, CBL-CRL achieves a better balance between operational performance, incentive efficiency, action feasibility, and baseline-related settlement reliability. The results demonstrate that CBL credibility should not only be used for post-event settlement, but can also serve as an effective pre-event resource allocation factor for measurement-driven demand response programs.
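The action correction step described in the abstract can be illustrated with a minimal sketch. The abstract names the four constraint types (budget, price, response capacity, CBL risk) but not their mathematical form, so everything below is an assumption made for illustration: the `GroupLimits` fields, the linear cost and risk-exposure expressions, the proportional rescaling, and the function name `correct_action` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GroupLimits:
    """Hypothetical per-group limits; field names are illustrative only."""
    max_multiplier: float  # assumed price cap on the incentive multiplier
    capacity_share: float  # assumed largest task share the group can deliver
    cbl_risk: float        # assumed risk in [0, 1]; higher = less credible baseline


def correct_action(multipliers, ratios, limits, base_cost, budget, risk_cap):
    """Revise raw policy outputs so they satisfy four assumed constraints."""
    # Price constraint: keep each incentive multiplier within [0, cap].
    m = [min(max(x, 0.0), g.max_multiplier) for x, g in zip(multipliers, limits)]
    # Capacity constraint: keep each allocation ratio within [0, capacity share].
    r = [min(max(x, 0.0), g.capacity_share) for x, g in zip(ratios, limits)]

    # Allocation ratios may not hand out more than the whole response task.
    total = sum(r)
    if total > 1.0:
        r = [x / total for x in r]

    # CBL risk constraint (assumed linear): sum_i r_i * risk_i <= risk_cap.
    exposure = sum(x * g.cbl_risk for x, g in zip(r, limits))
    if exposure > risk_cap:
        r = [x * risk_cap / exposure for x in r]

    # Budget constraint (assumed linear): base_cost * sum_i m_i * r_i <= budget.
    cost = base_cost * sum(mi * ri for mi, ri in zip(m, r))
    if cost > budget:
        m = [mi * budget / cost for mi in m]

    return m, r


if __name__ == "__main__":
    limits = [GroupLimits(2.0, 0.6, 0.05), GroupLimits(1.5, 0.5, 0.30)]
    m, r = correct_action([2.5, 1.0], [0.7, 0.6], limits,
                          base_cost=100.0, budget=80.0, risk_cap=0.10)
    print("corrected multipliers:", m)
    print("corrected ratios:", r)
```

Each step only shrinks values, so a bound enforced earlier is not violated by a later step. Proportional rescaling is one simple choice; a projection that minimizes the distance to the raw action would be an alternative under the same assumptions.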