Tariff-Sensitive Global Supply Chains: Semi-Markov Decision Approach with Reinforcement Learning
Abstract
1. Introduction
2. Literature Review
2.1. Reinforcement Learning for Logistics Optimization
2.2. Reinforcement Learning for Inventory Management and Supply Chain Security
2.3. Deep RL and Multi-Agent/Process Control Applications
2.4. Broader Cyber-Physical and Data-Centric Applications of Deep Reinforcement Learning
2.5. Research Gap
3. Methodology
3.1. Semi-Markov Decision Processes (SMDP)
3.2. Reinforcement Learning (RL)
3.3. Smart Algorithm (Average Reward Q-Learning)
- α is the learning rate,
- r is the immediate reward received after taking action a in state s,
- ρ is the current estimate of the average reward,
- t is the elapsed (sojourn) time for the transition,
- s′ is the next state, and
- b iterates over all possible actions from s′ (a minimal sketch of this update follows the list).
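The update rule described above can be written compactly in code. The following Python sketch applies one average-reward (SMART-style) Q-learning update consistent with the variable definitions listed here; the table shape, reward, and step-size values are illustrative placeholders rather than the paper's calibrated settings.

```python
import numpy as np

def smart_update(Q, rho, s, a, r, t, s_next, alpha=0.1):
    """One average-reward Q-learning update: r is the reward, rho the current
    average-reward estimate, t the sojourn time of the transition."""
    target = r - rho * t + Q[s_next].max()          # max over actions b from s'
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    return Q

# Tiny usage example on a 15-state x 5-action table (matching the result tables).
Q = np.zeros((15, 5))
Q = smart_update(Q, rho=0.5, s=0, a=4, r=12.0, t=3.0, s_next=7)
print(Q[0])
```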
3.4. Deep Q-Networks (DQNs)
- Neural Network Function Approximation: A deep neural network Q(s, a; θ), parameterized by weights θ, approximates the action-value function Q(s, a). This avoids the exponential growth of a tabular Q-table when the state space becomes large or continuous.
- Experience Replay: Transition tuples collected from the environment are stored in a replay buffer. During training, mini-batches are sampled randomly from this buffer, breaking the strong correlations often present in sequential data and improving the stability of training.
- Target Networks: DQN maintains a separate set of network parameters for the target Q function, which is updated more slowly (e.g., every fixed number of steps) to reduce oscillations and divergence in learning. A compact sketch combining these three components is given below.
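As a concrete illustration of the three components above, the sketch below assumes PyTorch and a small multilayer perceptron; it uses the standard discounted DQN objective rather than the average-reward SMART objective, and all layer sizes, hyperparameters, and names are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP approximating Q(s, a; theta) for all actions at once."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

state_dim, n_actions, gamma = 3, 5, 0.99
online, target = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
target.load_state_dict(online.state_dict())   # target net starts as a copy; re-sync every N steps
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                 # experience replay buffer
# After each simulated transition: replay.append((state, action, reward, next_state, done))

def train_step(batch_size: int = 32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)          # random minibatch breaks correlations
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q_sa = online(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                              # bootstrap from the slowly updated target net
        q_next = target(s2.float()).max(dim=1).values
        y = r.float() + gamma * (1.0 - done.float()) * q_next
    loss = nn.functional.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```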
3.5. Implementation Outline
- Initialize: Set Q(s, a) to some initial value for all states s and actions a. Initialize ρ, the average reward estimate, to 0 or a small number.
- Observe State: Determine the current system state s.
- Action Selection: Choose an action a (e.g., shipping mode) using an ε-greedy or softmax policy based on the current Q-values.
- Simulate: Generate a random production time and shipping time, apply the chosen action, and calculate the realized reward r.
- Update: Compute the transition time t. Update Q(s, a) using (2). Periodically adjust ρ toward the ratio of cumulative reward to cumulative elapsed time, using a small step size.
- Transition: Observe the next state s′. Repeat until convergence or until a predefined number of iterations; this loop is sketched below.
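The outline above corresponds to a short tabular training loop. In the sketch below, `env_step(s, a, rng)` is an assumed simulator hook that returns the realized reward, the sojourn time, and the next state; the state/action counts and step sizes are placeholders.

```python
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    """Explore with probability eps, otherwise exploit the current Q-values."""
    return rng.integers(Q.shape[1]) if rng.random() < eps else int(Q[s].argmax())

def train(env_step, n_states=15, n_actions=5, iterations=100_000,
          alpha=0.1, beta=0.01, eps=0.1, seed=0):
    """Tabular average-reward loop following the outline above.
    env_step(s, a, rng) -> (reward, sojourn_time, next_state) is an assumed hook."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    rho, total_r, total_t = 0.0, 0.0, 0.0
    s = 0
    for _ in range(iterations):
        a = epsilon_greedy(Q, s, eps, rng)
        r, t, s_next = env_step(s, a, rng)
        Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r - rho * t + Q[s_next].max())
        total_r, total_t = total_r + r, total_t + t
        rho = (1 - beta) * rho + beta * (total_r / total_t)   # slow average-reward update
        s = s_next
    return Q, rho

# Example with a toy stand-in environment (placeholder dynamics only):
# Q, rho = train(lambda s, a, rng: (rng.normal(10, 2), rng.uniform(1, 3), int(rng.integers(15))))
```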
3.6. Modeling Assumptions
- Exchange rate processes. Currency conversion factors for the three countries are drawn independently over time from stationary, scenario-specific uniform intervals. Cross-country correlations are neglected to isolate tariff effects.
- Demand arrivals. Customer orders follow a stationary Erlang (Gamma) arrival process; demand is therefore independent of exchange rate fluctuations and inventory levels.
- Lead times. Production and transportation times are mutually independent and identically distributed (i.i.d.) across episodes, each following a uniform distribution (a sampling sketch follows this list).
- Inventory costs. Holding costs are linear in time and quantity, with no economies of scale; stockouts are not back-ordered but incur lost-sales penalties implicit in the reward function. A late-arrival penalty of L = 3 monetary units is charged per unit shipped late.
- Capacity. Warehouse capacities are finite and fixed (five equivalent stock-keeping units in all runs); production capacity is assumed to be sufficient to meet demand on average, consistent with [3].
- Tariffs and prices. Tariff rates and selling prices are deterministic within each scenario.
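To make these distributional assumptions concrete, the sketch below samples one set of stochastic primitives per order. All numeric parameters (Erlang shape and scale, uniform bounds, exchange rate intervals) are illustrative placeholders and not the paper's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholders only; the paper's calibrated values are not reproduced here.
ERLANG_SHAPE, ERLANG_SCALE = 2, 5.0          # demand inter-arrival time (Erlang/Gamma)
PROD_TIME_BOUNDS = (1.0, 3.0)                # production lead time ~ Uniform
SHIP_TIME_BOUNDS = {"domestic": (1.0, 2.0),  # transit time ~ Uniform, by route type
                    "cross_slow": (4.0, 8.0),
                    "cross_fast": (2.0, 4.0)}
EXCHANGE_BOUNDS = [(0.9, 1.1), (0.8, 1.0), (1.0, 1.4)]  # per-country uniform intervals

def sample_transition(route: str = "domestic"):
    """Draw the stochastic inputs for a single order/shipment."""
    interarrival = rng.gamma(ERLANG_SHAPE, ERLANG_SCALE)        # time to next demand
    prod_time = rng.uniform(*PROD_TIME_BOUNDS)                  # production lead time
    ship_time = rng.uniform(*SHIP_TIME_BOUNDS[route])           # transit time for the chosen route
    exch = [rng.uniform(lo, hi) for lo, hi in EXCHANGE_BOUNDS]  # independent exchange rate draws
    return interarrival, prod_time, ship_time, exch

print(sample_transition("cross_slow"))
```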
4. Experimental Results
4.1. Network Structure
- A production plant (PPi),
- An export warehouse (EWi),
- An import warehouse (IWi),
- A final market.
- Domestic route—ship directly from the same country’s export warehouse EWi; this option avoids tariffs and involves only short, low-cost transport, but is limited by the available stock in EWi.
- Cross-border route—import the item from another country’s export warehouse EWj (j ≠ i). This choice triggers additional transportation time, tariff charges at the destination country, and extra pipeline inventory costs.
4.2. Cost Formulation and Parameter Settings
- Exchange Rates. Each country’s currency converts into the reference monetary unit (MU) at its scenario-specific exchange rate, quoted in local currency units (LU) per MU.
- Sale Price (P). The list price is fixed in MU; the revenue from selling in country j therefore depends on that country’s exchange rate.
- Production Cost (M). A unit produced in country i costs M, quoted in local currency units (LU), and is converted to MU through country i’s exchange rate.
- Inventory Holding Cost. A baseline carrying charge in MU per unit per time unit, rescaled by the country-specific storage factors and the sensitivity multiplier introduced in Section 6.
- Transportation Cost and Time. The shipping charge and transit time for an order routed from origin i to destination j depend on the route and mode:
- Domestic (i = j): short transit time and the lowest shipping charge.
- Cross-border, slow (i ≠ j): longer transit time at a lower cross-border charge.
- Cross-border, fast (i ≠ j): shorter transit time at a higher cross-border charge.
- Pipeline Cost Rate (C). A running charge of C MU per time unit is applied while the shipment is in transit.
- Tariff Rate. A uniform ad valorem duty applies to every cross-border movement (i ≠ j). The baseline rate is 0.15 for Scenarios 1 and 2; Scenario 3 adopts country-specific rates of 0.10, 0.15, and 0.30.
- Late-Delivery Penalty (L). If the total lead time exceeds the allowed threshold (in tu), a penalty of L = 3 MU per unit is incurred.
- Production Lead Time. Drawn i.i.d. from a uniform distribution (in time units); the per-order profit sketch below shows how these terms combine.
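The following sketch combines the cost terms above into a single per-order profit figure. The quoting convention (exchange rates in LU per MU), the ad valorem duty base, and all numeric values are stated assumptions for illustration only.

```python
def transaction_profit(origin, dest, P, M, exch, ship_cost, ship_time,
                       prod_time, hold_rate, pipeline_rate, tariff,
                       late_limit, late_penalty=3.0, qty=1):
    """Per-order profit in MU under assumed conventions: exch[k] is LU per MU,
    and the tariff is charged ad valorem on the cross-border sale value."""
    revenue = P * qty
    production = (M / exch[origin]) * qty          # LU production cost converted to MU
    holding = hold_rate * prod_time * qty          # pre-shipment holding cost
    pipeline = pipeline_rate * ship_time           # in-transit (pipeline) cost
    duty = tariff * P * qty if origin != dest else 0.0
    lateness = late_penalty * qty if (prod_time + ship_time) > late_limit else 0.0
    return revenue - production - holding - pipeline - ship_cost - duty - lateness

# Example with placeholder numbers (not the paper's calibrated parameters):
print(transaction_profit(origin=3, dest=1, P=20.0, M=8.0, exch=[0, 1.0, 0.9, 1.3],
                         ship_cost=2.0, ship_time=5.0, prod_time=2.0,
                         hold_rate=0.1, pipeline_rate=0.2, tariff=0.15, late_limit=6.0))
```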
Illustrative Calculation
4.3. SMDP States and Actions
- Destination Country Logic:
- Action 0: Local Shipping.
- If the destination is Country 1:
- Action 1 or 2 → shipping from Country 2, where Action 1 indicates slow shipping and Action 2 indicates fast shipping.
- Action 3 or 4 → shipping from Country 3, where Action 3 is slow and Action 4 is fast.
- If the destination is Country 2:
- Action 1 or 2 → shipping from Country 1, with Action 1 for slow and Action 2 for fast.
- Action 3 or 4 → shipping from Country 3, with Action 3 for slow and Action 4 for fast.
- If the destination is Country 3:
- Action 1 or 2 → shipping from Country 1, with Action 1 for slow and Action 2 for fast.
- Action 3 or 4 → shipping from Country 2, with Action 3 for slow and Action 4 for fast (this mapping is sketched in code below).
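One compact way to encode the destination-country logic above is a small decoding function; the function and mode names are illustrative, but the mapping itself follows the rules listed here.

```python
# Map (destination country, action index) -> (origin country, shipping mode).
# Action 0 = local fulfilment; Actions 1-4 = cross-border slow/fast from the two other countries.
def decode_action(dest: int, action: int):
    if action == 0:
        return dest, "local"
    others = [c for c in (1, 2, 3) if c != dest]        # the two remaining countries, ascending
    origin = others[0] if action in (1, 2) else others[1]
    mode = "slow" if action in (1, 3) else "fast"
    return origin, mode

# Checks against the rules above:
assert decode_action(1, 3) == (3, "slow")   # destination 1, Action 3 -> slow from Country 3
assert decode_action(2, 2) == (1, "fast")   # destination 2, Action 2 -> fast from Country 1
assert decode_action(3, 4) == (2, "fast")   # destination 3, Action 4 -> fast from Country 2
```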
4.4. Scenario Setup
- Replication of Pontrandolfo et al. [3]—adopting their original exchange rate distributions and a uniform tariff of 0.15.
- Narrowed Currency Gap—modifying the exchange rate ranges to be more closely aligned among the three countries.
- Differentiated Tariffs—imposing distinct duty rates per country to assess how tariff disparities shape sourcing decisions.
4.5. Setup and Implementation
5. Comparisons and Overall Discussion
5.1. Validation
- Quantitative comparison (Table 2); a sketch of the overlap and MSE computation follows this list.
| | | | Overlap (%) | MSE | Conv. Ep. (T/D) |
---|---|---|---|---|---|
5 | 20.82 | 16.10 | 13.3 | 21.6 | 1/1 |
10 | 25.25 | 21.05 | 40.0 | 5.1 | 1/1 |
20 | 20.63 | 13.93 | 11.7 | 2.2 | 1/1 |
30 | 21.09 | 15.60 | 57.8 | 1.2 | 1/1 |
- Dynamic behavior (Figure 2).
- Scalability snapshot (Figure 3).
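The policy-overlap and Q-value MSE figures reported in Table 2 can be computed from the two learned value tables. The sketch below assumes both the tabular Q-table and the DQN's Q-estimates are available as arrays of shape (states × actions); the array names and example sizes are assumptions, not the paper's code.

```python
import numpy as np

def policy_overlap_and_mse(q_tabular: np.ndarray, q_dqn: np.ndarray):
    """Return (% of states whose greedy action agrees, mean squared error of Q-values)."""
    overlap = (q_tabular.argmax(axis=1) == q_dqn.argmax(axis=1)).mean() * 100.0
    mse = float(np.mean((q_tabular - q_dqn) ** 2))
    return overlap, mse

# Example with random placeholder values (15 states x 5 actions, as in the result tables).
rng = np.random.default_rng(1)
overlap, mse = policy_overlap_and_mse(rng.random((15, 5)), rng.random((15, 5)))
print(f"overlap = {overlap:.1f}%, MSE = {mse:.3f}")
```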
5.2. Scenario 1: Erlang Demand with Uniform Tariffs
- Country 1 and Country 2 frequently selected actions associated with cross-border shipping from Country 3 (e.g., bestAction = 3 or 4) to exploit Country 3’s high exchange rate, offsetting the uniform tariff cost.
- Country 3 predominantly used action 0, indicating that local fulfillment was most profitable when exchange rate advantages were already in its favor.
- The agent achieved an average reward consistent with the previous literature’s wide-gap scenarios, validating the RL approach in this baseline setting.
5.3. Scenario 2: Closer Exchange Rates
5.4. Scenario 3: Distinct Tariffs with Closer Exchange Rates
- Country 1 showed significant variation between local (Action 0) and cross-border (particularly Action 2 or 4) shipping. The interplay of moderate exchange gain versus partial tariff led to nuanced decisions.
- Country 2 mostly remained local as the distinct tariff structure seldom warranted cross-border shipping unless a modest inventory shortage arose.
- Country 3, facing the highest tariff outflow (0.30), rarely found cross-border shipping profitable; nearly all states yielded bestAct = 0, preserving local supply.
6. Discussion and Conclusions
6.1. Baseline Design: Three Currency–Tariff Scenarios
- Wide exchange gaps + uniform tariffs (Scenario 1): Encourage cross-border shipping from the highest-exchange-rate country (Country 3), provided inventory or lead-time constraints do not override the advantage.
- Closer exchange rates (Scenario 2): Reduce the incentive for cross-border shipments, promoting more local or balanced sourcing while showing how a smaller currency gap can shift strategies away from a single dominant supplier.
- Distinct tariffs (Scenario 3): Further complicate the cost landscape by discouraging exports from countries with steep tariffs (e.g., 0.30 for Country 3), although local vs. global choices may still hinge on inventory levels and lead-time risks.
6.2. Scenario Exploration: Baseline vs. Uncertainty-Extended Variants
- Demand variability. We modeled inter-arrival times with a Gamma distribution whose shape parameter k directly controlled the variance while leaving the mean unchanged.
- Baseline shape: moderate variability.
- “Light” variant (larger k): thinner tail → smoother workload.
- “Heavy” variant (smaller k): thicker tail → bursty arrivals, mimicking demand shocks.
- Supply disruption (10 tu delay, p = 0.10). With probability p = 0.10, every shipment received an additional 10 time-unit transit penalty, representing port congestion, customs holds, or transport breakdowns. The penalty was large enough to trigger the model’s late-delivery cost in many episodes, thereby stressing the learning agent. A sampling sketch of both mechanisms follows.
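The sketch below shows how the fixed-mean Gamma demand variants and the probabilistic 10 tu disruption delay can be sampled. The mean inter-arrival time and the shape values for the "heavy" and "light" variants are illustrative placeholders; only the disruption probability and delay come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
MEAN_INTERARRIVAL = 10.0                          # kept constant across variants (placeholder value)
DISRUPTION_PROB, DISRUPTION_DELAY = 0.10, 10.0    # as described for the extended scenarios

def interarrival(shape_k: float) -> float:
    """Gamma inter-arrival time with fixed mean; smaller k gives burstier arrivals."""
    scale = MEAN_INTERARRIVAL / shape_k
    return rng.gamma(shape_k, scale)

def transit_time(base_time: float) -> float:
    """Add the 10 tu disruption delay with probability 0.10."""
    return base_time + (DISRUPTION_DELAY if rng.random() < DISRUPTION_PROB else 0.0)

# Compare variability of "heavy" (small k) vs "light" (large k) demand at the same mean.
heavy = [interarrival(1.0) for _ in range(10_000)]
light = [interarrival(4.0) for _ in range(10_000)]
print(np.std(heavy), np.std(light))   # the heavy-tailed variant shows the larger spread
```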
Scenario Label | Exchange Rate Range | Tariff Rule | Demand | Supply Disruption |
---|---|---|---|---|
Scenario 1 | wide gaps (as in [3]) | uniform 0.15 | baseline | – |
Scenario 2 | narrowed | uniform 0.15 | baseline | – |
Scenario 3 | same as Sc. 2 | [0.10, 0.15, 0.30] | baseline | – |
Scenario 1 (+Ext) | same as Sc. 1 | uniform 0.15 | heavy | 10 tu delay, p = 0.10 |
Scenario 2 (+Ext) | same as Sc. 2 | uniform 0.15 | light | 10 tu delay, p = 0.10 |
Scenario 3 (+Ext) | same as Sc. 2 | [0.10, 0.15, 0.30] | | 10 tu delay, p = 0.10 |
- What the trajectories reveal.
- (i) Currency advantage dominated. After injecting heavy demand variance and 10 tu supply delay shocks, Scenario 1 (+Ext) still produced the highest long-run reward. Country 3 retained the strongest currency, so shipping from Country 3 remained profitable enough to outweigh both the uniform tariff and the extra disruptions. The extended, heavier-tailed demand process simply lengthened the horizontal axis—more demands were realized—thereby stretching the trajectory far to the right and letting the average reward climb above 9.7.
- (ii) Narrower gaps compressed profit. Both the baseline and extended versions of Scenario 2 plateaued at a similar, lower level. Here, the three exchange rate intervals overlapped, so cross-border arbitrage was weak; the lighter arrival-variance case only added marginal noise without changing the level.
- (iii) Steep tariffs remained binding. Scenario 3 trajectories settled roughly one unit below Scenario 2. The country-specific tariffs [0.10, 0.15, 0.30] penalized exports from the high-currency regions, and the added disruption risk in the extended variant reduced the average reward a little further.
6.3. Sensitivity Analysis
6.3.1. Sensitivity to Quantity Discounts
6.3.2. Sensitivity to Tariff
- Scenario S1—wide: large exchange rate gaps, uniform baseline tariff of 0.15.
- Scenario S2—close: compressed exchange rates, uniform tariff.
- Scenario S3—tariff–vec: same exchange rates as S2 but country-specific baseline tariffs; when the tariff was swept, the same rate was imposed on every lane for comparison.
6.3.3. Joint Impact of Inventory Holding Cost and Tariff Level
- Inventory multiplier rescaled both holding terms in Equation (3), i.e., the pre-production component and the post-shipment component.
- Tariff rate linearly scaled the customs-duty term; a minimal grid-sweep sketch follows.
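A minimal sketch of the joint sweep over the inventory-cost multiplier and the tariff rate is given below. It assumes a `run_smart` routine that trains the agent and returns the long-run average reward; both the stub implementation and the grid values are placeholders to be replaced by the actual simulator and the study's settings.

```python
import itertools

def run_smart(inventory_multiplier: float, tariff: float) -> float:
    """Placeholder for the SMART training run (returns the long-run average reward).
    The linear form below is only a stand-in so the sweep script runs end to end;
    hook this up to the actual simulator/learner."""
    return 10.0 - 1.5 * inventory_multiplier - 8.0 * tariff

INVENTORY_MULTIPLIERS = [0.5, 1.0, 1.5, 2.0]   # illustrative grid, not the paper's exact values
TARIFF_RATES = [0.00, 0.10, 0.15, 0.30]

results = {}
for mult, tau in itertools.product(INVENTORY_MULTIPLIERS, TARIFF_RATES):
    results[(mult, tau)] = run_smart(mult, tau)

# Print a small matrix of average rewards (rows: multiplier, columns: tariff rate).
for mult in INVENTORY_MULTIPLIERS:
    print(mult, [f"{results[(mult, tau)]:.2f}" for tau in TARIFF_RATES])
```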
- Data pipeline. Most ERP suites already expose inventory levels, open purchase orders, and inbound transit times. These feeds can populate the learner’s state vector in real time or on an hourly batch schedule.
- Training and inference layer. The SMART algorithm is lightweight enough to train overnight on a single CPU node. We therefore recommend a two-tier micro-service: (i) a nightly “trainer” container that ingests the previous day’s transactions and updates the Q-table; (ii) a stateless “inference” endpoint that is called by the ERP’s MRP run or by a /suggestRoute button in the buyer’s user interface (a minimal lookup sketch follows this list).
- Power BI dashboard. Writing the live Q-table and KPIs (average reward, cross-border share, lead-time breach rate) to a cloud data-lake allows instant visualization in Power BI, Tableau, or Looker. Planners can “what-if” tariffs or exchange rates via slicers, triggering on-the-fly policy simulations without touching the production ERP.
- Governance and override. A simple traffic-light override is advisable: if the recommended action deviates from the incumbent rule or if the implied delivery date violates a service level agreement, the suggestion is pushed to an approval queue rather than executed automatically.
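The stateless inference endpoint described above reduces to a greedy lookup over the latest trained Q-table. The sketch below assumes the nightly trainer writes the table to a JSON artifact and that states are keyed by country and stock level; the file name, key schema, and function names are illustrative assumptions, not the paper's deployment code.

```python
import json
from pathlib import Path

Q_TABLE_PATH = Path("q_table.json")   # illustrative artifact written by the nightly trainer

def suggest_route(country: int, stock_level: int) -> dict:
    """Stateless lookup: map the current ERP state to the greedy action of the latest Q-table."""
    q_table = json.loads(Q_TABLE_PATH.read_text())
    key = f"{country}-{stock_level}"                  # assumed state-key encoding
    q_values = q_table[key]                           # list of 5 action values for this state
    best_action = max(range(len(q_values)), key=lambda a: q_values[a])
    return {"state": key, "best_action": best_action, "q_values": q_values}

# Example call, e.g. wired behind a /suggestRoute endpoint or an MRP hook:
# print(suggest_route(country=3, stock_level=2))
```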
6.4. Limitations and Future Work
- (i)
- Modeling Scope. The prototype optimizes a single-product, single-echelon network without explicit capacity limits. Extending SMART to multiple SKUs (Stock-Keeping Units), regional DC (Distribution Center) layers, and finite production or transport capacities would require a constrained MDP formulation or a hierarchical multi-agent RL design. Future extensions could also account for sustainability metrics (e.g., CO2 cost per lane) and dynamic forecasting signals that update the MDP in real time.
- (ii)
- Stochastic Environment. Customer demand arrivals follow a stationary Erlang process, while all unit–cost coefficients (production, holding, pipeline, tariff, late penalty rates) are deterministic; uncertainty in the reward therefore arises only from demand timing, the stochastic production and transit times, and exchange rate fluctuations. Future research could relax these assumptions by introducing non-stationary demand patterns, volatile freight-rate schedules, and time-dependent quantity discount schemes.
- (iii)
- Computational Scalability. Tabular Q-learning suffices for the current 450 state–action pairs but does not scale to large state spaces. Preliminary Deep-Q results match the tabular policy; moving to substantially larger state spaces will necessitate deeper networks, experience replay prioritization, and distributed training to maintain convergence speed.
Funding
Data Availability Statement
Conflicts of Interest
References
- Howard, R. Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes; John Wiley & Sons, Inc.: New York, NY, USA, 1971. [Google Scholar]
- Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: New York, NY, USA, 2014. [Google Scholar]
- Pontrandolfo, P.; Gosavi, A.; Okogbaa, O.G.; Das, T.K. Global supply chain management: A reinforcement learning approach. Int. J. Prod. Res. 2002, 40, 1299–1317. [Google Scholar] [CrossRef]
- Günay, E.E.; Park, K.; Okudan Kremer, G.E. Integration of product architecture and supply chain under currency exchange rate fluctuation. Res. Eng. Des. 2021, 32, 331–348. [Google Scholar] [CrossRef]
- van Tongeren, T.; Kaymak, U.; Naso, D.; van Asperen, E. Q-learning in a competitive supply chain. In Proceedings of the 2007 IEEE International Conference on Systems, Man and Cybernetics, Montréal, QC, Canada, 7–10 October 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1211–1216. [Google Scholar]
- Chaharsooghi, S.K.; Heydari, J.; Zegordi, S.H. A reinforcement learning model for supply chain ordering management: An application to the beer game. Decis. Support Syst. 2008, 45, 949–959. [Google Scholar] [CrossRef]
- Zhao, X.; Sun, X. A multi-agent reinforcement learning approach for supply chain coordination. In Proceedings of the IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI), Qingdao, China, 15–17 July 2010; pp. 341–346. [Google Scholar]
- Puskás, E.; Budai, Á.; Bohács, G. Optimization of a physical internet based supply chain using reinforcement learning. Eur. Transp. Res. Rev. 2020, 12, 47. [Google Scholar] [CrossRef]
- Rolf, B.; Jackson, I.; Müller, M.; Lang, S.; Reggelin, T.; Ivanov, D. A review on reinforcement learning algorithms and applications in supply chain management. Int. J. Prod. Res. 2023, 61, 7151–7179. [Google Scholar] [CrossRef]
- Zou, Y.; Gao, Q.; Wu, H.; Liu, N. Carbon-Efficient Scheduling in Fresh Food Supply Chains with a Time-Window-Constrained Deep Reinforcement Learning Model. Sensors 2024, 24, 7461. [Google Scholar] [CrossRef]
- Liu, J. Simulation of Cross-border E-commerce Supply Chain Coordination Decision Model Based on Reinforcement Learning Algorithm. In Proceedings of the 2023 International Conference on Networking, Informatics and Computing (ICNETIC), Palermo, Italy, 29–31 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 429–433. [Google Scholar]
- Li, X.; Zheng, Z. Dynamic pricing with external information and inventory constraint. Manag. Sci. 2024, 70, 5985–6001. [Google Scholar]
- Gurkan, M.E.; Tunc, H.; Tarim, S.A. The joint stochastic lot sizing and pricing problem. Omega 2022, 108, 102577. [Google Scholar] [CrossRef]
- Li, X.; Li, Y. NEV’s supply chain coordination with financial constraint and demand uncertainty. Sustainability 2022, 14, 1114. [Google Scholar] [CrossRef]
- Bergemann, D.; Brooks, B.; Morris, S. The limits of price discrimination. Am. Econ. Rev. 2015, 105, 921–957. [Google Scholar] [CrossRef]
- Gomes, U.T.; Pinheiro, P.R.; Saraiva, R.D. Dye schedule optimization: A case study in a textile industry. Appl. Sci. 2021, 11, 6467. [Google Scholar] [CrossRef]
- Lu, C.; Wu, Y. Optimization of Logistics Information System based on Multi-Agent Reinforcement Learning. In Proceedings of the 2024 5th International Conference on Mobile Computing and Sustainable Informatics (ICMCSI), Lalitpur, Nepal, 18–19 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 421–426. [Google Scholar]
- Kavididevi, V.; Monikapreethi, S.; Rajapriya, M.; Juliet, P.S.; Yuvaraj, S.; Muthulekshmi, M. IoT-Enabled Reinforcement Learning for Enhanced Cold Chain Logistics Performance in Refrigerated Transport. In Proceedings of the 2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 10–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 379–384. [Google Scholar]
- Zhou, T.; Xie, L.; Zou, C.; Tian, Y. Research on supply chain efficiency optimization algorithm based on reinforcement learning. Adv. Contin. Discret. Model. 2024, 2024, 51. [Google Scholar] [CrossRef]
- Liu, X.; Hu, M.; Peng, Y.; Yang, Y. Multi-agent deep reinforcement learning for multi-echelon inventory management. Prod. Oper. Manag. 2022, 10591478241305863. [Google Scholar] [CrossRef]
- Aboutorab, H.; Hussain, O.K.; Saberi, M.; Hussain, F.K.; Prior, D. Adaptive identification of supply chain disruptions through reinforcement learning. Expert Syst. Appl. 2024, 248, 123477. [Google Scholar] [CrossRef]
- Ma, N.; Wang, Z.; Ba, Z.; Li, X.; Yang, N.; Yang, X.; Zhang, H. Hierarchical Reinforcement Learning for Crude Oil Supply Chain Scheduling. Algorithms 2023, 16, 354. [Google Scholar] [CrossRef]
- Piao, M.; Zhang, D.; Lu, H.; Li, R. A supply chain inventory management method for civil aircraft manufacturing based on multi-agent reinforcement learning. Appl. Sci. 2023, 13, 7510. [Google Scholar] [CrossRef]
- Ma, Z.; Chen, X.; Sun, T.; Wang, X.; Wu, Y.C.; Zhou, M. Blockchain-based zero-trust supply chain security integrated with deep reinforcement learning for inventory optimization. Future Internet 2024, 16, 163. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Wang, J. Research on international logistics supply chain management strategy based on deep reinforcement learning. Appl. Math. Nonlinear Sci. 2024, 9, 13. [Google Scholar] [CrossRef]
- van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Wang, Z.; Schaul, T.; Hessel, M.; van Hasselt, H.; Lanctot, M.; de Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 1995–2003. [Google Scholar]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, PR, USA, 2–4 May 2016. [Google Scholar]
- Zhu, Z.; Lin, K.; Jain, A.K.; Zhou, J. Transfer learning in deep reinforcement learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13344–13362. [Google Scholar] [CrossRef]
- Padhye, V.; Lakshmanan, K. A deep actor critic reinforcement learning framework for learning to rank. Neurocomputing 2023, 547, 126314. [Google Scholar] [CrossRef]
- Norouzi, A.; Shahpouri, S.; Gordon, D.; Shahbakhti, M.; Koch, C.R. Safe deep reinforcement learning in diesel engine emission control. Proc. Inst. Mech. Eng. Part J. Syst. Control Eng. 2023, 237, 1440–1453. [Google Scholar] [CrossRef]
- Prasuna, R.G.; Potturu, S.R. Deep reinforcement learning in mobile robotics—A concise review. Multimed. Tools Appl. 2024, 83, 70815–70836. [Google Scholar] [CrossRef]
- Rahimian, P.; Mihalyi, B.M.; Toka, L. In-game soccer outcome prediction with offline reinforcement learning. Mach. Learn. 2024, 113, 7393–7419. [Google Scholar] [CrossRef]
- Scarponi, V.; Duprez, M.; Nageotte, F.; Cotin, S. A zero-shot reinforcement learning strategy for autonomous guidewire navigation. Int. J. Comput. Assist. Radiol. Surg. 2024, 19, 1185–1192. [Google Scholar] [CrossRef]
- Liu, X.Y.; Xia, Z.; Yang, H.; Gao, J.; Zha, D.; Zhu, M.; Wang, C.D.; Wang, Z.; Guo, J. Dynamic datasets and market environments for financial reinforcement learning. Mach. Learn. 2024, 113, 2795–2839. [Google Scholar] [CrossRef]
- Yu, L.; Guo, Q.; Wang, R.; Shi, M.; Yan, F.; Wang, R. Dynamic offloading loading optimization in distributed fault diagnosis system with deep reinforcement learning approach. Appl. Sci. 2023, 13, 4096. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1. [Google Scholar]
- Yilmaz Eroglu, D. RL for Multi-Country Supply-Chains (Q-Learning & Deep Q-Network); Zenodo: Geneva, Switzerland, 2025. [Google Scholar] [CrossRef]
- The Economist. The Big Mac Index—January 2024. Available online: https://github.com/TheEconomist/big-mac-data/releases/tag/2024-01 (accessed on 15 July 2025).
- Jeanne, O.; Son, J. To what extent are tariffs offset by exchange rates? J. Int. Money Financ. 2024, 142, 103015. [Google Scholar] [CrossRef]
Symbol | Meaning |
---|---|
| Origin (i) and destination (j) exchange rates, quoted as local currency units (LU) per monetary unit (MU). |
P | List selling price (MU per unit). |
M | Unit production cost in origin currency (LU). |
| Baseline inventory holding charge (MU · tu−1). |
| Multiplier applied to the baseline holding charge for sensitivity runs. |
| Country-specific storage factors. |
C | In-transit cost rate (MU · tu−1). |
| Realized production and shipping lead times (tu). |
| Ad valorem customs duty rate applied to cross-border movements (i ≠ j). |
L | Late-delivery penalty (MU per unit) applied when the total lead time exceeds the allowed threshold (tu). |
| Indicator function (equals 1 when the stated condition holds; otherwise, 0). |
State | Country | Stock | Action 0 | Action 1 | Action 2 | Action 3 | Action 4 | BestAct |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 55,797.67 | 55,735.26 | 56,134.91 | 56,585.01 | 56,802.81 | 4 |
1 | 1 | 1 | 56,053.55 | 56,018.52 | 56,154.06 | 56,641.59 | 56,521.91 | 3 |
2 | 1 | 2 | 55,626.69 | 55,563.01 | 56,178.62 | 56,126.73 | 56,432.10 | 4 |
3 | 1 | 3 | 55,458.11 | 55,748.21 | 55,629.78 | 56,249.75 | 56,312.47 | 4 |
4 | 1 | 4 | 55,678.57 | 55,817.28 | 56,178.32 | 56,066.94 | 56,461.56 | 4 |
5 | 2 | 0 | 37,992.47 | 37,960.22 | 37,831.13 | 38,307.13 | 38,576.20 | 4 |
6 | 2 | 1 | 38,293.11 | 37,701.19 | 37,679.09 | 38,309.32 | 38,805.14 | 4 |
7 | 2 | 2 | 38,255.27 | 37,643.85 | 38,124.26 | 38,588.06 | 38,751.09 | 4 |
8 | 2 | 3 | 38,375.37 | 37,801.73 | 38,149.97 | 38,532.25 | 38,754.07 | 4 |
9 | 2 | 4 | 38,426.73 | 37,919.21 | 38,195.35 | 38,182.92 | 38,945.80 | 4 |
10 | 3 | 0 | 27,146.37 | 26,108.66 | 26,296.62 | 26,358.95 | 26,486.41 | 0 |
11 | 3 | 1 | 27,184.81 | 26,079.84 | 26,235.35 | 26,250.30 | 26,524.80 | 0 |
12 | 3 | 2 | 27,225.82 | 26,082.50 | 26,412.65 | 26,339.11 | 26,418.35 | 0 |
13 | 3 | 3 | 27,220.88 | 25,991.03 | 26,483.73 | 26,419.68 | 26,695.00 | 0 |
14 | 3 | 4 | 27,238.16 | 26,195.35 | 26,409.91 | 26,465.04 | 26,579.72 | 0 |
State | Country | Stock | Action 0 | Action 1 | Action 2 | Action 3 | Action 4 | BestAction |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 11,669.72 | 11,427.81 | 11,593.90 | 11,417.57 | 11,548.73 | 0 |
1 | 1 | 1 | 11,712.63 | 11,481.70 | 11,607.85 | 11,407.87 | 11,452.70 | 0 |
2 | 1 | 2 | 11,648.60 | 11,452.73 | 11,683.33 | 11,392.84 | 11,502.95 | 2 |
3 | 1 | 3 | 11,685.69 | 11,497.28 | 11,599.06 | 11,442.73 | 11,493.52 | 0 |
4 | 1 | 4 | 11,675.88 | 11,521.64 | 11,718.05 | 11,409.78 | 11,584.26 | 2 |
5 | 2 | 0 | 12,138.90 | 11,938.90 | 12,008.87 | 11,825.58 | 11,948.29 | 0 |
6 | 2 | 1 | 12,200.02 | 11,853.55 | 11,954.85 | 11,817.45 | 12,040.80 | 0 |
7 | 2 | 2 | 12,239.07 | 11,878.84 | 12,105.46 | 11,924.77 | 12,014.54 | 0 |
8 | 2 | 3 | 12,311.24 | 11,967.88 | 12,151.85 | 11,952.27 | 12,022.63 | 0 |
9 | 2 | 4 | 12,387.64 | 12,088.60 | 12,240.48 | 11,918.42 | 12,221.33 | 0 |
10 | 3 | 0 | 10,296.33 | 10,125.91 | 10,257.62 | 10,175.23 | 10,325.42 | 4 |
11 | 3 | 1 | 10,299.94 | 10,109.84 | 10,235.71 | 10,142.84 | 10,347.86 | 4 |
12 | 3 | 2 | 10,359.88 | 10,121.03 | 10,316.26 | 10,186.36 | 10,279.27 | 0 |
13 | 3 | 3 | 10,332.04 | 10,081.90 | 10,337.05 | 10,204.45 | 10,390.45 | 4 |
14 | 3 | 4 | 10,330.06 | 10,164.82 | 10,313.79 | 10,212.58 | 10,368.00 | 4 |
State | Country | Stock | Action 0 | Action 1 | Action 2 | Action 3 | Action 4 | BestAction |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 12,635.99 | 12,435.77 | 12,588.43 | 12,422.53 | 12,538.60 | 0 |
1 | 1 | 1 | 12,680.25 | 12,491.59 | 12,602.34 | 12,413.40 | 12,439.54 | 0 |
2 | 1 | 2 | 12,615.54 | 12,461.90 | 12,680.22 | 12,398.07 | 12,491.98 | 2 |
3 | 1 | 3 | 12,673.93 | 12,526.96 | 12,608.40 | 12,472.45 | 12,500.39 | 0 |
4 | 1 | 4 | 12,650.28 | 12,543.50 | 12,730.19 | 12,420.97 | 12,584.04 | 2 |
5 | 2 | 0 | 12,559.73 | 12,380.48 | 12,431.44 | 12,264.04 | 12,370.90 | 0 |
6 | 2 | 1 | 12,622.40 | 12,292.89 | 12,376.09 | 12,255.44 | 12,467.23 | 0 |
7 | 2 | 2 | 12,663.00 | 12,317.49 | 12,531.50 | 12,365.93 | 12,439.72 | 0 |
8 | 2 | 3 | 12,737.03 | 12,409.21 | 12,579.32 | 12,394.63 | 12,447.82 | 0 |
9 | 2 | 4 | 12,815.02 | 12,533.38 | 12,671.67 | 12,359.32 | 12,652.52 | 0 |
10 | 3 | 0 | 9464.18 | 9217.06 | 9341.78 | 9262.61 | 9371.38 | 0 |
11 | 3 | 1 | 9479.41 | 9208.13 | 9323.17 | 9230.65 | 9386.80 | 0 |
12 | 3 | 2 | 9482.04 | 9199.65 | 9376.28 | 9251.11 | 9338.48 | 0 |
13 | 3 | 3 | 9477.38 | 9176.32 | 9406.39 | 9287.30 | 9448.74 | 0 |
14 | 3 | 4 | 9493.40 | 9243.05 | 9375.91 | 9285.56 | 9398.39 | 0 |