Journal Articles
955 articles found
Variance minimization for continuous-time Markov decision processes: two approaches (Cited by: 1)
1
Authors: ZHU Quan-xin. Applied Mathematics (A Journal of Chinese Universities), SCIE CSCD, 2010, No. 4, pp. 400-410 (11 pages)
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
Keywords: continuous-time Markov decision process; Polish space; variance minimization; optimality equation; optimality inequality
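The criterion in this entry can be made concrete with one common formulation from the CTMDP literature; the exact definition used in the paper may differ in detail. Writing r for the reward rate and eta for the long-run average reward:

```latex
% Long-run expected average reward of policy \pi from state x:
\eta(\pi,x) = \liminf_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} r(x_t,a_t)\,dt\right],
% and the limit average variance criterion, minimized over
% average-optimal policies:
\sigma^{2}(\pi,x) = \limsup_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T}
    \bigl(r(x_t,a_t)-\eta(\pi,x)\bigr)^{2}\,dt\right].
```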
Variance Optimization for Continuous-Time Markov Decision Processes
2
Authors: Yaqing Fu. Open Journal of Statistics, 2019, No. 2, pp. 181-195 (15 pages)
This paper considers the variance optimization problem of average reward in continuous-time Markov decision processes (MDPs). It is assumed that the state space is countable and the action space is a Borel measurable space. The main purpose of this paper is to find the policy with the minimal variance in the deterministic stationary policy space. Unlike the traditional Markov decision process, the cost function in the variance criterion is affected by future actions. To this end, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, the optimal policy of the original variance optimization problem is derived, and a sufficient condition for the variance optimal policy is given. Finally, we use an example to illustrate the conclusions of this paper.
Keywords: continuous-time Markov decision process; variance optimality of average reward; optimal policy of variance; policy iteration
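The abstract above reduces variance minimization to a standard MDP solved by policy iteration. Below is a minimal runnable sketch of policy iteration on a finite average-cost MDP; the three-state transition and cost data are invented placeholders standing in for the paper's pseudo-variance cost, not its actual model.

```python
# Policy iteration for a finite unichain average-cost MDP (sketch).
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a row
C = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # stand-in cost

def evaluate(policy):
    """Solve g + h(s) = c(s) + sum_j P(j|s) h(j) with h(0) = 0 pinned."""
    Ppi = P[np.arange(n_states), policy]
    cpi = C[np.arange(n_states), policy]
    A = np.zeros((n_states, n_states))
    A[:, 0] = 1.0                                   # coefficient of the gain g
    A[:, 1:] = np.eye(n_states)[:, 1:] - Ppi[:, 1:] # coefficients of h(1..n-1)
    sol = np.linalg.solve(A, cpi)
    return sol[0], np.concatenate(([0.0], sol[1:]))  # gain g, relative values h

policy = np.zeros(n_states, dtype=int)
while True:
    g, h = evaluate(policy)
    q = C + np.einsum('saj,j->sa', P, h)            # one-step lookahead
    new_policy = q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("average cost:", g, "policy:", policy)
```

The evaluation step pins h(0) = 0 and solves for the gain and relative values jointly, which is the standard unichain formulation.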
STRONG N-DISCOUNT AND FINITE-HORIZON OPTIMALITY FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES (Cited by: 1)
3
Authors: ZHU Quanxin, GUO Xianping. Journal of Systems Science & Complexity, SCIE EI CSCD, 2014, No. 5, pp. 1045-1063 (19 pages)
This paper studies the strong n (n = -1, 0)-discount and finite horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n (n = -1, 0)-discount optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong -1-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite horizon control problem by developing an interesting characterization of a canonical triplet.
Keywords: continuous-time Markov decision process; expected average reward criterion; finite-horizon optimality; Polish space; strong n-discount optimality
Average Sample-path Optimality for Continuous-time Markov Decision Processes in Polish Spaces
4
Authors: Quan-xin ZHU. Acta Mathematicae Applicatae Sinica, SCIE CSCD, 2011, No. 4, pp. 613-624 (12 pages)
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion on continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of (ε > 0)-ASPC optimal stationary policies based on two different approaches: one is the "optimality equation" approach and the other is the "two optimality inequalities" approach.
Keywords: continuous-time Markov decision process; average sample-path optimality; Polish space; optimality equation; optimality inequality
TOTAL REWARD CRITERIA FOR UNCONSTRAINED/CONSTRAINED CONTINUOUS-TIME MARKOV DECISION PROCESSES
5
Authors: Xianping GUO, Lanlan ZHANG. Journal of Systems Science & Complexity, SCIE EI CSCD, 2011, No. 3, pp. 491-505 (15 pages)
This paper studies denumerable continuous-time Markov decision processes with expected total reward criteria. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then, the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results about the unconstrained model and using the Lagrange multipliers approach, the authors prove the existence of constrained-optimal policies under some additional conditions. Finally, the authors apply the results to controlled queueing systems.
Keywords: constrained-optimal policy; continuous-time Markov decision process; optimal policy; total reward criterion; unbounded reward/cost and transition rates
CONVERGENCE OF CONTROLLED MODELS FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES WITH CONSTRAINED AVERAGE CRITERIA
6
Authors: Wenzhao Zhang, Xianzhu Xiong. Annals of Applied Mathematics, 2019, No. 4, pp. 449-464 (16 pages)
This paper studies the convergence of optimal values and optimal policies of continuous-time Markov decision processes (CTMDPs) under constrained average criteria. For a given original model M_∞ of a CTMDP with denumerable states and a sequence {M_n} of CTMDPs with finite states, we give a new convergence condition ensuring that the optimal values and optimal policies of {M_n} converge to the optimal value and optimal policy of M_∞ as the state space S_n of M_n converges to the state space S_∞ of M_∞. The transition rates and cost/reward functions of M_∞ are allowed to be unbounded. Our approach can be viewed as a combination of linear programming and the Lagrange multipliers approach.
Keywords: continuous-time Markov decision processes; optimal value; optimal policies; constrained average criteria; occupation measures
Modeling and Design of Real-Time Pricing Systems Based on Markov Decision Processes (Cited by: 4)
7
Authors: Koichi Kobayashi, Ichiro Maruta, Kazunori Sakurama, Shun-ichi Azuma. Applied Mathematics, 2014, No. 10, pp. 1485-1495 (11 pages)
A real-time pricing system for electricity charges different prices for different hours of the day and for different days, and is effective in reducing the peak and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated and reduced to a quadratic programming problem. Finally, a numerical simulation is presented.
Keywords: Markov decision process; optimal control; real-time pricing system
Robust analysis of discounted Markov decision processes with uncertain transition probabilities (Cited by: 3)
8
Authors: LOU Zhen-kai, HOU Fu-jun, LOU Xu-ming. Applied Mathematics (A Journal of Chinese Universities), SCIE CSCD, 2020, No. 4, pp. 417-436 (20 pages)
Optimal policies in Markov decision problems may be quite sensitive to transition probabilities, and in practice some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After defining a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
Keywords: Markov decision processes; uncertain transition probabilities; robustness and sensitivity; robust optimal policy; value interval
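The first step the abstract above describes, maximum-likelihood estimation of unknown transition probabilities, has a closed form for a finite-state chain: normalized transition counts. A small sketch with an invented trajectory:

```python
# MLE of a transition matrix from an observed state sequence.
import numpy as np

trajectory = [0, 1, 1, 2, 0, 2, 2, 1, 0, 0, 1, 2]   # hypothetical observations
n = 3
counts = np.zeros((n, n))
for s, s_next in zip(trajectory[:-1], trajectory[1:]):
    counts[s, s_next] += 1

row_totals = counts.sum(axis=1, keepdims=True)
P_hat = np.divide(counts, row_totals,
                  out=np.full((n, n), 1.0 / n),     # uniform row if a state is unseen
                  where=row_totals > 0)
print(P_hat)
```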
Development of Optimal Maintenance Policies for Offshore Wind Turbine Gearboxes Based on the Non-homogeneous Continuous-Time Markov Process (Cited by: 1)
9
Authors: Mingxin Li, Jichuan Kang, Liping Sun, Mian Wang. Journal of Marine Science and Application, CSCD, 2019, No. 1, pp. 93-98 (6 pages)
The gearbox is the component with the highest failure rate during the operation of offshore wind turbines, so analysis of gearbox repair policy that includes economic considerations is important for the effective operation of offshore wind farms. From their initial perfect working states, gearboxes degrade with time, which leads to decreased working efficiency. Thus, offshore wind turbine gearboxes can be considered multi-state systems with various levels of productivity for different working states. To efficiently compute the time-dependent distribution of this multi-state system and analyze its reliability, the non-homogeneous continuous-time Markov process (NHCTMP) is appropriate for this type of object. To determine the relationship between operation time and maintenance cost, many factors must be taken into account, including maintenance processes and vessel requirements. Finally, an optimal repair policy can be formulated based on this relationship.
Keywords: maintenance policy; non-homogeneous continuous-time Markov process; offshore wind turbine gearboxes; reliability analysis; failure rates; system engineering
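A sketch of the NHCTMP computation the abstract above refers to: the time-dependent state distribution solves the Kolmogorov forward equations dp/dt = p(t)Q(t), which can be integrated numerically. The three-state degradation generator and its aging factor below are invented, not the paper's data.

```python
# Transient distribution of a non-homogeneous CTMC via ODE integration.
import numpy as np
from scipy.integrate import solve_ivp

def Q(t):
    """Generator at time t: degradation rate grows as the gearbox ages (assumed)."""
    lam = 0.01 * (1.0 + 0.1 * t)
    return np.array([[-lam,  lam,  0.0],
                     [ 0.0, -lam,  lam],
                     [ 0.0,  0.0,  0.0]])   # last state (failed) is absorbing

def forward(t, p):
    return p @ Q(t)                          # Kolmogorov forward equations

p0 = np.array([1.0, 0.0, 0.0])               # starts in the perfect working state
sol = solve_ivp(forward, (0.0, 50.0), p0, t_eval=[10.0, 25.0, 50.0])
print(sol.y.T)                                # state distribution at each time
```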
Adaptive Strategies for Accelerating the Convergence of Average Cost Markov Decision Processes Using a Moving Average Digital Filter
10
Authors: Edilson F. Arruda, Fabrício Ourique. American Journal of Operations Research, 2013, No. 6, pp. 514-520 (7 pages)
This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem with a view to saving computations at the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration for a broad set of adaptive parameters and the results suggest that significant computational savings can be obtained, while also ensuring a robust performance with respect to the parameters.
Keywords: average cost Markov decision processes; value iteration; computational effort; gradient
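The following is not the authors' algorithm, just a runnable reference point for the idea of filtering a convergence signal: relative value iteration for an average-cost MDP whose stopping rule smooths the span of the Bellman residual with a moving average, in the spirit of the digital filter named in the title. All problem data are invented.

```python
# Relative value iteration with a moving-average-filtered stopping signal.
import numpy as np

rng = np.random.default_rng(1)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a transition row
C = rng.uniform(size=(S, A))

v = np.zeros(S)
window, history = 5, []
for it in range(1000):
    Tv = (C + np.einsum('saj,j->sa', P, v)).min(axis=1)
    diff = Tv - v
    history.append(diff.max() - diff.min())   # span of the Bellman residual
    v = Tv - Tv[0]                            # normalization keeps values bounded
    smoothed = np.mean(history[-window:])     # moving-average filter
    if smoothed < 1e-8:
        break
print("iterations:", it, "average cost ~", diff.mean())
```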
Conditional Value-at-Risk for Random Immediate Reward Variables in Markov Decision Processes
11
Authors: Masayuki Kageyama, Takayuki Fujii, Koji Kanefuji, Hiroe Tsubaki. American Journal of Computational Mathematics, 2011, No. 3, pp. 183-188 (6 pages)
We consider risk minimization problems for Markov decision processes. From the standpoint of making the risk of the random reward variable at each time as small as possible, a risk measure is introduced using conditional value-at-risk for random immediate reward variables in Markov decision processes; under this risk measure criterion, the risk-optimal policies are characterized by the optimality equations for the discounted or average case. As an application, inventory models are considered.
Keywords: Markov decision processes; conditional value-at-risk; risk optimal policy; inventory model
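The risk measure itself is easy to state concretely: for losses (negative rewards), CVaR at level alpha is the expected loss in the worst (1 - alpha) tail. A minimal empirical sketch on invented samples:

```python
# Empirical VaR and CVaR of a sampled loss distribution.
import numpy as np

def cvar(losses, alpha=0.95):
    """Return (VaR, CVaR): the alpha-quantile and the mean of the tail above it."""
    losses = np.asarray(losses)
    var = np.quantile(losses, alpha)
    return var, losses[losses >= var].mean()

rng = np.random.default_rng(2)
losses = -rng.normal(loc=1.0, scale=0.5, size=10_000)  # loss = negative reward
print(cvar(losses, alpha=0.95))
```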
Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach
12
Authors: Qianrong Shen. Journal of Computer and Communications, 2021, No. 12, pp. 80-97 (18 pages)
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate the supply and demand on the road, and such mechanisms improve service capacity and quality. Route recommendation for passenger seeking has been widely studied in taxi service. In RoD services, the dynamic price is a new and accurate indicator that represents the supply and demand condition, but it is rarely studied as a clue for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance and need to do so by analyzing real service data. We then design a Markov decision process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue: compared with driver revenue before using the model, the maximum yield can be increased by up to 28%.
Keywords: ride-on-demand service; Markov decision process; dynamic pricing; taxi services; route recommendation
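A toy version of the modeling idea in the abstract above: zones are states, the reward for seeking in a zone is scaled by that zone's dynamic price multiplier, and value iteration produces a seeking recommendation. All zone data, pickup probabilities, and travel costs are invented placeholders, not the paper's datasets.

```python
# Value iteration on a toy zone-seeking MDP with price-weighted rewards.
import numpy as np

zones = 4
price = np.array([1.0, 1.8, 1.2, 2.5])        # hypothetical surge multipliers
p_pickup = np.array([0.3, 0.2, 0.4, 0.1])     # chance of an order per visit
reward = p_pickup * price * 10.0              # expected fare for seeking in a zone
travel = np.abs(np.subtract.outer(np.arange(zones),
                                  np.arange(zones))) * 1.0  # cost of driving s -> a

gamma, v = 0.9, np.zeros(zones)
for _ in range(1000):
    q = reward[None, :] - travel + gamma * v[None, :]  # q[s, a]: drive to zone a
    v_new = q.max(axis=1)
    if np.abs(v_new - v).max() < 1e-10:
        break
    v = v_new
print("recommended next zone from each zone:", q.argmax(axis=1))
```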
Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model (Cited by: 9)
13
Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. China Communications, SCIE CSCD, 2020, No. 2, pp. 40-53 (14 pages)
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved using genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance and can guarantee that users with different service types obtain satisfactory expected total reward values with low numbers of network handoffs.
Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
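The AHP weighting step named in the abstract has a standard computation: the attribute-weight vector is the normalized principal eigenvector of a pairwise-comparison matrix. A sketch with invented comparisons of three QoS attributes:

```python
# AHP attribute weights via the principal eigenvector (Saaty's method).
import numpy as np

# Pairwise comparisons of (bandwidth, delay, cost): entry (i, j) says how
# much more important attribute i is than attribute j (invented values).
M = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)                 # principal (Perron) eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()
print("attribute weights:", w)

ci = (eigvals.real[k] - 3) / (3 - 1)        # consistency index for n = 3
print("consistency ratio:", ci / 0.58)      # RI = 0.58 for 3x3; want below ~0.1
```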
An Optimized Vertical Handoff Algorithm Based on Markov Process in Vehicle Heterogeneous Network (Cited by: 4)
14
Authors: MA Bin, DENG Hong, XIE Xianzhong, LIAO Xiaofeng. China Communications, SCIE CSCD, 2015, No. 4, pp. 106-116 (11 pages)
To address the fact that existing vertical handoff algorithms for vehicle heterogeneous wireless networks do not consider the diversification of the network's status, an optimized vertical handoff algorithm based on the Markov process is proposed and discussed in this paper. The algorithm takes into account that the status transformation of an available network will affect the quality of service (QoS) of the vehicle terminal's communication service. Firstly, the Markov process is used to predict the transformation of the wireless network's status after the decision via the transition probability. Then the weights of the evaluation parameters are determined by a fuzzy logic method. Finally, by comparing the total incomes of each wireless network, including handoff decision incomes, handoff execution incomes, and communication service incomes after handoff, the optimal network for handoff is selected. Simulation results show that the proposed algorithm, compared to existing algorithms, achieves a higher level of load balancing and effectively improves the average blocking rate, packet loss rate, and ping-pong effect.
Keywords: vehicle heterogeneous network; vertical handoff; Markov process; fuzzy logic; multi-attribute decision
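The prediction step described above boils down to propagating the current status distribution through an estimated transition matrix: after k decision epochs the distribution is p P^k. A sketch with an invented three-status matrix:

```python
# Predicting future network status via Markov transition probabilities.
import numpy as np

P = np.array([[0.7, 0.2, 0.1],     # status: good / fair / poor (invented)
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])
p = np.array([1.0, 0.0, 0.0])      # currently known to be "good"

for k in range(1, 4):
    p = p @ P
    print(f"predicted status distribution after {k} epochs: {p}")
```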
Probabilistic Analysis and Multicriteria Decision for Machine Assignment Problem with General Service Times
15
Authors: Wang, Jing. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 1994, No. 1, pp. 53-61 (9 pages)
In this paper we carry out a probabilistic analysis for a machine repair system with a general service-time distribution by means of generalized Markov renewal processes. Formulas for the steady-state performance measures, such as the distribution of queue sizes, average queue length, and degree of repairman utilization, are then derived. Finally, the machine repair model and a multiple criteria decision-making method are applied to study the machine assignment problem with a general service-time distribution, to determine the optimum number of machines serviced by one repairman.
Keywords: machine assignment problem; queueing model; multicriteria decision; Markov processes
A dynamical neural network approach for distributionally robust chance-constrained Markov decision process (Cited by: 1)
16
Authors: Tian Xia, Jia Liu, Zhiping Chen. Science China Mathematics, SCIE CSCD, 2024, No. 6, pp. 1395-1418 (24 pages)
In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Utilizing the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under the moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach to solve the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach when compared with the sequential convex approximation approach.
Keywords: Markov decision process; chance constraints; distributionally robust optimization; moment-based ambiguity set; dynamical neural network
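The generic shape of a moment-based distributionally robust joint chance constraint, of the kind this entry studies, is shown below; this is the standard textbook form, not necessarily the paper's exact formulation:

```latex
% The constraint must hold with probability at least 1-\epsilon under
% every distribution in a moment-based ambiguity set \mathcal{D}:
\inf_{\mathbb{P} \in \mathcal{D}}
  \mathbb{P}\bigl( c_i(x,\xi) \le 0,\; i = 1,\dots,m \bigr) \ge 1-\epsilon,
\quad
\mathcal{D} = \Bigl\{ \mathbb{P} :
  \mathbb{E}_{\mathbb{P}}[\xi] = \mu,\;
  \mathbb{E}_{\mathbb{P}}\bigl[(\xi-\mu)(\xi-\mu)^{\top}\bigr] \preceq \Sigma \Bigr\}.
```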
A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games
17
Authors: Liu Weibing, Wang Xianjia, Wang Guangmin. Engineering Sciences, EI, 2008, No. 1, pp. 43-46 (4 pages)
Considering the dynamic character of repeated games and the Markov process, this paper presents a novel dynamic decision model for symmetric repeated games. In this model, players' actions are mapped to a Markov decision process with payoffs, and the Boltzmann distribution is introduced. Our dynamic model is different from others'; we used this dynamic model to study the iterated prisoner's dilemma, and the results show that this decision model can successfully be used in symmetric repeated games and has an ability of adaptive learning.
Keywords: game theory; evolutionary game; repeated game; Markov process; decision model
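The Boltzmann rule the abstract introduces is softmax action selection: actions with higher estimated payoff are sampled more often, with a temperature controlling exploration. A minimal sketch with invented payoffs for the iterated prisoner's dilemma:

```python
# Boltzmann (softmax) action selection over estimated payoffs.
import numpy as np

def boltzmann(payoffs, temperature=1.0, rng=np.random.default_rng()):
    z = np.asarray(payoffs) / temperature
    z -= z.max()                           # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(p), p=p), p

# Hypothetical one-shot payoffs for (cooperate, defect):
action, probs = boltzmann([3.0, 5.0], temperature=0.5)
print("chosen action:", action, "probabilities:", probs)
```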
Research on Flexible Job-Shop Scheduling Considering Peak Power Constraints
18
Authors: 李益兵, 曹岩, 郭钧, 王磊, 李西兴, 孙利波. China Mechanical Engineering, Peking University Core, 2025, No. 2, pp. 280-293 (14 pages)
To address the increased makespan and machine load that flexible job-shop scheduling faces under a peak-power-constrained shop floor, a model of the peak-power-constrained flexible job-shop scheduling problem (PPCFJSP) is established with the objectives of minimizing the maximum completion time and minimizing the maximum machine load. For better scheduling decisions, the problem is first transformed into a Markov decision process, based on which a scheduling framework combining offline training and online scheduling is designed to solve the PPCFJSP. A double dueling deep Q-network with prioritized experience replay (D3QNPER) algorithm is then designed, together with a noise-injected ε-greedy decay strategy, which improves the convergence speed of the algorithm and further improves the solution ability and the stability of the results. Finally, experiments and algorithm comparison studies verify the effectiveness of the model and the algorithm.
Keywords: flexible job-shop scheduling; Markov decision process; deep reinforcement learning; peak power constraint
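One ingredient the abstract names, a noise-injected ε-greedy decay schedule, can be sketched directly; the decay constants and noise scale below are invented, and the full D3QNPER agent (double dueling Q-network with prioritized replay) is not reproduced here.

```python
# Noisy epsilon-greedy decay schedule for exploration (sketch).
import numpy as np

rng = np.random.default_rng(3)

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=2000.0, noise=0.02):
    """Exponentially decaying exploration rate with small additive noise."""
    base = eps_end + (eps_start - eps_end) * np.exp(-step / decay)
    return float(np.clip(base + rng.normal(0.0, noise), 0.0, 1.0))

def act(q_values, step):
    if rng.random() < epsilon(step):
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

print([round(epsilon(s), 3) for s in (0, 500, 2000, 10000)])
print("action at step 100:", act([0.1, 0.5, 0.2], step=100))
```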
An Integrated Transmit Resource Management Scheme for Multifunction Radar in Dynamic Electromagnetic Environments
19
Authors: 张鹏, 严俊坤, 高畅, 李康, 刘宏伟. Journal of Radars, Peking University Core, 2025, No. 2, pp. 456-469 (14 pages)
Traditional multifunction radar optimizes transmit resources only with respect to target characteristics and, in dynamic electromagnetic environments, faces intelligently time-varying jamming and optimization-model mismatch. This paper therefore proposes a data-driven integrated transmit resource management scheme, which improves the multi-target tracking (MTT) performance of multifunction radar in dynamic electromagnetic environments by sensing and exploiting dynamic jamming information online. The scheme first establishes a Markov decision process that mathematically describes the risk of the radar being intercepted and jammed by an adversary. The jamming information sensed by this Markov decision process is then coupled into the MTT accuracy calculation, and the integrated transmit resource management method is formulated as an optimization problem with a constrained action space. Finally, a greedy sorting backtracking algorithm is proposed to solve it. Simulation results show that, in dynamic jamming environments, the proposed method not only reduces the adversary's interception probability but also mitigates the impact of jamming on the radar when jammed, improving MTT performance.
Keywords: integrated transmit resource management; multi-target tracking; dynamic electromagnetic environment; Markov decision process; optimization problem
An Optical Fiber Network Intrusion Detection Method Based on the Markov Decision Process
20
Authors: 郭海智, 贾志诚, 李金库. Laser Journal, Peking University Core, 2025, No. 3, pp. 193-198 (6 pages)
To achieve accurate intrusion detection in optical fiber networks, an intrusion detection method based on the Markov decision process is proposed. The optical fiber network signal is first purified using a frequency-domain blocking technique, and empirical mode decomposition is used for initial detection of intrusion signals. A fuzzy analytic hierarchy process then determines the credibility of network access behaviors: behaviors with high credibility are passed directly, while the remaining behaviors are judged through the Markov decision process, thereby realizing intrusion detection. Experimental results show that the method detects intrusion signals quickly and accurately; in particular, for intrusive eavesdropping on the Pording dataset, the detection rate reaches 0.985. Across all experiments, the minimum detection rate is 0.920, and the maximum average false-alarm and missed-detection rates are 0.01 and 0.02, respectively. The method thus significantly improves the security and stability of optical fiber networks and provides strong support for network security.
Keywords: Markov decision process; optical fiber network; empirical mode decomposition; fuzzy analytic hierarchy process; intrusion detection