Title: Risk-Sensitive Control of Partially Observable Markov Chains
Author: 許舜斌
Keywords: Markov decision process
partial observation
stochastic dynamic programming
optimal control
Abstract: In this project we study the risk-sensitive control of discrete-time, finite-state Markov chains over the infinite time horizon. Other settings include a compact action space and imperfect observation. Our goal is to find the optimal policy that minimizes the exponential function of the long-run average cost. We first define the information state to generalize the system state, which is not directly observable to the controller. Following the standard methodology, we use the variational formula to construct a beta-discounted two-player stochastic game and show the existence of its equilibrium point by Banach's fixed-point theorem. We then use the Tauberian, Arzelà-Ascoli, and Bolzano-Weierstrass theorems to argue the limiting case as the discount factor approaches 1. To ensure convergence in this limit, a sufficient condition for the uniform boundedness of the differential value function is necessary. A survey of the assumptions currently available shows that many practical problems in industry cannot be handled. In this work we aim to integrate and generalize the previously proposed conditions so that more industrial cases fall within the solvable scope of the formulation. The second major goal of our research is to explore the structure of the optimal policy for simple cases. This is particularly important for the policy to be implemented with precision and convenience, and should attract strong interest from researchers in application-oriented areas. Through this project we hope to contribute to this area and extend the model's applicability far beyond its rigorous theoretical results.
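For concreteness, the risk-sensitive long-run average criterion described above is commonly written as follows. This is a standard formulation sketched from the abstract's description, not taken verbatim from the project; the risk-sensitivity parameter θ and per-stage cost c(x_t, a_t) are illustrative notation.

```latex
% Risk-sensitive long-run average cost (standard form, assumed notation):
% theta > 0 is the risk-sensitivity parameter, c the per-stage cost,
% pi a control policy; the goal is to minimize J over policies pi.
J(\pi) \;=\; \limsup_{T \to \infty} \frac{1}{\theta T}
  \log \mathbb{E}^{\pi}\!\left[ \exp\!\left( \theta \sum_{t=0}^{T-1} c(x_t, a_t) \right) \right]
```

For small θ, a formal Taylor expansion gives J(π) ≈ (average cost) + (θ/2)(asymptotic variance rate), which is why minimizing this criterion also penalizes the variability of the accumulated cost.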
This project studies Markov decision process (MDP) models that are discrete-time, have finite system states, and have a compact action space. We allow the system state to be only partially observable; the objective function under consideration is an exponential function whose exponent is the sum of per-stage costs, and the goal of the research is to find the optimal policy that minimizes the expected value of the long-run average objective. Because the model accounts for higher-order moments of the cumulative cost random variable, it achieves a risk-sensitive effect: the resulting optimal policy relatively reduces the variance of the cost random variable, which has attracted strong interest from application areas such as finance and management science. In the first part of the project, we investigate sufficient conditions for the existence of an optimal policy through the so-called Bellman optimality equation. To our knowledge, two representative classes of sufficient conditions have been proposed in the literature. The first class inherits ideas used for countable system states and assumes that every policy causes the Markov chain to reset; this assumption is equivalent to assuming that some state of the system is perfectly observable. The second class assumes that the transition matrix of the Markov chain induced by every policy is primitive (i.e., its entries are zero or positive, and some finite power of the matrix has all entries positive), and uses the concavity of the value function to prove the existence of a solution to the Bellman optimality equation. Both assumptions are quite restrictive, and, surprisingly, the two classes are mutually exclusive. In this research we seek a generalized condition that unifies these two classes, in order to resolve the deficiencies in the current solution methods. The second part of the project studies the structure of the optimal policy, for example whether, for the widely applicable binary control model (two system states, two control actions), the optimal control policy is of threshold type, so that the model can be genuinely applied to practical problems beyond its theoretical research value.
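Under full observation, the Bellman optimality equation referred to above takes a multiplicative form; the following is a sketch under standard assumptions, with λ, V, and ρ* as illustrative symbols not fixed by this record.

```latex
% Multiplicative Bellman optimality equation for the risk-sensitive
% average-cost problem (full-observation sketch, assumed notation):
% lambda = e^{theta rho*}, with rho* the optimal risk-sensitive average cost
% and V > 0 the associated value (eigen)function.
\lambda\, V(x) \;=\; \min_{a \in A}\; e^{\theta c(x,a)} \sum_{y} P(y \mid x, a)\, V(y),
\qquad V > 0
```

Under partial observation, the state x is replaced by the information state (the posterior distribution over system states given the observation history), as described in the abstract.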
Other Identifiers: NSC99-2221-E005-068
Appears in Collections: Department of Electrical Engineering


