Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/19736
Title: A Hardware Fault Detection Scheme for MMU-less Embedded Processors in Wireless Sensor Networks (在無線感測器網路上設計與實做一個適用於不具備記憶體管理機制的硬體錯誤偵測方法)
Authors: Yeh, Tsung-Yu (葉宗祐)
Keywords: fault tolerance
sensor network
fault detection
Publisher: Institute of Networking and Multimedia (資訊網路多媒體研究所)
Abstract: To keep cost, power consumption, and size low, sensor nodes are built on simplified hardware. In particular, they typically lack a memory management unit (MMU) and the fatal hardware traps that help a system detect faults. To compensate, we propose a dual-node architecture in which a backup node takes over when the primary node fails. First, for processor and memory faults, we implement fatal hardware traps in software, emulating the traps that the primary node's hardware lacks. Because a fault may be transient or permanent, when the primary node catches a hardware fault it can ask the backup node to re-execute along with it (parallel re-execution), and the outcome of the re-execution is used to diagnose the fault as transient or permanent. Second, for faults that do not affect system availability, such as a damaged sensing device, the primary node uses a sensing data fault model to check whether its sensed data is faulty. If it is, the primary node asks the backup node to sense and return data, which replaces the faulty data on the primary node. Experimental results show that the proposed dual-node architecture achieves good processor fault detection and sensing-data fault correction rates. We also measure the fault detection latency for processor and memory hardware faults and discuss how it affects the checkpointing policy of the dual-node architecture.
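The two mechanisms in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the convention that a re-run returns `None` when the emulated trap fires again, the rule for mapping re-execution outcomes to a diagnosis, and the range-based fault model are all assumptions made for illustration. The two re-runs are also called one after the other here, whereas the two nodes would run them concurrently.

```python
from enum import Enum
from typing import Callable, Optional


class Diagnosis(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


def diagnose(primary_rerun: Callable[[], Optional[int]],
             backup_rerun: Callable[[], Optional[int]]) -> Diagnosis:
    """Classify a trapped fault from one round of re-execution on both nodes.

    Each callable re-runs the interrupted task and returns its result, or
    None if the software-emulated trap fires again (assumed convention).
    """
    primary_result = primary_rerun()
    backup_result = backup_rerun()
    # Assumed rule: the fault is transient only if the primary completes
    # and agrees with the backup; otherwise it is treated as permanent.
    if primary_result is not None and primary_result == backup_result:
        return Diagnosis.TRANSIENT
    return Diagnosis.PERMANENT


def corrected_reading(primary_value: float,
                      resense_on_backup: Callable[[], float],
                      low: float, high: float) -> float:
    """Return the primary reading, or the backup's reading if it looks faulty."""
    # Assumed fault model: a reading outside [low, high] is faulty.
    if low <= primary_value <= high:
        return primary_value
    return resense_on_backup()
```

For example, `diagnose(lambda: None, lambda: 42)` models a primary node whose trap fires again on re-execution while the backup completes, which this sketch classifies as a permanent fault.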
URI: http://hdl.handle.net/11455/19736
Other Identifiers: U0005-0812200900305200
Article Link: http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0005-0812200900305200
Appears in Collections: Institute of Networking and Multimedia (資訊網路與多媒體研究所)

Files in This Item:

For the full text, please visit the Airiti Library (華藝線上圖書館).



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.