Please use this identifier to cite or link to this item: http://hdl.handle.net/11455/7487
Title: Self-Adjusting Indexing Techniques of Communication-Induced Checkpointing Protocols
Author: Chen, Chia-Yang (陳家揚)
Keywords: Parallel and Distributed Computations; Rollback-Recovery; Communication-Induced Checkpointing (CIC)
Publisher: Department of Electrical Engineering
References:
[1] 叢震, "Simulation Comparison and Analysis of Domino-Effect-Free Checkpointing Protocols in Distributed Systems" (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Chung Hsing University, Jul. 2003.
[2] 林炳源, "Simulation Comparison of Checkpointing Protocols with Rollback-Dependency Trackability" (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Chung Hsing University, Jul. 2003.
[3] 張炳煌, "A Simulation Study of Rollback-Dependency Trackability and Domino-Effect-Free Checkpointing Protocols" (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Chung Hsing University, Jul. 2004.
[4] 莊益端 and 梁仁楷, C++ Programming Practice, Latest Edition (in Chinese), GOTOP Information, 2005.
[5] Jichiang Tsai, Sy-Yen Kuo, and Yi-Min Wang, "Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 10, pp. 963-971, Oct. 1998.
[6] Jichiang Tsai, "Simulation Comparisons of Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability," submitted to Journal of Information Science and Engineering, Dec. 2003.
[7] Jichiang Tsai, "On Properties of RDT Communication-Induced Checkpointing Protocols," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 8, pp. 755-764, 2003.
[8] Chi-Yi Lin, Jichiang Tsai, Sy-Yen Kuo, and Yennun Huang, "Communication-Induced Checkpointing Protocols with K-Bounded Domino-Effect Freedom," ICDCS Workshop on Distributed Real-Time Systems, pp. B7-B13, 2000.
[9] J. M. Helary, A. Mostefaoui, R. H. B. Netzer, and M. Raynal, "Communication-based prevention of useless checkpoints in distributed computations," Distributed Computing, vol. 13, pp. 29-43, 2000.
[10] R. Bagrodia et al., "Parsec: A Parallel Simulation Environment for Complex Systems," Computer, pp. 77-85, Oct. 1998.
[11] J. Tsai, Y. M. Wang, and S. Y. Kuo, "Evaluations of domino-free communication-induced checkpointing protocols," Information Processing Letters, vol. 69, pp. 31-37, Jan. 1999.
[12] R. Bagrodia, "Parallel Language for Discrete-Event Simulation Models," IEEE Computational Science and Engineering, pp. 27-38, Apr.-Jun. 1998.
[13] Richard A. Meyer and Rajive Bagrodia, "PARSEC User Manual for PARSEC Release 1.1," revised Sep. 1999, http://pcl.cs.ucla.edu/projects/parsec.
[14] Y. M. Wang, A. Lowry, and W. K. Fuchs, "Consistent global checkpoints based on direct dependency tracking," Information Processing Letters, vol. 50, pp. 223-230, 1994.
[15] K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Transactions on Computer Systems, vol. 3, pp. 63-75, 1985.
[16] B. Randell, "System structure for software fault tolerance," IEEE Transactions on Software Engineering, vol. 1, pp. 220-232, 1975.
[17] E. N. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson, "A survey of rollback-recovery protocols in message-passing systems," ACM Computing Surveys, vol. 34, pp. 375-408, 2002.
[18] R. Koo and S. Toueg, "Checkpointing and rollback-recovery for distributed systems," IEEE Transactions on Software Engineering, vol. 13, pp. 23-31, 1987.
[19] B. Janssens and W. K. Fuchs, "Experimental evaluation of multiprocessor cache-based error recovery," in Proceedings of International Conference on Parallel Processing, pp. 505-508, 1991.
[20] Y. M. Wang, "Consistent global checkpoints that contain a given set of local checkpoints," IEEE Transactions on Computers, vol. 46, pp. 456-468, 1997.
[21] R. H. B. Netzer and J. Xu, "Necessary and sufficient conditions for consistent global snapshots," IEEE Transactions on Parallel and Distributed Systems, vol. 6, pp. 165-169, 1995.
[22] R. Baldoni, J. M. Helary, A. Mostefaoui, and M. Raynal, "A communication-induced checkpointing protocol that ensures rollback-dependency trackability," in Proceedings of IEEE Fault-Tolerant Computing Symposium, pp. 68-77, 1997.
[23] D. L. Russell, "State restoration in systems of communicating processes," IEEE Transactions on Software Engineering, vol. 6, pp. 183-194, 1980.
[24] R. Baldoni, J. M. Helary, and M. Raynal, "Rollback-dependency trackability: A minimal characterization and its protocol," Information and Computation, vol. 165, pp. 144-173, 2001.
[25] I. C. Garcia, G. M. D. Vieira, and L. E. Buzato, "RDT-partner: An efficient checkpointing protocol that enforces rollback-dependency trackability," in Proceedings of IX Brazilian Symposium on Fault-Tolerant Computing, 2001.
[26] I. C. Garcia and L. E. Buzato, "A linear approach to enforce the minimal characterization of the rollback-dependency trackability property," Technical Report TR-IC-01-17, University of Campinas, Brazil, 2001.
[27] D. Manivannan and M. Singhal, "Quasi-synchronous checkpointing: Models, characterization, and classification," IEEE Transactions on Parallel and Distributed Systems, vol. 10, pp. 703-713, 1999.
[28] R. Baldoni, J. M. Helary, and M. Raynal, "Rollback-dependency trackability: Visible characterizations," in Proceedings of 18th ACM Symposium on Principles of Distributed Computing, pp. 33-42, 1999.
[29] R. Baldoni, J. M. Helary, and M. Raynal, "Impossibility of scalar clock-based communication-induced checkpointing protocols ensuring the RDT property," Information Processing Letters, vol. 80, pp. 105-111, 2001.
[30] I. C. Garcia and L. E. Buzato, "On the minimal characterization of the rollback-dependency trackability property," in Proceedings of 21st IEEE International Conference on Distributed Computing Systems, pp. 342-349, 2001.
[31] Jichiang Tsai, Sy-Yen Kuo, and Yi-Min Wang, "More properties of communication-induced checkpointing protocols with rollback-dependency trackability," Journal of Information Science and Engineering, vol. 21, no. 2, pp. 239-257, Mar. 2005.
[32] L. Lamport, "Time, clocks and the ordering of events in a distributed system," Communications of the ACM, vol. 21, pp. 558-565, 1978.
[33] D. Briatico, A. Ciuffoletti, and L. Simoncini, "A distributed domino-effect free recovery algorithm," Proc. 4th IEEE Symp. on Reliability in Distributed Software and Database Syst., pp. 207-215, Oct. 1984.
[34] D. Manivannan and M. Singhal, "A low overhead recovery technique using quasi-synchronous checkpointing," Proc. 16th IEEE Int'l Conf. on Distributed Computing Syst., pp. 100-107, May 1996.
[35] G. M. D. Vieira, I. C. Garcia, and L. E. Buzato, "Systematic analysis of index-based checkpointing algorithms using simulation," Proc. of IX Brazilian Symp. on Fault-Tolerant Comput., 2001.
[36] F. Quaglia, R. Baldoni, and B. Ciciani, "On the no-Z-cycle property in distributed executions," Journal of Computer and System Sciences, vol. 61, no. 3, pp. 400-427, Dec. 2000.
[37] J. M. Helary, A. Mostefaoui, and M. Raynal, "Virtual precedence in asynchronous systems: concept and applications," 11th Int'l Workshop on Distributed Algorithms, pp. 170-184, Sep. 1997.
[38] C. J. Fidge, "Logical time in distributed computing systems," IEEE Computer, vol. 24, no. 8, pp. 11-76, 1991.
[39] L. Alvisi, E. Elnozahy, S. Rao, S. A. Husain, and A. De Mel, "An analysis of communication-induced checkpointing," IEEE Fault-Tolerant Comput. Symp., pp. 242-249, 1999.
[40] Jichiang Tsai, "Performance Comparisons of Index-Based DEF Checkpointing Protocols," submitted to IEEE Transactions on Parallel and Distributed Systems, Oct. 2003.
[41] Jichiang Tsai, "Systematic comparisons of RDT communication-induced checkpointing protocols," Pacific Rim International Symposium on Dependable Computing, Mar. 2004.
[42] Jichiang Tsai, "Performance comparisons of index-based communication-induced checkpointing protocols," Journal of the Chinese Institute of Engineers, vol. 29, no. 6, pp. 1113-1118, Oct. 2006.
Abstract:
In parallel and distributed computations, errors can occur while messages are being transferred, so the ability to perform rollback recovery is essential. By taking checkpoints during execution, a system that detects an error can roll back and be restored to a normal state.
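Rolling back only restores a normal state if the checkpoints restored together form a consistent global checkpoint: no message may be recorded as received while its send is not recorded (an orphan message). The following Python sketch is a minimal illustration of that check, not code from the thesis. The `Message` type, the function names, and the convention that checkpoint x of a process records every event in its intervals 0 to x-1 are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Message:
    sender: int
    send_interval: int   # checkpoint interval on the sender in which the send occurred
    receiver: int
    recv_interval: int   # checkpoint interval on the receiver in which the receive occurred


def is_orphan(m: Message, cut: Sequence[int]) -> bool:
    """True if the cut records m's receive event but not its send event.

    cut[i] = x means process i is restored to its checkpoint x, which
    (by assumption) records all events in intervals 0..x-1.
    """
    receive_recorded = m.recv_interval < cut[m.receiver]
    send_recorded = m.send_interval < cut[m.sender]
    return receive_recorded and not send_recorded


def is_consistent(cut: Sequence[int], messages: Sequence[Message]) -> bool:
    """A global checkpoint is consistent iff it contains no orphan message."""
    return not any(is_orphan(m, cut) for m in messages)


if __name__ == "__main__":
    # P0 sends m after its checkpoint 1; P1 receives m before its checkpoint 1.
    msgs = [Message(sender=0, send_interval=1, receiver=1, recv_interval=0)]
    print(is_consistent([1, 1], msgs))  # False: m would be an orphan
    print(is_consistent([2, 1], msgs))  # True: the send is now recorded
    print(is_consistent([1, 0], msgs))  # True: the receive is not recorded
```

A recovery procedure would search for the most recent cut that passes this check; uncoordinated checkpointing can force that search back repeatedly (the domino effect), which is what communication-induced checkpointing protocols are designed to prevent.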

This thesis has two parts: a literature review, and the proposal of new protocols. Building on the HMNR1 protocol, it derives several communication-induced checkpointing (CIC) protocols that differ in their indexing methods and checkpointing conditions, and it compares their performance through software simulation.
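The exact rules of HMNR1 and of the thesis's self-adjusting indexing are not reproduced in this record. As a simpler illustration of what an index-based CIC protocol does, the sketch below implements the classic rule of Briatico et al. [33] (often called BCS): each process keeps an index that grows with every basic checkpoint, piggybacks it on every message, and takes a forced checkpoint before delivering any message that carries a larger index. This is a swapped-in textbook protocol, not the author's method, and the `Process` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    pid: int
    index: int = 0  # index of the latest local checkpoint
    # Each entry is (checkpoint index, kind); kind is "initial", "basic", or "forced".
    checkpoints: List[Tuple[int, str]] = field(
        default_factory=lambda: [(0, "initial")]
    )

    def basic_checkpoint(self) -> None:
        """Spontaneous (basic) checkpoint: advance the local index by one."""
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self) -> int:
        """Return the index piggybacked on an outgoing message."""
        return self.index

    def receive(self, piggybacked: int) -> bool:
        """BCS rule: if the message carries a larger index, adopt it and take
        a forced checkpoint before delivery. Returns True if one was taken."""
        if piggybacked > self.index:
            self.index = piggybacked
            self.checkpoints.append((self.index, "forced"))
            return True
        return False


if __name__ == "__main__":
    p0, p1 = Process(0), Process(1)
    p0.basic_checkpoint()           # p0 now has index 1
    forced = p1.receive(p0.send())  # 1 > 0, so p1 takes a forced checkpoint
    print(forced, p1.checkpoints)   # True [(0, 'initial'), (1, 'forced')]
    print(p1.receive(p0.send()))    # False: 1 is not greater than 1
```

With this rule, for any index k, the cut formed by each process's first checkpoint with index at least k is consistent, which is how index-based protocols avoid the domino effect. Protocols in the HMNR family piggyback additional control information to avoid some of these forced checkpoints; that refinement is not modeled here.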
URI: http://hdl.handle.net/11455/7487
Other Identifiers: U0005-1308200718113500
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.