Instantaneous Mean-Time-To-Failure (MTTF) estimation for checkpoint interval computation at run time

Research output: Contribution to journal (Article)

  • Mohamad Imran bin Bandan
  • Subhasis Bhattacharjee
  • Suriati Khartini Jali
  • Dhiraj K. Pradhan
Original language: English
Pages (from-to): 69-77
Number of pages: 9
Journal: Microelectronics Reliability
Volume: 98
Early online date: 9 May 2019
Date submitted: 27 Aug 2018
Date accepted/in press: 21 Apr 2019
Date e-pub ahead of print: 9 May 2019
Date published (current): 1 Jul 2019

Abstract

The Mean-Time-To-Failure (MTTF) is an important parameter that determines the lifetime reliability of a system. Several fault-tolerance mechanisms use it to make critical decisions about the processor or system state. It has recently been found that the MTTF of a system varies with environmental conditions, contrary to the earlier belief that electronic chips have a constant MTTF. There is therefore a need for a good, fast estimate of the MTTF that accommodates variation in environmental conditions and in the stresses on the system. This paper presents an instantaneous MTTF estimation technique that is executed at system runtime. A major contribution of this paper is a simple technique for obtaining the MTTF for checkpoint interval computation in real-time systems. Our complete system model, which consists of multi-level steps, is presented as the main model for the MTTF estimation. We adopt one of the state-of-the-art solutions to obtain the aging rate parameter for the host/processor. We also propose another parameter in the MTTF computation that represents the workload and the stress factor of the running host. The results show that the estimates differ only marginally from those of other MTTF estimation techniques, by between 0.014% and 0.131%. We also show that the proposed technique captures the effect of temperature variation on the MTTF value during several simulated runtime scenarios. The proposed MTTF estimation technique has been incorporated into the lifetime reliability-aware checkpointing mechanism, and it has been shown to work without violating the task deadlines in all cases.
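
The abstract does not state the paper's estimation formula or its checkpoint-interval rule, so the following is only an illustrative sketch of the general idea: a temperature-dependent MTTF estimate feeding a checkpoint interval. It assumes an Arrhenius temperature scaling and Young's first-order approximation, neither of which is confirmed as the authors' method. The function names, the reference values, and the 0.7 eV activation energy are all hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_mttf(mttf_ref_hours, temp_ref_k, temp_k, activation_energy_ev=0.7):
    """Scale a reference MTTF to the current temperature.

    Assumption: a plain Arrhenius model, used here only for illustration;
    it is not the estimation technique proposed in the paper.
    """
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / temp_k - 1.0 / temp_ref_k
    )
    return mttf_ref_hours * math.exp(exponent)


def young_checkpoint_interval(checkpoint_cost_hours, mttf_hours):
    """Young's first-order approximation: interval = sqrt(2 * C * MTTF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mttf_hours)


if __name__ == "__main__":
    # Hypothetical numbers: MTTF of 10,000 h at 330 K, 36 s checkpoint cost.
    for temp_k in (330.0, 350.0, 370.0):
        mttf = arrhenius_mttf(10_000.0, 330.0, temp_k)
        interval = young_checkpoint_interval(0.01, mttf)
        print(f"T={temp_k:.0f} K  MTTF={mttf:.1f} h  interval={interval:.2f} h")
```

Under these assumptions a hotter host yields a smaller MTTF and therefore a shorter checkpoint interval, which is the kind of runtime adjustment the abstract motivates.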

    Research areas

  • Failure rate based checkpoint interval computation, Lifetime reliability, Mean-Time-To-Failure, MTTF, Reliability
