Sunday, June 22, 2014

Fault Tolerant Software

Concepts
The book Patterns for Fault Tolerant Software uses three concepts to explain fault tolerance. They are related to one another: each can lead to the next.
Fault (Turkish: Hata) -> Error (what is the Turkish for this? Yanlış?) -> Failure (Turkish: Başarısızlık)
In one of the examples given in the book, producing an incorrect bill during billing is the Failure. The causes of that failure are defined as the arrival of a faulty CDR (the Fault) and the posting of that faulty CDR to an account (the Error). The text of the examples is below.

A misrouted telephone call is an example of a failure. Telephone system requirements specify that calls should be delivered to the correct recipient. When a faulty system prevents them from being delivered correctly, the system has failed. In this case the fault might have been an incorrect call routing data being stored in the system. The error occurs when the incorrect data is accessed and an incorrect network path is computed with that incorrect data.

The preparation of an incorrect bill for service is another example of a failure. The system requirements specify that the customer will be accurately charged for service received. A faulty identification received in a message by a billing system can result in the charges being erroneously applied to the wrong account. The fault in this case might have been in the communications channel (a garbled message), or in the system component that prepares the message for transmission. The error was applying the charges to the wrong account. The fact that the customer receives an incorrect charge is the failure, since they agreed with the carrier to pay for the service that they used and not for unused service.

The definitions are below:
A fault is the defect that is present in the system that can cause an error. It is the actual deviation from correctness.
An error is the incorrect behavior from which a failure may occur. Errors can be categorized into two types: timing or value.
A failure is system behavior that does not conform to the system specification.
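
The billing example can be sketched in a few lines of Python to show where each of the three terms applies. This is only an illustration and not code from the book: `CallRecord`, `apply_charges`, `simulate_billing` and the flat rate are all hypothetical names and values made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    account_id: str  # account the message claims the call belongs to
    minutes: int


RATE_PER_MINUTE = 2  # hypothetical flat rate, in cents


def apply_charges(balances, record):
    # The ERROR happens here when record.account_id is wrong:
    # the charge is posted to an account that did not make the call.
    charge = record.minutes * RATE_PER_MINUTE
    balances[record.account_id] = balances.get(record.account_id, 0) + charge
    return balances


def simulate_billing(garbled):
    sent = CallRecord("A", 10)
    # The FAULT: a garbled message changes the identification in transit.
    received = CallRecord("B", sent.minutes) if garbled else sent
    # The FAILURE is what the customers finally see: the bills below
    # no longer match the service each account actually used.
    return apply_charges({"A": 0, "B": 0}, received)
```

With `garbled=False` the charge lands on account A. With `garbled=True` the same call is billed to account B, which is the behavior the specification forbids.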
Detection Patterns
Under the Detection Patterns heading, the book recommends preferring Fail Silent when recovering from the error is not possible.

The most desirable means of failure handling in a computing system is when the error is detected automatically and corrected before it becomes a failure. If this is not possible and a failure occurs, then the next most favorable are the Fail Silent and then the Crash Failure mode.

What Is Fail Silent?
A Fail-Silent failure is one in which the failing unit either presents the correct result or no result at all. A Crash Failure is one where the unit stops after the first silent failure.
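
The two modes can be sketched as a small wrapper around a computation. This is a minimal sketch under assumptions, not the book's implementation: it assumes the unit has some trustworthy way to check its own result, and `FailSilentUnit`, `faulty_isqrt` and `isqrt_ok` are hypothetical names invented for the example.

```python
import math


class FailSilentUnit:
    """Returns a checked result or nothing at all (None)."""

    def __init__(self, compute, is_valid, crash_on_failure=False):
        self.compute = compute
        self.is_valid = is_valid          # assumed acceptance test for a result
        self.crash_on_failure = crash_on_failure
        self.stopped = False

    def request(self, x):
        if self.stopped:
            return None                   # crash mode: unit no longer answers
        try:
            result = self.compute(x)
        except Exception:
            result = None                 # an exception also yields no result
        if result is None or not self.is_valid(x, result):
            if self.crash_on_failure:
                self.stopped = True       # stop after the first silent failure
            return None                   # never hand out a wrong value
        return result


def faulty_isqrt(x):
    # Hypothetical defect: a wrong answer for exactly one input.
    return 5 if x == 16 else math.isqrt(x)


def isqrt_ok(x, r):
    # r is the integer square root of x iff r*r <= x < (r+1)*(r+1).
    return r * r <= x < (r + 1) * (r + 1)
```

A unit built as `FailSilentUnit(faulty_isqrt, isqrt_ok)` answers `request(9)` with 3, gives no result for `request(16)`, and keeps answering later requests. With `crash_on_failure=True` it gives no result for any request after the first suppressed one.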

Is the Fail Silent State Communicated?
Another paragraph explains that a unit entering Fail Silent mode should not report its state to other units.

When failing silently the erroneous element immediately stops processing without corrupting any of its peers. To ensure peers do not corrupt other peers through the error propagating, the failing element stops without informing them that it is stopping. The problem of detecting that an element has stopped functioning is a totally different and easier problem than the problem of determining if an element has stopped operating correctly.
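
One common way for peers to notice that an element has stopped, without that element announcing it, is to watch for missing periodic messages. The quoted passage does not prescribe a mechanism, so the heartbeat approach below is an assumption, and `HeartbeatMonitor` with its methods is a hypothetical sketch. Time is passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last time each peer was heard from."""

    def __init__(self, timeout):
        self.timeout = timeout    # seconds of silence before a peer is suspected
        self.last_seen = {}

    def beat(self, peer, now):
        # Called whenever a periodic message from `peer` arrives.
        self.last_seen[peer] = now

    def stopped_peers(self, now):
        # A peer that fails silently sends no notice; it simply goes quiet.
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

For example, with a timeout of 3 seconds, a peer last heard from at time 0 is reported as stopped at time 4, while a peer heard from at time 2 is not. The monitor can only tell that a peer went quiet, not whether it was operating correctly before it stopped.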
