Software Reliability
OBJECTIVES
Unlike electrical and mechanical reliability practices, software reliability practices are not common in software development. This Software Design for Reliability Seminar highlights various topics in software reliability and explains their application to, and positive impact on, the development life cycle phases: Concept, Design, Implementation, and Testing. The seminar first covers reliability concepts, planning, and software development “best practices,” which form the starting point for integrating reliability practices. Overviews of reliability techniques from the remaining phases follow, along with explanations of how they can be integrated into a reliability program. The seminar takes a practical approach to software reliability. It is intended for software engineers and managers who are directly involved in software development and want to understand the reliability concepts and practices that will improve the reliability of their product software.
WHO SHOULD ATTEND
This course is intended for those who want a better understanding of how to design and test more reliable software in less time.
MODULE LIST
This seminar consists of the following training modules:
- Software Development Life Cycle Best Practices
- Reliability Concepts for Software
- Integrating Reliability Practices into the Software Development Life Cycle
- Development Phase Reliability Practices
- Software Fault Tolerance
- Software Reliability Testing
- Software Reliability Planning
Each training module is described in the following section.
TRAINING MODULE DESCRIPTIONS
(1) Software Development Life Cycle Best Practices
Although software development practices are similar across industries, companies produce products with differing levels of quality and reliability. For this reason, this module focuses on “best practices” for defect removal shared by companies that consistently produce high-quality software. After presenting a methodology for measuring defect removal efficiency at each development phase, we evaluate the effectiveness of integrating various practices into your existing life cycle. We conclude the module with a closer look at design and code inspections: what works, what doesn’t, and why.
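As an illustration of measuring defect removal efficiency per phase, the following is a minimal sketch, not the seminar's own methodology. It assumes each defect is logged with the phase that introduced it and the phase that found it; the "field" bucket, the function name, and the sample counts are all hypothetical.

```python
# Phases follow the seminar's life cycle. "field" is an added, hypothetical
# bucket for defects that escape to customers.
PHASES = ["concept", "design", "implementation", "testing", "field"]


def removal_efficiency(defects, phase):
    """Defects removed in `phase` divided by defects present during `phase`.

    `defects` is a list of (origin_phase, found_phase) pairs. A defect counts
    as present in every phase from its origin up to the phase that found it.
    Returns None when no defects were present in the phase.
    """
    idx = PHASES.index(phase)
    present = [
        (origin, found)
        for origin, found in defects
        if PHASES.index(origin) <= idx <= PHASES.index(found)
    ]
    if not present:
        return None
    removed = [d for d in present if d[1] == phase]
    return len(removed) / len(present)


# Hypothetical defect log: (phase that introduced it, phase that found it).
SAMPLE_DEFECTS = (
    [("design", "design")] * 6
    + [("design", "testing")] * 2
    + [("design", "field")] * 2
    + [("implementation", "implementation")] * 12
    + [("implementation", "testing")] * 6
    + [("implementation", "field")] * 2
)

for phase in PHASES[:-1]:
    print(phase, removal_efficiency(SAMPLE_DEFECTS, phase))
```

With data of this shape, a low efficiency in one phase points to where an added practice, such as an inspection, is most likely to pay off.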
(2) Reliability Concepts for Software
This module reviews the basic software reliability concepts. To clear up a frequent source of ambiguity, quality versus reliability is examined closely and the characteristics of reliable software are identified. Terminology for defects, faults, and failures is established, along with a 2-level failure classification scheme. General reliability concepts that apply to software are presented, including failure rates, system availability, interface robustness, and fault tolerance.
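Two of these concepts, failure rate and system availability, can be made concrete with a short calculation. The sketch below uses the standard steady-state relation availability = MTBF / (MTBF + MTTR) and assumes a constant failure rate, so that MTBF is the reciprocal of the failure rate. The function names and the numbers are illustrative assumptions.

```python
def failure_rate(failures, operating_hours):
    """Average failures per operating hour over an observation window."""
    return failures / operating_hours


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair or restore (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical observation: 4 failures in 2000 operating hours,
# with an average of 0.5 hours to restore service after each failure.
rate = failure_rate(4, 2000)       # failures per hour
mtbf = 1 / rate                    # assumes a constant failure rate
print(rate, mtbf, availability(mtbf, 0.5))
```

The same availability target can be approached from either side: fewer failures (a higher MTBF) or faster recovery (a lower MTTR).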
(3) Integrating Reliability Practices into the Software Development Life Cycle
Companies follow various approaches in an attempt to develop reliable software. These traditional approaches are examined for their strengths and weaknesses. Then, a practical approach is introduced that allows developers to integrate reliability practices into the software life cycle. As with any project, the path to achieving reliability goals is defined by a plan. Next, modeling concepts are introduced as sources for early prediction and late estimation of defect and failure rate data. Design and implementation reliability practices are reviewed and grouped based on objectives. Reliability testing goals and techniques are reviewed for their contribution to the test cycle and reliability criteria. Finally, measurements and metrics are presented as a foundation for all reliability practices.
(4) Development Phase Reliability Practices
This module reviews various reliability practices that apply during the design and implementation phases. System functionality is captured in usage profiles developed from the architectural specification and refined throughout the design phase. Then various analysis techniques are reviewed. The system and hardware failure analyses can be leveraged in defining software failure modes. The software can then be analyzed to identify critical and vulnerable sections as focal points for inspections and fault tolerance. We also review guidelines for high reliability derived from software safety development. This module shows how to reduce interface defects, a major source of run-time failures, by defining robust interface specifications and enforcing them with defensive programming techniques. We present a review technique that combines aspects of several existing inspection methods and uses historical data, failure analysis results, and usage profiles to identify defects that are likely to result in run-time failures.
(5) Software Fault Tolerance
For systems running at customer installations, fault tolerance offers a last line of defense against failures by focusing on increasing availability. Fault tolerance has become synonymous with either hardware redundancy or software exception handling. This module presents several valuable, but lesser-known, techniques to address many common failure scenarios. This module also addresses the general case with guidelines to evaluate fault tolerance techniques and incorporate them into an overall system design. For systems that run continuously over very long periods of time, the concept of software rejuvenation is introduced to counteract the effects of software aging.
(6) Software Reliability Testing
This module addresses reliability practices during both unit and system-level testing. The most prevalent failures encountered during development, testing, and early deployment are trivial run-time failures characterized as deterministic and easily reproducible, even after a system restart or reboot. The main goal of unit testing should be to identify as many of these types of failures as possible for removal, prior to system-level testing.
The techniques used to achieve this goal are:
- Interface robustness testing using equivalence classes and boundary value analysis
- Testing fault-tolerant code
- Practical usage of code coverage
- Defect density analysis using change management
- Certifying third-party software
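The first technique in the list above, interface robustness testing with equivalence classes and boundary value analysis, can be sketched as follows. The interface under test (`parse_port` and its 1 to 65535 range) is a hypothetical example chosen for illustration, not taken from the seminar material.

```python
MIN_PORT, MAX_PORT = 1, 65535  # hypothetical interface specification


def parse_port(value):
    """Hypothetical interface under test: accept an integer in [1, 65535]."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("port must be an integer")
    if not MIN_PORT <= value <= MAX_PORT:
        raise ValueError("port out of range")
    return value


def boundary_values(lo, hi):
    """Values just outside, on, and just inside each boundary of [lo, hi]."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]


# One representative input per equivalence class.
EQUIVALENCE_CLASSES = {
    "below range": -5,
    "valid": 8080,
    "above range": 70000,
    "wrong type": "80",
}


def classify(value):
    """Report whether the interface accepts or cleanly rejects an input."""
    try:
        parse_port(value)
        return "accepted"
    except (TypeError, ValueError):
        return "rejected"


boundary_results = {v: classify(v) for v in boundary_values(MIN_PORT, MAX_PORT)}
class_results = {name: classify(v) for name, v in EQUIVALENCE_CLASSES.items()}
print(boundary_results)
print(class_results)
```

A robust interface either accepts an input or rejects it with a defined error. Any other outcome, such as a crash or a silently wrong value, is the kind of deterministic failure unit testing should catch.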
Traditional system-level testing incorporates various types of testing to verify functionality and detect failures for repair. Usage profiles increase the efficiency of traditional testing by providing mechanisms for test case definition and prioritization. Moreover, these usage profiles can drive reliability testing to determine software failure rates and measure reliability growth.
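One way to picture how a usage profile supports test case prioritization and selection is the sketch below. It assumes the profile is a simple mapping from operation name to expected share of field usage; the operation names, probabilities, and function names are hypothetical.

```python
import random

# Hypothetical usage profile: operation -> share of expected field usage.
USAGE_PROFILE = {
    "login": 0.50,
    "search": 0.30,
    "checkout": 0.15,
    "admin_report": 0.05,
}


def prioritize(profile):
    """Order operations from most to least frequently used."""
    return sorted(profile, key=profile.get, reverse=True)


def select_operations(profile, n, seed=0):
    """Draw n operations at random, weighted by expected usage.

    A fixed seed keeps the selected test sequence reproducible.
    """
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)


picked = select_operations(USAGE_PROFILE, 1000)
print(prioritize(USAGE_PROFILE))
print({op: picked.count(op) for op in USAGE_PROFILE})
```

Because the test mix mirrors expected usage, failures observed while running such a sequence give a more representative basis for estimating field failure rates than uniformly chosen tests would.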
(7) Software Reliability Planning
All too often, development organizations seek to improve the reliability of their product software but are uncertain of how to define and achieve their reliability goals. This module demonstrates how this can be accomplished by developing a reliability plan. Initially, the planning endpoints must be determined by assessing the current reliability baseline and defining the reliability targets of the current project. Then, the elements of a reliability plan are reviewed, including establishing goals, making initial predictions, measuring progress, and verifying reliability at each phase.