https://www.youtube.com/watch?v=M7fV-eQwxrY
Define bugs
- A system is subject to a set of requirements.
- A software defect is a non-conformity to requirements.
- E.g., a pre-, current-, or post-condition is violated.
- A non-conformity is a failure to meet one or more requirements.
- A defect is incorrect program code or data that causes a non-conformity.
- A symptom is observable evidence of a defect.
- A deterministic defect is a defect that does not change its symptoms under a well-defined set of conditions.
- In contrast, a non-deterministic defect is a defect that changes its symptoms from run to run under a well-defined set of conditions.
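One way to check which kind of defect you are facing is to rerun the same scenario many times under fixed conditions and tally the symptoms. A minimal Python sketch, assuming the scenario can be wrapped in a callable that returns a hashable description of its symptom; `observe_symptoms`, `looks_deterministic`, and the two sample scenarios are hypothetical names for illustration. A single distinct symptom over N runs is only consistent with a deterministic defect; it does not prove one.

```python
import random
from collections import Counter
from typing import Callable, Hashable


def observe_symptoms(scenario: Callable[[], Hashable], runs: int = 50) -> Counter:
    """Run the same scenario `runs` times and tally the symptom seen each time."""
    return Counter(scenario() for _ in range(runs))


def looks_deterministic(symptoms: Counter) -> bool:
    # One distinct symptom across all runs is consistent with a deterministic
    # defect; more than one means the symptoms changed from run to run.
    return len(symptoms) == 1


def off_by_one() -> str:
    """Hypothetical scenario whose defect shows the same symptom every run."""
    data = [1, 2, 3]
    try:
        return str(data[len(data)])  # always reads one past the end
    except IndexError:
        return "IndexError"


_rng = random.Random(1)  # stands in for timing and other uncontrolled factors


def flaky() -> str:
    """Hypothetical scenario whose symptom depends on an uncontrolled factor."""
    return "crash" if _rng.random() < 0.3 else "ok"


print(observe_symptoms(off_by_one))  # Counter({'IndexError': 50})
print(observe_symptoms(flaky))       # a mix of 'crash' and 'ok'
```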
Terminology
- A context is the totality of the environment in which a program that exhibits symptoms is running
- A problem report describes one or more symptoms in some context
- An analogous context is a replica of the original context
- The lab is a setting in which we have total control over the context
- The field is a setting in which we have minimal or no control over the context
Relationship
Problem report -> Symptoms <-> Defects
Challenges
- A problem report can be unhelpful (feedback from the user)
- A problem report may not indicate the actual problem
- Collecting program state data may be difficult (logs, settings, dumps)
- Symptoms may not indicate the cause
- Defects and symptoms change as repair progresses
- Fixing one defect may introduce new defects (messy design/quick fix)
- Symptoms can be difficult to reproduce
Debugging process
We tend to think debugging is a linear process, i.e.:
- Characterize and reproduce
- Locate
- Classify
- Understand
- Repair
In reality, the process is iterative:
- Review problem report
- Characterize and reproduce problem
- Clone if possible
- Reproduce problem (loop)
- Understand problem
- Locate problem
- Classify problem
- Gain insight
- Attempt to repair
- Problem fixed; deliver
In detail
Characterizing
- Determining the context in which symptoms were observed
- Version number, platform, resources allocated, external interfaces, configuration data, etc.
- Information that allows you to instantiate an analogous context
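As a sketch of what characterizing can capture automatically, the Python function below gathers a few context facts into a dictionary that could be attached to a problem report. `capture_context`, the version string, and the config keys are hypothetical; a real product would add whatever its own external interfaces and configuration require.

```python
import json
import os
import platform
import sys


def capture_context(app_version: str, config: dict) -> dict:
    """Collect facts about the environment in which symptoms were observed."""
    return {
        "app_version": app_version,                 # version number of the program
        "python_version": platform.python_version(),
        "platform": platform.platform(),            # OS name and release
        "machine": platform.machine(),              # CPU architecture
        "cpu_count": os.cpu_count(),                # resources available (may be None)
        "executable": sys.executable,
        "config": dict(config),                     # copy of configuration data
    }


# Hypothetical version string and configuration, for illustration only.
print(json.dumps(capture_context("1.4.2", {"threads": 8}), indent=2))
```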
Reproducing
- Instantiating an analogous context, in the lab or in the field
- Running enough of the program/system to observe the reported symptoms
- Developing new test assets, or updating existing ones, to demonstrate the failure
- Making sure you are looking at the correct version of the source code
Characterizing and reproducing a problem is vital to the debugging process.
Understanding
Gaining ENOUGH knowledge about a problem and the surrounding code that you believe you can make the changes needed to carry out a repair.
At a minimum, you should have:
- Located the incorrect lines of code
- Understood why the code is incorrect (the root cause)
- Checked the proposed classification
- Formulated a set of proposed changes
- Determined how the proposed changes could affect the runtime state
Inspect and verify the associated test assets
- The test cases or harnesses may be broken
- Test data should demonstrate correct and incorrect behavior
The defect may not be where you expect it
- Keep an open mind and be ready to question all parts of the program
Ask yourself where the defect is not
- Trying to prove the absence of a defect in a region of code can reveal the defect
Explain to someone why there is a defect and why your proposed fix will resolve it
- A local guru or bobblehead could be helpful - reach out for help if necessary
Locate the problem
Employ good development practices at the outset
- Practice iterative, incremental, bottom-up development
- Add functionality in small sections of code
- Create test assets for each new increment of functionality
- Verify that new code doesn't cause previous test cases to fail
- Verify that new code passes its own test cases
- Practice defensive programming
Alas
- Well-written and extensive test assets
- Preferably the whole product does this; at a minimum, your fixes should
- Adds runtime overhead, which can hinder the search for non-deterministic problems
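A minimal sketch of defensive programming paired with a test asset for one small increment, assuming the (hypothetical) requirement that a port is a decimal integer in the range 1..65535. Bad input is rejected at the boundary with a specific exception rather than passed on to fail somewhere less obvious.

```python
def parse_port(text: str) -> int:
    """Parse a TCP port number, rejecting bad input instead of passing it on."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    text = text.strip()
    # isascii() guards against non-ASCII digits such as '²' that int() rejects.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"port must be a decimal integer, got {text!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range 1..65535: {port}")
    return port


def test_parse_port():
    # Test asset for this increment: good input, then each kind of bad input.
    assert parse_port(" 8080 ") == 8080
    for bad in ["", "http", "-1", "0", "70000", "²"]:
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad input {bad!r}")


test_parse_port()
```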
Use trace logging
- Generating output describing the program state during execution
- In simpler cases, instrument code with print statements
- In more complex systems, take advantage of existing logging facilities
Alas
- Great way to stay 'on the path' when developing new code
- An easy first step in narrowing down a problem's scope
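A minimal trace-logging sketch using Python's standard `logging` module; the `inventory` logger and `remove_items` function are hypothetical. Entry and exit messages record the relevant state, and with `%`-style arguments the string formatting is skipped when DEBUG output is disabled.

```python
import logging

log = logging.getLogger("inventory")  # hypothetical subsystem name


def remove_items(stock: dict, item: str, qty: int) -> None:
    # Trace state on entry and exit; %-style arguments are only formatted
    # if the DEBUG level is actually enabled.
    log.debug("remove_items enter: item=%r qty=%d stock=%r", item, qty, stock)
    stock[item] = stock.get(item, 0) - qty  # no check for going negative
    log.debug("remove_items exit: %r now %d", item, stock[item])


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG,
                        format="%(levelname)s %(name)s: %(message)s")
    remove_items({"bolts": 3}, "bolts", 5)
```

Running it as a script prints a trace ending in `remove_items exit: 'bolts' now -2`, which points at the step where the stock went negative.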
Use debugging and analysis tools
- Compiler warnings
- Static code analysis tools (cppcheck, etc.)
- Interactive debuggers (gdb, lldb, udb, etc.)
- Time-travel debuggers (gdb, rr, udb, etc.)
- Sanitizers (asan, tsan, ubsan, etc.)
- Dynamic program analyzers (valgrind, etc.)
- Tracers (strace, wireshark, etc.)
Alas
- Best suited to deterministic problems
- Not always useful for non-deterministic problems
Enable and/or add assertions
- Verify pre-, current-, and post-conditions of a function call
- Verify expected program state
Alas
- little effect on execution speed
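A sketch of assertions checking a function's pre- and postconditions, using a hand-written integer square root as a stand-in for code under suspicion. In Python, running with `python -O` removes `assert` statements, much as defining `NDEBUG` disables `assert` in C and C++ builds.

```python
def integer_sqrt(n: int) -> int:
    """Return the largest r with r * r <= n, using Newton's method on integers."""
    # Precondition: a non-negative integer argument.
    assert isinstance(n, int) and n >= 0, f"precondition violated: n={n!r}"
    r = n
    while r * r > n:
        r = (r + n // r) // 2
    # Postcondition: r is the floor of the square root of n.
    assert r * r <= n < (r + 1) * (r + 1), f"postcondition violated: n={n}, r={r}"
    return r


print(integer_sqrt(15), integer_sqrt(16))  # 3 4
```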
Use backtracking
- Try to understand the program state at each backward step
Alas
- Good for very simple programs or small search spaces with deterministic problems
Divide and conquer (binary search)
- Pick a section of code to examine
- Place an assertion or set a breakpoint
- Narrow the section based on the result and repeat until the defect is revealed
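The binary search can be sketched as follows, assuming the candidates are ordered so that every good checkpoint precedes every bad one (for example, snapshots of program state along one execution path) and the last checkpoint is known to be bad. `first_bad` and the sample data are hypothetical; `git bisect` applies the same idea to a range of commits.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(items: Sequence[T], is_good: Callable[[T], bool]) -> int:
    """Return the index of the first item for which is_good() is False.

    Assumes every good item precedes every bad one and the last item is bad.
    """
    if not items:
        raise ValueError("need at least one (bad) item")
    lo, hi = 0, len(items) - 1  # invariant: items[hi] is bad, items[:lo] are good
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(items[mid]):  # like an assertion or breakpoint at the midpoint
            lo = mid + 1         # defect appears after mid
        else:
            hi = mid             # defect appears at or before mid
    return lo


# Hypothetical states after each of 1000 steps; step 613 corrupts all later states.
states = [i if i < 613 else -1 for i in range(1000)]
print(first_bad(states, lambda s: s >= 0))  # 613, after at most 10 checks
```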
Problem simplification
- Gradually and strategically remove/comment out sections of irrelevant code
Alas
- Useful for debugging crashes of release builds
- Work backwards from the end of the section
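A sketch of automating problem simplification with a simplified, greedy variant of delta debugging: drop large chunks of the input first, then smaller ones, and keep every removal after which the failure still reproduces. The `crashes` predicate is a hypothetical stand-in for "rebuild and run, then check for the symptom". The result is a smaller failing input, not necessarily the smallest one.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def simplify(parts: List[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Remove chunks of `parts` for as long as the failure still reproduces."""
    if not fails(parts):
        raise ValueError("start from an input that exhibits the symptom")
    chunk = max(1, len(parts) // 2)
    while True:
        i = 0
        while i < len(parts):
            candidate = parts[:i] + parts[i + chunk:]
            if candidate and fails(candidate):
                parts = candidate  # chunk was irrelevant: leave it out
            else:
                i += chunk         # chunk is needed (or removal empties input): keep it
        if chunk == 1:
            return parts
        chunk = max(1, chunk // 2)


def crashes(lines: List[str]) -> bool:
    # Hypothetical stand-in for "run the program and check for the symptom".
    return "x = 0" in lines and "print(1 / x)" in lines


lines = ["import os", "x = 0", "y = 2", "z = y * 3", "print(1 / x)", "print(z)"]
print(simplify(lines, crashes))  # ['x = 0', 'print(1 / x)']
```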
Make the problem worse
- Magnifies the problem signal.
Alas
- Helpful as a first step in finding and understanding the problem
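One way to make a problem worse is to widen the window in which it can happen. In this hypothetical Python sketch, an unsynchronized counter rarely loses updates; calling `time.sleep(0)` between the read and the write gives other threads a chance to run exactly there, so lost updates become far more likely. `racy_count` is an illustration of the technique, not a recommended pattern; the actual fix would be a lock (`threading.Lock`).

```python
import threading
import time


def racy_count(threads: int, increments: int, widen_window: bool) -> int:
    """Increment a shared counter without a lock and return the final value."""
    counter = 0

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            value = counter          # read
            if widen_window:
                time.sleep(0)        # let other threads run between read and write
            counter = value + 1      # write (may overwrite another thread's update)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


if __name__ == "__main__":
    expected = 4 * 200
    print("plain:  ", racy_count(4, 200, widen_window=False), "of", expected)
    print("widened:", racy_count(4, 200, widen_window=True), "of", expected)
```

The exact counts vary from run to run, which is itself the non-deterministic symptom being magnified.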
Scientific method
- Form a hypothesis consistent with observations
- Implement tests to refute the hypothesis
- If refuted, form a new hypothesis with new tests
Alas
- Time-consuming, especially for an unfamiliar code base
- Effective for all kinds of problems
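A toy sketch of the refute-or-keep loop: each hypothesis predicts, for a given input, whether the symptom will appear, and a single probe where prediction and observation disagree refutes it. `run_case` is a hypothetical stand-in for running the real system, and the hypotheses are illustrative. A surviving hypothesis has not been refuted yet; it has not been proven.

```python
from typing import Callable, Dict, Iterable, List


def run_case(x: int) -> bool:
    """Hypothetical system under test: True means the symptom appeared."""
    return x > 0 and x % 8 == 0  # the trigger we are trying to discover


# Each hypothesis predicts, for an input, whether the symptom will appear.
HYPOTHESES: Dict[str, Callable[[int], bool]] = {
    "all positive inputs": lambda x: x > 0,
    "even inputs": lambda x: x % 2 == 0,
    "positive multiples of 8": lambda x: x > 0 and x % 8 == 0,
}


def surviving(hypotheses: Dict[str, Callable[[int], bool]],
              probes: Iterable[int]) -> List[str]:
    """Return the hypotheses that no probe refutes."""
    probes = list(probes)
    # One probe where prediction and observation disagree refutes a hypothesis.
    return [name for name, predicts in hypotheses.items()
            if all(predicts(x) == run_case(x) for x in probes)]


print(surviving(HYPOTHESES, range(-20, 41)))  # ['positive multiples of 8']
```

With too few probes, wrong hypotheses can survive: probing only `[1]` leaves both "even inputs" and "positive multiples of 8" standing.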
Problem Types
Deterministic problems
- Review the logs
- Add assertions
- Use interactive debugger
Non-deterministic problems
- Review the logs
- Create a debug build and see if it also exhibits the same symptoms
- Add assertions where needed to verify invariants
- Add assertions; comment out code, divide-and-conquer
- Make the problem WORSE to magnify the problem
- Try low-overhead debugging tools, e.g. an optimized build with debug info ("$ gcc -g -O2")
Steps
Classifying
- Determining a defect's category
- Useful in formulating a repair strategy
- Important information in subsequent reviews when considering preventive actions
Syntax errors
Syntax warnings
Implementation errors
Logic errors
Configuration errors
Repairing the problem
- Implementing the appropriate fix
- Passing the tests
- Tests should be well written
- Minimize changes to the system - keep changes small and localized
- Verify repairs against test assets
- All new/updated tests should pass
- All other tests should pass
Delivery
- Practice good version control
- Don't include fixes for more than one problem in one commit
- Don't include extraneous changes (e.g. new features) in fix commits
- Include new/updated test assets in the fix commits
- Write clear and concise commit messages
Verify tests again
- Double check all new/updated tests pass
- Double check all other tests pass
Create documentation for posterity
- How the defect was noticed
- The conditions under which the defect occurred - the context
- Steps necessary to reproduce the defect - the analogous context
- Techniques and tools used to localize the defect
- Defect's category
- Underlying root cause of the defect
- Latent defects precluded by fixing this defect
- Possible latent defects left unaddressed
- Mistakes made and recommendations for preventive actions
Developing new features
- Practice defensive programming
- Assume the worst case could happen at any time.
- Employ an appropriate iterative and incremental development process
- Decide what needs to be achieved
- Formulate a plan for the achievement
- Understand the invariants, requirements, and context, then design the solution.
- Implement the solution in small, discrete, testable chunks
- Write code to verify invariants and pre-, current-, and post-conditions, and to self-test complex components
- Consider employing the principles of test-driven design
- Employ good configuration management practices EVERYWHERE.
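A sketch of a component that verifies its own invariant and its pre- and postconditions, in the spirit of the advice above. `BoundedStack` is a hypothetical example; preconditions that callers can violate raise exceptions, while internal consistency checks use `assert`, so they can be disabled with `python -O`.

```python
from typing import Any, List


class BoundedStack:
    """Fixed-capacity stack that checks its own invariant after every change."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")  # precondition
        self._capacity = capacity
        self._items: List[Any] = []
        self._check_invariants()

    def _check_invariants(self) -> None:
        # Invariant: the number of stored items stays within [0, capacity].
        assert 0 <= len(self._items) <= self._capacity, "size invariant broken"

    def push(self, item: Any) -> None:
        if len(self._items) >= self._capacity:
            raise OverflowError("stack is full")  # precondition
        before = len(self._items)
        self._items.append(item)
        assert len(self._items) == before + 1, "push postcondition broken"
        self._check_invariants()

    def pop(self) -> Any:
        if not self._items:
            raise IndexError("pop from empty stack")  # precondition
        item = self._items.pop()
        self._check_invariants()
        return item

    def __len__(self) -> int:
        return len(self._items)
```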