May 7, 2022

[CPPCON 2021][notes] Debugging/Developing techniques

Reference:
https://www.youtube.com/watch?v=M7fV-eQwxrY

Define bugs

  • System is subject to a set of requirements.
  • A software defect is a non-conformity to requirements.
  • Example: a pre/curr/post condition is violated.
  • A non-conformity is a failure to meet one or more requirements.
  • A defect is incorrect program data that causes a non-conformity.
  • A symptom is observable evidence of a defect.
  • A deterministic defect is a defect that does not change its symptoms under a well-defined set of conditions.
  • In contrast, a non-deterministic defect is a defect that changes its symptoms from run-to-run under a well-defined set of conditions.
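To make these terms concrete, here is a minimal sketch. The function names and the requirement ("return the arithmetic mean") are assumptions made up for this illustration; they are not from the talk.

```cpp
#include <vector>

// Hypothetical requirement: return the arithmetic mean of the values.
// Defect: the sum and the count are both integers, so the division
// truncates before the result is converted to double.
double mean_with_defect(const std::vector<int>& values) {
    if (values.empty()) return 0.0;
    int sum = 0;
    for (int v : values) sum += v;
    return sum / static_cast<int>(values.size());  // integer division
}

// Repaired version: accumulate and divide in floating point.
double mean_repaired(const std::vector<int>& values) {
    if (values.empty()) return 0.0;
    double sum = 0.0;
    for (int v : values) sum += v;
    return sum / static_cast<double>(values.size());
}
```

Calling `mean_with_defect({1, 2})` yields 1 instead of 1.5. That wrong value is the symptom, the truncating division is the defect, and because the same input gives the same wrong value on every run, the defect is deterministic.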


Terminology

  • A context is the totality of the environment in which a program that exhibits symptoms is running
  • A problem report describes one or more symptoms in some context
  • Analogous context is a replica of the original context
  • Lab is the setting in which we have total control over the context
  • Field is the setting in which we have minimal or no control over the context


Relationship

Problem report -> Symptoms <-> Defects


Challenges

  • Problem report can be unhelpful (feedback from the user)
  • Problem report may not indicate actual problem
  • Collecting program state data may be difficult (log/setting/dump)
  • Symptoms may not indicate the cause
  • Defects and symptoms change as repair progresses
  • Fixing one defect may introduce new defects (messy design/quick fix)
  • Symptoms can be difficult to reproduce


Debugging process

We tend to think of debugging as a linear process, i.e.:
  • Characterize and reproduce
  • Locate
  • Classify
  • Understand
  • Repair


In reality

  • Review problem report
  • Characterize and reproduce problem
  • Clone if possible
  • Reproduce problem (loop)
    • understand problem
    • locate problem
    • classify problem
    • gain insight
    • attempt to repair
  • Problem fixed; deliver



In detail

Characterizing

  • Determining the context in which symptoms were observed
  • Version number, platform, resources allocated, external interfaces, configuration data, etc.
  • Information that allows you to instantiate an analogous context


Reproducing

  • Instantiating an analogous context, in the lab, or in the field
  • Running enough of the program/system to observe the reported symptoms
  • Developing new/updating existing test assets to demonstrate the failure
  • Make sure you are looking at the correct source code.

Characterizing and reproducing a problem is vital to the debugging process.


Understanding

Gaining ENOUGH knowledge about a problem and the surrounding code that you believe you can make the changes needed to carry out a repair.

At a minimum

  • Locate the incorrect lines of code
  • Understand why the code is incorrect (the root cause)
  • Check the proposed classification
  • Formulate a set of proposed changes
  • Determine how the proposed changes could affect the runtime state

Inspect and verify the associated test assets

  • The test cases or harnesses may be broken
  • Test data should demonstrate correct and incorrect behavior

The defect may not be where you expect it

  • Keep an open mind and be ready to question all parts of the program

Ask yourself where the defect is not

  • Trying to prove the absence of a defect can reveal the defect

Explain to people why there is a defect, and why your proposed fix will resolve the defect

  • A local guru or bobblehead could be helpful - reach out for help if necessary


Locate the problem

Employ good development practices at the outset

  • Practice iterative, incremental, bottom-up development
  • Add functionality in small sections of code
  • Create test assets for each new increment of functionality
  • Verify that new code doesn't cause previous test cases to fail
  • Verify that new code passes its own test cases
  • Practice defensive programming
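A minimal sketch of one small increment delivered together with its test asset. `parse_port` and `test_parse_port` are hypothetical names invented for this example; the point is only the shape: a small piece of functionality, written defensively, with tests that get rerun after every later change.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical increment of functionality: parse a TCP port number.
// Defensive: reject empty, non-numeric, and out-of-range input instead of
// assuming the caller passed something sensible.
std::optional<int> parse_port(const std::string& text) {
    if (text.empty() || text.size() > 5) return std::nullopt;  // also rules out overflow
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + (c - '0');
    }
    if (value < 1 || value > 65535) return std::nullopt;
    return value;
}

// Test asset written together with the increment; rerun it (and all
// earlier tests) whenever new code is added.
void test_parse_port() {
    assert(parse_port("80") == 80);
    assert(parse_port("65535") == 65535);
    assert(!parse_port(""));
    assert(!parse_port("0"));
    assert(!parse_port("65536"));
    assert(!parse_port("8o80"));
}
```

This assumes C++17 for `std::optional`. In a real project the checks would more likely live in a test framework than in a plain function.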

Alas

  • Well-written and extensive test assets
  • Preferably the whole product does this; at a minimum, your fixes should
  • Adds runtime overhead, which can hinder the search for non-deterministic problems

Use trace logging

  • Generating output describing the program state during execution
  • In simpler cases, instrument code with print statements
  • In more complex systems, take advantage of existing logging facilities
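For the simple case of instrumenting code with print statements, something like the following sketch may be enough. The `TRACE` macro, `trace_line`, and `clamp_percent` are hypothetical names for this example; a larger system would use its existing logging facility instead.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: format one trace line with its source location.
std::string trace_line(const char* file, int line, const std::string& msg) {
    std::ostringstream out;
    out << file << ':' << line << ": " << msg;
    return out.str();
}

// Write a trace line to stderr, tagged with the call site.
#define TRACE(msg) (std::cerr << trace_line(__FILE__, __LINE__, (msg)) << '\n')

// Example of a function instrumented with trace output on entry and exit.
int clamp_percent(int value) {
    TRACE("clamp_percent: value=" + std::to_string(value));
    if (value < 0) value = 0;
    if (value > 100) value = 100;
    TRACE("clamp_percent: result=" + std::to_string(value));
    return value;
}
```

Writing to stderr keeps the trace separate from the program's normal output, so it can be redirected to a file without disturbing anything that reads stdout.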

Alas

  • Great way to stay 'on the path' when developing new code
  • An easy first step in narrowing down a problem's scope

Use debugging and analysis tools

  • Compiler warnings
  • Static code analysis tools (cppcheck, etc.)
  • Interactive debuggers (gdb, lldb, udb, etc.)
  • Time-travel debuggers (gdb, rr, udb, etc.)
  • Sanitizers (asan, tsan, ubsan, etc.)
  • Dynamic program analyzers (valgrind, etc.)
  • Tracers (strace, wireshark, etc.)

Alas

  • Well suited to deterministic problems
  • Not always useful for non-deterministic problems

Enable and/or add assertions

  • verify the pre/curr/post conditions of a function call
  • verify expected program state
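A small sketch of what such assertions might look like, using the standard `assert` macro. `index_of_max` is a hypothetical function chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function: index of the largest element.
std::size_t index_of_max(const std::vector<int>& values) {
    // Precondition: the caller must pass at least one element.
    assert(!values.empty());

    std::size_t best = 0;
    for (std::size_t i = 1; i < values.size(); ++i) {
        // Expected state inside the loop: best refers to an element already visited.
        assert(best < i);
        if (values[i] > values[best]) best = i;
    }

    // Postcondition: no element is larger than the one we return.
    for (int v : values) assert(v <= values[best]);
    return best;
}
```

Standard `assert` is compiled out when `NDEBUG` is defined, which is typical for release builds, so these checks cost nothing there but also catch nothing.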

Alas


Use backtracking

  • Try to understand the program state at each backward step

Alas

  • Good for very simple programs/small search with deterministic problems

Divide and conquer (binary search)

  • Pick section of code to examine
  • Place an assertion or set a breakpoint.
  • Repeat until the defect is revealed
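The same idea can be automated when the program is a sequence of steps and there is a check that tells good state from bad. The sketch below is only an illustration under those assumptions; `State`, `Step`, `Check`, and `first_bad_step` are hypothetical names, and it further assumes that the check passes before any step runs, fails after all of them, and keeps failing once it has failed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<int>;
using Step  = std::function<void(State&)>;
using Check = std::function<bool(const State&)>;

// Run the first `count` steps on a fresh copy of the input and report
// whether the state still passes the check afterwards.
bool good_after(const State& input, const std::vector<Step>& steps,
                std::size_t count, const Check& check) {
    State state = input;
    for (std::size_t i = 0; i < count; ++i) steps[i](state);
    return check(state);
}

// Binary search for the first step that breaks the check.
// Returns the zero-based index of the offending step.
std::size_t first_bad_step(const State& input, const std::vector<Step>& steps,
                           const Check& check) {
    assert(!steps.empty());
    std::size_t good = 0;             // check passes after `good` steps
    std::size_t bad = steps.size();   // check fails after `bad` steps
    while (bad - good > 1) {
        std::size_t mid = good + (bad - good) / 2;
        if (good_after(input, steps, mid, check)) good = mid;
        else bad = mid;
    }
    return bad - 1;
}
```

With four steps, this needs two probes instead of up to four; the saving grows with the number of steps. When debugging by hand, the "probe" is the assertion or breakpoint placed at the midpoint.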

Problem simplification

  • Gradually and strategically remove/comment out sections of irrelevant code

Alas

  • useful for debugging crashes of release builds
  • work backwards from the end of the section

Make the problem worse

  • Magnifies the problem signal.

Alas

  • helpful as a first step in finding and understanding the problem

Scientific method

  • Form a hypothesis consistent with observations
  • Implement tests to refute the hypothesis
  • If refuted, form a new hypothesis with new tests

Alas

  • time-consuming, especially for a code base you are unfamiliar with
  • effective for all problems


Problem Types

Deterministic problems

  • Review the logs
  • Add assertions
  • Use interactive debugger

Non-deterministic problems

  • Review the logs
  • Create a debug build and see if it also exhibits the same symptoms
  • Add assertions where needed to verify invariants
  • Add assertions; comment out code, divide-and-conquer
  • Make the problem WORSE to magnify the problem
  • Try low-overhead debugging tools, e.g. an optimized build with debug info: "$ gcc -g -O2"



Steps

Classifying

  • Determining a defect's category
  • Useful in formulating a repair strategy
  • Important information in subsequent reviews when considering preventive actions

  • Syntax errors
  • Syntax warnings
  • Implementation errors
  • Logic errors
  • Configuration errors


Repairing the problem

  • Implementing the appropriate fix.
  • Passing the tests
  • Tests should be well written
  • Minimize changes to the system - keep changes small and localized
  • Verify repairs against test assets
    • All new/updated tests should pass
    • All other tests should pass

Delivery

  • Practice good version control
  • Don't include fixes for more than one problem in one commit
  • Don't include extraneous changes (e.g. new features) in fix commits
  • Include new/updated test assets in the fix commits
  • Write clear and concise commit comments

Verify tests again

  • Double check all new/updated tests pass
  • Double check all other tests pass

Create documentation for posterity

  • How the defect was noticed
  • The conditions under which the defect occurred - the context
  • Steps necessary to reproduce the defect - the analogous context
  • Techniques and tools used to localize the defect
  • Defect's category
  • Underlying root cause of the defect
  • Latent defects precluded by fixing this defect
  • Possible latent defects left unaddressed
  • Mistakes made and recommendations for preventive actions



Developing new feature

  • Practice defensive programming
    • Assume the worst case could happen at any time.
  • Employ an appropriate iterative and incremental development process
    • Decide what needs to be achieved
    • Formulate a plan for the achievement
    • Understand the invariants, requirements, and context, then design the solution.
    • Implement the solution in small, discrete, testable chunks
    • Write code to verify invariants and pre/curr/post conditions, and to self-test complex components
  • Consider employing the principles of test-driven design
  • Employ good configuration management practice EVERYWHERE.
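As a rough sketch of what "write code to verify invariants and self-test complex components" could look like for a small class: `BoundedStack` and `bounded_stack_self_test` are hypothetical names invented for this example, not something shown in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-capacity stack used to show invariant checks.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {
        assert(invariant_holds());
    }

    void push(int value) {
        // Defensive: report misuse instead of silently growing past capacity.
        if (items_.size() == capacity_) throw std::length_error("stack full");
        items_.push_back(value);
        assert(invariant_holds());  // state check after the change
    }

    int pop() {
        if (items_.empty()) throw std::out_of_range("stack empty");
        int value = items_.back();
        items_.pop_back();
        assert(invariant_holds());
        return value;
    }

    std::size_t size() const { return items_.size(); }

private:
    // Invariant: the stack never holds more items than its capacity.
    bool invariant_holds() const { return items_.size() <= capacity_; }

    std::size_t capacity_;
    std::vector<int> items_;
};

// Self-test for the component; cheap enough to run at startup or in CI.
bool bounded_stack_self_test() {
    BoundedStack s(2);
    s.push(1);
    s.push(2);
    bool rejected_overflow = false;
    try { s.push(3); } catch (const std::length_error&) { rejected_overflow = true; }
    return rejected_overflow && s.pop() == 2 && s.pop() == 1 && s.size() == 0;
}
```

The exceptions guard against misuse by callers and stay active in every build, while the `assert` calls check the class's own logic and disappear when `NDEBUG` is defined.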
