May 9, 2016

[C][C++][UB] in detail.

Reference:
Both true and false: a Zen moment with C

asm: test instruction:
https://web.itu.edu.tr/kesgin/mul06/intel/instr/test.html

asm: SETNE instruction:
https://web.itu.edu.tr/kesgin/mul06/intel/instr/setne_setnz.html

code:

#include <stdio.h>
#include <stdbool.h>

int main(int argc, char *argv[])
{
    volatile bool p;  // deliberately left uninitialized: reading it is UB

    if ( p )
        puts("p is true");
    else
        puts("p is not true");

    if ( ! p )
        puts("p is false");
    else
        puts("p is not false");

    return 0;
}
asm code:
 .file   "bool1.c"
        .intel_syntax noprefix
        .section        .rodata
.LC0:
        .string "p is true"
.LC1:
        .string "p is not true"
.LC2:
        .string "p is false"
.LC3:
        .string "p is not false"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        push    rbp
.LCFI0:
        mov     rbp, rsp
.LCFI1:
        sub     rsp, 32
.LCFI2:
        mov     DWORD PTR [rbp-20], edi
        mov     QWORD PTR [rbp-32], rsi
        movzx   eax, BYTE PTR [rbp-1]
        test    al, al
        je      .L2
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        jmp     .L3
.L2:
        mov     edi, OFFSET FLAT:.LC1
        call    puts
.L3:
        movzx   eax, BYTE PTR [rbp-1]
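        xor     eax, 1  # HERE: !p is computed as p ^ 1, which is only correct when p holds 0 or 1.
                        # p is uninitialized, so its byte may hold another value, e.g. 2. Then both
                        # p (2) and p ^ 1 (3) are nonzero, so both "p is true" and "p is false" print.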
        test    al, al
        je      .L4
        mov     edi, OFFSET FLAT:.LC2
        call    puts
        jmp     .L5
.L4:
        mov     edi, OFFSET FLAT:.LC3
        call    puts
.L5:
        mov     eax, 0
        leave
.LCFI3:
        ret
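The two branches in the listing can be mirrored in well-defined C++ by working on the raw byte that movzx loads from [rbp-1]. This is a sketch of this particular listing only (other compilers and optimization levels may generate different code); takes_true_branch and takes_false_branch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors "movzx eax, BYTE PTR [rbp-1]; test al, al; je .L2":
// "p is true" is printed whenever the stored byte is nonzero.
bool takes_true_branch(std::uint8_t stored_byte)
{
    return stored_byte != 0;
}

// Mirrors "movzx eax, BYTE PTR [rbp-1]; xor eax, 1; test al, al; je .L4":
// "p is false" is printed whenever stored_byte ^ 1 is nonzero, i.e. for
// every byte except 1, not just for 0.
bool takes_false_branch(std::uint8_t stored_byte)
{
    return static_cast<std::uint8_t>(stored_byte ^ 1u) != 0;
}
```

For a stored byte of 0 or 1 exactly one branch fires, but for 2 both functions return true, which is how the program can print both "p is true" and "p is false". The fix is simply to initialize p, e.g. volatile bool p = false;.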
 
Reference:
Undefined behavior can result in time travel

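If a code path contains UB, the compiler may assume that path is never taken. This can collapse several code paths into one, and can even change the behavior of code that runs before the UB would occur.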

code:
int table[4];
bool exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {  // bug: i == 4 reads table[4], one past the end (UB)
        if (table[i] == v) return true;
    }
    return false;
}

inference:
A classical compiler would simply read whatever lies past the end of table. A post-classical compiler, on the other hand, might perform the following analysis:
  • The first four times through the loop, the function might return true.
  • When i is 4, the code performs undefined behavior.
  • Since undefined behavior lets me do anything I want, I can totally ignore that case and proceed on the assumption that i is never 4. (If the assumption is violated, then something unpredictable happens, but that’s okay, because undefined behavior grants me permission to be unpredictable.)
  • The case where i is 5 never occurs, because in order to get there, I first have to get through the case where i is 4, which I have already assumed cannot happen.
  • Therefore, all legal code paths return true.


and therefore compile the function as if it were written:
bool exists_in_table(int v)
{
    return true;
}
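The fix is to keep the index inside the array. A minimal sketch that derives the bound from the array itself (sizeof table / sizeof table[0]) so it cannot drift from the declared length; with the out-of-bounds read gone, the compiler has no licence to fold the function to return true.

```cpp
#include <cassert>
#include <cstddef>

int table[4];

// Fixed version: i < 4 instead of i <= 4, so table[4] (one past the end)
// is never read.
bool exists_in_table(int v)
{
    const std::size_t count = sizeof table / sizeof table[0];
    for (std::size_t i = 0; i < count; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
```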

Reference:
What Every C Programmer Should Know About Undefined Behavior #1/3
What Every C Programmer Should Know About Undefined Behavior #2/3
What Every C Programmer Should Know About Undefined Behavior #3/3
A Guide to Undefined Behavior in C and C++, Part 1
A Guide to Undefined Behavior in C and C++, Part 2
A Guide to Undefined Behavior in C and C++, Part 3

  • Interacting Compiler Optimizations Lead to Surprising Results
  • Undefined Behavior and Security Don't Mix Well
  • Debugging Optimized Code May Not Make Any Sense.
  • "Working" code that uses undefined behavior can "break" as the compiler evolves or changes
  • There is No Reliable Way to Determine if a Large Codebase Contains Undefined Behavior


UBs:
  • Use of an uninitialized variable
  • Signed integer overflow
  • Oversized Shift Amounts
  • Dereferences of Wild Pointers and Out of Bounds Array Accesses
  • Dereferencing a NULL Pointer
  • Violating Type Rules, e.g. casting an int* to a float* and dereferencing it (accessing the "int" as if it were a "float").
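Several items in this list have well-defined alternatives. A sketch of three of them, with hypothetical helper names: test for signed overflow before doing the addition, reject shift amounts that are too large, and copy bytes with memcpy instead of dereferencing a cast pointer. The last helper assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

// Signed integer overflow: check *before* adding, so the overflowing
// addition is never evaluated. Returns false if a + b would overflow.
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}

// Oversized shift amounts: shifting a 32-bit value by 32 or more is UB,
// so reject it. An unsigned type also avoids shifting into a sign bit.
bool checked_shl(std::uint32_t x, unsigned amount, std::uint32_t *result)
{
    if (amount >= 32)
        return false;
    *result = x << amount;
    return true;
}

// Type rules: instead of *(float *)&bits, copy the bytes with memcpy,
// which is well defined (compilers usually turn it into a plain move).
float bits_to_float(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```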


Reference:
Adventures in undefined behavior: The premature downcast

"If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined."
In other words, if you are invoking a method on an object of type X, then you are promising that it really is of type X, or a class derived from it.

code:
class Shape
{
public:
    virtual bool Is2D() { return false; }
};

class Shape2D : public Shape
{
public:
    virtual bool Is2D() { return true; }
};

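struct Cookie { int id; };  // placeholder: the real Cookie type is not shown
Shape *FindShape(Cookie cookie);  // may return any kind of Shape

void BuyPaint(Cookie cookie)
{
    // The static_cast promises that the result really is a Shape2D.
    // If FindShape() returned some other Shape, this is already UB.
    Shape2D *shape = static_cast<Shape2D *>(FindShape(cookie));
    // Relying on that promise, the compiler may treat this test as
    // always true (Shape2D::Is2D() returns true) and drop the check.
    if (shape->Is2D()) {
        // ... do all sorts of stuff ...
    }
}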
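A safer version checks the dynamic type before committing to the downcast. This sketch replaces the unknown Cookie/FindShape pair with a plain Shape * parameter; TryBuyPaint is a hypothetical name, and it assumes Shape is polymorphic (it is, since Is2D() is virtual).

```cpp
#include <cassert>

class Shape
{
public:
    virtual ~Shape() = default;
    virtual bool Is2D() { return false; }
};

class Shape2D : public Shape
{
public:
    bool Is2D() override { return true; }
};

// Hypothetical stand-in for BuyPaint(): takes the Shape * that FindShape()
// would have returned and reports whether the 2D path ran.
bool TryBuyPaint(Shape *found)
{
    // dynamic_cast checks the real type at run time and yields nullptr
    // for a non-2D shape (or a null input), so no false promise is made.
    Shape2D *shape = dynamic_cast<Shape2D *>(found);
    if (shape != nullptr) {
        // ... do all sorts of stuff with *shape ...
        return true;
    }
    return false;
}
```

Calling found->Is2D() through the Shape * first and only then static_cast-ing is equally well defined; the bug in the original is doing the cast before the check.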

Reference:
A static_cast is not always just a pointer adjustment

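A static_cast between base and derived pointers may adjust the pointer by a fixed offset, e.g. under multiple inheritance. The rule for null pointers, however, is that casting a null pointer to any pointer type yields a null pointer, so the compiler must emit a null check before applying the offset.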
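With multiple inheritance the adjustment is visible: converting between C * and B * must skip over the A part, yet a null pointer has to stay null. A small sketch; A, B, C, to_b, and to_c are hypothetical names.

```cpp
#include <cassert>

struct A { int a = 1; };
struct B { int b = 2; };
struct C : A, B { int c = 3; };

// Derived-to-base conversion: for a non-null pointer this may add an offset
// (the B subobject cannot share an address with the A subobject), but a
// null C * must still become a null B *.
B *to_b(C *p) { return p; }

// Base-to-derived static_cast: the inverse adjustment, with the same null rule.
C *to_c(B *p) { return static_cast<C *>(p); }
```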

Reference:
A bit of background on compilers exploiting signed overflow


------------
For infinite loops, the C++11 standard lets the compiler assume forward progress:
The implementation may assume that any thread will eventually do one of the following:
  • terminate,
  • make a call to a library I/O function, 
  • access or modify a volatile object, 
  • or perform a synchronization operation or an atomic operation.

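Consequently, in C++11 an infinite loop that does none of these things (e.g. while (1) {}) is UB, and the compiler may remove it. C11 has a similar rule, but it does not apply when the loop's controlling expression is a constant expression, so while (1) {} is well defined in C.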
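A wait loop stays well defined if every iteration does one of the listed actions. A sketch using an atomic load, which counts as "an atomic operation", so the compiler may not assume the loop ends on its own and must keep re-reading the flag. spin_until_set and wait_for_worker are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Each iteration performs an atomic load, so this loop satisfies the
// forward-progress rule even while it is still waiting.
void spin_until_set(const std::atomic<bool> &flag)
{
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();  // let other threads run while waiting
    }
}

// Starts a thread that sets the flag, then waits for it.
bool wait_for_worker()
{
    std::atomic<bool> ready{false};
    std::thread worker([&ready] { ready.store(true, std::memory_order_release); });
    spin_until_set(ready);
    worker.join();
    return ready.load();
}
```

Spinning on a plain, non-atomic bool instead would be both a data race and a side-effect-free loop, which is exactly the case the rule lets the compiler optimize away.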

Reference:
Compilers and Termination Revisited
Is this infinite recursion UB?
Optimizing away a “while(1);” in C++0x
is C implementation allowed to terminate an infinite loop?
[rust] LLVM loop optimization can make safe programs crash

--
Principles for Undefined Behavior in Programming Language Design - John Regehr
