Aug 12, 2018

[C++][clang][gcc] tail call optimization

#include <iostream>
using namespace std;

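// Despite the name, voidret returns int.
// There is no base case: every path ends in the recursive tail call,
// so the function never returns and has no side effects.
int voidret(int i)
{
    if (i < 0) {
        i++;
    }
    else {
        i--;
    }
    return voidret(i);  // tail call
}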


int main()
{
    auto i = voidret(10);
}

clang++ -O3 result:
2030951640

g++ -O3 result:
indefinite (the program never terminates)

clang++ -O3 assembly:
voidret(int):                            # @voidret(int)
        ret
main:                                   # @main
        xor     eax, eax
        ret
_GLOBAL__sub_I_example.cpp:             # @_GLOBAL__sub_I_example.cpp
        push    rax
        mov     edi, offset std::__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edi, offset std::ios_base::Init::~Init() [complete object destructor]
        mov     esi, offset std::__ioinit
        mov     edx, offset __dso_handle
        pop     rax
        jmp     __cxa_atexit            # TAILCALL

g++ -O3 assembly:
voidret(int):
.L2:
        jmp     .L2
main:
        mov     edi, 10
        call    voidret(int)
_GLOBAL__sub_I_voidret(int):
        sub     rsp, 8
        mov     edi, OFFSET FLAT:_ZStL8__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZStL8__ioinit
        mov     edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        add     rsp, 8
        jmp     __cxa_atexit

Reasoning:
https://stackoverflow.com/questions/18478078/clang-infinite-tail-recursion-optimization

quote:
While both g++ and clang++ are able to compile C++98 and C++11 code, clang++ was designed from the start as a C++11 compiler and has some C++11 behaviors embedded in its DNA.

With C++11 the C++ standard became thread aware, and that means there is now some specific thread behavior. In particular 6.8.2.2 states:
The implementation may assume that any thread will eventually do one of the following:
  • terminate,
  • make a call to a library I/O function,
  • perform an access through a volatile glvalue, or
  • perform a synchronization operation or an atomic operation.
[ Note: This is intended to allow compiler transformations such as removal of empty loops, even when termination cannot be proven. — end note ]

And that is precisely what clang++ is doing when optimizing. It sees that the function has no side effects and removes it even if it does not terminate.
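
Not part of the quote: a minimal sketch of two well-defined variants of voidret, for contrast. The names step_to_zero, step_to_zero_counted and ticks are hypothetical and do not appear in the original program. The first variant adds a base case, so the recursion terminates and the compiler has nothing to assume. The second performs a volatile access on every call, which is one of the operations in the list above, so each access should have to be kept.

```cpp
// Hypothetical terminating variant of voidret: same stepping logic,
// plus a base case that ends the recursion at zero.
int step_to_zero(int i)
{
    if (i == 0) {
        return 0;               // base case: recursion stops here
    }
    if (i < 0) {
        i++;
    }
    else {
        i--;
    }
    return step_to_zero(i);     // still a tail call
}

// Hypothetical counter; volatile makes every read and write observable.
volatile int ticks = 0;

// Same recursion, but each call performs a volatile access.
int step_to_zero_counted(int i)
{
    ticks = ticks + 1;          // volatile read followed by volatile write
    if (i == 0) {
        return 0;
    }
    return step_to_zero_counted(i < 0 ? i + 1 : i - 1);
}
```

Calling step_to_zero(10) returns 0 after ten steps, and step_to_zero_counted(3) leaves ticks at 4 (calls with i = 3, 2, 1, 0). Neither variant depends on the forward-progress assumption, so clang++ and g++ should agree on both.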
