Nov 6, 2024

[C++] name lookup minor refresh

One colleague asked in chat: why does naming a privately inherited base type inside a member function of a derived type fail access control with "'BaseType' is a private member of 'BaseType'"?

e.g. https://godbolt.org/z/hWef37sT9

The answer lies in the language's basic name lookup rules, ISO C++ 6.5 [basic.lookup]: https://timsong-cpp.github.io/cppwp/basic.lookup.unqual


To summarize:

  • Name lookup
    This is the process by which the compiler determines the meaning of a name, without regard to access permissions. This lookup will find the declaration or definition of the name in the applicable scopes (namespace, class, or enclosing function).
  • Access control
    After name lookup has identified the entity, the compiler checks access control to determine whether it is permissible to use the entity based on access specifiers like public, protected, or private.

In essence, the sequence is:

  1. First, perform name lookup.
  2. Then, check access control on the resolved name.

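In the colleague's case, unqualified lookup of PrivateBase inside Child first finds PrivateBase's injected-class-name (every class's own name is declared inside the class as if it were a public member), inherited through Middle. Because Middle inherits PrivateBase privately, that inherited name is inaccessible in Child, so the access check fails. The simple fix is to qualify the name so that lookup finds the namespace-scope class instead: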
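class PrivateBase {
};

class Middle: private PrivateBase {
};

class Child: public Middle {
    // ::PrivateBase is looked up in the global namespace, bypassing the
    // inaccessible injected-class-name inherited through Middle.
    ::PrivateBase Fooey() {
        return ::PrivateBase();
    }
};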
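The injected-class-name really does behave like an inherited member, which a compile-time check makes visible. A sketch reusing the classes above; the make member function and the static_assert are illustrative additions, not part of the original snippet:

```cpp
#include <type_traits>

class PrivateBase {};
class Middle : private PrivateBase {};

class Child : public Middle {
public:
    // Unqualified PrivateBase here would resolve to the injected-class-name
    // inherited through Middle's *private* base and fail the access check.
    ::PrivateBase make() const { return ::PrivateBase(); }
};

// Middle's own injected-class-name is a public member of Middle, and Middle is
// a public base of Child, so Child::Middle is a valid name for the class Middle.
static_assert(std::is_same_v<Child::Middle, Middle>);

// By contrast, Child::PrivateBase is found by lookup but is inaccessible,
// so writing it here would not compile.
```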

[C++] include from /dev/stdin

When I published the Concurrent LRUCache used at my previous job at LinkedIn, I demoed it on godbolt (Compiler Explorer) using its ability to #include headers over HTTPS.
e.g.: https://godbolt.org/z/Y6he8z9Gf

It turns out a similar effect can be achieved locally with #include "/dev/stdin", piping the header content in on standard input:

#include "/dev/stdin"

int main(){
  foo f{};
  (void)(f);
}

$ echo "struct foo {};" | clang++ -std=c++20 main.cc -Wall

To pull the header from a URL instead, just replace the echo part with `wget -qO- <url>`.

Nov 3, 2024

[C++] Security in C++ - Hardening Techniques From the Trenches

Reference:
Security in C++ - Hardening Techniques From the Trenches - Louis Dionne - C++Now 2024
https://libcxx.llvm.org/Hardening.html

BCE (bounds check elimination)
https://en.wikipedia.org/wiki/Bounds-checking_elimination


Types of memory safety (as of 2024)

  • Spatial memory safety(*)
  • Temporal memory safety(*)
  • Type safety
  • Guaranteed initialization
  • Thread safety


Spatial memory safety

  •  Each memory allocation has a given size.
  •  Accessing memory out of bounds is called an out-of-bounds (OOB) access.

Temporal memory safety

  •  All memory accesses to an object should occur during the lifetime of the object's allocation.
  •  Access to the object outside of this window is called a use-after-free

Type safety

  •  A memory allocation is used to represent an object of a particular type.
  •  Interpreting it as an object of a different type is called a type confusion.

Guaranteed initialization

  •  When memory is allocated, it contains garbage
  •  Using that undefined content can lead to information disclosure
  •  Can also be exploited if the attacker controls the 'garbage'
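In C++ terms, a default-initialized local array of scalars holds indeterminate values, while value-initialization with {} zeroes it. A small sketch of the difference (sum_value_initialized is a hypothetical name); at the compiler level, recent Clang and GCC offer -ftrivial-auto-var-init=zero to zero-fill the uninitialized case as well:

```cpp
#include <array>

// Value-initialized: every element is guaranteed to be 0.
constexpr int sum_value_initialized() {
    std::array<int, 8> buf{};
    int total = 0;
    for (int v : buf) total += v;
    return total;  // always 0
}

// Default-initialized (do not do this):
//   std::array<int, 8> buf;   // indeterminate "garbage", e.g. old stack data
//   return buf[0];            // reading it is UB and can disclose information
```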

Thread safety

  •  concurrent accesses to memory
  •  data races
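A minimal sketch of the data-race case: several threads incrementing a plain int is a data race (UB), while std::atomic makes each increment an indivisible read-modify-write. count_with_atomic is a hypothetical helper for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Racy version (do not do this): ++counter on a plain int from several
// threads is a data race, so the final value is undefined.

// Race-free version: fetch_add is atomic, so no increment is lost.
int count_with_atomic(int threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join makes all increments visible
    return counter.load();
}
```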

C++

  •  It turns out that most safety issues are technically UB.

Library UB

  •  UB is what happens when the standard doesn't guarantee anything.
  •  Beware of preconditions.

UB is a specification tool

  •  creates a contract with the programmer
  •  allows writing simpler APIs that make sense
  •  gives freedom to the implementation

Valid strategies:

  •  do nothing.
  •  trap if precondition is violated
  •  log and continue
  •  ...

Standard library hardening

  •  turn select UB into guaranteed traps
  •  provide hardening modes with high-level semantics
  •  allow users to select hardening mode
  •  allow vendors to select the default mode
  •  Not for debugging
  •  should be shipped as it is


libc++ hardening modes

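  • none -> no checks
  • fast -> only security-critical, low-overhead checks; traps
  • extensive -> all low-overhead checks; traps (e.g. SIGTRAP(5) on AArch64; __builtin_trap raises SIGILL on x86-64)
  • debug -> all checks, including expensive ones; aborts verbosely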
Hardening mode can be selected in each TU.
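With libc++ (LLVM 18 or later), the mode is chosen per translation unit through the _LIBCPP_HARDENING_MODE macro, usually on the command line, e.g. -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST. A small sketch that reports which mode the current TU was built with (hardening_mode_name is a hypothetical helper); with other standard libraries the macros are simply absent, libstdc++'s closest analogue being _GLIBCXX_ASSERTIONS:

```cpp
#include <vector>  // any libc++ header pulls in its configuration macros

constexpr const char* hardening_mode_name() {
#if defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_DEBUG)
#  if _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
    return "debug";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
    return "extensive";
#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
    return "fast";
#  else
    return "none";
#  endif
#else
    return "unavailable (not libc++, or libc++ older than 18)";
#endif
}

// In a hardened mode, this out-of-bounds access traps instead of
// silently reading past the buffer:
//   std::vector<int> v(3);
//   int x = v[3];
```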

ABI considerations

  • Orthogonal to hardening
  • ABI is a property of the platform
  • Vendors can select the desired ABI
  • users can't control that
  • Huge simplification

Hardened libc++ in production:
  • WebKit
  • Chrome and the Google Cloud network virtualization stack



Clang++
-Wunsafe-buffer-usage





Enter Contracts

  •  enforcing existing preconditions
  •  Contracts provide a framework for expressing them

Typed memory operations

  •  Most temporal memory safety exploits require some type confusion. If memory is never reused for a different type, confusions are impossible.
  •  Segregate allocations by type!
  •  Introduced in the Darwin Kernel.
  •  Data must not alias pointers.
  •  Randomize buckets on boot.
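A toy user-space sketch of type segregation: each type gets its own free list, so a freed slot that held a T can only ever be reused for another T. TypedPool, make_typed and destroy_typed are hypothetical names; the Darwin kernel allocator works very differently (kernel-level, randomized buckets), which this sketch does not attempt:

```cpp
#include <cassert>
#include <new>
#include <utility>
#include <vector>

template <typename T>
class TypedPool {
public:
    static void* allocate() {
        auto& list = free_list();
        if (!list.empty()) {  // reuse a slot that previously held a T
            void* p = list.back();
            list.pop_back();
            return p;
        }
        return ::operator new(sizeof(T), std::align_val_t{alignof(T)});
    }

    static void deallocate(void* p) {
        free_list().push_back(p);  // never handed back to the global heap
    }

private:
    static std::vector<void*>& free_list() {
        static std::vector<void*> list;  // one list per T; not thread-safe
        return list;
    }
};

template <typename T, typename... Args>
T* make_typed(Args&&... args) {
    return ::new (TypedPool<T>::allocate()) T(std::forward<Args>(args)...);
}

template <typename T>
void destroy_typed(T* p) {
    p->~T();
    TypedPool<T>::deallocate(p);
}
```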






The catch: operator new receives no type info, only the size of the type.
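To see why the allocator cannot segregate by type on its own: a replacement global operator new is handed only a byte count. A minimal sketch; g_last_request is a hypothetical instrumentation variable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Records the size of the most recent request: the size is all the
// information the global allocation function receives, never the type.
std::size_t g_last_request = 0;

void* operator new(std::size_t size) {
    g_last_request = size;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```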


 




What about a user-defined operator new?
Would be an ABI breaker, but...
(Hey, remember, "all problems in computer science can be solved by another level of indirection")


Type-aware allocation and deallocation functions

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2719r0.html






[C++] Next-Gen C++ Optimization Techniques

Reference:
Unlocking Modern CPU Power - Next-Gen C++ Optimization Techniques - Fedor G Pikus - C++Now 2024
https://vsdmars.blogspot.com/2016/01/likely-or-unlikely-easy-misleading.html
https://vsdmars.blogspot.com/2022/10/book-art-of-writing-efficient-programs.html

RCU:
https://vsdmars.blogspot.com/2024/07/c-rcu.html

TLB:
https://vsdmars.blogspot.com/2020/07/virtual-memory-refresh.html
https://vsdmars.blogspot.com/2020/07/pacific-2018re-read-designing-for.html
https://vsdmars.blogspot.com/2018/11/pacific-2018-designing-for-efficient.html


Modern CPUs rely on caches and pipelining to a much greater degree.
 The penalty for not using caches or for disrupting pipelines is far greater.

Memory access is characterized by bandwidth and latency
 Bandwidth is much higher than 'latency per word'
 Random access speed is limited by latency
 Sequential access speed is limited by bandwidth

Prefetch attempts to predict future memory accesses and transfers memory content into cache in advance.
 Random access defeats prediction.
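A sketch of the two access patterns: both functions compute the same sum, but the first walks memory in order (prefetch-friendly, bandwidth-bound) while the second follows a shuffled index (unpredictable addresses, latency-bound). Timing them on a large vector should show the gap; no particular numbers are claimed here, and all helper names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

std::uint64_t sum_sequential(const std::vector<std::uint32_t>& data) {
    std::uint64_t total = 0;
    for (std::uint32_t v : data) total += v;  // increasing addresses: prefetcher keeps up
    return total;
}

std::uint64_t sum_random(const std::vector<std::uint32_t>& data,
                         const std::vector<std::size_t>& order) {
    std::uint64_t total = 0;
    for (std::size_t i : order) total += data[i];  // random addresses defeat prediction
    return total;
}

// Builds a random permutation of [0, n) with a fixed seed.
std::vector<std::size_t> shuffled_indices(std::size_t n, std::uint32_t seed = 42) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), std::mt19937{seed});
    return order;
}
```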











In NUMA, the basic unit is the NUMA node.

Solutions for latency-bound programs that access memory across NUMA nodes


Trick: task_count_ as in main thread.


Even better: batch processing.
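One way to apply batching: instead of every worker bumping a shared counter once per item (bouncing its cache line between cores and, on NUMA machines, between nodes), accumulate locally and publish once per batch. A hedged sketch; process_items and kBatch are hypothetical and not taken from the talk:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kBatch = 1024;  // hypothetical batch size

// Each worker "processes" items_per_worker items. The shared counter is
// updated once per batch instead of once per item, which cuts cross-core
// (and cross-node) cache-line traffic.
std::size_t process_items(unsigned workers, std::size_t items_per_worker) {
    std::atomic<std::size_t> done{0};
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&done, items_per_worker] {
            std::size_t local = 0;
            for (std::size_t i = 0; i < items_per_worker; ++i) {
                ++local;  // real per-item work would go here
                if (local == kBatch) {
                    done.fetch_add(local, std::memory_order_relaxed);
                    local = 0;
                }
            }
            done.fetch_add(local, std::memory_order_relaxed);  // flush the remainder
        });
    }
    for (auto& t : pool) t.join();
    return done.load();
}
```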

Redesigning data structures for NUMA is intrusive.


CMD:
$ /sbin/lspci
$ cat /sys/bus/pci/devices/xxx/numa_node
$ numactl



GPU

I/O bound program



Real world cases
1) Old code runs slower on faster hardware



NUMA comes into play


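When a page mapping changes, the kernel must invalidate stale TLB entries on every CPU that may have cached them (sometimes by flushing the whole TLB). This 'TLB shootdown' is an inter-processor interrupt: the shootdown handler runs in kernel mode on each targeted CPU, so its cost is counted as 'system time' in the profiler.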


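NUMA migrations: automatic NUMA balancing moves pages between nodes, and each migration changes a page mapping, which triggers TLB shootdowns.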

Debugging TLB shootdown

Disable automatic NUMA balancing (page migration), as root:
$ echo 0 > /proc/sys/kernel/numa_balancing

Reduce TLB shootdown impact:
Increase the page size (the default is usually 4 KiB, see https://stackoverflow.com/a/11543988 ), e.g. by allowing transparent huge pages for madvise'd regions:

$ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled


madvise API
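A minimal Linux-only sketch of the madvise path: with THP set to "madvise", only regions explicitly marked with MADV_HUGEPAGE are eligible for huge pages, and only their huge-page-aligned parts can actually be backed by them. HugeRegion, map_with_thp_hint and unmap_region are hypothetical names; madvise can fail (e.g. when THP is disabled), so the result is reported rather than assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

struct HugeRegion {
    void* ptr = nullptr;
    std::size_t size = 0;
    bool hinted = false;  // true if madvise(MADV_HUGEPAGE) succeeded
};

// Maps an anonymous region and asks the kernel to back it with
// transparent huge pages where possible.
inline HugeRegion map_with_thp_hint(std::size_t bytes) {
    HugeRegion r;
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return r;
    r.ptr = p;
    r.size = bytes;
#ifdef MADV_HUGEPAGE
    r.hinted = (madvise(p, bytes, MADV_HUGEPAGE) == 0);
#endif
    return r;
}

inline void unmap_region(HugeRegion& r) {
    if (r.ptr != nullptr) munmap(r.ptr, r.size);
    r = {};
}
```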

2) Kernel tuning

Monitor everything (metrics) inside the code.

Pay attention to the hardware spec


Wrap-up