Jun 2, 2018

[C++][book][elements of programming][note] regular type

Reference:
Fundamentals of Generic Programming - Alex Stepanov
[C++11] C++11 Library Design

Reading notes for:
Titus Winters - Revisiting Regular Types

Define Regular types (quoted):
The term Regular is meant to describe the syntax and semantics of built-in types in a fashion that allows user-defined types to behave sensibly.

The C++ programming language allows the use of built-in type operator syntax for user-defined types.
This allows us to make our user-defined types look like built-in types.
Since we wish to extend semantics as well as syntax from built-in types to user types, we introduce the idea of a Regular type, which matches the built-in type semantics, thereby making our user-defined types behave like built-in types as well.

Thus, define customize type as 'do as the ints do'.

Regular definition:
Both definitions focus heavily on semantics not just syntax.


The reasoning about code is much easier if the code consists of Regular types, instead of non-Regular ones, using the existing understanding of how built-in types work.


Four of the most basic semantic requirements on Regular types from Stepanov's early paper:
// comparison follows from copy
T a = b; assert(a==b);

// copy and assignment are the same
T a1; a1 = b; T a2 = b; assert(a1 == a2);

// copy/assignment is by value, not reference
T a = c; T b = c; a = d; assert(b==c);

// zap always mutates, unmutated values are untouched
T a = c; T b = c; zap(a); assert(b==c && a!=b);


Reference:
Identity of indiscernibles : https://en.wikipedia.org/wiki/Identity_of_indiscernibles

Logicians might define equality via the following equivalence:
x == y ⇔ ∀ predicate P, P(x) == P(y)

This is true:
x == y ⇒ ∀ predicate P, P(x) == P(y)

But reverse might not be true:
∀ predicate P, P(x) == P(y) ⇒ x == y


Programming languages inherently contain predicates that don't exist in pure math, because the execution on computing hardware is a somewhat leaky abstraction.
(Consider X, Y refer to same memory address but might has it's value changed.)

In general, we focus on predicates that observe 'the value' rather than the identity of the instance.

Objects which are naturally variable sized must be constructed in C++ out of multiple simple structs, connected by pointers. In such cases, we say that the object has remote parts. For such objects, the equality operator must compare the remote parts...


Definition of equality:
Two objects are equal if their corresponding parts are equal (applied recursively), including remote parts (but not comparing their addresses), excluding inessential components, and excluding components which identify related objects.

When designing our own Regular type:
  • how to compare two instances of our type (by value, focusing on the logical state, not comparing by identity/memory location)
  • If our type implements all the syntactic/semantic requirements for Regular

P0898(r2) Standard Library Concepts
Definition of concept (in general, including philosophy)

UDT considered Regular type iff has these member functions:
  • DefaultConstructible, 
  • CopyConstructible, 
  • Destructible,
  • Movable, 
  • Swappable,
  • Assignable, 
  • EqualityComparable

Consider const member function thread safe.
i.e
  • concurrent (non-synchronized) calls to const methods are allowed,
  • if any concurrent call is made to a non-const method, there is the chance for a data race.
(ref CppCon 2014: Herb Sutter "Lock-Free Programming note )
  • On the C++ abstract machine there is no such thing as a safe data race. 
  • The C++ standard specifically calls this out: data races are undefined behavior. 
  • No correct program has undefined behavior. 
  • const means const contract. If internal state changes, violates the const contract.

Classify types as either 'thread-safe', 'thread-compatible', or 'thread-unsafe', based on the conditions under which use of its API may result in a data race.
  • Thread-safe:
    No concurrent call to any API of this type causes a data race.
    This is useful for things like a Mutex.
    Generally speaking, thread-safe types are easiest to work with, but you pay for some of that usability in performance or API restrictions or both.
    (Reference[C++] Always consider function thread safeness drags performance in single thread code. )
  • Thread-compatible:
    No concurrent call to any const operation on this type causes a data race.
    Any call to a non-const API means that instance must be used with external synchronization.
    C++ guarantees that standard library types are at least thread-compatible.
    This follows from the general pattern of Regular design, and 'do as the ints do' as int is thread-compatible.
    In most cases, this is in-line with the philosophy of C++ - you do not pay for what you do not use.
    If you operate on an optional<int>, you can be sure that it isn't grabbing a mutex.
    On the other hand, thread-compatible may have overhead in some cases: shared_ptr<> is unnecessarily expensive in cases where there is no sharing between threads, because of the use of atomics to synchronize the reference count.
    (GCC's shared_ptr detects whether the executable is linked to libpthread and uses non-atomic updates when possible.)
  • Thread-unsafe: Even concurrent calls to const APIs on this type may cause data races - use of an instance of such a type requires external synchronization or knowledge of some form to be used safely.
    These are generally either used with a mutex or are used with knowledge like 'I know that this instance is only accessed from this thread' or 'I know that my whole program is only single threaded.'
    Types like this may be because of mutable members, or because of non-thread-safe data that is shared between instances.


When using a function, or a Regular type instance, ALWAYS consider precondition.

Consider about int* type instance.
It's precondition can only be checked during run-time,
and there's no member function of int* type can check the precondition of int* type instance.
i.e
Invoking int* operation safely requires structural knowledge of the program. A type that has dependent preconditions has one or more such APIs; these are often (but not always) about properties of non-owned objects/external memory/etc.

APIs that have dependent preconditions are more complicated to use - they fundamentally require knowledge about the rest of the program in order to use safely.


Race-Free + Regular

When operates on a type instance, consider it race-free iff:
  • Thread-compatible and not shared with other threads for writing.
    If been handed a (non-racing) const T& you can operate on this in const fashion.
    If necessary, you can copy it to ensure there are no lurking references and perform any computation / mutation safely (but inefficiently).
    With minor knowledge (the instance isn't shared), a T& can be used safely as if it were T.
  • It has dependent-preconditions, but for a particular instance + any dependent data, the program structure guarantees safe usage.
  • Single-threaded usage - There is only one thread in the program and thus all instances of the type are safe to use, or a given instance is known to not be shared among threads.

With above 3 options, we have these definitions:
  • Thread-compatible + Regular is what we really want for user-defined types that mimic built-ins.
    This lets us reason about an instance in the expected fashion and use it efficiently in conjunction with generic algorithms.
    Types that have mutable data may have some overhead to support this.
  • Dependent-preconditions with knowledge that an instance + its dependent data are safe to use.
    This is the common usage for string_view when we use it as a non-owning parameter type: the underlying buffer will outlive the function call and is immutable for the duration of the call. (reference: https://github.com/jeaye/value-category-cheatsheet/blob/master/value-category-cheatsheet.pdf)
    Given that external knowledge of that underlying buffer, string_view behaves as if it were Regular. This makes sense, given that string_view was designed to be a drop-in replacement for const string&, and although references are not Regular types (Reference: [C++] Union/StandardLayoutType can not have reference data member), std::string types are.
  • Single-threaded usage - This is easy to misuse, but can be an important area for optimization.
    Consider the discussions to provide a shared_ptr analogue that does not synchronize its reference count - if we know something about program structure, or can guarantee particular usage for an instance, we can design a more efficient type in this fashion. Given that knowledge, such a shared_ptr can still behave as if it were Regular.
Use this to verify if the type is regular type:
using T = UDT;
void DoSomething(const T& t);

const T a = SomeT();  // Assume SomeT() is providing a
                      // long-lived and stable buffer.
const T b = SomeT();

if (a == b) {
  DoSomething(a);
  assert(a == b);
}

//Stepanov’s axioms about assignment and comparison.
// comparison follows from copy
T a = b; assert(a==b);

// copy and assignment are the same
T a1; a1 = b; T a2 = b; assert(a1 == a2);

// copy/assignment is by value, not reference
T a = c; T b = c; a = d; assert(b==c);

// zap always mutates, unmutated values are untouched
T a = c; T b = c; zap(a); assert(b==c && a!=b);


The point of having string_view, std::span is to make the API interface consistent, which is,
instead of using
  • const char* 
  • const string& // reference itself is NOT regular type, which is not owning.


We could simply use
  • const std::span  // using const to ensure it's member function called is thread safe.
  • const std::string_view  // using const to ensure it's member function called is thread safe.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.