Mar 8, 2012

[c++] The One-Definition Rule, excerpt from C++ template programming, appendix A

Affectionately known as the ODR, the one-definition rule is a cornerstone for the well-formed structuring of C++ programs. The most common consequences of the ODR are simple enough to remember and apply:

Define noninline functions exactly once across all files, and define classes and inline functions at most once per translation unit, making sure that all definitions for the same entity are identical.



However, the devil is in the details, and when combined with template instantiation, these details can be daunting.

This appendix is meant to provide a comprehensive overview of the ODR for the interested reader. We also indicate when specific related issues are expounded on in the main text.

A.1 Translation Units


In practice we write C++ programs by filling files with "code." However, the boundary set by a file is not terribly important in the context of the ODR. Instead, what matters are so-called translation units.

Essentially, a translation unit is the result of applying the preprocessor to a file you feed to your compiler. The preprocessor drops sections of code not selected by conditional compilation directives (#if, #ifdef, and friends), drops comments, inserts #included files (recursively), and expands macros.

Hence, as far as the ODR is concerned, having the following two files

// File header.hpp:
#ifdef DO_DEBUG
  #define debug(x) std::cout << x << '\n'
#else
  #define debug(x)
#endif
void debug_init();

// File myprog.cpp:
#include "header.hpp"
int main()
{
    debug_init();
    debug("main()");
}

is equivalent (when DO_DEBUG is not defined) to the following single file:

// File myprog.cpp:
void debug_init();
int main()
{
    debug_init();
}

Connections across translation unit boundaries are established by having corresponding declarations with external linkage in two translation units (for example, two declarations of the global function debug_init()) or by argument-dependent lookup during the instantiation of exported templates.

Note that the concept of a translation unit is a little more abstract than just "a preprocessed file." For example, if we were to feed a preprocessed file twice to a compiler to form a single program, it would bring into the program two distinct translation units (there is no point in doing so, however).

A.2 Declarations and Definitions
The terms declaration and definition are often used interchangeably in common "programmer talk." In the context of the ODR, however, the exact meaning of these words is important. We also think it's a good habit to handle the terms carefully when exchanging ideas about C or C++. We do so throughout this book.

A declaration is a C++ construct that introduces or reintroduces a name in your program.

A declaration can also be a definition, depending on which entity it introduces and how it introduces it:
  • Namespaces and namespace aliases: The declarations of namespaces and their aliases are always also definitions, although the term definition is unusual in this context because the list of members of a namespace can be "extended" at a later time (unlike classes and enumeration types for example).
  • Classes, class templates, functions, function templates, member functions, and member function templates: The declaration is a definition if and only if the declaration includes a brace-enclosed body associated with the name. This rule includes unions, operators, member operators, static member functions, constructors and destructors, and explicit specializations of template versions of such things (that is, any class-like and function-like entity).
  • Enumerations: The declaration is a definition if and only if it includes the brace-enclosed list of enumerators.
  • Local variables and nonstatic data members: These entities can always be treated as definitions, although the distinction rarely matters.
  • Global variables: If the declaration is not directly preceded by a keyword extern or if it has an initializer, the declaration of a global variable is also a definition of that variable. Otherwise, it is not a definition.
  • Static data members: The declaration is a definition if and only if it appears outside the class or class template of which it is a member.
  • Typedefs, using-declarations, and using-directives: These are never definitions, although typedefs can be combined with class or union definitions.
  • Explicit instantiation directives: We can consider them to be definitions.
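The distinctions in the list above can be seen side by side in a short single-file sketch. All names here (request_count, Session, next_request, and so on) are hypothetical and chosen only for illustration; the comments mark which declarations are also definitions according to the rules just listed.

```cpp
#include <cassert>

// Declarations that are NOT definitions:
extern int request_count;        // global variable: extern and no initializer
struct Session;                  // class: no brace-enclosed body
int next_request(Session& s);    // function: no body
typedef Session* SessionHandle;  // typedefs are never definitions

// Declarations that ARE definitions:
int request_count = 0;           // global variable without extern
enum Mode { idle, active };      // enumeration with its list of enumerators
struct Session {                 // class with a brace-enclosed body
    static int live_sessions;    // static data member: only declared here
    Mode mode;                   // nonstatic data member
};
int Session::live_sessions = 0;  // static data member defined outside the class

int next_request(Session& s)     // function with a body
{
    s.mode = active;
    return ++request_count;
}

// Hypothetical helper that exercises the definitions above.
int demo_next_request()
{
    Session s = { idle };
    int n = next_request(s);
    assert(s.mode == active);
    return n;
}
```

Calling demo_next_request() from a main() function is enough to try the sketch; removing any one of the definitions in the second group while keeping its use should produce a compile-time or link-time error.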

A.3 The One-Definition Rule in Detail
As we implied in the introduction to this appendix, there are many details to the actual rule. We organize the rule's constraints by their scope.

A.3.1 One-per-Program Constraints
There can be at most one definition of the following items per program:
  • Noninline functions and noninline member functions
  • Variables with external linkage (essentially, variables declared in a namespace scope or in the global scope, and without the static specifier)
  • Static data members
  • Noninline function templates, noninline member function templates, and noninline members of class templates when they are declared with export
  • Static data members of class templates when they are declared with export

For example, a C++ program consisting of the following two translation units is invalid. (Interestingly, it is valid C because C has a concept of tentative definition, which is a variable definition without an initializer that can appear more than once in a program.)


// Translation unit ONE:
int counter;
// Translation unit TWO:
int counter; // ERROR: defined twice! (ODR violation)


This rule does not apply to entities with internal linkage (essentially, entities declared in a namespace scope or in the global scope using the static specifier) because even when two such entities have the same name, they are considered distinct. In the same vein, entities declared in unnamed namespaces are considered distinct if they appear in distinct translation units.

For example, the following two translation units can be combined into a valid C++ program:

// Translation unit 1:
static int counter = 2; // unrelated to other translation units
namespace {
    void unique() // unrelated to other translation units
    {
    }
}

// Translation unit 2:
static int counter = 0; // unrelated to other translation units
namespace {
    void unique() // unrelated to other translation units
    {
        ++counter;
    }
}
int main()
{
    unique();
}

Furthermore, there must be exactly one of the previously mentioned items in the program if they are used. The term used in this context has a precise meaning. It indicates that there is some sort of reference to the entity somewhere in the program.

This reference can be an access to the value of a variable, a call to a function, or the taking of the address of such an entity. It can be explicit in the source, or it can be implicit.

For example, a new expression may create an implicit call to the associated delete operator to handle situations when a constructor throws an exception requiring the unused (but allocated) memory to be cleaned up.
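The implicit use of the delete operator by a new expression can be observed with a small sketch. The class Tracked and its counter are hypothetical names introduced only for this illustration. No delete expression appears anywhere in the code, yet the class-specific operator delete is called when the constructor throws.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

struct Tracked {
    static int deallocations;    // counts calls to the class-specific operator delete

    explicit Tracked(bool fail)
    {
        if (fail) throw std::runtime_error("constructor failed");
    }

    static void* operator new(std::size_t size) { return ::operator new(size); }
    static void operator delete(void* p)
    {
        ++deallocations;
        ::operator delete(p);
    }
};
int Tracked::deallocations = 0;

// The new expression below is the only reference to operator delete:
// the memory is released implicitly because the constructor throws.
int deallocations_after_failed_new()
{
    try {
        Tracked* t = new Tracked(true);
        (void)t;                          // not reached
    } catch (const std::runtime_error&) {
        // the allocated memory has already been released at this point
    }
    assert(Tracked::deallocations > 0);
    return Tracked::deallocations;
}
```

Because of this implicit use, a class that declares its own operator new generally needs an accessible, defined matching operator delete even if the program never deletes such an object explicitly.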

Another example consists of copy constructors, which must be defined even if they end up being optimized away. Virtual functions are also implicitly used (by the internal structures that enable virtual function calls), unless they are pure virtual functions. Several other kinds of implicit uses exist, but we omit them for the sake of conciseness.

There are two kinds of references that do not constitute a use in the previous sense:

1. The first kind occurs when a reference to an entity appears as part of a sizeof operator.
2. The second kind is similar but with a twist: If a reference appears as part of a typeid operator (see Section 5.6 on page 58), it is not a use in the previous sense, unless the argument of the typeid operator ends up designating a polymorphic object (an object with (possibly inherited) virtual functions).

For example, consider the following single-file program:

#include <typeinfo>
class Decider {
#if defined(DYNAMIC)
    virtual ~Decider() {
    }
#endif
};
extern Decider d;
int main()
{
    const char* name = typeid(d).name();
    return (int)sizeof(d);
}

This is a valid program if and only if the preprocessor symbol DYNAMIC is not defined. Indeed, the variable d is not defined, but the reference to d in sizeof(d) does not constitute a use, and the reference in typeid(d) is a use only if d is an object of a polymorphic type (because in general it is not always possible to determine the result of a polymorphic typeid operation until run time).
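The nonpolymorphic case can also be written as a sketch that is meant to link successfully even though the variable is never defined. The names Plain and never_defined are hypothetical; the point is only that neither sizeof nor typeid (applied to a nonpolymorphic type) constitutes a use.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

struct Plain {                // no virtual functions: not a polymorphic type
    int a;
    int b;
};

extern Plain never_defined;   // declared, but deliberately defined nowhere

std::size_t plain_size()
{
    return sizeof(never_defined);                   // operand not evaluated: not a use
}

bool plain_type_matches()
{
    return typeid(never_defined) == typeid(Plain);  // static type only: not a use
}

// Hypothetical helper that checks both results.
void check_unevaluated_operands()
{
    assert(plain_size() == sizeof(Plain));
    assert(plain_type_matches());
}
```

Adding a virtual destructor to Plain would turn the typeid reference into a use, and the missing definition of never_defined would then typically be reported by the linker.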

According to the C++ standard, the constraints described in this section do not require a diagnostic from a C++ implementation. In practice, they are almost always reported by linkers as duplicate or missing definitions.

A.3.2 One-per-Translation Unit Constraints
No entity can be defined more than once in a translation unit. So the following example is invalid C++:

inline void f() {}
inline void f() {} // ERROR: duplicate definition

This is one of the main reasons for surrounding the code in header files with so-called guards:

// File guard_demo.hpp:
#ifndef GUARD_DEMO_HPP
#define GUARD_DEMO_HPP
…
#endif // GUARD_DEMO_HPP

Such guards ensure that the second time a header file is #included, its contents are discarded, thereby avoiding a duplicate definition of any class, inline function, or template it contains.
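To see the effect of a guard without creating several files, the following sketch pastes the contents of a hypothetical header point.hpp twice into one file, which is what the preprocessor would produce for two #include directives. The names Point, sum, and POINT_HPP are assumptions made for this illustration.

```cpp
#include <cassert>

// First inclusion of the hypothetical header point.hpp:
// POINT_HPP is not yet defined, so the contents are kept.
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
inline int sum(Point p) { return p.x + p.y; }
#endif // POINT_HPP

// Second inclusion: POINT_HPP is now defined, so everything up to
// the #endif is discarded and no duplicate definition results.
#ifndef POINT_HPP
#define POINT_HPP
struct Point { int x; int y; };
inline int sum(Point p) { return p.x + p.y; }
#endif // POINT_HPP

// Hypothetical helper that uses the single surviving definition.
int sum_of_example_point()
{
    Point p = { 1, 2 };
    int total = sum(p);
    assert(total == p.x + p.y);
    return total;
}
```

Removing the #ifndef/#define/#endif lines from the second copy should make the compiler reject the file with a redefinition error for Point and sum.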

The ODR also specifies that certain entities must be defined in certain circumstances. This can be the case for class types, inline functions, and non-export templates. In the following few paragraphs we review the detailed rules.

A class type X (including structs and unions) must be defined in a translation unit prior to any of the following kinds of uses in that translation unit:

  • The creation of an object of type X (for example, as a variable declaration or through a new expression).
    The creation could be indirect, for example, when an object that itself contains an object of type X is being created.
  • The declaration of a data member of type X.
  • Applying the sizeof or typeid operator to an object of type X.
  • Explicitly or implicitly accessing members of type X.
  • Converting an expression to or from type X using any kind of conversion, or converting an expression to or from a pointer or reference to X (except void*) using an implicit cast, static_cast, or dynamic_cast.
  • Assigning a value to an object of type X.
  • Defining or calling a function with an argument or return type of type X. Just declaring such a function doesn't need the type to be defined however.
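A short sketch may help separate the uses that need a complete class type from those that do not. Node, pass_through, make_node, and round_trip are hypothetical names; the first half of the code works with only a declaration of Node, while the second half relies on its definition.

```cpp
#include <cassert>

struct Node;                                 // declaration only: Node is incomplete

Node* pass_through(Node* p) { return p; }    // fine: copying a pointer needs no definition
Node make_node(int value);                   // fine: the function is merely declared

struct Node {                                // definition of the class type
    int value;
};

// From here on Node is complete, so objects can be created,
// members accessed, and functions taking or returning Node defined.
Node make_node(int value)
{
    Node n;
    n.value = value;
    return n;
}

int round_trip(int value)
{
    Node n = make_node(value);
    Node* p = pass_through(&n);
    assert(p == &n);
    return p->value;
}
```

Moving the definition of make_node or round_trip above the definition of Node should cause a compile-time error, because creating a Node object and accessing its members both require the complete type.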

The rules for types also apply to types X generated from class templates, which means that the corresponding templates must be defined in those situations in which such a type X must be defined. These situations create so-called points of instantiation or POIs (see Section 10.3.2 on page 146).

Inline functions must be defined in every translation unit in which they are used (in which they are called or their address is taken). However, unlike class types, their definition can follow the point of use:

inline int not_so_fast();
int main()
{
    not_so_fast();
}
inline int not_so_fast()
{
    return 0; // a non-void function must return a value
}

Although this is valid C++, some compilers do not actually "inline" the call to a function with a body that has not been seen yet; hence the desired effect may not be achieved.

Just as with class templates, the use of a function generated from a parameterized function declaration (a function or member function template, or a member function of a class template) creates a point of instantiation. Unlike class templates, however, the corresponding definition can appear after the point of instantiation (or not at all if it is exported).
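As a minimal sketch of this last point, the function template below is used before its definition has been seen; only the declaration precedes the point of instantiation. The names doubled and use_before_definition are hypothetical.

```cpp
#include <cassert>

template<typename T>
T doubled(T x);                      // declaration only

int use_before_definition()
{
    int result = doubled(21);        // point of instantiation for doubled<int>
    assert(result == 42);
    return result;
}

template<typename T>
T doubled(T x)                       // the definition follows the point of use
{
    return x + x;
}
```

If the definition of doubled were omitted from the translation unit altogether, most compilers would accept the code and leave it to the linker to report the missing function.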

The facets of the ODR explained in this section are generally easily verified by C++ compilers; hence the C++ standard requires that compilers issue some sort of diagnostic when one of these rules is violated. An exception is the lack of a definition of a nonexported parameterized function. Such situations are typically not diagnosed.

A.3.3 Cross-Translation Unit Equivalence Constraints

The ability to define certain kinds of entities in more than one translation unit brings with it the potential for a new kind of error: multiple definitions that don't match. Unfortunately, such errors are hard to detect by traditional compiler technology in which translation units are processed one at a time.

Consequently, the C++ standard doesn't mandate that differences in multiple definitions be detected or diagnosed (it does allow it, of course). If this cross-translation unit constraint is violated, however, the C++ standard qualifies this as leading to undefined behavior, which means that anything reasonable or unreasonable may happen.

Typically, such undiagnosed errors may lead to program crashes or wrong results, but in principle they can also lead to other, more direct, kinds of damage (for example, file corruption).

The cross-translation unit constraints specify that when an entity is defined in two different places, the two places must consist of exactly the same sequence of tokens (the keywords, operators, identifiers, and so forth remaining after preprocessing).

Furthermore, these tokens must mean the same thing in their respective context (for example, the identifiers may need to refer to the same variable).

Consider the following example:

// Translation unit 1:
static int counter = 0;
inline void increase_counter()
{
    ++counter;
}
int main()
{
}

// Translation unit 2:
static int counter = 0;
inline void increase_counter()
{
    ++counter;
}

This example is in error because even though the token sequence for the inline function increase_counter() looks identical in both translation units, each sequence contains a token counter that refers to a different entity. Indeed, because the two variables named counter have internal linkage (static specifier), they are unrelated despite having the same name. Note that this is an error even though neither of the inline functions is actually used.
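One possible way to avoid this particular violation (a sketch of a common idiom, not something prescribed by the text) is to make the counter itself an entity with external linkage that every translation unit shares. Here the variable is wrapped in an inline accessor function, hypothetically named counter(), whose function-local static object is the same object in every translation unit that includes the code.

```cpp
#include <cassert>

// Intended for a header file. An inline function with external linkage
// has a single function-local static object across the whole program.
inline int& counter()
{
    static int value = 0;
    return value;
}

inline void increase_counter()
{
    ++counter();    // the token counter now names the same function everywhere
}

// Hypothetical helper that increments the shared counter and reports it.
int increase_and_read()
{
    int before = counter();
    increase_counter();
    assert(counter() == before + 1);
    return counter();
}
```

With this arrangement the token sequence of increase_counter() is identical in every translation unit and each token refers to the same entity, so the equivalence constraint is satisfied.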

Placing the definitions of entities that can be defined in multiple translation units in header files that are #included whenever the definitions are needed ensures that token sequences are identical in almost all situations.

With this approach, situations in which two identical tokens refer to different things become fairly rare, but when they do occur, the resulting errors are often mysterious and hard to track.

Occasionally, conditional compilation directives evaluate differently in different translation units. Use such directives with care. Other differences are possible too, but they are even less common.

The cross-translation unit constraints apply not only to entities that can be defined in multiple places, but also to default arguments in declarations. In other words, the following program has undefined behavior:

// Translation unit 1:
void unused(int = 3);
int main()
{
}
// Translation unit 2:
void unused(int = 4);

We should note here that the equivalence of token streams can sometimes involve subtle implicit effects. The following example is lifted (in a slightly modified form) from the C++ standard:

// Translation unit 1:
class X {
public:
    X(int);
    X(int, int);
};
X::X(int = 0)
{
}
class D : public X {
};

D d2; // X(int) called by D()

// Translation unit 2:
class X {
public:
    X(int);
    X(int, int);
};
X::X(int = 0, int = 0)
{
}
class D : public X { // X(int, int) called by D();
};                   // D()'s implicit definition violates the ODR

In this example, the problem occurs because the implicitly generated default constructor of class D is different in the two translation units.

One calls the X constructor taking one argument, and the other calls the X constructor taking two arguments. If anything, this example is an additional incentive to limit default arguments to one location in the program (if possible, this location should be in a header file). Fortunately, placing default arguments on out-of-class definitions is a rare practice.

There is also an exception to the rule that says that identical tokens must refer to identical entities. If identical tokens refer to unrelated constants that have the same value and the address of the resulting expressions is not used, then the tokens are considered equivalent. This exception allows for program structures like the following:

// File header.hpp:
#ifndef HEADER_HPP
#define HEADER_HPP
int const length = 10;
class MiniBuffer {
    char buf[length];
    ...
};
#endif // HEADER_HPP

In principle, when this header file is included in two different translation units, two distinct constant variables named length are created because const in this context implies static. However, such constant variables are often meant to define compile-time constant values, not a particular storage location at run time. Hence, if we don't force such a storage location to exist (by referring to the address of the variable), it is sufficient for the two constants to have the same value. This exception to the ODR equivalence rules applies only to integral and enumeration values (floating-point types and pointer types don't fall in this category).

[That is, floating-point and pointer constants are expected to occupy a storage location in the data section rather than being substituted directly into the code in the text section. shuo-huan]
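The following sketch fills in the elided part of MiniBuffer with a hypothetical capacity() member, purely to show that the constant is used only for its value: the array bound needs a compile-time constant, and nothing takes the address of length.

```cpp
#include <cassert>
#include <cstddef>

int const length = 10;           // internal linkage; used only for its value

class MiniBuffer {
public:
    // Hypothetical member, not part of the original header.
    std::size_t capacity() const { return sizeof(buf); }
private:
    char buf[length];            // requires a compile-time constant, not an address
};

// Hypothetical helper that reports the capacity of a buffer.
std::size_t mini_buffer_capacity()
{
    MiniBuffer b;
    assert(b.capacity() == static_cast<std::size_t>(length));
    return b.capacity();
}
```

As long as every translation unit sees the same value for length, every copy of MiniBuffer has the same layout, which is what the exception described above relies on.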

Finally, a note about templates. The names in templates bind in two phases. So-called nondependent names bind at the point where the template is defined. For these, the equivalence rules are handled similarly to other nontemplate definitions. For names that bind at the point of instantiation, the equivalence rules must be applied at that point, and the bindings must be equivalent. This leads to a subtle observation: Although exported templates are defined in only one location, they may have multiple instances which must obey the equivalence rules. Here is a particularly far-fetched violation of the ODR:

// File header.hpp:
#ifndef HEADER_HPP
#define HEADER_HPP
enum Color { red, green, blue };
// the associated namespace of Color is the global namespace
export template<typename T> void highlight(T);
void init();
#endif // HEADER_HPP

// File tmpl_def.cpp:
#include "header.hpp"
export template<typename T>
void highlight(T x)
{
    paint(x); // (1) a dependent call: argument-dependent lookup required
}

// File init.cpp:
#include "header.hpp"
namespace { // unnamed namespace!
    void paint(Color c) // (2)
    {
        …
    }
}
void init()
{
    highlight(blue); // argument-dependent lookup of (1) resolves to (2)
}

// File main.cpp:
#include "header.hpp"
namespace { // unnamed namespace!
    void paint(Color c) // (3)
    {
        …
    }
}
int main()
{
    init();
    highlight(red); // argument-dependent lookup of (1) resolves to (3)
}


To understand this example, we must remember that functions defined in an unnamed namespace have external linkage, but they are distinct from any functions defined in an unnamed namespace of other translation units.

Therefore, the two paint() functions are distinct. However, the call to paint() in the exported template has a template-dependent argument and is therefore not bound until the points of instantiation. In our example, there are two points of instantiation for highlight, but they result in different bindings of the name paint; hence the program is invalid.
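The late binding of the dependent call can be reproduced in a single file without export. The sketch below is a simplified variant under stated assumptions: Color is placed in a hypothetical namespace gfx, paint() returns a value so that the result can be checked, and there is only one paint(), so no ODR violation occurs. It shows only that paint is looked up through argument-dependent lookup at the point of instantiation, not where the template is defined.

```cpp
#include <cassert>

namespace gfx {                          // hypothetical namespace for this sketch
    enum Color { red, green, blue };
}

template<typename T>
int highlight(T x)
{
    return paint(x);    // dependent call: bound at the point of instantiation
}

namespace gfx {
    // Declared after the template definition, yet found by
    // argument-dependent lookup because gfx is associated with Color.
    int paint(Color c)
    {
        return 100 + static_cast<int>(c);
    }
}

int highlight_blue()
{
    int code = highlight(gfx::blue);     // instantiates highlight<gfx::Color>
    assert(code == 102);
    return code;
}
```

In the multi-file example above, the same mechanism picks a different paint() in each translation unit, which is exactly why the two instances of highlight are not equivalent.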

--FIN--
