Ataraxia through Epoché: [ELF] Everything about ELF

• text, data, or bss: A symbol defined in this module. External bit
may or may not be on. Value is the relocatable address in the module
corresponding to the symbol.

• abs: An absolute non-relocatable symbol. (Rare outside of debugger
info.) External bit may or may not be on. Value is the absolute
value of the symbol.
• undefined: A symbol not defined in this module. External bit must
be on. Value is usually zero, but see the ‘‘common block hack’’
below.

These symbol types are adequate for older languages such as C and
Fortran and, just barely, for C++.

ELF header :

char magic[4] = "\177ELF"; // magic number
char class; // address size, 1 = 32 bit, 2 = 64 bit
char byteorder; // 1 = little-endian, 2 = big-endian
char hversion; // header version, always 1
char pad[9];
short filetype; // file type: 1 = relocatable, 2 = executable,
// 3 = shared object, 4 = core image
short archtype; // 2 = SPARC, 3 = x86, 4 = 68K, etc.
int fversion; // file version, always 1
int entry; // entry point if executable
int phdrpos; // file position of program header or 0
int shdrpos; // file position of section header or 0
int flags; // architecture specific flags, usually 0
short hdrsize; // size of this ELF header
short phdrent; // size of an entry in program header
short phdrcnt; // number of entries in program header or 0
short shdrent; // size of an entry in section header
short shdrcnt; // number of entries in section header or 0
short strsec; // section number that contains section name strings
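
As a rough illustration of the first few header fields, the sketch below checks the magic number and decodes the class and byte-order bytes from the start of a file image. The struct and function names (`elf_ident`, `elf_read_ident`) are made up for this example; only the byte values come from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Result of inspecting the first bytes of a candidate ELF file.
   (Hypothetical helper type, not part of any ELF header file.) */
struct elf_ident {
    int is_elf;        /* 1 if the magic number matched */
    int bits;          /* 32 or 64, or 0 if the class byte is unknown */
    int little_endian; /* 1 = little-endian, 0 = big-endian, -1 = unknown */
};

/* Decode magic, class, and byte order from the start of a file image. */
struct elf_ident elf_read_ident(const unsigned char *buf, size_t len)
{
    struct elf_ident id = {0, 0, -1};

    assert(buf != NULL);
    /* magic[4] + class + byteorder + hversion = 7 bytes */
    if (len < 7 || memcmp(buf, "\177ELF", 4) != 0)
        return id;

    id.is_elf = 1;
    if (buf[4] == 1)
        id.bits = 32;
    else if (buf[4] == 2)
        id.bits = 64;

    if (buf[5] == 1)
        id.little_endian = 1;
    else if (buf[5] == 2)
        id.little_endian = 0;
    return id;
}
```

A caller would read the first bytes of the file into a buffer and pass it in; anything that fails the magic check comes back with `is_elf` set to 0.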

Section header :

int sh_name; // name, index into the string table
int sh_type; // section type
int sh_flags; // flag bits, below
int sh_addr; // base memory address, if loadable, or zero
int sh_offset; // file position of beginning of section
int sh_size; // size in bytes
int sh_link; // section number with related info or zero
int sh_info; // more section-specific info
int sh_align; // alignment granularity if section is moved
int sh_entsize; // size of entries if section is an array
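
Since sh_name is only an index into a string table, turning it into a name is a bounds-checked pointer offset. A minimal sketch, assuming the string table section has already been read into memory (the function name `strtab_lookup` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the name at byte offset idx in a string table, or NULL if the
   offset is out of range or no terminating NUL lies within the table. */
const char *strtab_lookup(const char *strtab, size_t size, size_t idx)
{
    assert(strtab != NULL);
    if (idx >= size)
        return NULL;
    if (memchr(strtab + idx, '\0', size - idx) == NULL)
        return NULL;
    return strtab + idx;
}
```

The bounds checks matter in practice because sh_name comes straight from the file and may be corrupt.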

Section types include:

• PROGBITS: Program contents including code, data, and
debugger info.
• NOBITS: Like PROGBITS but no space is allocated in the file itself.
Used for BSS data allocated at program load time.
• SYMTAB and DYNSYM: Symbol tables, described in more detail
later. The SYMTAB table contains all symbols and is
intended for the regular linker, while DYNSYM is just the
symbols for dynamic linking. (The latter table has to be
loaded into memory at runtime, so it’s kept as small
as possible.)
• STRTAB: A string table, analogous to the one in a.out files.
Unlike a.out files, ELF files can and often do contain 
separate string tables for separate purposes, 
e.g. section names, regular symbol names,
and dynamic linker symbol names.
• REL and RELA: Relocation information. REL entries add the
relocation value to the base value stored in the code
or data, while RELA entries include the base value for
relocation in the relocation entries themselves.
(For historical reasons, x86 objects use REL relocation
and 68K objects use RELA.) There are a bunch of relocation
types for each architecture, similar to (and derived from) the
a.out relocation types.
• DYNAMIC and HASH: Dynamic linking information and the runtime
symbol hash table.

There are three flag bits used: ALLOC, which means that
the section occupies memory when the program is
loaded; WRITE, which means that the section when loaded
is writable; and EXECINSTR, which means that the section
contains executable machine code.

Sections include:

• .text which is type PROGBITS with attributes ALLOC+EXECINSTR.
It’s the equivalent of the a.out text segment.
• .data which is type PROGBITS with attributes ALLOC+
WRITE. It’s the equivalent of the a.out data segment.
• .rodata which is type PROGBITS with attribute ALLOC. It’s
read-only data, hence no WRITE.
• .bss which is type NOBITS with attributes ALLOC+WRITE.
The BSS section takes no space in the file, hence NOBITS, but is
allocated at runtime, hence ALLOC.
• .rel.text, .rel.data, and .rel.rodata, each of which is
type REL or RELA. They hold the relocation information for the
corresponding text or data section.
• .init and .fini, each type PROGBITS with attributes ALLOC+
EXECINSTR. These are similar to .text, but are code to
be executed when the program starts up or terminates, respectively.
C and Fortran don’t need these, but they’re essential for C++ which
has global data with executable initializers and finalizers.
• .symtab and .dynsym, types SYMTAB and DYNSYM respectively,
the regular and dynamic linker symbol tables. The dynamic
linker symbol table has ALLOC set, since it’s loaded at runtime.
• .strtab and .dynstr, both type STRTAB, tables of name
strings for a symbol table or the section names for the section
table. The .dynstr section, the strings for the dynamic linker
symbol table, has ALLOC set since it’s loaded at runtime.

There are also some specialized sections like .got and .plt, the
Global Offset Table and Procedure Linkage Table used for dynamic
linking (covered in Chapter 10), .debug which contains symbols
for the debugger, .line which contains mappings from
source line numbers to object code locations, again for the debugger,
and .comment which contains documentation strings, usually
version control version numbers.

An unusual section is .interp, which contains the name of a program to use as an interpreter. If this section is present, rather than running the program directly, the system runs the interpreter and passes it the ELF file as an argument. Unix has for many years had self-running interpreted text files, using

#! /path/to/interpreter

as the first line of the file. ELF extends this facility to interpreters which run non-text programs. In practice this is used to call the run-time dynamic linker to load the program and link in any required shared libraries.

ELF symbol table:

int name; // position of name string in string table
int value; // symbol value, section relative in reloc,
// absolute in executable
int size; // object or function size
char type:4; // data object, function, section, or special case file
char bind:4; // local, global, or weak
char other; // spare
short sect; // section number, ABS, COMMON or UNDEF
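
The type and bind fields share a single byte. In the ELF specification the binding sits in the high four bits and the type in the low four, and the numeric codes below are the standard ones; the enum and function names are invented for this sketch.

```c
#include <assert.h>

enum sym_bind { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };
enum sym_type {
    TYPE_NOTYPE = 0, TYPE_OBJECT = 1, TYPE_FUNC = 2,
    TYPE_SECTION = 3, TYPE_FILE = 4
};

/* Pack a binding and a type into the one-byte info field. */
unsigned char sym_info(unsigned bind, unsigned type)
{
    assert(bind < 16 && type < 16);
    return (unsigned char)((bind << 4) | type);
}

/* Extract the binding (high four bits). */
unsigned sym_bind_of(unsigned char info)
{
    return info >> 4;
}

/* Extract the type (low four bits). */
unsigned sym_type_of(unsigned char info)
{
    return info & 0xf;
}
```

A global function therefore has an info byte of 0x12, and a local data object 0x01.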

If the file is a C++ program, it will probably also contain .init, .fini, .rel.init, and .rel.fini sections.

Sample relocatable ELF file:

ELF header
.text
.data
.rodata
.bss
.symtab
.rel.text
.rel.data
.rel.rodata
.line
.debug
.strtab
(section table, not considered to be a section)

An ELF executable file has the same general format as a relocatable ELF, but the data are arranged so that the file can be mapped into memory and run. The file contains a program header that follows the ELF header in the file. The program header defines the segments to be mapped.

ELF program header:

int type; // loadable code or data, dynamic linking info, etc.
int offset; // file offset of segment
int virtaddr; // virtual address to map segment
int physaddr; // physical address, not used
int filesize; // size of segment in file
int memsize; // size of segment in memory (bigger if contains BSS)
int flags; // Read, Write, Execute bits
int align; // required alignment, invariably hardware page size

An executable usually has only a handful of segments: a read-only one for the code and read-only data, and a read-write one for read/write data. All of the loadable sections are packed into the appropriate segments so the system can map the file with one or two operations.

ELF files extend the ‘‘header in the address space’’ trick used in QMAGIC a.out files to make the executable files as compact as possible at the cost of some slop in the address space. A segment can start and end at arbitrary file offsets, but the virtual starting address for the segment must have the same low bits modulo the alignment as the starting offset in the file, i.e., it must start at the same offset within a page. The system maps in the entire range from the page where the segment starts to the page where the segment ends, even if the segment logically occupies only part of the first and last pages mapped.

ELF loadable segments:

The mapped text segment consists of the ELF header, program header, and read-only text, since the ELF and program headers are in the same page as the beginning of the text. The read/write data segment in the file starts immediately after the text segment. The page from the file is mapped both read-only as the last page of the text segment in memory and copy-on-write as the first page of the data segment. In this example, if a computer has 4K pages, and in an executable file the text ends at 0x80045ff, then the data starts at 0x8005600. The file page is mapped into the last page of the text segment at location 0x8004000, where the first 0x600 bytes contain the text from 0x8004000-0x80045ff, and into the data segment at 0x8005000, where the rest of the page contains the initial contents of data from 0x8005600-0x8005fff.

The BSS section again is logically continuous with the end of the read/write sections in the data segment, in this case 0x1300 bytes, the difference between the file size and the memory size. The last page of the data segment is mapped in from the file, but as soon as the operating system starts to zero the BSS segment, the copy-on-write system makes a private copy of the page.

If the file contains .init or .fini sections, those sections are part of the read-only text segment, and the linker inserts code at the entry point to call the .init section code before it calls the main program, and the .fini section code after the main program returns.

An ELF shared object contains all the baggage of a relocatable and an executable file. It has the program header table at the beginning, followed by the sections in the loadable segments, including dynamic linking information. Following the sections comprising the loadable segments are the relocatable symbol table and other information that the linker needs while creating executable programs that refer to the shared object, with the section table at the end.
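
The page arithmetic behind the 4K-page mapping example can be checked with a few helper functions. This is only a sketch: the function names are invented, and the file offset 0x4600 used in the note below is an illustrative value, not one stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* True if n is a nonzero power of two (a valid page size or alignment). */
static int is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Address of the start of the page containing addr. */
uint32_t page_base(uint32_t addr, uint32_t page)
{
    assert(is_pow2(page));
    return addr & ~(page - 1);
}

/* Offset of addr within its page. */
uint32_t page_offset(uint32_t addr, uint32_t page)
{
    assert(is_pow2(page));
    return addr & (page - 1);
}

/* A segment may be mapped directly only if its file offset and virtual
   address have the same low bits modulo the alignment. */
int placement_ok(uint32_t file_offset, uint32_t vaddr, uint32_t align)
{
    return page_offset(file_offset, align) == page_offset(vaddr, align);
}
```

With a page size of 0x1000, the last text byte 0x80045ff lies in the page at 0x8004000 and the first data byte 0x8005600 lies in the page at 0x8005000, at offset 0x600 in both the file page and the memory page. A hypothetical file offset of 0x4600 would satisfy `placement_ok` for that data address, while loading the same bytes at 0x8005000 would not.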

Special symbols:
Many systems use a few special symbols defined by the linker itself.
Unix systems all require that the linker define etext, edata,
and end as the end of the text, data, and bss segments, respectively.
The system sbrk() routine uses end as the address of the beginning
of the runtime heap, so it can be allocated contiguously with the existing data and bss.
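
A program can refer to these symbols directly. The sketch below assumes a Unix-like toolchain whose linker actually defines etext, edata, and end (GNU ld on Linux does); the two function names are made up for this example, and only the addresses of the symbols are meaningful, never their contents.

```c
#include <assert.h>
#include <stdint.h>

/* Defined by the linker, not by any source file. Only their addresses
   are meaningful. */
extern char etext, edata, end;

/* Check that the three symbols appear in the order text, data, bss. */
int special_symbols_in_order(void)
{
    uintptr_t t = (uintptr_t)&etext;
    uintptr_t d = (uintptr_t)&edata;
    uintptr_t e = (uintptr_t)&end;

    return t <= d && d <= e;
}

/* Address-space distance from the end of text to the end of bss. */
uintptr_t bytes_after_text(void)
{
    uintptr_t t = (uintptr_t)&etext;
    uintptr_t e = (uintptr_t)&end;

    assert(e >= t);
    return e - t;
}
```

Printing the three addresses from a small test program is a quick way to see the segment layout of an executable on a given system.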

Position-independent code uses several GOT-related relocation types in addition to R_386_GOTPC or its equivalent. The exact
types are architecture-specific, but the x86 is typical:
• R_386_GOT32: The relative location of the slot in the GOT
where the linker has placed a pointer to the given symbol. Used
for indirectly referenced global data.
• R_386_GOTOFF: The distance from the base of the GOT to the
given symbol or address. Used to address static data relative to the
GOT.
• R_386_RELATIVE: Used to mark data addresses in a PIC shared
library that need to be relocated at load time.

In a conventionally linked program, symbols are bound to
addresses and library code is bound to the executable
at link time, so the library the program was linked with
is the one it uses regardless of subsequent changes to
the library. With static shared libraries, symbols are still
bound to addresses at link time, but library code isn’t bound
to the executable until run time. (With dynamic shared libraries,
they’re both delayed until runtime.)

Structure of a typical shared library:

File header, a.out, COFF, or ELF header
(Initialization routine, not always present)
Jump table
Code
Global data
Private data

A UNIX shared library actually consists of two related files,
the shared library itself and a stub library for the linker to
use. A library creation utility takes as input a normal library
in archive format and some files of control information and uses
them to create the two files. The stub library contains no
code or data at all (other than possibly a tiny bootstrap routine)
but contains symbol definitions for programs linked with the library
to use.

Creating the shared library involves these basic steps,
which we discuss in greater detail below:

• Determine at what address the library’s code and data will
be loaded.
• Scan through the input library to find all of the exported
code symbols. (One of the control files may be a list of
symbols not to export, if they’re just used for
inter-routine communication within the library.)
• Make up the jump table with an entry for each exported
code symbol.
• If there’s an initialization or loader routine at the
beginning of the library, compile or assemble that.
• Create the shared library: Run the linker and link
everything together into one big executable format file.
• Create the stub library: Extract the necessary symbols from
the newly created shared library, reconcile those symbols
with the symbols from the input library, create a stub routine
for each library routine, then compile or assemble the stubs
and combine them into the stub library.
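
The role of the jump table in the steps above can be modeled in a few lines. In a real static shared library the table is a block of jump instructions at fixed addresses; the sketch below stands in for that with an array of function pointers, and every name in it (`lib_add`, `lib_sub`, `call_slot`, the slot numbers) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Library routines. Their addresses may move between library versions. */
static int lib_add(int a, int b) { return a + b; }
static int lib_sub(int a, int b) { return a - b; }

typedef int (*lib_fn)(int, int);

/* One slot per exported routine, in an order that never changes. */
enum { SLOT_ADD = 0, SLOT_SUB = 1, SLOT_COUNT = 2 };
static const lib_fn jump_table[SLOT_COUNT] = { lib_add, lib_sub };

/* What a stub does: call through the fixed slot instead of binding to
   the routine's own address. */
int call_slot(size_t slot, int a, int b)
{
    assert(slot < (size_t)SLOT_COUNT);
    return jump_table[slot](a, b);
}
```

Because callers depend only on slot positions, the routines behind the table can change size or move in a later library version without relinking the programs that use them.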

For static shared libraries:
Linux added a single uselib() system call that took the file
name and address of a library and mapped it into the program
address space. The startup routine bound into the executable
ran down the list of libraries, doing a uselib() on each.

A dynamically linked ELF executable has an .interp section containing the name of
an "interpreter" program to use when running the file.

An ELF shared library:

(Lots of pointer arrows here)
read-only pages:
.hash
.dynsym
.dynstr
.plt
.text
.rodata
read-write pages:
.data
.got
.dynamic
.bss

Oct 8, 2012
