• text, data, or bss: A symbol defined in this module. External bit may or may not be on. Value is the relocatable address in the module corresponding to the symbol.
• abs: An absolute non-relocatable symbol. (Rare outside of debugger info.) External bit may or may not be on. Value is the absolute value of the symbol.
• undefined: A symbol not defined in this module. External bit must be on. Value is usually zero, but see the "common block hack" below.
These symbol types are adequate for older languages such as C and Fortran and, just barely, for C++.
ELF header:
    char magic[4] = "\177ELF"; // magic number
    char class;      // address size, 1 = 32 bit, 2 = 64 bit
    char byteorder;  // 1 = little-endian, 2 = big-endian
    char hversion;   // header version, always 1
    char pad[9];
    short filetype;  // file type: 1 = relocatable, 2 = executable,
                     // 3 = shared object, 4 = core image
    short archtype;  // 2 = SPARC, 3 = x86, 4 = 68K, etc.
    int fversion;    // file version, always 1
    int entry;       // entry point if executable
    int phdrpos;     // file position of program header or 0
    int shdrpos;     // file position of section header or 0
    int flags;       // architecture specific flags, usually 0
    short hdrsize;   // size of this ELF header
    short phdrent;   // size of an entry in program header
    short phdrcnt;   // number of entries in program header or 0
    short shdrent;   // size of an entry in section header
    short shdrcnt;   // number of entries in section header or 0
    short strsec;    // section number that contains section name strings
Section header:
    int sh_name;     // name, index into the string table
    int sh_type;     // section type
    int sh_flags;    // flag bits, below
    int sh_addr;     // base memory address, if loadable, or zero
    int sh_offset;   // file position of beginning of section
    int sh_size;     // size in bytes
    int sh_link;     // section number with related info or zero
    int sh_info;     // more section-specific info
    int sh_align;    // alignment granularity if section is moved
    int sh_entsize;  // size of entries if section is an array
Section types include:
• PROGBITS: Program contents including code, data, and debugger info.
• NOBITS: Like PROGBITS but no space is allocated in the file itself. Used for BSS data allocated at program load time.
• SYMTAB and DYNSYM: Symbol tables, described in more detail later. The SYMTAB table contains all symbols and is intended for the regular linker, while DYNSYM is just the symbols for dynamic linking. (The latter table has to be loaded into memory at runtime, so it’s kept as small as possible.)
• STRTAB: A string table, analogous to the one in a.out files. Unlike a.out files, ELF files can and often do contain separate string tables for separate purposes, e.g. section names, regular symbol names, and dynamic linker symbol names.
• REL and RELA: Relocation information. REL entries add the relocation value to the base value stored in the code or data, while RELA entries include the base value for relocation in the relocation entries themselves. (For historical reasons, x86 objects use REL relocation and 68K objects use RELA.) There are a bunch of relocation types for each architecture, similar to (and derived from) the a.out relocation types.
• DYNAMIC and HASH: Dynamic linking information and the runtime symbol hash table.
There are three flag bits used: ALLOC, which means that the section occupies memory when the program is loaded; WRITE, which means that the section when loaded is writable; and EXECINSTR, which means that the section contains executable machine code.
Sections include:
• .text, which is type PROGBITS with attributes ALLOC+EXECINSTR. It’s the equivalent of the a.out text segment.
• .data, which is type PROGBITS with attributes ALLOC+WRITE. It’s the equivalent of the a.out data segment.
• .rodata, which is type PROGBITS with attribute ALLOC. It’s read-only data, hence no WRITE.
• .bss, which is type NOBITS with attributes ALLOC+WRITE. The BSS section takes no space in the file, hence NOBITS, but is allocated at runtime, hence ALLOC.
• .rel.text, .rel.data, and .rel.rodata, each of which is type REL or RELA: the relocation information for the corresponding text or data section.
• .init and .fini, each type PROGBITS with attributes ALLOC+EXECINSTR. These are similar to .text, but are code to be executed when the program starts up or terminates, respectively. C and Fortran don’t need these, but they’re essential for C++, which has global data with executable initializers and finalizers.
• .symtab and .dynsym, types SYMTAB and DYNSYM respectively: the regular and dynamic linker symbol tables. The dynamic linker symbol table has ALLOC set, since it’s loaded at runtime.
• .strtab and .dynstr, both type STRTAB: a table of name strings, for a symbol table or the section names for the section table. The .dynstr section, the strings for the dynamic linker symbol table, has ALLOC set since it’s loaded at runtime.
There are also some specialized sections like .got and .plt, the Global Offset Table and Procedure Linkage Table used for dynamic linking (covered in Chapter 10); .debug, which contains symbols for the debugger; .line, which contains mappings from source line numbers to object code locations, again for the debugger; and .comment, which contains documentation strings, usually version control version numbers.
An unusual section is .interp, which contains the name of a program to use as an interpreter. If this section is present, rather than running the program directly, the system runs the interpreter and passes it the ELF file as an argument.
Unix has for many years had self-running interpreted text files, using
#! /path/to/interpreter
as the first line of the file. ELF extends this facility to interpreters which run non-text programs. In practice this is used to call the run-time dynamic linker to load the program and link in any required shared libraries.
ELF symbol table:
    int name;     // position of name string in string table
    int value;    // symbol value, section relative in reloc,
                  // absolute in executable
    int size;     // object or function size
    char type:4;  // data object, function, section, or special case file
    char bind:4;  // local, global, or weak
    char other;   // spare
    short sect;   // section number, ABS, COMMON or UNDEF
If the file is a C++ program, it will probably also contain .init, .fini, .rel.init, and .rel.fini sections.
Sample relocatable ELF file:
ELF header
.text
.data
.rodata
.bss
.symtab
.rel.text
.rel.data
.rel.rodata
.line
.debug
.strtab
(section table, not considered to be a section)
An ELF executable file has the same general format as a relocatable ELF file, but the data are arranged so that the file can be mapped into memory and run. The file contains a program header that follows the ELF header in the file. The program header defines the segments to be mapped.
ELF program header:
    int type;      // loadable code or data, dynamic linking info, etc.
    int offset;    // file offset of segment
    int virtaddr;  // virtual address to map segment
    int physaddr;  // physical address, not used
    int filesize;  // size of segment in file
    int memsize;   // size of segment in memory (bigger if contains BSS)
    int flags;     // Read, Write, Execute bits
    int align;     // required alignment, invariably hardware page size
An executable usually has only a handful of segments: a read-only one for the code and read-only data, and a read-write one for read/write data. All of the loadable sections are packed into the appropriate segments so the system can map the file with one or two operations.
ELF files extend the "header in the address space" trick used in QMAGIC a.out files to make the executable files as compact as possible at the cost of some slop in the address space. A segment can start and end at arbitrary file offsets, but the virtual starting address for the segment must have the same low bits modulo the alignment as the starting offset in the file, i.e., it must start at the same offset within a page. The system maps in the entire range from the page where the segment starts to the page where the segment ends, even if the segment logically only occupies part of the first and last pages mapped.
ELF loadable segments:
The mapped text segment consists of the ELF header, program header, and read-only text, since the ELF and program headers are in the same page as the beginning of the text. The read/write data segment in the file starts immediately after the text segment. The page from the file is mapped both read-only as the last page of the text segment in memory and copy-on-write as the first page of the data segment.
In this example, if a computer has 4K pages, and in an executable file the text ends at 0x80045ff, then the data starts at 0x8005600. The file page is mapped into the last page of the text segment at location 0x8004000, where the first 0x600 bytes contain the text from 0x8004000-0x80045ff, and into the data segment at 0x8005000, where the rest of the page contains the initial contents of data from 0x8005600-0x8005fff.
The BSS section is again logically contiguous with the end of the read/write sections in the data segment, and in this case occupies 0x1300 bytes, the difference between the file size and the memory size. The last page of the data segment is mapped in from the file, but as soon as the operating system starts to zero the BSS, the copy-on-write system makes a private copy of the page.
If the file contains .init or .fini sections, those sections are part of the read-only text segment, and the linker inserts code at the entry point to call the .init section code before it calls the main program, and the .fini section code after the main program returns.
An ELF shared object contains all the baggage of a relocatable and an executable file. It has the program header table at the beginning, followed by the sections in the loadable segments, including dynamic linking information. Following the sections that make up the loadable segments are the relocatable symbol table and other information that the linker needs while creating executable programs that refer to the shared object, with the section table at the end.
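The page arithmetic in the 4K-page example above can be checked with a small sketch. The helper names are hypothetical, and the text base of 0x8000000 at file offset 0 (which puts the data at file offset 0x4600) is an assumption made for illustration; only the addresses 0x80045ff and 0x8005600 come from the example itself.

```python
PAGE = 0x1000  # 4K pages, as in the example above


def page_floor(addr, page=PAGE):
    """Round an address down to the start of its page."""
    return addr & ~(page - 1)


def is_mappable(file_offset, vaddr, align=PAGE):
    """A segment can be mapped directly only if its file offset and
    virtual address have the same low bits modulo the alignment."""
    return file_offset % align == vaddr % align


def mapped_range(vaddr, size, page=PAGE):
    """Return (start, end) of the whole-page range that gets mapped for
    a segment that logically covers [vaddr, vaddr + size)."""
    start = page_floor(vaddr, page)
    end = page_floor(vaddr + size - 1, page) + page
    return start, end


# Assumed layout: text at 0x8000000-0x80045ff from file offset 0,
# data starting at virtual address 0x8005600 from file offset 0x4600.
text_start, text_end = mapped_range(0x8000000, 0x4600)
data_page = page_floor(0x8005600)
```

Under these assumptions the text mapping ends at the page boundary 0x8005000, its last page starts at 0x8004000, and the data's first page starts at 0x8005000, so the same file page backs both.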
Special symbols:
Many systems use a few special symbols defined by the linker itself.
Unix systems all require that the linker define etext, edata,
and end as the end of the text, data, and bss segments, respectively.
The system sbrk() routine uses end as the address of the beginning
of the runtime heap, so it can be allocated contiguously with the existing data and bss.
GOT in addition R_386_GOTPC or its equivalent. The exact
types are architecture-specific, but the x86 is typical:
• R_386_GOT32: The relative location of the slot in the GOT
where the linker has placed a pointer to the given symbol. Used
for indirectly referenced global data.
• R_386_GOTOFF: The distance from the base of the GOT to the
given symbol or address. Used to address static data relative to the
GOT.
• R_386_RELATIVE: Used to mark data addresses in a PIC shared
library that need to be relocated at load time.
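As a rough illustration of what these three relocation types compute, here is a minimal sketch that follows the prose descriptions above rather than the exact formulas of the processor ABI. The function names and all addresses are hypothetical, and values are wrapped to 32 bits on the assumption of an x86 target.

```python
MASK32 = 0xFFFFFFFF  # x86 words are 32 bits, so results wrap


def got32(got_base, slot_addr):
    """R_386_GOT32 (as described above): location of the symbol's GOT
    slot relative to the base of the GOT."""
    return (slot_addr - got_base) & MASK32


def gotoff(got_base, sym_addr, addend=0):
    """R_386_GOTOFF: distance from the base of the GOT to the symbol."""
    return (sym_addr + addend - got_base) & MASK32


def relative(load_base, stored_value):
    """R_386_RELATIVE: adjust a stored data address by the address at
    which the library was actually loaded."""
    return (load_base + stored_value) & MASK32
```

Static data that sits below the GOT yields a negative distance, which shows up here as a large unsigned 32-bit value.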
In a conventionally linked program, symbols are bound to
addresses and library code is bound to the executable
at link time, so the library the program was linked with
is the one it uses regardless of subsequent changes to
the library. With static shared libraries, symbols are still
bound to addresses at link time, but library code isn’t bound
to the executable until run time. (With dynamic shared libraries,
they’re both delayed until runtime.)
Structure of typical shared library:
File header, a.out, COFF, or ELF header
(Initialization routine, not always present)
Jump table
Code
Global data
Private data
A UNIX shared library actually consists of two related files,
the shared library itself and a stub library for the linker to
use. A library creation utility takes as input a normal library
in archive format and some files of control information and uses
them to create the two files. The stub library contains no
code or data at all (other than possibly a tiny bootstrap routine)
but contains symbol definitions for programs linked with the library
to use.
Creating the shared library involves these basic steps,
which we discuss in greater detail below:
• Determine at what address the library’s code and data will be loaded.
• Scan through the input library to find all of the exported code symbols. (One of the control files may be a list of symbols not to export, if they’re just used for inter-routine communication within the library.)
• Make up the jump table with an entry for each exported code symbol.
• If there’s an initialization or loader routine at the beginning of the library, compile or assemble that.
• Create the shared library: Run the linker and link everything together into one big executable format file.
• Create the stub library: Extract the necessary symbols from the newly created shared library, reconcile those symbols with the symbols from the input library, create a stub routine for each library routine, then compile or assemble the stubs and combine them into the stub library.
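The symbol-scanning and jump-table steps above can be sketched as follows. This is only an illustration: the function name, the 8-byte entry size, the base address, and the symbol names are all assumptions, not details of any particular library creation utility.

```python
ENTRY_SIZE = 8  # hypothetical size of one jump table entry, in bytes


def build_jump_table(code_symbols, do_not_export, table_base,
                     entry_size=ENTRY_SIZE):
    """Assign each exported code symbol a slot in the jump table.

    code_symbols  -- code symbols found in the input library, in order
    do_not_export -- names from the control file that stay private
    table_base    -- address chosen for the start of the jump table
    Returns a dict mapping symbol name to the address of its slot.
    """
    table = {}
    for name in code_symbols:
        if name in do_not_export:
            continue  # used only within the library, so no entry
        table[name] = table_base + len(table) * entry_size
    return table


slots = build_jump_table(["read", "write", "_helper", "open"],
                         {"_helper"}, 0x60000000)
```

A name-to-address map of this kind is roughly the information the stub library needs to carry so that programs can be linked against the shared library.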
For static shared libraries:
Linux added a single uselib() system call that took the file
name and address of a library and mapped it into the program
address space. The startup routine bound into the executable
ran down the list of libraries, doing a uselib() on each.
An ELF file can have an .interp section containing the name of
an "interpreter" program to use when running the file.
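To make this concrete, here is a minimal sketch that pulls the interpreter name out of an ELF image held in memory. It relies on details that are not in the text above and come from the ELF specification: in an executable the interpreter string is also described by a program header entry of type PT_INTERP (value 3), and the byte offsets used are those of the standard 32-bit and 64-bit layouts. The function name `find_interp` is hypothetical.

```python
import struct

PT_INTERP = 3  # program header type that locates the interpreter path


def find_interp(data):
    """Return the interpreter path named in an ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                   # class: 1 = 32 bit, 2 = 64 bit
    end = "<" if data[5] == 1 else ">"    # byteorder: 1 = little-endian
    # Read phdrpos, phdrent and phdrcnt from the ELF header.
    if is64:
        phdrpos, = struct.unpack_from(end + "Q", data, 32)
        phdrent, phdrcnt = struct.unpack_from(end + "HH", data, 54)
    else:
        phdrpos, = struct.unpack_from(end + "I", data, 28)
        phdrent, phdrcnt = struct.unpack_from(end + "HH", data, 42)
    for i in range(phdrcnt):
        pos = phdrpos + i * phdrent
        ptype, = struct.unpack_from(end + "I", data, pos)
        if ptype != PT_INTERP:
            continue
        # Fields read: offset, virtaddr, physaddr, filesize.
        if is64:  # 64-bit entries place the flags word before offset
            offset, _, _, filesize = struct.unpack_from(
                end + "QQQQ", data, pos + 8)
        else:
            offset, _, _, filesize = struct.unpack_from(
                end + "IIII", data, pos + 4)
        raw = data[offset:offset + filesize]
        return raw.rstrip(b"\0").decode()
    return None
```

Passing it the bytes of a dynamically linked executable should return the path of the run-time dynamic linker; a file with no PT_INTERP entry gives None.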
An ELF shared library:
(Lots of pointer arrows here)
read-only pages: .hash .dynsym .dynstr .plt .text .rodata
read-write pages: .data .got .dynamic .bss