Dec 27, 2018

[bazel] wrap-up

Why Bazel?

Bazel provides fast, reproducible builds, and it can build the container image as well.



Bazel operates on the concepts of

  1. libraries
  2. binaries
  3. scripts
  4. data sets

Bazel caches all previously done work, based on the concept of an action graph.



Hands-on

  1. Set up a project workspace.
    This is the directory where Bazel looks for build inputs and BUILD files, and where it stores build outputs.
  2. Write a BUILD file, which tells Bazel what to build and how to build it.
    Targets: declared inside the BUILD file; each specifies a set of input artifacts to build.
    Rule: what Bazel uses to build a target. It specifies the build tools Bazel will use, such as compilers and linkers, and their configurations.
    Options: used to configure the build rule.
  3. Run Bazel; it places your outputs within the workspace.
  4. You can also use Bazel to run tests and to query the build to trace dependencies in your code.



How does Bazel work

  1. Loads the BUILD files (i.e. the packages) relevant to the target.
  2. Analyzes the inputs and their dependencies, applies the specified build rules, and produces an action graph.
  3. Executes the build actions on the inputs until the final build outputs are produced.

Since all previous build work is cached, Bazel can identify and reuse cached artifacts and only rebuild or retest what's changed.

To further enforce correctness, set up Bazel to run builds and tests hermetically through sandboxing, minimizing skew and maximizing reproducibility.
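
The caching idea above can be sketched in a few lines of Python. This is a toy model, not Bazel's implementation: the ActionCache class, the fake_compile helper, and the key scheme (a SHA-256 over the command line and the input contents) are all assumptions made for illustration.

```python
import hashlib


class ActionCache:
    """Toy action cache: re-runs an action only when its command or inputs change."""

    def __init__(self):
        self._results = {}   # cache key -> output bytes
        self.executions = 0  # how many actions actually ran

    @staticmethod
    def _key(command, inputs):
        # The key covers the full command line and the content of every input.
        h = hashlib.sha256()
        h.update(command.encode())
        for name in sorted(inputs):
            h.update(name.encode())
            h.update(hashlib.sha256(inputs[name]).digest())
        return h.hexdigest()

    def run(self, command, inputs, action):
        key = self._key(command, inputs)
        if key not in self._results:
            self._results[key] = action(inputs)
            self.executions += 1
        return self._results[key]


def fake_compile(inputs):
    # Hypothetical stand-in for a compiler: concatenates the inputs.
    return b"".join(inputs[name] for name in sorted(inputs))


cache = ActionCache()
srcs = {"hello.c": b"int main() { return 0; }"}
cache.run("gcc -c hello.c", srcs, fake_compile)  # runs the action
cache.run("gcc -c hello.c", srcs, fake_compile)  # cache hit, nothing runs
srcs["hello.c"] = b"int main() { return 1; }"
cache.run("gcc -c hello.c", srcs, fake_compile)  # content changed, runs again
print(cache.executions)
```

Three requests result in only two executions, because the second request has the same command and the same input content as the first.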



What is the action graph

  1. Represents the build artifacts and the relationships between them.
  2. Represents the build actions that Bazel will perform.
  3. Generated by Bazel's analysis phase from the graph produced by the loading phase.
  4. Contains file-level dependencies,
    full command lines,
    and other information Bazel needs to execute the build.



Concepts and terminology

Reference:

Workspace concept

  1. Source code is organized in a nested hierarchy of packages.
  2. Each package is a directory that contains a set of related source files and one BUILD file.
  3. The BUILD file specifies what software outputs can be built from the source.

Workspace contains

  1. A WORKSPACE file.
    It can be empty, or it can contain references to external dependencies required to build the outputs.
  2. Source files, and directories containing source files.
  3. Build output directories.

Packages concept

  1. The basic unit of code organization: a directory inside the workspace.
  2. A group of source files and a specification of the dependencies among them.
  3. Acts as a container for targets.

Package contains

  1. A BUILD or BUILD.bazel file.
  2. The source files and directories underneath it,
    except those which themselves contain a BUILD file (each of those is a package of its own).


Targets concept

  1. The elements of a package.
  2. Each target is one of three kinds:
    files, rules, or package groups.
  3. Every target belongs to exactly one package.

Files kind

  1. Source files
    Written by people and checked in to the repository.
  2. Generated files (derived files)
    Not checked in; generated by the build tool from source files according to specific rules.

Rule kind

  1. Specifies the relationship between a set of input files and a set of output files,
    including the necessary steps to derive the outputs from the inputs.
  2. The outputs of a rule are always generated files.
  3. The inputs to a rule may be source files, but they may also be generated files.
  4. The outputs of one rule may be the inputs to another, allowing long chains of rules to be constructed.
    (Loosely like a Monad: each rule's output feeds the next rule.) This is an important concept, since it lets
    us consume the output of one rule, or modify it, as the source of another rule.
    C++ builds rely on this heavily, e.g. one library rule depends on another library rule, which
    may provide a header file.
  5. An invariant of all rules is that the files generated by a rule always belong to the same package as the rule itself; it is not possible to generate files into another package.
    (A two-level design like this shows up almost everywhere.)
  6. It is common, however, for a rule's inputs to come from another package (chaining).

Package groups kind

  1. Sets of packages whose purpose is to limit accessibility of certain rules.
  2. Package groups are defined by the package_group function.
    They have two properties: the list of packages they contain and their name.
  3. The only allowed ways to refer to them are from the visibility attribute of rules
    or from the default_visibility attribute of the package function.
  4. Package groups do not generate or consume files.

Labels (like git's tag)

  1. All targets belong to exactly one package.
  2. A label is the name of a target.
    e.g.
    //my/app/main:app_binary
    (//package name:target name)
  3. Each label has two parts:
    a package name (my/app/main) and
    a target name (app_binary).
  4. Every label uniquely identifies a target.
  5. When the colon is omitted, the target name is assumed to be the same as the
    last component of the package name.
    e.g. these two are equivalent:
    //my/app
    //my/app:app
  6. Short-form labels such as //my/app are not to be confused with package names.
  7. Labels start with //, but package names never do.
    (A common misconception is that //my/app refers to a package,
    or to all the targets in a package; neither is true.)
  8. Within a BUILD file, the package-name part of a label may be omitted, and optionally the colon too.
    e.g. inside the BUILD file of package my/app, these are all equivalent:
    //my/app:app
    //my/app
    :app
    app
  9. Within a BUILD file, files belonging to the package may be referenced by their unadorned names relative to the package directory:
    generate.cc
    testdata/input.txt
  10. From other packages, or from the command line,
    file targets should be referred to by their complete label,
    e.g.
    //my/app:generate.cc
  11. All of the following are forbidden in labels:
    any sort of white space, braces, brackets, or parentheses; wildcards such as *; shell metacharacters such as >, & and |; etc.
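
The shorthand rules above can be sketched as a small Python function. This is only an illustration of the expansion rules in these notes, not Bazel's label parser; parse_label and its current_package parameter are hypothetical names, and the function does no validation of forbidden characters.

```python
def parse_label(label, current_package=""):
    """Split a label into (package, target), expanding the shorthand forms."""
    if label.startswith("//"):
        package, colon, target = label[2:].partition(":")
        if not colon:
            # //my/app is shorthand for //my/app:app
            target = package.rsplit("/", 1)[-1]
        return package, target
    # Relative label (":app" or "app"): resolved against the current package.
    target = label[1:] if label.startswith(":") else label
    return current_package, target


print(parse_label("//my/app"))
print(parse_label(":app", current_package="my/app"))
print(parse_label("testdata/input.txt", current_package="my/app"))
```

The first two calls return the same pair, which is the point of items 5 and 8: //my/app, //my/app:app, :app and app all name one target when read from inside my/app.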

Target names

Target patterns: https://docs.bazel.build/versions/master/guide.html#target-patterns

Targets come in three kinds: files, rules, and package groups.
  1. The target name is the name of the target within the package.
  2. The name of a rule is the value of the name parameter in the rule's declaration in a BUILD file.
  3. The name of a file is its pathname relative to the directory containing the BUILD file (i.e. inside the package).
  4. Do not use .. to refer to files in other packages; use //packagename:filename instead.
  5. Filenames must be relative pathnames in normal form: they must neither start nor end with a slash, nor contain consecutive slashes.
    e.g. /foo, foo/ and foo//bar are forbidden.
  6. Up-level references (..) and current-directory references (./) are forbidden.
  7. The sole exception: a target name may consist of exactly '.'.
  8. Avoid the use of / in the names of rules.
    Especially when the shorthand form of a label is used, it may confuse the reader.
    The label //foo/bar/wiz is always a shorthand for //foo/bar/wiz:wiz,
    even if there is no such package foo/bar/wiz; it never refers to //foo:bar/wiz,
    even if that target exists.
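
The path rules in items 5 to 7 can be checked mechanically. The function below is a minimal sketch under the assumption that only those three rules matter; is_valid_file_target_name is a hypothetical helper, and it deliberately ignores the forbidden characters listed under Labels.

```python
def is_valid_file_target_name(name):
    """Check the normal-form path rules for a file target name."""
    if name == ".":
        return True  # the single allowed exception
    if not name or name.startswith("/") or name.endswith("/"):
        return False
    # An empty segment means "//"; "." and ".." segments are forbidden.
    return all(segment not in ("", ".", "..") for segment in name.split("/"))


for candidate in ["testdata/input.txt", "/foo", "foo/", "foo//bar", "../x", "."]:
    print(candidate, is_valid_file_target_name(candidate))
```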

Package names

  1. The name of a package is the name of the directory containing its BUILD file,
    relative to the top-level directory of the source tree.
  2. Cannot start with a slash.
  3. May not contain the substring //, nor end with a slash.

Rules (used to build targets)

  1. A rule specifies the relationship between inputs and outputs, and the steps to build the outputs.
  2. Rules can be of one of many different kinds or classes,
    which produce compiled executables and libraries,
    test executables,
    and other supported outputs as described in the Build Encyclopedia.
  3. Every rule has a name, specified by the name attribute of type 'string'.
  4. The name must be a syntactically valid *target* name.
  5. For some rules, such as genrules, the name is somewhat arbitrary; what matters more is the names of the files generated by the rule.
  6. For others, such as *_binary and *_test rules, the rule name determines the name of the executable produced by the build.
    e.g.
    cc_binary(
        name = "my_app",  # rule name
        srcs = ["my_app.cc"],
        deps = [
            "//absl/base",
            "//absl/strings",
        ],
    )
  7. Every rule has a set of attributes. Each attribute has a name and a type.
    Some of the common types an attribute can have are
    1. integer,
    2. label,
    3. list of labels,
    4. string,
    5. list of strings,
    6. output label,
    7. list of output labels.
      e.g.
      srcs: a list of labels
      outs: a list of output labels


BUILD Files

  1. Every package contains a BUILD file.
  2. BUILD files are evaluated using an imperative language, Starlark.
    They are interpreted as a sequential list of statements.
  3. Order does matter: variables must be defined before they are used.
  4. When a build rule function, such as cc_library, is executed, it creates a new target in the graph
    (the target graph built while loading, not yet the action graph).
    This target can later be referred to using a label.
    When a BUILD file consists only of rule declarations, they can be re-ordered freely without changing the behavior.
  5. BUILD files cannot contain
    function definitions,
    for statements or if statements (but list comprehensions and if expressions are allowed).
    Functions should be declared in .bzl files instead.
  6. Programs in Starlark are unable to perform arbitrary I/O.
    This invariant makes the interpretation of BUILD files hermetic,
    i.e. dependent only on a known set of inputs, which is essential for ensuring that builds are reproducible.
  7. BUILD files should be written using only ASCII characters,
    although technically they are interpreted using the Latin-1 character set.
  8. BUILD file authors are encouraged to use comments liberally to document the role of each build target, whether or not it is intended for public use, and to document the role of the package itself.
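
Item 5 allows list comprehensions and if expressions even though for and if statements are forbidden. The fragment below shows both forms, written as plain Python, whose expression syntax Starlark closely follows. The names DEBUG, names, srcs and copts are hypothetical and do not come from any real BUILD file.

```python
DEBUG = False
names = ["foo", "bar"]

# A list comprehension stands in for a for statement.
srcs = [name + ".cc" for name in names]

# An if expression stands in for an if statement.
copts = ["-g"] if DEBUG else ["-O2"]

print(srcs, copts)
```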


Loading an extension

  1. Bazel extensions are files ending in .bzl
  2. Use the load statement to import a symbol from an extension.
    # Load the file foo/bar/file.bzl and add the some_library symbol to the environment.
    load("//foo/bar:file.bzl", "some_library")
    load("//foo/bar:file.bzl", library_alias = "some_library")
    load(":my_rules.bzl", "some_rule", nice_alias = "some_other_rule")
  3. 'load' statements must appear at top-level, i.e. they cannot be in a function body.
    load accepts only string literals as arguments.
  4. In a .bzl file, symbols starting with _ are not exported and cannot be loaded from another file.
    Visibility doesn't affect loading (yet): you don't need to use exports_files to make a .bzl file visible.


Types of build rule

  1. The majority of build rules come in families, grouped together by language.
    e.g
    cc_binary, cc_library and cc_test are the build rules for C++ binaries, libraries, and tests.
  2. *_binary rules build executable programs in a given language.
    After a build, the executable will reside in the build tool's binary output tree at the corresponding name for the rule's label,
    e.g. //my:program would appear at
    $(BINDIR)/my/program.
    Such rules also create a runfiles directory containing all the files mentioned in a data attribute belonging to the rule, or any rule in its transitive closure of dependencies; this set of files is gathered together in one place for ease of deployment to production.
  3. *_test rules are a specialization of a *_binary rule, used for automated testing.
    Tests are simply programs that return zero on success.
    Like binaries, tests also have runfiles trees, and the files beneath them are the only files that a test may legitimately open at runtime.
    e.g.
    a test cc_test(name='x', data=['//foo:bar']) may open and read $(TEST_SRCDIR)/workspace/foo/bar during execution.
    (Each programming language has its own utility function for accessing the value of $TEST_SRCDIR, but they are all equivalent to using the environment variable directly.) Failure to observe the rule will cause the test to fail when it is executed on a remote testing host.
  4. *_library rules specify separately-compiled modules in the given programming language.
    Libraries can depend on other libraries, and binaries and tests can depend on libraries, with the expected separate-compilation behavior.
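
As a sketch of the runfiles lookup described in item 3, a test written in Python could build the path from the TEST_SRCDIR environment variable directly. The runfile_path helper is hypothetical, the "workspace" directory name is taken from the example path above, and the setdefault line exists only so that the snippet also runs outside of Bazel.

```python
import os

# Only so the sketch runs outside Bazel; under "bazel test" the variable is already set.
os.environ.setdefault("TEST_SRCDIR", "/tmp/runfiles")


def runfile_path(relative_path, workspace="workspace"):
    """Build the runfiles path of a data dependency from $TEST_SRCDIR."""
    test_srcdir = os.environ["TEST_SRCDIR"]
    return os.path.join(test_srcdir, workspace, relative_path)


# Path a test with data=['//foo:bar'] would open.
print(runfile_path("foo/bar"))
```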


Dependencies

  1. There are two dependency graphs:
    1. the graph of actual dependencies
    2. the graph of declared dependencies
  2. A target X is actually dependent on target Y iff Y must be present, built and up-to-date in order for X to be built correctly.
  3. A target X has a declared dependency on target Y iff there is a dependency edge from X to Y in the package of X.
  4. Every rule must explicitly declare all of its actual direct dependencies to the build system, and no more.
  5. You need not (and should not) attempt to list everything indirectly imported, even if it is "needed" by the target at execution time.
  6. Rule of thumb: make sure the declared dependencies match the actual dependencies, i.e. every dependency the code uses directly is declared in the rule.
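
The rule of thumb amounts to comparing two sets. Here is a minimal sketch, assuming the actual direct dependencies are already known (for example, collected from the includes or imports in the code); check_deps and the //lib labels are hypothetical.

```python
def check_deps(declared, actual_direct):
    """Compare a rule's declared deps with the deps its code uses directly."""
    declared, actual_direct = set(declared), set(actual_direct)
    return {
        "missing": sorted(actual_direct - declared),  # used but not declared
        "unused": sorted(declared - actual_direct),   # declared but not used
    }


report = check_deps(
    declared=["//lib:b", "//lib:c"],
    actual_direct=["//lib:b", "//lib:d"],
)
print(report)
```

Both lists should be empty for a well-formed rule: a non-empty "missing" list breaks item 4's "declare all", and a non-empty "unused" list breaks its "and no more".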

Types of dependencies

srcs, deps and data
  1. srcs dependencies
    Files consumed directly by the rule or rules that output source files.
  2. deps dependencies
    Rules pointing to separately-compiled modules providing header files, symbols, libraries, data, etc.
  3. data dependencies
    A build target might need some data files to run correctly
    (e.g. resource files for a GUI program, or input data for a unit test).
    These files are available using the relative path 'path/to/data/file'.
    These data files aren't source code: they don't affect how the target is built.

Visualize your build

Reference:
https://blog.bazel.build/2015/06/17/visualize-your-build.html

e.g
$ bazel query 'deps(//my:target)' --output=graph > target_graph.in
$ dot -Tpng < target_graph.in > target_graph.png
$ showimage target_graph.png



Using Labels to Reference Directories

  1. DON'T reference a directory; reference specific files instead. A directory label only triggers
    a rebuild if the directory itself changes (i.e. files are added or deleted under it);
    it won't be triggered when existing files are modified.
    e.g.
    data = ["//data/regression:unittest/."]  # don't use this
    data = ["testdata/."]  # don't use this
    data = ["testdata/"]  # don't use this
  2. If you need to include all the files under a directory, use:
    data = glob(["testdata/**"])  # ** forces the glob() to be recursive.
  3. If you must use directory labels, keep in mind that you can't refer to the parent package with a relative "../" path; instead, use an absolute path like "//data/regression:unittest/."
  4. Note that directory labels are only valid for data dependencies.
    If you try to use a directory as a label in an argument other than data, it will fail.
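
Why item 1 matters can be modeled with two fingerprints. This is only a model of the behavior described above, not how Bazel tracks directories: listing_fingerprint and content_fingerprint are hypothetical helpers, the first standing in for a directory label and the second for listing every file (as glob does).

```python
import hashlib
import tempfile
from pathlib import Path


def listing_fingerprint(directory):
    """Fingerprint of the directory listing only: names, not contents."""
    names = sorted(str(p.relative_to(directory)) for p in Path(directory).rglob("*"))
    return hashlib.sha256("\n".join(names).encode()).hexdigest()


def content_fingerprint(directory):
    """Fingerprint of every file's path and content."""
    h = hashlib.sha256()
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            h.update(str(path.relative_to(directory)).encode())
            h.update(path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "input.txt"
    data.write_text("v1")
    listing_before = listing_fingerprint(tmp)
    content_before = content_fingerprint(tmp)

    data.write_text("v2")  # modify an existing file, add or delete nothing
    listing_changed = listing_fingerprint(tmp) != listing_before
    content_changed = content_fingerprint(tmp) != content_before

print(listing_changed, content_changed)
```

Editing the file leaves the listing fingerprint untouched while the content fingerprint changes, which mirrors the missed rebuild.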



Evaluation model

https://docs.bazel.build/versions/master/skylark/concepts.html

Bazel extensions are files ending in .bzl; the load
statement imports a symbol from an extension.

e.g.
load("//foo/bar:file.bzl", "some_library")

load:

  • Arguments must be string literals (no variables).
  • load statements must appear at top-level,
    i.e. they cannot be in a function body.
  • The first argument of load is a label identifying a .bzl file.
  • If it is a relative label, it is resolved with respect to the package (not directory) containing the current bzl file.
  • Relative labels in load statements should use a leading ':'.
  • load also supports aliases,
    i.e. you can assign different names to the imported symbols.
    load("//foo/bar:file.bzl", library_alias = "some_library")

Native:

  • A built-in module to support native rules and other package helper functions. 
  • All native rules appear as functions in this module, e.g. native.cc_library. 
  • The native module is only available in the loading phase
    (i.e. for macros, not for rule implementations).
  • Attributes will ignore None values, and treat them as if the attribute was unset.

Native rules:

Rules that don't need a load() statement.



Macros:

https://docs.bazel.build/versions/master/skylark/macros.html

Macros are: 

  • Functions called from the BUILD file that can instantiate rules.
  • Used purely for encapsulation and code reuse.
  • By the end of the loading phase, macros don't exist anymore, and Bazel sees only the set of rules they created.
    e.g.
    def my_macro(name, visibility=None):
      native.cc_library(
        name = name,
        srcs = ["main.cc"],
        visibility = visibility,
      )


Debugging:

  • Show how the BUILD file looks after evaluation.
    All macros, globs, loops are expanded.
    $ bazel query --output=build //my/path:all 
  • Filter it
    $ bazel query --output=build 'attr(generator_function, my_macro, //my/path:all)'


Errors:

Use the fail function to report an error from a macro, e.g.
def my_macro(name, deps, visibility=None):
  if len(deps) < 2:
    fail("Expected at least two values in deps")
  # ...


Conventions:

  • All public functions (functions that don’t start with underscore) that instantiate rules must have a name argument. 
  • Public functions should use a docstring following Python conventions.
  • In BUILD files, the name argument of the macros must be a keyword argument (not a positional argument).
  • The name attribute of rules generated by a macro should include the name argument as a prefix. For example, macro(name = "foo") can generate a cc_library foo and a genrule foo_gen.
  • Macros should have an optional visibility argument.
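
The conventions above can be seen together in one macro. To keep the sketch runnable as plain Python, the native module is replaced by a hypothetical _FakeNative stand-in that only records which rules were instantiated; in a real .bzl file, native is provided by Bazel. The genrule attributes and the generated file name are illustrative assumptions.

```python
class _FakeNative:
    """Hypothetical stand-in for Bazel's native module; it only records calls."""

    def __init__(self):
        self.rules = []

    def cc_library(self, **attrs):
        self.rules.append(("cc_library", attrs))

    def genrule(self, **attrs):
        self.rules.append(("genrule", attrs))


native = _FakeNative()


def my_macro(name, visibility=None):
    """Creates a cc_library `name` and a genrule `name`_gen.

    Args:
      name: base name; every generated rule uses it as a prefix.
      visibility: optional visibility applied to the library.
    """
    native.genrule(
        name = name + "_gen",
        outs = [name + "_gen.cc"],
        cmd = "echo '' > $@",
    )
    native.cc_library(
        name = name,
        srcs = ["main.cc", name + "_gen.cc"],
        visibility = visibility,
    )


# In a BUILD file, name is passed as a keyword argument.
my_macro(name = "foo")
print([attrs["name"] for _, attrs in native.rules])
```

After the call only the two recorded rules remain, which matches the earlier point that macros no longer exist once loading is done.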


Rules:

https://docs.bazel.build/versions/master/skylark/rules.html

Rules can access Bazel internals and have full control over what is going on.


Evaluation model:

A build consists of three phases:

Loading phase:

  • Load and evaluate all extensions and all BUILD files that are needed for the build.
  • The execution of the BUILD files simply instantiates rules
    (each time a rule is called, it gets added to a graph).
  • This is where macros are evaluated.


Analysis phase:

  • Code of the rules is executed (their implementation function), and actions are instantiated.
  • An action describes how to generate a set of outputs from a set of inputs
    e.g
    "run gcc on hello.c and get hello.o"
  • Rules have to list explicitly which files will be generated before the actual commands are executed.
    In other words, the analysis phase takes the graph generated by the loading phase and generates an action graph.


Execution phase:

  • Actions are executed when at least one of their outputs is required.
  • If a file is missing or if a command fails to generate one output, the build fails.
  • Tests are also run during this phase.
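
The first bullet above (only actions whose outputs are required get executed) can be sketched with a toy action graph. The ACTIONS table and the execute function are hypothetical; the gcc command lines extend the "run gcc on hello.c and get hello.o" example, and nothing is actually run, the commands are only collected in order.

```python
# Toy action graph: output file -> (input files, command line).
ACTIONS = {
    "hello.o": (["hello.c"], "gcc -c hello.c -o hello.o"),
    "hello": (["hello.o"], "gcc hello.o -o hello"),
    "unused.o": (["unused.c"], "gcc -c unused.c -o unused.o"),
}


def execute(requested, actions):
    """Collect only the actions needed for `requested`, inputs before outputs."""
    executed = []
    done = set()

    def build(output):
        if output in done or output not in actions:
            return  # already built, or a source file
        inputs, command = actions[output]
        for dep in inputs:
            build(dep)
        executed.append(command)
        done.add(output)

    for output in requested:
        build(output)
    return executed


print(execute(["hello"], ACTIONS))
```

Requesting hello schedules the compile and the link, while the action for unused.o is never touched.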


Other traits:

  • Bazel uses parallelism to read, parse and evaluate the .bzl files and BUILD files.
    (This is safe because evaluation is thread-safe.)
  • A file is read at most once per build and the result of the evaluation is cached and reused.
  • A file is evaluated only once all its dependencies (load() statements) have been resolved. (Its values become constant afterwards.)
  • By design, loading a .bzl file has no visible side effect; it only defines values and functions.
  • Bazel is smart enough to evaluate lazily, i.e. it only loads, analyzes and executes what is needed for this build.






Bazel BUILD Encyclopedia of Functions




Working with external dependencies










