Fix typos and grammar in README
Reviewed By: rafaelauler
Differential Revision: D13428707
fbshipit-source-id: 254ca066133
parent b097e688f8
commit 7e2b793d7b
30
README.md
@@ -22,12 +22,12 @@ on code layout properties, such as function pointer deltas.
 Assembly code can be processed too. Requirements for it include a clear
 separation of code and data, with data objects being placed into data
 sections/segments. If indirect jumps are used for intra-function control
-transfer (e.g. jump tables), the code patterns should be matching those
+transfer (e.g., jump tables), the code patterns should be matching those
 generated by Clang/GCC.

-NOTE: BOLT is currently incompatible with the "-freorder-blocks-and-partition"
+NOTE: BOLT is currently incompatible with the `-freorder-blocks-and-partition`
 compiler option. Since GCC8 enables this option by default, you have to
-explicitly disable it by adding "-fno-freorder-blocks-and-partition" flag if
+explicitly disable it by adding `-fno-reorder-blocks-and-partition` flag if
 you compiling with GCC8.

 PIE and .so support has been added recently. Please report bugs if you
@@ -35,7 +35,7 @@ encounter any issues.

 ## Installation

-BOLT heavily uses LLVM libraries and by design it is built as one of LLVM
+BOLT heavily uses LLVM libraries, and by design, it is built as one of LLVM
 tools. The build process is not much different from a regular LLVM build.
 The following instructions are assuming that you are running under Linux.

@@ -83,7 +83,7 @@ BOLT will also report if it detects relocations while processing the binary.

 This step is different for different kinds of executables. If you can invoke
 your program to run on a representative input from a command line, then check
-**For Applications** section below. If your programs typically runs as a
+**For Applications** section below. If your program typically runs as a
 server/service, then skip to **For Services** section.

 The version of `perf` command used for the following steps has to support
@@ -101,7 +101,7 @@ $ perf record -e cycles:u -j any,u -o perf.data -- <executable> <args> ...

 Once you get the service deployed and warmed-up, it is time to collect perf
 data with LBR (branch information). The exact perf command to use will depend
-on the service. E.g. to collect the data for all processes running on the
+on the service. E.g., to collect the data for all processes running on the
 server for the next 3 minutes use:
 ```
 $ perf record -e cycles:u -j any,u -a -o perf.data -- sleep 180
@@ -111,14 +111,14 @@ Depending on the application, you may need more samples to be included with
 your profile. It's hard to tell upfront what would be a sweet spot for your
 application. We recommend the profile to cover 1B instructions as reported
 by BOLT `-dyno-stats` option. If you need to increase the number of samples
-in the profile, you can either run the `sleep` command for longer, and/or use
+in the profile, you can either run the `sleep` command for longer and use
 `-F<N>` option with `perf` to increase sampling frequency.

 Note that for profile collection we recommend using cycle events and not
 `BR_INST_RETIRED.*`. Empirically we found it to produce better results.

-If collection of a profile with branches is not available, e.g. when you run on
-a VM or on a hardware that does not support it, then you can use only sample
+If the collection of a profile with branches is not available, e.g., when you run on
+a VM or on hardware that does not support it, then you can use only sample
 events, such as cycles. In this case, the quality of the profile information
 would not be as good, and performance gains with BOLT are expected to be lower.

@@ -127,9 +127,9 @@ would not be as good, and performance gains with BOLT are expected to be lower.
 NOTE: you can skip this step and feed `perf.data` directly to BOLT using
 experimental `-p perf.data` option.

-For this step you will need `perf.data` file collected from the previous step and
+For this step, you will need `perf.data` file collected from the previous step and
 a copy of the binary that was running. The binary has to be either
-unstripped, or should have a symbol table intact (i.e. running `strip -g` is
+unstripped, or should have a symbol table intact (i.e., running `strip -g` is
 okay).

 Make sure `perf` is in your `PATH`, and execute `perf2bolt`:
@@ -158,13 +158,13 @@ to the command above. The processing time will be slightly longer.
 For a full list of options see `-help`/`-help-hidden` output.

 The input binary for this step does not have to 100% match the binary used for
-profile collection in **Step 1**. This could happen when you are doing an active
+profile collection in **Step 1**. This could happen when you are doing active
 development, and the source code constantly changes, yet you want to benefit
-from profile-guided optimizations. However, since the binary is not exactly the
+from profile-guided optimizations. However, since the binary is not precisely the
 same, the profile information could become invalid or stale, and BOLT will
-report the number of functions with stale profile. The higher the
+report the number of functions with a stale profile. The higher the
 number, the less performance improvement should be expected. Thus, it is
-important to update `.fdata` for important releases.
+crucial to update `.fdata` for release branches.

 ## Multiple Profiles

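Reviewer note on the sampling paragraph touched by this commit: the README gives two ways to get more samples, a longer `sleep` and perf's `-F<N>` option. The sketch below shows how the two might combine with the service-wide `perf record` command quoted in the diff. `build_perf_cmd` is a hypothetical helper, not part of BOLT or perf, and the 2000 Hz value is an assumed example rather than a recommendation from the README. It only prints the command, so it is safe to run on a machine without perf.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOLT or perf): prints a perf command line
# for system-wide LBR sampling instead of running it.
#   $1 = collection time in seconds (passed to `sleep`)
#   $2 = optional sampling frequency for perf's -F option
build_perf_cmd() {
  duration="$1"
  freq="${2:-}"
  cmd="perf record -e cycles:u -j any,u -a -o perf.data"
  if [ -n "$freq" ]; then
    cmd="$cmd -F $freq"
  fi
  printf '%s\n' "$cmd -- sleep $duration"
}

build_perf_cmd 180        # the 3-minute command quoted in the README
build_perf_cmd 600 2000   # longer run with an assumed 2000 Hz sampling rate
```

Printing instead of executing keeps the sketch reviewable; copy the printed line into a shell on the target host once the duration and frequency look right for the service.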