From 7e2b793d7b2e05c5c4412171660d04dbb793e4bb Mon Sep 17 00:00:00 2001
From: Maksim Panchenko
Date: Tue, 11 Dec 2018 19:01:10 -0800
Subject: [PATCH] Fix typos and grammar in README

Reviewed By: rafaelauler
Differential Revision: D13428707
fbshipit-source-id: 254ca066133
---
 README.md | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index ace819c..f5edae1 100644
--- a/README.md
+++ b/README.md
@@ -22,12 +22,12 @@ on code layout properties, such as function pointer deltas.
 Assembly code can be processed too. Requirements for it include a clear
 separation of code and data, with data objects being placed into data
 sections/segments. If indirect jumps are used for intra-function control
-transfer (e.g. jump tables), the code patterns should be matching those
+transfer (e.g., jump tables), the code patterns should be matching those
 generated by Clang/GCC.
 
-NOTE: BOLT is currently incompatible with the "-freorder-blocks-and-partition"
+NOTE: BOLT is currently incompatible with the `-freorder-blocks-and-partition`
 compiler option. Since GCC8 enables this option by default, you have to
-explicitly disable it by adding "-fno-freorder-blocks-and-partition" flag if
+explicitly disable it by adding `-fno-reorder-blocks-and-partition` flag if
 you compiling with GCC8.
 
 PIE and .so support has been added recently. Please report bugs if you
@@ -35,7 +35,7 @@ encounter any issues.
 
 ## Installation
 
-BOLT heavily uses LLVM libraries and by design it is built as one of LLVM
+BOLT heavily uses LLVM libraries, and by design, it is built as one of LLVM
 tools. The build process is not much different from a regular LLVM build.
 The following instructions are assuming that you are running under Linux.
 
@@ -83,7 +83,7 @@ BOLT will also report if it detects relocations while processing the binary.
 
 This step is different for different kinds of executables. If you can invoke
 your program to run on a representative input from a command line, then check
-**For Applications** section below. If your programs typically runs as a
+**For Applications** section below. If your program typically runs as a
 server/service, then skip to **For Services** section.
 
 The version of `perf` command used for the following steps has to support
@@ -101,7 +101,7 @@ $ perf record -e cycles:u -j any,u -o perf.data -- ...
 
 Once you get the service deployed and warmed-up, it is time to collect perf
 data with LBR (branch information). The exact perf command to use will depend
-on the service. E.g. to collect the data for all processes running on the
+on the service. E.g., to collect the data for all processes running on the
 server for the next 3 minutes use:
 ```
 $ perf record -e cycles:u -j any,u -a -o perf.data -- sleep 180
@@ -111,14 +111,14 @@ Depending on the application, you may need more samples to be included with
 your profile. It's hard to tell upfront what would be a sweet spot for your
 application. We recommend the profile to cover 1B instructions as reported
 by BOLT `-dyno-stats` option. If you need to increase the number of samples
-in the profile, you can either run the `sleep` command for longer, and/or use
+in the profile, you can either run the `sleep` command for longer and use
 `-F` option with `perf` to increase sampling frequency.
 
 Note that for profile collection we recommend using cycle events and not
 `BR_INST_RETIRED.*`. Empirically we found it to produce better results.
 
-If collection of a profile with branches is not available, e.g. when you run on
-a VM or on a hardware that does not support it, then you can use only sample
+If the collection of a profile with branches is not available, e.g., when you run on
+a VM or on hardware that does not support it, then you can use only sample
 events, such as cycles. In this case, the quality of the profile information
 would not be as good, and performance gains with BOLT are expected to be lower.
 
@@ -127,9 +127,9 @@ would not be as good, and performance gains with BOLT are expected to be lower.
 NOTE: you can skip this step and feed `perf.data` directly to BOLT using
 experimental `-p perf.data` option.
 
-For this step you will need `perf.data` file collected from the previous step and
+For this step, you will need `perf.data` file collected from the previous step and
 a copy of the binary that was running. The binary has to be either
-unstripped, or should have a symbol table intact (i.e. running `strip -g` is
+unstripped, or should have a symbol table intact (i.e., running `strip -g` is
 okay).
 
 Make sure `perf` is in your `PATH`, and execute `perf2bolt`:
@@ -158,13 +158,13 @@ to the command above. The processing time will be slightly longer.
 For a full list of options see `-help`/`-help-hidden` output.
 
 The input binary for this step does not have to 100% match the binary used for
-profile collection in **Step 1**. This could happen when you are doing an active
+profile collection in **Step 1**. This could happen when you are doing active
 development, and the source code constantly changes, yet you want to benefit
-from profile-guided optimizations. However, since the binary is not exactly the
+from profile-guided optimizations. However, since the binary is not precisely the
 same, the profile information could become invalid or stale, and BOLT will
-report the number of functions with stale profile. The higher the
+report the number of functions with a stale profile. The higher the
 number, the less performance improvement should be expected. Thus, it is
-important to update `.fdata` for important releases.
+crucial to update `.fdata` for release branches.
 
 ## Multiple Profiles
 