Department of Computer Science and Engineering


Document Type

Article

Date of this Version

10-2022

Citation

Proc. ACM Program. Lang., Vol. 6, No. OOPSLA2, Article 153. Publication date: October 2022. https://doi.org/10.1145/3563316

Comments

Used by permission.

Abstract

Context-sensitive inter-procedural alias analyses are more precise than intra-procedural alias analyses, but they do not scale. As a consequence, most production compilers sacrifice precision for scalability and implement intra-procedural alias analysis. Alias analysis is used by many compiler optimizations, including loop transformations; because of its imprecision, the program's performance may suffer, especially in the presence of loops.

Previous work proposed a general approach based on code versioning with dynamic checks to disambiguate pointers at runtime. However, the overhead of the dynamic checks in this approach is O(log n), which is too high to enable many interesting optimizations. Other proposed approaches, e.g., polyhedral and symbolic range analysis, have O(1) overheads, but they only work for loops that satisfy certain constraints. Production compilers such as LLVM and GCC use scalar evolution analysis to compute an O(1) range check for loops to resolve memory dependencies at runtime; however, this approach, too, can be applied only to loops with certain constraints.
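To make the range-check idea concrete, the C sketch below (illustrative only, not taken from LLVM or GCC source, and not from the paper) shows the kind of O(1) check that scalar-evolution-based loop versioning emits: the byte ranges accessed by the loop are compared once before the loop runs, and the optimized version executes only when the ranges are disjoint.

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative sketch only: an O(1) runtime range check of the kind
     * emitted by scalar-evolution-based loop versioning.  The ranges
     * [a, a+n) and [b, b+n) are compared once; if they are disjoint, the
     * optimized (e.g., vectorized) copy of the loop can run. */
    void add_arrays(double *a, const double *b, size_t n) {
        uintptr_t a_lo = (uintptr_t)a, a_hi = a_lo + n * sizeof(double);
        uintptr_t b_lo = (uintptr_t)b, b_hi = b_lo + n * sizeof(double);

        if (a_hi <= b_lo || b_hi <= a_lo) {
            for (size_t i = 0; i < n; i++)    /* no overlap: vectorizable version */
                a[i] += b[i];
        } else {
            for (size_t i = 0; i < n; i++)    /* possible overlap: conservative fallback */
                a[i] += b[i];
        }
    }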

In this work, we present our tool, Scout, which can disambiguate two pointers at runtime using a single memory access. Scout is based on the key idea of constraining the size and alignment of memory allocations. Scout can also disambiguate array accesses within loops to which the existing O(1) range-check techniques cannot be applied. In addition, Scout uses feedback from static optimizations to reduce the number of dynamic checks needed for optimization.
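A minimal sketch of the key idea follows; it is illustrative only and is not Scout's actual implementation or API. The assumption (ours, for the sketch) is that the allocator rounds each allocation up to a power-of-two size and aligns it to that size, so clearing a pointer's low bits yields the base of the block containing its allocation; two pointers can then alias only if they land in the same block. In Scout, the block size would be obtained from allocator metadata with a single load, whereas here it is passed in directly to keep the example self-contained.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative sketch only, not Scout's implementation or API.
     * Assumes (hypothetically) that every allocation is rounded up to a
     * power-of-two size and aligned to that size, so masking a pointer's
     * low bits yields the base of the block holding its allocation. */
    static bool may_alias(const void *p, const void *q, uintptr_t p_block_size) {
        uintptr_t mask = ~(p_block_size - 1);   /* p_block_size is a power of two */
        return ((uintptr_t)p & mask) == ((uintptr_t)q & mask);
    }

    int main(void) {
        /* Two 64-byte-aligned, 64-byte blocks simulating the constrained allocator. */
        _Alignas(64) char a[64], b[64];
        printf("%d %d\n",
               may_alias(a, a + 8, 64),   /* same block: may alias (prints 1) */
               may_alias(a, b, 64));      /* different blocks: cannot alias (prints 0) */
        return 0;
    }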

Our technique enabled new opportunities for loop-invariant code motion, dead-store elimination, loop vectorization, and load elimination in already-optimized code. Our performance improvements are up to 51.11% for the Polybench suite and up to 0.89% for the CPU SPEC 2017 suite. The geometric means of our allocator's CPU and memory overheads are 1.05% and 7.47%, respectively, for the CPU SPEC 2017 benchmarks, and 0.21% and 0.13%, respectively, for the Polybench benchmarks.
