Master’s Thesis – Building a Monte-Carlo particle transport proxy-app in Rust

Evaluating Rust’s viability for HPC applications – GitHub · Report

Context

The document is an internship report on the development of a Monte-Carlo particle transport code using the Rust programming language. It constitutes my Master’s thesis and was undertaken at the French Alternative Energies and Atomic Energy Commission (CEA) to evaluate Rust’s capabilities in handling highly parallel computations, specifically in the context of Monte-Carlo particle transport simulations.

To assess performance, memory management, code safety, and efficiency, we chose to rewrite a C++ proxy application called Quicksilver in Rust. Monte-Carlo methods are embarrassingly parallel but require high computational power to scale. The project therefore aimed to determine whether Rust’s memory safety and concurrency features could outperform or enhance traditionally used C++ code, as well as provide a better framework for developing such applications.

Illustrated Principles

The document explains several concepts and processes related to the subject and its context:

  1. Monte-Carlo Methods: These are algorithms that rely on repeated random sampling to estimate results statistically. They are commonly used in scientific computing and simulation codes.

  2. Parallelism & Synchronization: Monte-Carlo methods are generally highly parallelizable due to the independence of samples. This often enables SIMD, GPU, or, more generally, shared-memory parallelization. In our case, particles can be processed independently, although they require read-write access to shared data.

  3. Rust’s Advantages: The report highlights Rust’s strict ownership and borrowing system, ensuring thread safety and memory efficiency, which were central to preventing data races and undefined behavior. The rayon crate allowed for easy introduction of parallelism in Rust, providing high-level abstractions for data parallelism.

  4. Optimization: The project involved various optimizations, all roughly fitting in one of these categories:

    • Refactoring C++-style code into Rust idiomatic patterns (e.g., iterators instead of indexed loops) to improve both performance and code readability.
    • Using profiling and benchmarking tools such as perf, VTune, and criterion to monitor performance and pinpoint bottlenecks.
    • Simplifying execution flow and data structures through targeted additions, deletions, and restructurings.

  5. Data Contention: Fastiron, the Rust implementation, introduced chunking and thread-local storage to optimize parallel execution, minimizing contention over shared data. These optimizations improved memory access patterns and decreased the cost of atomic operations during processing.
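
As a concrete illustration of point 1, here is a minimal Monte-Carlo sketch in Rust that estimates π by sampling points in the unit square, written in the iterator style mentioned in point 4. The xorshift64 generator and function names are ours, standing in for a real RNG crate such as `rand`; this is not code from Fastiron itself:

```rust
// Minimal Monte-Carlo sketch: estimate pi by sampling random points in the
// unit square and counting how many land inside the quarter circle.
// xorshift64 is a tiny deterministic generator used here to keep the
// example dependency-free (a real code would use a proper RNG crate).
fn xorshift64(state: &mut u64) -> f64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    // Keep the top 53 bits and map them to [0, 1).
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

fn estimate_pi(samples: u64, seed: u64) -> f64 {
    let mut state = seed;
    // Idiomatic iterator chain instead of an indexed loop.
    let hits = (0..samples)
        .filter(|_| {
            let x = xorshift64(&mut state);
            let y = xorshift64(&mut state);
            x * x + y * y <= 1.0
        })
        .count();
    4.0 * hits as f64 / samples as f64
}

fn main() {
    let pi = estimate_pi(1_000_000, 0x9E37_79B9_7F4A_7C15);
    println!("pi ~ {pi:.4}");
}
```

The estimate converges as the sample count grows, which is exactly why these methods demand high computational power at scale.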

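The thread-local tally pattern described under Data Contention can be sketched as follows. This is a simplified illustration using `std::thread` scoped threads rather than rayon, and the `Tally` type, chunk sizing, and toy absorption rule are hypothetical, not Fastiron’s actual data structures: each worker accumulates into a private tally over its own chunk of particles, and the partial tallies are merged once at the end, so the hot loop performs no atomic or locked updates on shared data.

```rust
use std::thread;

// Hypothetical per-thread tally; real codes track many more quantities.
#[derive(Default)]
struct Tally {
    absorbed: u64,
    total_energy: f64,
}

impl Tally {
    fn merge(&mut self, other: &Tally) {
        self.absorbed += other.absorbed;
        self.total_energy += other.total_energy;
    }
}

fn process_particles(energies: &[f64], n_threads: usize) -> Tally {
    // Split the particle data into one contiguous chunk per thread.
    let chunk_len = (energies.len() + n_threads - 1) / n_threads;
    thread::scope(|s| {
        let handles: Vec<_> = energies
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    // Thread-local tally: no contention inside the hot loop.
                    let mut local = Tally::default();
                    for &e in chunk {
                        // Toy "physics": particles below 1.0 are absorbed.
                        if e < 1.0 {
                            local.absorbed += 1;
                        }
                        local.total_energy += e;
                    }
                    local
                })
            })
            .collect();
        // Merge the partial tallies exactly once, after all chunks are done.
        let mut global = Tally::default();
        for h in handles {
            global.merge(&h.join().unwrap());
        }
        global
    })
}

fn main() {
    let energies: Vec<f64> = (0..10_000).map(|i| (i % 3) as f64).collect();
    let tally = process_particles(&energies, 4);
    println!("absorbed: {}, total energy: {}", tally.absorbed, tally.total_energy);
}
```

With rayon, the same shape is typically expressed through `fold`/`reduce` on a parallel iterator, which manages the per-thread accumulators automatically.
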
Conclusions Reached

  1. Performance Gains: Fastiron outperformed the reference C++ Quicksilver implementation in both sequential and parallel contexts. By the end of the project, Fastiron achieved roughly twice the performance of OpenMP-only Quicksilver.

  2. Rust’s Strengths for HPC: The project demonstrated that Rust is viable for high-performance computing (HPC) applications, particularly because of its ability to prevent concurrency issues at compile time, reducing debugging time. Rust’s strict memory and ownership model also increases the feasibility of large-scale refactors.

  3. Trade-offs: While Rust brought benefits in terms of code safety and parallel execution, its memory footprint was larger than Quicksilver’s because of Rust’s borrowing rules and what they implied for data initialization. Nevertheless, the overall number of allocator calls made by Fastiron was only a tenth of Quicksilver’s.

Overall, Rust’s features lend themselves well to proxy-application development, and its “fearless concurrency” claim holds up in the HPC context. That being said, the language definitely faces obstacles to wider adoption in the domain: we can cite, for example, the high entry cost as well as the lack of GPGPU support in pure Rust.