1. Profiling, Tracing and Linux Perf
a) What are the conceptual differences between profiling and tracing? What are the respective advantages and disadvantages of these techniques?
Useful Resource on Profiling vs. Tracing
While researching, I came across this helpful article:
Profiling vs. Tracing by J. Whitham
Profiling
- Similar to debugging with backtracing.
- The running code is interrupted at fixed time intervals (e.g., every millisecond). At each interruption, a sample (a ‘snapshot’ of the call stacks) is taken by visiting each running thread and examining its stack to discover which functions are currently running.
- Each sample is aggregated into a report or graph (e.g., flame graphs).
- Profiling cannot be used to construct a trace.
Tracing
- Does not operate by sampling.
- A trace is a log of events within the program during runtime.
- The log is customizable and may report function calls, returns, and execution of other statements.
- May require the program to be instrumented, i.e., modified to include log events. These might need to be added to the source code as a pre-compilation step (‘instrumented code’) or might be added dynamically to the machine code (a minimal sketch of source-level instrumentation follows after this list).
- A detailed trace can be used to reconstruct a profile.
- Useful for discovering the chain of events that led to a problem.
- More difficult to set up than profiling.
- Detailed traces are large.
- Instrumented code is never as easy to compile as the original code.
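To make the instrumentation idea concrete, here is a minimal sketch (plain C, with made-up names such as `trace_event`) of a program manually instrumented to log function entry and exit events; tracing tools typically insert such hooks automatically, either at compile time or directly into the machine code.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: append one timestamped event to the trace log. */
static void trace_event(const char *event, const char *func)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    fprintf(stderr, "%ld.%09ld %s %s\n",
            (long)ts.tv_sec, ts.tv_nsec, event, func);
}

static double work(int n)
{
    trace_event("enter", "work");   /* instrumentation: function entry */
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += (double)i * i;
    trace_event("exit", "work");    /* instrumentation: function exit */
    return sum;
}

int main(void)
{
    trace_event("enter", "main");
    double s = work(1000000);
    trace_event("exit", "main");
    printf("sum = %f\n", s);
    return 0;
}
```

The resulting event log is the trace; aggregating the time between matching enter/exit pairs would reconstruct a profile from it, which is why a detailed trace can stand in for a profile but not the other way around.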
b) Perf
`perf` is a powerful Linux tool for profiling, tracing, and performance analysis.
- Use `perf --help` to get an overview of `perf`’s capabilities.
- Use `perf list` to list the performance events and metric groups that can be measured by `perf` on SuperMUC-NG.
- Use `perf stat -e <event0,event1,…> <command>` to measure cache misses during execution of the parallel triad or the matrix multiplication.

It might be more convenient to use an interactive session on SuperMUC-NG. The command `salloc -t 30 -A pr58ci -N 1` reserves a single node for 30 minutes for an interactive session.
What I ran
2. Hardware Performance Counters
Performance counters are available in the performance monitoring units (PMUs) on many CPUs. They can be configured to count performance events, e.g., retired instructions or cache misses.
A popular interface for configuring and reading performance counters is the Performance Application Programming Interface (PAPI).
PAPI enables measuring the occurrence of performance events in specific code regions. SuperMUC-NG has a recent version of PAPI installed; you have to load it with `module load papi` before use.
You can list the available events with `papi_avail -a`.
Interesting events are those related to the memory subsystem, as well as instructions and floating-point operations, e.g.:
- `PAPI_L1_DCM`: Level 1 data cache misses
- `PAPI_L2_DCM`: Level 2 data cache misses
- `PAPI_L3_TCM`: Level 3 total cache misses
- `PAPI_TOT_INS`: instructions completed
- `PAPI_DP_OPS`: double-precision floating-point operations
A program can be compiled with PAPI by linking against the library, i.e. by adding `-lpapi` (and, if needed, the include and library paths provided by the `papi` module) to the compile command.
See the documentation of the easier-to-use “high-level” API or of the more powerful “low-level” API; both provide instructions and examples.
- Insert measurement probes into the matrix multiplication code to measure performance events during the matrix multiplication (a minimal PAPI sketch follows after this list).
- Show the effect of matrix multiplication optimizations (loop reordering, cache blocking, …) with suitable performance event measurements.
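Below is a minimal sketch of such probes using PAPI’s low-level API (the high-level region API works in the same spirit). The matrix size, the chosen events and the two loop orders are illustrative, and error handling is kept to a minimum.

```c
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];

    /* Initialize PAPI and create an event set with two example events. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI_library_init failed\n");
        return EXIT_FAILURE;
    }
    if (PAPI_create_eventset(&eventset) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_L2_DCM)  != PAPI_OK ||  /* L2 data cache misses */
        PAPI_add_event(eventset, PAPI_TOT_INS) != PAPI_OK) {  /* retired instructions */
        fprintf(stderr, "could not set up PAPI events\n");
        return EXIT_FAILURE;
    }

    /* Fill the matrices with some data. */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            A[i][j] = 1.0; B[i][j] = 2.0; C[i][j] = 0.0;
        }

    /* Measure the naive i-j-k matrix multiplication. */
    PAPI_start(eventset);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                C[i][j] += A[i][k] * B[k][j];
    PAPI_stop(eventset, counts);
    printf("ijk order:  L2_DCM = %lld, TOT_INS = %lld\n", counts[0], counts[1]);

    /* Measure the loop-reordered i-k-j variant for comparison. */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            C[i][j] = 0.0;
    PAPI_reset(eventset);              /* zero the counters before the second measurement */
    PAPI_start(eventset);
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                C[i][j] += A[i][k] * B[k][j];
    PAPI_stop(eventset, counts);
    printf("ikj order:  L2_DCM = %lld, TOT_INS = %lld\n", counts[0], counts[1]);

    return EXIT_SUCCESS;
}
```

Compiling with something like `gcc matmul_papi.c -o matmul_papi -lpapi` (plus the module’s include/library paths if needed) and comparing the `PAPI_L2_DCM` counts of the two loop orders already demonstrates the effect of loop reordering; a cache-blocked variant can be measured in exactly the same way.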
3. MPI: Hello World – Approximating π
Find a Monte Carlo method to estimate π.
Write an MPI program that approximates π. You can use collective or point-to-point operations for your implementation (a minimal sketch follows after these steps).
Run the program using 4x48 processes on 4 nodes of SuperMUC-NG.
Hint: You will need to modify your existing batch files for that. Examples of MPI-specific batch files can be found in SuperMUC-NG’s documentation.
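As a sketch of one possible implementation: each rank draws random points in the unit square, counts those falling inside the quarter circle (their fraction tends to π/4), and a single `MPI_Reduce` sums the counts on rank 0. The sample count and the simple `rand_r` seeding are arbitrary choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const long long samples_per_rank = 10000000LL;  /* arbitrary choice */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank draws its own random points, seeded by its rank. */
    unsigned int seed = 42u + (unsigned int)rank;
    long long local_hits = 0;
    for (long long i = 0; i < samples_per_rank; ++i) {
        double x = (double)rand_r(&seed) / RAND_MAX;
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x * x + y * y <= 1.0)
            ++local_hits;   /* point lies inside the quarter circle */
    }

    /* Sum the hit counts of all ranks on rank 0. */
    long long total_hits = 0;
    MPI_Reduce(&local_hits, &total_hits, 1, MPI_LONG_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        double pi = 4.0 * (double)total_hits /
                    ((double)samples_per_rank * (double)size);
        printf("pi ~ %.8f (using %lld samples on %d processes)\n",
               pi, samples_per_rank * size, size);
    }

    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc pi_monte_carlo.c -o pi_monte_carlo`; requesting `--nodes=4` and `--ntasks-per-node=48` in the batch file gives the 4x48 processes asked for above.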
4. MPI: Communication Benchmarks
Determine the latency and bandwidth of the SuperMUC-NG nodes with the help of benchmark applications.
- Use the OSU Micro-Benchmarks of the Ohio State University (OSU). You can download the code at http://mvapich.cse.ohio-state.edu/benchmarks/ (click on “Tarball”).
- Copy the files to SuperMUC and extract the tarball (`tar -xf <file>`). Compile and install the benchmarks in the extracted directory, e.g. with the usual Autotools sequence `./configure CC=mpicc CXX=mpicxx --prefix=$HOME/osu`, `make`, and `make install`. With these commands the benchmarks will be installed into your home directory under `osu/`. Point-to-point benchmarks can then be found in `osu/libexec/osu-micro-benchmarks/mpi/pt2pt/`.
- Determine the latency and bandwidth inside one node (MPI processes (a) on the same socket, (b) on different sockets) and between different nodes of the system. You can use the program `osu_latency` to determine the latency and `osu_bw` for the bandwidth (a minimal ping-pong sketch illustrating what these benchmarks measure follows below).
- Use the attached batch file to execute the benchmarks. You have to adapt `nodes`, `ntasks-per-node` and the application name accordingly!
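For intuition about what `osu_latency` and `osu_bw` report, here is a minimal MPI ping-pong sketch between ranks 0 and 1 (the real benchmarks add warm-up iterations, vary the message size, and, for `osu_bw`, keep many messages in flight; the repetition count and message size below are arbitrary).

```c
#include <stdio.h>
#include <mpi.h>

#define REPS     10000   /* number of ping-pong round trips (arbitrary) */
#define MSG_SIZE 8       /* message size in bytes (small => latency-bound) */

int main(int argc, char **argv)
{
    char buf[MSG_SIZE] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ranks 0 and 1 bounce a small message back and forth;
       half the average round-trip time is the one-way latency. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; ++i) {
        if (rank == 0) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}
```

Whether the two ranks land on the same socket, on different sockets, or on different nodes is controlled by the task placement in the batch file, i.e. by `nodes` and `ntasks-per-node`, which is exactly what you vary for the measurements above.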