Write two versions of a multithreaded “Hello World” program in which every thread prints a message like:
Hello, I'm thread number 3 of 5 threads
(a)Use Pthread as described in the lecture. The number of threads should be an argument to the program (e.g., ./pthread_hello 4 should create 4 threads). A skeleton and a Makefile are available for download.
Parse the argument (argv[1]) to determine the number of threads (nthreads).
Allocate memory for the pthread_t array and thread_indices.
Create threads:
Each thread is assigned a unique index via thread_indices.
Why thread_indices is important and cannot be NULL:
The perform_work function requires a unique identifier (thread_index) for each thread.
pthread_create passes a pointer (&thread_indices[i]) to perform_work, which allows each thread to know its specific index.
If NULL is passed instead, the function perform_work will receive a NULL pointer, causing a Segmentation Fault when it attempts to dereference it (e.g., *((int *)args)).
The array thread_indices ensures that each thread operates on its dedicated index, avoiding undefined behavior or shared data issues.
The main program waits with pthread_join until all threads are finished.
The program terminates cleanly (memory deallocation is missing and should be added).
(b) Implement an OpenMP version. You may use any compiler that supports OpenMP. On the SuperMUC-NG, both icc and gcc support OpenMP.
Parse the argument (argv[1]) to determine the number of threads (nthreads).
Use OpenMP’s #pragma omp parallel for to parallelize the loop.
Each thread executes the perform_work function with its unique thread index from omp_get_thread_num.
The program completes once all threads have finished.
(c) Modify both versions so that the messages are printed ordered by thread number.
Pthreads Ordered
OpenMP Ordered
2. Calculation of π as the Limit of a Series
The Leibniz formula is one of many methods to calculate π numerically:
k=0∑∞2k+1(−1)k=1−31+51−71+91−⋯=4π
(a) Write a sequential program that calculates π using the Leibniz formula. Approximate at least 8 decimal places.
(b) Parallelize the program using OpenMP and evaluate the speedup compared to the sequential version for an increasing number of threads. Go from one to the total number of cores available on the system.
Note:$lscpu or $cat /proc/cpuinfo
I was intrested in a comparison so I plotted it here are my results:
I used this script to calculate π using the Leibniz formula. The command I ran:
Write a program that calculates the number of primes between 1 and some upper limit N, e.g., N=1000000. Use the following snippet as a prime test:
Experiment with OpenMP’s different loop scheduling options and evaluate their impact on the performance. You can either use the schedule clause or the OpenMP environment variable OMP_SCHEDULE.
There’s an error: 1 is returned as prime, but that’s incorrect. A prime has exactly two divisors: 1 and itself. Since 1 has only one divisor, it is not prime. I changed this in my solution below
Output
Other Solutions and Benchmark
Formula for Speedup
The speedup S is calculated as:
S=TparallelTserial
Explanation
This formula measures how many times faster a parallel execution (Tparallel) is compared to a serial execution (Tserial). A higher S indicates better parallel performance.
4. Generating Partial Sums in Parallel
For a given array A[0…N-1], a second array B[0…N-1] with the prefix sum of A should be generated. The prefix sum bn for the n-th element of B is simply defined as:
bn=k=0∑nak
Where ak is the k-th element of A.
Parallelize the generation of the prefix sums (calculation of the elements of B) in the following program with OpenMP and test the correctness of your implementation.