2. The Top500 List

(a) Explain the terms Rmax and Rpeak in the ranking

  • Rmax
    • → Maximal LINPACK performance achieved
    • LINPACK benchmark measures how fast it can solve a large System of linear equations
  • Rpeak
    • → Theoretical peak performance
    • calculated based on its hardware specifications, such as the number of processors and their clock speeds
  • Exponential Performance Growth: Supercomputers have experienced rapid increases in computational power, with the leading system’s performance doubling approximately every two years.
  • Rise of Parallel Processing: There’s been a significant increase in the number of processing cores, reflecting a shift towards parallel architectures to handle complex computations more efficiently.
  • Enhanced Energy Efficiency: Modern supercomputers are achieving higher performance while consuming less energy, emphasizing a focus on sustainable high-performance computing.

(c) Choose any high ranked system from the Top500 list and discuss its specification compared to SuperMUC-NG

SystemManufacturerCoresProcessorLINPACK Performance (Rmax)RpeakPower Consumption
FrontierHPE Cray8,699,904AMD EPYC 64-core CPUs with AMD MI250X GPUs1.1 exaflops2 exaflops~21 MW
SuperMUC-NGLenovo311,040Intel Xeon Platinum 8174 24-core CPUs19.48 petaflops26.87 petaflops~4 MW

(d) Today, mobile devices, such as smartphones and tablets, are equipped with relatively powerful CPUs. Could a mobile device achieving a Linpack performance of 2 GFLOP/sec have been ranked in past Top500 lists? What is the best ranking it could have reached?

A mobile device achieving 2 GFLOP/s could have ranked within the TOP500 lists up until June 1995. Beyond this date, the performance of the 500th-ranked system exceeded 2 GFLOP/s, making it ineligible for the list thereafter.

(e) Assume a CPU with the following specification: 20 cores @ 2 GHz base frequency, 2x floating-point units per core with support for fused multiply-add (FMA) instructions, and a 256-bit vector length. Calculate the theoretical (double precision floating-point) peak performance of this CPU

  1. Determine Elements per Vector:

  2. Flops per FMA Instruction:

  3. FLOPs per FPU

  4. FLOPs per Core

  5. Total Theoretical Peak Performance:

3. Moore’s Law

> Moore’s Law states that the number of transistors per area unit in an integrated circuit roughly doubles every 18 months. These additional resources can be used to improve performance. How did the performance of the systems ranked in the Top500 list change? Discuss the total performance of all systems, the #1 rank, and the #500 rank with respect to Moore’s Law.

The performance of the systems closely follows the trend predicted by Moore’s Law. We can observe that the #1 ranked supercomputers experience a significant performance increase approximately every two years, creating a “staircase” effect in the graph as each new generation surpasses the previous one. Similarly, the #500 ranked systems also show steady performance growth in alignment with Moore’s prediction, although at a more gradual pace. This overall pattern highlights the consistent improvements across both the top-tier and lower-ranked systems on the Top500 list, validating the impact of Moore’s Law on supercomputing advancements.

4. C Programming

(a) If you are not familiar with C, you can follow, for example, Beej’s Guide to C. First run a hello world example, then read up on the topics arrays, pointers and dynamic memory allocation

https://beej.us/guide/bgc/

(b) Implement the following program in C: Allocate an array array_a of 100 int values dynamically using malloc. Allocate an array array_s of 100 int values statically (i.e., without using malloc). Fill array_a with values from 0 to 99. Copy values from array_a to array_s in reverse order. Print out the values from both arrays. Deallocate (free) the memory of array_a

#include <stdio.h>  /* printf */
#include <stdlib.h> /* malloc, free */
 
int main(int argc, char *argv[]) {
  int *array_d = (int *)malloc(sizeof(int) * 100);
  if (array_d == NULL) {
    return EXIT_FAILURE;
  }
  int array_s[100];
 
  for (int i = 0; i < 100; i++) {
    array_d[i] = i;
    printf("array_d[%i] = %i\n", i, i);
  }
  int zero_index = 0;
  for (int i = 99; i >= 0; i--) {
    array_s[zero_index] = array_d[i];
    printf("array_s[%i] = %i\n", zero_index, array_s[zero_index]);
    zero_index++;
  }
  free(array_d);
  return EXIT_SUCCESS;
}

(c) Why is it a mistake to try to deallocate the memory of array_s with free?

The memory of array_s should not be deallocated with free because it is a static array allocated on the stack, and stack memory is managed automatically. Only dynamically allocated memory on the heap, like array_d (allocated with malloc), requires explicit deallocation with free. Using free on array_s would lead to undefined behavior.

5. Passing a 2D Array in C

Note

Passing an N x M 2D int array as argument to a function can be implemented with several possible parameter types for the 2D array in the function signature:

int matrix[][]
int matrix[0][0]
int (*matrix)[] //array pointer to first row
int *matrix //init pointer to first matrix element

a) Implement the function void print_matrix_arr(int rows, int cols, int matrix[][cols]).

  • Pass matrix to print_matrix_arr in the provided test program to test your implementation.

b) Implement the function void print_matrix_ptr(int rows, int cols, int *matrix).

  • Pass matrix to print_matrix_ptr in the provided test program to test your implementation.

Notes:

  • Depending on the implementation, the number of array elements per sub-array must appear before the array in the parameter list.
  • The 2D array must be passed either as 2D array type, array pointer type, or as a pointer to its first element. In the latter, the number of elements in each sub-array is not part of the passed type.

Solution

#include <stdio.h>
 
void print_matrix_arr(int rows, int cols, int matrix[][cols]) {
  for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
      printf("%i ", matrix[i][j]);
    }
    printf("\n");
  }
}
 
void print_matrix_ptr(int rows, int cols, int *matrix) {
  for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
      printf("%i ", *matrix);
      matrix++;
    }
    printf("\n");
  }
}
 
int main(void) {
  int matrix[2][3] = {{1, 2, 3}, {4, 5, 6}};
 
  // expected output:
  // 1 2 3
  // 4 5 6
 
  printf("print_matrix_arr:\n");
  print_matrix_arr(2, 3, matrix);
 
  printf("print_matrix_ptr:\n");
  print_matrix_ptr(2, 3, (int *)matrix);
}
 

6. Linked List in C

Note

Implement the function struct element *push_back(struct element *head, int val) which appends a new list element with value val to the tail of a linked list, and returns a pointer to the new list tail on success, or NULL on error.

The provided test program shall:

  1. Print: head --> [ 42 ] --> [ 99 ] --> [ 37 ] --> [ 13 ] --> NULL
    after push_back(head, 13)
  2. Not crash!

Note:

The linked list head is passed to push_back as the parameter head, and you need to iterate to the tail of the list before you can append the new list element.

Solution to Pushback

struct element *push_back(struct element *head, int val) {
  // check for valid input of 'head'
  if (head == NULL) {
    struct element *new = malloc(sizeof(struct element));
    if (new == NULL) {
      return NULL;
    }
    new->val = val;
    new->next = NULL;
    return new;
  }
  // set 'head' to last element
  while (head->next != NULL) {
    head = head->next;
  }
 
  struct element *new;
 
  // allocate storage for new list element
  new = malloc(sizeof(struct element));
  if (new == NULL) {
    return NULL;
  }
  // set 'val' and 'next'
  new->val = val;
  new->next = NULL;
  head->next = new;
  return new;
}

Complete Solution Code

#include <stdio.h>
#include <stdlib.h>
 
struct element {
  int val;
  struct element *next; // pointer to next list element
};
 
/* TODO: implement push_back function body */
struct element *push_back(struct element *head, int val) {
  // check for valid input of 'head'
  if (head == NULL) {
    struct element *new = malloc(sizeof(struct element));
    if (new == NULL) {
      return NULL;
    }
    new->val = val;
    new->next = NULL;
    return new;
  }
  // set 'head' to last element
  while (head->next != NULL) {
    head = head->next;
  }
 
  struct element *new;
 
  // allocate storage for new list element
  new = malloc(sizeof(struct element));
  if (new == NULL) {
    return NULL;
  }
  // set 'val' and 'next'
  new->val = val;
  new->next = NULL;
  head->next = new;
  return new;
}
 
struct element *push_front(struct element *head, int val) {
  struct element *new = malloc(sizeof(struct element));
  if (new == NULL) {
    return NULL;
  }
 
  // new->val  = val; new->next = head;
  *new = (struct element){.val = val, .next = head};
 
  return new;
}
 
void print_list(struct element *head) {
  printf("head --> ");
  while (head != NULL) {
    printf("[ %d ] --> ", head->val);
    head = head->next;
  }
  printf("NULL\n");
}
 
int main(void) {
  struct element *head = NULL; // empty list
 
  head = push_front(head, 37);
  head = push_front(head, 99);
  head = push_front(head, 42);
 
  printf("before push_back(head, 13):\n");
  print_list(head);
 
  push_back(head, 13);
 
  printf("after push_back(head, 13):\n");
  // expected output: head --> [ 42 ] --> [ 99 ] --> [ 37 ] --> [ 13 ] --> NULL
  print_list(head);
 
  // this should not segfault:
  push_back(NULL, 0);
}