Courses and Syllabus

Microcredit Course on Accelerated Data Science

Course Overview: Most machine learning and deep learning systems require heavy computational resources and a long time to train and deploy. Many of these algorithms can be accelerated using GPU programming. The goal of this course is to cover the state of the art in GPU-accelerated machine learning systems.

The microcredit course will first cover the theoretical basics of machine learning and deep learning algorithms, with emphasis on their suitability for GPU-based execution. Several technologies built on CUDA for GPUs will then be explained. The course will focus on the practical aspects of implementation, with demonstrations and examples. A hands-on session on the Paramshakti supercomputer will be conducted.

Syllabus: Fundamentals of GPU Architecture & CUDA; Introduction to Accelerated Data Science: RAPIDS, Introduction to Machine Learning Algorithms; Case Study/Hands-on: Solving and Benchmarking End-to-End Data Science Problem using RAPIDS; Introduction to Deep Neural Network & Deep Learning, NVIDIA CUDA-X Platform Overview: Accelerated Computing for Deep Neural Networks; Accelerating and Scaling Deep Neural Networks using DALI, Mixed Precision and Multi-GPU Scaling; Optimization and Deployment of Neural Networks using TensorRT & Triton Inference Server


CD61002 - High Performance Scientific Computing (3-1-0-0)

Sparse matrices: discretization of differential equations, storage schemes for sparse matrices, permutations and reorderings, direct solution methods

Iterative methods and convergence: SOR, gradient search methods: steepest descent, conjugate gradient algorithm, Krylov subspace methods: Arnoldi's method, GMRES, symmetric Lanczos algorithm, convergence analysis, block Krylov methods, preconditioning techniques, ILU factorization preconditioners, multigrid methods

Domain decomposition: Schwarz algorithms and the Schur complement, graph partitioning: geometric approach, spectral techniques

Parallel computing: architectures for parallel computing, shared and distributed memory, performance metrics, parallelization of simple algorithms

MPI and OpenMP: basic MPI and OpenMP calls, parallelizing matrix solvers using domain decomposition

CUDA: GPGPU architecture, thread algebra for matrix operations, accelerating matrix solvers using CUDA


CD61004 - High Performance Computing and its Applications in Complex Physical Systems (3-1-0)

Introduction to HPC architecture and parallel programming: basic architecture and organization: memory hierarchy, shared and distributed memory architectures, multiprocessor architecture, introduction to thread-level parallelism, accelerators (GPU, Xeon Phi), performance prediction and evaluation, parallel programming/computing: introduction to MPI/OpenMP, basics of CUDA programming, optimizing cluster operation: running jobs in an HPC environment, job scheduler, cluster-level load balancing

Special methods for studying complex systems: basics of statistical mechanics, potential energy surface, introduction to molecular mechanics, simulation methods: molecular dynamics and Monte Carlo simulations, enhanced sampling methods, coarse-grained modeling

Applications to complex systems: open-source software: MD and MC simulation packages, parallelization in software: domain/spatial decomposition, distribution of non-bonded interactions, dynamic load balancing, multiprocessor communication, modeling of soft matter systems such as biomolecules, polymers, carbon nanostructures etc., computation of thermodynamic, kinetic and mechanical properties of different complex systems