Rangsiman Ketkaew
Ph.D. Student in Computational Chemistry and Machine Learning at UZH
Skills Checklist
General technical skills
Computer Science Skills
Essential skills for software development
Essential skills for HPC
Essential skills for coding GPU
Essential skills for machine learning (bonus)
General technical skills
For Windows:
Install programs and modify system variables
Install Nvidia CUDA toolkit and driver
Set up VPN and local network
Set up dual boot for Linux, or install WSL for Ubuntu
For Linux and macOS:
Basic/intermediate commands
Know some important files/folders
Understand some environment variables
Scripting programming language
Bash, awk, perl, Python + Jupyter notebook
Cluster / HPC
Understand terminology: master node, compute node, scheduler, CPU cores, processes, memory
Scheduler: Slurm, PBS, SGE
Software manager: module
Software
Commercial: Gaussian, Q-Chem, ADF, MOLPRO, MOLCAS, TURBOMOLE and many more
Non-commercial: PySCF, Psi4, OpenMOLCAS, GAMESS, ORCA, NWChem, DIRAC, DALTON, CP2K, LAMMPS, VASP, QE and many more
Full list is here
Graphic visualization
Jmol, Molden, GaussView, Avogadro, UCSF Chimera, VMD, OVITO, PyMOL
2D and 3D plots
Other useful tools
ASE, MDTraj, Pymatgen, RDKit, OpenBabel
Writing
Microsoft Word
LaTeX
Compiler: pdflatex, xelatex, lualatex
Distribution: TeX Live, MiKTeX
Editor: Overleaf, TeXstudio, Texmaker
Presentation
PowerPoint
LaTeX Beamer
Computer Science Skills
Programming (Python, C++, etc.)
Scientific Computing
Scientific Programming in Python
High Performance Computing
Computer Applications in Chemistry, Biology, and Physics
Programming in C++
Algorithm in Software Development
Interdisciplinary Research Methods In Computational Biology
Data Analysis in Biology
Introduction to Machine Learning
Introduction to Deep Learning
Neural Network and Computer Vision
Essential skills for software development
Text editor
Vi/Vim, Nano
VS Code, Atom, Eclipse, Sublime
File format
XML, JSON
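As a minimal sketch of working with both formats, the example below writes and reads the same record as JSON and as XML using only the Python standard library. The `molecule` record and its field names are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical molecule record, used only for illustration.
molecule = {"name": "water", "atoms": ["O", "H", "H"], "charge": 0}

# JSON: serialize to a string and parse it back.
json_text = json.dumps(molecule, indent=2)
restored = json.loads(json_text)

# XML: build the same record as an element tree (attributes must be strings).
root = ET.Element("molecule", name=molecule["name"], charge=str(molecule["charge"]))
for symbol in molecule["atoms"]:
    ET.SubElement(root, "atom", symbol=symbol)
xml_text = ET.tostring(root, encoding="unicode")

# Parse the XML string back and read the atom symbols.
parsed = ET.fromstring(xml_text)
symbols = [atom.get("symbol") for atom in parsed.findall("atom")]

print(json_text)
print(xml_text)
```

JSON maps directly onto Python dictionaries and lists, while XML needs an explicit choice between attributes and child elements; the layout above is one possible choice, not a standard.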
General programming skills
Types of variables
Loops and conditional statements
Input/output
High-level programming
Python
Pip and conda: Python helper
NumPy: Array (vector, matrix) computation
Numba: JIT compiler for NumPy
JAX: automatic differentiation (autograd) for NumPy-style arrays
SciPy: a collection of math functions/routines
Scikit-learn: machine learning and statistics routines (regression, classification, clustering)
Intel Extension for Scikit-learn (scikit-learn-intelex) can run some estimators much faster than the standard version on supported hardware
Matplotlib / Plotly for plotting graph
Theano: numerical computation
SCOOP: distributed modules for parallel programming
NetworkX: Graph library
Low-level programming
C
Function, pointer, storage class
Enum, struct, union
Preprocessor
Operator, memory management, array
File handling
C++
C++ 11 or newer
Types of variables: signed, unsigned, long, double, etc.
Loops, conditional statements
Standard libraries: vector, rand
Understanding header and source file
Preprocessor
Function, class, struct, template
Declaration
namespace, const, attribute, pointer, pass by reference, static_assert
Initialization
Misc: casting, lambda expression, encapsulation, file handling, exception handling
Fortran
Learn either F77, F90, or modern Fortran (2003, 2008, 2018)
Module, subroutine, function
Array (allocatable and multidimensional) and string
Operator overloading
Flow control
Derived type
Callback
Interfacing to other languages, e.g. Python or C++
GNU library
GSL
Many more libraries here
Memory allocation
Stack, heap, global memory
Math libraries
BLAS (OpenBLAS)
LAPACK for linear algebra
ScaLAPACK: LAPACK routines for distributed-memory parallel machines
Intel MKL (Intel oneAPI)
FFTW: for computing the discrete Fourier transform in one or more dimensions, real and complex data
Eigen: linear algebra library
Boost: a collection of C++ libraries
QM libraries
libxc: exchange-correlation (XC) functional library
libint: for computing Gaussian integrals
libcint: general GTO integrals
Code optimization
Benchmarking/scaling
Complexity (Big O)
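To make benchmarking and complexity concrete, here is a small sketch that times the same membership test with a list (O(n) per lookup) and with a set (O(1) per lookup on average) as the problem size grows. The function names and problem sizes are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

def contains_list(items, targets):
    """Count hits using list membership: O(n) per lookup."""
    return sum(1 for t in targets if t in items)

def contains_set(items, targets):
    """Count hits using set membership: O(1) per lookup on average."""
    lookup = set(items)
    return sum(1 for t in targets if t in lookup)

def benchmark(n, repeats=3):
    """Return (list_time, set_time) in seconds for a problem of size n."""
    items = list(range(n))
    targets = list(range(0, 2 * n, 2))  # roughly half of these are present
    t_list = min(timeit.repeat(lambda: contains_list(items, targets),
                               number=1, repeat=repeats))
    t_set = min(timeit.repeat(lambda: contains_set(items, targets),
                              number=1, repeat=repeats))
    return t_list, t_set

if __name__ == "__main__":
    # Doubling n should roughly quadruple the list time but only double the set time.
    for n in (500, 1000, 2000):
        t_list, t_set = benchmark(n)
        print(f"n={n:5d}  list: {t_list:.5f} s  set: {t_set:.5f} s")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it is a convention, not the only valid choice.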
GNU
Static and dynamic libraries
Archive
Compiling (g++, gcc) and linking (ld)
Useful flags for compiler and linker
Compiling tools
autoconf
configure
Make, cmake, automake
Debugging
gdb for general debugging
Valgrind for memory leak analysis
Git (source code control)
Basic/intermediate commands
GitHub & GitLab
Documentation
Sphinx (for markdown and reStructuredText)
Doxygen
Essential skills for HPC
Architecture
Memory management
Threading, multithreading
Block
Parallel computing (SPMD)
Shared memory: OpenMP
Distributed memory: MPI
Implementations: OpenMPI, Intel MPI, MVAPICH
Intel ecosystem
OpenMP compiler: icc, ifort
MPI compiler: mpicc, mpiicc (for Intel C compiler), mpicxx (for C++), mpiifort (for Fortran)
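The SPMD idea (every worker runs the same program on its own part of the data) can be sketched in plain Python. This is not OpenMP or MPI: the `rank` and `size` names only mimic MPI terminology, and the example uses standard-library threads to show the pattern rather than to gain speed.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(rank, size, data):
    """Each worker runs this same function on its own slice of the data."""
    chunk = (len(data) + size - 1) // size      # ceiling division
    start = rank * chunk
    stop = min(start + chunk, len(data))
    return sum(x * x for x in data[start:stop])

def parallel_sum_of_squares(data, size=4):
    """Split the work by rank, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda rank: partial_sum(rank, size, data),
                                 range(size)))
    return sum(partials)

data = list(range(1, 101))
total = parallel_sum_of_squares(data)
print(total)
```

In a real MPI program the split and the final sum would be handled by the library's scatter and reduce operations, and each rank would be a separate process, possibly on a different node.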
Cloud computing (bonus)
Server and database
Networking
Essential skills for coding GPU
Intermediate/advanced C or C++ skills
Programming model: kernels, thread hierarchy, memory hierarchy, heterogeneous programming, asynchronous SIMT
CUDA
Understand CUDA operation:
Declare and allocate host and device memory.
Initialize host data.
Transfer data from the host to the device.
Execute one or more kernels.
Transfer results from the device to the host.
CUDA C and CUDA C++ API
Compiler: nvcc
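The five steps of a CUDA operation listed above can be walked through with a CPU-only emulation in plain Python. This is not CUDA code and does not use a GPU: the `launch` helper and the "device" lists are stand-ins invented for illustration, so that the block/thread indexing and the host-to-device flow can be followed without special hardware.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """Body executed by one emulated thread."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < n:                                # guard against out-of-range threads
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by looping over every block and thread."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 10
# Steps 1-2: declare and initialize host data.
host_a = [float(i) for i in range(n)]
host_b = [2.0 * i for i in range(n)]

# Step 3: "transfer" host data to the device (here: plain copies).
dev_a, dev_b = list(host_a), list(host_b)
dev_out = [0.0] * n

# Step 4: execute the kernel with enough blocks to cover n elements.
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim
launch(vector_add_kernel, grid_dim, block_dim, dev_a, dev_b, dev_out, n)

# Step 5: "transfer" the result back to the host.
host_out = list(dev_out)
print(host_out)
```

On a real GPU the two loops in `launch` run concurrently in hardware, and the copies become explicit memory transfers between host and device. The bounds check `i < n` is still needed there, because the grid usually contains more threads than elements.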
Essential skills for machine learning (bonus)
Basic math: linear algebra and calculus
Programming
Python, R, Julia, Matlab
TensorFlow, PyTorch, Scikit-learn
Python libraries
NumPy
Pandas
Terminology: regression, classification, descriptor, feature, kernel, activation function
Data analysis/engineering: EDA, ETL
Graphical representation
Histogram, bar plot, heatmaps
ML algorithms
Decision tree
Random forest
Support vector machine
Principal component analysis
Kernel ridge regression
Neural network
Feedforward NN
Autoencoder
CNN
RNN (LSTM)
GNN
Adversarial NN
GAN
Model training and optimization
Hyperparameter optimization
Techniques to prevent overfitting
Data augmentation, early stopping, regularization, dropout, batch normalization
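Of the techniques listed, early stopping is the easiest to sketch without any ML framework. The helper below is a hypothetical minimal version: the class name, the `patience` and `min_delta` parameters, and the validation losses are all made up for illustration and do not come from any particular library.

```python
class EarlyStopping:
    """Stop training when the validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: they improve, then drift upward (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

In practice one would also keep a copy of the model weights from the best epoch and restore them after stopping; that part is omitted here to keep the sketch short.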
Deploying model