Reliability for exascale computing: system modelling and error mitigation for task-parallel HPC applications

    Subasi, Omer (Date of defense: 2016-10-27)

    As high-performance computing (HPC) systems continue to grow, their fault rate increases. Applications running on these systems must cope with faults that occur as often as once every few hours or days. Furthermore, some studies for future ...