Fault-Tolerance Techniques for High-Performance Computing

Title Fault-Tolerance Techniques for High-Performance Computing
Author Thomas Herault
Publisher Springer
Total Pages 325
Release 2015-07-01
Genre Computers
ISBN 3319209434

This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.

Advances in Mathematical Methods and High Performance Computing

Title Advances in Mathematical Methods and High Performance Computing
Author Vinai K. Singh
Publisher Springer
Total Pages 503
Release 2019-02-14
Genre Computers
ISBN 3030024873

This special conference volume is intended for researchers and academics. The conference gives academics, technocrats, and researchers an opportunity to interact with eminent figures in applied mathematics and scientific computing. Its topics are comprehensive and are meant to build an understanding of new developments and emerging trends in the area. Over the past two decades, the architectural design of high-performance computing (HPC) systems has changed considerably to meet the growing demands of large-scale scientific computing. Accurate, fast, and scalable performance models and simulation tools are essential for evaluating alternative architecture design decisions for massive-scale computing systems. This volume recounts some of the influential work in modeling and simulation for HPC systems and applications, identifies some of the major challenges, and outlines future research directions that we believe are critical to the HPC modeling and simulation community.

High Performance Computing in Science and Engineering

Title High Performance Computing in Science and Engineering
Author Tomáš Kozubek
Publisher Springer Nature
Total Pages 172
Release 2021-01-07
Genre Computers
ISBN 3030670775

This book constitutes the thoroughly refereed post-conference proceedings of the 4th International Conference on High Performance Computing in Science and Engineering, HPCSE 2019, held in Karolinka, Czech Republic, in May 2019. The 9 papers presented in this volume were carefully reviewed and selected from 13 submissions. The conference provides an international forum for exchanging ideas among researchers involved in scientific and parallel computing, including theory and applications, as well as applied and computational mathematics. The focus of HPCSE 2019 was on models, algorithms, and software tools that facilitate efficient and convenient utilization of modern parallel and distributed computing architectures, as well as on large-scale applications.

Proceedings of the 5th Workshop on Fault Tolerance for HPC at EXtreme Scale

Title Proceedings of the 5th Workshop on Fault Tolerance for HPC at EXtreme Scale
Author Nathan DeBardeleben
Publisher
Total Pages 72
Release 2015
Genre Computer science
ISBN 9781450335690

Innovative Research and Applications in Next-Generation High Performance Computing

Title Innovative Research and Applications in Next-Generation High Performance Computing
Author Qusay F. Hassan
Publisher IGI Global
Total Pages 488
Release 2016-07-05
Genre Computers
ISBN 1522502882

High-performance computing (HPC) describes the use of connected computing units to perform complex tasks. It relies on parallelization techniques and algorithms to synchronize these disparate units so that they perform faster than a single processor could alone. Used in fields from medicine and research to the military and higher education, this method of computing allows users to complete complex, data-intensive tasks. The field has undergone many changes over the past decade and will continue to grow in popularity in the coming years. Innovative Research and Applications in Next-Generation High Performance Computing aims to address the future challenges, advances, and applications of HPC and related technologies. As the need for such processors increases, so does the importance of developing new ways to optimize the performance of these supercomputers. This timely publication provides comprehensive information for researchers, students in ICT, program developers, military and government organizations, and business professionals.

Software Fault Tolerance Techniques and Implementation

Title Software Fault Tolerance Techniques and Implementation
Author Laura L. Pullum
Publisher Artech House
Total Pages 368
Release 2001
Genre Computers
ISBN 9781580534703

Look to this innovative resource for the most comprehensive coverage of software fault tolerance techniques available in a single volume. It offers you a thorough understanding of the operation of critical software fault tolerance techniques and guides you through their design, operation, and performance. You get an in-depth discussion of the advantages and disadvantages of specific techniques, so you can decide which ones are best suited to your work. The book examines key programming techniques such as assertions, checkpointing, and atomic actions, and provides design tips and models to assist in the development of critical fault-tolerant software that helps ensure dependable performance. From software reliability, recovery, and redundancy to design-diverse and data-diverse software fault tolerance techniques, this practical reference provides detailed insight into techniques that can improve the overall dependability of your software.

High Performance Computing in Clouds

Title High Performance Computing in Clouds
Author Edson Borin
Publisher Springer Nature
Total Pages 337
Release 2023-07-05
Genre Computers
ISBN 3031297695

This book provides a thorough explanation of the path needed to use cloud computing technologies to run high-performance computing (HPC) applications. Besides presenting the motivation for moving HPC applications to the cloud, it covers both essential and advanced issues, such as deploying HPC applications and infrastructures, designing cloud-friendly HPC applications, and optimizing a provisioned cloud infrastructure to run this family of applications. The book also describes best practices for keeping HPC applications running in the cloud by employing fault-tolerance techniques and avoiding resource wastage. To give practical meaning to these topics, it presents case studies in which HPC applications used in relevant scientific areas, such as bioinformatics and the oil and gas industry, were moved to the cloud. It also discusses how to train deep learning models in the cloud, elucidating the key components and aspects necessary to train these models via the different types of services offered by cloud providers. Despite the vast bibliography on cloud computing and HPC, to the best of our knowledge no existing manuscript has comprehensively covered these topics and discussed the steps, methods, and strategies for executing HPC applications in clouds. We therefore believe this title is useful for IT professionals, students, and researchers interested in cutting-edge technologies, concepts, and insights focused on the use of cloud technologies to run HPC applications.