Transient and Permanent Error Control for Networks-On-Chip

Title Transient and Permanent Error Control for Networks-On-Chip
Author Springer
Total Pages 172
Release 2012-05-01
ISBN 9781461409632

Transient and Permanent Error Control for Networks-on-Chip

Title Transient and Permanent Error Control for Networks-on-Chip
Author Qiaoyan Yu
Publisher Springer Science & Business Media
Total Pages 166
Release 2011-11-18
Genre Technology & Engineering
ISBN 1461409624

This book addresses reliability and energy efficiency of on-chip networks using cooperative error control. It describes an efficient way to construct an adaptive error control codec capable of tracking noise conditions and adjusting the error correction strength at runtime. Methods are also presented to tackle joint transient and permanent error correction, exploiting the redundant resources already available on-chip. A parallel and flexible network simulator is also introduced, which facilitates examining the impact of various error control methods on network-on-chip performance.

Transient and Permanent Error Management for Networks-on-chip

Title Transient and Permanent Error Management for Networks-on-chip
Author Qiaoyan Yu
Total Pages 502
Release 2011

"Reliability has become one of the most important metrics for on-chip communications infrastructures in nanoscale technologies. Reduced supply voltages and high clock frequency exacerbate the impact of noise sources such as particle strikes and crosstalk, which can cause transient errors in transmitted data. Manufacturing defects and aging issues can cause permanent errors in the communication links. The modularity of the Networks-on-Chip (NoCs) approach facilitates the exploration of error control techniques for on-chip interconnects and many-cores systems. Unfortunately, error control is not free. Worst-case error management methods are simple but waste energy and bandwidth in favorable noise conditions. Consequently, cost-effective techniques for improving link error resilience are needed. In this work, we propose configurable error control methods to tackle variable transient errors and exploit existing transient error control redundancy for permanent error management, achieving high reliability and low average energy consumption with minor area overhead.

To adapt to the variable transient error rates, a configurable error control coding (ECC) scheme is proposed for datalink-layer transient error management. The proposed method can adjust both error detection and error correction capability at runtime by varying the number of redundant wires for parity check bits. The obtained error resilience makes the proposed method suitable for a range of link error rates. Configuring the number of redundant wires to match the noise conditions reduces the average energy consumption in the ECC codec and interconnect link. A hardware efficient implementation for the configurable ECC is presented, as well.

We integrate the error control techniques in the datalink and physical layers to co-manage transient and permanent errors. Infrequently used redundant wires for the configurable ECC are utilized as spare wires to replace permanently unusable links. To maintain the transient and permanent error co-management capability as noise conditions change, we propose a packet re-organization algorithm combined with shortening error control coding method. This method reduces the need for energy consuming fault-tolerant routing, minimizing latency and energy overhead induced by error control. This co-management method is suitable for NoCs operating in variable noise conditions with a small number of permanently unusable wires.

To further improve energy efficiency, the adaptation on ECC is extended to the network layer. We employ end-to-end error control in the network layer in low noise conditions and enhance the error control capability in high noise conditions by adding hop-to-hop error control in the datalink layer. A protocol that boosts or reduces error control strength is presented to support runtime seamless ECC mode switching. Simply combining end-to-end error control with hop-to-hop error control significantly increases energy consumption. To address this issue, we apply the concept of product codes to the dual-layer error control; the hop-to-hop error control is designed to be compatible with one dimension of the product code. Consequently, the dual-layer cooperative error control can switch error control modes without interrupting normal NoC operation, achieving high reliability and energy efficiency in a wide range of link error rates.

To evaluate performance and energy consumption of different error control methods on a large size NoC, we propose a flexible parallel NoC simulator. Plug-and-play error control coding (ECC) insertion and some typical error control codecs have been implemented in the simulator. The flexible fault injection environment provided by our simulator assists error control exploration for specific purposes. In addition, we use C and message passing interface (MPI) languages to schedule parallel simulation on a multiprocessor server, addressing the prohibitive simulation time and system resource challenges caused by the large number of communicating nodes and extensive number of simulation variables"--Leaves iv-vi.
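
The product-code idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the thesis's codec: it swaps in the simplest textbook product code (single parity on rows and columns), and the names `encode`, `row_check`, and `decode` are hypothetical. It assumes each row stands for one flit, so a per-row parity check plays the role of hop-to-hop detection, while the full row-plus-column check plays the role of end-to-end correction of a single flipped bit.

```python
from typing import List

Block = List[List[int]]  # rows of bits; one row per flit (an assumption)

def encode(data: Block) -> Block:
    """Append a parity bit to each row, then a parity row over all columns."""
    rows = [row + [sum(row) % 2] for row in data]
    col_parity = [sum(col) % 2 for col in zip(*rows)]
    return rows + [col_parity]

def row_check(row: List[int]) -> bool:
    """Hop-to-hop style check: True if a single row has even parity."""
    return sum(row) % 2 == 0

def decode(block: Block) -> Block:
    """End-to-end style decode: fix at most one flipped bit, return the data."""
    fixed = [row[:] for row in block]
    bad_rows = [i for i, row in enumerate(fixed) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*fixed)) if sum(col) % 2]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # A single error sits at the intersection of the failing row and column.
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    elif bad_rows or bad_cols:
        raise ValueError("uncorrectable error pattern")
    # Strip the parity column and the parity row.
    return [row[:-1] for row in fixed[:-1]]

data = [[1, 0, 1], [0, 1, 1]]
block = encode(data)
block[1][2] ^= 1                   # inject one transient bit flip
detected = not row_check(block[1]) # a per-hop row check notices the error
recovered = decode(block)          # the destination corrects it
```

In this toy setting the row parity is reused by both checks, which mirrors the abstract's point that the hop-to-hop code is kept compatible with one dimension of the product code rather than being added as separate redundancy.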

Error Control for Network-on-Chip Links

Title Error Control for Network-on-Chip Links
Author Bo Fu
Publisher Springer Science & Business Media
Total Pages 159
Release 2011-10-09
Genre Technology & Engineering
ISBN 1441993134

This book provides readers with a comprehensive review of the state of the art in error control for Network on Chip (NOC) links. Coverage includes a detailed description of the key issues in NOC error control faced by circuit and system designers, as well as practical error control techniques to minimize the impact of link errors on system performance.

Reliability, Availability and Serviceability of Networks-on-Chip

Title Reliability, Availability and Serviceability of Networks-on-Chip
Author Érika Cota
Publisher Springer Science & Business Media
Total Pages 220
Release 2011-09-23
Genre Technology & Engineering
ISBN 1461407915

This book presents an overview of the issues related to the test, diagnosis and fault-tolerance of Network on Chip-based systems. It is the first book dedicated to the quality aspects of NoC-based systems and will serve as an invaluable reference to the problems, challenges, solutions, and trade-offs related to designing and implementing state-of-the-art, on-chip communication architectures.

Asynchronous On-Chip Networks and Fault-Tolerant Techniques

Title Asynchronous On-Chip Networks and Fault-Tolerant Techniques
Author Wei Song
Publisher CRC Press
Total Pages 381
Release 2022-05-10
Genre Computers
ISBN 1000578828

Asynchronous On-Chip Networks and Fault-Tolerant Techniques is the first comprehensive study of fault-tolerance and fault-caused deadlock effects in asynchronous on-chip networks, aiming to overcome these drawbacks and ensure greater reliability of applications. As a promising alternative to the widely used synchronous on-chip networks for multicore processors, asynchronous on-chip networks can be vulnerable to faults even if they can deliver the same performance with much lower energy and area compared with their synchronous counterparts – faults can not only corrupt data transmission but also cause a unique type of deadlock. By adopting a new redundant code along with a dynamic fault detection and recovery scheme, the authors demonstrate that asynchronous on-chip networks can be efficiently hardened to tolerate both transient and permanent faults and overcome fault-caused deadlocks. This book will serve as an essential guide for researchers and students studying interconnection networks, fault-tolerant computing, asynchronous system design, circuit design and on-chip networking, as well as for professionals interested in designing fault-tolerant and high-throughput asynchronous circuits.

Designing Reliable and Efficient Networks on Chips

Title Designing Reliable and Efficient Networks on Chips
Author Srinivasan Murali
Publisher Springer Science & Business Media
Total Pages 200
Release 2009-05-26
Genre Technology & Engineering
ISBN 1402097573

Developing a NoC-based interconnect that is tailored to a particular application domain and satisfies the application performance constraints with minimum power-area overhead is a major challenge. With technology scaling, as the geometries of on-chip devices reach the physical limits of operation, another important design challenge for NoCs will be to provide dynamic (run-time) support against permanent and intermittent faults that can occur in the system. The purpose of Designing Reliable and Efficient Networks on Chips is to provide state-of-the-art methods to solve some of the most important and time-intensive problems encountered during NoC design.