Science / Systems engineering / United States Department of Energy National Laboratories / Fault-tolerant system / Software / Application checkpointing / Fault injection / Resilience / Psychological resilience / Computing / Fault-tolerant computer systems / Software quality


Fault Management Workshop Final Report, August 13, 2012. U.S. Department of Energy Fault Management Workshop, BWI Airport Marriott, Maryland, June 6, 2012

Document Date: 2012-12-10 09:26:24



File Size: 153,39 KB


City

Austin / Catonsville

Company

Checkpoint / IBM / Argonne National Laboratory / Los Alamos National Laboratory / Amazon / Google / High Performance Computing Systems / Pacific Northwest National Laboratory / Oak Ridge National Laboratory / Lawrence Livermore National Laboratory / Sandia National Laboratories / Lawrence Berkeley National Laboratory / Future-Generation High-Performance Computing Systems / HP / DOE National Laboratories / Intel

Facility

University of Tennessee / University of Connecticut / DOE complex / University of Illinois / Scalable Checkpoint/Restart Library / Advanced Reactor Simulation Advanced Reactor / University of Chicago / DOE laboratory / BWI Airport Marriott hotel / Indiana University / North Carolina State University / Louisiana Tech University / Ohio State University / University of Texas

IndustryTerm

time-to-solution / run-time systems / energy generation technologies / proactive fault tolerance technology / scientific applications / energy constraints / array applications / large-scale computing facilities / legacy applications / conduct fault management / particular algorithm / parallel file system software / science applications / large computing facilities / large systems / resilience technologies / software stack / energy consumption / mission critical applications / software/system/machine / storage systems / time-shared processors / faults occurring on systems / process-level software redundancy using state-machine replication / software engineering techniques / system-level virtualization technologies / resilient algorithms / fault management / nuclear security applications / petascale systems

OperatingSystem

Linux

Organization

Resilience Technical Council / DOE’s Office of Science and National Nuclear Security Administration / U.S. Department of Energy / University of Connecticut / Northwest National Lab / Louisiana Tech University / Indiana University / Pacific Northwest National Lab / University of Illinois / DOE’s Office of Science / the University of Chicago / National Nuclear Security Administration / University of Tennessee / Knoxville / the Ohio State University / North Carolina State University / University of Texas

Person

Mike Heroux / Sriram Krishnamoorthy / Rob Ross / Andrew Chien / Eric Roman / Martin Schulz / Christian Engelmann / Larry Kaplan (Cray) / Marc Snir / Lee-Ann Talley / Al Geist / Greg Bronevetsky / Bert Still / Lucy Nowell / Robert Clay / Nathan DeBardeleben / Bob Lucas

Position

Executive / representative / programmer

ProvinceOrState

Texas / Maryland / Illinois / Connecticut / Tennessee

Technology

radiation / proactive fault tolerance technology / time-shared processors / supporting resilience technologies / two-level algorithm / Linux system / Linux / one particular algorithm / system-level virtualization technologies / operating system / one processor / automatic identification / energy generation technologies / simulation

SocialTag