The 13th IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE 2017)
21-22 March 2017, Northeastern University, Boston, Massachusetts
Registration for the SELSE workshop is now open.
SELSE 2017 will feature three keynote speeches by experts from academia and industry.
Karthik Pattabiraman, Associate Professor of Electrical and Computer Engineering (ECE) at the University of British Columbia (UBC) in Vancouver, will present the latest developments in low-overhead software approaches for protecting commodity computer systems from hardware errors.
AMD’s Vilas Sridharan will focus on lessons learned about DRAM and SRAM reliability from systems deployed in the field.
Michael Carbin, Assistant Professor of Electrical Engineering and Computer Science at MIT, will present challenges in designing programming systems that deliver improved performance and resilience by incorporating approximate computing and self-healing.
SELSE 2017 will also feature a panel on new challenges in radiation testing, with experts from academia and industry.
Random Access Session
SELSE 2017 will feature a “random access” session in which any registered participant (time permitting) may give a very brief talk to highlight a recent advance or an issue of interest. The intent is to provide an informal platform for discussion within the community.
Please book your hotel early, as prices may rise as the SELSE dates approach. A list of nearby hotels is available on the SELSE web page.
Transportation and Local Information:
Keynote talk by Karthik Pattabiraman
Tolerating Hardware Faults in Commodity Software: Problems, Solutions and a Roadmap
Abstract: Commodity software is often designed with the assumption that the hardware is fault-free, and hence the software hardly ever needs to deal with hardware faults. However, this assumption is increasingly difficult to satisfy as CMOS devices scale to smaller and smaller sizes, and as manufacturing variations increase. In addition, traditional solutions such as guard-banding and dual modular redundancy (DMR) are challenging to apply in commodity systems due to stringent power and performance constraints. Therefore, there is a compelling need to develop low overhead software approaches for protecting commodity computer systems from hardware errors.
To address this need, over the last decade or so, researchers have developed a wide variety of techniques across the software stack to tolerate hardware faults. Unfortunately, very few of these techniques have found their way back to practitioners, and most software developers continue to develop software without worrying about hardware faults. In this talk, I will examine some of the reasons why I believe this to be the case, and what we as a community can do about it. Along the way, I will share some anecdotes of our (often unsuccessful) attempts to perform tech transfer to industry. I will then conclude by providing an overview of the many research challenges and opportunities in this area.
Bio: Karthik Pattabiraman received his M.S. and Ph.D. degrees from the University of Illinois at Urbana-Champaign (UIUC) in 2004 and 2009, respectively. After a post-doctoral stint at Microsoft Research (Redmond), Karthik joined the University of British Columbia (UBC) in 2010, where he is now an associate professor of electrical and computer engineering. Karthik’s research interests are in building error-resilient software systems, and in software engineering and testing. Karthik has won distinguished paper (or runner-up) awards at the IEEE International Conference on Dependable Systems and Networks (DSN) 2008, the IEEE International Conference on Software Testing (ICST) 2013, the IEEE/ACM International Conference on Software Engineering (ICSE) 2014, and the European Dependable Computing Conference (EDCC) 2015 and 2016. Karthik was the general chair of the IEEE Pacific Rim International Symposium on Dependable Computing (PRDC) 2013, and regularly serves on the organizing committees of the DSN and ISSRE conferences. Karthik is a senior member of the IEEE and a member of the IFIP Working Group on Dependable Computing (10.4). Find out more about him at: http://blogs.ubc.ca/karthik
Keynote talk by Vilas Sridharan
Memory Errors in Modern Systems
Abstract: Hardware faults are commonplace, especially in memory subsystems consisting of DRAM and SRAM devices. When deployed in mission-critical or high-reliability environments such as supercomputers or data centers, these memory subsystems need resilience techniques to tolerate such faults. To design resilient memory systems, one must understand which faults are likely to occur. One way to do this is to analyze the faults that actually occur in systems deployed in the field. In this talk, I will focus on lessons learned about DRAM and SRAM reliability from systems in the field. I will also touch on issues involved in performing large-scale studies of systems in the field.
Bio: Vilas Sridharan works in the RAS (Reliability, Availability, and Serviceability) Architecture group at AMD, Inc., where he is responsible for defining the reliability features of all AMD server products. He received his Ph.D. and M.S.E. from the Department of Electrical and Computer Engineering at Northeastern University, and his B.S.E. in Computer Engineering from Princeton University in 2000. From 2000 to 2004, he worked in the SPARC server division at Sun Microsystems. His research focuses on the modeling of hardware faults and on architectural and micro-architectural approaches to reliability and fault tolerance in high-performance microprocessors.