Program

The 14th IEEE Workshop on Silicon Errors in Logic – System Effects (SELSE 2018)

3-4 April 2018, Northeastern University, Boston, Massachusetts

 

Program 2018

Day 1 – April 3, 2018

08:00 – 08:45 Breakfast and Registration

08:45 – 09:00 Welcome Remarks: SELSE General and Program Chairs

09:00 – 10:00 Session I: Keynote Talk (Chair: Karthik Pattabiraman)

          Title: Automotive Reliability doesn’t drive itself

          Speaker: Celine Geiger, Waymo

10:00 – 10:30 Coffee Break

10:30 – 12:00 Session II: Evaluation Techniques for Resilient Systems (Chair: Laura Monroe)

  • Hamartia: A Fast and Accurate Error Injection Framework, Chun-Kai Chang, Sangkug Lym and Mattan Erez
  • Analyzing the Vulnerability of Vector-Scalar Execution on Data-Parallel Architectures, Charu Kalra, Fritz Previlon, Xiangyu Li, Norman Rubin and David Kaeli
  • An 807 Mrad total dose tolerance of an optically reconfigurable gate array VLSI, Takumi Fujimori and Minoru Watanabe

12:00 – 13:00 Lunch

13:00 – 14:00 Session III: Keynote Talk (Chair: Siva Hari)

          Title: Cost-effective reliability – trade-offs & challenges

          Speaker: Arijit Biswas, Intel Corporation

14:00 – 15:30 Session IV: Poster Session (Chair: Alan Wood) and Coffee Break

  • Phase Transition Material assisted circuits for improved soft error tolerance, Sai Subrahmanya Teja Nibhanupudi and Jaydeep Kulkarni
  • An Architectural-Level Fault Injector for Kepler GPUs, Lucas Fernando Weigel, Paolo Rech and Philippe Navaux
  • Compression with Multi-ECC: Enhanced Error Resiliency for Magnetic Memories, Irina Alam, Saptadeep Pal and Puneet Gupta
  • Parity++: Lightweight Error Correction for Last Level Caches, Irina Alam, Clayton Schoeny, Lara Dolecek and Puneet Gupta
  • Identifying Critical Variables Using Algorithmic Differentiation for a Realistic Fault Model, Harshitha Menon, Chun-Kai Chang, Kathryn Mohror and Mattan Erez
  • Characterization of the Impact of Soft Errors on Iterative Methods, Burcu Mutlu, Gokcen Kestor, Joseph Manzano, Osman Unsal and Sriram Krishnamoorthy
  • An 807 Mrad total dose tolerance of an optically reconfigurable gate array VLSI, Takumi Fujimori and Minoru Watanabe

15:30 – 17:00 Session V: Hardware Fault Mitigation Techniques (Chair: Jon Stephan, Intel)

  • Low Voltage SRAM with Fault Mitigation Techniques for Energy-Efficient Convolutional Neural Networks, Xiao Shi, Zhongmao Sun, Yunxuan Yu, Jun Yang, Longxing Shi and Lei He
  • Compression with Multi-ECC: Enhanced Error Resiliency for Magnetic Memories, Irina Alam, Saptadeep Pal and Puneet Gupta
  • Parity++: Lightweight Error Correction for Last Level Caches, Irina Alam, Clayton Schoeny, Lara Dolecek and Puneet Gupta

17:00 – 17:45 SELSE Business Meeting

18:00 Reception and Banquet

 

 

Day 2 – April 4, 2018

08:00 – 09:00 Breakfast

09:00 – 10:30 Session VI: Software Fault Mitigation Techniques (Chair: David Kaeli)

  • Low Cost Transient Fault Protection Using Loop Output Prediction, Sunghyun Park, Shikai Li and Scott Mahlke
  • Identifying Critical Variables Using Algorithmic Differentiation for a Realistic Fault Model, Harshitha Menon, Chun-Kai Chang, Kathryn Mohror and Mattan Erez
  • Automated Data Flow Protection for Software Fault Tolerance on Microcontrollers, Matthew Bohman, Benjamin James, Michael Wirthlin, Heather Quinn and Jeffrey Goeders

10:30 – 11:00 Coffee Break

11:00 – 12:00 Session VII: Panel Discussion (Chair: Vilas Sridharan, AMD)

          Topic: What Resilience Will the Market Pay For?

          Panelists: Arijit Biswas (Intel Corporation), Nathan DeBardeleben (LANL), Ken LaBel (NASA), Shubu Mukherjee (Cavium)

12:00 – 13:00 Lunch

13:00 – 14:00 Session VIII: Keynote Talk (Chair: Paolo Rech)

          Title: NASA and COTS Electronics: Past Approach and Successes – Future Considerations

          Speaker: Ken LaBel, NASA

14:00 – 14:15 Break

14:15 – 15:45 Session IX: Fault Detection and Mitigation (Chair: Nathan DeBardeleben)

  • Low-Cost, High-Fidelity SRAM Dosimeter for Multi-Spectra Neutron Detection, Kai Jiang, Derek Wright, Manoj Sachdev and Ewart Blackmore
  • Using Partial TMR to Improve SER Performance of a Commercial FPGA-Based Networking System, Andrew Keller, Michael Wirthlin, Jared Anderson, Shi-Jie Wen, Richard Wong, Feng Cao, Yang Pan and Yongtao Jiang
  • Characterization of the Impact of Soft Errors on Iterative Methods, Burcu Mutlu, Gokcen Kestor, Joseph Manzano, Osman Unsal and Sriram Krishnamoorthy

15:45 – 16:45 Session X: Random Access (Chair: Paolo Rech)

This session will consist of several 5–10 minute presentations. All participants will have the opportunity to present interesting ideas, observations, trends, and/or summaries of completed or ongoing projects. This is a great way to share your views and recent findings. If you would like to present in this session, please sign up by April 4, 2 pm.

16:45 – 17:00 Closing Remarks



Panel Topic: What Resilience Will the Market Pay For?

Abstract: The SELSE panel will consist of position statements from silicon providers and silicon consumers, followed by a group discussion. Silicon providers will be asked which resilience features they can offer from a business perspective, i.e., what their customers ask for and how much those customers will pay. Silicon consumers will be asked the reverse: which resilience features are they willing to pay for?

Panelists:

  • Arijit Biswas, Intel Corporation
  • Nathan DeBardeleben, LANL
  • Ken LaBel, NASA
  • Shubu Mukherjee, Cavium



Keynote speaker: Celine Geiger, Waymo

Title: Automotive Reliability doesn’t drive itself

Abstract: Reliability is one of the key factors in making self-driving technologies a success. It plays a crucial role in building trust in a product and in the adoption of the technology.
The presentation will look into questions such as: How is reliability managed in the automotive industry? What reliability challenges has the automotive industry faced in the past? How has automotive reliability changed over time, and what reliability challenges lie ahead? The talk will also examine automotive reliability from the supplier, OEM, and customer points of view, give an overview of the development process and its reliability challenges, and consider how those challenges can be addressed.


Bio: Celine Geiger received her Diploma degree in Mechanical Engineering from the University of Stuttgart in 2010. She joined Bosch in 2011 as a Reliability Engineer in the field of automotive electronics. In 2013 she joined Tesla as a Senior Reliability Engineer; in 2015 she became an Associate Manager, leading a team of seven. In April 2016 Celine moved from Tesla to X (now Waymo), where she currently works on the self-driving car as Technical Lead, Product Reliability. She focuses on design-for-reliability activities, and her work includes test planning and test specification, developing design guidelines and specs, improving and developing the FMEA process, and running board-level reliability testing. In her free time Celine likes making her own pretzels and learning about new technologies.



Keynote speaker: Arijit Biswas, Intel Corporation

Title: Cost-effective reliability – trade-offs & challenges

Abstract: Reliability is a critical pillar of modern silicon design, and it can also be very costly. Candid discussions of why trade-offs must be made, and of which kinds, can become complicated by overlapping requirements from markets, use cases, workloads, and even the realities of design resourcing. This keynote will discuss some of these complications and offer ideas and observations about what can be done about them. It will treat reliability, beyond quantifiable measurements, as an implicit pact with customers and end users. It will also present examples from the tech industry of successfully overcoming some of these challenges, what it took to do so, and promising paths that have not been sufficiently explored.


Bio: Arijit Biswas is a Principal Engineer at Intel Corporation who has led the Technologies for Reliability & Usage team in Intel’s Product Architecture Group for the last 10 years. He graduated from Carnegie Mellon University in 1997 with an MS in electrical & computer engineering. Over his 20+ year career he has been involved in nearly all aspects of processor design, including circuit & logic design, layout, debug, validation, architecture & micro-architecture. He co-developed the concept of architectural vulnerability factor computations, which has had a tremendous impact on soft error computation, and was the technical lead behind the concepts for Intel Turbo Boost Max 3.0 technology. Arijit is currently a chief architect for Intel’s Xeon server processor family.



Keynote speaker: Kenneth A. LaBel, NASA Electronic Parts and Packaging (NEPP) Program Manager

Title: NASA and COTS Electronics: Past Approach and Successes – Future Considerations

Abstract: NASA has a long history of using commercial grade electronics in space. This talk will present a brief history of NASA’s trends and approaches to commercial grade electronics, focusing on processing and memory systems. This will include a summary of the space hazards to electronics and of the NASA mission trade space. We will also discuss developing recommendations for risk management approaches to Electrical, Electronic and Electromechanical (EEE) parts and reliability in space. The final portion of the talk will discuss emerging aerospace trends and the future of COTS usage.


Bio: Kenneth A. LaBel (BES in EECS with a minor in Mathematical Sciences, 1983). His career at NASA has included development of:

  • Fault tolerant computing,
  • Hardware/software for ground systems,
  • Advanced technology,
  • Spaceflight hardware,
  • Systems engineering,
  • Radiation hardness assurance/research for >50 NASA projects, and,
  • Radiation effects and reliability assurance leadership.

He is currently co-manager of the NASA Electronic Parts and Packaging (NEPP) Program as well as senior staff engineer for the Radiation Effects and Analysis Group (REAG) at NASA GSFC. He has won multiple awards at NASA, including the prestigious National Resource and Moe I. Schneebaum Awards. Mr. LaBel has published over 100 papers as author/co-author (including multiple best papers), has taught multiple short courses at the IEEE Nuclear and Space Radiation Effects Conference (NSREC), the Hardened Electronics and Radiation Technology (HEART) Conference, the Radiation Effects on Components and Systems (RADECS) Conference, and others, and is a recognized expert in radiation effects systems engineering. He was the 2009 IEEE NSREC Short Course Chair and the 2012 IEEE NSREC General Chair.
