Time simulation of digital circuits is one of the core tasks in electronic design automation, with applications in power simulation, timing validation, diagnosis, and vulnerability analysis under variation. For most applications, a massive number of circuit instances with varying parameters, inputs, or faults has to be simulated. This provides great opportunities for parallelism, but extracting the best performance from a parallel architecture is still not a trivial task.
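As an illustration of this instance-level parallelism, the sketch below runs a much-simplified, zero-delay logic simulation over many independent circuit instances in a parallel loop. The netlist representation and all names (Gate, Netlist, simulate_instance, simulate_all) are assumptions made for this example only; a real time simulator would additionally propagate delays and waveforms.

```cpp
// Illustrative sketch: instance-level parallelism in gate-level simulation.
// The data structures and function names are assumptions, not an existing API.
#include <cstdint>
#include <vector>

enum class GateType { AND, OR, NOT, XOR };

struct Gate {
    GateType type;
    int in0, in1;   // indices of driving signals (in1 = -1 if unused)
};

// Levelized netlist: gates are stored in topological order, so a single
// forward pass computes all signal values.
struct Netlist {
    int num_inputs;
    std::vector<Gate> gates;
};

// Evaluate one circuit instance for one stimulus. Signals are packed as
// 64 patterns per word (bit-parallel simulation), so each instance already
// evaluates 64 input assignments at once.
std::vector<uint64_t> simulate_instance(const Netlist& nl,
                                        const std::vector<uint64_t>& inputs) {
    std::vector<uint64_t> sig(inputs);               // inputs.size() == num_inputs
    sig.resize(nl.num_inputs + nl.gates.size());
    for (std::size_t g = 0; g < nl.gates.size(); ++g) {
        const Gate& gate = nl.gates[g];
        uint64_t a = sig[gate.in0];
        uint64_t b = gate.in1 >= 0 ? sig[gate.in1] : 0;
        uint64_t r = 0;
        switch (gate.type) {
            case GateType::AND: r = a & b; break;
            case GateType::OR:  r = a | b; break;
            case GateType::XOR: r = a ^ b; break;
            case GateType::NOT: r = ~a;    break;
        }
        sig[nl.num_inputs + g] = r;
    }
    return sig;
}

// Independent instances (different input vectors, parameters or injected
// faults) map directly onto a parallel loop over the stimuli.
void simulate_all(const Netlist& nl,
                  const std::vector<std::vector<uint64_t>>& stimuli,
                  std::vector<std::vector<uint64_t>>& results) {
    results.resize(stimuli.size());
    #pragma omp parallel for schedule(dynamic)
    for (long long i = 0; i < (long long)stimuli.size(); ++i)
        results[i] = simulate_instance(nl, stimuli[i]);
}
```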

Modern computing servers contain multiple processors, each consisting of numerous compute units. The memory subsystem provides multiple ports to the RAM chips and several cache levels. Mastering the trade-offs between memory bandwidth, memory latency, and the available compute resources is essential to achieve the best performance for a parallel algorithm.
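The following stand-alone micro-benchmark (an assumed example, not part of any given codebase) illustrates this trade-off: it contrasts a memory-bound streaming pass with a compute-bound kernel over the same data. Which of the two a given simulation loop resembles determines whether DRAM bandwidth or the arithmetic units limit its performance, and thus where optimization effort pays off.

```cpp
// Assumed micro-benchmark: memory-bound vs. compute-bound loop on the same data.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 1 << 26;              // ~64 M doubles, ~512 MiB per array
    std::vector<double> a(n, 1.0), b(n, 2.0);

    // Memory-bound: one fused multiply-add per 16 bytes read and 8 bytes
    // written; performance is limited by DRAM bandwidth, not by the FPUs.
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        a[i] = 1.5 * a[i] + b[i];
    auto t1 = std::chrono::steady_clock::now();

    // Compute-bound: many dependent operations per element held in a register;
    // performance is limited by the arithmetic units.
    double acc = 0.0;
    auto t2 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        double x = a[i];
        for (int k = 0; k < 64; ++k)            // 64 dependent multiply-adds
            x = x * 1.000001 + 0.5;
        acc += x;
    }
    auto t3 = std::chrono::steady_clock::now();

    auto ms = [](auto d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    std::printf("streaming pass: %.1f ms\ncompute pass:   %.1f ms (acc=%g)\n",
                ms(t1 - t0), ms(t3 - t2), acc);
}
```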

The task is to

At the end, an optimized version of the algorithm has to be presented that achieves a verified, near-optimal utilization of the parallel architecture.
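One possible way to verify utilization is a scaling study: run the same workload with increasing thread counts and report speedup and parallel efficiency, or compare measured throughput against the machine's bandwidth and compute limits. The sketch below is a hypothetical example of such a study; the workload function is only a placeholder for the simulator under test, and "near-optimal" would correspond to an efficiency close to 1 or to throughput close to the relevant hardware limit.

```cpp
// Hypothetical scaling study: run times for increasing thread counts,
// reported as speedup and parallel efficiency relative to one thread.
#include <chrono>
#include <cstdio>
#include <vector>
#include <omp.h>

// Dummy workload standing in for one full simulation run.
double workload(const std::vector<double>& data) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < (long long)data.size(); ++i)
        sum += data[i] * data[i];
    return sum;
}

int main() {
    std::vector<double> data(1 << 26, 1.0);       // ~512 MiB of input data
    double t_serial = 0.0;

    for (int threads = 1; threads <= omp_get_max_threads(); threads *= 2) {
        omp_set_num_threads(threads);
        auto start = std::chrono::steady_clock::now();
        volatile double result = workload(data); // keep the call from being elided
        auto stop = std::chrono::steady_clock::now();
        (void)result;

        double t = std::chrono::duration<double>(stop - start).count();
        if (threads == 1) t_serial = t;
        double speedup = t_serial / t;
        std::printf("threads=%2d  time=%6.3f s  speedup=%5.2f  efficiency=%4.2f\n",
                    threads, t, speedup, speedup / threads);
    }
    return 0;
}
```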