This paper presents Simba, a scalable deep-learning inference accelerator employing multi-chip-module-based integration, and proposes three tail-latency-aware, non-uniform tiling optimizations targeted ...