SLICC: Self-Assembly of Instruction Cache Collectives

Transactional processing systems typically assign transactions to worker threads in a “random” fashion, with each thread running on a single core of a multicore system. The instruction footprint of a typical transaction does not fit in a single L1 instruction cache; the resulting thrashing incurs a high instruction miss rate, causing as much as 70-80% of execution cycles to stall on instruction fetch and yielding less than one instruction per cycle on modern commodity (typically 4-issue) machines.
OLTP workloads are known to have instruction footprints large enough to defeat existing L1 instruction caches, resulting in poor overall performance. Prefetching can reduce the impact of the resulting instruction cache miss stalls; however, state-of-the-art solutions require large dedicated hardware tables on the order of 40KB in size. SLICC [1, 2] is a programmer-transparent, low-cost technique that minimizes instruction cache misses when executing OLTP workloads. SLICC migrates threads, spreading their instruction footprint over several L1 caches. It exploits repetition within and across transactions: a transaction’s first iteration prefetches the instructions for subsequent iterations or for similar subsequent transactions. SLICC reduces instruction misses by 58% on average for TPC-C and TPC-E, thereby improving performance by 68%. When compared to a state-of-the-art prefetcher, and notwithstanding the increased storage overheads (42
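The intuition behind thread migration can be illustrated with a toy cache simulation. This is only a sketch of the idea, not the papers' mechanism: real SLICC decides when and where to migrate from runtime miss behavior in hardware, whereas here the footprint is sliced at fixed, hand-chosen boundaries. All names, sizes, and the LRU model below are illustrative assumptions.

```python
from collections import OrderedDict

class L1ICache:
    """Toy LRU model of an L1 instruction cache (capacity in blocks)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # block address -> present
        self.misses = 0

    def access(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)        # LRU hit
        else:
            self.misses += 1                     # fetch on miss
            self.lines[block] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)   # evict LRU block

def run_transactions(footprint, caches, migrate, iterations=5):
    """Replay `iterations` of one transaction's instruction footprint.

    migrate=False: the thread is pinned to core 0, so a footprint larger
    than one L1 thrashes it (the baseline in the text).
    migrate=True: the thread hops to the next core each time the current
    core's L1 fills, so each L1 ends up holding one slice of the footprint
    and later iterations hit in the caches warmed by the first iteration
    (a hand-sliced stand-in for SLICC's self-assembly).
    """
    for _ in range(iterations):
        core, filled = 0, 0   # each iteration retraces the same code path
        for block in footprint:
            if migrate and filled == caches[core].capacity:
                core = (core + 1) % len(caches)  # hop to the next L1
                filled = 0
            caches[core].access(block)
            filled += 1
    return sum(c.misses for c in caches)

# 96-block footprint vs. 32-block L1s: one core thrashes on every fetch,
# while three cooperating cores miss only on the cold first iteration.
FOOTPRINT = list(range(96))
baseline = run_transactions(FOOTPRINT, [L1ICache(32)], migrate=False)
slicc = run_transactions(FOOTPRINT, [L1ICache(32) for _ in range(3)], migrate=True)
```

Under this model, the pinned thread misses on all 96 blocks of every iteration (480 misses over five iterations), while the migrating thread misses only during the first iteration (96 misses), mirroring the "first iteration prefetches for the rest" effect described above.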

References

  1. Atta, I., Tözün, P., Ailamaki, A. and Moshovos, A. (2012) SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads. Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture.
  2. Atta, I., Tözün, P., Ailamaki, A. and Moshovos, A. (2012) Reducing OLTP Instruction Misses With Thread Migration. Proceedings of the 8th International Workshop on Data Management on New Hardware (DaMoN 2012).