Does anyone know of a simple cache simulator for multiprocessors? I’m curious how various applications would perform with hardware-level DSM (distributed shared memory) across a low-latency interconnect like InfiniBand.
Would handling cache-line coherency in hardware make a big difference compared with a software DSM?