Review: Dark Silicon and the End of Multicore Scaling

Paper link: p365-esmaeilzadeh

Premise

Single core scaling stopped some number of years ago. But we’re okay, because we now have the “multi-core revolution” (I realllly hate that phrase). Or are we?

It turns out that another, very real wall is blocking future progress — power density. Already CPU designs strive mightily to shutdown or slowdown underutilized circuitry to save power. But we’re reaching the point where processors simply won’t ever be able to run “full-out”.

This paper explores the future of scaling by extrapolating based on a wide range of processors over time, and they find, rather compellingly, that we’re pretty much boned.

Details

The paper is pretty dense, but it boils down to the creation of a model parameterized by processor features:

  • number of cores
  • number of threads
  • CPU frequency
  • L1/L2/DRAM cache size, speed, miss rate
  • DRAM fetch width

and code features:

  • % code that is parallel
  • % read instructions

You can fit this model based on existing processors and their benchmark performance, as well as their power consumption and die area.

Once the model is created, now the fun starts — we can see how performance will scale with future process improvements (moving to 32, 22, 16…nm), or if we assume highly parallel code, we can see how to optimize our CPU design (lots of cores)!

The short of it

If we make optimistic assumptions for how power consumption drops with process scaling, we still end up with the sobering conclusion that we’re rapidly reaching the point where we won’t be able to power all of the parts of a chip. 128 cores aren’t as useful if we can’t keep them all running.

On the plus side, I can buy 1000 watt power supplies now, so I’m looking forward to my 16 processor, 16 core machine of the future. Welcome to the multiprocessor revolution!

Posted in reviews | Leave a comment

fun with shared libraries — version GLIBC_2.14 not found

The shared library system for linux, especially when it comes to libc, is (choose one): archaic|complex|awesome. Today in trying to run some software on our local cluster, I encountered this lovely error:

/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found

Why was this occurring? For some reason, there was a version bump in the glibc for the memcpy instruction. How memcpy’s interface could possibly changed in a non-backward-compatible fashion, I don’t know. Still, since my local machine has a later libc then our cluster machines, I’m effectively boned.

There is a way around this, namely, I can ask the compiler to emit references to a specific, older version:

__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

Unfortunately, this code has to appear before any other uses of memcpy! Rather then update a whole lot of code, I took advantage of the -include option for gcc. I simply created a file with my desired symbol override:

# gcc-preinclude.h 

__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

and forced the build (here, autotools) to always include that before any other code:

./configure CC=gcc -include /path/to/gcc-preinclude.h ...

And away we go. A full rebuild and I’m back to a usable program again. Hurray!

Posted in Uncategorized | 3 Comments

Review: DThreads: Efficient Deterministic Multithreading

I’ve decided to try and be more pro-active about reading papers from the community. Since I tend to forget everything I read, it’s best I write down somewhere my thoughts for later reference — I’ve decided to post these reviews here in case they prove useful to others.

PDF of the paper

Overview:

This paper follows one of the latest fads in systems (NB: academics have fads just like fashionistas) — deterministic threading. The main idea behind all of these papers is to try and allow users to make use of multi-core/multi-threaded programming, but at the same time have it be deterministic. This is helpful as it ensures the behavior we see when testing is the same as what we see at runtime, and it also helps make it easier to recreate test failures.

This is a non-trivial problem, and it has a lot of work behind it already. The contribution with dthreads is that it, (a) claims high performance, and (b) provides a drop-in replacement for existing threads.

How does it work:

There’s a standard trick for d-threading, and DThreads uses it as well. It’s somewhat a kin to transactions in databases. First, split each thread into a separate world that can’t communicate with the other threads. Buffer all of the updates that would go to shared memory. Then, at a synchronization point (mutex lock, condition var, etc.), wait for all of the threads to finish their work, and then apply the shared memory changes in a deterministic fashion.

The DThreads way of handling this is to run each thread as a separate process. Whenever a commit point is reached, the processes determine which parts of memory they’ve changed, and commit them to shared memory at that time.

There are a number of things that need to be handled to make this all possible — you need to provide a deterministic malloc, you have to ensure threads are created and destroyed in a deterministic way, etc.

Why aren’t we using it today?

Two reasons, really.

Speed

DThreads is faster then previous work on the subject, but still can end up a lot slower then regular threading. This isn’t unexpected — you have to do more work, so you end up paying for it.

Correctness

DThreads, while being a drop-in replacement for pthreads, isn’t really a drop-in replacement for pthreads. Unfortunately for deterministic threading proponents, this type of code:

Thread 1:

while not done:
  sleep(0)

Thread 2:
... do some work ...
done = 1

where synchronization is handled outside the scope of the threading library is all too common in the wild. This forces the use of more expensive techniques then DThreads, which keeps determistic threading, for the moment, an academic exercise.

What did I think of the paper

It’s well written (which I expect of all SOSP papers), and presents a bunch of interesting tricks for improving performance and making things work. The fact that it requires all synchronization be performed through pthreads calls is on the face of it, reasonable, but sadly, it is far from realistic when it comes to running everyday applications.

Still, fun to read.

Posted in reviews | Leave a comment

email habits over time

I was curious if my sleeping/waking habits had really changed over the years – I definitely don’t feel I work as late now as when I was 22, but it’s hard to tell. To test this, I looked over all of the timestamps of mail I’ve sent in the past few years and tried to make a pretty graph.

I’m not sure how meaningful it is, but thanks to ggplot, it is pretty, at least.

The plotting code is straightforward — try it out!

library("ggplot2")
library("reshape")

getData = function() {
  SOURCE=Sys.glob('/home/power/.thunderbird/*/*/*/*/Sent Mail')
  return(readLines(SOURCE))
}

getMatches = function(data) {
  matched_lines = grep('^Date:.*, .*-0\\d+', data, value=T)
  return(gsub('Date:.*, (.*-0\\d+).*', '\\1', matched_lines))
}

lines = getData()
matches = getMatches(lines)
dates = lapply(matches, function(f) {strptime(f, format="%d %b %Y %H:%M:%S %z")})
df = ldply(dates, unlist)
df$year = df$year + 1900
df_counts = count(df, vars=c("year", "hour"))

for (year in df_counts$year) {
  m = df_counts$year == year
  df_counts$freq[m] = df_counts$freq[m] / sum(df_counts$freq[m])
}

p = ggplot(df_counts, aes(x=year, y=hour)) +
  scale_x_continuous(limits=c(2004, 2012)) +
  scale_y_datetime() +
  geom_tile(aes(fill=freq)) +
  scale_fill_gradient()

show(p)
Posted in R | Leave a comment

robust pdf title extraction

I end up with a lot of PDF documents lying around – at last glance, this amounted to a few thousand files. Unfortunately, most of these documents end up with rather obscure names, making it rather annoying to find what I want, or what is interesting.  For example, these are the documents I’ve recently downloaded:

wodet3-paper12.pdf
jong_afst.pdf
tut_gpu_2012_03.pdf
lecture1-1.pdf
natella_binary_sfi_edcc_2012.pdf
TR-Farrukh-58.pdf
730959.pdf
NLSEmagic_Paper.pdf
M23584378H1770Q2.pdf
G89T37P10W263075.pdf
journal_online.pdf
manus_Jour-INFORMATION-Camera.pdf
12011.VitekJan.Paper.pdf
R3X8722476T2X278.pdf
1203.0321.pdf

I previously tried to organize everything using something like Papers, which is a lovely product, but still required effort from me and isn’t very useful now that I no longer have a Mac.

I’ve also tried to rectify this situation via half-hearted attempts at using pdftotext, and grabbing the first 10 words of text, but more often then not I was left with more incomprehensible garbage.

Today, I had some spare time, and far too much interest in this problem, but I managed to come up with an easy and fairly effective solution.  It also resembles a rube-goldberg machine.  After digging around for various pdf conversion utilities, I discovered that pdftohtml not only generated reasonable output, but it could also be set to output to an easily parsed xml format.  From there it was a simple bit of BeautifulSoup to get nice titles for most of my documents:

from BeautifulSoup import BeautifulStoneSoup
import subprocess
import sys
import tempfile

def extract_pdf_title(pdfdata):
  src_file = tempfile.NamedTemporaryFile(delete=True)
  src_file.write(pdfdata)
  src_file.flush()

  try:
    command = ' '.join(['pdftohtml', '-c -s -i', '-stdout', '-f 1', '-l 1',
                        '-xml', src_file.name, '/tmp/pdftoxml'])
    xmlout, xmlerr = subprocess.Popen(command, shell=True,
                                      stdout=subprocess.PIPE,
                                      stderr=subprocess.STDOUT).communicate('')
    xml_data = open('/tmp/pdftoxml.xml').read()
  except:
    print 'Error in pdftohtml '
    return ''

  dom = BeautifulStoneSoup(xml_data)
  text = dom.findAll('text')

  # let the title be the first set of text elements until we see a change in font
  title_text = ''
  last_font = None
  for t in text:
    if last_font is not None and t.get('font') != last_font:
      if len(title_text) > 5: break
      else: title_text = ''

    title_text += t.getText().encode('utf-8') + ' '
    last_font = t.get('font')

  return title_text

if __name__ == '__main__':
  for f in sys.argv[1:]:
    print f, ' -- ', extract_pdf_title(open(f).read())
view raw pdfutil.py This Gist brought to you by GitHub.
Posted in R | Leave a comment

tcp timelines with ggplot2

I’ve come across the need to analyze TCP flows from time to time, and while scripts like flowtime and EasyTimeline are nice, they aren’t really, well, pretty.  ggplot2, on the other hand is, and it turns out to be really easy to get nice, somewhat useful plots. Here’s an example conversation between my local browser and nytimes.com: (warning, gigantic) You can easily see the importance of fast DNS resolution, with almost 2 seconds of time spent idle waiting for the first resolver hit.  Then we see a large number of connections opened up, as modern browsers and sites try to work around the small TCP initial congestion window.  Finally there’s the petering out of the connections and the final FIN packets as the browser finishes the page. It’s at least slightly more informative then staring at wireshark dumps, and it provides another excuse to practice my R. The code is pretty straightforward, and mostly dedicated to munging the tshark field output to make streams show up in a reasonable way:

require('ggplot2')
require('stringr')

FIELDS = c('frame.time_relative', 'frame.len',
           'ip.src', 'tcp.srcport', 'udp.srcport',
           'ip.dst', 'tcp.dstport', 'udp.dstport')
PCAP = 'nytimes.pcap'
TSHARK = paste('tshark','-E header=y', '-T fields')

data = read.csv(header=T, sep="\t", pipe(
  paste(TSHARK, '-r', PCAP,
        str_c(c('-e '), FIELDS, collapse=' '))))

# pick from yes or no, based on q.
select = function(q, yes, no) {
  q = as.logical(q)
  q[is.na(q)] <- F
  result <- rep(no, length.out = length(q))
  result[q] <- rep(yes, length.out = length(q))[q]
  return(result);
}

# Munge some fields -- use either the tcp or udp port for a given connection,
# and determine whether a packet is incoming or outgoing.
data = within(data, {
  srcport = select(tcp.srcport, tcp.srcport, udp.srcport)
  dstport = select(tcp.dstport, tcp.dstport, udp.dstport)
  direction = select(srcport < 1000, "incoming", "outgoing")
  server_ip = select(srcport < 1000, ip.src, ip.dst)
  local_port = select(srcport < 1000, dstport, srcport)
  server_port = select(srcport < 1000, srcport, dstport)
  stream = paste(local_port, paste(server_ip, server_port, sep=":"))
})

p = ggplot(data=data, aes(x=frame.time_relative, y=stream,
                          color=direction, size=frame.len)) +
  geom_point(position=position_jitter(width=0, height=0.2)) +
  scale_fill_hue() +
  scale_size(range=c(0.5, 5))

show(p)

Posted in R | Leave a comment