Yesterday I mentioned the strange way the thread pool is described and implemented in the (most excellent) Ruby Cookbook. I’m not sure whether I’m allowed to reprint portions of its source code, but the snippet in question is easily accessible online: go to the site above and click the Tests column for Chapter 20, Section 7, “Limiting Multithreading with a Thread Pool.” To make the test more interesting, change its end to run 100 threads that each sleep for only one second (I do the same in the code that follows).

My instinct, acquired while developing in C++ and C#/.NET, says that creating all those threads is not smart. The classic definition of a thread pool is a small, fixed group of threads to which you schedule work. But threads in C++ and C# are native OS threads, whereas Ruby threads (at least in the MRI 1.8 interpreter) are green threads, so my instinct could be wrong.
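For contrast, here is roughly what “creating all those threads” looks like. This is a minimal sketch, not the book’s code: the class ThreadPerJobPool and its dispatch method are hypothetical names, and the sketch assumes the pool’s only role is to cap how many jobs run at once (using a Mutex and a ConditionVariable) while still creating a brand-new Thread for every job.

```ruby
require 'thread'

# Hypothetical thread-per-job pool (NOT the Ruby Cookbook's code):
# at most max_size jobs run at once, but every job gets its own new Thread.
class ThreadPerJobPool
  attr_reader :threads_created

  def initialize(max_size)
    @max_size = max_size
    @running = 0
    @threads = []
    @threads_created = 0
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  def dispatch(&block)
    @mutex.synchronize do
      # Block the caller until one of the max_size slots frees up.
      @cond.wait(@mutex) while @running >= @max_size
      @running += 1
      @threads_created += 1
      @threads << Thread.new do
        begin
          block.call
        ensure
          # Release the slot and wake one caller waiting in dispatch.
          @mutex.synchronize do
            @running -= 1
            @cond.signal
          end
        end
      end
    end
  end

  def shutdown
    @threads.each { |t| t.join }
  end
end

pool = ThreadPerJobPool.new(3)
10.times { pool.dispatch { sleep(0.01) } }
pool.shutdown
puts pool.threads_created # prints 10
```

With ten jobs and a cap of three, ten Thread objects are still created: the cap limits concurrency, not thread creation, and per-job thread creation is exactly the cost a classic pool avoids.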

Yesterday’s small test, which brute-force spawns a hundred threads, hints at one obvious cost of spawning all those threads: higher memory consumption. The only way to confirm this suspicion is to implement a classic thread pool and compare. Here it is:

require 'thread'

class ClassicThreadPool
  def initialize(num_threads)
    @blocks = []
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    
    @threads = []
    1.upto(num_threads) do
      @threads << Thread.new do
        run
      end
    end
  end
  
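  def dequeue
    @mutex.synchronize do
      # Use "while", not "if": another worker may take the job between the
      # signal and this thread reacquiring the mutex. With "if", such a wakeup
      # would shift nil from the empty array and stop this worker early.
      @cond.wait(@mutex) while @blocks.empty?
      @blocks.shift
    end
  end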
  
  def enqueue(&block)
    @mutex.synchronize do
      @blocks << block
      @cond.signal
    end
  end
  
  def run
    while(true)
      blk = dequeue
      break if blk.nil?
      begin
        blk.call
      rescue => e
        puts "Exception #{e} during job processing" 
      end
    end
  end
  
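  # Enqueue one nil "stop" job per worker (enqueue without a block pushes nil),
  # then wait for every worker to drain the queue and exit.
  def shutdown
    @threads.length.times { enqueue }
    @threads.each { |t| t.join }
  end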
end

#Now test it

pool = ClassicThreadPool.new(3)

1.upto(100) do |i|
  pool.enqueue do
    print "Job #{i} started.\n"
    sleep(1)
    print "Job #{i} complete.\n"
  end
end

pool.shutdown

The code is very similar to the original, except that for a pool of size P only P threads are created. Each code block (job) scheduled to run is saved in an array and then executed by the first free thread in the pool. Each pool thread runs a loop that sleeps on a condition variable until a job is available and then executes it. When it’s time to shut down, the pool enqueues P nil blocks, which let the otherwise infinite thread loops exit; then the pool joins the threads, waiting for all of them to finish.
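Ruby’s standard library ships a thread-safe Queue whose pop blocks while the queue is empty, which covers the mutex and condition-variable bookkeeping that dequeue and enqueue do by hand. Here is a minimal sketch of the same design (P long-lived workers, one nil sentinel per worker at shutdown) built on Queue; QueuePool is a hypothetical name, not a library class.

```ruby
require 'thread'

# Hypothetical QueuePool: same design as ClassicThreadPool, but the
# waiting/signalling is delegated to Ruby's thread-safe Queue.
class QueuePool
  def initialize(num_threads)
    @jobs = Queue.new
    @threads = Array.new(num_threads) do
      Thread.new do
        # Queue#pop blocks until a job arrives; a nil job means "exit".
        while (job = @jobs.pop)
          begin
            job.call
          rescue => e
            puts "Exception #{e} during job processing"
          end
        end
      end
    end
  end

  def enqueue(&block)
    @jobs << block
  end

  def shutdown
    # One nil sentinel per worker, then wait for every worker to finish.
    @threads.size.times { @jobs << nil }
    @threads.each { |t| t.join }
  end
end

pool = QueuePool.new(3)
1.upto(10) do |i|
  pool.enqueue { print "Job #{i} done.\n" }
end
pool.shutdown
```

Delegating the blocking to Queue leaves no hand-written wait loop to get wrong, while keeping the thread count fixed at P.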

The result is very clear: this approach is far more memory-efficient. It starts and ends at about 3.1MB of memory, while the comparable example from the book starts at about 7.3MB and drops to about 3.7MB at the end.
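The figures above are process-level memory readings. A cheap in-process way to see where the difference comes from is to count live threads with Thread.list while each variant runs. The sketch below does that with a hypothetical peak_threads helper that samples Thread.list.size from a background thread; it measures thread count, not bytes, and the job count and sleep times are arbitrary assumptions.

```ruby
require 'thread'

# Hypothetical helper: runs the block while a sampler thread records the
# largest number of live threads (Thread.list.size) it observes.
# The count includes the main thread and the sampler itself.
def peak_threads
  peak = Thread.list.size
  done = false
  sampler = Thread.new do
    until done
      peak = [peak, Thread.list.size].max
      sleep(0.001)
    end
  end
  yield
  done = true
  sampler.join
  peak
end

# Brute force: one thread per job, all alive at the same time.
brute = peak_threads do
  threads = Array.new(100) { Thread.new { sleep(0.05) } }
  threads.each { |t| t.join }
end

# Pool: 3 long-lived workers pull 100 jobs from a Queue.
pooled = peak_threads do
  jobs = Queue.new
  100.times { jobs << lambda { sleep(0.001) } }
  3.times { jobs << nil } # one stop sentinel per worker
  workers = Array.new(3) do
    Thread.new do
      while (job = jobs.pop)
        job.call
      end
    end
  end
  workers.each { |t| t.join }
end

puts "peak live threads: brute force #{brute}, pool #{pooled}"
```

Expect the brute-force peak to be roughly 100 higher than the pool’s, which stays at a handful (main thread, sampler, three workers); exact numbers depend on scheduling.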

The moral of the story: creating threads is never cheap, even on systems that implement green threads. You don’t incur the cost of setting up a real OS thread, but you do spend memory on every thread you create. Just because it’s easy to spawn a thread in Ruby doesn’t mean you should do so lightly.

On the other hand, if there are no bursts of activity, so that at any given time there are only a few more threads than the pool would run, the difference between the two approaches is academic. Still, I’d rather use the solution that scales well under all circumstances than one that works well only under specific conditions.
