Yesterday I mentioned the strange way the thread pool is described and implemented in the (most excellent) book Ruby Cookbook. I’m not sure whether I’m allowed to “re-print” portions of its source code, but the snippet in question is easily accessible online: go to the site above and click the Tests column for chapter 20, section 7, “Limiting Multithreading with a Thread Pool.” To make the test more interesting, change its end to run 100 threads, each sleeping for only 1 (one) second (I do the same in the code that follows).
My instinct, acquired while developing in C++ and C#/.NET, says that creating all those threads is not smart. The classic definition of a thread pool is a small group of threads to which you hand out work. But threads in C++ and C# are native threads, whereas Ruby threads (in the MRI 1.8 interpreter) are green threads, so my instinct could be wrong.
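To make “all those threads” concrete, here is a minimal sketch of the thread-per-job style: every dispatched job immediately gets its own thread, which then waits until fewer than `max_size` jobs are running. This is my own illustration, not the book’s code; the class `ThreadPerJobPool` and its `dispatch` method are hypothetical names.

```ruby
require 'thread'

# Hypothetical thread-per-job pool (not the book's code): dispatch creates a
# new Thread for every job at once; the thread then blocks until a slot frees.
class ThreadPerJobPool
  def initialize(max_size)
    @max_size = max_size
    @running = 0
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @threads = []
  end

  def dispatch(&job)
    @threads << Thread.new do
      # The thread already exists (and holds its memory) while it waits here.
      @mutex.synchronize do
        @cond.wait(@mutex) while @running >= @max_size
        @running += 1
      end
      begin
        job.call
      rescue => e
        puts "Exception #{e} during job processing"
      ensure
        # Free the slot and wake one waiting thread.
        @mutex.synchronize do
          @running -= 1
          @cond.signal
        end
      end
    end
  end

  def shutdown
    @threads.each { |t| t.join }
  end
end

# Usage: 10 jobs, at most 3 running at once, but 10 threads are created.
pool = ThreadPerJobPool.new(3)
stats = Mutex.new
active = 0
peak = 0
finished = []
10.times do |i|
  pool.dispatch do
    stats.synchronize { active += 1; peak = [peak, active].max }
    sleep(0.05)
    stats.synchronize { active -= 1; finished << i }
  end
end
pool.shutdown
```

Concurrency is limited correctly, but the limit is enforced only after each thread has been created, which is exactly the cost in question.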
Yesterday’s small test, which brute-force spawns a hundred threads, hints at one obvious cost of spawning all those threads: higher memory consumption. The only way to confirm this suspicion, though, is to implement a classic thread pool and compare. Here it is:
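```ruby
require 'thread'

class ClassicThreadPool
  def initialize(num_threads)
    @blocks = []                  # pending jobs (nil = shutdown sentinel)
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @threads = []
    # Create the fixed set of worker threads once, up front.
    num_threads.times do
      @threads << Thread.new { run }
    end
  end

  # Block until a job is available, then remove and return it.
  def dequeue
    @mutex.synchronize do
      # Re-check after every wakeup: another worker may have taken the job
      # first, and shifting an empty array would return nil, which the
      # worker would mistake for the shutdown sentinel.
      @cond.wait(@mutex) while @blocks.empty?
      @blocks.shift
    end
  end

  # Queue a job and wake one waiting worker. Called without a block,
  # this enqueues nil, which tells one worker to exit.
  def enqueue(&block)
    @mutex.synchronize do
      @blocks << block
      @cond.signal
    end
  end

  # Worker loop: execute jobs until the nil sentinel arrives.
  def run
    while true
      blk = dequeue
      break if blk.nil?
      begin
        blk.call
      rescue => e
        puts "Exception #{e} during job processing"
      end
    end
  end

  # Send one nil sentinel per worker, then wait for all workers to finish.
  def shutdown
    @threads.length.times { enqueue }
    @threads.each { |t| t.join }
  end
end

# Now test it
pool = ClassicThreadPool.new(3)
1.upto(100) do |i|
  pool.enqueue do
    print "Job #{i} started.\n"
    sleep(1)
    print "Job #{i} complete.\n"
  end
end
pool.shutdown
```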
The code is very similar to the original, except that for a pool of size P, only P threads are created. Each code block (job) scheduled to run is saved in an array and then executed by the first free thread in the pool. Each pooled thread runs an efficient loop that waits for available jobs and executes them. When it’s time to shut down, the pool enqueues P nil blocks, which let the otherwise infinite thread loops exit, and then joins the threads, waiting for all of them to finish.
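Ruby’s standard library also provides `Queue`, whose `pop` blocks while the queue is empty, so the same fixed-pool design can be sketched without an explicit `Mutex` and `ConditionVariable`. This is an alternative of my own, not the code above; the class name `QueuePool` is hypothetical, and it keeps the same nil-sentinel shutdown.

```ruby
require 'thread'

# Hypothetical alternative: a fixed pool of workers fed by a Queue.
# Queue#pop blocks while the queue is empty, so no explicit
# Mutex/ConditionVariable bookkeeping is needed.
class QueuePool
  attr_reader :threads

  def initialize(num_threads)
    @jobs = Queue.new
    @threads = Array.new(num_threads) do
      Thread.new do
        # Each worker loops until it pops the nil shutdown sentinel.
        while (job = @jobs.pop)
          begin
            job.call
          rescue => e
            puts "Exception #{e} during job processing"
          end
        end
      end
    end
  end

  def enqueue(&block)
    @jobs << block
  end

  # One nil per worker: each worker exits after popping exactly one sentinel.
  def shutdown
    @threads.size.times { @jobs << nil }
    @threads.each { |t| t.join }
  end
end

# Usage: 20 jobs on 3 threads; each job records its index.
pool = QueuePool.new(3)
done = Queue.new
20.times { |i| pool.enqueue { done << i } }
pool.shutdown
results = Array.new(done.size) { done.pop }
```

Because the queue is FIFO and the sentinels are pushed after all jobs, every job is dequeued before any worker sees a nil. On Ruby 2.3 and later, `Queue#close` can stand in for the explicit sentinels, since `pop` returns nil once a closed queue is drained.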
The result is very clear: this approach is far more memory-efficient. It stays at about 3.1MB from start to finish, while the comparable example from the book starts at about 7.3MB and drops to about 3.7MB by the end.
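One way to see where the extra memory goes is to count how many threads each strategy keeps alive. The sketch below is my own (the method names, constants, and the `gate` queue are hypothetical): it submits 100 jobs that all block on a gate, counts live threads, then releases them, so the numbers do not depend on timing. It counts threads rather than measuring memory, and it leaves out the thread-per-job concurrency limit, since that limit does not change how many threads exist.

```ruby
require 'thread'

JOBS = 100
POOL_SIZE = 3

# Thread-per-job: one Thread per submitted job, all alive at once.
def live_threads_thread_per_job
  gate = Queue.new
  threads = Array.new(JOBS) { Thread.new { gate.pop } }
  count = threads.count { |t| t.alive? }
  JOBS.times { gate << :go }          # release every job
  threads.each { |t| t.join }
  count
end

# Fixed pool: POOL_SIZE workers pull jobs from a shared queue.
def live_threads_fixed_pool
  gate = Queue.new
  jobs = Queue.new
  workers = Array.new(POOL_SIZE) do
    Thread.new do
      while (job = jobs.pop)
        job.call
      end
    end
  end
  JOBS.times { jobs << lambda { gate.pop } }
  count = workers.count { |t| t.alive? }
  JOBS.times { gate << :go }          # release every job
  POOL_SIZE.times { jobs << nil }     # one shutdown sentinel per worker
  workers.each { |t| t.join }
  count
end

per_job = live_threads_thread_per_job
pooled = live_threads_fixed_pool
puts "thread-per-job: #{per_job} threads, fixed pool: #{pooled} threads"
# prints "thread-per-job: 100 threads, fixed pool: 3 threads"
```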
Moral of the story: creating threads is never cheap, even on systems that implement green threads. You don’t pay for setting up a real OS thread, but you do waste memory. Just because spawning a thread is easy in Ruby doesn’t mean you should do it lightly.
On the other hand, if there are no bursts of activity, so that at any given moment the thread-per-job approach has only a few more threads alive than the pool would run anyway, the difference between the two approaches is academic. Still, I’d rather use a solution that scales well under all circumstances than one that only works well under specific conditions.