Lambda expressions available in the new C# 3.0 compiler are mostly used in the context of LINQ queries, like this:
internal class Product {
    internal int ID { get; set; }
}

var products = new List<Product>();
products.Add(new Product() { ID = 1 });
products.Add(new Product() { ID = 5 });
// continue to fill in products
var one1 = from p in products
           where p.ID == 1
           select p;
It’s not obvious, but the above query is equivalent to (also note that one1’s type is IEnumerable<Product>, not Product):
var one2 = products.Where(p => p.ID == 1);
The part p => p.ID == 1 is the lambda expression. In this case it can be replaced by a C# 2.0 feature called an anonymous method, often referred to as an anonymous delegate (lambdas are more general than a simple delegate substitute, but that does not matter here):
var one3 = products.Where(delegate(Product p) { return p.ID == 1; });
One advantage of lambdas is already obvious: we didn't have to specify the type of the delegate's parameter, and there was no need to type return. To a certain extent this is a feature of the new compiler (better type inference), but inference cannot work in all cases, so sometimes you must qualify the parameter with a type. It is not needed here, but it works nevertheless:
var one4 = products.Where((Product p) => p.ID == 1);
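To check that the four forms really are interchangeable, here is a minimal, self-contained sketch. The class LambdaForms and its helper SelectedIds are names I made up for this illustration; the queries themselves are the ones shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal class Product {
    internal int ID { get; set; }
}

internal static class LambdaForms {
    // Hypothetical helper: runs the four forms from the text and returns
    // the IDs each one selected, in the order one1..one4.
    internal static List<int[]> SelectedIds(List<Product> products) {
        // Query syntax.
        var one1 = from p in products
                   where p.ID == 1
                   select p;
        // Method syntax with an implicitly typed lambda.
        var one2 = products.Where(p => p.ID == 1);
        // C# 2.0 anonymous delegate.
        var one3 = products.Where(delegate(Product p) { return p.ID == 1; });
        // Lambda with an explicitly typed parameter.
        var one4 = products.Where((Product p) => p.ID == 1);

        var ids = new List<int[]>();
        foreach (IEnumerable<Product> query in new[] { one1, one2, one3, one4 }) {
            ids.Add(query.Select(p => p.ID).ToArray());
        }
        return ids;
    }

    private static void Main() {
        var products = new List<Product>();
        products.Add(new Product() { ID = 1 });
        products.Add(new Product() { ID = 5 });

        foreach (int[] ids in SelectedIds(products)) {
            Console.WriteLine("{0} match(es), first ID = {1}", ids.Length, ids[0]);
        }
    }
}
```

All four queries have the same static type, IEnumerable<Product>, which is why they can be put into a single array here.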
It is relatively easy to conclude from this that the syntax for lambdas with no parameters is likely:
() => /* some criteria; note that the parentheses are required */
This is probably obvious to many developers (especially those who read the help files instead of deducing things like this), but I mention it because I've seen many LINQ articles that use anonymous delegates as if the author did not know that lambdas can have no parameters too. For completeness, here is what an anonymous delegate with no parameters looks like. Leaving out the parameter list altogether, as below, is also accepted for delegate types that do take parameters; write delegate() if you want to state explicitly that there are none:
delegate { /* statements */ }
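A small sketch of both parameterless forms side by side, assuming Func<int> as the target delegate type (the original does not say which delegate the articles in question used). The class and method names are hypothetical.

```csharp
using System;

internal static class NoParameters {
    // Parameterless lambda: the empty parentheses are mandatory.
    internal static Func<int> FromLambda() {
        return () => 42;
    }

    // Anonymous delegate with the parameter list omitted.
    internal static Func<int> FromAnonymousDelegate() {
        return delegate { return 42; };
    }

    // The same omitted-list form also converts to a delegate type that takes
    // a parameter; the argument is simply not accessible inside the body.
    internal static Func<string, int> IgnoringArgument() {
        return delegate { return 42; };
    }

    private static void Main() {
        Console.WriteLine(FromLambda()());
        Console.WriteLine(FromAnonymousDelegate()());
        Console.WriteLine(IgnoringArgument()("unused"));
    }
}
```

The lambda version has no equivalent of the third method: a lambda always has to list its parameters, even if it ignores them.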
The age of multi-core processors is here. Not because multi-core processors have only just appeared, but because most consumers today will get one without even knowing it, just as most users today own a 64-bit processor but neither know nor care, because their OS is 32-bit.
We developers do have to care that multi-core is becoming mainstream, because it is the only cheap way to scale. If your code is CPU bound, you should try your best to distribute the work over as many cores as the machine has. This is not easy to get right, and a lot of the work will be done in libraries anyway, so why not create something all of us can use?
Microsoft is working on exactly this. The December CTP of Parallel Extensions is here and it’s looking really good. It’s just a library that you add to your project, and all of a sudden there are several ways in which you can trivially parallelize your code.
The emphasis is on trivially: most experienced developers could do the same (or nearly the same) kind of work given enough time, but why reinvent the wheel? Parallelism is here to stay, and having a library for it in the .NET Framework is the way to go.
But all this theorizing will not convince you, so I’ll give you a concrete example. Let’s say you have a server with lots of data (it doesn’t matter what kind). This data is served to clients over a slow network, so you’d like to transmit as little as possible. You decide to compute a hash of the data: when a client connects and sends you the hash of its copy, and it matches yours, you don’t send anything because the data is the same.
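The scheme can be sketched in a few lines. This is only an illustration under my own assumptions: the class HashCheck and its methods are hypothetical, MD5 is picked arbitrarily, and the network part is left out entirely (Respond just returns null when there is nothing to send).

```csharp
using System;
using System.Security.Cryptography;

internal static class HashCheck {
    // Hypothetical helper: hashes a whole buffer with MD5.
    internal static byte[] ComputeHash(byte[] data) {
        using (var md5 = MD5.Create()) {
            return md5.ComputeHash(data);
        }
    }

    // Byte-by-byte comparison of two hashes.
    internal static bool SameHash(byte[] a, byte[] b) {
        if (a == null || b == null || a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; ++i) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Returns null when the client's copy is current, otherwise the data to send.
    internal static byte[] Respond(byte[] serverData, byte[] clientHash) {
        return SameHash(ComputeHash(serverData), clientHash) ? null : serverData;
    }

    private static void Main() {
        byte[] serverData = { 1, 2, 3, 4 };
        byte[] upToDate = ComputeHash(new byte[] { 1, 2, 3, 4 });
        byte[] stale = ComputeHash(new byte[] { 9, 9, 9, 9 });

        Console.WriteLine(Respond(serverData, upToDate) == null ? "nothing to send" : "send data");
        Console.WriteLine(Respond(serverData, stale) == null ? "nothing to send" : "send data");
    }
}
```

The server has to hash its data to answer each such request (or keep the hash cached), which is why the speed of hashing matters.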
Computing a hash is not cheap, though. Let’s have a look at how fast it is on a modern 64-bit Core 2 Duo running at 2.8GHz with plenty of DDR2 RAM running at 800MHz. The benchmark (whose source follows shortly) hashes data held in memory:
Generating 512 MB of random data...
....done.
Calculating hashes sequentially
MD5 hashes 371,82 MB/sec
MD5 hashes 379,82 MB/sec
SHA1 hashes 341,56 MB/sec
SHA256 hashes 68,31 MB/sec
SHA384 hashes 104,79 MB/sec
SHA512 hashes 105,11 MB/sec
The fastest one (MD5) crunches about 380MB/s. Since the theoretical memory throughput is about 6400MB/s, this code is CPU bound: with a faster processor we’d be able to hash more data. Using a simpler hash algorithm would also help, but let’s assume you only want to use what’s already available in the .NET BCL and you don’t care whether the hash is cryptographically secure.
Let’s first take a look at the code:
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Threading;

namespace HashSpeedTest {
    class Program {
        private const int MB = 1024 * 1024;
        private static int _sampleSize;
        private static bool _parallelExecution;
        private static readonly Stopwatch _stopWatch = new Stopwatch();

        static void Main(string[] args) {
            if (args.Length < 2 || !int.TryParse(args[0], out _sampleSize)
                || (args[1] != "par" && args[1] != "seq")) {
                Console.WriteLine("Usage: HashSpeedTest <data_size> <style>");
                Console.WriteLine("<data_size>: size of test data in megabytes (32, 64 or more)");
                Console.WriteLine("<style>: 'par' for parallel and 'seq' for sequential execution (no quotes)");
                return;
            }
            _sampleSize *= MB; // megabytes to bytes
            _parallelExecution = args[1] == "par";

            byte[] data = new byte[_sampleSize];
            Console.WriteLine("Generating {0} MB of random data...", _sampleSize / MB);
            new RNGCryptoServiceProvider().GetBytes(data);
            Console.WriteLine("...done.");

            Console.WriteLine("Calculating hashes {0}", _parallelExecution ? "in parallel" : "sequentially");
            // MD5 is listed twice on purpose: the first run serves as a warm-up.
            string[] algorithms = new string[] { "MD5", "MD5", "SHA1", "SHA256", "SHA384", "SHA512" };
            foreach (var algo in algorithms) {
                Measure(data, algo);
            }
        }

        // Hashes the data one 1 MB block at a time and prints the throughput.
        private static void Measure(byte[] data, string algo) {
            _stopWatch.Reset();
            int iterations = _sampleSize / MB;
            _stopWatch.Start();
            if (_parallelExecution) {
                Parallel.For(0, iterations, i => {
                    // One hasher per block, so no state is shared between threads.
                    var hash = HashAlgorithm.Create(algo);
                    byte[] result = hash.ComputeHash(data, i * MB, MB);
                });
            }
            else {
                var hash = HashAlgorithm.Create(algo);
                for (int i = 0; i < iterations; ++i) {
                    byte[] result = hash.ComputeHash(data, i * MB, MB);
                }
            }
            _stopWatch.Stop();
            // Each iteration hashes 1 MB, so 1000 / (milliseconds per iteration) is MB/sec.
            double speed = 1000.0 / (_stopWatch.ElapsedMilliseconds / (double)iterations);
            Console.WriteLine("{0} hashes {1:f2} MB/sec", algo, speed);
        }
    }
}
The main work is done in the Measure method. We use the Stopwatch class to measure elapsed time, hash the data one 1MB block at a time, and then calculate the hashing speed in MB/s. The Main method just parses the parameters, displays usage, generates the test data and prepares the list of hash algorithm names.
Now look at the if (_parallelExecution) branch: it’s almost identical to the non-parallel version. The only differences are the use of Parallel.For instead of the for keyword and the fact that we create a new hasher object for each hash computation. Since some of the calculations will run in parallel, we don’t want to worry about the shared state of the hash algorithm.
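Creating a hasher per block is the simplest way to avoid shared state. A possible refinement is one hasher per worker thread. The sketch below assumes the Parallel.For overload with thread-local state from the released .NET Framework 4 (namespace System.Threading.Tasks), not the CTP used in this post; ParallelHashing and HashBlocks are hypothetical names, and MD5 is hard-coded for brevity.

```csharp
using System;
using System.Security.Cryptography;
using System.Threading.Tasks;

internal static class ParallelHashing {
    private const int MB = 1024 * 1024;

    // Hashes data in blocks of blockSize bytes and returns one hash per block.
    // Trailing bytes that do not fill a whole block are ignored.
    internal static byte[][] HashBlocks(byte[] data, int blockSize) {
        int blocks = data.Length / blockSize;
        var hashes = new byte[blocks][];
        Parallel.For(0, blocks,
            () => MD5.Create(),              // runs once per worker: create its hasher
            (i, state, hasher) => {
                // Each iteration writes to its own slot, so no locking is needed.
                hashes[i] = hasher.ComputeHash(data, i * blockSize, blockSize);
                return hasher;               // hand the hasher to the next iteration
            },
            hasher => hasher.Dispose());     // runs once per worker: clean up
        return hashes;
    }

    private static void Main() {
        var data = new byte[8 * MB];
        new Random(1).NextBytes(data);
        byte[][] hashes = HashBlocks(data, MB);
        Console.WriteLine("Hashed {0} blocks", hashes.Length);
    }
}
```

Whether this is measurably faster than one hasher per block depends on how expensive HashAlgorithm creation is compared to hashing 1MB; I have not benchmarked it.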
What do we gain? Let’s see the results of the parallel run:
Generating 512 MB of random data...
....done.
Calculating hashes in parallel
MD5 hashes 660,65 MB/sec
MD5 hashes 719,10 MB/sec
SHA1 hashes 650,57 MB/sec
SHA256 hashes 132,47 MB/sec
SHA384 hashes 203,26 MB/sec
SHA512 hashes 203,42 MB/sec
Just as expected, we get about 190% of single-core performance when we use both cores, an excellent result! Even better, we almost didn’t have to do anything. But the real gain is this: put this code on a 4-core machine and it should be almost twice as fast again. We get scalability for free; as you add cores, they get used. Note: you might have noticed that MD5 is calculated twice in the example. That’s because there seems to be a small performance penalty while the Parallel Extensions library is initializing, so just discard the first result and look at the second.
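To see where the 190% comes from, divide each parallel figure by its sequential counterpart. The snippet below does that with the numbers from the two runs above, using the second MD5 result from each run; the Speedup class is a hypothetical helper, not part of the benchmark.

```csharp
using System;

internal static class Speedup {
    // Ratio of parallel to sequential throughput.
    internal static double Ratio(double sequentialMBps, double parallelMBps) {
        return parallelMBps / sequentialMBps;
    }

    private static void Main() {
        string[] names = { "MD5", "SHA1", "SHA256", "SHA384", "SHA512" };
        // Throughput in MB/sec, copied from the benchmark output above.
        double[] sequential = { 379.82, 341.56, 68.31, 104.79, 105.11 };
        double[] parallel = { 719.10, 650.57, 132.47, 203.26, 203.42 };

        for (int i = 0; i < names.Length; ++i) {
            Console.WriteLine("{0}: {1:f2}x", names[i], Ratio(sequential[i], parallel[i]));
        }
    }
}
```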
This isn’t a silver bullet for multi-threaded programming, though, as it mostly solves CPU-bound problems. If you are I/O bound you’ll need a different set of techniques. But this is a great step in the right direction, and it’s still just a library. Go get it and try it out yourself.