When writing multi-threaded applications, one of the most common problems experienced are race conditions.
My question to the community, is:
What is a race condition? How do you detect them? How do you handle them? And finally, how do you prevent them from occurring?
-
There is a great chapter in the Secure Programming for Linux HOWTO that describes what they are, and how to avoid them.
From Craig H -
A race condition is a kind of bug, that happen only with certain temporal conditions.
Example: Imagine you have two threads A and B.
In Thread A:
if( object.a != 0 ) object.avg = total / object.aIn Thread B:
object.a = 0If thread A is preempted just after having check that object.a is not null, B will do a = 0, and when thread A will gain the processor, it will do a "divide by zero". This bug only happen when thread A is preempted just after the if statement, it's very rare, but it can happen.
From Steve Gury -
A race condition occurs when 2 or more threads are able to access shared data and they try to change it at the same time. Because the thread scheduling algorithm can swap between threads at any point, you don't know the order at which the threads will attempt to access the shared data. Therefore, the result of the change in data is dependent on the thread scheduling algorithm, i.e. both threads are 'racing' to access/change the data.
Often problems occur when one thread does a "check-then-act" (e.g. "check" if the value is X, and then "act" to do something that depends on the value being X) and another thread does something to the value in between the "check" and the "act".
if(x == 5) //The 'Check' { y = x * 2; //The 'Act' //If x is changed by another thread in between the if(x==5) and the "y=x*5", y will not be equal to 10. }The point being, y could be 10, or it could be anything, depending on whether another thread changed x in between the check and act. You have no real way of knowing.
In order to prevent race conditions occuring, typically you would put a lock around the shared data to ensure that only one thread can access the data at a time. This would mean something like this:
//Obtain lock for x if(x == 5) { y = x * 2; //Now, nothing can change x until the lock is released. Therefore y = 10 } //release lock for xBrian Ortiz : What does the other thread do when it encounters the lock? Does it wait? Error?Lehane : Yes, the other thread will have to wait until the lock is released before it can proceed. This makes it very important that the lock is released by the holding thread when it is finished with it. If it never releases it, then the other thread will wait indefinitely.From Lehane -
A race condition is a situation on concurrent programming where two concurrent threads or processes and the resulting final state depends on who gets the resource first.
From Jorge Córdoba -
Race conditions occur in multi-threaded applications or multi-process systems. A race condition, at its most basic, is anything that makes the assumption that two things not in the same thread or process will happen in a particular order, without taking steps to ensure that they do. This happens commonly when two threads are passing messages by setting and checking member variables of a class both can access. There's almost always a race condition when one thread calls sleep to give another thread time to finish a task (unless that sleep is in a loop, with some checking mechanism).
Tools for preventing race conditions are dependent on the language and OS, but some comon ones are mutexes, critical sections, and signals. Mutexes are good when you want to make sure you're the only one doing something. Signals are good when you want to make sure someone else has finished doing something. Minimizing shared resources can also help prevent unexpected behaviors
Detecting race conditions can be difficult, but there are a couple signs. Code which relies heavily on sleeps is prone to race conditions, so first check for calls to sleep in the affected code. Adding particularly long sleeps can also be used for debugging to try and force a particular order of events. This can be useful for reproducing the behavior, seeing if you can make it disappear by changing the timing of things, and for testing solutions put in place. The sleeps should be removed after debugging.
The signature sign that one has a race condition though, is if there's an issue that only occurs intermittently on some machines. Common bugs would be crashes and deadlocks. With logging, you should be able to find the affected area and work back from there.
From tsellon -
A sort-of-canonical definition is "when two threads access the same location in memory at the same time, and at least one of the accesses is a write." In the situation the "reader" thread may get the old value or the new value, depending on which thread "wins the race." This is not always a bug—in fact, some really hairy low-level algorithms do this on purpose—but it should generally be avoided. @Steve Gury give's a good example of when it might be a problem.
From Chris Conway -
A "race condition" exists when multithreaded (or otherwise parallel) code that would access a shared resource could do so in such as way as to cause unexpected results.
Take this example:
for ( int i = 0; i < 10000000; i++ ) { x = x + 1; }If you had 5 threads executing this code at once, the value of x WOULD NOT end up being 50,000,000. It would in fact vary with each run.
This is because, in order for each thread to increment the value of x, they have to do the following: (simplified, obviously)
Retrieve the value of x Add 1 to the value of x Store the value of x
Any thread can be at any step in this process at any time, and they can step on each other when a shared resource is involved. The state of x can be changed by another thread during the time between x is being read and when it is written back.
Let's say a thread retrieves the value of x, but hasn't stored it yet. Another thread can also retrieve the same value of x (because no thread has changed it yet) and then they would both be storing the same value (x+1) back in x!
Example:
Thread 1: reads x, value is 7 Thread 1: add 1 to x, value is now 8 Thread 2: reads x, value is 7 Thread 1: stores 8 in x Thread 2: adds 1 to x, value is now 8 Thread 2: stores 8 in x
Race conditions can be avoided by employing some sort of locking mechanism before the code that accesses the shared resource:
for ( int i = 0; i < 10000000; i++ ) { //lock x x = x + 1; //unlock x }Here, the answer comes out as 50,000,000 every time.
For more on locking, search for: mutex, semaphore, critical section, shared resource.
jakobengblom2 : See http://jakob.engbloms.se/archives/65 for an example of a program to test how oiften such things go bad... it really depends on the memory model of the machine you are running on.Philip Morton : Great explanation, thanks.From privatehuff
0 comments:
Post a Comment