Some changes to address blocking after a node is lost#2
Some changes to address blocking after a node is lost#2sveesible wants to merge 2 commits intocouchbase:masterfrom sveesible:master
Conversation
…erations. closing channel.socket so as not to orphan the socket. Using an iterator in handleIO instead of a foreach loop which isn't thread safe.
There was a problem hiding this comment.
this seems like the big one in my testing that clogs up the mechanism when a server node is lost suddenly
There was a problem hiding this comment.
Please see this change set where the bug was introduced last July
00f2e78
|
Hi, first, thanks much for your effort into fixing bugs here! Looking at your PR here, I identified two things that look indeed wrong:
For your other code, I added some additional comments that need a bit of clarification. Note that we do the actual code review through gerrit, github is just a mirror here. You can either setup a gerrit account and work through the process, or I'll do it for you and attribute you in the commit message, whatever you like more. Cheers, |
|
I don't see where to get setup with gerrit, I suppose I'd defer to you |
|
@sveesible it is described here, but seems legit that I'll do it, its quicker I guess. http://www.couchbase.com/wiki/display/couchbase/Contributing+Changes |
This should not be defaulted to 1 as this value must be 0 for the isActive() call to return true, and thus allow redistributed ops to be put into this nodes operation queue
BinaryOperationFactory was improperly setting a null get callback for the gets callback on clone of get operations, causing bad behavior under load with redistribution of bulkgets
Redistribution of write operations was just popping the operation back onto the original node and then continuing through the clone logic.
closing channel.socket to be consistent with other close logic
Using an iterator in handleIO for Selector.selectedKeys instead of a foreach loop because foreach changes stuff in the list while it loops