Hi all!
The biggest issue we have found in our current NServiceBus project is that sometimes the second message arriving at a saga was not correlated and ended up in an IHandleSagaNotFound handler.
After some despair and trying a lot of things, I asked the people at Particular for support. The great Szymon Pobiega pointed out that inserts in Oracle don't block reads, so if the DTC commit is fast in MSMQ but very slow in the database, the reply message from another handler to the saga can arrive BEFORE the saga is written into its table, and it goes to the saga-not-found handler.
(Commit to database takes longer than the Reply to be processed.)
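To make the ordering problem concrete, here is a tiny toy model in Python. It is not NServiceBus code: `committed_sagas`, `handle_reply` and `run` are hypothetical names, and the "database" is just a dictionary that only contains committed rows.

```python
# Toy model of the race, NOT NServiceBus code. All names are hypothetical.
committed_sagas = {}  # rows that other transactions are able to read


def handle_reply(conversation_id):
    """Second message: look up the saga. Uncommitted rows are invisible."""
    if conversation_id in committed_sagas:
        return "handled by saga"
    return "saga not found"


def run(reply_arrives_before_db_commit):
    """Process the reply either before or after the saga row is committed."""
    committed_sagas.clear()
    # The first message inserted the saga row, but the commit is still pending.
    pending_row = {"conv-1": {"state": "started"}}
    if reply_arrives_before_db_commit:
        outcome = handle_reply("conv-1")     # reply is processed first
        committed_sagas.update(pending_row)  # the commit lands too late
    else:
        committed_sagas.update(pending_row)  # the commit lands first
        outcome = handle_reply("conv-1")
    return outcome


print(run(reply_arrives_before_db_commit=True))
print(run(reply_arrives_before_db_commit=False))
```

Only the relative order of "commit visible" and "reply processed" changes between the two runs, which is why the problem shows up only sometimes.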
So we had a few options here:
- Throw an exception in the saga not found handler.
- Put a timeout for retrying the first message.
- Use the Outbox.
- Block the saga row until it is committed in the database.
I chose the fourth option as it is very developer friendly: no one has to be aware of it. My second choice would be the Outbox, but we haven't tested it much.
Here is the code. It uses the NServiceBus NHibernate infrastructure, with a fake saga to create a table for pessimistic locking. The whole trick is done in a pipeline step: it blocks any saga message that arrives for a ConversationId while another message with the same ConversationId is being processed by the saga. This way, the messages of a conversation are processed one by one by the saga, never in parallel.
https://github.com/pablocastilla/SagaMissingMessages
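The idea of processing the messages of one ConversationId strictly one at a time can be sketched like this. This is a Python toy, not the code in the repository: an in-process `threading.Lock` per conversation stands in for the pessimistic lock on the database row, and `ConversationLocks`, `process`, `handle` and `saga_store` are hypothetical names.

```python
import threading
import time
from collections import defaultdict


class ConversationLocks:
    """One lock per ConversationId (stand-in for a locked database row)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, conversation_id):
        with self._guard:  # protect the dictionary itself
            return self._locks[conversation_id]


locks = ConversationLocks()
saga_store = {}                    # committed saga state
log = []
start_running = threading.Event()  # set once the first handler holds the lock


def handle(conversation_id, message):
    if message == "start":
        start_running.set()
        time.sleep(0.05)           # simulate a slow database commit
        saga_store[conversation_id] = "started"
        log.append("start committed")
    else:
        found = conversation_id in saga_store
        log.append("reply: saga found" if found else "reply: saga not found")


def process(conversation_id, message):
    # The "pipeline step": wait until no other message of this conversation runs.
    with locks.lock_for(conversation_id):
        handle(conversation_id, message)


first = threading.Thread(target=process, args=("conv-1", "start"))
first.start()
start_running.wait()               # the reply arrives while the commit is pending
second = threading.Thread(target=process, args=("conv-1", "reply"))
second.start()
first.join()
second.join()
print(log)
```

Messages with a different ConversationId take a different lock, so they are not delayed by this conversation.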
In the final code we create different ConversationIds if a conversation creates a lot of sagas. This helps the program run more smoothly, since fewer messages wait on the same lock.
Hope this helps