Showing posts with label Code. Show all posts

Tuesday, July 05, 2011

ASP.NET Deadlock on WCF service hosted in IIS

One of our most important core business services is a WCF SOAP web service for selling our products to other companies. Because this service receives a lot of requests, we run multiple instances of it on several IIS servers, all behind a load balancer.

Problem begins:

Our service was running happily in production for quite a long time.
Then one day, our monitoring team noticed that all instances had gone red in the load balancer health monitor.

EMERGENCY!!!

We immediately started a conference call with all the needed people, and as a first action we decided to recycle some instances and see if they came up OK.

They did. We were back online.

Investigation:

When we looked at the Event Viewer, we saw that IIS itself had started to recycle some of the instances we had not touched, because of the following error:

ISAPI 'c:\windows\microsoft.net\framework\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.

This error pointed us to this article:

Contention, poor performance, and deadlocks when you make Web service requests from ASP.NET applications

In short, the article explains that this error can occur when the process hits the limit on the number of threads .NET allows to run simultaneously: all the allowed threads are in use, and each of them is waiting for something that itself needs a thread.

So we started to look for such a scenario in our code. We could not find this exact pattern, but we thought that perhaps one of our internal services had been blocked, leaving all those threads waiting.

Another option we considered was a DoS attack, but we had not seen any increase in calls per second.

Anyway, we decided to follow Microsoft's recommendations for the runtime configuration settings, such as:

  • maxWorkerThreads
  • maxIoThreads
  • minWorkerThreads
  • minFreeThreads
  • minLocalRequestFreeThreads

And also the WCF throttling settings:

  • MaxConcurrentCalls
  • MaxConcurrentInstances
  • MaxConcurrentSessions

It didn't help: we got the same crash scenario a few days later.

Investigating Dumps

We decided to follow the instructions in “How to generate a dump file when ASP.NET deadlocks in IIS 6.0”, and created a dump file the next time a deadlock happened.

The dump file did not show any exception at the time of the deadlock. However, the thread counts gave us the first clue:

  • Hundreds of ASP.NET HTTP Request threads (System.ServiceModel.Activation.HttpModule.ProcessRequest)
  • Zero WCF executing threads
    (System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin)

Logs to the rescue

We collected the logs from IIS, the Event Viewer, the load balancer, the data warehouse DB, and the application itself. The picture we saw was:

19:52:51 - Last successful WCF request (IIS log)
19:53:49 - 1 Connection_Dropped (HTTPERR log)
19:57:45 - Last successful .aspx page request (IIS log)
20:00:24 - Deadlock detected (W3SVC-WP, Event Viewer)
20:03:35 - 164 Connection_Abandoned_By_AppPool: mostly WCF requests, a few .aspx pages (HTTPERR log)

This gives the same picture as the dump file:

For almost 5 minutes, .aspx pages worked perfectly, but WCF requests were not returned to the clients. Instead, they were stuck in the ASP.NET threads, as there were no WCF threads to handle them.
Those WCF requests ate up all the available threads in the thread pool. Once none were left, and the configured ResponseDeadlockInterval had passed with no request answered, a deadlock was declared.

Once the process was recycled, IIS abandoned all those queued requests.
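Merging the entries from the different logs into one ordered timeline is what made the gaps visible. As a small worked example (Python used purely as a calculator here; the helper names are made up for this sketch), the intervals can be computed from the times listed above:

```python
from datetime import datetime

# Events copied from the timeline above: (time, source log, description).
EVENTS = [
    ("20:00:24", "Event Viewer", "Deadlock detected"),
    ("19:52:51", "IIS", "Last successful WCF request"),
    ("19:57:45", "IIS", "Last successful .aspx request"),
    ("19:53:49", "HTTPERR", "Connection_Dropped"),
    ("20:03:35", "HTTPERR", "Connection_Abandoned_By_AppPool x164"),
]


def parse(clock: str) -> datetime:
    # All events happened on the same evening, so the date part is irrelevant.
    return datetime.strptime(clock, "%H:%M:%S")


def seconds_between(first: str, second: str) -> int:
    return int((parse(second) - parse(first)).total_seconds())


# Sort entries from all sources into a single timeline.
timeline = sorted(EVENTS, key=lambda event: parse(event[0]))
for clock, source, description in timeline:
    print(f"{clock}  {source:<12}  {description}")

wcf_to_aspx = seconds_between("19:52:51", "19:57:45")
wcf_to_deadlock = seconds_between("19:52:51", "20:00:24")
print(f"WCF silent while .aspx was still served: {wcf_to_aspx} s")
print(f"WCF silent until the deadlock was declared: {wcf_to_deadlock} s")
```

The first interval is 294 seconds (4 minutes 54 seconds, the "almost 5 minutes" mentioned above), and the second is 453 seconds.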

What happened to the WCF?

As you can see, the dump we used was created far too late to reveal the cause of the WCF problem.
But here the logs came to the rescue again.

In the IIS logs we identified a unique User-Agent that showed up each time just before the services crashed. We followed this client and, using a network monitor (Wireshark), finally arrived at the source of the problem:

This client application had sent us requests with a SOAPAction that did not match any operation in the WCF service.
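The search for that User-Agent can be sketched roughly as follows. This is not the script we used; it is a minimal Python illustration in which the function name, the 10-minute window, and all the sample data (dates, times, agent strings) are invented.

```python
from datetime import datetime, timedelta


def suspects(requests, crashes, window=timedelta(minutes=10)):
    """Return User-Agents seen shortly before every crash and at no other time.

    requests: list of (timestamp, user_agent) pairs taken from the web server log.
    crashes:  list of crash timestamps.
    """
    def before_crash(stamp):
        return any(crash - window <= stamp <= crash for crash in crashes)

    per_crash = [
        {agent for stamp, agent in requests if crash - window <= stamp <= crash}
        for crash in crashes
    ]
    in_every_window = set.intersection(*per_crash) if per_crash else set()
    seen_elsewhere = {agent for stamp, agent in requests if not before_crash(stamp)}
    return in_every_window - seen_elsewhere


def at(clock):
    # Hypothetical date; only the time of day matters for the example.
    return datetime.strptime("2011-07-01 " + clock, "%Y-%m-%d %H:%M")


sample_requests = [
    (at("09:00"), "Mozilla/5.0"),
    (at("10:55"), "Mozilla/5.0"),
    (at("10:58"), "PartnerClient/1.0"),
    (at("13:00"), "Mozilla/5.0"),
    (at("15:41"), "PartnerClient/1.0"),
    (at("15:44"), "Mozilla/5.0"),
]
sample_crashes = [at("11:00"), at("15:45")]

print(suspects(sample_requests, sample_crashes))
```

In the sample data, the browser agent appears before both crashes but also at quiet times, so only the made-up "PartnerClient/1.0" remains as a suspect. A real client may also send valid requests at other times, so the "at no other time" filter is a simplification.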

The Bug

In our WCF service we have an attribute, “ErrorHandledWebService”, which inherits from “Attribute” and implements the “IServiceBehavior” and “IErrorHandler” interfaces. This attribute tells WCF to hook our error handling code into the WCF flow.

When an exception occurs during a WCF request, it triggers the “ProvideFault” method of the “IErrorHandler” interface, which we implement to write the original request to our log.

The problem starts when a request arrives that does not match any operation in the service. In that case WCF throws an exception.

In this case, OperationContext.Current is null.

Our code used OperationContext.Current.RequestContext.RequestMessage.State, which caused a NullReferenceException that was not handled by our code.
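The general lesson is that an error handler must not assume the context exists, and must never throw. The sketch below shows the idea in Python rather than in our actual C# code. The classes only mimic the shape of the WCF objects and are not the WCF API, and `describe_request` and `provide_fault` are illustrative names.

```python
class RequestContext:
    def __init__(self, request_message):
        self.request_message = request_message


class OperationContext:
    # Stand-in for the ambient WCF context; None mimics the unmatched-action case.
    current = None

    def __init__(self, request_context):
        self.request_context = request_context


def describe_request() -> str:
    """Return text for the log without assuming any context object exists."""
    context = OperationContext.current
    if context is None or context.request_context is None:
        return "<no operation context: request did not reach an operation>"
    return str(context.request_context.request_message)


def provide_fault(error: Exception, log: list) -> None:
    """Log the failing request; never let logging raise out of the handler."""
    try:
        log.append(f"{type(error).__name__}: {error} | request: {describe_request()}")
    except Exception as logging_error:  # last line of defence
        log.append(f"error handler failed: {logging_error!r}")


log = []
provide_fault(ValueError("no operation matches the SOAPAction"), log)
print(log[0])
```

The null check covers the case we actually hit, and the surrounding try/except is there for the cases nobody has thought of yet.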

Microsoft’s WCF Bug?

This unhandled exception inside ProvideFault can cause one of the following:

  1. The process terminates itself.
  2. The process stops responding to WCF requests.

Which of the two happens depends on the value of this parameter:

<legacyUnhandledExceptionPolicy enabled="true/false" />

If this flag is false, the process terminates because of the unhandled exception.
If it is true, WCF is dead, but the rest of the functionality (like .aspx pages) continues to work.

As long as the flag is set to false, Microsoft's behavior is just fine: the process crashes. :)

But if the flag is true (although I know Microsoft recommends running without the legacy mode), I think this is a bug in the framework. WCF shouldn't leave the process in an unstable state, or at the very least it should write a proper error message to the event log explaining that WCF is dead because of this.

What do you think?

Thursday, April 30, 2009

Real Incremental Build - Part 1 – Motivation for small deployment packages

You have deployed version x of your compiled application to your production environment.
Some bugs were resolved and you want to deploy a fix.

You have 2 options:

  1. Deploy a full updated build of the application.
  2. Deploy a differential package only.

Now, Option 1 will clearly work.
You run your build, take everything from the drop location, and deploy it.

Pros:

  • You're sure that you won't miss anything.
  • All files are compatible with each other.

Cons:

  • The package will be large (many MB, depending on the size of your application).

At first, the size of the patch looks unimportant.

But once you take into account the time it takes to upload it to your production environment and distribute it across hundreds of application instances, it would be unwise to ignore this factor. And if you let users download the patch from your website, you will probably want to minimize its size to save bandwidth.

In the next parts we will move from this general scenario to a more detailed one, based on Microsoft TFS (Team Foundation Server) source control and Team Build. We will try to show how to create a delta package containing only the new and updated binaries and files. We will meet some implementation problems, and hopefully overcome them.
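To give a feel for what "only the new and updated files" means, here is a naive sketch that compares two drop folders by content hash. It is written in Python for brevity and is not the TFS-based approach of the next parts; `file_digests` and `delta` are names invented for this example.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def delta(previous_drop: Path, new_drop: Path) -> list:
    """Return the relative paths that are new or changed in new_drop."""
    old = file_digests(previous_drop)
    new = file_digests(new_drop)
    return sorted(name for name, digest in new.items() if old.get(name) != digest)
```

Calling `delta(Path("drop_x"), Path("drop_x_plus_1"))` on two hypothetical drop folders would list the files to put into the patch. The sketch ignores deleted files, and it treats any binary whose bytes differ as changed, even if the build merely recompiled the same source.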

See part 2.

Thursday, December 18, 2008

.NET Framework 3.5 Family Update - My Peek inside the .NET 2 update

UPDATE:

KB 959209 has appeared! (It was published on December 20, 2008.)
This KB points you to 3 other KBs for the 3 parts of the Family Update:

  • 958481 - .NET 2.0 SP2 update
  • 958483 - .NET 3.0 SP2 update
  • 958484 - .NET 3.5 SP1 update

(Those docs are very similar to each other and describe the same issues.)
Better late than never.

ORIGINAL POST: 

As I stated in my previous entry, Microsoft has uploaded this Family Update, as usual without a document describing what went into it.

The only thing they said was:

The Microsoft .NET Framework 3.5 Family Update provides compatibility roll-up updates for customer reported issues found after the release of Microsoft .NET Framework 3.5 SP1

So I have decided to peek inside.

Notice:
I don't take any responsibility for the following information. I might have done something totally wrong in the way I extracted the update, and so got wrong information out of it. You take all the risk on yourself if you use this information.

Steps:

Now that we are done with the legal issues, here is what I did. First, I extracted the .msp file from within NDP20SP2-KB958481-x86.exe.

Then, using the MSIX tool, I extracted the files from the .msp file.

Then I extracted the patch.cab file.

The next step was renaming the dlls I found so that they had the correct extension (which was missing).

Next, for each managed dll, I used Reflector to disassemble it and export the source files.
I did the same for the original dlls from the .NET GAC.

Finally, I used Araxis Merge to compare the sources.

Please remember that I have done the investigation only for the .NET 2.0 SP2 Family update.

Unmanaged files in the update (I can't say what was changed there):

  • aspnet_wp.exe
  • mscordacwks.dll
  • mscorjit.dll
  • webengine.dll

Managed dlls:

  • mscorlib.dll
  • System.Data.OracleClient
  • System.Xml
  • System.Web

Main changes in those Managed dlls:

(I will just list the changes in short. I don't have more information to share, as that would require days of reading Microsoft's code.)

System.Web:

  • Better handling of the IIS Rewrite Module in multiple classes (cookieless security, IIS7 worker request, HTML Form, Client Path)
  • Process Custom Browser Files: ignores hidden files in the directory (at least this is what it looks like)
  • Something in state cache handling.
  • A change in sorting entries in the control builder.

System.Xml

  • An XML CanWrite check is used instead of GetSetMethod in one method.

System.Data.OracleClient

  • RollbackDeadTransaction and some more transaction code were added to a method.

mscorlib

  • Better handling of the Delegate & MulticastDelegate Combine null issue

The version changed from 2.0.50727.3053 to 2.0.50727.3082.

Wow, this was long.
Hope you enjoyed it.

I am still waiting for the missing KB doc from Microsoft...

If it doesn't come soon, you might find the same analysis of the 2 other updates here (I don't promise).

Wednesday, November 28, 2007

Debt Payment (Build Failure Email Alert)

Almost a year ago I told you about the Build Failure Alert system which I had built:
Team Build - Failure Email Alert - Make it Actually work for you

I promised some code samples.

So here it is:

Build Failure Email - Code Sample