Improving BizTalk Server throughput and identifying bottlenecks

Recently one of our BizTalk 2009 systems started to suffer in terms of performance due to a sudden influx of extra messages being fed into the system. My team was given the task of identifying the bottleneck and providing recommendations to improve performance.

Curious, I jumped onto the BizTalk server and opened up Performance Monitor to observe a few counters. I was interested in the following performance counter:

Message publishing delay (ms) under the category BizTalk:Message Agent

Our observation was that this delay kept increasing over time, which led us to believe that this “might” be due to BizTalk throttling. Throttling is a mechanism that controls the flow of the workload associated with a host instance.
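If you prefer to watch this counter from a script rather than the Performance Monitor UI, here is a minimal sketch using the pywin32 PDH bindings. It assumes pywin32 is installed and that the host instance is called BizTalkServerApplication (a hypothetical name; substitute your own host):

```python
import time
import win32pdh  # pip install pywin32

HOST_INSTANCE = "BizTalkServerApplication"  # hypothetical host instance name

# Counter path: \BizTalk:Message Agent(<host instance>)\Message publishing delay (ms)
path = win32pdh.MakeCounterPath(
    (None, "BizTalk:Message Agent", HOST_INSTANCE, None, -1,
     "Message publishing delay (ms)")
)

query = win32pdh.OpenQuery()
counter = win32pdh.AddCounter(query, path)
win32pdh.CollectQueryData(query)  # prime the query

# Sample once a second; a value that climbs steadily is the symptom we saw.
for _ in range(30):
    time.sleep(1)
    win32pdh.CollectQueryData(query)
    _, delay = win32pdh.GetFormattedCounterValue(counter, win32pdh.PDH_FMT_LONG)
    print(f"publishing delay: {delay} ms")

win32pdh.CloseQuery(query)
```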

The next performance counter to confirm our suspicions was the following:

Message publishing throttling state under the category BizTalk:Message Agent

This counter tells us whether throttling is occurring and indicates the likely cause based on the value in the graph.

The average value was 6.

Going back to the MSDN documentation (found here), this is what the value 6 represents:

A flag indicating whether the system is throttling message publishing (affecting XLANG message processing and inbound transports).
0: Not throttling
2: Throttling due to imbalanced message publishing rate (input rate exceeds output rate)
4: Throttling due to process memory pressure
5: Throttling due to system memory pressure
6: Throttling due to database growth
8: Throttling due to high session count
9: Throttling due to high thread count
11: Throttling due to user override on publishing 

Possible reasons for this condition include:
A. The SQL Agent jobs used by BizTalk Server to maintain the BizTalk Server databases are not running or are running slowly.
B. Down-stream components are not processing messages from the in-memory queue in a timely manner.
C. Number of suspended messages is high.
D. Maximum sustainable load for the system has been reached.
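To keep an eye on this state over time, you can poll the same counter and decode it against the table above. Again a minimal sketch with pywin32, where the host instance name is a hypothetical placeholder:

```python
import time
import win32pdh  # pip install pywin32

HOST_INSTANCE = "BizTalkServerApplication"  # hypothetical host instance name

# Publishing throttling states, taken from the MSDN table above.
PUBLISHING_STATES = {
    0: "Not throttling",
    2: "Imbalanced message publishing rate",
    4: "Process memory pressure",
    5: "System memory pressure",
    6: "Database growth",
    8: "High session count",
    9: "High thread count",
    11: "User override on publishing",
}

path = win32pdh.MakeCounterPath(
    (None, "BizTalk:Message Agent", HOST_INSTANCE, None, -1,
     "Message publishing throttling state")
)
query = win32pdh.OpenQuery()
counter = win32pdh.AddCounter(query, path)

# Poll every few seconds and report the decoded state.
for _ in range(12):
    win32pdh.CollectQueryData(query)
    _, state = win32pdh.GetFormattedCounterValue(counter, win32pdh.PDH_FMT_LONG)
    print(f"throttling state {state}: {PUBLISHING_STATES.get(state, 'unknown')}")
    time.sleep(5)

win32pdh.CloseQuery(query)
```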

Through a process of elimination we were able to rule out A, C and D, which left B for further investigation. We ruled out A by checking the record counts in the Spool and TrackingData tables in the MessageBox database (see the sketch after the quote below). The number of records in these tables was less than 500,000, and mostly 0, which showed that BizTalk wasn't throttling due to tracked messages as defined under:

“The Message count in database setting also indirectly defines the threshold for a throttling condition based on the number of messages in the spool table or tracking table. If the number of messages in the spool table or tracking table exceeds 10 times this value then a throttling condition will be triggered. By default the Message count in database value is set to 50,000, which will cause a throttling condition if the spool table or the tracking table exceeds 500,000 messages.”
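Here is a minimal sketch of that row-count check, assuming pyodbc, Windows integrated security, and the default MessageBox database name BizTalkMsgBoxDb (the server name is a placeholder). It reads row counts from sys.partitions metadata rather than running COUNT(*) against busy tables:

```python
import pyodbc  # pip install pyodbc

# Assumes integrated security and the default MessageBox database name;
# YOURSQLSERVER is a placeholder for your SQL Server instance.
CONN_STR = (
    "DRIVER={SQL Server};"
    "SERVER=YOURSQLSERVER;"
    "DATABASE=BizTalkMsgBoxDb;"
    "Trusted_Connection=yes;"
)

# Row counts from metadata, to avoid scanning busy MessageBox tables.
QUERY = """
SELECT t.name, SUM(p.rows) AS row_count
FROM sys.tables t
JOIN sys.partitions p
  ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE t.name = 'Spool' OR t.name LIKE 'TrackingData%'
GROUP BY t.name
"""

THRESHOLD = 500_000  # 10 x the default 'Message count in database' (50,000)

with pyodbc.connect(CONN_STR) as conn:
    for name, rows in conn.cursor().execute(QUERY):
        marker = "  <-- above the throttling threshold" if rows > THRESHOLD else ""
        print(f"{name}: {rows}{marker}")
```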

We could eliminate C and D because the suspended messages queue was almost empty and the CPU and memory (physical and process) consumption on the server was negligible.
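For completeness, the suspended-instance count can also be pulled from BizTalk's WMI provider instead of the Administration Console. A small sketch, assuming the wmi package (built on pywin32) and run on the BizTalk server itself:

```python
import wmi  # pip install wmi

# Connect to the BizTalk WMI provider on the local server.
biztalk = wmi.WMI(namespace=r"root\MicrosoftBizTalkServer")

# ServiceStatus 4 = suspended (resumable), 32 = suspended (not resumable).
suspended = biztalk.query(
    "SELECT * FROM MSBTS_ServiceInstance "
    "WHERE ServiceStatus = 4 OR ServiceStatus = 32"
)
print(f"Suspended service instances: {len(suspended)}")
```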

It turned out to be the downstream system (a WCF service in our case) that was queueing requests and slowing things down, because too many requests were being sent to the service at once.

We then wanted to control the flow of messages going to the WCF port, and we could do that by checking “Ordered Delivery” in our custom WCF solicit-response send port. Once this was done, we saw a significant improvement in throughput and BizTalk wasn't throttling anymore.

In cases where the reason for throttling is the database, I would highly recommend reading the throttling whitepapers on MSDN before making any changes to the defaults or disabling throttling altogether.

Cheers!

Dipesh
