BizTalk Maintenance Jobs in SQL Server

I recently came across a situation where I had to identify bottlenecks on a BizTalk server which was performing like a snail. I have written a blog post about this here.

However, I also realised the importance of various BizTalk jobs in SQL Server during this process. I would like to share with you what I learnt in relation to this and how one should keep an eye on these sql jobs as they play a pivotal role in highly (& poorly) performing BizTalk setups.

There are about 12 sql jobs in BT 2009 and 13 jobs in BT 2010 which play their own little role when it comes to ensuring good BizTalk performance.

BizTalk Jobs

The DTA purge job is disabled by default and can be enabled if the tracking database gets too big.  The messagebox cleanup job is also disabled by default as it is triggered by the ManageRefCountLog job as needed.

It is very important to note that if these jobs dont run on a timely manner especially when the tracking and spool tables get bigger than the set throttling limit, BizTalk starts throttling. The moment BizTalk starts throttling, performance starts to degrade.

Lets look at what each job is meant to do (referenced from MS Support):

Job name Description Enabled by default
Backup BizTalk Server Consists of three steps. Step 1 performs full database backups of the BizTalk Server databases. Step 2 backs up the BizTalk Server database logs. Step 3 specifies for how long the backup history is kept. No
CleanupBTFExpiredEntriesJob_BizTalkMgmtDb Deletes expired BizTalk Framework entries from the BizTalk Management (BizTalkMgmtDb) database. Yes
DTA Purge and Archive Automates the archiving of tracked messages and the purging of the BizTalk Tracking database to maintain a healthy system and to keep the tracking data archived for future use.Note On BizTalk Server 2004, this job is created after you install BizTalk Server 2004 Service Pack 2. No
MessageBox_DeadProcesses_Cleanup_BizTalkMsgBoxDb Detects when a BizTalk Server host instance (BTSNTSvc.exe) has stopped responding. The job then releases the work from the host instance so a different host instance can finish the tasks. Yes
MessageBox_Message_Cleanup_BizTalkMsgBoxDb Removes all messages that are not referenced by any subscribers in the BizTalkMsgBoxDb database tables.Important This job is also started by the MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDbjob. Therefore, we recommend that you disable this job.Note On BizTalk Server 2004, this job is enabled by default. Therefore, we recommend that you disable this job. No
MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb Manages the reference count logs for messages and determines when a message is no longer referenced by a subscriber.This job runs in an infinite loop and deletes the entries from the two individual message reference count logs. This job also calls the MessageBox_Message_Cleanup_BizTalkMsgBoxDbjob.Important At first, the MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb job status icon displays a status of Success. However, there will be no corresponding success entry in the job history. If one of the jobs in the MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb job fails, a failure entry appears in the job history and the status icon displays a status of Failure. The job will always display a status of Failure after the first failure. To verify that the other BizTalk Server SQL Server Agent jobs run correctly, check the status of the other BizTalk Server SQL Server Agent jobs.

Note On BizTalk Server 2004, this job is created after you install BizTalk Server 2004 Service Pack 2.

MessageBox_Parts_Cleanup_BizTalkMsgBoxDb Removes all message parts that are no longer referenced by a message in the BizTalkMsgBoxDb database tables. All messages are composed of one or more message parts that contain the message data. Yes
MessageBox_UpdateStats_BizTalkMsgBoxDb Updates the statistics for the BizTalkMsgBoxDb database.Note This job does not exist on BizTalk Server 2004. Yes
Monitor BizTalk Server Scans for any known issues with the BizTalkMgmtDb, BizTalkMsgBoxDb, and BizTalkDTADb databases. This includes orphaned instances.Note This job is created on BizTalk Server 2010. Yes
Operations_OperateOnInstances_OnMaster_BizTalkMsgBoxDb Used for multiple BizTalkMsgBoxDb database deployment. It asynchronously performs operational actions. For example, it asynchronously performs bulk terminates on the master BizTalkMsgBoxDb database after those changes are applied to the subordinate BizTalkMsgBoxDb database. Note This job does not exist on BizTalk Server 2004.  Yes
PurgeSubscriptionsJob_BizTalkMsgBoxDb Purges unused subscription predicates from the BizTalkMsgBoxDb database. Yes
Rules_Database_Cleanup_BizTalkRuleEngineDb Purges old audit data from the Rule Engine (BizTalkRuleEngineDb) database every 90 days. This job also purges old history data (deploy/undeploy notifications) from the Rule Engine (BizTalkRuleEngineDb) database every 3 days.Note This job is created on BizTalk Server 2009. Yes
TrackedMessages_Copy_BizTalkMsgBoxDb Copies the message bodies of tracked messages from the BizTalkMsgBoxDb database to the Tracking (BizTalkDTADb) database. Yes
TrackingSpool_Cleanup_BizTalkMsgBoxDb Purges inactive tracking spool tables to free database space.Note This job exists only on BizTalk Server 2004.

Improving BizTalk Server throughput and identifying bottlenecks

Recently one of our BizTalk 2009 systems started to suffer in terms of perfomance due to the sudden influx of extra messages that were fed into the system. My team was given the task to identify the bottleneck and provide reccomendations to improve performance.

With curiousity, I jumped onto the BizTalk Server and opened up the performance monitor to observe a few counters. I was interested in the following performance counter:

Message publishing delay (ms) under the category BizTalk:MessageAgent

Our observation was this delay kept increasing over a period of time which led us to believe that this “might” be due to BizTalk throttling. Throttling is a mechanism to control the flow of the workload associated with a host instance.

The next performance counter to confirm our suspicions was the following:

Message Publishing Throttling State under category BizTalk:Message Agent

This counters tells us if throttling is occuring and the likely causes based on the value in the graph.

Average was 6.

Now going back to the msdn document (found here), this is what the value 6 represented:

A flag indicating whether the system is throttling message publishing (affecting XLANG message processing and inbound transports).
0: Not throttling
2: Throttling due to imbalanced message publishing rate (input rate exceeds output rate)
4: Throttling due to process memory pressure
5: Throttling due to system memory pressure
6: Throttling due to database growth
8: Throttling due to high session count
9: Throttling due to high thread count
11: Throttling due to user override on publishing 

Possible reasons for this condition include:
A. The SQL Agent jobs used by BizTalk Server to maintain the BizTalk Server databases not running or are running slowly.
B. Down-stream components are not processing messages from the in-memory queue in a timely manner.
C. Number of suspended messages is high.
D. Maximum sustainable load for the system has been reached.

After the process of elimination, we were able to eliminate A,C and D which left B for further investigation. We were able to eliminated A by checking the record counts in the Spool and TrackingData tables in the messagebox database. The number of records in these tables was less than 500,000 or mostly 0 which proved that BizTalk wasnt throttling due to tracked messages as defined under:

“The Message count in database setting also indirectly defines the threshold for a throttling condition based on the number of messages in the spool table or tracking table. If the number of messages in the spool table or tracking table exceeds 10 times this value then a throttling condition will be triggered. By default the Message count in database value is set to 50,000, which will cause a throttling condition if the spool table or the tracking table exceeds 500,000 messages.”

We could eliminate C and D because the suspended messages queue was almost empty and the CPU & memory (physical and process) consumption on the server was almost negligible.

Apparently, it was the downstream system (WCF service in our case) which was queueing requests and slowing things down due to too many requests to the service.

We then wanted to control the flow of messages going to the WCF port and we could do that by checking “Ordered Delivery” in our custom WCF solicit-response sendport. Once this was done, we saw a significant improvement in the throughput and BizTalk wasnt throttling anymore.

In cases where the reason for throttling is the database, I would highly recommend reading the throttling whitepapers on msdn before making any changes to the defaults or disabling throttling altogether.



Hyperthreading – To enable or disable?

If you have been dealing with BizTalk 2009 or prior, you must have observed that it was always advised to turn off hyperthreading to get optimum performance in BizTalk. This is what is stated in the BizTalk Server 2009 Operations Guide:

  • It is critical that hyper-threading be turned off on BizTalk Server computers. This is a BIOS setting, typically found in the Processor settings of the BIOS setup. Hyper-threading makes the server appear to have more processors/processor cores than it actually does; however, hyper-threaded processors typically provide between 20-30% of the performance of a physical processor/processor core. When BizTalk Server counts the number of processors to adjust its self-tuning algorithms, the hyper-threaded processors cause these adjustments to be skewed, which is detrimental to overall performance.
  • Hyper-threading should be turned off on SQL Server computers because applications that can cause high levels of contention (such as BizTalk Server) may cause decreased performance in a hyper-threaded environment on a SQL Server computer.

However, one should make this decision about enabling or disabling hyperthreading based on the hardware. So,  with Intels Nehalem+ hyperthreading, is this advice still valid?

As per some tests carried out by the the CAT team, this is no longer the case….

Read more here!

The latest guide on improving server performance is available here.