Categories
Business Intelligence Geeky/Programming SQLServerPedia Syndication

SQL Server Agent – Query To Find Long Running Jobs

I use SQL Agent a ton. It is like my go to place to schedule jobs and tasks. Problem is, there are pieces “missing”. Like if a job hangs, you can have it auto stop after a given amount of time. It doesn’t alert on changes to a job, etc, etc.

I asked on twitter what the DBA’s think, @SQLRockstar thinks using OpsMgr is the way, which probably would be great, if I had it. I need a quick and dirty solution.

What I did was query the sysjobs, sysjobactivity, and sysjobhistory tables to get some kind of “look” into the running jobs. That way, if a job that is normally 45 minutes has been running for 5 hours, I should be able to catch it early on, just trying to be more proactive instead of reactive.

SELECT
	j.job_id AS 'JobId',
	name AS 'JobName',
	start_execution_date AS 'StartTime',
	stop_execution_date AS 'StopTime',
	avgruntimeonsucceed,
	DATEDIFF(s,start_execution_date,GETDATE()) AS 'CurrentRunTime',
	CASE WHEN stop_execution_date IS NULL THEN
			DATEDIFF(ss,start_execution_date,stop_execution_date) ELSE 0 END 'ActualRunTime',
	CASE
		WHEN stop_execution_date IS NULL THEN 'JobRunning'
		WHEN DATEDIFF(ss,start_execution_date,stop_execution_date)
			> (AvgRunTimeOnSucceed + AvgRunTimeOnSucceed * .05) THEN 'LongRunning-History'
		ELSE 'NormalRunning-History'
	END 'JobRun',
	CASE
		WHEN stop_execution_date IS NULL THEN
			CASE WHEN DATEDIFF(ss,start_execution_date,GETDATE())
						> (AvgRunTimeOnSucceed + AvgRunTimeOnSucceed * .05) THEN 'LongRunning-NOW'
			ELSE 'NormalRunning-NOW'
		END
		ELSE 'JobAlreadyDone'
	END AS 'JobRunning'
 FROM msdb.dbo.sysjobactivity ja
	INNER JOIN msdb.dbo.sysjobs j ON ja.job_id = j.job_id
INNER JOIN (
	SELECT job_id,
	AVG
	((run_duration/10000 * 3600) + ((run_duration%10000)/100*60) + (run_duration%10000)%100)
	+
	STDEV
	((run_duration/10000 * 3600) + ((run_duration%10000)/100*60) + (run_duration%10000)%100) AS 'AvgRuntimeOnSucceed'
	 FROM msdb.dbo.sysjobhistory
	WHERE step_id = 0 AND run_status = 1
	GROUP BY job_id) art
	ON j.job_id = art.job_id
WHERE
(stop_execution_date IS NULL) OR
	(DATEDIFF(ss,start_execution_date,stop_execution_date) > 60
	AND
	CAST(LEFT(start_execution_date,11) AS DATETIME) = CAST(LEFT(GETDATE(),11) AS DATETIME))
ORDER BY start_execution_date DESC

The inner query looks at the history to get the average runtime, and I added the standard deviation to that to make it more realistic, in case you have a one off that throws the average off. Of course this all depends on you having more than 3 entries in your job history as well.

I then just compare the average to the current runtime (if the job is executing) and to the actual (if it is complete).

You could use this query in a report, to view throughout the day to catch long running jobs, you could also use it to just give you the one that are currently running and over the time threshold, and alert you, so you can catch them right away.

I tested this on a few servers running SQL 2005, but I am guessing it will work on SQL 2008 as well. Find and kill those long running jobs!!