I use SQL Agent a ton. It is like my go to place to schedule jobs and tasks. Problem is, there are pieces “missing”. Like if a job hangs, you can have it auto stop after a given amount of time. It doesn’t alert on changes to a job, etc, etc.
I asked on twitter what the DBA’s think, @SQLRockstar thinks using OpsMgr is the way, which probably would be great, if I had it. I need a quick and dirty solution.
What I did was query the sysjobs, sysjobactivity, and sysjobhistory tables to get some kind of “look” into the running jobs. That way, if a job that is normally 45 minutes has been running for 5 hours, I should be able to catch it early on, just trying to be more proactive instead of reactive.
SELECT j.job_id AS 'JobId', name AS 'JobName', start_execution_date AS 'StartTime', stop_execution_date AS 'StopTime', avgruntimeonsucceed, DATEDIFF(s,start_execution_date,GETDATE()) AS 'CurrentRunTime', CASE WHEN stop_execution_date IS NULL THEN DATEDIFF(ss,start_execution_date,stop_execution_date) ELSE 0 END 'ActualRunTime', CASE WHEN stop_execution_date IS NULL THEN 'JobRunning' WHEN DATEDIFF(ss,start_execution_date,stop_execution_date) > (AvgRunTimeOnSucceed + AvgRunTimeOnSucceed * .05) THEN 'LongRunning-History' ELSE 'NormalRunning-History' END 'JobRun', CASE WHEN stop_execution_date IS NULL THEN CASE WHEN DATEDIFF(ss,start_execution_date,GETDATE()) > (AvgRunTimeOnSucceed + AvgRunTimeOnSucceed * .05) THEN 'LongRunning-NOW' ELSE 'NormalRunning-NOW' END ELSE 'JobAlreadyDone' END AS 'JobRunning' FROM msdb.dbo.sysjobactivity ja INNER JOIN msdb.dbo.sysjobs j ON ja.job_id = j.job_id INNER JOIN ( SELECT job_id, AVG ((run_duration/10000 * 3600) + ((run_duration%10000)/100*60) + (run_duration%10000)%100) + STDEV ((run_duration/10000 * 3600) + ((run_duration%10000)/100*60) + (run_duration%10000)%100) AS 'AvgRuntimeOnSucceed' FROM msdb.dbo.sysjobhistory WHERE step_id = 0 AND run_status = 1 GROUP BY job_id) art ON j.job_id = art.job_id WHERE (stop_execution_date IS NULL) OR (DATEDIFF(ss,start_execution_date,stop_execution_date) > 60 AND CAST(LEFT(start_execution_date,11) AS DATETIME) = CAST(LEFT(GETDATE(),11) AS DATETIME)) ORDER BY start_execution_date DESC
The inner query looks at the history to get the average runtime, and I added the standard deviation to that to make it more realistic, in case you have a one off that throws the average off. Of course this all depends on you having more than 3 entries in your job history as well.
I then just compare the average to the current runtime (if the job is executing) and to the actual (if it is complete).
You could use this query in a report, to view throughout the day to catch long running jobs, you could also use it to just give you the one that are currently running and over the time threshold, and alert you, so you can catch them right away.
I tested this on a few servers running SQL 2005, but I am guessing it will work on SQL 2008 as well. Find and kill those long running jobs!!
13 replies on “SQL Server Agent – Query To Find Long Running Jobs”
OK, found out that OpsMgr uses sp_help_jobactivity and does a datediff between the start execution time and the current time on the server. also, by default the monitor will flag jobs that have run for longer than 1 and 2 minutes, which is less than helpful, really.i love your idea of checking on the average run time. i could put something into PBM but the return results are limited there. so now i am thinking of some powershell, but i don't think that will do what we want here: you want to be notified as soon as a job has exceeded the average by 10%, correct?
LikeLike
you are right, the query I put on the blog just look if it is over 5% at this point. Takes AVG + STD of historical run times, * 5%, you might need to tweak the pct's accordingly. this query would have to run every X min/seconds and alert
LikeLike
This is exactly what I have been looking for…. It doesn’t give accurate results in 2008. Do you have an update to this by any chance?
LikeLike
OK, found out that OpsMgr uses sp_help_jobactivity and does a datediff between the start execution time and the current time on the server. also, by default the monitor will flag jobs that have run for longer than 1 and 2 minutes, which is less than helpful, really.i love your idea of checking on the average run time. i could put something into PBM but the return results are limited there. so now i am thinking of some powershell, but i don't think that will do what we want here: you want to be notified as soon as a job has exceeded the average by 10%, correct?
LikeLike
you are right, the query I put on the blog just look if it is over 5% at this point. Takes AVG + STD of historical run times, * 5%, you might need to tweak the pct's accordingly. this query would have to run every X min/seconds and alert
LikeLike
Couldn't we use a similar script within Ops Manager to only trigger the monitor when the job is x% past avg runtime or some other criteria to make this Ops Manager monitor a bit more useful?
LikeLike
yes, if you are lucky enough to have opsmgr
LikeLike
I got duplicate values, like if I have 4 jobs , it showed 8/9, with different results, like the same job is normal & long running.
LikeLike
Hi, I have few jobs that hangs and I need to put it in the SQL 2005 Alert so it pages me and then I can take a look to find out if the jobs hangs and why. How do I use your query to put it in the alert so it pages me when it hangs more than 3 or 4 hours. Thanks for your help. Ali
LikeLike
I created a proc that I run from a job every 15min. I added this at the end of the script to email me when a job is running long.–********************Long Running NOW**************************************IF EXISTS(SELECT * FROM #tempjobs where jobrunning= 'LongRunning-NOW') BEGIN SELECT * INTO #tempjobs2 FROM #tempjobs where jobrunning= 'LongRunning-NOW' DECLARE @server varchar(max) set @server = '' –add servername DECLARE @subject varchar(max) set @subject = '' DECLARE @body varchar(max) set @body = '' DECLARE @tmpbody2 varchar(max) set @tmpbody2 = null DECLARE @tmpbody3 varchar(max) set @tmpbody3 = null DECLARE @tmpbody4 varchar(max) set @tmpbody4 = nullselect @tmpbody2 = JobName from #tempjobs2select @tmpbody3 = CurrentRuntime from #tempjobs2select @tmpbody4 = avgruntimeonsucceed from #tempjobs2–if @@rowcount > 0set @body = @body + 'Server:' + @server + ' – Job ''' + @tmpbody2 + ''' is RUNNING LONG!' + '.'set @body = @body + ' ' + @tmpbody3 + ' – CurrentRuntime' + ''set @body = @body + @tmpbody4 + ' – AverageRuntime' + ''set @body = @body + ''SET @body = @bodySET @subject = @server + ' – Long Running Jobs' EXEC msdb.dbo.sp_send_dbmail @recipients='', –add email address @subject = @subject, @body = @body, @body_format = 'HTML'
LikeLike
Hi Steve, when i paste and run this i get an error – invalid column name ‘AvgRunTimeOnSucceed’ ??
LikeLike
I just copied and pasted and it ran fine.
LikeLike
Reblogged this on Can Atuf Kansu and commented:
SQL Agent Job’larının kontrolü için kullanışlı bir sorgu.
LikeLike