Monitoring at SubMIT

Tags: Slurm Condor

This section describes the monitoring tools available on SubMIT: how to keep track of the submit machines as you work, and how to monitor your Condor and Slurm jobs.

The main SubMIT page

The main SubMIT page links to several useful monitoring tools, most of which are described in more detail below.

Ganglia Monitoring for SubMIT

Ganglia is a distributed monitoring system for high-performance computing systems such as SubMIT. The SubMIT Ganglia page is linked from the main SubMIT page and can also be reached directly here. Information on the individual servers is at the bottom of that page, or through the following link to servers.

CondorMon

To monitor your Condor jobs, use CondorMon, the Condor monitoring page. It shows how many of your jobs are running, idle, or held, and where they are being submitted. It also gives an overview of your recent submissions.

Jobs submitted to the CMS global pool can also be monitored through the central CERN Summary site.
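For a quick check from the command line as a complement to CondorMon, the following is a minimal Python sketch that tallies your own jobs by state. It assumes `condor_q` is on your PATH on the submit machines and uses `condor_q <user> -af JobStatus`, which prints one numeric status code per job; the code-to-name mapping follows HTCondor's `JobStatus` job attribute. The helper names `tally_condor_status` and `my_condor_summary` are hypothetical, and jobs routed to the CMS global pool may not appear in your local queue.

```python
import getpass
import subprocess
from collections import Counter

# Numeric values of HTCondor's JobStatus job attribute.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def tally_condor_status(af_output: str) -> Counter:
    """Count jobs by state from `condor_q -af JobStatus` output (one code per line)."""
    counts = Counter()
    for line in af_output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        code = int(line)
        counts[JOB_STATUS.get(code, f"Unknown({code})")] += 1
    return counts


def my_condor_summary() -> Counter:
    """Query the local schedd for the current user's jobs (assumes condor_q is installed)."""
    result = subprocess.run(
        ["condor_q", getpass.getuser(), "-af", "JobStatus"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tally_condor_status(result.stdout)


if __name__ == "__main__":
    # Offline demonstration with sample output; call my_condor_summary() on a submit machine.
    sample = "2\n2\n1\n5\n"
    print(dict(tally_condor_status(sample)))
```

Parsing `-af` output keeps the sketch independent of `condor_q`'s human-readable table layout, which varies between HTCondor versions.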

SlurmMon

To monitor your Slurm jobs, use SlurmMon, the Slurm monitoring page. It shows how many of your jobs are running and where they are being submitted. It also gives an overview of your recent submissions, including those to the LQCD cluster and to the GPU machines.
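A similar command-line view of your Slurm jobs can be sketched in Python, assuming `squeue` is available on the submit machines. It runs `squeue -u <user> -h -o "%P %T"` (no header; partition and job state per line) and counts jobs per partition and state, roughly mirroring SlurmMon's "where" and "how many". The partition names in the sample below are hypothetical placeholders, not SubMIT's actual partitions, and the helper names are likewise illustrative.

```python
import getpass
import subprocess
from collections import Counter


def tally_squeue(squeue_output: str) -> Counter:
    """Count jobs by (partition, state) from `squeue -h -o "%P %T"` output."""
    counts = Counter()
    for line in squeue_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        partition, state = fields
        counts[(partition, state)] += 1
    return counts


def my_slurm_summary() -> Counter:
    """Query Slurm for the current user's jobs (assumes squeue is installed)."""
    result = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "-h", "-o", "%P %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return tally_squeue(result.stdout)


if __name__ == "__main__":
    # Offline demonstration; "submit" and "submit-gpu" are hypothetical partition names.
    sample = "submit RUNNING\nsubmit PENDING\nsubmit-gpu RUNNING\n"
    for (partition, state), n in sorted(tally_squeue(sample).items()):
        print(f"{partition:12s} {state:10s} {n}")
```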

Summary Plots for SubMIT

Additional summary plots that track the growth and health of the SubMIT system are available at Submit Monitoring Tools.

Monitoring for the T3

For those working on the T3 machines, similar monitoring is available, including T3 CondorMon and Ganglia.