Splunk Alerts

Splunk runs a number of saved searches that can trigger alerts. To view them, log in to Splunk and go to "admin" and then "saved searches". The scripts these saved searches run are located at charon:/opt/splunk/bin/scripts. These alerts are designed to tell us about problems so they can be fixed quickly and we can improve our overall service level. If you find a problem that could be detected through a log search, please add a saved search for it.
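
As an illustration of how a script in charon:/opt/splunk/bin/scripts can get at a saved search's results, here is a minimal sketch. It assumes the script is invoked through Splunk's older "scripted alert" interface, which passes the saved search name as the fourth argument, the trigger reason as the fifth, and the path to a gzipped CSV of results as the eighth; check this against the Splunk version on charon. The function names are hypothetical.

```python
import csv
import gzip
import sys


def parse_alert_args(argv):
    """Pull the interesting fields out of a legacy scripted-alert argv.

    Assumes the legacy Splunk layout:
      argv[1] = number of events, argv[4] = saved search name,
      argv[5] = trigger reason, argv[8] = path to gzipped CSV results.
    """
    if len(argv) < 9:
        raise ValueError("expected at least 8 arguments from Splunk")
    return {
        "event_count": int(argv[1]),
        "search_name": argv[4],
        "trigger_reason": argv[5],
        "results_file": argv[8],
    }


def read_results(path):
    """Yield each result row from the gzipped CSV as a dict."""
    with gzip.open(path, mode="rt", newline="") as fh:
        yield from csv.DictReader(fh)


def main(argv):
    args = parse_alert_args(argv)
    for row in read_results(args["results_file"]):
        # Placeholder action: replace with the alert's real work (email, ticket, ...).
        print(args["search_name"], row.get("_raw", ""))


if __name__ == "__main__":
    main(sys.argv)
```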

=Users logging into coreservers=
This saved search runs once a minute and scans the window from two minutes ago to one minute ago for invalid users logging into coreservers. It then runs a script (muggles_trying_coreservers.py) that sends each such user a helpful email. The script has a couple of safeguards so that people do not receive notices triggered by ssh brute-force attempts. This setup doesn't actually work and isn't currently running.
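
Searching a one-minute window that ends a minute in the past, every minute, means each minute of logs is examined once, presumably leaving time for late events to be indexed. Below is a hypothetical sketch of the kind of brute-force safeguard such a script might apply. It is not the logic in muggles_trying_coreservers.py; the rules (only notify real accounts, skip source IPs with many attempts), the threshold, and the names are all assumptions.

```python
from collections import Counter

# One way to express "between two and one minutes ago" with Splunk time
# modifiers; the real saved search may differ.
SEARCH_WINDOW = "earliest=-2m latest=-1m"

# Hypothetical threshold: a source IP with this many attempts in the window
# is treated as a brute-force attempt rather than a confused user.
BRUTE_FORCE_THRESHOLD = 5


def users_to_notify(events, known_users, threshold=BRUTE_FORCE_THRESHOLD):
    """Return a sorted list of usernames that should get a helpful email.

    events: iterable of (username, source_ip) pairs from the search.
    known_users: set of real account names; brute-force tools usually
    guess names that do not exist, so unknown names are skipped.
    """
    events = list(events)
    attempts_per_ip = Counter(ip for _, ip in events)
    notify = set()
    for user, ip in events:
        if user not in known_users:
            continue  # not a real account, probably a scanner guess
        if attempts_per_ip[ip] >= threshold:
            continue  # too many attempts from this IP, likely brute force
        notify.add(user)
    return sorted(notify)
```

For example, a single attempt by a real user is reported, while a real username hammered five times from one address is not.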

=LDAP server down=
This alert emails the sysadmins list and the sysadmins' non-UGCS addresses if there are too many "ldap server down" messages. If you get it, double-check that Hera and Zeus are up and running correctly.
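
A quick first check when this alert fires is whether Hera and Zeus accept connections on the LDAP port. The sketch below only tests TCP reachability on port 389, the standard LDAP port. It assumes the short hostnames hera and zeus resolve from where you run it and that the servers listen on 389; a successful connection does not prove that LDAP queries are being answered correctly.

```python
import socket

# Assumed hostnames and port; adjust for FQDNs or LDAPS (636) if needed.
LDAP_SERVERS = ("hera", "zeus")
LDAP_PORT = 389


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_ldap_servers(hosts=LDAP_SERVERS, port=LDAP_PORT):
    """Map each host to whether its LDAP port is reachable."""
    return {host: port_open(host, port) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_ldap_servers().items():
        print(f"{host}: {'reachable' if ok else 'NOT reachable'}")
```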

=Mail forwarded in past 15 min=
This alert checks whether we have forwarded any mail in the past 15 minutes. If we haven't for several consecutive periods, that is a likely sign of a problem. It emails sysadmins and external sysadmins when it finds one. A single alert in the middle of the night is not a big deal, since little mail may be forwarded overnight; if you get 3 or 4 in a row, look through the postfix logs for errors.
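
The escalation rule for this alert (one quiet period is noise, three or four in a row is a problem) and the follow-up log check can be sketched as follows. This assumes per-period counts of forwarded messages are available as a list, oldest first, and that postfix delivery lines carry the usual status=sent, status=deferred, or status=bounced field. The threshold of 3 and the function names are illustrative.

```python
import re
from collections import Counter

# Postfix delivery lines include a field such as "status=sent" or "status=deferred".
STATUS_RE = re.compile(r"\bstatus=(\w+)")


def quiet_streak(counts):
    """Return how many of the most recent periods had zero forwarded mail."""
    streak = 0
    for count in reversed(counts):
        if count:
            break
        streak += 1
    return streak


def should_investigate(counts, threshold=3):
    """True once `threshold` consecutive 15-minute periods had no forwarding."""
    return quiet_streak(counts) >= threshold


def summarize_statuses(log_lines):
    """Count postfix delivery statuses (sent, deferred, bounced, ...)."""
    tally = Counter()
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            tally[match.group(1)] += 1
    return tally
```

A pile of deferred or bounced statuses in the summary is a good hint about where to start reading the raw logs. The mail log's location depends on the syslog configuration.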

=Client key expired=
This alert lets you know when a Kerberos principal has expired. If one has, reset its expiration date. This is especially important for server principals, and expired principals also cause a lot of user pain.
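
When several principals have expired, it helps to deal with server principals first. The sketch below assumes MIT Kerberos, where service principals have the form service/host@REALM and kadmin's modprinc command accepts -expire (the principal's own expiration) and -pwexpire (its password, or key, expiration). As far as we can tell, the MIT KDC's "CLIENT KEY EXPIRED" status refers to the password expiration, but check which one the log actually reports. The helper names are hypothetical, and the code only builds the command rather than running it.

```python
# Hypothetical helpers: build (but do not run) the kadmin.local command that
# would push out an expiration date. Assumes MIT Kerberos.


def is_service_principal(principal):
    """Service principals look like service/host@REALM (e.g. host/...)."""
    name = principal.split("@", 1)[0]
    return "/" in name


def modprinc_command(principal, new_date="never", password=True):
    """Return an argv list for kadmin.local that resets an expiration.

    password=True uses -pwexpire (password/key expiration);
    password=False uses -expire (the principal's own expiration).
    new_date should be a single token kadmin understands, such as "never".
    """
    flag = "-pwexpire" if password else "-expire"
    query = f"modprinc {flag} {new_date} {principal}"
    return ["kadmin.local", "-q", query]


def order_by_priority(principals):
    """Put service principals first, since they break services for everyone."""
    return sorted(principals, key=lambda p: (not is_service_principal(p), p))
```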

=IMAP Folder too full=
When a user's mailbox fills up, they can no longer access it over IMAP. This alert lets us know when we need to increase their quota slightly so they can get into their mailbox and clean it up.
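
To decide how much to raise a quota, you can start from the QUOTA response the server reports for the mailbox. The IMAP QUOTA extension (RFC 2087) reports the STORAGE resource as current usage and limit, in units of 1024 octets. The sketch below parses such a response and proposes a slightly larger limit; the 10% headroom and the function names are assumptions rather than our policy.

```python
import re

# Matches the STORAGE resource in an IMAP QUOTA response, e.g.
#   user.alice (STORAGE 950 1000)
# Values are in units of 1024 octets (RFC 2087).
STORAGE_RE = re.compile(r"STORAGE\s+(\d+)\s+(\d+)", re.IGNORECASE)


def parse_storage(quota_response):
    """Return (usage_kib, limit_kib) from a QUOTA response string."""
    match = STORAGE_RE.search(quota_response)
    if not match:
        raise ValueError(f"no STORAGE resource in {quota_response!r}")
    return int(match.group(1)), int(match.group(2))


def proposed_limit(usage_kib, limit_kib, headroom_pct=10):
    """Suggest a limit leaving headroom_pct percent free above current usage.

    Uses integer ceiling division to avoid float rounding, and never
    suggests shrinking the existing limit.
    """
    target = -(-usage_kib * (100 + headroom_pct) // 100)
    return max(limit_kib, target)
```

Python's imaplib can fetch this response with getquotaroot() and apply a new limit with setquota(), but whether that works (and with which privileges) depends on the mail server.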

=Email heartbeat=
Hermes runs a cron job that sends a test email through the mail system and checks whether it arrives at the other end. If delivery takes too long, the job sends alerts. The code is in hermes:/usr/local/sbin/email_tester.py. See Email Heartbeat.
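
The heartbeat idea can be sketched as: stamp a probe message with a unique token and its send time, then, when it shows up in the destination mailbox, compare the arrival time against a threshold. This is not the code in email_tester.py. The X-Heartbeat-* header names, the 10-minute threshold, and the function names are assumptions, and the actual sending and mailbox polling are left out so the core logic runs on its own.

```python
import time
import uuid
from email.message import EmailMessage

# Hypothetical header names carrying the probe's identity and send time.
TOKEN_HEADER = "X-Heartbeat-Token"
SENT_HEADER = "X-Heartbeat-Sent"
MAX_DELAY_SECONDS = 600  # assumed threshold: 10 minutes


def make_probe(sender, recipient, sent_at=None):
    """Build a probe message carrying a unique token and its send time."""
    sent_at = time.time() if sent_at is None else sent_at
    token = uuid.uuid4().hex
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"email heartbeat {token}"
    msg[TOKEN_HEADER] = token
    msg[SENT_HEADER] = f"{sent_at:.3f}"
    msg.set_content("Automated delivery test; safe to delete.")
    return msg


def delivery_delay(msg, received_at):
    """Seconds between the probe being sent and it being seen on arrival."""
    return received_at - float(msg[SENT_HEADER])


def too_slow(msg, received_at, max_delay=MAX_DELAY_SECONDS):
    """True if the probe took longer than max_delay seconds to arrive."""
    return delivery_delay(msg, received_at) > max_delay
```

In a cron job, the probe could be sent with smtplib's send_message(), and the checking side would look for the token in the destination mailbox before computing the delay.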

=See Also=
* Nagios
* Logging
* Splunk