Ugh, it’s been almost a month since I last blogged, and I’ve had a lot to blog about. It’s been super busy with a lot of big projects coming to an end at work, and I’ve spent my free time blasting through rolls of film and enjoying the awesome weather… Going to take tomorrow off and just write up any of the blog posts I can remember.
I’m not a huge bash guy; in fact, nine times out of ten I’ll choose perl. I like the way perl handles strings, escaping, and its general syntax better. Sometimes, however, it’s just better to use bash, say for a cron, especially if you can pull it off in one line (instead of maintaining a script on disk). This is definitely advantageous when you are adding a cron to an appliance: it’s nice to maintain everything that isn’t provided by the managed distribution in a single location (i.e. root’s crontab).
I’m a big fan of F5; the Big-IP product line is fantastic, as is their support. There are definitely a lot of ways to get alerting, the best of which would be SNMP, or you could even have the included alertd email your pagers directly. Personally, I don’t have an SNMP-driven alerting system; 99% of our devices/systems are actively monitored by dedicated monitoring systems. Modifying alertd has the problem that you have to port your changes forward during any OS updates, and we are currently split between OS 9.x and OS 10.
I decided to write a quick little non-intrusive script (no changes to the OS configuration, so no changes to maintain), and keep it as a single cronable line, to comb the LTM logs (though this could easily be used for GTM logs as well) and email out notifications. The log looks like:
Jun 8 17:31:09 local/tmm notice tmm[1823]: 01070028:3: No members available for pool db-cluster.omghi2u.dev
Jun 8 19:12:17 local/ltm1 notice mcpd[3377]: 01070640:5: Node 192.168.110.82 monitor status down.
Jun 8 19:13:51 local/ltm1 notice mcpd[3377]: 01070728:5: Node 192.168.110.82 monitor status up.
Jun 8 19:13:51 local/tmm notice tmm[1823]: 01070028:3: No members available for pool web-cluster.sup2u.qa
Jun 8 19:15:08 local/ltm1 notice mcpd[3377]: 01070727:5: Pool member web2.sup2u.qa:80 monitor status up.
The log formatting in OS9 is a bit less verbose than OS10, but basically every line starts with a timestamp formatted %b %e %H:%M:%S, followed by the log entry (OS9 lacks the log level and context). We want to check every 5 minutes (could be bumped to every minute) for any new entries in the last 5 minutes, and email them out unless they match stuff we don’t care about.
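Some cool stuff of note: if you want a for loop to iterate over lines instead of splitting on any whitespace, you need to change the IFS variable. The odd-looking \b in the IFS value is there because command substitution strips trailing newlines, so a bare "\n" would come out empty. Just make sure you unset IFS when you are done or you will screw your terminal (or cron run!) up. The code originally looked like: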
export IFS=$(echo -en "\n\b"); guts=$(for i in `cat /var/log/ltm | awk '{print $1 " " $2 " " $3}'` ; do unixtime=$(date --date=$i +"%s"); if (( unixtime > `date --date="5 minutes ago" +%s` )); then grep `date --date="@$unixtime" +"%b %e %H:%M:%S"` /var/log/ltm; fi; done | sort | uniq | grep -v -f /root/ltm_excludes.txt); if [ -n "$guts" ]; then echo "$guts" | mail -s "$HOSTNAME logs" "[email protected]"; fi; unset IFS;
That worked like a charm on OS10, but OS9 is based on Red Hat 3 and has an ancient version of the date command that doesn’t support the @timestamp format. Lovely. There are lots of people using the pure date command to get around this, but timezones become a problem and it gets messy. It’s way better to use awk’s wrapper around strftime(). The % signs are also backslash-escaped, because cron treats a bare % as a newline. Thus our fully backward compatible cron is:
*/5 * * * * export IFS=$(echo -en "\n\b"); guts=$(for i in `cat /var/log/ltm | awk '{print $1 " " $2 " " $3}'` ; do unixtime=$(date --date=$i +"\%s"); if (( unixtime > `date --date="5 minutes ago" +\%s` )); then grep `date --date=\`echo | awk "{ print strftime(\"\%c\", $unixtime) }"\` +"\%b \%e \%H:\%M:\%S"` /var/log/ltm; fi; done | sort | uniq | grep -v -f /root/ltm_excludes.txt); if [ -n "$guts" ]; then echo "$guts" | mail -s "$HOSTNAME logs" "[email protected]"; fi; unset IFS;
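The IFS trick in that line is easier to see in isolation. Here is a minimal sketch on a made-up two-line sample (the variable names are mine, not part of the cron). It counts loop iterations with the default IFS, then with a newline-only IFS, and then with a while/read loop, which is one way to get line-by-line iteration without touching the global IFS at all:

```shell
# Made-up sample: two lines, each containing spaces.
sample='Jun 8 17:31:09 first entry
Jun 8 19:12:17 second entry'

# Default IFS: the for loop splits on every space, tab and newline.
words=0
for i in $sample; do words=$((words + 1)); done

# Newline-only IFS: the for loop now sees one item per line.
old_ifs=$IFS
IFS='
'
lines=0
for i in $sample; do lines=$((lines + 1)); done
IFS=$old_ifs   # restore, so later commands split normally again

# Alternative: "IFS= read" changes IFS only for the read command itself.
count=0
while IFS= read -r line; do count=$((count + 1)); done <<EOF
$sample
EOF

echo "words=$words lines=$lines count=$count"   # words=10 lines=2 count=2
```

For a one-line cron the export/unset pair is shorter, which is presumably why it won out here; the while/read form is more of a fit for a script on disk.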
The only thing I wasn’t totally happy with was that I had to do a dumb echo | into awk. I couldn’t figure out (from the man page and some quick googling) how to get awk to do its thing without stdin or a file. Oh well. That was a lot of fun to write. You could change it to a whitelist by making a /root/includes.txt kind of file and losing the -v on grep. In fact, you could have two crons: general alerts go to your ops inbox, and alerts you are worried about (like pools having no members left :D) go to your emergency inbox (pagers). Or many crons. Or actually just hack up the alertd.conf… Or alert on SNMP! Either way, happy scripting!
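On that echo | awk wart: awk runs a BEGIN block before it reads any input, and a program that consists only of BEGIN exits without reading stdin at all. A minimal sketch, assuming an awk that provides strftime() (gawk does; it is an extension, not POSIX awk). The function name fmt_epoch is mine, and TZ=UTC and LC_ALL=C are only there to make the output predictable:

```shell
# Convert an epoch timestamp to the syslog-style "%b %e %H:%M:%S" format.
# The awk program is BEGIN-only, so it never waits on stdin: no "echo |" needed.
# Assumption: this awk has strftime() (a gawk extension).
fmt_epoch() {
    LC_ALL=C TZ=UTC awk -v ts="$1" 'BEGIN { print strftime("%b %e %H:%M:%S", ts) }'
}

fmt_epoch 0        # Jan  1 00:00:00
fmt_epoch 86400    # Jan  2 00:00:00
```

Dropped into the crontab line, the % signs would still need the same backslash escapes as before, and you would leave out TZ=UTC so the result matches the local time used in the log.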
UPDATE: (June 21st, 2011) On a day filled with a particularly large number of port scans, which produced lots of RST response messages that grep was filtering out, we decided to move the grep -v up to the beginning instead of the end. This improves performance immensely, since excluded lines are dropped before the loop spawns date for each one. No more spikes on the CPU0 graphs! Here’s the updated script, smarter logic this time ’round:
*/5 * * * * export IFS=$(echo -en "\n\b"); guts=$(for i in `grep -v -f /root/ltm_excludes.txt /var/log/ltm | awk '{print $1 " " $2 " " $3}'` ; do unixtime=$(date --date=$i +"\%s"); if (( unixtime > `date --date="5 minutes ago" +\%s` )); then grep `date --date=\`echo | awk "{ print strftime(\"\%c\", $unixtime) }"\` +"\%b \%e \%H:\%M:\%S"` /var/log/ltm; fi; done | sort | uniq ); if [ -n "$guts" ]; then echo "$guts" | mail -s "$HOSTNAME logs" "[email protected]"; fi; unset IFS;
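Both the exclude file and the whitelist idea from earlier come down to grep -f, which reads one pattern per line from a file. A small sketch of the two side by side, using throwaway temp files and a few of the sample log lines from above; the file names and the patterns themselves are made up for illustration:

```shell
# Throwaway directory so the example is self-contained.
tmp=$(mktemp -d)

cat > "$tmp/ltm.log" <<'EOF'
Jun 8 17:31:09 local/tmm notice tmm[1823]: 01070028:3: No members available for pool db-cluster.omghi2u.dev
Jun 8 19:12:17 local/ltm1 notice mcpd[3377]: 01070640:5: Node 192.168.110.82 monitor status down.
Jun 8 19:13:51 local/ltm1 notice mcpd[3377]: 01070728:5: Node 192.168.110.82 monitor status up.
EOF

# Exclude list: routine noise we do not want emailed (hypothetical pattern).
printf '%s\n' 'monitor status up' > "$tmp/excludes.txt"
# Include list: only the pager-worthy message (hypothetical pattern).
printf '%s\n' 'No members available' > "$tmp/includes.txt"

# -v inverts the match: keep everything that matches no exclude pattern.
general=$(grep -v -f "$tmp/excludes.txt" "$tmp/ltm.log")
# Without -v: keep only lines that match some include pattern.
urgent=$(grep -f "$tmp/includes.txt" "$tmp/ltm.log")

printf 'general:\n%s\n' "$general"
printf 'urgent:\n%s\n' "$urgent"

rm -rf "$tmp"
```

One thing to watch for: a blank line in a pattern file is an empty pattern, which matches every line. In an exclude file used with -v, that silently suppresses all output.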