The Importance of Proper Monitoring
Hey everyone, Mike here, and in this episode, number 8 of the podcast, I want to talk about the importance of monitoring your environment. I have said it in other blogs and podcasts, but there isn’t much that is more important than properly monitoring your environment. Without any monitoring in place, you really don’t know how your environment is performing. Any monitoring you can do is better than no monitoring at all, even if it is a simple ping monitor or PowerShell script. Having proper monitoring in place can really mean the difference between being an IT Hero or an IT Villain.
Proper Monitoring saved me from being an IT Villain
Not too terribly long ago, on a Saturday, I started getting a few alerts and notifications. My normal workweek is Monday through Friday, so I am typically off on Saturdays. These alerts were telling me that there was a problem on my Exchange server. That problem specifically is called “Back Pressure”.
Back Pressure, in an Exchange environment, is what happens when Exchange is running short of resources such as disk space or memory. When this happens, Exchange starts to prioritize the work that it does and will favor internal emails over emails coming in from outside. The way this shows up for users is that emails coming in from the outside take much longer to arrive. Depending on the total workload, they could be up to an hour late.
So I received this alert that Back Pressure was happening before any users noticed. This was great because it gave me time to log in and correct the issue. Not one user ever reported a problem, because my monitoring picked up on the problem first. This didn’t really make me an IT Hero, because no one knew there was a problem, but it certainly saved me from being an IT Villain.
Benefits of Proper Monitoring
I think my story shows you the importance of proper monitoring. You can take care of problems before they get worse and before your users or clients notice them. This is huge in IT, because part of your job as a Systems Administrator is to know when problems happen before users do. If you discover a problem because a user has notified you, then you are a little late. Your user has been inconvenienced, or they are not as productive as they could be.
Performance Trending
Another useful benefit of proper monitoring is trending. When the proper monitoring is in place, you can keep track of performance trends. One good trend to keep track of is CPU utilization on servers. If you look at a server three weeks from now and notice the utilization is hanging around 30%, you won’t know if that is normal without trending data to compare it against.
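To make the "is 30% normal?" question concrete, here is a minimal sketch of comparing a current reading against historical trend data. The sample numbers, the function name `is_unusual`, and the three-standard-deviation tolerance are all assumptions I picked for illustration; a real monitoring product will have its own way of baselining.

```python
from statistics import mean, pstdev


def is_unusual(history, current, tolerance=3.0):
    """Return True if `current` is far from the historical average.

    "Far" means more than `tolerance` standard deviations away from
    the mean of the samples in `history`.
    """
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Every historical sample was identical, so any change stands out.
        return current != baseline
    return abs(current - baseline) > tolerance * spread


# Hypothetical CPU utilization samples (percent) collected over past weeks.
cpu_history = [28, 30, 31, 29, 32, 30, 29]

print(is_unusual(cpu_history, 30))  # a reading close to the baseline
print(is_unusual(cpu_history, 75))  # a reading well above the baseline
```

Without the history list there is nothing to compare against, which is exactly the problem trending data solves.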
Another great place where proper monitoring and trends can be useful is finding memory leaks within applications. If an application is poorly written, you may notice that it will, over time, use up more and more RAM until it is restarted. You can watch these trends and have information to take back to the developer or vendor to help prove a memory leak.
Trending can also help you plan for the future, such as with bandwidth or storage space. If you see that your bandwidth usage is growing over time, you can plan for a bigger internet connection in the future. If your storage continues to grow in a predictable way, then you can plan out storage upgrades from the trends and not be caught off guard.
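As a rough illustration of planning from a trend, the sketch below fits a straight line to daily storage readings and estimates how many days are left before the volume is full. The function name, the sample readings, and the assumption that growth is roughly linear are mine; real growth is often lumpier than this.

```python
def days_until_full(daily_usage_gb, capacity_gb):
    """Estimate days remaining until storage reaches capacity.

    Fits a least-squares line through one reading per day and returns
    the number of days after the last reading at which the line reaches
    `capacity_gb`. Returns None if usage is flat or shrinking.
    """
    n = len(daily_usage_gb)
    if n < 2:
        raise ValueError("need at least two samples to see a trend")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(daily_usage_gb) / n
    slope_num = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(days, daily_usage_gb))
    slope_den = sum((x - mean_x) ** 2 for x in days)
    slope = slope_num / slope_den  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Hypothetical readings: 100 GB growing by 10 GB a day on a 200 GB volume.
print(days_until_full([100, 110, 120, 130], 200))
```

The same idea applies to bandwidth: swap the storage readings for peak usage and the capacity for your circuit size.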
Types of Monitoring
There are many different types of monitoring, and they range from the very basic to the very advanced. Most of the time you can get the very basic versions for free, while the more advanced ones will cost you some money. Which type of monitoring you use is up to you, but the importance of proper monitoring really means that you should have some form of monitoring in place.
Ping Monitoring
A ping is similar to sonar in that with sonar you send out a signal and wait for the signal to bounce back. When you send a ping to a device, the device will get that ping and send a reply. So ping is a quick and dirty way of telling whether a server is up or not.
Monitoring software that works just on pings is available and is typically free. This is because the software itself is so basic. The software will sit there and ping your intended device every 10 or 15 seconds, or whatever interval you decide. If the software doesn’t get a response back, it assumes that the device is down.
One thing we have to understand about this monitoring is that it will only tell us if a device is up or down, nothing more. As I mentioned this type of monitoring is better than no monitoring but you could miss potential issues. In the example of a web server, our server could be up and replying to pings. But the software that runs the webserver could have crashed. In this case, our ping monitor is not helping us much because our server is essentially down but our monitoring software doesn’t alert us.
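Here is a minimal sketch of what a ping monitor does under the hood, assuming the operating system's `ping` command is available on the machine running the script. The function names and host names are hypothetical, and a real tool would add scheduling, retries, and alerting on top.

```python
import platform
import subprocess


def ping_once(host, timeout_seconds=2):
    """Send a single ping using the OS ping command; True means a reply came back."""
    if platform.system() == "Windows":
        # On Windows, -n is the count and -w is the timeout in milliseconds.
        command = ["ping", "-n", "1", "-w", str(timeout_seconds * 1000), host]
    else:
        # On Linux and macOS, -c is the count; the timeout is enforced below.
        command = ["ping", "-c", "1", host]
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_down_hosts(hosts, probe=ping_once):
    """Return the hosts that did not answer the probe."""
    return [host for host in hosts if not probe(host)]
```

Calling `find_down_hosts(["web01", "mail01"])` in a loop with a 10 or 15 second sleep between passes is essentially the whole product. Notice that it can only ever answer "up or down", which is the limitation described above.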
Port Monitoring
Proper Port Monitoring is very similar to a ping monitor, except it looks at specific ports on a server and tells you whether those ports are up or down. Instead of a true ping, it tries to open a connection to a specific port, which works like a ping of sorts.
So let’s look at our web server example again. With a ping monitor, we know if the server is up or down but we don’t know when the IIS service crashes. If we set up a port monitor we tell it to monitor ports 80 and 443 on our web server (IIS). Now if our instance of IIS crashes then it is no longer listening on ports 80 and 443 so our Port Monitor would alert us that those ports are down. Proper Port Monitoring gives us a little more depth of monitoring and will help us catch things that ping monitoring misses.
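A port check can be sketched in a few lines using Python's standard `socket` module. This is only an illustration of the idea: the function names are mine, and the host name in the usage note is hypothetical.

```python
import socket


def port_is_open(host, port, timeout_seconds=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def closed_ports(host, ports):
    """Return the ports from `ports` that are not accepting connections."""
    return [port for port in ports if not port_is_open(host, port)]
```

For the IIS example, `closed_ports("web01", [80, 443])` would return an empty list while IIS is listening, and `[80, 443]` once it has crashed, even though the server itself still answers pings.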
Syslog Monitoring
Syslog Monitoring is a really good type of monitoring, and Syslog data is really good information to capture when you can. It is typically captured from network devices. Cisco switches, routers, access points, and even VoIP phones have the ability to generate Syslog information.
Some of the more expensive monitoring software out there will also act as a Syslog server. A Syslog server is where you tell all of your devices to send their Syslog data. Not only do you get the up and down statuses of network devices and physical ports, but you also get information that can help you troubleshoot a problem or know if a service on a Cisco switch has crashed.
Let’s look back at our IIS example. Now, Windows does not natively generate Syslog data; that requires a third-party application. But our example should still work. So let’s say we have Syslog set up on our IIS server and IIS crashes. My monitoring software capturing that Syslog data will alert me that it went down. But if I pull the Syslog data, I just may be able to see why it went down by looking at the logs from just before the service crashed.
Syslog is going to give you a lot more information than just a ping or a port monitor will. You can also adjust how much information Syslog gives you. This is typically on a scale of 0 to 7, where the lowest levels are the least verbose and only give you the most severe errors, while 7 is the most verbose and will give you tons of information even when the system is working as normal.
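The syslog standard numbers its severity levels from 0 (Emergency) up to 7 (Debug), and every message carries a priority value that combines the severity with a facility code. The sketch below shows how that priority is decoded and how a verbosity setting filters messages. The function names are my own; the level names and the priority arithmetic come from the syslog RFCs.

```python
# Severity levels as defined by the syslog standard (RFC 5424).
SEVERITY_NAMES = {
    0: "Emergency",
    1: "Alert",
    2: "Critical",
    3: "Error",
    4: "Warning",
    5: "Notice",
    6: "Informational",
    7: "Debug",
}


def decode_priority(pri):
    """Split the <PRI> number at the start of a message into (facility, severity).

    The priority is calculated as facility * 8 + severity.
    """
    facility, severity = divmod(pri, 8)
    return facility, severity


def passes_filter(severity, configured_level):
    """A message is kept when it is at least as severe as the configured level.

    Lower numbers are more severe, so level 7 keeps everything.
    """
    return severity <= configured_level


facility, severity = decode_priority(34)  # a message starting with <34>
print(facility, SEVERITY_NAMES[severity])
```

Setting a device to level 3 keeps errors and anything worse, while level 7 lets through all the debug chatter as well.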
Application Monitoring
The last type of Proper Monitoring I will talk about is Application Monitoring. This is certainly the best form of monitoring, because it can actually look into the applications. You are going to find this type of monitor from a reputable monitoring software company. These monitors can range from very simple to very complex.
One great example of Application Monitoring is for Exchange servers. Exchange servers not only have services that need to be monitored, but they also have mail queues, IIS sites, and a lot of other pieces and parts that should be monitored. Good application monitoring is able to monitor all of these pieces. It will give you a full and comprehensive look at how your Exchange servers are performing, and it may even predict problems that could come up in the future.
Now obviously the downside of Application Monitoring is the price that can come along with it. Ping and Port monitors are typically free or pretty cheap. But once you get into advanced application monitoring software you could be paying a hefty price. You just need to determine how important monitoring is to you and what even a few minutes of downtime can cost your company.
Monitoring Software
Now let’s talk about a few different companies and the software they provide. I could probably do an entire podcast just on all the different monitoring software out there. But I am going to try to just list off the ones that I have personally used or am at the very least familiar with.
Microsoft System Center Operations Manager
I am going to start with the one that I think is great software but is expensive and super complicated.
Microsoft makes a lot of great software, and many businesses use Microsoft software exclusively. So why not use Microsoft software to monitor everything? Microsoft has a System Center suite of tools, and their monitoring software is called System Center Operations Manager, or SCOM for short.
This is incredible software, and you can monitor pretty much any Microsoft application at the application level. So Exchange, IIS, Windows Server, Skype for Business, MS SQL Server, etc.
This software is very customizable, but that customization can become very complicated very quickly. Because of this, it may not be appropriate for smaller IT shops, since this software could have an entire team dedicated to it. Many of the out-of-the-box configurations work well but will give you alerting overload.
One thing that I did really like about SCOM was the fact that it handled all of your trending for you. When you install the SCOM agent on a server, it determines which applications are running. Then it will start sending back trend data for you. So from the dashboard, you can see what the CPU or memory has been doing on your servers over a given period of time.
SolarWinds Orion
SolarWinds is a great company that makes a lot of different types of monitoring software. The two I am going to mention here are their Network Performance Monitor, or NPM, and their Server & Application Monitor, or SAM for short. One of the things I do like about SolarWinds is that their software is not difficult to use or set up. This is the software that I currently use for most of my monitoring.
NPM, or Network Performance Monitor, is SolarWinds’ network monitor and Syslog server. I spoke earlier about Syslog; well, that comes built into NPM for you, which makes capturing that data that much easier. When you set up NPM, it will actually look at your network equipment and allow you to monitor and alert on specific ports. It will also immediately start collecting performance data. So you can see bandwidth usage, top conversations, and other great data points.
SAM, or Server & Application Monitor, works just as well as NPM, and both can be accessed from the same web interface. SAM can work with your servers in a couple of different ways. You can install the SAM agent on the server, or you can use WMI if it is a Windows Server. There are other ways of monitoring, but the Agent and WMI are my preferred choices. Similar to NPM, as soon as you add a server into SAM via the Agent or WMI, you start getting performance data back for trending.
Another great feature of SAM is the ability to create custom alerts and custom notifications. One example I will use is a custom alert I created for my phone system. This alert watches specific services, and if they go down it will alert. Who it alerts depends on which service goes down. If a critical service goes down, then the IT person on call will get the alert. If a non-critical service goes down, then I will get an email alert. This just shows some of the great flexibility that SAM and NPM provide.
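The routing logic behind that kind of alert is simple enough to sketch. To be clear, this is not how SAM is configured (that is done in its web interface); it is just a plain Python illustration of the decision, and the service names, contact values, and function name are all made up.

```python
# Hypothetical names of phone system services that count as critical.
CRITICAL_SERVICES = {"CallProcessing", "VoicemailGateway"}


def route_alert(service_name, on_call_contact, admin_email):
    """Decide who hears about a service going down, and how.

    Critical services notify whoever is on call; everything else
    becomes an email to the administrator.
    """
    if service_name in CRITICAL_SERVICES:
        return {"channel": "on-call", "recipient": on_call_contact}
    return {"channel": "email", "recipient": admin_email}


print(route_alert("CallProcessing", "oncall-phone", "mike@example.com"))
print(route_alert("ReportingService", "oncall-phone", "mike@example.com"))
```

Keeping the list of critical services short is what keeps the on-call person from being woken up for things that can wait until morning.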
Alerting Overload
I mentioned alerting overload earlier, and I wanted to take the last bit here to quickly go over it. When you first set up any type of alerting, you may be inclined to alert on everything. This is very common, but it very quickly leads to alert overload. This is where all alerts blend together, and while you are ignoring the unimportant alerts you miss a very important one. If you get into this situation, you are pretty much negating the entire point of alerting.
The best way to overcome this is to really think about the alerts you set up. Every alert should be actionable and have actionable steps within the alert. Doing this forces you to not just delete the email alert but to actually take some form of action. The next thing to think about is this: if you find yourself deleting the same alert over and over again, ask yourself, “Is this alert really needed?” If you are constantly ignoring or deleting an alert, is it really helping? So keep actively tweaking and updating your alerts to prevent alert overload and keep your systems up and running.
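If you keep even a rough record of which alerts you acted on and which you just deleted, finding the noisy ones can be automated. The sketch below assumes a log of `(alert_name, action_taken)` pairs; that log format, the alert names, and the thresholds are all hypothetical choices of mine.

```python
from collections import Counter


def noisy_alerts(alert_log, min_count=5, max_action_rate=0.1):
    """Return alert names that fire often but almost never lead to action.

    `alert_log` is a list of (alert_name, action_taken) pairs, where
    action_taken is True if someone actually did something about it.
    """
    totals = Counter()
    acted = Counter()
    for name, action_taken in alert_log:
        totals[name] += 1
        if action_taken:
            acted[name] += 1
    return sorted(
        name
        for name, total in totals.items()
        if total >= min_count and acted[name] / total <= max_action_rate
    )


# Hypothetical history: one alert that is always deleted, two that are not noisy.
log = (
    [("disk_90_percent", False)] * 10
    + [("exchange_back_pressure", True)] * 2
    + [("cpu_spike", False)] * 3
)
print(noisy_alerts(log))
```

Anything this reports is a candidate for tuning or removal, which is the "is this alert really needed?" question asked with data.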
I hope this podcast helps to show you the Importance of Proper Monitoring and all the different options that you have when monitoring your environment.