Postfix Queues
as i've been running this, it's been taking longer and longer to run as my queue size increases. i decided to change it today: instead of graphing the number of messages in each queue, i now graph the size of each queue on disk. this gives me analogous data, since my interest is in whether or not mail is flowing.
if you find that getmailq.sh is taking several minutes to run and cacti is killing it after 30 or 60 seconds, you can change the COUNT line as follows:
Code: Select all
COUNT=`du -ks /var/spool/postfix/${i} 2>/dev/null | awk '{ print $1 }'`
the redirect of STDERR to /dev/null is necessary to avoid the error that appears when du looks at a message that postfix has just removed from the queue, such as:
Code: Select all
incoming:406 du: cannot access `/var/spool/postfix/active/AEB2B7BFD3': No such file or directory
additionally, you'll have to change the maximum value on the Data Template for each of the queues, so that cacti records the data that returns. i set mine to 1000000000 (1G) to start and will adjust if needed. i also changed the lower limit on the Graph Template from 0 to 1 so that i can use logarithmic scaling to give me a view of all the values on the graph.
finally, du -k returns output in kilobytes, and specifying -k is a good idea in order to make this as portable as possible. the problem is that cacti treats the number as bytes, so a megabyte value shows up as a kilobyte value (e.g. 2M shows up as 2.0k, because du reports it as roughly 2,000 kilobytes and the "kilo" is lost). what we want is for the value to be in bytes. there are a couple of ways to achieve this: one is to go into the Graph Template and change the CDEF function to "Multiply by 1024" for all 16 items; the other is to modify the script to do it for me:
Code: Select all
COUNT=`du -ks /var/spool/postfix/${i} 2>/dev/null | awk '{ print $1 }'`
COUNT=`expr $COUNT \* 1024`
don't underestimate the resource requirements of qshape.pl. by changing the methodology within cacti's polling, i cut the CPU utilization of my server dramatically (~40% to ~12%). the thought that almost 30% of my CPU was occupied solely by statistics gathering is appalling. the best solutions are the ones that have no noticeable impact on the actual role the system is supposed to play.
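Putting the changes above together, a minimal sketch of a du-based queue reporter might look like the following. The function name queue_sizes, the optional spool-directory argument, and the fallback to 0 for a missing or unreadable queue are assumptions for illustration, not part of the original getmailq.sh.

```shell
#!/bin/bash
# Sketch only: report the on-disk size of each Postfix queue in bytes.
# queue_sizes and its optional argument are hypothetical names; the default
# path /var/spool/postfix is the one used throughout this thread.

queue_sizes() {
    local spool_dir="${1:-/var/spool/postfix}"
    local queue kb
    for queue in incoming active deferred hold; do
        # du can race with Postfix removing a message, so discard its stderr.
        kb=$(du -ks "$spool_dir/$queue" 2>/dev/null | awk '{ print $1 }')
        # du -k reports kilobytes; multiply by 1024 so cacti receives bytes.
        # An empty result (missing or unreadable directory) is reported as 0.
        printf '%s:%s ' "$queue" "$(( ${kb:-0} * 1024 ))"
    done
    printf '\n'
}

# A real getmailq.sh would end with a plain call:  queue_sizes
```

Note that du also counts the directory entries themselves, so an empty queue normally reports a small non-zero size rather than 0.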
Adrian Goins - President / CEO
Arces Network, LLC
http://www.arces.net
i've had the new system in place for a couple of days now, and i'm quite pleased with it. here's a post of the graph when tracking the queue sizes in MB, as well as a graph showing the CPU cycles go up when i installed the qshape-dependent version, and drop when i modified it to use du instead.
- Attachments
- Postfix queues with du instead of qshape: gravity_pfq.png (41.55 KiB)
- CPU recovered by using du instead of qshape: gravity_cpu.png (32.47 KiB)
postfix queues
I had a similar problem: my deferred and hold queues were too large, which made qshape respond very slowly. I had to add -t 60 to the snmpwalk command, and even that was not enough ...
So I replaced qshape with a simple "find" command:
find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l
So the full script on the snmp client side is now:
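#!/bin/bash
QUEUES="incoming active deferred hold"
for i in $QUEUES; do
    # count the regular files (messages) under each queue directory
    COUNT=`find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l`
    # pass the values as arguments so printf never treats them as a format string
    printf '%s:%s ' "$i" "$COUNT"
done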
It now runs in about 2 seconds, where qshape sometimes took more than 5 minutes ...
I also had to modify the data template for deferred and hold to raise the maximum from 2000 to 20000 (x10!).
The new maximum applies to new data sources; for existing ones you have to tune the RRD file, for example:
bash-3.1# rrdtool info smtp1_sodome_incoming_732.rrd | grep deferred
ds[deferred].max = 2.0000000000e+03
ds[deferred].last_ds = "14913"
bash-3.1# rrdtool tune smtp1_sodome_incoming_732.rrd --maximum deferred:20000
bash-3.1# rrdtool info smtp1_sodome_incoming_732.rrd | grep deferred
ds[deferred].max = 2.0000000000e+04
ds[deferred].last_ds = "14746"
ds[deferred].value = 1.4746000000e+04
Now it works for "long queues"!
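If several existing RRD files need the same change, the tune step above could be generated rather than typed by hand. The sketch below only prints the commands so they can be reviewed first; the function name print_tune_commands and the idea of passing the RRA directory as an argument are assumptions, and only the rrdtool tune syntax shown in the session above is used.

```shell
#!/bin/bash
# Sketch only: print one "rrdtool tune" command per .rrd file in a directory.
# Nothing is modified; review the output, then run it (for example by piping
# it to sh). print_tune_commands is a hypothetical helper name.

print_tune_commands() {
    local rra_dir="$1" ds_name="$2" new_max="$3"
    local rrd
    for rrd in "$rra_dir"/*.rrd; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$rrd" ] || continue
        printf "rrdtool tune '%s' --maximum %s:%s\n" "$rrd" "$ds_name" "$new_max"
    done
}
```

For example, print_tune_commands followed by the RRA directory, the data source name deferred and the value 20000 would emit one line per file. Files that do not contain that data source would simply make rrdtool report an error, so narrowing the directory or glob to the Postfix queue files is advisable.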
cviebrock wrote:
I've got this problem too:
Code: Select all
# snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.53.101.0
UCD-SNMP-MIB::ucdavis.53.101.0.1.1 = INTEGER: 1
UCD-SNMP-MIB::ucdavis.53.101.0.2.1 = STRING: "mailq"
UCD-SNMP-MIB::ucdavis.53.101.0.3.1 = STRING: "/usr/local/bin/getmailq.sh"
UCD-SNMP-MIB::ucdavis.53.101.0.100.1 = INTEGER: 0
UCD-SNMP-MIB::ucdavis.53.101.0.101.1 = STRING: "Can't cd to incoming: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.2 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.3 = STRING: "incoming:0 Can't cd to active: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.4 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.5 = STRING: "active:0 Can't cd to deferred: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.6 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.7 = STRING: "deferred:0 Can't cd to hold: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.8 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.9 = STRING: "hold:0 "
UCD-SNMP-MIB::ucdavis.53.101.0.102.1 = INTEGER: 0
UCD-SNMP-MIB::ucdavis.53.101.0.103.1 = ""
I tried adding the "snmp" user to the "postfix", the "adm" and even the "root" group, with no success.
The directories have the following perms:
Code: Select all
xxx:/var/spool/postfix/deferred# ls -las
total 72
4 drwx------ 18 postfix root 4096 2006-07-20 00:09 .
4 drwxr-xr-x 19 root root 4096 2006-11-08 11:33 ..
4 drwx------ 2 postfix postfix 4096 2006-12-06 22:59 0
4 drwx------ 2 postfix postfix 4096 2006-12-06 22:59 1
etc.
Any suggestions?
I fixed this issue by modifying the getmailq.sh script so that qshape is run as root with sudo:
Code: Select all
COUNT=`sudo qshape $i | grep TOTAL | awk '{print $2}'`
and by adding a sudoers entry that lets the snmp user run qshape without a password:
Code: Select all
snmp ALL=(ALL) NOPASSWD:/usr/sbin/qshape
Brian Schlect
Economic Modeling Specialists Inc.
Moscow, ID
I just tried this and it seems to be significantly faster:
( COUNT=`find $QUEUEDIR/$i -type f | wc -l | sed 's/ *//g'`)
Code: Select all
# timex /usr/local/getmailq-find.sh
incoming:228 active:8 deferred:34 hold:0
real 0.29
user 0.00
sys 0.07
(COUNT=`qshape $i | grep TOTAL | awk '{print $2}'`)
Code: Select all
# timex /usr/local/getmailq-qshape.sh
incoming:0 active:5 deferred:35 hold:0
real 1.99
user 0.71
sys 0.32
I've looked at the qshape script and it's too much for me to pick apart. It would be nice to figure out how qshape determines real incoming messages and incorporate that into a find-based script, because find is way faster.
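One possible explanation for the different incoming counts, offered as an assumption rather than something confirmed in this thread: Postfix keeps a queue file at mode 0600 while it is still being written and changes it to 0700 once the message is complete, so a plain find counts unfinished files too. Whether qshape applies exactly this test is not verified here. A find-based count restricted to complete files might be sketched as follows (count_ready_messages is a hypothetical name):

```shell
#!/bin/bash
# Sketch only: count queue files whose mode is exactly 0700, which Postfix
# uses to mark a queue file as complete. That this matches qshape's own
# filtering is an assumption.

count_ready_messages() {
    local queue_dir="$1"
    # -perm 0700 matches the permission bits exactly; tr strips the padding
    # that some wc implementations add.
    find "$queue_dir" -type f -perm 0700 2>/dev/null | wc -l | tr -d ' '
}
```

Comparing its output with qshape on the same queue would show quickly whether file mode accounts for the gap.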
One Minute Intervals
My install of cacti polls every minute with spine, thus the graph for postfix is coming out chopped up. Is there a way to convert the template to work for one minute polling? Or is it something else?
Another suggestion for the rights problem
Hi,
I'm using this script to poll a bunch of mail servers with very large deferred queues (thanks to Yahoo). First of all, thanks to jehan.procaccia for his suggestions on large queues; they worked fine for me.
I modified the getmailq script to scan the queues with
Code: Select all
find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l
which is very fast, even for large queues. The rights problem occurs when you try to run this script as the snmp user, who has no access rights to the queue directories under /var/spool/ (which is a good thing). To get around it, I simply wrote a small script that is executed by cron every minute. It runs the getmailq script as root and writes the result to a file that snmp can read. In the snmpd.conf you have to add
Code: Select all
exec .1.3.6.1.4.1.2021.53.101.0 mailq PathToScriptThatIsAbleToReadTheResultFile
and you are done. Since cacti checks every 5 minutes, a 1-minute cron job on the remote machine is frequent enough.
An incredible script, which is executed as a cron job every minute, might look like this:
Code: Select all
/opt/mailqcheck/getmailq.sh > /opt/mailqcheck/Mailqueuestats
A magical script, which is able to read the result file, might look like this:
Code: Select all
#!/bin/bash
stats=`cat /opt/mailqcheck/Mailqueuestats`
printf '%s' "$stats"
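One thing to watch with the cron approach is that the redirect truncates the result file before getmailq.sh has finished, so snmpd can occasionally read an empty or partial file. A minimal sketch of a safer writer is shown below; it writes to a temporary file in the same directory and renames it into place. The function name publish_stats and the 0644 mode are assumptions, while the paths in the usage note are the ones from the post above.

```shell
#!/bin/bash
# Sketch only: run a command and publish its output atomically.
# publish_stats is a hypothetical helper; usage: publish_stats TARGET COMMAND [ARGS...]

publish_stats() {
    local target="$1"
    shift
    local tmp
    # Create the temporary file next to the target so the final mv is a rename.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        chmod 0644 "$tmp"      # readable by the snmp user
        mv -f "$tmp" "$target"
    else
        # Keep the previous result if the command failed.
        rm -f "$tmp"
        return 1
    fi
}
```

The cron job would then call publish_stats with /opt/mailqcheck/Mailqueuestats as the target and /opt/mailqcheck/getmailq.sh as the command, and the reader script stays unchanged.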
Postfix Queues
Hi all...
So I've been beating my head against the wall with this one and now it's time to get some smarter people involved. I'm trying to create Postfix queue monitoring for several servers.
I have NET-SNMP installed, enabled and working. I have added all the files as specified in this post, but the piece in the snmpd.conf file is not working. I am able to run the getmailq.sh file without any errors.
Using the original script in postfixqueues.sh, I get this: UCD-SNMP-MIB::ucdavis.53.101.0.101.1 = No more variables left in this MIB View (It is past the end of the MIB tree)
Using the revised script in this post referencing NET-SNMP and changing exec to extend, I get this: "No Such Object available on this agent at this OID"
Is it a correct assumption that this is the same error, just with different verbiage? I assume that when the postfixqueues.sh command is run from within Cacti, that script is supposed to call the getmailq.sh script through the MIB or something. Is that correct?
HELP!!! This is probably the last piece of Cacti I need to get working so that I can have a full Cactus..
OK, bad humour.
It seems like the call to that exec is not being recognized, not working, whatever... I
Anyone...
Sorry for the bump but I would like to get this template working before I go over to the Mailgraph-based one...
Anyone have any ideas?
yodalives wrote:
I just tried this and it seems to be significantly faster:
( COUNT=`find $QUEUEDIR/$i -type f | wc -l | sed 's/ *//g'`)
Code: Select all
# timex /usr/local/getmailq-find.sh
incoming:228 active:8 deferred:34 hold:0
real 0.29
user 0.00
sys 0.07
(COUNT=`qshape $i | grep TOTAL | awk '{print $2}'`)
Code: Select all
# timex /usr/local/getmailq-qshape.sh
incoming:0 active:5 deferred:35 hold:0
real 1.99
user 0.71
sys 0.32
In the same way, to improve performance, I wrote a C program to do this. There are no sub-processes, so:
- It should be faster
- It reports both count and disk space (in bytes)
- It can be setuid postfix, so sudo is no longer needed
Code: Select all
gcc -Wall -O2 postfix_queue.c -o postfix_queue
install --mode 4755 --owner postfix --strip postfix_queue <path_cacti>/scripts/
Code: Select all
incoming:0 incomingSize:0 active:0 activeSize:0 deferred:1 deferredSize:4096 hold:0 holdSize:0
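For readers who cannot compile the attachment, the same output format can be approximated in shell by combining the find and du approaches from earlier in the thread. This is not the attached C program and it does spawn sub-processes; the function name queue_counts_and_sizes and the optional spool-directory argument are assumptions.

```shell
#!/bin/bash
# Sketch only: emit "queue:count queueSize:bytes" pairs in the format shown
# above, using find for the count and du for the size. Not the C program.

queue_counts_and_sizes() {
    local spool_dir="${1:-/var/spool/postfix}"
    local queue count kb
    for queue in incoming active deferred hold; do
        count=$(find "$spool_dir/$queue" -type f 2>/dev/null | wc -l | tr -d ' ')
        kb=$(du -ks "$spool_dir/$queue" 2>/dev/null | awk '{ print $1 }')
        # du reports kilobytes and includes directory blocks, so the sizes
        # will not match a per-file byte count exactly.
        printf '%s:%s %sSize:%s ' "$queue" "$count" "$queue" "$(( ${kb:-0} * 1024 ))"
    done
    printf '\n'
}
```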
- Attachments
- postfix_queue.c: source code (2.59 KiB)
Adding "Sent" using mailstats.pl
Hi All,
I am trying to add a "Sent" report to this graph.
What I've done:
- download http://taz.net.au/postfix/mrtg/update-mailstats.pl
- download http://taz.net.au/postfix/mrtg/mailstats.pl
- put those two scripts in /usr/local/bin
- change the permissions of the scripts
Code: Select all
$ sudo chmod 755 /usr/local/bin/update-mailstats.pl
$ sudo chmod 755 /usr/local/bin/mailstats.pl
- run the update script in the background
Code: Select all
$ sudo update-mailstats.pl&
- modify /usr/local/bin/getmailq.sh
Code: Select all
#!/bin/bash
QUEUES="incoming active deferred hold"
for i in $QUEUES; do
    COUNT=`sudo qshape $i | grep TOTAL | awk '{print $2}'`
    printf '%s:%s ' "$i" "$COUNT"
done
SENT=`sudo mailstats.pl | sed -e '/local/d' | grep SENT | awk '{print $2}'`
echo -e "sent:$SENT"
- test the script modification
Code: Select all
$ sudo /usr/share/cacti/site/scripts/postfixqueues.sh localhost
incoming:0 active:0 deferred:28 hold:0 sent:627
- use the data and graph templates which I attach
But to be honest, I am not so sure the data is accurate. Does anyone have a suggestion?
- Attachments
- cacti_data_template_postfix_queues.xml (7.54 KiB)
- cacti_graph_template_postfix_queues.xml (21.63 KiB)
- graph_image.php.png (35.89 KiB)
Re: Adding "Sent" using mailstats.pl
I think your sent stats are going to be the running total over time, not the count for the 5-minute interval at which cacti polls. I am not sure whether that's what you wanted, but graphing the total rather than the current value seems odd. If that's the case, I am not sure why you would need separate scripts to do that. All you would need to do is go back 5 minutes in the log file, search for anything sent, and spit out a number. I like your idea though; it seems stupid that qshape doesn't offer this out of the box.
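A per-interval count along those lines could be sketched as below. Instead of parsing timestamps to "go back 5 minutes", this variant remembers how many log lines it had seen on the previous run and counts only the "status=sent" lines added since then. The function name count_new_sent, the state file, and the rotation check are assumptions; the log path depends on the system (for example /var/log/mail.log) and, unlike the mailstats.pl pipeline above, local deliveries are not filtered out.

```shell
#!/bin/bash
# Sketch only: count Postfix "status=sent" log lines added since the last run.
# count_new_sent LOG_FILE STATE_FILE ; both arguments are hypothetical paths.

count_new_sent() {
    local log_file="$1" state_file="$2"
    local last=0 total new_sent
    if [ -r "$state_file" ]; then
        last=$(cat "$state_file")
    fi
    total=$(wc -l < "$log_file" | tr -d ' ')
    # A log shorter than last time suggests rotation: start from the top.
    if [ "$total" -lt "$last" ]; then
        last=0
    fi
    # Count matches only between the previous and the current end of the log.
    new_sent=$(awk -v s="$last" -v e="$total" \
        'NR > s && NR <= e && /status=sent/ { n++ } END { print n + 0 }' "$log_file")
    printf '%s\n' "$total" > "$state_file"
    printf 'sent:%s\n' "$new_sent"
}
```

The rotation check is only a heuristic: a freshly rotated log that has already grown past the stored line count would be undercounted once.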