Postfix Queues

Templates, scripts for templates, scripts and requests for templates.

flowolf
Posts: 3
Joined: Wed Nov 07, 2007 7:07 am

Post by flowolf »

hello!

i tried this how to and have some problems with it.

i opened a new thread here: http://forums.cacti.net/viewtopic.php?t ... highlight=

thanks for the help
monachus
Posts: 42
Joined: Mon Sep 06, 2004 1:27 am
Location: New York, NY

Post by monachus »

as i've been running this, it's been taking longer and longer to run as my queue size increases. i decided to change it today, and instead of graphing the number of messages in each queue, i now graph the size of each queue. this gives me analogous data, since my interest is in whether or not mail is flowing.

if you find that getmailq.sh is taking several minutes to run and cacti is killing it after 30 or 60 seconds, you can change the COUNT line as follows:

Code:

COUNT=`du -ks /var/spool/postfix/${i} 2>/dev/null | awk '{ print $1 }'`
the redirect of STDERR to /dev/null is necessary to suppress the error du prints when postfix removes a message from the queue while du is scanning it, such as:

Code:

incoming:406 du: cannot access `/var/spool/postfix/active/AEB2B7BFD3': No such file or directory
additionally, you'll have to raise the maximum value on the Data Template for each of the queues, so that cacti records the larger values that now come back (RRDtool stores anything above a data source's maximum as unknown). i set mine to 1000000000 (1G) to start and will adjust if needed. i also changed the lower limit on the Graph Template from 0 to 1 so that i can use logarithmic scaling to give me a view of all the values on the graph.

finally, du -k returns output in kilobytes, and specifying -k is a good idea in order to make this as portable as possible. the problem is that cacti then shows a megabyte value as a kilobyte value (e.g. 2M shows up as 2.0k, because du reports it as roughly 2,000 kilobytes and cacti treats that number as a plain value). what we want is for it to be in bytes. there are a couple of ways to achieve this: one is to go into the Graph Template and change the CDEF function to "Multiply by 1024" for all 16 items, the other is to modify the script to do it for me:

Code:

COUNT=`du -ks /var/spool/postfix/${i} 2>/dev/null | awk '{ print $1 }'`
COUNT=`expr $COUNT \* 1024`
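Putting the du line, the STDERR redirect and the byte conversion together, a complete getmailq.sh could look roughly like the sketch below. It assumes the default /var/spool/postfix spool (QUEUEDIR is a hypothetical override) and uses shell arithmetic instead of expr; it is an illustration, not the exact script used above.

```shell
#!/bin/bash
# Sketch of a du-based getmailq.sh that reports each queue's size in bytes.
# QUEUEDIR is a hypothetical override; the default is the usual Postfix spool.
QUEUEDIR=${QUEUEDIR:-/var/spool/postfix}
QUEUES="incoming active deferred hold"

# queue_bytes DIR -> prints the size of DIR in bytes (du -k result * 1024)
queue_bytes() {
    local kb
    # du -ks prints "<KB><TAB><path>". Messages that postfix removes while du
    # is scanning cause "cannot access" errors on stderr, so discard those.
    kb=$(du -ks "$1" 2>/dev/null | awk '{ print $1 }')
    # A missing directory produces no output at all; report 0 in that case.
    echo $(( ${kb:-0} * 1024 ))
}

for i in $QUEUES; do
    printf '%s:%s ' "$i" "$(queue_bytes "$QUEUEDIR/$i")"
done
```

Since this version already reports bytes, the "Multiply by 1024" CDEF would not be needed in the Graph Template as well.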
don't underestimate the resource requirements of qshape.pl. by changing the methodology within cacti's polling, i cut the CPU utilization of my server dramatically (~40% to ~12%). the thought that almost 30% of my CPU was occupied solely by statistics gathering is appalling. the best solutions are the ones that have no noticeable impact on the actual role the system is supposed to play.
Adrian Goins - President / CEO
Arces Network, LLC
http://www.arces.net
monachus
Posts: 42
Joined: Mon Sep 06, 2004 1:27 am
Location: New York, NY
Contact:

Post by monachus »

i've had the new system in place for a couple of days now, and i'm quite pleased with it. here's a post of the graph when tracking the queue sizes in MB, as well as a graph showing the CPU cycles go up when i installed the qshape-dependent version, and drop when i modified it to use du instead.
Attachments
Postfix queues with du instead of qshape
gravity_pfq.png (41.55 KiB)
CPU recovered by using du instead of qshape
gravity_cpu.png (32.47 KiB)
jehan.procaccia
Posts: 22
Joined: Tue Nov 13, 2007 10:19 am

postfix queues

Post by jehan.procaccia »

I had a similar problem: my deferred and hold queues were too large, so qshape took a very long time to respond :-( . I had to add -t 60 to the snmpwalk command, and even that was not enough...
So I replaced qshape with a simple "find" command:
find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l
So the full script on the snmp agent side is now:
#!/bin/bash
QUEUES="incoming active deferred hold"
for i in $QUEUES; do
COUNT=`find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l`
# COUNT is left unquoted below so any padding added by wc is dropped
printf '%s:%s ' "$i" $COUNT
done
It now runs in about 2 seconds, instead of sometimes more than 5 minutes with qshape.

I also had to modify the data template for deferred and hold to raise the maximum from 2000 to 20000 (x10!).
That increase only applies to new data sources; for existing ones, you have to tune the RRD file, for example:
bash-3.1# rrdtool info smtp1_sodome_incoming_732.rrd | grep deferred
ds[deferred].max = 2.0000000000e+03
ds[deferred].last_ds = "14913"

bash-3.1# rrdtool tune smtp1_sodome_incoming_732.rrd --maximum deferred:20000

bash-3.1# rrdtool info smtp1_sodome_incoming_732.rrd | grep deferred
ds[deferred].max = 2.0000000000e+04
ds[deferred].last_ds = "14746"
ds[deferred].value = 1.4746000000e+04

Now it works for long queues!
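With many hosts, running rrdtool tune by hand on each file gets tedious. A minimal loop, assuming the RRD files are passed on the command line, might look like the sketch below. The data source names (deferred, hold) and the 20000 limit come from this post; NEW_MAX and RRDTOOL are hypothetical overrides.

```shell
#!/bin/bash
# Sketch: apply the "rrdtool tune --maximum" fix to many existing RRD files.
# NEW_MAX and RRDTOOL are hypothetical overrides (RRDTOOL=echo gives a dry run).
NEW_MAX=${NEW_MAX:-20000}

# tune_queue_max RRDFILE...
tune_queue_max() {
    local rrd ds
    for rrd in "$@"; do
        for ds in deferred hold; do
            # Files without this data source make rrdtool fail; report and move on.
            ${RRDTOOL:-rrdtool} tune "$rrd" --maximum "$ds:$NEW_MAX" ||
                echo "could not tune $ds in $rrd" >&2
        done
    done
}

tune_queue_max "$@"
```

Usage might be something like `NEW_MAX=20000 ./tune_queue_max.sh /path/to/cacti/rra/*_incoming_*.rrd`, where the script name and rra path are placeholders and the glob assumes files named like smtp1_sodome_incoming_732.rrd. Running it as the user that owns the RRD files, after taking a backup, seems prudent.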
bschlect
Posts: 1
Joined: Tue Nov 27, 2007 4:21 pm
Location: Moscow, ID

Post by bschlect »

cviebrock wrote: I've got this problem too:

Code:

# snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.53.101.0
UCD-SNMP-MIB::ucdavis.53.101.0.1.1 = INTEGER: 1
UCD-SNMP-MIB::ucdavis.53.101.0.2.1 = STRING: "mailq"
UCD-SNMP-MIB::ucdavis.53.101.0.3.1 = STRING: "/usr/local/bin/getmailq.sh"
UCD-SNMP-MIB::ucdavis.53.101.0.100.1 = INTEGER: 0
UCD-SNMP-MIB::ucdavis.53.101.0.101.1 = STRING: "Can't cd to incoming: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.2 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.3 = STRING: "incoming:0 Can't cd to active: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.4 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.5 = STRING: "active:0 Can't cd to deferred: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.6 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.7 = STRING: "deferred:0 Can't cd to hold: Permission denied"
UCD-SNMP-MIB::ucdavis.53.101.0.101.8 = STRING: " at /usr/sbin/qshape line 286"
UCD-SNMP-MIB::ucdavis.53.101.0.101.9 = STRING: "hold:0 "
UCD-SNMP-MIB::ucdavis.53.101.0.102.1 = INTEGER: 0
UCD-SNMP-MIB::ucdavis.53.101.0.103.1 = ""
I tried adding the "snmp" user to the "postfix", the "adm" and even the "root" group, with no success.

The directories have the following perms:

Code:

xxx:/var/spool/postfix/deferred# ls -las
total 72
4 drwx------  18 postfix root    4096 2006-07-20 00:09 .
4 drwxr-xr-x  19 root    root    4096 2006-11-08 11:33 ..
4 drwx------   2 postfix postfix 4096 2006-12-06 22:59 0
4 drwx------   2 postfix postfix 4096 2006-12-06 22:59 1
etc.
Any suggestions?

I fixed this issue by modifying the getmailq.sh script, so that qshape is run as root with sudo:

Code:

 COUNT=`sudo qshape $i | grep TOTAL | awk '{print $2}'`
I also had to modify my /etc/sudoers file. Adding the following line at the end of the file will allow the snmp user to run the qshape script as root without giving a password:

Code:

 snmp    ALL=(ALL) NOPASSWD:/usr/sbin/qshape 
NB: To edit /etc/sudoers, use the 'visudo' command. When you are done editing, sudo will automatically re-scan the file.
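A slightly more defensive variant of the sudo-based loop is sketched below. Both tweaks are suggestions, not part of the original post: `sudo -n` makes sudo fail immediately instead of waiting for a password until Cacti gives up, and only the row whose first field is exactly TOTAL is parsed. The full path /usr/sbin/qshape matches the sudoers line above; the helper name qshape_total is hypothetical.

```shell
#!/bin/bash
# Sketch of the sudo-based getmailq.sh with two defensive tweaks.
QUEUES="incoming active deferred hold"

# qshape_total: reads qshape output on stdin and prints the TOTAL message count
# (second field of the TOTAL row), or 0 if no TOTAL row was seen.
qshape_total() {
    awk '$1 == "TOTAL" { print $2; found = 1 } END { if (!found) print 0 }'
}

for i in $QUEUES; do
    # Full path so the command matches the sudoers entry; -n never prompts.
    COUNT=$(sudo -n /usr/sbin/qshape "$i" 2>/dev/null | qshape_total)
    printf '%s:%s ' "$i" "$COUNT"
done
```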

Brian Schlect
Economic Modeling Specialists Inc.
Moscow, ID
gninja
Cacti User
Posts: 371
Joined: Tue Aug 24, 2004 5:02 pm
Location: San Francisco, CA

Post by gninja »

I installed the template for this, and the graph template only shows one data source/line item.

Anyone have a working template for this, or do I need to fix it myself?
FreeBSD/RHEL
cacti-0.8.7i, spine 0.8.7i, PIA 3.1+boost 5.1
MySQL 5.5/InnoDB
RRDtool 1.2.27, PHP 5.1.6
yodalives
Posts: 1
Joined: Tue Dec 04, 2007 12:23 pm

Post by yodalives »

jehan.procaccia wrote: So now I replaced qshape with a simple "find" command:
find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l
I just tried this and it seems to be significantly faster:

( COUNT=`find $QUEUEDIR/$i -type f | wc -l | sed 's/ *//g'`)

Code:

# timex /usr/local/getmailq-find.sh
incoming:228 active:8 deferred:34 hold:0 
real        0.29
user        0.00
sys         0.07
(COUNT=`qshape $i | grep TOTAL | awk '{print $2}'`)

Code:

# timex /usr/local/getmailq-qshape.sh
incoming:0 active:5 deferred:35 hold:0 
real        1.99
user        0.71
sys         0.32
The problem is that find is not reporting the correct number in the incoming queue. There are 228 files in the incoming dir but qshape reports that there are 0 in the incoming queue. Does anybody know how qshape determines what files are in the incoming queue?

I've looked at the qshape script and it's too much for me to pick apart. It would be nice to figure out how qshape determines real incoming messages and incorporate that into a script based on find, because find is way faster.
doughairfield
Posts: 20
Joined: Wed Feb 20, 2008 1:57 pm

One Minute Intervals

Post by doughairfield »

My install of cacti polls every minute with spine, thus the graph for postfix is coming out chopped up. Is there a way to convert the template to work for one minute polling? Or is it something else?
feld
Posts: 3
Joined: Thu Feb 21, 2008 12:01 pm

Post by feld »

I couldn't get it to work until I hard-coded the cacti path in the XML file. It wasn't picking it up for some odd reason...

All works well now (CentOS5) and I didn't have to use the "updated" method. Just followed the first post.
schrom
Posts: 1
Joined: Wed Feb 27, 2008 8:57 am

Another suggestion for the rights problem

Post by schrom »

Hi,

I'm using this script to poll a bunch of mail servers with very large deferred queues (thx to yahoo :evil: ). First of all, thanks to jehan.procaccia for his suggestions on large queues; they worked fine for me.

I modified the getmailq script to scan the queues with

Code:

find /var/spool/postfix/$i -maxdepth 5 -type f | wc -l
which is very fast, even for large queues. To get around the permissions problem that occurs when you run this script as the snmp user, who has no access to the Postfix queue directories under /var/spool/postfix (which is a good thing after all), I simply wrote a small script that cron runs every minute. It runs the getmailq script as root and writes the result to a file the snmp user can read. In snmpd.conf you have to add

Code:

exec .1.3.6.1.4.1.2021.53.101.0 mailq PathToScriptThatIsAbleToReadTheResultFile

and you are done. Since cacti checks every 5 minutes, a 1-minute cron job on the remote machine is fine.
An incredible script, which is executed as a cron job every minute, might look like this:

Code:

/opt/mailqcheck/getmailq.sh > /opt/mailqcheck/Mailqueuestats
A magical script, which is able to read the result file, might look like this:

Code:

#!/bin/bash
# print the stats file verbatim (printf "$stats" would misread any % in it)
cat /opt/mailqcheck/Mailqueuestats
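One possible refinement of the cron side: with a plain `>` redirect, snmpd can read the stats file while it is truncated or half-written and return an empty value. A sketch that writes to a temp file and then renames it is below. The paths follow the post; publish.sh and publish_stats are hypothetical names, and the atomic-rename step is an addition, not part of the original approach.

```shell
#!/bin/bash
# Sketch of the root cron job: run getmailq.sh and replace the stats file
# atomically, so the snmp-side reader never sees a partial file.
STATS=/opt/mailqcheck/Mailqueuestats

# publish_stats OUTFILE COMMAND [ARGS...]
publish_stats() {
    local out=$1 tmp
    shift
    # Create the temp file next to OUTFILE so mv is a same-filesystem rename.
    tmp=$(mktemp "$out.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        chmod 644 "$tmp"    # mktemp creates mode 0600; the snmp user must read it
        mv -f "$tmp" "$out"
    else
        rm -f "$tmp"        # keep the previous stats if the command failed
        return 1
    fi
}

# Example root crontab entry (hypothetical script name):
# * * * * * /opt/mailqcheck/publish.sh
if [ -x /opt/mailqcheck/getmailq.sh ]; then
    publish_stats "$STATS" /opt/mailqcheck/getmailq.sh
fi
```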
sterpstra
Posts: 45
Joined: Tue May 27, 2008 11:48 pm
Location: So Cal

Postfix Queues

Post by sterpstra »

Hi all...

So I've been beating my head against the wall with this one and now it's time to get some smarter people involved. I'm trying to create Postfix Q monitoring for several servers.

I have NET-SNMP installed, enabled and working. I have added all the files as specified in this post, but the piece in the snmpd.conf file is not working. I am able to run the getmailq.sh file without any errors.

Using the original script in postfixqueues.sh, I get this: UCD-SNMP-MIB::ucdavis.53.101.0.101.1 = No more variables left in this MIB View (It is past the end of the MIB tree)

Using the revised script in this post referencing NET-SNMP and changing exec to extend, I get this: "No Such Object available on this agent at this OID"

Is it a correct assumption that this is the same error, just with different verbiage? I assume that when the postfixqueues.sh command is run from within Cacti, that script is supposed to call for the getmailq.sh script to run through the MIB or something. Is that correct?

HELP!!! This is probably the last piece of Cacti I need to get working so that I can have a full Cactus..

OK, bad humour.

It seems like the call to that exec is not being recognized, not working, whatever... I
sterpstra
Posts: 45
Joined: Tue May 27, 2008 11:48 pm
Location: So Cal

Anyone...

Post by sterpstra »

Sorry for the bump but I would like to get this template working before I go over to the Mailgraph-based one...

Anyone have any ideas?
koocotte
Posts: 4
Joined: Mon Jun 16, 2008 5:29 am

Post by koocotte »

yodalives wrote: I just tried this and it seems to be significantly faster:

( COUNT=`find $QUEUEDIR/$i -type f | wc -l | sed 's/ *//g'`)
Along the same lines, to improve performance, I wrote a C program to do this. It spawns no subprocesses, so:
  • It should be faster
  • It reports both the count and the disk space (in bytes)
  • It can be setuid postfix, so no more sudo is needed
It has only been tested on Linux:

Code:

gcc -Wall -O2 postfix_queue.c -o postfix_queue
install --mode 4755 --owner postfix --strip postfix_queue <path_cacti>/scripts/
Output:

Code:

incoming:0 incomingSize:0 active:0 activeSize:0 deferred:1 deferredSize:4096 hold:0 holdSize:0
Attachments
postfix_queue.c
Source code
(2.59 KiB)
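The attached C source is not reproduced here. Where installing a setuid binary is not an option, a rough shell approximation of the same output format might look like the sketch below. It assumes GNU find (for -printf), reports apparent file size rather than allocated disk space (so the Size values may differ from the C program's), spawns subprocesses, and still has to run as a user that can read the queues (root via cron or sudo, as discussed earlier in the thread). queue_stats and QUEUEDIR are hypothetical names.

```shell
#!/bin/bash
# Rough shell counterpart of postfix_queue's output: message count plus total
# apparent size in bytes per queue. Requires GNU find for -printf.
QUEUEDIR=${QUEUEDIR:-/var/spool/postfix}

# queue_stats DIR -> prints "<file count> <total bytes>"
queue_stats() {
    # One line per regular file holding its size; awk counts and sums them.
    # %.0f avoids integer overflow in awks that use 32-bit %d.
    find "$1" -type f -printf '%s\n' 2>/dev/null |
        awk '{ n++; s += $1 } END { printf "%d %.0f\n", n, s }'
}

for i in incoming active deferred hold; do
    stats=$(queue_stats "$QUEUEDIR/$i")
    printf '%s:%s %sSize:%s ' "$i" "${stats% *}" "$i" "${stats#* }"
done
```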
shiro
Posts: 1
Joined: Sat May 02, 2009 8:53 am

Adding "Sent" using mailstats.pl

Post by shiro »

Hi All,

I am trying to add a "Sent" count to this graph.

What I've done:
- download http://taz.net.au/postfix/mrtg/update-mailstats.pl
- download http://taz.net.au/postfix/mrtg/mailstats.pl
- put those two scripts above in /usr/local/bin
- change the permissions of the scripts

Code:

$ sudo chmod 755 /usr/local/bin/update-mailstats.pl
$ sudo chmod 755 /usr/local/bin/mailstats.pl
- run the script in the background

Code:

$ sudo update-mailstats.pl&
- modify /usr/local/bin/getmailq.sh

Code:

#!/bin/bash

QUEUES="incoming active deferred hold"

for i in $QUEUES; do
        COUNT=`sudo qshape $i | grep TOTAL | awk '{print $2}'`
        printf "$i:$COUNT "
done

SENT=`sudo mailstats.pl | sed -e '/local/d' | grep SENT | awk '{print $2}'`
echo -e "sent:$SENT"
- test script modification

Code:

$ sudo /usr/share/cacti/site/scripts/postfixqueues.sh localhost
incoming:0 active:0 deferred:28 hold:0 sent:627
- use the data and graph templates I attached

but to be honest, I am not so sure the data is accurate. Does anyone have a suggestion?
Attachments
cacti_data_template_postfix_queues.xml
(7.54 KiB)
cacti_graph_template_postfix_queues.xml
(21.63 KiB)
graph_image.php.png (35.89 KiB)
islandsound
Posts: 5
Joined: Tue Apr 08, 2008 3:12 pm

Re: Adding "Sent" using mailstats.pl

Post by islandsound »

I am thinking your sent stats are going to be the total over time and not the count for the 5-minute interval in which cacti is polling. I am not sure if that's what you wanted, but it seems odd to track the total and not the current value. If that's the case, I am not sure why you would need separate scripts to do that. All you would need to do is go back 5 minutes in the log file, search for anything sent and spit out a number. I like your idea though; it seems stupid that qshape doesn't offer this out of the box.
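The "go back 5 minutes in the log" idea could be sketched roughly as below. Assumptions: classic syslog timestamps (e.g. "May  2 08:53:01"), Postfix delivery lines containing status=sent, GNU date for `-d "N minutes ago"`, and a MAILLOG path that is only a hypothetical default (many systems log to /var/log/mail.log instead). The snmp user would also need read access to that log.

```shell
#!/bin/bash
# Sketch: count Postfix "status=sent" log lines from the last five minutes
# instead of reporting a running total.
MAILLOG=${MAILLOG:-/var/log/maillog}

# count_sent LOGFILE PREFIX...
# Counts lines whose first 12 characters ("Mon DD HH:MM") equal one of the
# given prefixes and that contain status=sent.
count_sent() {
    local log=$1 list= p
    shift
    for p in "$@"; do list="$list|$p"; done
    awk -v list="$list" '
        BEGIN {
            n = split(list, parts, "|")
            for (i = 1; i <= n; i++) if (parts[i] != "") want[parts[i]] = 1
        }
        (substr($0, 1, 12) in want) && /status=sent/ { c++ }
        END { print c + 0 }
    ' "$log"
}

# Builds "Mon DD HH:MM" prefixes for the five full minutes before now.
sent_last_5min() {
    local m
    set --
    for m in 1 2 3 4 5; do
        set -- "$@" "$(LC_ALL=C date -d "$m minutes ago" '+%b %e %H:%M')"
    done
    count_sent "$MAILLOG" "$@"
}

if [ -r "$MAILLOG" ]; then
    printf 'sent:%s\n' "$(sent_last_5min)"
fi
```

Matching whole minutes keeps the parsing simple; with polls 5 minutes apart, each minute should be counted roughly once, although polls that drift across minute boundaries could count or miss a minute.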