smartmontools - hdd temperature checker script

Post by maXXmaster »

I wrote a script that uses smartmontools to grep the temperature
of a given hard disk. It requires the smartmontools package (smartctl)
and needs root privileges to read the SMART data of the drive. As you
probably won't run the script as root, run the smartctl command
through sudo.

The script needs one argument. In my case it is "-d scsi /dev/sda"
for one of the SCSI drives and "-d 3ware,0 /dev/sdc" for an IDE drive in my
3ware RAID.

As I could only include one file, I've attached the template and will include
the PHP code of the script within my post. There is one more value you can
configure in the source: the maximum allowed temperature, in case you think
hard drives are allowed to be hotter than 40 degrees Celsius.

hdtemp.php:

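#!/usr/local/bin/php -f
<?php
// the smartctl arguments, passed as one string, e.g. "-d scsi /dev/sda"
$parm = $_SERVER["argv"][1];
$max_allowed = 40; // <- maximum temperature for drives...

// exec() returns the last line of output, i.e. the line holding the temperature
$line = exec("/usr/bin/sudo /usr/local/sbin/smartctl -a $parm | egrep -i \"Celsius|Current\" | grep Temp");
// the value is the last space-separated field of that line
$fields = explode(" ", trim($line));
$temp = intval($fields[sizeof($fields) - 1]);
echo "temp:" . $temp . " allowed:" . $max_allowed;
?>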

* Change the path to the PHP CLI binary as needed. -f keeps the script from showing the X-Powered-By stuff.
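
For readers without PHP at hand, the parsing step of hdtemp.php can be sketched in plain sh. This is only a sketch under assumptions: the function name format_temp is made up, the filter is done in awk instead of the egrep/grep pipe, and the function reads smartctl output from stdin instead of calling sudo and smartctl itself. The sample line is copied from the smartctl output posted further down in this thread.

```shell
#!/bin/sh
# Sketch of the parsing step in hdtemp.php (format_temp is a hypothetical name).
# Reads smartctl output on stdin and prints "temp:<n> allowed:<max>".
max_allowed=40   # same default as in hdtemp.php

format_temp() {
    # Keep lines that mention Celsius or Current (any case) and contain "Temp",
    # remember the last field of the last such line, print it as an integer.
    awk -v max="$1" '
        tolower($0) ~ /celsius|current/ && /Temp/ { last = $NF }
        END { printf "temp:%d allowed:%d\n", last + 0, max }'
}

# Sample line taken from the "smartctl -a" output in this thread
sample='194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       41'
printf '%s\n' "$sample" | format_temp "$max_allowed"
```

In real use the sample would be replaced by the sudo smartctl call from the script above, piped into format_temp.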

As I mentioned above, you have to be able to run smartctl as a non-root
user, so sudo it. Here is the line you have to add to /etc/sudoers:

nobody ALL=NOPASSWD:/usr/local/sbin/smartctl

* nobody can be replaced by the user the data gathering runs as.


I hope this post helps and you'll use it ;o)
Attachments:
cacti_data_template_hdtemp_mit_option.xml: the data template (4.35 KiB)
cacti_graph_template_hdtemp.xml: the template for graphs and so on (7.3 KiB)

Post by sumsum »

Hello,

this is what I had been searching for for weeks! Thanks a lot.

I got the script partially running:

1. Import of the XML (successful)
2. I had to change the path for smartctl in the PHP script to /usr/sbin/smartctl (successful)
3. Generating the graph: empty, but ... (successful)
4. sudo is working fine for the user that runs the data gathering

>> After 1h there is still no data in the graph!

Debugging:

For debugging I checked the command /usr/bin/sudo /usr/sbin/smartctl -a -d ata /dev/hda

Result: (successful)

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6E040L0
Serial Number:    E1Q48FPE
Firmware Version: NAR61EA0
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Sep 14 00:09:08 2004 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1021) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  17) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       974
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   247   187    Pre-fail  Always       -       57223
  9 Power_On_Minutes        0x0032   247   247   000    Old_age   Always       -       1040h+08m
 10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   252   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       29
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       41
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       3402
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       7
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1926         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Then I tried the whole command:
/usr/bin/sudo /usr/sbin/smartctl -a -d ata /dev/hda | egrep -i \"Celsius|Current\" | grep Temp

Result:
/usr/bin/sudo /usr/sbin/smartctl -a -d ata /dev/hda | egrep -i \"Celsius|Current\" | grep Temp
bash: Current": command not found


I'm sure "the devil is in the details" ;)

Where did I make the mistake?
Thank you for your help

Post by maXXmaster »

.) Well, inside a PHP string a " has to be escaped as \". On the shell the backslashes
must go, so if you are trying it there, just do:

... egrep -i "Celsius|Current" | grep Temp

or (if you are more familiar with regular expressions) just write one egrep pattern which does it all in one go, without the pipe...
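
One way to fold both filters into a single pattern, as a sketch: because -i then applies to the whole expression, "Temp" is matched case-insensitively too, which is slightly looser than the original pipe. The function name temp_lines is made up for the example; the two sample lines are taken from the smartctl output posted above.

```shell
#!/bin/sh
# Hypothetical helper: one grep -E call instead of "egrep ... | grep Temp".
# Matches lines that contain "temp" and either "celsius" or "current", in any order.
temp_lines() {
    grep -E -i 'temp.*(celsius|current)|(celsius|current).*temp'
}

# Only the Temperature_Celsius line should come through;
# Current_Pending_Sector contains "Current" but no "Temp".
printf '%s\n' \
    '194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       41' \
    '197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0' \
    | temp_lines
```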


.) Still, the PHP file should have worked, so try executing it directly (without Cacti's cmd.php) by just starting the script ;o)


PS: kind regards to Switzerland ;o)

Post by maXXmaster »

You also have to pass one parameter to the script. In my case it would be:

hdtemp.php "-d scsi /dev/sda" for the first SCSI drive, or
hdtemp.php "-d 3ware,0 /dev/sdc" for the first IDE drive in my 3ware RAID, or
hdtemp.php "/dev/hda" for a regular IDE drive...

If you don't pass this parameter, the script doesn't know what to read. So maybe your mistake was in Cacti, where you have to enter that parameter.


A friend of mine hinted that you can speed things up by running smartctl -A instead of smartctl -a, which makes smartctl read less information from the drive...
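
The two points of this post (one quoted argument, and -A instead of -a) can be put together in a small sketch. The function name build_cmd is made up for the example, and it only prints the command line it would run, so it can be tried without smartctl or sudo installed. The paths are the ones from hdtemp.php and may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: build the smartctl call for one drive.
# $1 is the single script argument, e.g. "-d scsi /dev/sda" or "/dev/hda".
build_cmd() {
    if [ -z "${1:-}" ]; then
        echo 'usage: pass the drive as one quoted argument, e.g. "-d scsi /dev/sda"' >&2
        return 1
    fi
    # -A limits the output to the SMART attributes, which is enough for the temperature
    echo "/usr/bin/sudo /usr/local/sbin/smartctl -A $1"
}

build_cmd "-d scsi /dev/sda"
build_cmd "/dev/hda"
```

Called without an argument, build_cmd prints a usage hint to stderr and returns a non-zero status instead of a command.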

cheers

Post by guest »

My graphs render, but they do not contain any data.
If only some of your graphs are not updating correctly, double-check the Maximum Value field for all data sources used by these graphs. If the value being fed to the .rrd file exceeds its Maximum Value, RRDTool will insert an Unknown and you will see no data on the graph.
In your script the Maximum Value stands at 50 ==> Max Value is 40 + x = 50 ==> if the temperature is above 10 °C you SEE NOTHING

It took me about 2 days to track down this error ^^

But thx for this script anyway :)
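
The Maximum Value rule from this post can be mimicked in a few lines of sh. This only illustrates the rule and is not RRDTool code; the function name rrd_filter and the sample numbers are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical illustration of the Maximum Value rule: a value above the
# data source's maximum is stored as unknown ("U"), so nothing is drawn.
rrd_filter() {
    value=$1
    max=$2
    if [ "$value" -gt "$max" ]; then
        echo "U"
    else
        echo "$value"
    fi
}

rrd_filter 41 50   # within range, kept as 41
rrd_filter 41 40   # above the maximum, becomes U
```

So a drive at 41 degrees is only graphed if the data source's Maximum Value is 41 or higher.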

Post by Guest »

Ohh man, I got it wrong, the script stopped working once again ...

I don't get it ... this script works, then I add additional drives, and the script stops working :(((

my script for smartctl (on FreeBSD)

Post by guest »

Here's my shell script for running smartctl, developed on FreeBSD (haven't tried it on any other OS yet):

----- begin script -----
#!/bin/sh
temp_celsius="`/usr/local/bin/sudo /usr/local/sbin/smartctl -A $1 | egrep Temperature_Celsius | cut -c86-`"
echo "$temp_celsius"
------ end script ------

The other option is to use awk instead of cut:

sudo /usr/local/sbin/smartctl -A ad4 | egrep Temperature_Celsius | awk '{print $10}'

and really the 'echo "$temp_celsius"' doesn't need to be there, it could just be one line.
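
A slightly more defensive variant of the awk approach, as a sketch: it selects the row by attribute ID 194 instead of by name and prints U when no such row exists. The function name raw_temp is invented for the example, printing U for a missing value is an assumption, and the raw value is assumed to be the tenth column, as in the smartctl output posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -A" output on stdin and print the raw
# value (10th column) of attribute 194, or "U" if the attribute is missing.
raw_temp() {
    awk '$1 == 194 && !found { print $10; found = 1 }
         END { if (!found) print "U" }'
}

# Sample rows copied from the smartctl output in this thread
printf '%s\n' \
    '193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       29' \
    '194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       41' \
    | raw_temp
```

In the script above this would replace the egrep and awk part, i.e. the smartctl -A output is piped straight into raw_temp.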

I've added the following to sudoers:
----- begin sudoers adds -----
Cmnd_Alias CACTI_CMDS = /usr/local/sbin/smartctl -A [a-z][a-z][0-9]
cacti ALL = NOPASSWD: CACTI_CMDS
----- end sudoers adds -----

One little thing

Post by Bobi wan Kenobi »

Great job Maxxmaster. Thanks for the templates.
I just finished my hdtemp setup.

Your instructions were slightly incomplete.
I changed the input_string tag in both templates to suit my cacti installation.
This is the point when data appeared in graphs.
Maybe sumsum referred to this.
It looks good.

Rgds.

Post by Guest »

thx a lot
working great

Post by jul »

Hi,
I have a problem drawing the graph from this script:

#!/bin/sh
sudo /usr/local/sbin/smartctl -A /dev/ad6 | egrep Temperature_Celsius | awk '{print $10}'
When executed as user cacti, it gives the correct result:
root@hal9000# /usr/local/share/cacti/scripts/hddtemp.sh
40
But the graph remains blank and says nan...

Here's Cacti's log.

Log level LOW:
04/26/2006 02:05:01 AM - CMDPHP: Poller[0] Host[0] DS[35] WARNING: Result from CMD not valid. Partial Result:

Log level DEBUG:
04/26/2006 01:50:02 AM - CMDPHP: Poller[0] Host[0] DS[35] WARNING: Result from CMD not valid. Partial Result:
04/26/2006 01:50:02 AM - CMDPHP: Poller[0] Host[0] DS[35] CMD: /usr/local/share/cacti/scripts/hddtemp.sh, output: U
04/26/2006 01:50:02 AM - CMDPHP: Poller[0] DEBUG: SQL Exec: "insert into poller_output (local_data_id,rrd_name,time,output) values (35,'hdd_temp','2006-04-26 01:50:01','U')"

Please help me!

FreeBSD 6.0-RELEASE-p6
cacti-0.8.6h_4
apache-2.0.55_4
php4-4.4.2_1

Post by gotchi »

guest wrote:
My graphs render, but they do not contain any data.
If only some of your graphs are not updating correctly, double-check the Maximum Value field for all data sources used by these graphs. If the value being fed to the .rrd file exceeds its Maximum Value, RRDTool will insert an Unknown and you will see no data on the graph.
In your script the Maximum Value stands at 50 ==> Max Value is 40 + x = 50 ==> if the temperature is above 10 °C you SEE NOTHING

It took me about 2 days to track down this error ^^

But thx for this script anyway :)

Sorry, but I think I am too silly.

What does this mean, and where do I have to change the values?
I changed the value for allowed in the PHP script to 50.
What else should I change, and where? My hard disk has a temperature between 30 and 45.

Post by gotchi »

Hi,

I tried everything, really everything, but this error is driving me crazy:
07/31/2006 08:20:13 PM - CACTID: Poller[0] Host[1] ERROR: Empty result [127.0.0.1]: '/usr/share/cacti/site/scripts/hdtemp.php /dev/hda'

I set up sudo to let www-data run smartctl as root. It works.
It's a Debian sarge installation and I installed it via aptitude.

All other scripts are working; only this script won't work with the poller. Starting it as root: all fine.
Starting it as a user whom I also added to sudoers: all fine.

What can I do?

Post by gotchi »

OK, now it's really late here in Austria.

The hd temp script worked for a short time. By now I have got all errors out of cacti.log except one:

08/01/2006 01:00:15 AM - CACTID: Poller[0] Host[1] WARNING: Result from SCRIPT not valid. Partial Result: temp:42 allowed:50 ...

And this is driving me crazy, because with this error no graph can be drawn.

Please, please help me :cry: :cry:

Post by gotchi »

New information:
I'm using cacti 086c and cactid 086d from the Debian sarge install.

Using cmd.php as the poller works without this error.

I'm going to bed. Perhaps someone will have a solution for me tomorrow.

Thanks