Submit Your CMD.PHP vs. SPINE Metrics Here

MrRat
Cacti User
Posts: 135
Joined: Thu Jan 07, 2010 10:33 am

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by MrRat »

11/20/2015 11:24:21 AM - SYSTEM STATS: Time:19.8991 Method:spine Processes:1 Threads:50 Hosts:1321 HostsPerProcess:1321 DataSources:77508 RRDsProcessed:0
11/20/2015 11:23:24 AM - SYSTEM STATS: Time:23.0834 Method:spine Processes:1 Threads:50 Hosts:1321 HostsPerProcess:1321 DataSources:77509 RRDsProcessed:0
11/20/2015 11:22:01 AM - SYSTEM BOOST STATS: Time:87.4000 RRDUpdates:1008040
rony
Developer/Forum Admin
Posts: 6022
Joined: Mon Nov 17, 2003 6:35 pm
Location: Michigan, USA

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by rony »

MrRat wrote:
11/20/2015 11:24:21 AM - SYSTEM STATS: Time:19.8991 Method:spine Processes:1 Threads:50 Hosts:1321 HostsPerProcess:1321 DataSources:77508 RRDsProcessed:0
11/20/2015 11:23:24 AM - SYSTEM STATS: Time:23.0834 Method:spine Processes:1 Threads:50 Hosts:1321 HostsPerProcess:1321 DataSources:77509 RRDsProcessed:0
11/20/2015 11:22:01 AM - SYSTEM BOOST STATS: Time:87.4000 RRDUpdates:1008040
Damn! Nice!
Tony Roman
Experience is what causes a person to make new mistakes instead of old ones.
There are only 3 ways to complete a project: Good, Fast or Cheap, pick two.
With age comes wisdom, what you choose to do with it determines whether or not you are wise.
dragossto
Cacti User
Posts: 86
Joined: Tue May 15, 2007 5:24 am
Location: Romania

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by dragossto »

SYSTEM STATS: Time:28.8599 Method:spine Processes:4 Threads:18 Hosts:296 HostsPerProcess:74 DataSources:9167 RRDsProcessed:4746
SYSTEM STATS: Time:32.2797 Method:cmd.php Processes:16 Threads:N/A Hosts:296 HostsPerProcess:19 DataSources:9163 RRDsProcessed:8192
Imperial
Posts: 6
Joined: Wed Jan 12, 2011 9:22 am
Location: Vienna - Austria

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by Imperial »

08/17/2016 06:46:04 PM - SYSTEM THOLD STATS: Time:4.8714 Tholds:1459 TotalHosts:4607 DownHosts:67 NewDownHosts:0
08/17/2016 06:45:59 PM - SYSTEM STATS: Time:57.5754 Method:spine Processes:8 Threads:10 Hosts:4608 HostsPerProcess:576 DataSources:273959 RRDsProcessed:134509
08/17/2016 06:40:53 PM - SYSTEM THOLD STATS: Time:4.0979 Tholds:1459 TotalHosts:4607 DownHosts:67 NewDownHosts:0
08/17/2016 06:40:49 PM - SYSTEM STATS: Time:48.6012 Method:spine Processes:8 Threads:10 Hosts:4608 HostsPerProcess:576 DataSources:273959 RRDsProcessed:134509
08/17/2016 06:36:12 PM - SYSTEM THOLD STATS: Time:4.9235 Tholds:1459 TotalHosts:4607 DownHosts:67 NewDownHosts:0
08/17/2016 06:36:07 PM - SYSTEM STATS: Time:66.5020 Method:spine Processes:8 Threads:10 Hosts:4608 HostsPerProcess:576 DataSources:273959 RRDsProcessed:134384

Cacti Version 0.8.8f
Cacti OS unix
Attachment: graph_pie.png (16.21 KiB)
Rno
Cacti Pro User
Posts: 692
Joined: Wed Dec 07, 2011 9:19 am

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by Rno »

With Spine I got:
02/02/2017 12:57:05 PM - SYSTEM STATS: Time:3.6141 Method:spine Processes:10 Threads:15 Hosts:186 HostsPerProcess:19 DataSources:2214 RRDsProcessed:1268
jozatan
Posts: 11
Joined: Sun Apr 09, 2017 1:10 am
Location: San Diego, CA

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by jozatan »

OS: CentOS Linux release 7.4.1708 (Core)
CPU: Intel(R) Xeon(R) CPU E5-2403 0 @ 1.80GHz, 8g ram, 8g swap
Kernel: 3.10.0-693.17.1.el7.x86_64
cacti 1.1.36, spine 1.1.36

Before (cmd.php):

cacti.log:03/05/2018 14:50:03 - SYSTEM STATS: Time:2.2430 Method:cmd.php Processes:1 Threads:N/A Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
cacti.log:03/05/2018 14:55:03 - SYSTEM STATS: Time:2.2305 Method:cmd.php Processes:1 Threads:N/A Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
cacti.log:03/05/2018 15:00:04 - SYSTEM STATS: Time:2.2336 Method:cmd.php Processes:1 Threads:N/A Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57

After (spine):

cacti.log:03/05/2018 15:05:03 - SYSTEM STATS: Time:2.2340 Method:spine Processes:1 Threads:1 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
cacti.log:03/05/2018 15:10:04 - SYSTEM STATS: Time:2.2398 Method:spine Processes:1 Threads:1 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
cacti.log:03/05/2018 15:15:04 - SYSTEM STATS: Time:2.2391 Method:spine Processes:1 Threads:1 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57

After adjusting spine threads from 1 to 10:

03/05/2018 15:20:02 - SYSTEM STATS: Time:1.2320 Method:spine Processes:1 Threads:10 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
03/05/2018 15:25:03 - SYSTEM STATS: Time:1.2360 Method:spine Processes:1 Threads:10 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
03/05/2018 15:30:02 - SYSTEM STATS: Time:1.2416 Method:spine Processes:1 Threads:10 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57
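
For anyone who wants to compare runs like the ones above without eyeballing the log, here is a minimal Python sketch that parses SYSTEM STATS lines and averages the poll time per (Method, Threads) combination. The helper names (parse_stats, mean_time_by_config) are made up for this example and are not part of Cacti; the sample lines are copied from this post.

```python
import re
from statistics import mean

# Matches the Key:Value pairs that follow "SYSTEM STATS:" in cacti.log.
PAIR_RE = re.compile(r"(\w+):(\S+)")


def parse_stats(line):
    """Return the Key:Value pairs of one SYSTEM STATS line as a dict.

    Returns None for lines that are not plain SYSTEM STATS lines
    (for example SYSTEM THOLD STATS or SYSTEM BOOST STATS).
    """
    marker = "SYSTEM STATS:"
    if marker not in line:
        return None
    tail = line.split(marker, 1)[1]
    stats = {}
    for key, value in PAIR_RE.findall(tail):
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # non-numeric, e.g. Method:spine or Threads:N/A
    return stats


def mean_time_by_config(lines):
    """Average the Time field for each (Method, Threads) combination."""
    groups = {}
    for line in lines:
        stats = parse_stats(line)
        if stats is None:
            continue
        groups.setdefault((stats["Method"], stats["Threads"]), []).append(stats["Time"])
    return {config: mean(times) for config, times in groups.items()}


# One sample line per configuration, taken from the log excerpts above.
SAMPLE = [
    "03/05/2018 14:50:03 - SYSTEM STATS: Time:2.2430 Method:cmd.php Processes:1 Threads:N/A Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57",
    "03/05/2018 15:05:03 - SYSTEM STATS: Time:2.2340 Method:spine Processes:1 Threads:1 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57",
    "03/05/2018 15:20:02 - SYSTEM STATS: Time:1.2320 Method:spine Processes:1 Threads:10 Hosts:10 HostsPerProcess:10 DataSources:110 RRDsProcessed:57",
]

for config, avg in mean_time_by_config(SAMPLE).items():
    print(config, round(avg, 4))
```

Fed all nine lines from this post, it gives roughly 2.24 s for both cmd.php and single-threaded spine and roughly 1.24 s for spine with 10 threads, so on this 10-host setup the gain comes from the thread count rather than from the poller type.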
LdubSham
Posts: 10
Joined: Wed Nov 12, 2014 5:32 am

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by LdubSham »

Time:63.1088 Method:spine Processes:16 Threads:32 Hosts:8189 HostsPerProcess:512 DataSources:3458700 RRDsProcessed:1069607 SystemLoad5mAvg:33.8

Ubuntu 18.04 LTS (GNU/Linux 4.15.0-20-generic x86_64) - Vanilla kernel
MariaDB 10.1
Dual Xeon Gold 6152, 22 cores @ 2.1GHz per CPU
192GB DDR4 2666MHz RAM
2 x 240GB SSDs - 12Gb/s SAS - RAID 1 (database and O/S)
2 x Samsung PM1725a 800GB M.2 NVMe drives (RRD storage only) - software RAID 0 using mdadm (RRDs are backed up nightly) - XFS filesystem, mounted with noatime.

Cacti Version 1.38
Cacti OS unix
RSA Fingerprint
NET-SNMP Version NET-SNMP version: 5.7.3
RRDtool Version RRDtool 1.7.x
Devices 8245
Graphs 1084712
Data Sources Script/Command: 46
SNMP Get: 8257
SNMP Query: 1068248
Script Query: 4
Script Server: 8055
Script Query - Script Server: 110
Total: 1084720

Interval 300
Type SPINE 1.1.35 Copyright 2004-2017 by The Cacti Group
Items Action[0]: 3450521
Action[1]: 54
Action[2]: 8125
Total: 3458700

Concurrent Processes 16
Max Threads 32
PHP Servers 10
Script Timeout 60
Max OID 60

MemTotal 192.07 K MB
MemFree 3.45 K MB
Buffers 732.58 MB
Cached 148.83 K MB
Active 138.21 K MB
Inactive 39.75 K MB
SwapTotal 2.05 K MB
SwapFree 2.03 K MB

This system is not using boost. With NVMe there is no need for it, and it would probably slow the system down. However, to make full use of the available system resources I had to change the batch size that poller.php fetches from the poller_output table: I increased $max_rows from 40k to 1.5 million (you will need to disable or raise the PHP memory limits in the scripts to do this). I also replaced the path to the RRDTool binary with a Python script that splits the batch into equal chunks and then spawns multiple RRDTool processes. This was necessary because a single RRDTool process could not saturate the write speed of the NVMe drives.
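
The wrapper script itself is not posted in this thread, so the following is only a rough sketch of the chunk-and-spawn idea described above, not the actual script. It assumes the batch arrives as a list of RRDtool command lines and that each worker is started as "rrdtool -" (RRDtool's remote control mode, which reads commands from standard input). The function names and the worker_cmd parameter are invented for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into contiguous slices whose sizes differ by at most one."""
    n_chunks = max(1, min(n_chunks, len(items) or 1))
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def _run_chunk(chunk, worker_cmd):
    """Feed one chunk of command lines to a single worker process via stdin."""
    result = subprocess.run(
        list(worker_cmd),
        input="\n".join(chunk) + "\n",
        text=True,
        stdout=subprocess.DEVNULL,
    )
    return result.returncode


def run_in_parallel(commands, n_workers, worker_cmd=("rrdtool", "-")):
    """Run one worker process per chunk at the same time; return the exit codes.

    worker_cmd defaults to RRDtool's stdin-driven mode (an assumption about
    how the real wrapper invokes RRDtool).
    """
    chunks = [c for c in split_into_chunks(commands, n_workers) if c]
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        return list(pool.map(lambda c: _run_chunk(c, worker_cmd), chunks))


# Hypothetical usage, with made-up file names and values:
# run_in_parallel(["update /nvme/rra/a.rrd N:1:2", "update /nvme/rra/b.rrd N:3:4"], n_workers=2)
```

One thing this sketch does not handle: if the batch holds several updates for the same RRD file, they presumably need to stay in order within a single chunk, so a real wrapper would likely group by file before splitting.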

MariaDB also took considerable tuning at this scale to stop spine from getting 2013 'Lost connection' errors and to cope with the concurrency. After tuning it in conjunction with spine, it is now rock solid (config below).

The run time for spine is about 20 - 25 seconds. The rest of the total polling time is poller.php working its way through the entries in the poller_output table. It would be good if Cacti had parallelization built into this part of the system: spine is highly parallel when collecting the data, but poller.php is single-threaded and calls a single rrdtool instance, which bottlenecks the poller and doesn't make effective use of modern multicore hardware or SSDs.

After my modifications I would suggest this system could probably take twice the number of RRDs and still produce good polling times (with additional RAM). RAM size is critical to performance because it holds the hot portions of the RRDs in the disk cache. Without sufficient RAM the system would swap or evict these pages and poll times would suffer.

root@cacti-02:~# vmtouch /nvme/rra
Files: 1083217
Directories: 1
Resident Pages: 37501349/105650472 143G/403G 35.5%
Elapsed: 32.506 seconds
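
To make the vmtouch figures easier to read, here is a small Python check of how the page counts map to the sizes and percentage shown. It assumes 4 KiB pages, which is the usual default on x86_64 but is not stated in the output itself.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical x86_64 default)

resident_pages = 37501349   # from the vmtouch output above
total_pages = 105650472     # from the vmtouch output above


def to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


resident_gib = to_gib(resident_pages)
total_gib = to_gib(total_pages)
resident_pct = 100 * resident_pages / total_pages

print(f"resident: {resident_gib:.0f}G of {total_gib:.0f}G ({resident_pct:.1f}%)")
# prints: resident: 143G of 403G (35.5%)
```

So about 143G of the 403G of RRD files sits in the page cache, which is why the amount of RAM matters so much for this setup.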

MariaDB Configuration

#
# * Fine Tuning
#
key_buffer_size = 256M
max_allowed_packet = 1G
net_read_timeout = 600
net_write_timeout = 180
wait_timeout = 86400
interactive_timeout = 86400
join_buffer_size = 512
max_heap_table_size = 8G
tmp_table_size = 8G
net_retry_count = 20


# Thread Pool Configuration
thread_handling = pool-of-threads
thread_pool_idle_timeout = 250
thread_pool_max_threads = 1500
thread_pool_size = 88
thread_concurrency = 44
thread_stack = 192K

# back_log sets how many pending connection requests can wait in the queue.
back_log = 3000

# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam_recover_options = BACKUP
max_connections = 5000
max_connect_errors = 10000
table_cache = 8128

#
# * Query Cache Configuration
#
query_cache_limit = 8M
query_cache_size = 256M
query_cache_type = 1

#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
innodb_doublewrite = OFF
innodb_flush_neighbors=0
innodb_buffer_pool_size=30G
innodb_buffer_pool_instances=30
innodb_log_file_size=3G
innodb_additional_mem_pool_size=80M
innodb_flush_log_at_timeout = 3
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_log_buffer_size = 16M

Additional system settings:

/etc/sysctl.conf

vm.swappiness = 1
net.ipv4.tcp_max_syn_backlog = 8192

ulimit openfile limits have been increased as well.

I hope this helps people make informed decisions and scale their systems effectively. I noticed a lack of documentation online from people with really big installations so thought I would share my findings.
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by TheWitness »

That's pretty sweet. One thing that would help with parallelization would be to use multiple poller_item tables. If you look at the process list, you are likely spending quite a bit of time with threads waiting on that table. Partitioning by site id might also be a way to speed things up, or at least to solve some of the thread contention issues.

Using boost would still be a good thing; otherwise you are going to wear out your NVMe drives. If you don't use it, my advice would be to graph the wear level, or you will find yourself in a bad place a few years down the road. Using that Python-based RRDtool wrapper helps with NFS, but I'm surprised that you are not getting good write performance. From my experience, you write to cache, and then the cache flushes at up to 3.2 GBytes per second. It could be that you don't have enough disk cache to cover so many RRD files, so you are forced into writing directly to the NVMe.
True understanding begins only when we realize how little we truly understand...

Life is an adventure, let yours begin with Cacti!

Author of dozens of Cacti plugins and customizations. Advocate of LAMP, MariaDB, IBM Spectrum LSF and the world of batch. Creator of IBM Spectrum RTM, author of quite a bit of unpublished work and most of Cacti's bugs.
_________________
Official Cacti Documentation
GitHub Repository with Supported Plugins
Percona Device Packages (no support)
Interesting Device Packages


For those wondering, I'm still here, but lost in the shadows. Yearning for fewer bugs. Who wants a Cacti 1.3/2.0? Streams anyone?
TheWitness
Developer
Posts: 17007
Joined: Tue May 14, 2002 5:08 pm
Location: MI, USA

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by TheWitness »

You should also go with the recommendations from FusionI/O related to tuning, specifically:
# -------------------------------------------------
# Recommendations from FusionI/O
# -------------------------------------------------
innodb_thread_concurrency=0
innodb_read_ahead_threshold=0
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_adaptive_flushing=1
#innodb_adaptive_checkpoint=keep_average
innodb_max_dirty_pages_pct=60
innodb_flush_method=O_DIRECT
innodb_io_capacity = 100000
innodb_io_capacity_max = 200000
#innodb_flush_neighbor_pages=0
Rno
Cacti Pro User
Posts: 692
Joined: Wed Dec 07, 2011 9:19 am

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by Rno »

07/22/2019 12:16:10 PM - SYSTEM STATS: Time:8.5287 Method:spine Processes:20 Threads:25 Hosts:1144 HostsPerProcess:58 DataSources:52449 RRDsProcessed:24989
07/22/2019 12:15:28 PM - SYSTEM THOLD STATS: Time:16.4836 Tholds:3797 TotalHosts:1143 DownHosts:4 NewDownHosts:0


Am I right to say that this counts as a large Cacti installation?
But it's on version 0.8.8h; I was never able to make it work on 1.x.
Osiris
Cacti Guru User
Posts: 1424
Joined: Mon Jan 05, 2015 10:10 am

Re: Submit Your CMD.PHP vs. SPINE Metrics Here

Post by Osiris »

1000 devices is respectable, though not exceptionally large. With that many threads and processes, you would do best to spread the load across multiple data collectors; otherwise you block the database. Doing so also locks you into boost. Best to retry with 1.2.6, which includes some boost updates.