World Community Grid Forums
Thread Status: Active
Total posts in this thread: 8
ericinboston
Senior Cruncher Joined: Jan 12, 2010 Post Count: 256 Status: Offline
Hi all. I was always under the impression that the Runtime column on the Devices webpage signified how much time each computer spent processing WUs, and that each thread on a CPU was counted individually. Therefore, my understanding was that if I have an 8-thread CPU running 24x7 with 100% dedication to crunching WUs (I only do the MCM project), and BOINC/WCG is running on all 8 threads 100% of the time, then I should get 8 days of Runtime per day for that machine (8 threads running all day long should add up to 8 days of "runtime").
Currently I am noticing that my long-time cruncher, a 2017 Intel iMac with 8 threads, is only getting 1-5 days' worth of Runtime credit per day. And it's not some kind of reporting discrepancy with WCG where one day shows 3 and the next shows 11. The machine is full of WUs, so it never runs out. It runs 24x7. I haven't changed anything on it in years. Today I upgraded BOINC to 8.x hoping it would help (it had been running the same 7.x for at least 3 years), but that doesn't seem to have helped. The Mac is not going to sleep or anything like that.

My sister-in-law has the exact same Mac, and her machine seems to be reporting ~7 or 8 days' worth of Runtime. Other Wintel, Mac M1, and Mac M2 machines hover right at 8 days' worth of Runtime every single day. Prior to January, this iMac would report 5-8 days every day. But even in those previous months, why would it ever technically be under 8 days of Runtime (so long as it had WUs to crunch)? I understand that sometimes WUs aren't reported right away if there are server problems...but doesn't the server know when the WUs were completed and thus calculate the Runtime properly?

Any ideas? If my understanding of Runtime is wrong, can someone please correct me? My machines for this project are dedicated 24x7, so they don't get shut off. Even if the power goes out, they start right back up automatically and continue (we have a home generator, so they miss about 5 minutes' worth of compute time while the generator kicks in and the machines reboot).
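To make the arithmetic in the question concrete, here is a minimal sketch. It assumes, as the question does, that Runtime is simply the sum of per-thread busy time (which is exactly the assumption being asked about); the function names are hypothetical. Under that assumption an 8-thread machine busy all day logs 8 × 24 = 192 thread-hours, i.e. 8 days, a 5-minute generator switchover costs less than 0.03 days, and reporting 3 days would mean the threads were effectively busy only 37.5% of the time.

```python
# Hypothetical helpers. Assumes Runtime = sum of per-thread busy time,
# which is the very assumption being questioned in this thread.
HOURS_PER_DAY = 24.0


def expected_runtime_days(threads, downtime_minutes=0.0):
    """Days of runtime one calendar day should yield if every thread is busy
    except during `downtime_minutes` of wall-clock outage."""
    busy_hours = HOURS_PER_DAY - downtime_minutes / 60.0
    return threads * busy_hours / HOURS_PER_DAY


def utilization(reported_days, threads):
    """Fraction of the theoretical maximum (threads x 1 day) actually reported."""
    return reported_days / expected_runtime_days(threads)


if __name__ == "__main__":
    print(expected_runtime_days(8))                               # prints 8.0
    print(round(expected_runtime_days(8, downtime_minutes=5), 3))  # prints 7.972
    print(utilization(3, 8))                                      # prints 0.375
```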
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7643 Status: Offline
I don't know much about Macs, but I suspect you have other processes running in the background that you may be unaware of. I would check for this. Also, the operating system takes some cycles as overhead, so you may approach an average of 8 days of work per day but never quite reach it.
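One way to look for such background processes is sketched below. It assumes the `ps -Ao pcpu,comm` output format (a CPU-percentage column followed by the command name), which both macOS and Linux `ps` accept, and it assumes BOINC-related processes can be recognized by the substrings in the hypothetical `DEFAULT_IGNORE` tuple; adjust that list to whatever process names actually appear on your machine.

```python
import subprocess

# Hypothetical substrings assumed to identify BOINC and its science apps.
DEFAULT_IGNORE = ("boinc", "wcgrid")


def sample_ps():
    """Capture one snapshot of `ps -Ao pcpu,comm` as text."""
    return subprocess.run(
        ["ps", "-Ao", "pcpu,comm"], capture_output=True, text=True, check=True
    ).stdout


def parse_ps(output):
    """Parse `ps -Ao pcpu,comm` text into (cpu_percent, command) pairs."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        try:
            rows.append((float(parts[0]), parts[1]))
        except ValueError:
            continue
    return rows


def busy_non_boinc(rows, threshold=5.0, ignore=DEFAULT_IGNORE):
    """Return non-BOINC processes above `threshold` percent CPU, busiest first."""
    hits = [
        (cpu, cmd)
        for cpu, cmd in rows
        if cpu > threshold and not any(s in cmd.lower() for s in ignore)
    ]
    return sorted(hits, reverse=True)
```

Usage: `for cpu, cmd in busy_non_boinc(parse_ps(sample_ps())): print(f"{cpu:6.1f}%  {cmd}")`. A single snapshot can miss short bursts, so it may be worth repeating this a few times.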
Cheers
Sgt. Joe
*Minnesota Crunchers*
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2149 Status: Offline
Also, Eric, have you considered actually watching the running tasks in the queue for a while, checking whether the individual tasks are running normally and noting the times when they start and when they finish?
Adri
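If you do note start and finish times while watching the queue, a small sketch like the following can total them up for one day. It is hypothetical: the task list is something you would record by hand, and it measures wall-clock time a task was running, which may not be exactly how WCG computes its Runtime figure. For an 8-thread machine, a daily total well below 8.0 would point to idle or stalled slots.

```python
from datetime import datetime, timedelta


def runtime_days_in_window(tasks, window_start, window_end):
    """Sum the wall-clock time tasks spent running inside
    [window_start, window_end) and express it in days.
    `tasks` is an iterable of (start, finish) datetimes noted by hand."""
    total = timedelta(0)
    for start, finish in tasks:
        # Clip each task to the window so tasks spanning midnight count correctly.
        overlap = min(finish, window_end) - max(start, window_start)
        if overlap > timedelta(0):
            total += overlap
    return total / timedelta(days=1)


if __name__ == "__main__":
    day = datetime(2024, 1, 10)
    # One thread kept busy by three back-to-back tasks across the day.
    tasks = [
        (day - timedelta(hours=15), day + timedelta(hours=3)),
        (day + timedelta(hours=3), day + timedelta(hours=15)),
        (day + timedelta(hours=15), day + timedelta(hours=27)),
    ]
    print(runtime_days_in_window(tasks, day, day + timedelta(days=1)))  # prints 1.0
```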
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 937 Status: Recently Active
Eric,
Have you checked the problem system to see whether it is logging any faults (for BOINC or otherwise)? One of my old Linux boxes started to show a similar effect a few weeks before one of the disks in its RAID array failed :-(

Otherwise, it is presumably something related to the work it is receiving, which can make diagnosis a bit tricky, I fear. Some insight into the system specifications and how you have BOINC set up might help... For instance, are the "identical" systems running the same job mix (either all MCM1 or a [controlled?] mix of ARP1 and MCM1)? Do they all have the same buffer sizes (larger buffers may cause retries to get priority over already-running tasks)? Such things can make a difference to throughput.

[If you already know some/all of the below, my apologies :-) ...]

If you are running ARP1 and MCM1 but the other "identical" system isn't, you will see a difference in the daily run-times, but not quite as dramatic a fall-off as you have described. (If you are only running ARP1, all bets are off!) When ARP1 is in the mix, its [much] larger memory footprint could be a factor if your system hasn't got lots of RAM. Also, ARP1 puts quite a load on the system each time a checkpoint is taken, and it is a constant source of other disk write operations as it runs.

If you are only running MCM1, the only time there might be issues is during task set-up, when a slow start can cause a watchdog timer to decide that something is wrong and that the set-up needs to restart!

With the above in mind, I'd also suggest looking at some returned results on the website to compare run time to elapsed time, and to see whether there's evidence that tasks are having difficulty starting up (a problem on some Macs in the past) or are restarting from checkpoints at some stage. If a task rolls back to a checkpoint, the timing information gets set back to what it was at checkpoint time [or zero if it never checkpointed :-( ]

Good luck finding out what's going on!

Cheers - Al
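The two checks Al suggests can be sketched as follows. Both functions are hypothetical and work on numbers you would copy by hand: successive readings of one task's reported CPU time (a drop between readings is what a restart from an earlier checkpoint would look like), and run time versus elapsed time for returned results. The 0.9 ratio threshold is an arbitrary assumption, not a WCG figure.

```python
def rollback_losses(cpu_samples):
    """Given successive readings of one task's reported CPU time (seconds),
    return the amount lost at each point where the counter went backwards,
    as happens when a task restarts from an earlier checkpoint."""
    losses = []
    for prev, cur in zip(cpu_samples, cpu_samples[1:]):
        if cur < prev:
            losses.append(prev - cur)
    return losses


def suspicious_results(results, min_ratio=0.9):
    """Flag returned results whose run (CPU) time is well below elapsed time.
    `results` maps a result name to (cpu_hours, elapsed_hours)."""
    return sorted(
        name
        for name, (cpu, elapsed) in results.items()
        if elapsed > 0 and cpu / elapsed < min_ratio
    )


if __name__ == "__main__":
    print(rollback_losses([0, 600, 1200, 900, 1500]))  # prints [300]
    print(suspicious_results({"a": (1.9, 2.0), "b": (1.0, 2.0)}))  # prints ['b']
```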
ericinboston
Senior Cruncher Joined: Jan 12, 2010 Post Count: 256 Status: Offline
> Also, Eric, have you considered actually looking at the running tasks in the queue for a while, especially if the individual tasks are running normally, inspecting the times when they start and when they finish?
> Adri

I should have mentioned that, yes, I looked for running tasks on the OS and there's nothing else running. Again, this machine hasn't changed software-wise in about 5 years. It was reporting WUs just fine until the recent December WCG shutdown.
ericinboston
Senior Cruncher Joined: Jan 12, 2010 Post Count: 256 Status: Offline
> ...For instance, are the "identical" systems running the same job mix (either all MCM1 or a [controlled?] mix of ARP1 and MCM1)? Do they all have the same buffer sizes (larger buffers may cause retries to get priority over already-running tasks)? Such things can make a difference to throughput.

Thanks for all the points.
1) They are identical because I set them up side by side. :)
2) Both machines (as well as my entire account) are only running MCM.
I will watch it over the next several days and see if it improves.
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 792 Status: Offline
Is there anything in the Event Log saying that CPU computation has been suspended because the CPU is busy? I've seen that before; I just needed to tweak the Options on that machine so that it ran at 100% without pausing, since it's a dedicated machine.
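To see how much time such pauses cost, a sketch like this can total the suspended periods from saved Event Log lines (for example, copied from BOINC Manager's Event Log or taken from the client's `stdoutdae.txt`). The line layout and the exact "Suspending computation" / "Resuming computation" wording are assumptions based on typical BOINC messages; check them against your own log before relying on the totals.

```python
from datetime import datetime, timedelta

# Assumed log line shape, e.g.
#   "14-Feb-2024 10:23:45 [---] Suspending computation - CPU is busy"
TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def suspended_time(lines):
    """Pair 'Suspending computation' / 'Resuming computation' messages and
    return (total time suspended, set of suspension reasons seen)."""
    total = timedelta(0)
    reasons = set()
    suspended_at = None
    for line in lines:
        if "Suspending computation" in line:
            if suspended_at is None:
                suspended_at = datetime.strptime(line[:20], TIME_FORMAT)
            # Text after the marker, e.g. "CPU is busy" (assumed wording).
            reason = line.split("Suspending computation", 1)[1].strip(" -")
            if reason:
                reasons.add(reason)
        elif "Resuming computation" in line and suspended_at is not None:
            total += datetime.strptime(line[:20], TIME_FORMAT) - suspended_at
            suspended_at = None
    return total, reasons
```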
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1672 Status: Offline
You can monitor your machines using BoincTasks, which will give you the effective CPU load used by each WU.
If something is using a lot of CPU in the background, you will see that the load allocated to the WU is significantly below 100%. Additionally, you might have some throttle activated on the machine, which would reduce the CPU usage.
Cheers,
Yves
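The "effective CPU load" of a WU is essentially the CPU time it gained divided by the wall-clock time that passed. A hypothetical sketch computing it from two readings per task (for instance, CPU time noted a minute apart) is below; the 90% threshold is an arbitrary assumption. A task held back by background load or by a CPU-time throttle would show up well under 100%.

```python
def cpu_load_percent(cpu0, wall0, cpu1, wall1):
    """Effective CPU load of one WU between two samples: CPU seconds gained
    divided by wall-clock seconds elapsed, as a percentage."""
    wall = wall1 - wall0
    if wall <= 0:
        raise ValueError("samples must be in chronological order")
    return 100.0 * (cpu1 - cpu0) / wall


def low_load_tasks(samples, threshold=90.0):
    """`samples` maps task name -> (cpu0, wall0, cpu1, wall1) in seconds.
    Return the names of tasks whose load is below `threshold` percent."""
    return sorted(
        name for name, s in samples.items() if cpu_load_percent(*s) < threshold
    )


if __name__ == "__main__":
    print(cpu_load_percent(100.0, 0.0, 154.0, 60.0))  # prints 90.0
    print(low_load_tasks({"t1": (0, 0, 60, 60), "t2": (0, 0, 30, 60)}))  # prints ['t2']
```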