World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
gordonbb
Cruncher Canada Joined: May 14, 2019 Post Count: 19 Status: Offline |
One of my systems has 3 OPN tasks stuck. The normal time to complete tasks for this system is estimated at 2:01:19.
HW Specs: AMD Ryzen 2600x; 16 GB DDR4-3200; GTX 1070Ti; Ubuntu 18.04.5 LTS Desktop with current patches and latest HWE stack.

I first saw this a few days ago and ended up aborting the tasks, as they just stayed at a fixed % completed and the load average on the system was down from ~12 to 12-X, where X was the number of stuck tasks. I noticed the same thing today.

About two days ago we did have a thunderstorm. Though the system is not on a UPS, it is connected to a surge suppressor on a UPS, and curiously the systems on the load side of that UPS both rebooted but this system did not (it shows an up-time of almost 5 days).

Here are the "Properties" of the three tasks that are currently stuck:

Application: OpenPandemics - COVID 19 7.17

----------------------------------------
AMD - 2600x, 2 x 2700, 2700x, 3900x, 3950x, 2 x 5900x, 5950x
Intel - E3-1231v3, 9900K
NVidia - GTX 1060 6GB, 1660ti, 1070ti; RTX 2060, 2060s, 2070a, 5 x 2070s
[Edit 1 times, last edit by gordonbb at Jul 27, 2021 2:37:59 PM] |
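The load-average observation above (down from ~12 to 12-X) can be turned into a rough estimate of X. This is only a sketch: the function name `estimate_stuck_tasks` is hypothetical, and it assumes that every hardware thread normally runs one task and that nothing else on the machine contributes meaningfully to the load.

```python
import os


def estimate_stuck_tasks(expected_load: float, current_load: float) -> int:
    """Estimate X in 'load went from ~12 to 12-X' (never negative)."""
    return max(0, round(expected_load - current_load))


if __name__ == "__main__":
    # Assumption: one busy task per hardware thread (12 on a Ryzen 2600X).
    expected = float(os.cpu_count() or 1)
    # os.getloadavg() is available on Unix-like systems only.
    one_minute_load, _, _ = os.getloadavg()
    print("Estimated stuck tasks:", estimate_stuck_tasks(expected, one_minute_load))
```

The one-minute figure reacts quickly but is noisy, so a non-zero result is a hint to look at BOINC Manager rather than proof of a stuck task.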
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7693 Status: Offline |
The quickest and easiest thing to do is to reboot the system and see if that allows the tasks to resume their normal progress. If this does not work, please post about the first 30 lines of the log after reboot; that may give a clue as to what is happening.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
gordonbb
Cruncher Canada Joined: May 14, 2019 Post Count: 19 Status: Offline |
Sgt.Joe wrote: "The quickest and easiest thing to do is to reboot the system and see if that allows the tasks to resume their normal progress. If this does not work, please post about the first 30 lines of the log after reboot and that may give a clue as to what is happening. Cheers"

Thanks! That did the trick.

The 3 tasks are now progressing once again and, curiously, the Elapsed time is now showing "normal" values:

Name: OPN1_0057699_00361
CPU time: 01:09:56
Elapsed time: 01:10:15
Estimated time remaining: 00:23:00

Name: OPN1_0057765_03282
CPU time: 01:36:44
Elapsed time: 01:36:54
Estimated time remaining: 00:31:23

Name: OPN1_0057721_00322
CPU time: 01:05:06
Elapsed time: 01:05:13
Estimated time remaining: 00:50:57 |
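One way to read the "normal" values above is that CPU time and Elapsed time are nearly equal for a healthy CPU task. The sketch below computes that ratio for the three tasks listed. The helper names are hypothetical, and treating a low ratio as a sign of a stuck task is an assumption; the thread does not state any threshold.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in task Properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, elapsed_time: str) -> float:
    """Return CPU time divided by elapsed time (0.0 if nothing has elapsed)."""
    elapsed = to_seconds(elapsed_time)
    if elapsed == 0:
        return 0.0
    return to_seconds(cpu_time) / elapsed


# (CPU time, Elapsed time) pairs copied from the post above.
tasks = {
    "OPN1_0057699_00361": ("01:09:56", "01:10:15"),
    "OPN1_0057765_03282": ("01:36:44", "01:36:54"),
    "OPN1_0057721_00322": ("01:05:06", "01:05:13"),
}

for name, (cpu, elapsed) in tasks.items():
    print(f"{name}: {cpu_efficiency(cpu, elapsed):.3f}")
```

All three tasks come out above 0.99, which fits the description of them progressing normally again.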
||
|
biini
Senior Cruncher Finland Joined: Jan 25, 2007 Post Count: 334 Status: Offline |
It seems they resumed from the last checkpoint. I also had a few of those a couple of months ago.
One obvious cause I found (on Windows) was that the GPU driver had been updated automatically.
----------------------------------------
[Edit 1 times, last edit by biini at Jul 28, 2021 7:39:07 AM] |
||
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline |
Many times you can get stuck tasks progressing again without a reboot by suspending them for a few seconds and then resuming them.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 326 Status: Offline |
When using the suspend / resume technique it is best to set LAIM (leave application in memory) off. This ensures that program and checkpoint files are read from disk. Remember to set LAIM on afterwards.
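The suspend / resume step can also be scripted with `boinccmd`, the command-line tool that ships with the BOINC client, using its `--task <project URL> <task name> suspend|resume` form. The following is a minimal sketch, not a tested recipe: `PROJECT_URL` is an assumption and must match the URL your client lists for World Community Grid, the helper names are hypothetical, and the sketch does not change the LAIM preference, which still has to be switched off and back on separately.

```python
import subprocess
import time
from typing import Callable, List

# Assumption: replace with the project URL shown by your own BOINC client.
PROJECT_URL = "http://www.worldcommunitygrid.org/"


def task_command(task_name: str, operation: str,
                 project_url: str = PROJECT_URL) -> List[str]:
    """Build the boinccmd argument list for one task operation."""
    if operation not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--task", project_url, task_name, operation]


def bounce_task(task_name: str, pause_seconds: float = 5.0,
                run: Callable = subprocess.run) -> None:
    """Suspend a task, wait a few seconds, then resume it."""
    run(task_command(task_name, "suspend"), check=True)
    time.sleep(pause_seconds)
    run(task_command(task_name, "resume"), check=True)


if __name__ == "__main__":
    # Task name taken from this thread; substitute the one that is stuck.
    bounce_task("OPNG_0068524_00030")
```

The `run` parameter exists only so the command sequence can be checked without a BOINC client installed; in normal use the default `subprocess.run` is what executes `boinccmd`.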
|
||
|
gordonbb
Cruncher Canada Joined: May 14, 2019 Post Count: 19 Status: Offline |
nanoprobe wrote: "Many times you can get stuck tasks progressing again without a reboot by suspending them for a few seconds and then resuming them."

ca05065 wrote: "When using the suspend / resume technique it is best to set LAIM (leave application in memory) off. This ensures that program and checkpoint files are read from disk. Remember to set LAIM on afterwards."

Thanks - I'll try that next time.

I had another task stick (OPN1_0058122_00445_0). I usually notice this when htop shows fewer than all threads utilized, then I look at BOINC Manager to find the offending task. A reboot again reset the stuck task's elapsed time, but looking closer in htop the CPU% for the task was still at 0% after the reboot and the Progress in BOINC Manager was not increasing. That one I aborted, and after the abort it showed "Computation Error" before reporting. The logs are showing squat, but I have them at the default verbosity.

EDIT - Another one. After the reboot the system naturally picked up a few OPNG tasks for the GPU to crunch on. The last of these (OPNG_0068524_00030) got stuck at 27%, so I did as suggested: I set LAIM to off, then suspended and resumed the task, and it ran to completion.

So I suspect I have a marginal core on this CPU. I'm using a -0.100V Vcore offset to under-volt the processor a "titch", so I'm going to remove that and see.

It's strange; this system has been chugging away for months with nary an issue. It did, however, recently have the ATI HD5870 replaced with a GTX 1070Ti and the RAM changed from 4x4GB DDR4-2400 to 2x8GB DDR4-3200, but I wiped and re-installed the OS (Ubuntu 18.04.5 LTS Desktop). The RAM and GPU were swapped from another system that was also running OPN.

"She who must be obeyed" has her birthday soon and wants a gaming system once again to play games with the kids (well, young adults), so this system is destined to run Windows 10 in a few days once I get my 100 year badge.
[Edit 1 times, last edit by gordonbb at Jul 29, 2021 2:55:12 AM] |
||
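The manual check described in the last post (Progress in BOINC Manager not increasing) amounts to comparing two progress readings taken some minutes apart. A minimal sketch of that comparison follows. The function name `find_stalled` is hypothetical, and the sample readings are made up for illustration except for the 27% figure and the task names, which come from the thread; how the readings are collected is left open.

```python
from typing import Dict, List


def find_stalled(before: Dict[str, float], after: Dict[str, float],
                 min_progress: float = 0.0) -> List[str]:
    """Return names of tasks whose fraction done did not advance."""
    stalled = []
    for name, old_fraction in before.items():
        new_fraction = after.get(name)
        if new_fraction is None:
            continue  # task finished or was aborted between the readings
        if new_fraction - old_fraction <= min_progress:
            stalled.append(name)
    return sorted(stalled)


# Illustrative readings (fraction done); only the 0.27 value is from the thread.
first_reading = {"OPNG_0068524_00030": 0.27, "OPN1_0058122_00445_0": 0.40}
second_reading = {"OPNG_0068524_00030": 0.27, "OPN1_0058122_00445_0": 0.46}

print(find_stalled(first_reading, second_reading))
```

The interval between readings should be comfortably longer than a task normally takes to advance, otherwise a slow but healthy task can be flagged by mistake.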
|