World Community Grid Forums
Thread Status: Active Total posts in this thread: 19
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1950 Status: Recently Active |
I probably first complained about this more than a year ago, but recently it has been cropping up more often, in a couple of variations. One of them I just noticed today, with the properties of the WU (before I aborted it) showing:
Application Mapping Cancer Markers 7.41
The host in question is an idle 8-thread i7 with 8GB of RAM, running Windows 10/1903. I have noticed this and similar variations on all kinds of hosts, including a brand spanking new 9th Gen i5-9700 (6 cores, 6 threads) with 16GB of RAM, running Windows 10/1909 and the latest BOINC client available for download from WCG.
Ralf |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7660 Status: Offline |
Have you tried either the suspend/resume routine or the shutdown and/or reboot routine to see if that allows the work unit to continue normally?
Edit: I do not have any machines running Win 10, but have never experienced this behavior in either Win 7 or in Linux.
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Mar 6, 2020 2:23:31 AM] |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1950 Status: Recently Active |
Quote: "Have you tried either the suspend/resume routine or the shutdown and/or reboot routine to see if that allows the work unit to continue normally?"
Yes, it doesn't help. Besides that, there are a lot of hosts that I simply can't babysit all the time.
Quote: "Edit: I do not have any machines running Win 10, but have never experienced this behavior in either Win 7 or in Linux."
I have only one Linux machine running right now (Linux Mint 19.3, Core Duo with 3GB of RAM), where I don't think I have ever seen this problem.
I have seen this, IIRC, once on my MacBook Pro (i7 8T, 8GB RAM, High Sierra), but otherwise across a wider range of Windows hosts, including Windows 7, Windows 8.1 (the very machine I am typing this on) as well as Windows Server 2003, besides Windows 10. So I don't think that this is a Windows 10 specific issue. I just mentioned it initially because in the past, some people have been harping on the idea that it might be due to an older CPU, an older Windows version, or not running the latest BOINC client. But there don't seem to be any obvious circumstances under which this happens, and I usually only notice it when I check the daily stats and see a larger than usual deviation in the points for a stats update.
Ralf |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1950 Status: Recently Active |
And here's another one, from a different host (i7 8T, 8GB RAM, Windows 10/1909):
Application Mapping Cancer Markers 7.41 |
||
|
Hanski
Senior Cruncher Finland Joined: Nov 14, 2005 Post Count: 157 Status: Offline |
I have the same problem on one machine with Win10, but rebooting the machine has always fixed the problem. The hard disk in that machine is failing, and I think that is the reason.
----------------------------------------
<><
|
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1950 Status: Recently Active |
Quote: "I have the same problem in one machine with Win10. But rebooting the machine has always helped the problem. The hard disk in the machine is failing and I think that is the reason."
In all of my cases, rebooting has not helped. Either the WU keeps running and blocking a slot at 100%, or in some cases it terminates quickly with "Computation Error"... And it happens even on brand spanking new PCs, within hours of taking them out of the shipping box and initial setup, and ALL machines are workstations in everyday use (though sometimes idle for prolonged periods of time). And apart from those seemingly random blocking WUs, they are crunching hundreds if not thousands of WUs in the same period just fine...
Ralf |
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline |
Quote: "I have only one Linux machine running right now (Linux Mint 19.3, Core Duo with 3GB of RAM), where I don't think I have ever seen this problem."
Same here. On Windows, I always suspect the anti-virus of hanging up work. It may help to exclude the BOINC folders, but that may not stop the real-time monitoring processes from interfering as they inspect the work, or else the communications on the Internet. |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1950 Status: Recently Active |
Quote: "I have only one Linux machine running right now (Linux Mint 19.3, Core Duo with 3GB of RAM), where I don't think I have ever seen this problem."
Quote: "Same here. On Windows, I always suspect the anti-virus for hanging up work. It may help to exclude the BOINC folders, but that may not stop the real-time monitoring processes from interfering as they inspect the work, or else the communications on the Internet."
Ralf |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7660 Status: Offline |
You say neither suspending nor rebooting has solved the problem. Have you tried a complete shutdown and then a cold start?
After the cold start, I would snooze BOINC and see if any unknown processes are running and/or using an inordinate amount of CPU cycles. Then I would suspend every running work unit except the affected one and see whether it progresses normally when it is the only one running. The only other idea that comes to mind is a memory problem that is causing a memory bottleneck somewhere.
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers* |
||
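For hosts that cannot be watched by hand, the "running at 100% but never finishing" condition described in this thread could be checked by a script. The following is only a minimal sketch, not something any poster here used. It assumes the task list comes from `boinccmd --get_tasks`, and that each task appears as a numbered separator line followed by indented `key: value` lines with fields named `name`, `active_task_state` and `fraction done`. Verify those field names against the output of your own client version. The sample text and the helper names `parse_tasks` and `find_stuck` are hypothetical.

```python
def parse_tasks(text):
    """Split task-list text into one dict per task.

    Assumption: each task starts with a line like '1) -----------'
    and is followed by indented 'key: value' lines.
    """
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("-----------") and ")" in line:
            # Start of a new task block.
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return tasks


def find_stuck(tasks, threshold=0.999):
    """Return names of tasks that report (almost) 100% but still execute."""
    stuck = []
    for task in tasks:
        try:
            done = float(task.get("fraction done", "0"))
        except ValueError:
            continue  # Skip tasks with an unparsable progress value.
        if done >= threshold and task.get("active_task_state") == "EXECUTING":
            stuck.append(task.get("name", "<unknown>"))
    return stuck


# Hypothetical sample, shaped like the assumed output format.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: example_task_a
   active_task_state: EXECUTING
   fraction done: 1.000000
2) -----------
   name: example_task_b
   active_task_state: EXECUTING
   fraction done: 0.412000
"""

if __name__ == "__main__":
    print(find_stuck(parse_tasks(SAMPLE)))
```

A task that is briefly at 100% while it finishes is normal, so a single hit proves nothing. The check only becomes meaningful if the same task name shows up in several runs spaced some minutes apart.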
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It seems we had this problem (or one similar) where a work unit got to the very end and hung. I sent in the directory listing from the slot directory showing that it had finished the work unit but had not started compressing the results for transmission. Never received any response to the post.
EDIT: It was in the ARP project where a work unit was showing as running but no process was known to the OS. Looked like BOINC started the WU but never transferred control to the executable. A little different scenario...
[Edit 1 times, last edit by Doneske at Mar 6, 2020 11:17:23 PM] |
||