World Community Grid Forums
Thread Status: Active Total posts in this thread: 18
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've noticed a problem with my tasks that occurs about 10% of the time.
When a task reaches 100% Progress, it usually performs some sort of upload and then switches its state to Ready to Report. However, in about 10% of cases, the task sits at 100% Progress with "---" in the Remaining column while the Elapsed timer keeps running. What's up with this? Should I manually abort these tasks, since they're never going to reach a state where the results get reported? |
||
|
IBM01902
Cruncher Joined: Aug 13, 2017 Post Count: 11 Status: Offline |
If you look in BOINC, find the Advanced tab, and click Event Log, it should open a log that gives you a clue, such as a network problem on your end or server maintenance on their end. You can also go to the Transfers tab and, if there are any line items there, select them and click Retry Now. Lastly, go to the Projects tab, select the project of concern, and click Update; that might take a couple of minutes, but it should request communication with the project server. After all that, take another look in the Event Log. If all else fails, there is a Reset Project option in BOINC. I'm pretty sure you'll lose credit for the last few work units, but at least you'll be starting fresh with everything redownloaded. Good luck.
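For anyone who prefers the command line, the same checks can be roughed out with `boinccmd`, the command-line tool that ships with the BOINC client. This is only a sketch: the function name `boinccmd_checks` is made up for illustration, the project URL is a placeholder you would replace with the one shown on your Projects tab, and it assumes `boinccmd` is on your PATH. It only builds and prints the commands; it does not run them.

```python
from typing import List


def boinccmd_checks(project_url: str) -> List[List[str]]:
    """Return boinccmd argument lists that mirror the manual steps.

    project_url is whatever URL your client lists for the project;
    it is passed through unchanged (assumption: you supply it).
    """
    return [
        ["boinccmd", "--get_messages"],        # dump the client's message log
        ["boinccmd", "--get_file_transfers"],  # list pending uploads/downloads
        ["boinccmd", "--network_available"],   # retry deferred network communication
        ["boinccmd", "--project", project_url, "update"],  # contact the project
    ]


if __name__ == "__main__":
    # "PROJECT_URL" is a placeholder, not a real address.
    for argv in boinccmd_checks("PROJECT_URL"):
        print(" ".join(argv))
```

Run the printed commands one at a time and read the messages again afterwards, the same way you would recheck the Event Log in the Manager.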
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
You don't say whether the status says Uploading or still says Running. However, for me at least, the Elapsed time stops moving once the task is Uploading.
Some WCG projects do run a sort of consolidation or very short secondary simulation at the end of the main run, but I've always seen progress stick at a fraction under 100%. You don't say which WCG projects this is happening for, or if it's all of them. Have you checked to see if the task is still actually using CPU time (Task Manager, or similar)? Or is it just the case that BOINC isn't reporting properly? But, all in all, this sounds very odd to me. I think my first course of action would be to reboot the machine. If that doesn't fix it, I would make sure you're running the latest supported WCG/BOINC version and, if so, then try resetting the project. Good luck getting this sorted. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks, but this is clearly a problem of the task getting to 100% without recognizing that it's there, and continuing to run. There's nothing to communicate and nothing in the logs because the task doesn't recognize it's been completed. The only two options are to let it keep running until the slot is eventually cleared because it took too long, or to manually abort it.
Bottom line - there's a bug in the WCG client program. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I understand what you're saying, and I can see why you might think that, but, even though I don't know the ins and outs of the BOINC system design, I don't completely agree with you. I know from my ancient Windows machine that there is communication between the active tasks and the manager -- and that that communication can be disrupted with unwanted results. That is why I suggested a reboot to start things over. Unless you know something you haven't told us, or you do a lot of monitoring of what is going on, I think you are jumping to unfounded conclusions.
|
||
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2384 Status: Offline |
The time to completion is only an estimate. I see this every day and the WUs do upload & report.
Did you do everything IBM01902 suggested? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've put the computer on Run Always and watched the individual tasks. While time to completion is an estimate, tasks that are actually doing work at least show both progress and something in the Remaining field.
In this particular case, the task progresses normally until it reaches 100%. Then, instead of doing the expected upload to the server and moving to the Ready to Report status, it continues to sit at 100%, the Elapsed time continues to increase, and the Remaining field shows "---". If I don't do anything and leave it, a task in this condition will run for at least a day or so, and when I first noticed this bug, a couple of tasks stuck in this condition had been running for several days. As a task in this state continues to consume a slot and CPU cycles, I now abort it when I see this happening. |
||
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 325 Status: Offline |
An alternative method of trying to 'prod' the work unit into completing is:
1. Turn off 'leave application in memory'
2. Suspend the work unit
3. Wait a minute or so
4. Resume the work unit. It should restart from its latest checkpoint (hopefully the one after the calculations) and then complete correctly.
5. Turn on 'leave application in memory'
This technique is always worth a try as it does not interfere with the other running work units. |
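The suspend, wait, and resume part of that sequence could be scripted roughly as below, using `boinccmd --task URL task_name suspend|resume`. Treat it as a sketch under assumptions: `prod_task` is a made-up helper name, the project URL and task name are values you would copy from your own client, and `boinccmd` is assumed to be on your PATH. It does not touch the 'leave application in memory' preference, which you would still switch off and back on yourself.

```python
import subprocess
import time
from typing import Callable, List


def prod_task(
    project_url: str,
    task_name: str,
    wait_seconds: float = 60.0,
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Suspend one task, wait, then resume it; return the commands issued.

    run and sleep are injectable so the sequence can be dry-run
    without a BOINC client installed.
    """
    base = ["boinccmd", "--task", project_url, task_name]
    suspend = base + ["suspend"]
    resume = base + ["resume"]

    run(suspend, check=True)  # stop only this work unit
    sleep(wait_seconds)       # "wait a minute or so"
    run(resume, check=True)   # let it restart from its last checkpoint
    return [suspend, resume]
```

Because only the named task is suspended, the other running work units are left alone, which is the point of the technique.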
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Up to now, nothing you've said tells me that you have independently verified what is happening. Just because the manager is reporting CPU time does not mean that the WU is using CPU time. Neither does the fact that the manager is not uploading the WU mean that the WU has not completed. It could simply mean that the message from the WU telling the manager that it is ready to upload has gone AWOL.
If you're sure that the unit is still using CPU time (by observing the usage in a monitoring tool, not by checking the WU properties in BOINC) then, if and when it finishes, or you cancel it after allowing 100% more time than expected, I would take a look at the WU's log on the web site (or you can try reading it in real time in the appropriate slot directory). If there really is recorded activity once the unit has hit 100%, then I would report it using the contact procedure on the web site so that the techs can take a look at it. |
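As a cross-check on what the client itself believes, two snapshots of `boinccmd --get_tasks` taken a few minutes apart could be compared with something like the sketch below. The field names it looks for (`name`, `fraction done`, `current CPU time`, `active_task_state`) are assumptions about the text that command prints and may differ between client versions; `parse_tasks` and `still_working_at_100` are made-up helper names. It reads the client's own figures, so it does not replace watching the process in an OS monitoring tool.

```python
from typing import Dict, List

Task = Dict[str, str]


def parse_tasks(text: str) -> List[Task]:
    """Split 'key: value' lines into one dict per task.

    Assumption: each task's block starts with a 'name:' line.
    Lines without a colon (headers, separators) are skipped.
    """
    tasks: List[Task] = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "name":
            tasks.append({})
        if tasks:
            tasks[-1][key] = value
    return tasks


def still_working_at_100(before: List[Task], after: List[Task]) -> List[str]:
    """Names of tasks at 100% that are executing and whose CPU time grew."""
    earlier = {t["name"]: t for t in before}
    flagged: List[str] = []
    for task in after:
        prev = earlier.get(task["name"])
        if prev is None:
            continue
        try:
            done = float(task["fraction done"])
            cpu_then = float(prev["current CPU time"])
            cpu_now = float(task["current CPU time"])
        except (KeyError, ValueError):
            continue  # field missing or not numeric in this client version
        executing = task.get("active_task_state") == "EXECUTING"
        if done >= 1.0 and executing and cpu_now > cpu_then:
            flagged.append(task["name"])
    return flagged
```

Feed it the saved text of the two snapshots; any name it returns is a task the client reports as complete yet still accumulating CPU time, which is worth confirming in the OS before reporting.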
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes, it was using CPU time, according to Resource Monitor. I did all the basic troubleshooting noted above before posting, and as nothing addressed the behavior, started looking to see if this was something others experienced.
What has seemed to make a difference is cutting back the PC's resources, from 100% of the cores to 50%, and the CPU slice to 10%. Since doing that, I haven't seen this particular phenomenon again. Of course, not as much work is getting done, but at least it's all getting reported now. Thanks for the replies. If it happens again, I'll look into what's available on the website, as nothing on the host provided any insight. |
||