World Community Grid - View Thread - Everything since Feb either "Error" or too late

World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: Everything since Feb either "Error" or too late


Thread Status: Active
Total posts in this thread: 4


This topic has been viewed 1239 times and has 3 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Everything since Feb either "Error" or too late

I just noticed that my average points had dropped, and now I see that, as far back as the history goes, all of my results are listed as "error", "too late" or "In progress" (there may be a "pending" as well). I can only see back to about the 25th of Feb on this system, and the "Error" filter in the history returns no results. I only see "error" when I turn all filters off on the "results status" page.

I have not had any issues that I know of prior to this. Of course, it appears the system does not tell me that a task has failed unless I log into the website and check the results status. I did upgrade BOINC to version 7.2.39 (the latest) within the last few weeks. The system is a Core i7 running Windows 7 x64 with lots of RAM, and it is not pushing any limits that I can see. No other projects are showing any issues (GPUGRID, SETI and Milkyway).

Note that I am configured for all projects, though it appears that the only ones showing errors rather than "In progress" are the FAHV, FAAH and MCM1 projects.

I hate wasting processor cycles if they are just getting dumped. I have set the system to process only WCG, but not to get new tasks until I clear the queue and get this sorted out. Any suggestions on how to fix this? Thanks! -Mike

[Mar 9, 2014 12:08:07 AM]

jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline

Re: Everything since Feb either "Error" or too late

In Result Status, locate a task that has Error under Status. Click on Error, then copy and paste the entire report, which will look something like this:

Result Log

Result Name: MCM1_ 0002743_ 6059_ 1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.28_windows_x86_64 -SettingsFile MCM1_0002743_6059.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Settings File
DateOfDesign = 11/07/2013
Designer = PMCC_OCI
WorkOrderID = 2743_6059
DatasetID = 17_72_SDG_v1
NumberOfGenesInStartingSignature = 13
NumberOfGenesInSignatureMin = 10

That's the first step in getting assistance. If both MCM and FAAH are erroring out, post an example for each.
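When there are a lot of errored results to check, a small script can pull the key fields out of each pasted report. This is only a rough sketch: the function name `summarize_result_log` is made up for illustration, and it assumes the report looks like the excerpt above (a "Result Name:" line, a `<core_client_version>` tag, and optionally a `<message>` block).

```python
import re

def summarize_result_log(text):
    """Pull the result name, client version and message out of a pasted report.

    Hypothetical helper: assumes the layout shown in the excerpt above.
    Any field that is not present comes back as None.
    """
    name = re.search(r"Result Name:\s*(.+)", text)
    # The closing ">" is left out of the pattern on purpose so that a
    # report whose closing tag was cut off in the paste still matches.
    version = re.search(r"<core_client_version>([^<]+)</core_client_version", text)
    message = re.search(r"<message>\s*(.*?)\s*</message>", text, re.DOTALL)
    return {
        "result_name": name.group(1).strip() if name else None,
        "client_version": version.group(1).strip() if version else None,
        "message": message.group(1).strip() if message else None,
    }

sample = """Result Log

Result Name: MCM1_ 0002743_ 6059_ 1--
<core_client_version>7.2.33</core_client_version>
"""
summary = summarize_result_log(sample)
print(summary["result_name"], summary["client_version"], summary["message"])
```

A report that contains nothing but the result name and client version (no message and no stderr output) is itself a useful clue, so it is worth noting which results come back with `message` set to None.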

[Edit 1 times, last edit by jonnieb-uk at Mar 9, 2014 12:21:44 AM]

[Mar 9, 2014 12:21:08 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Everything since Feb either "Error" or too late

Thanks! I had spot checked the errors, but most are along the lines of:
-----------------
Result Log

Result Name: MCM1_ 0002587_ 3514_ 0--
<core_client_version>7.2.39</core_client_version>

----end of error log----
Another:

Result Log

Result Name: FAHV_ x3NF9_ A_ IN_ Y3a_ rig_ 0199245_ 0851_ 0--
<core_client_version>7.2.39</core_client_version>
------end of log-----

One I just found:

Result Log

Result Name: MCM1_ 0002611_ 8280_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
]]>

-----------------end log--------------
I see that some are now validating again.

Thanks!

-Mike

[Mar 9, 2014 11:58:17 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Everything since Feb either "Error" or too late

Well, since I like to see the resolution when I am searching...

I believe I tracked this to my TThrottle utility settings. I had been working with them at some point and had apparently set the max CPU temperature quite a bit lower than I should have. Since this primarily impacted BOINC projects, it dropped the CPU activity for each project to about 1/3 or less of normal.

Add to this that WCG tends to have short "complete by" dates compared to SETI and some of the others I support, and that WCG tasks don't seem to get bumped to priority/urgent status even when they are about to expire. Even though the Resource Share was set to 400 (more than all other projects combined), I guess the algorithm was not detecting the reduced CPU cycles it was getting, so it (or perhaps the other projects) was sending a LOT more work than could be done by the deadline. I think the WCG stuff was basically timing out.

With TThrottle set properly, I am now getting them out on time (roughly per the estimated time to complete) and the results are validating. There is a bit of a race going on to get caught up, since I only finally figured this out a few days ago and a lot of work was queued up (stopped updates until I am caught up).

So, basically, it was the much lower CPU availability/cycles caused by TThrottle being set too "cool", combined with quirks of the work estimating algorithms of WCG/BOINC and the other projects.
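To put rough numbers on that, here is a back-of-the-envelope sketch of why throttling pushes work past its deadline. The function names and the example figures are made up for illustration; this is not how BOINC or WCG actually schedules work, just the arithmetic of a task getting only a fraction of the CPU it was estimated to need.

```python
def effective_runtime_hours(estimated_cpu_hours, cpu_fraction):
    """Wall-clock hours needed when tasks only get `cpu_fraction` of the CPU."""
    if not 0 < cpu_fraction <= 1:
        raise ValueError("cpu_fraction must be in (0, 1]")
    return estimated_cpu_hours / cpu_fraction

def misses_deadline(estimated_cpu_hours, cpu_fraction,
                    hours_until_deadline, queued_hours_ahead=0.0):
    """True if the task, plus any work queued ahead of it, finishes too late."""
    total = effective_runtime_hours(
        estimated_cpu_hours + queued_hours_ahead, cpu_fraction)
    return total > hours_until_deadline

# Hypothetical numbers: a 6-hour task with 30 hours of work queued ahead
# of it and 48 hours left before its deadline.
print(misses_deadline(6, 1.0, 48, queued_hours_ahead=30))  # full speed
print(misses_deadline(6, 0.5, 48, queued_hours_ahead=30))  # throttled to half
```

With these made-up figures the queue fits comfortably at full speed (36 hours of work in a 48-hour window) but needs 72 hours once the CPU is throttled to half, so the task would be late even though nothing is actually failing.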

I figured it out mostly by looking at the elapsed versus remaining times in the "Tasks" tab, and at the Task Manager CPU load per task. I also wondered why, even when I set BOINC to use 100% of my CPU, there was no slowdown in other programs (TThrottle only impacts BOINC tasks, not the others).
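The same check can be written down as a quick calculation. This is just a sketch with made-up names and numbers: take the elapsed time and the fraction of the task that is done, project the total run time from them, and compare that with what the task was expected to take.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Total run time implied by the progress made so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done

def slowdown_factor(elapsed_hours, fraction_done, expected_total_hours):
    """How many times slower than expected the task is running."""
    return projected_total_hours(elapsed_hours, fraction_done) / expected_total_hours

# Hypothetical task: expected to take 6 hours, but only half done after 9.
print(slowdown_factor(9, 0.5, 6))
```

A factor well above 1 across most tasks, with no matching load from other programs, points at something holding the CPU back rather than at the tasks themselves.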

Hope this helps someone else... (or me, when I forget 6 months from now and do it again).

-Mike

[Apr 2, 2014 12:39:55 AM]
