World Community Grid - View Thread - "result {blah} is no longer usable"

World Community Grid Forums

Category: Completed Research

Forum: FightAIDS@Home

Thread: "result {blah} is no longer usable"


Thread Status: Active
Total posts in this thread: 20


This topic has been viewed 3958 times and has 19 replies

dguntner
Cruncher
Joined: Jan 17, 2013
Post Count: 8
Status: Offline


"result {blah} is no longer usable"

It's been a long time since I last used a grid computing client. Recently (within the last couple of months), I've been in a position to do so again.

I'm running Debian 6.0.6 with the Debian-packaged version of BOINC, which is 6.10.58. I've got computing time split 50/50 between FightAIDS@Home and SETI@Home. The SETI work units have been running just swimmingly; no issues, no problem, no nuthin'. :-)

Unfortunately, I can't say the same for FightAIDS@Home. To date (since restarting a couple of months ago), every single work unit that has been sent to me has gotten the result "result {whatever} is no longer usable." For the last couple of days, I've gotten no new work units at all. And when I tried manually doing an update in the BOINC client this morning, I got a message that the project was currently offline for maintenance.

I'm starting to come to the conclusion that I should just drop the FA@H project and concentrate all CPU cycles on the S@H project; at least they accept my work units when my machine is ready to send them in, and they don't seem to have any problems sending more....

I think FA@H is a very worthwhile project and I'd like to contribute to it, but if every work unit I get is just thrown away as being "unusable" when sent back, then the CPU usage is just wasted cycles.

But before giving up on it completely, I figured I'd ask here and see if anyone knows what's going on. Is this just a short-term problem? Something else? Will it likely be sorted out soon? I tried searching the forum but didn't really come up with anything useful on this particular subject.

So, if anyone knows what's going on WRT my stated problem here, I'd love to know what it is.

Thanks.

--Dave

[Jan 17, 2013 3:54:30 PM]

gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Status: Offline


Re: "result {blah} is no longer usable"

Are the FA@H WUs that you're returning arriving after the set deadline of 10 days? If so, the question is why. (Perhaps the cache setting is too high for the amount of time your computer is on; after all, if you haven't been in a position to run BOINC for some time, it may take a little while for it to determine the percentage of time your computer is on.)

----------------------------------------

[Jan 17, 2013 4:30:18 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: "result {blah} is no longer usable"

Hi,

Not commenting on SETI's schwimmingness, but just from reading their alternate [for when they are down] cafe thread at Berkeley, they're down something like 3 days out of 7. Members can tell you about WCG's uptime by comparison, and about the general rate of validation they experience in processing FAAH tasks... which is very high.

On that message, I'd really like you to post the log from the Result Status page. Go to My Grid > Result Status, then click the link of an FAAH task in the Status column where it says, for instance, Server Aborted, Error or Invalid.

Also, I'd like you to go back into the message/event log or the stdoutdae.txt file [log record], find the point where such a task produces this message, and post that line plus what comes before and after it so we can see the sequence.
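A minimal sketch of that search, written in Python purely as an illustration (it is not a WCG or BOINC tool). It assumes the Debian package's data directory, /var/lib/boinc-client (the location shown in the startup log later in this thread), and prints ten lines before and after every line containing "is no longer usable"; the function names and the context size are hypothetical choices.

```python
import sys
from pathlib import Path

# Assumption: the Debian boinc-client package keeps stdoutdae.txt in its data
# directory, /var/lib/boinc-client (see the startup log later in this thread).
DEFAULT_LOG = Path("/var/lib/boinc-client/stdoutdae.txt")
MARKER = "is no longer usable"


def find_with_context(lines, marker=MARKER, before=10, after=10):
    """Return a list of line blocks, one around each line containing marker."""
    blocks = []
    for i, line in enumerate(lines):
        if marker in line:
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            blocks.append(lines[start:end])
    return blocks


def main(argv):
    # Optional first argument: path to a copy of the log.
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_LOG
    if not path.exists():
        print(f"log not found: {path}", file=sys.stderr)
        return 1
    lines = path.read_text(errors="replace").splitlines()
    for block in find_with_context(lines):
        print("\n".join(block))
        print("-" * 40)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as find_unusable.py, it can be run with no arguments or with the path to a copy of the log; it may need to run as a user allowed to read the BOINC data directory.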

How much computing time is being logged on the Result Status page for the "result ... no longer usable"? My impression is that the task is too old, and then gets annulled with that message. That would mean such a task is sitting a long time on your system... more than 10 days, probably 12 or more without getting processed.

But for the moment, that's all guessing. The log info and the other bits asked for will tell us how and where to look further.

edit: I think gb is on to something. If you're caching high because SETI is off-line so often and for so long, then FAAH may never get its foot in sideways. The general advice is that the cache setting should never exceed the shortest deadline of any project that's active on a host (WCG's longest deadline is 10 days). Either fetching is rejected because the server is told the result won't come back in time, or, if with a high cache you do manage to get FAAH work, it would be processed at high priority and returned in time.
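The rule of thumb above, plus a rough estimate of when queued work would finish, can be sketched in Python. This is a simplified model, not BOINC's actual work-fetch or scheduling logic; it assumes a single CPU (the host described later in this thread reports one Athlon 64 core), and the function names, the optional safety margin and the example numbers are all hypothetical.

```python
def cache_is_safe(cache_days, deadlines_days, margin_days=0.0):
    """Rule of thumb: the work cache should not exceed the shortest deadline
    of any project attached to the host, minus an optional safety margin."""
    if not deadlines_days:
        raise ValueError("need at least one project deadline")
    return cache_days <= min(deadlines_days) - margin_days


def fifo_finish_days(queued_cpu_hours, share=0.5, on_fraction=1.0):
    """Rough number of days until the last queued task of one project
    finishes, assuming FIFO order on a single CPU, that the project gets
    `share` of that CPU, and that the machine is on `on_fraction` of the time."""
    if share <= 0 or on_fraction <= 0:
        raise ValueError("share and on_fraction must be positive")
    return queued_cpu_hours / 24.0 / (share * on_fraction)


# Hypothetical numbers: a 10-day WCG deadline against various cache sizes.
print(cache_is_safe(12, [10]))                # False: cache exceeds deadline
print(cache_is_safe(8, [10], margin_days=2))  # True
print(fifo_finish_days(48, share=0.5))        # 4.0 days for 48 CPU-hours at 50%
```

A 2-day margin on the 10-day deadline happens to match the 8-day cache suggested later in the thread; the right margin for a given host is a judgment call.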

----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 17, 2013 4:36:39 PM]

[Jan 17, 2013 4:30:31 PM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7852
Status: Offline


Re: "result {blah} is no longer usable"

Another thing you might post is the machine specifications and how much time during an average day you leave it crunching.
Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jan 17, 2013 5:46:33 PM]

dguntner
Cruncher
Joined: Jan 17, 2013
Post Count: 8
Status: Offline


Re: "result {blah} is no longer usable"

Thank you to those who have replied so far. I'll post my replies here for the current set of replies. :-)

As an additional note: after posting my original message, I decided to try the "reset project" button in the BOINC client, and that did in fact result in a new work unit being sent to me. We'll see what happens with this one.... Today is 1/17; it gives me a due date of 1/27.

One thing I note that seems kind of odd, though, is that it was downloaded over two hours ago and yet no processing has started on that work unit (it's still working on the current S@H unit), even though my preferences state that it should switch between applications every 120 minutes. Is that setting based on wall-clock time, or on the amount of CPU time being run up by an application? I *have* in fact seen the FAAH task go active in the BOINC client in the past; I'll be keeping an eye on it to make sure it does this time as well.

Anyway, onto my replies:

From gb009761:
Are the FA@H WUs that you're returning arriving after the set deadline of 10 days? If so, the question is why. (Perhaps the cache setting is too high for the amount of time your computer is on; after all, if you haven't been in a position to run BOINC for some time, it may take a little while for it to determine the percentage of time your computer is on.)

I don't *think* they've been getting submitted after the deadline, but to be honest, I don't really know. I'll pay closer attention to it this time and see if that's what's happening. The computer is used as a server for my home network; as such, it's on 24/7.

From SekeRob:
Hi,

Not commenting on SETI's schwimmingness, but just from reading their alternate [for when they are down] cafe thread at Berkeley, they're down something like 3 days out of 7. Members can tell you about WCG's uptime by comparison, and about the general rate of validation they experience in processing FAAH tasks... which is very high.

Useful to know.

On that message, I'd really like you to post the log from the Result Status page. Go to My Grid > Result Status, then click the link of an FAAH task in the Status column where it says, for instance, Server Aborted, Error or Invalid.

I checked that page as you suggested. Unfortunately, on the Result Status page (thanks for describing how to get there!), the only thing showing is the currently downloaded work unit, and for that one it just shows "in progress." No old units of any kind are showing. Filters are all set to "all." :-/

Also, I'd like you to go back into the message/event log or the stdoutdae.txt file [log record], find the point where such a task produces this message, and post that line plus what comes before and after it so we can see the sequence.

I looked through the file. The latest incident included several days' worth of this:


16-Jan-2013 03:40:48 [World Community Grid] Task faah37425_ZINC58356543_xh2_xtal_01_0: no shared memory segment
16-Jan-2013 03:40:48 [World Community Grid] Task faah37425_ZINC58356543_xh2_xtal_01_0 exited with zero status but no 'finished' file
16-Jan-2013 03:40:48 [World Community Grid] If this happens repeatedly you may need to reset the project.

And then at the point where the failure occurs is this:


16-Jan-2013 04:20:51 [World Community Grid] Sending scheduler request: Requested by project.
16-Jan-2013 04:20:51 [World Community Grid] Not reporting or requesting tasks
16-Jan-2013 04:20:55 [World Community Grid] Scheduler request completed
16-Jan-2013 04:20:55 [World Community Grid] Message from server: Result faah37425_ZINC58356543_xh2_xtal_01_0 is no longer usable
16-Jan-2013 04:21:09 [---] Resuming computation
16-Jan-2013 04:21:09 [SETI@home] Starting 25jl12ae.24023.20517.15.10.42_1
16-Jan-2013 04:21:09 [SETI@home] Starting task 25jl12ae.24023.20517.15.10.42_1 using setiathome_enhanced version 603
16-Jan-2013 04:21:10 [World Community Grid] Sending scheduler request: To report completed tasks.
16-Jan-2013 04:21:10 [World Community Grid] Reporting 1 completed tasks, not requesting new tasks
16-Jan-2013 04:21:12 [World Community Grid] Scheduler request completed
16-Jan-2013 04:21:12 [World Community Grid] [error] garbage_collect(); still have active task for acked result faah37425_ZINC58356543_xh2_xtal_01_0; state 5
16-Jan-2013 04:21:13 [World Community Grid] Computation for task faah37425_ZINC58356543_xh2_xtal_01_0 finished
16-Jan-2013 04:21:15 [World Community Grid] Started upload of faah37425_ZINC58356543_xh2_xtal_01_0_0
16-Jan-2013 04:21:15 [World Community Grid] Started upload of faah37425_ZINC58356543_xh2_xtal_01_0_1
16-Jan-2013 04:21:18 [SETI@home] Sending scheduler request: To fetch work.
16-Jan-2013 04:21:18 [SETI@home] Requesting new tasks
16-Jan-2013 04:21:20 [SETI@home] Scheduler request completed: got 2 new tasks
16-Jan-2013 04:21:22 [SETI@home] Started download of 30oc12aa.6460.2934.12.10.60
16-Jan-2013 04:21:22 [SETI@home] Started download of 30oc12af.6489.53482.9.10.222
16-Jan-2013 04:21:30 [---] Resuming computation
16-Jan-2013 04:21:51 [SETI@home] Finished download of 30oc12af.6489.53482.9.10.222
16-Jan-2013 04:21:52 [SETI@home] Finished download of 30oc12aa.6460.2934.12.10.60

Let me know if I missed something that you need.

How much computing time is being logged on the Result Status page for the "result ... no longer usable"? My impression is that the task is too old, and then gets annulled with that message. That would mean such a task is sitting a long time on your system... more than 10 days, probably 12 or more without getting processed.

But for the moment, that's all guessing. The log info and the other bits asked for will tell us how and where to look further.

Well, as mentioned above, unfortunately the only work unit shown on My Grid at the moment is the one currently downloaded, so I've got no other information I can give you. Maybe the log entries above help? I don't like the look of some of them, though I don't know enough about this to tell whether they actually indicate a problem.

edit: I think gb is on to something. If you're caching high because SETI is off-line so often and for so long, then FAAH may never get its foot in sideways. The general advice is that the cache setting should never exceed the shortest deadline of any project that's active on a host (WCG's longest deadline is 10 days). Either fetching is rejected because the server is told the result won't come back in time, or, if with a high cache you do manage to get FAAH work, it would be processed at high priority and returned in time.

Is there a way to determine if that is what's happening?

From Sgt.Joe:
Another thing you might post is the machine specifications and how much time during an average day you leave it crunching.

Well, you're in luck. :-) While going through the stdoutdae.txt file for the above, I saw where it listed the machine specs when it started for the first time. The hardware hasn't changed since then, so here's that info:


17-Dec-2012 15:38:41 [---] Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
17-Dec-2012 15:38:41 [---] Config: GUI RPC allowed from:
17-Dec-2012 15:38:41 [---] log flags: file_xfer, sched_ops, task
17-Dec-2012 15:38:41 [---] Libraries: libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
17-Dec-2012 15:38:41 [---] Data directory: /var/lib/boinc-client
17-Dec-2012 15:38:41 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3200+ [Family 15 Model 47 Stepping 2]
17-Dec-2012 15:38:41 [---] Processor: 512.00 KB cache
17-Dec-2012 15:38:41 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
17-Dec-2012 15:38:41 [---] OS: Linux: 2.6.32-5-amd64
17-Dec-2012 15:38:41 [---] Memory: 2.95 GB physical, 2.05 GB virtual
17-Dec-2012 15:38:41 [---] Disk: 27.50 GB total, 25.25 GB free
17-Dec-2012 15:38:41 [---] Local time is UTC -8 hours
17-Dec-2012 15:38:41 [---] No usable GPUs found
17-Dec-2012 15:38:41 [---] No general preferences found - using BOINC defaults
17-Dec-2012 15:38:41 [---] Reading preferences override file
17-Dec-2012 15:38:41 [---] Preferences:
17-Dec-2012 15:38:41 [---]    max memory usage when active: 1511.34MB
17-Dec-2012 15:38:41 [---]    max memory usage when idle: 2720.40MB
17-Dec-2012 15:38:41 [---]    max disk usage: 10.00GB
17-Dec-2012 15:38:41 [---]    don't use GPU while active
17-Dec-2012 15:38:41 [---]    suspend work if non-BOINC CPU load exceeds 25 %
17-Dec-2012 15:38:41 [---]    (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
17-Dec-2012 15:38:41 [---] Not using a proxy
17-Dec-2012 15:38:41 [---] This computer is not attached to any projects
17-Dec-2012 15:38:41 [---] Visit http://boinc.berkeley.edu for instructions
Initialization completed

And after that, of course, I reattached it to my logins in both places (after being away for a few years, I was surprised that the accounts still existed!) and let it grab the projects.

--Dave

[Jan 17, 2013 9:33:04 PM]

dguntner
Cruncher
Joined: Jan 17, 2013
Post Count: 8
Status: Offline


Re: "result {blah} is no longer usable"

As a quick follow-up to the information I provided yesterday: when I checked today, after almost 24 hours, there were three S@h tasks showing, two of which were done and waiting to be uploaded and one being worked on (the third one wasn't there when I posted yesterday). The FAAH task had not even been started, even though I've got BOINC configured for a 50/50 split and set to switch applications every 60 minutes. What the heck is going on?

I was able to manually get it to start on the FAAH task by suspending the S@h project briefly (fortunately, when I hit resume, the S@h task showed as "waiting to run" and the FAAH task kept running). I'll keep an eye on it and see how it behaves once enough time has passed that it switches back to S@h. I don't think I should have to be quite so manually involved, though.... :-) Are there any tweaks I can apply that will get this thing to balance out better?

--Dave

[Jan 18, 2013 2:54:21 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: "result {blah} is no longer usable"

You can look at the Project > Select Project > Properties screen of BOINC Manager. It shows what the priorities are. If WCG is overworked, it won't get a turn, or only when it's getting late... that is, with 6.10.58. The scheduling behavior in 7.0.4x is much more benign, but I don't know if there's a package of that series that fits on Debian 6.0.6.

Switch every 60 minutes... I've got mine at 240 minutes, which means much less flip-flopping; on 7.0.4x that won't happen too often anyway, since that client tends to work in bulk on one project, then move on to the next and back again. If you set [trick alert] the switch time to longer than the FAAH run time, the task will likely finish in one go, provided there's no High Priority processing interruption.

[Jan 18, 2013 3:07:04 PM]

dguntner
Cruncher
Joined: Jan 17, 2013
Post Count: 8
Status: Offline


Re: "result {blah} is no longer usable"

Thanks for the reply, SekeRob. I checked, and the only packaged version for Squeeze (Debian 6.0.x) is 6.10.58. There's no newer packaged version, even via the backports channel. I did some hunting and found that the BOINC package included in Wheezy (the next Debian version, currently in testing) is 7.0.27. So, once Wheezy goes stable, I'll be able to upgrade to that and get the newer version of BOINC in the process (of course, that's not at the 7.0.4x level you mention above, so I still won't see that particular benefit). Until then, though, I'm stuck where I am unless it bugs me enough to abandon the packaged version and grab the current release to compile/install from there (which, at the moment, it doesn't).

As an aside, I checked, and after an hour of processing the FAAH task, it switched back to the outstanding S@h task, and it has stayed there since. <grumble> I have no idea why it's doing this....

I checked the project properties as you describe above. In the scheduling section of that pop-up window for the FAAH project, the numbers are: CPU scheduling priority -1252.17, CPU work fetch priority 0.00, CPU work fetch deferred for ---, CPU work fetch deferral interval ---, Duration correction factor 1.0000.

For S@h, the numbers are: CPU scheduling priority 809.72, CPU work fetch priority -84965.46, CPU work fetch deferred for ---, CPU work fetch deferral interval ---, Duration correction factor 1.8788.

I don't really understand those numbers, although I suspect the "CPU scheduling priority" is why S@h seems to be getting all the love from my system, despite the fact that both projects show a "resource share" of 100 (50.00%) on the Projects tab.

Is there any way to adjust those values to help even things out better? FWIW, I've not been touching the client preferences file directly. I've left that clear, instead preferring to set the preference values via the website settings both here and at the S@h page.

I was using a 60-minute switch because the settings page for S@h (which I've been part of far longer, since 1999, than WCG, since 2008) said that 60 was recommended, so I just stuck with that number. I'll go ahead and adjust the settings on both pages to 120 to see if that makes any kind of impact.

Where is the [trick alert] switch found?

--Dave

[Jan 18, 2013 6:35:10 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: "result {blah} is no longer usable"

If you know where to change it from 60 to 120 minutes, you know where to set it to, e.g., 1440 minutes [freakishly long]. In the client, the option is on the Processor tab of the local preferences and is called "Switch between applications every..." (which is the field the trick alert was about).

That field worked slightly differently in the 5.x series clients: it would force any task ahead if the value was set longer than the deadline of, e.g., repair jobs [4 days at WCG]. Some would enter 6000 or 7000 minutes to rush-process these tasks.

I can't really understand why 50:50 is not running 50:50. Search the stdoutdae.txt log for messages such as "wont send work... on 99% and of that 100% computing". With a very high cache setting, the WCG server effectively tells the client: "look, if I give you work but you won't send it back in time, I will only give you 1 task, as long as you keep reporting tasks too late". I think hand-forcing FAAH to run and setting an exorbitant switch time will get things moving towards more time for WCG [or temporarily suspend S@H], after which the buffer has to be backfilled from WCG... or lower the cache to something less than 10 days; try 8 days. No matter what, with FIFO processing, that task will/should be finished before the 8th day is over.

P.S. The CPU scheduling priority of -1252.17 is the reason... WCG used up its 60 minutes and handed back over to the alien search team. I can't remember ATM whether that value is in seconds or minutes. If minutes, see you tomorrow before the FAAH task gets another hour.
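The arithmetic behind "see you tomorrow" can be checked with a few lines of Python. This sketch only converts the magnitude of -1252.17 under the two unit guesses from the post; treating the priority as accumulated time at all is an assumption here, not documented BOINC behavior, and the function name is hypothetical.

```python
def priority_as_hours(priority):
    """Convert the magnitude of a BOINC Manager 'CPU scheduling priority'
    value to hours under both unit guesses from the post. Treating the value
    as accumulated time is an assumption, not documented behavior."""
    magnitude = abs(priority)
    return {
        "if_seconds": magnitude / 3600.0,
        "if_minutes": magnitude / 60.0,
    }


hours = priority_as_hours(-1252.17)
print(f"{hours['if_seconds']:.2f} h if seconds")  # 0.35 h (about 21 minutes)
print(f"{hours['if_minutes']:.1f} h if minutes")  # 20.9 h
```

If the value is in minutes, roughly 21 hours of "catch-up" for SETI before WCG's next turn is consistent with the FAAH task getting its next hour only the following day.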

[Jan 18, 2013 7:30:02 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: "result {blah} is no longer usable"

Oh, errata: If you have a WCG task and it won't move, here's an alternate trick: temporarily reduce the resource share (project weight) to something extremely low and update the client. If WCG gets, for instance, only 1%, all its work will be processed at high priority, because the client will think it's only going to schedule 14.4 minutes per day for it. Since it knows that in 10 days it will normally schedule only 144 minutes, which won't be enough, it will rush the FAAH job. My money, though, is on setting the switch-application time to 1440 minutes... if FAAH starts [automatically or manually], it will run to the end, provided your host is powerful enough to complete a FAAH task in under 24 hours.
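The resource-share arithmetic in this post can be sketched in Python. This is a deliberately simplified model (one CPU that is always on, time split strictly by share) and not BOINC's real scheduler, which is more involved; the function names and the 6-hour FAAH run time are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def scheduled_minutes(share_fraction, days=1.0):
    """CPU minutes the client would expect to give a project with the given
    resource share fraction (0.01 for 1%), on one CPU that is always on."""
    return MINUTES_PER_DAY * share_fraction * days


def needs_rush(task_minutes, share_fraction, days_to_deadline):
    """Simplified check: True if the project's expected time before the
    deadline is less than the task's run time, the situation in which the
    post expects the client to rush the task."""
    return scheduled_minutes(share_fraction, days_to_deadline) < task_minutes


# 1% share: about 14.4 minutes per day and 144 minutes over a 10-day deadline.
print(round(scheduled_minutes(0.01), 1))           # 14.4
print(round(scheduled_minutes(0.01, days=10), 1))  # 144.0
# Hypothetical 6-hour (360-minute) FAAH task with a 10-day deadline:
print(needs_rush(360, 0.01, 10))  # True: 144 < 360
print(needs_rush(360, 0.50, 10))  # False: 7200 >= 360
```

The same model shows why the 1440-minute switch time works as a trick: any task whose run time is under 1440 minutes can finish within a single scheduling slice.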

[Jan 18, 2013 8:01:17 PM]
