World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: Grid App Idling

Thread Status: Active
Total posts in this thread: 9

This topic has been viewed 947 times and has 8 replies

ebahapo
Cruncher
Joined: Nov 2, 2005
Post Count: 45
Status: Offline

Grid App Idling

I use the BOINC Linux client and have noticed that sometimes BOINC thinks that rosetta 4.19 is running, but the OS tells me it's sleeping:

26403 tty1 S 0:04 /home/emenezes/boinc/boinc_5.2.7_i686-pc-linux-gnu -a
32573 tty1 SN 86:25 \_ rosetta_4.19_i686-pc-linux-gnu -series 15 -protei
387 tty1 SN 18:20 \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _ -silent
388 tty1 S 0:00 \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _ -sil
389 tty1 S 0:00 \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _

According to BOINC, WCG has gone through 100% of its WU. I tried pausing WCG, at which point other projects ran in its place, and then resuming it, at which point the other projects paused, but it still remained in the stopped state.

I gave it a day, but it still remained stopped while BOINC reported it as running at 100% of the WU. I eventually aborted it, which was the only way to avoid wasting cycles.

Please advise.

Thanks.
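
Editor's sketch: one way to confirm the symptom described above (BOINC reports a task as running while the OS shows it sleeping) is to sample the process twice and check whether its CPU time advanced. The following is a minimal Python sketch, assuming a Linux system with /proc mounted. The names ProcSample, parse_proc_stat, looks_stalled and sample are hypothetical helpers made up for this illustration and are not part of BOINC or the WCG agent.

```python
from typing import NamedTuple


class ProcSample(NamedTuple):
    state: str      # one-letter process state, e.g. "R" (running) or "S" (sleeping)
    cpu_ticks: int  # utime + stime, in clock ticks


def parse_proc_stat(text: str) -> ProcSample:
    """Parse the contents of a Linux /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ")" before tokenizing.
    rest = text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15,
    # which land at indexes 11 and 12 of the remaining fields.
    return ProcSample(state=rest[0], cpu_ticks=int(rest[11]) + int(rest[12]))


def looks_stalled(before: ProcSample, after: ProcSample) -> bool:
    """True if the process is not runnable and used no CPU between the samples."""
    return after.state != "R" and after.cpu_ticks == before.cpu_ticks


def sample(pid: int) -> ProcSample:
    """Read one sample for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())
```

Calling sample(pid) twice, a minute or so apart, and passing both results to looks_stalled gives a rough answer. A compute-bound science application should normally accumulate CPU ticks between samples, so an unchanged counter over a long interval is a stronger signal than the sleeping state alone.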

----------------------------------------

[Nov 11, 2005 7:29:23 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Grid App Idling

emenezes,

We have found an error condition in the rosetta application that has affected 65 results out of the 29,791 results (about 0.2%) returned via the Linux agent so far. It usually aborts the workunit automatically, or otherwise resolves itself on its own after a short period, and continues with the next workunit. You are the first person to report that the error did not resolve itself automatically.

We are investigating this now and hope to have a fix in the near future. You should feel free to continue processing as this condition is extremely rare.

thanks for letting us know,
Kevin

[Nov 11, 2005 8:54:36 PM]

steven424
Cruncher
Joined: Nov 1, 2005
Post Count: 2
Status: Offline

Re: Grid App Idling

I, too, have observed this problem on my Linux server. I'm running both SETI and WCG. What happens is that apparently every 60 minutes the BOINC daemon switches between the SETI binary and the Rosetta binary; it never seems to run both at the same time. I see a message in the logfile that the active binary had been swapped out of memory. After the swap, the binary which is "active" does not execute, but instead seems to be in some sort of a paused state. After observing this once for 30 minutes, I manually restarted BOINC and the appropriate binary began executing.

Besides this condition, why do you not allow more than one binary to execute at the same time? Is it an L1/L2 cache swapping or context-switching issue or some such, or are there more macro reasons?

--- Steve

[Nov 16, 2005 1:51:44 AM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Grid App Idling

Steve - thanks for those comments. That gives us another place to look.

As far as the 60-minute switching cycle goes, that is the BOINC mechanism for giving each project its appropriate share of run time (you can configure the relative weight for each project in your device profile settings). If you have only one processor, then only one binary should run at a time, because it should be consuming 100% of the available CPU time. Adding a second process running at 100% would only make it compete with the first for CPU cycles and for space in the L1/L2 caches and main memory. If you have more than one CPU, then BOINC will automatically start as many processes as you have processors in order to use them to their full advantage.

Obviously the bug we have that is causing it not to run is interfering with this mechanism and thus resulting in cpu cycles not being fully utilized.
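
Editor's sketch: the weighting and one-task-per-processor behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions and is not BOINC's actual scheduler. The function names and the "largest deficit first" rule are hypothetical, chosen to show how relative weights could translate into run time.

```python
from typing import Dict


def time_shares(resource_share: Dict[str, float]) -> Dict[str, float]:
    """Fraction of total CPU time each project should get in the long run."""
    total = sum(resource_share.values())
    return {name: share / total for name, share in resource_share.items()}


def concurrent_tasks(cpu_count: int) -> int:
    """One compute-bound task per processor, as described in the post."""
    return max(1, cpu_count)


def pick_next(resource_share: Dict[str, float],
              cpu_seconds_used: Dict[str, float]) -> str:
    """Pick the project that is furthest below its target share so far.

    Assumption: a simple deficit rule, used here only for illustration.
    """
    targets = time_shares(resource_share)
    # Avoid dividing by zero before any project has run.
    total_used = sum(cpu_seconds_used.values()) or 1.0
    return max(
        targets,
        key=lambda name: targets[name] - cpu_seconds_used.get(name, 0.0) / total_used,
    )
```

With equal weights for two projects and a single CPU, a model like this simply alternates between them at each switching interval, which matches the hourly swap Steve observed.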

----------------------------------------
[Edit 1 times, last edit by knreed at Nov 16, 2005 3:00:12 AM]

[Nov 16, 2005 2:59:19 AM]

steven424
Cruncher
Joined: Nov 1, 2005
Post Count: 2
Status: Offline

Re: Grid App Idling

So far I have not seen the overly long pause since my previous post. If it does occur again, what information should I collect, and where should I send it? Should I include log files? If so, which ones?

--- Steve

[Nov 17, 2005 10:38:47 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Grid App Idling

Steve,

If you see it again, please send the last day or so of the log (the one that you see in the boinc manager) to support@worldcommunitygrid.org. That will help.

thanks,
Kevin

[Nov 18, 2005 3:49:22 AM]

ebahapo
Cruncher
Joined: Nov 2, 2005
Post Count: 45
Status: Offline

Re: Grid App Idling

It bit me again, now both on Linux as well as on Windows (both BOINC). The results which stopped were de456_13_1 and de332_04_2, respectively.

HTH

----------------------------------------

[Nov 21, 2005 9:23:53 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline

Re: Grid App Idling

thanks,

We have a fix for this that has been running in dev over the weekend. I want to run it a bit longer and make sure that it behaves as expected, but then we should be able to release it.

Kevin

FYI - Rick Alther, whom you will see around the forums now and then, does the work to 'board' the science applications onto the grid clients, and he is the one who implemented the fix.

[Nov 22, 2005 2:34:06 AM]

ebahapo
Cruncher
Joined: Nov 2, 2005
Post Count: 45
Status: Offline

Re: Grid App Idling

knreed wrote: "We have a fix for this that has been running in dev over the weekend. I want to run it a bit longer and make sure that it behaves as expected, but then we should be able to release it."

Please keep us posted. For now, I have disabled HPF so as not to waste idle cycles.

Thank you.

----------------------------------------

[Nov 22, 2005 3:39:33 PM]
