World Community Grid Forums
Thread Status: Active
Total posts in this thread: 9
ebahapo
Cruncher Joined: Nov 2, 2005 Post Count: 45
I use the BOINC Linux client and have noticed that sometimes BOINC thinks that rosetta 4.19 is running, but the OS tells me it's sleeping:
    26403 tty1 S   0:04 /home/emenezes/boinc/boinc_5.2.7_i686-pc-linux-gnu -a
    32573 tty1 SN 86:25  \_ rosetta_4.19_i686-pc-linux-gnu -series 15 -protei
      387 tty1 SN 18:20  \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _ -silent
      388 tty1 S   0:00  \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _ -sil
      389 tty1 S   0:00  \_ rosetta_4.78_i686-pc-linux-gnu aa 1n0u _

According to BOINC, the WCG workunit was 100% complete. I tried pausing WCG (other projects then ran in its place) and resuming it (the other projects then paused), but it stayed in the stopped state. I gave it a day, but it remained stopped while BOINC kept reporting it as running at 100% of the WU. I eventually aborted it, which was the only way to avoid wasting cycles. Please advise.
Thanks.
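One way to confirm this symptom from the Linux side is to sample the process's scheduler state and accumulated CPU time from /proc twice and see whether the CPU time advances. This is a minimal sketch assuming a Linux /proc filesystem (field layout as documented in proc(5)); the function names and the 60-second interval are illustrative choices, not part of BOINC or WCG tooling.

```python
import os
import time


def parse_stat(line: str) -> tuple[str, int]:
    """Return (state, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces or
    # ')', so split on the LAST ')' before splitting the rest on whitespace.
    rest = line[line.rindex(")") + 1:].split()
    state = rest[0]                            # field 3: R, S, D, T, Z, ...
    cpu_ticks = int(rest[11]) + int(rest[12])  # fields 14 (utime) and 15 (stime)
    return state, cpu_ticks


def read_stat(pid: int) -> tuple[str, int]:
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())


def check_progress(pid: int, interval: float = 60.0) -> tuple[str, float]:
    """Sample a process twice and return (final state, CPU seconds gained)."""
    _, ticks_before = read_stat(pid)
    time.sleep(interval)
    state, ticks_after = read_stat(pid)
    ticks_per_second = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return state, (ticks_after - ticks_before) / ticks_per_second
```

For example, `check_progress(32573)` (the rosetta_4.19 PID from the listing above) returning a state of `"S"` and `0.0` CPU seconds while BOINC shows the task as running would match the hang described here. A task that BOINC has deliberately preempted in favor of another project will also gain no CPU time, so the check is only meaningful while BOINC reports the task as running.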
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504
emenezes,
We have found an error condition in the rosetta application that has affected 65 of the 29,791 results returned via the Linux agent so far (about 0.2%). In most cases the workunit is aborted automatically, or the problem resolves itself after a short period, and the agent continues with the next workunit. You are the first person to report that the error did not resolve itself automatically. We are investigating this now and hope to have a fix in the near future. You should feel free to continue processing, as this condition is extremely rare.
Thanks for letting us know,
Kevin
steven424
Cruncher Joined: Nov 1, 2005 Post Count: 2
I, too, have observed this problem on my Linux server. I'm running both SETI and WCG. Apparently, every 60 minutes the BOINC daemon switches between the SETI binary and the Rosetta binary; it never seems to run both at the same time. I see a message in the log file that the running binary has been swapped out of memory. After the swap, the binary that should now be active does not execute but instead seems to be in some sort of paused state. Once, after watching this for 30 minutes, I manually restarted BOINC, and the appropriate binary began executing.
Besides this condition, why do you not allow more than one binary to execute at the same time? Is it an L1/L2 cache-thrashing or context-switching issue or some such, or are there bigger-picture reasons?
--- Steve
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504
Steve - thanks for those comments. That gives us another place to look.
As far as the 60-minute switching cycle goes, that is the BOINC mechanism for giving each project the appropriate run time (you can configure the relative weight for each project in your device profile settings). If you have only one processor, then only one binary should run at a time, because it should be consuming 100% of the available CPU time. Adding a second process running at 100% would only make it compete with the first for CPU cycles and for space in the L1/L2 caches and main memory. If you have more than one CPU, BOINC will automatically start as many processes as you have processors in order to use them fully. Obviously, the bug that is keeping the application from running interferes with this mechanism, and as a result CPU cycles are not being fully utilized.
[Edit 1 times, last edit by knreed at Nov 16, 2005 3:00:12 AM]
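The switching policy described above can be illustrated with a toy model: one task per CPU, reassigned every 60-minute slice to whichever project is furthest behind its resource share. BOINC's client scheduler is documented to track a per-project "debt" (roughly, CPU time a project is owed under its share); the sketch below is a simplified model of that idea under stated assumptions, not BOINC's actual code, and the `Project` class, `simulate` function, and share values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: compare projects by identity, not by field values
class Project:
    name: str
    share: float              # relative resource share (hypothetical values)
    debt: float = 0.0         # CPU-minutes this project is currently owed
    cpu_minutes: float = 0.0  # CPU-minutes actually received so far


def simulate(projects: list[Project], n_cpus: int,
             slice_minutes: float = 60, n_slices: int = 24) -> dict[str, float]:
    """Toy time-slicing model: one task per CPU, reassigned every slice."""
    total_share = sum(p.share for p in projects)
    for _ in range(n_slices):
        # Give each CPU to the project that is owed the most time; ties go to
        # whichever project comes first in the list (sorted() is stable).
        running = sorted(projects, key=lambda p: p.debt, reverse=True)[:n_cpus]
        for p in projects:
            # Entitlement: this project's share of all CPUs for this slice.
            entitled = n_cpus * slice_minutes * p.share / total_share
            received = slice_minutes if p in running else 0
            p.cpu_minutes += received
            p.debt += entitled - received
    return {p.name: p.cpu_minutes for p in projects}
```

For example, `simulate([Project("WCG", 3), Project("SETI", 1)], n_cpus=1)` gives WCG 1080 and SETI 360 CPU-minutes over 24 hours, the 3:1 split the shares ask for, while never running more than one task on the single CPU. With `n_cpus=2` and two projects, both run in every slice, which mirrors BOINC starting one process per processor.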
steven424
Cruncher Joined: Nov 1, 2005 Post Count: 2
I have not seen the overly long pause since my previous post. If it does occur again, what information should I collect, and where should I send it? Should I include log files? Which ones?
--- Steve
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504
Steve,
If you see it again, please send the last day or so of the log (the one you see in the BOINC Manager) to support@worldcommunitygrid.org. That will help.
Thanks,
Kevin
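If the log messages have been saved to a text file, trimming them to "the last day or so" can be scripted. This is a minimal sketch assuming each line begins with a `YYYY-MM-DD HH:MM:SS` timestamp; that format is an assumption, so check a line of your own log and adjust `TIMESTAMP_FORMAT` if it differs. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
# Check a line of your own log and change this format if it differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def line_timestamp(line: str):
    """Parse the leading timestamp of a log line, or return None if it has none."""
    date_and_time = " ".join(line.split(None, 2)[:2])
    try:
        return datetime.strptime(date_and_time, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def recent_lines(lines, now: datetime, hours: float = 24) -> list[str]:
    """Keep log lines from the last `hours` hours, plus any untimestamped
    continuation lines that directly follow a kept line."""
    cutoff = now - timedelta(hours=hours)
    keep = False
    kept = []
    for line in lines:
        stamp = line_timestamp(line)
        if stamp is not None:
            keep = stamp >= cutoff
        if keep:
            kept.append(line)
    return kept
```

Usage: pass an open file (whatever name you saved the messages under) and `datetime.now()`, then write the returned lines to a new file to attach to the support email.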
ebahapo
Cruncher Joined: Nov 2, 2005 Post Count: 45
It bit me again, this time on both Linux and Windows (both running BOINC). The results that stopped were de456_13_1 and de332_04_2, respectively.
HTH
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504
Thanks,
We have a fix for this that has been running in dev over the weekend. I want to run it a bit longer to make sure that it behaves as expected, but then we should be able to release it.
Kevin
FYI: Rick Alther, whom you will see around the forums now and then, does the work to 'board' the science applications onto the grid clients, and he is the one who implemented the fix.
ebahapo
Cruncher Joined: Nov 2, 2005 Post Count: 45
Please keep us posted. For now, I have disabled HPF (Human Proteome Folding) so as not to waste idle cycles.
Thank you.