| World Community Grid Forums
Thread Status: Active Total posts in this thread: 16
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline Project Badges:
|
I installed the BOINC Agent (5.10.30) on my new work ThinkPad. It has two processors and in general seems to be doing a great job.
From time to time, however, one or both of the two projects stop running even though BOINC still reports them as running. I can tell because the CPU time is no longer incrementing. It seems to happen when Help Conquer Cancer is running, but today both Help Conquer Cancer and FightAIDS@Home stopped. When this happened, I opened Windows Task Manager and saw almost no activity on either processor, so clearly nothing was running even though the status showed two running tasks. I shut down World Community Grid and restarted it; both tasks are now running, but they had to redo part of the job (around 1% this time, as much as 10% in the past). Any suggestions on how to get this resolved? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello bieberj,
The first thing that occurs to me is Preferences. BOINC lets users set Preferences that stop BOINC applications from running when various conditions occur. I always select Maximum Output, or a Custom Profile with similar values, in my Device Profile. Select My Grid - Device Manager - (selected profile) and see what you have selected. Also make sure you are not overriding the global preferences from the website with Local Preferences: bring up BOINC Manager and select Advanced - Preferences - Clear. Lawrence |
||
|
|
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline Project Badges:
|
Lawrence,
I tried what you suggested and used the website to select Maximum Output. I waited for the current assignment to complete and for a new Help Conquer Cancer task to start, which it did. The screensaver kicked in and Help Conquer Cancer stopped running, while FightAIDS@Home continued crunching away. Do you have any other suggestions? |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Does the message log of the BOINC client show that the changed device profile was retrieved? How much RAM is in your ThinkPad?
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello bieberj,
It sounds as though screen saver + FAAH + HCC exceed your preferred memory limits. If you copy the Messages tab from the start of the log, it will tell us how much memory you allow BOINC to use. Lawrence |
||
|
|
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline Project Badges:
|
Here is the requested data. Sounds interesting that the memory limit could be causing the problem. If this is the case, perhaps a message should be posted saying that computation was suspended?
2/9/2008 9:15:33 AM||Starting BOINC client version 5.10.30 for windows_intelx86
2/9/2008 9:15:33 AM||log flags: task, file_xfer, sched_ops
2/9/2008 9:15:33 AM||Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
2/9/2008 9:15:33 AM||Data directory: C:\Program Files\BOINC
2/9/2008 9:15:33 AM||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz [x86 Family 6 Model 15 Stepping 2]
2/9/2008 9:15:33 AM||Processor features: fpu tsc sse sse2 mmx
2/9/2008 9:15:33 AM||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
2/9/2008 9:15:33 AM||Memory: 1.99 GB physical, 3.84 GB virtual
2/9/2008 9:15:33 AM||Disk: 93.15 GB total, 76.90 GB free
2/9/2008 9:15:33 AM||Local time is UTC -5 hours
2/9/2008 9:15:33 AM|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 437790; location: (none); project prefs: default
2/9/2008 9:15:33 AM||General prefs: from World Community Grid (last modified 08-Feb-2008 12:59:06)
2/9/2008 9:15:33 AM||Host location: none
2/9/2008 9:15:33 AM||General prefs: using your defaults
2/9/2008 9:15:33 AM||Preferences limit memory usage when active to 1528.77MB
2/9/2008 9:15:33 AM||Preferences limit memory usage when idle to 1834.52MB
2/9/2008 9:15:33 AM||Preferences limit disk usage to 3.73GB
2/9/2008 9:15:33 AM|World Community Grid|Restarting task faah3050_ZINC01694590_xMut_md19390_00_1 using faah version 542
2/9/2008 9:15:33 AM|World Community Grid|Restarting task X0000041620073200411231739_0 using hcc1 version 515
2/9/2008 9:16:44 AM||General prefs: from World Community Grid (last modified 08-Feb-2008 12:59:06)
2/9/2008 9:16:44 AM||Host location: none
2/9/2008 9:16:44 AM||General prefs: using your defaults
2/9/2008 9:16:44 AM||Reading preferences override file
2/9/2008 9:16:44 AM||Preferences limit memory usage when active to 1528.77MB
2/9/2008 9:16:44 AM||Preferences limit memory usage when idle to 1834.52MB
2/9/2008 9:16:44 AM||Preferences limit disk usage to 3.73GB |
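For anyone who wants to read the memory limits out of a log like this one, here is a minimal Python sketch. It pulls the physical memory figure and the two "Preferences limit memory usage" lines from the log quoted in this post and expresses each limit as a fraction of physical RAM. The function names and regular expressions are illustrative assumptions, not part of BOINC, and the conversion assumes 1 GB = 1024 MB. Because the log rounds physical memory to 1.99 GB, the fractions are approximate.

```python
import re

# Three lines copied verbatim from the Messages log quoted in this post.
LOG_LINES = [
    "2/9/2008 9:15:33 AM||Memory: 1.99 GB physical, 3.84 GB virtual",
    "2/9/2008 9:15:33 AM||Preferences limit memory usage when active to 1528.77MB",
    "2/9/2008 9:15:33 AM||Preferences limit memory usage when idle to 1834.52MB",
]

# Hypothetical patterns, written only to match the wording seen in this log.
PHYSICAL_RE = re.compile(r"Memory: ([\d.]+) GB physical")
LIMIT_RE = re.compile(
    r"Preferences limit memory usage when (active|idle) to ([\d.]+)MB"
)


def memory_limits(lines):
    """Return (physical_mb, {"active": mb, "idle": mb}) parsed from log lines."""
    physical_mb = None
    limits = {}
    for line in lines:
        match = PHYSICAL_RE.search(line)
        if match:
            physical_mb = float(match.group(1)) * 1024  # assumes 1 GB = 1024 MB
        match = LIMIT_RE.search(line)
        if match:
            limits[match.group(1)] = float(match.group(2))
    return physical_mb, limits


def limit_fractions(lines):
    """Express each memory limit as a fraction of physical memory."""
    physical_mb, limits = memory_limits(lines)
    return {state: mb / physical_mb for state, mb in limits.items()}


if __name__ == "__main__":
    for state, fraction in limit_fractions(LOG_LINES).items():
        print(f"{state}: about {fraction:.0%} of physical memory")
```

On this log the limits come out at roughly three quarters of physical memory while the computer is in use and roughly nine tenths while it is idle, so BOINC is allowed to use most of the 2 GB.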
||
|
|
stoneysilence
Cruncher Joined: May 2, 2007 Post Count: 10 Status: Offline Project Badges:
|
I am noticing the same thing on my machine. I am running WCG and Rosetta. Rosetta always uses 50% of my CPU, but at times WCG drops down to 0-3% of my CPU. This never occurred until recently (I first noticed it maybe 2 weeks ago). I have been running WCG since it was United Devices/Grid.org, and have been running both Rosetta and WCG since I switched from UD to BOINC (when UD closed). I always saw my CPU at 100% until recently. I even upped my memory usage limit to 65% (of 4 GB, but Windows sees 2.8 GB, so that is probably the number it uses).
While I was writing this I noticed in Perfmon that WCG is hitting my hard drive a lot with writes. It stops WCG computations in order to write wcp_checkpoint_**.ckp files, and it seems to do this every minute or so. It was also writing a lot to a file called stderror.txt, as well as to receptor.* files. Something weird is going on with WCG. Nothing has changed on my system in this time. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
bieberj, BOINC does print a message when suspending computation due to insufficient free memory.
If the log you supplied is complete, then at no point did BOINC suspend computation for any reason. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
stoneysilence, some of the things you describe may be normal. The checkpointing you see is the way WCG saves state so it can continue if you restart your computer.
However, taken together with bieberj's report, I would like to look into this further. When you see WCG using little CPU time, what tasks are running? Please could you copy the task names from BOINC Manager (in the Messages view). Then, please will you track these tasks and check that they complete and validate properly. You can use the Results Status page for that. We will be interested to hear what you discover. Bear in mind, though, that it is normal for CPU to drop during intense disk IO (the CPU is being used for the IO, not for the WCG process). |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
stoneysilence wrote:
----------------------------------------
.... While I was writing this I noticed in Perfmon that WCG is hitting my hard drive a lot to write. It would stop WCG computations in order to write wcp_checkpoint_**.ckp files and it seems to do this every minute or so. Also it was writing to a file called stderror.txt a lot as well as receptor.* files. Something weird is going on with WCG. Nothing has changed on my system in this time.
----------------------------------------
The only project I know of at WCG that writes checkpoints every 60 seconds or so is the Dengue project (DDDT). Actually, the checkpoint is written at every 1% of progress. The BOINC default minimum interval between disk writes is 60 seconds. Because many, like me, think such frequent writing is unnecessary, you can increase this delay up to 999 seconds, or about 17 minutes. With a 4-core machine running DDDT, you get the picture. I've therefore set it to 10 minutes (600 seconds), which means any science application that wants to checkpoint during that 10-minute window will skip that opportunity and try again later. Personally I have no issue with that. At most I would lose about 10 minutes of progress on a system restart; on average it is less. Some projects outside WCG are rude: they lack that routine and do not check for the okay to write. With their large checkpoints, that's a real impediment if 4 are running and each hits the disk every 80-90 seconds. So far this morning's addition. There is much more on checkpointing in a Start Here forum topic. End of off topic.
|
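To make the effect of the disk-write interval concrete, here is a small Python sketch of the throttling described in the post above. The application asks to checkpoint at a fixed cadence, and the request is granted only if at least the minimum interval has passed since the last write. This is a simplified model under stated assumptions, not BOINC's actual code. `simulate_checkpoints` and its parameters are hypothetical names, and real tasks do not request checkpoints at perfectly regular times.

```python
def simulate_checkpoints(run_seconds, request_every, min_write_interval):
    """Count checkpoints written when disk writes are throttled.

    run_seconds:        how long the task runs
    request_every:      seconds between the application's checkpoint requests
    min_write_interval: minimum seconds allowed between two checkpoint writes

    Returns (writes, worst_gap): the number of checkpoints written and the
    longest stretch of work between two consecutive saved states.
    """
    last_write = 0  # the task start counts as the last saved state
    writes = 0
    worst_gap = 0
    t = request_every
    while t <= run_seconds:
        # A request is granted only if enough time has passed since the
        # previous write; otherwise it is skipped until the next request.
        if t - last_write >= min_write_interval:
            worst_gap = max(worst_gap, t - last_write)
            last_write = t
            writes += 1
        t += request_every
    return writes, worst_gap


if __name__ == "__main__":
    # One hour of work, with a request every 60 s or every 90 s.
    for request_every, interval in [(60, 60), (60, 600), (90, 600)]:
        writes, gap = simulate_checkpoints(3600, request_every, interval)
        print(f"request every {request_every}s, interval {interval}s: "
              f"{writes} writes, longest gap {gap}s")
```

In this model, raising the interval from 60 to 600 seconds cuts the writes in an hour from 60 to 6. When requests do not line up with the interval (every 90 seconds, say), the longest gap can slightly exceed the configured 600 seconds, which is why "about 10 minutes" is the right way to state the worst-case loss.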
||
|
|
|