Total posts in this thread: 16
This topic has been viewed 1608 times and has 15 replies
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
World Community Grid is stalled

I installed the BOINC agent (5.10.30) on my new work ThinkPad. It has two processors and in general seems to be doing a great job.

From time to time, one or both of the two running tasks stop making progress even though BOINC still shows them as running. I can tell because the CPU time is no longer incrementing. This seems to happen mostly when Help Conquer Cancer is running, but today both Help Conquer Cancer and FightAIDS@Home stopped.

When this happened today, I opened the Windows Task Manager and saw almost no activity on either processor, so clearly nothing was running at all, even though the status said two tasks were running.

I shut down World Community Grid and restarted it. It is now running both tasks, but it had to redo part of the work (around 1% this time; it has been as much as 10% in the past).

Any suggestions on how to get this resolved?
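
The symptom described above, a task that is listed as running while its CPU time no longer increments, can be checked mechanically. What follows is only a minimal sketch and not part of BOINC: the function name `is_stalled`, the 5% threshold, and the sample readings are all hypothetical, and the readings are assumed to be noted by hand from the CPU time shown for the task.

```python
def is_stalled(samples, min_cpu_fraction=0.05):
    """Return True if a task's CPU time barely advanced over the sampled window.

    samples: list of (wall_clock_seconds, cpu_seconds) readings for one task,
             oldest first. Both the format and the threshold are assumptions.
    """
    if len(samples) < 2:
        return False  # not enough data to judge
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    wall = t1 - t0
    if wall <= 0:
        return False  # readings are not in time order; refuse to guess
    # Fraction of wall-clock time the task actually spent on the CPU.
    return (c1 - c0) / wall < min_cpu_fraction


# Hypothetical readings taken one minute apart.
healthy = [(0, 3600.0), (60, 3658.0), (120, 3717.0)]
stalled = [(0, 3600.0), (60, 3600.0), (120, 3600.5)]

print(is_stalled(healthy))  # False
print(is_stalled(stalled))  # True
```

A check like this only confirms the stall; it says nothing about the cause.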
[Feb 8, 2008 5:18:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: World Community Grid is stalled

Hello bieberj,
The first thing that occurs to me is Preferences. BOINC allows users to set their Preferences to stop running BOINC applications if various conditions occur. I always select the Maximum Output or a Custom Profile with similar values in my Device Profile. Select My Grid - Device Manager - (selected profile) and see what you have selected. Make sure you are not overriding with Local Preferences. Bring up BOINC Manager and select Advanced - Preferences - Clear to make sure that you are not overriding the global preferences on the website.

Lawrence
[Feb 8, 2008 5:32:45 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Re: World Community Grid is stalled

Lawrence,

I tried what you suggested and used the website to select Maximum Output. I waited for the current assignment to complete and for a new Help Conquer Cancer task to start, which it did. The screensaver kicked in and the Help Conquer Cancer task stopped running, while FightAIDS@Home continued crunching away.

Do you have any other suggestions?
[Feb 9, 2008 2:23:36 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: World Community Grid is stalled

Does the message log of the BOINC client show that the changed device profile was retrieved? How much RAM is in your ThinkPad?
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 9, 2008 2:41:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: World Community Grid is stalled

Hello bieberj,
It sounds as though screen saver + FAAH + HCC exceed your preferred memory limits. If you copy the Messages tab from the start of the log, it will tell us how much memory you allow BOINC to use.

Lawrence
[Feb 9, 2008 7:12:55 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Re: World Community Grid is stalled

Here is the requested data. It is interesting that the memory limit could be causing the problem. If that is the case, perhaps BOINC should post a message saying that computation was suspended?

2/9/2008 9:15:33 AM||Starting BOINC client version 5.10.30 for windows_intelx86
2/9/2008 9:15:33 AM||log flags: task, file_xfer, sched_ops
2/9/2008 9:15:33 AM||Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
2/9/2008 9:15:33 AM||Data directory: C:\Program Files\BOINC
2/9/2008 9:15:33 AM||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz [x86 Family 6 Model 15 Stepping 2]
2/9/2008 9:15:33 AM||Processor features: fpu tsc sse sse2 mmx
2/9/2008 9:15:33 AM||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
2/9/2008 9:15:33 AM||Memory: 1.99 GB physical, 3.84 GB virtual
2/9/2008 9:15:33 AM||Disk: 93.15 GB total, 76.90 GB free
2/9/2008 9:15:33 AM||Local time is UTC -5 hours
2/9/2008 9:15:33 AM|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 437790; location: (none); project prefs: default
2/9/2008 9:15:33 AM||General prefs: from World Community Grid (last modified 08-Feb-2008 12:59:06)
2/9/2008 9:15:33 AM||Host location: none
2/9/2008 9:15:33 AM||General prefs: using your defaults
2/9/2008 9:15:33 AM||Preferences limit memory usage when active to 1528.77MB
2/9/2008 9:15:33 AM||Preferences limit memory usage when idle to 1834.52MB
2/9/2008 9:15:33 AM||Preferences limit disk usage to 3.73GB
2/9/2008 9:15:33 AM|World Community Grid|Restarting task faah3050_ZINC01694590_xMut_md19390_00_1 using faah version 542
2/9/2008 9:15:33 AM|World Community Grid|Restarting task X0000041620073200411231739_0 using hcc1 version 515
2/9/2008 9:16:44 AM||General prefs: from World Community Grid (last modified 08-Feb-2008 12:59:06)
2/9/2008 9:16:44 AM||Host location: none
2/9/2008 9:16:44 AM||General prefs: using your defaults
2/9/2008 9:16:44 AM||Reading preferences override file
2/9/2008 9:16:44 AM||Preferences limit memory usage when active to 1528.77MB
2/9/2008 9:16:44 AM||Preferences limit memory usage when idle to 1834.52MB
2/9/2008 9:16:44 AM||Preferences limit disk usage to 3.73GB
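
The two memory limits in this log can be related to the reported physical memory with a little arithmetic. The sketch below assumes that BOINC expresses the limits as a share of physical RAM and that 1 GB means 1024 MB; the helper name `limit_as_percent` is hypothetical, and the resulting percentages are inferred from the figures, not stated anywhere in the log.

```python
def limit_as_percent(limit_mb, physical_gb):
    """Express a memory limit in MB as a percentage of physical RAM in GB."""
    physical_mb = physical_gb * 1024  # assumption: 1 GB = 1024 MB
    return 100 * limit_mb / physical_mb


physical_gb = 1.99         # "Memory: 1.99 GB physical"
limit_active_mb = 1528.77  # "limit memory usage when active"
limit_idle_mb = 1834.52    # "limit memory usage when idle"

print(round(limit_as_percent(limit_active_mb, physical_gb)))  # 75
print(round(limit_as_percent(limit_idle_mb, physical_gb)))    # 90
```

Read this way, the limits correspond to roughly 75% of RAM while the computer is in use and 90% while it is idle.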

[Feb 10, 2008 5:00:06 AM]
stoneysilence
Cruncher
Joined: May 2, 2007
Post Count: 10
Re: World Community Grid is stalled

I am noticing the same thing on my machine. I am running WCG and Rosetta. Rosetta always uses 50% of my CPU, but at times WCG drops to 0-3% of my CPU. This never occurred until recently (I first noticed it maybe two weeks ago). I have been running WCG since it was United Devices/Grid.org, and I have been running both Rosetta and WCG since I switched from UD to BOINC (when UD closed). My CPU was always at 100% until recently. I even raised the memory usage limit to 65% (of 4 GB, although Windows sees only 2.8 GB, so that is probably the figure it uses).

While I was writing this I noticed in Perfmon that WCG is hitting my hard drive a lot to write. It would stop WCG computations in order to write wcp_checkpoint_**.ckp files and it seems to do this every minute or so. Also it was writing to a file called stderror.txt a lot as well as receptor.* files.

Something weird is going on with WCG.

Nothing has changed on my system in this time.
[Feb 10, 2008 7:31:00 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: World Community Grid is stalled

bieberj, BOINC does print a message when suspending computation due to insufficient free memory.

If the log you supplied is complete, then at no point did BOINC suspend computation for any reason.
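
A copied Messages log can be checked for suspensions mechanically. The sketch below assumes the `timestamp|project|message` layout seen in the log posted in this thread. The exact wording BOINC uses when it suspends computation does not appear in this thread, so the code simply looks for the substring "suspend" in the message field; the function name and the second sample line are hypothetical.

```python
def find_suspend_messages(log_text):
    """Return (timestamp, project, message) tuples whose message mentions a suspension."""
    hits = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project (may be empty) | message
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        timestamp, project, message = parts
        if "suspend" in message.lower():  # assumed keyword, not a confirmed BOINC string
            hits.append((timestamp, project, message))
    return hits


# First line is copied from the log in this thread; the second is a made-up example.
sample = (
    "2/9/2008 9:15:33 AM||Preferences limit memory usage when active to 1528.77MB\n"
    "2/9/2008 9:20:00 AM||Suspending computation (hypothetical wording)\n"
)
for hit in find_suspend_messages(sample):
    print(hit)
```

An empty result for a complete log would agree with the observation above that BOINC never suspended computation.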
[Feb 10, 2008 7:36:08 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: World Community Grid is stalled

stoneysilence, some of the things you describe may be normal. The checkpointing you see is the way WCG saves state so it can continue if you restart your computer.

However, taken together with bieberj's report, I would like to look into this further.

When you see WCG using little CPU time, what tasks are running? Please could you copy the task names from BOINC Manager (in the Messages view). Then, please will you track these tasks and check that they complete and validate properly. You can use the Results Status page for that.

We will be interested to hear what you discover. Bear in mind, though, that it is normal for CPU to drop during intense disk IO (the CPU is being used for the IO, not for the WCG process).
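
Pulling the task names out of a copied Messages log can be done with a short script. This is only a sketch: it recognises the "Restarting task ... using ... version ..." wording that appears in the log posted in this thread, other message formats are not handled, and the function name `extract_tasks` is hypothetical.

```python
import re

# Matches the message format seen in this thread's log, e.g.
# "Restarting task <name> using <application> version <number>".
TASK_PATTERN = re.compile(r"Restarting task (\S+) using (\S+) version (\d+)")


def extract_tasks(log_text):
    """Return a list of (task_name, application, version) found in the log text."""
    tasks = []
    for line in log_text.splitlines():
        match = TASK_PATTERN.search(line)
        if match:
            name, app, version = match.groups()
            tasks.append((name, app, int(version)))
    return tasks


# Lines copied from the log posted earlier in this thread.
log = (
    "2/9/2008 9:15:33 AM|World Community Grid|Restarting task "
    "faah3050_ZINC01694590_xMut_md19390_00_1 using faah version 542\n"
    "2/9/2008 9:15:33 AM|World Community Grid|Restarting task "
    "X0000041620073200411231739_0 using hcc1 version 515\n"
)
for task in extract_tasks(log):
    print(task)
```

The extracted names can then be looked up by hand on the Results Status page.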
[Feb 10, 2008 7:45:12 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: World Community Grid is stalled

stoneysilence wrote
....
While I was writing this I noticed in Perfmon that WCG is hitting my hard drive a lot to write. It would stop WCG computations in order to write wcp_checkpoint_**.ckp files and it seems to do this every minute or so. Also it was writing to a file called stderror.txt a lot as well as receptor.* files.

Something weird is going on with WCG.

Nothing has changed on my system in this time.

The only project I know of at WCG that writes checkpoints every 60 seconds or so is the Dengue project. Actually, the checkpoint is written at every 1% of progress.

The BOINC default interval for allowing writes to disk is 60 seconds. Many people, like me, think such frequent writing is unnecessary, and you can increase this delay up to 999 seconds, or about 17 minutes.

With a 4-core machine running DDDT, you get the picture. I have therefore set it to 10 minutes (600 seconds), which means any science application that wants to checkpoint within that 10-minute window skips the opportunity and tries again later. Personally, I have no issue with that. At most, I would lose about 10 minutes of progress on a system restart. On average it is less.
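
The trade-off between disk writes and lost progress can be put into rough numbers. The sketch below rests on simplifying assumptions that are not from BOINC itself: every task writes a checkpoint at each opportunity, and a restart is equally likely at any moment between two checkpoints, so the average loss is half the interval. The function name `checkpoint_tradeoff` is hypothetical.

```python
def checkpoint_tradeoff(interval_s, tasks=4):
    """Estimate disk writes per hour and progress lost on restart for a given interval.

    Returns (writes_per_hour, worst_case_loss_minutes, average_loss_minutes).
    """
    writes_per_hour = tasks * 3600 / interval_s  # upper bound: every task writes every time
    worst_loss_min = interval_s / 60             # restart just before the next checkpoint
    avg_loss_min = worst_loss_min / 2            # assumes restarts are uniformly distributed
    return writes_per_hour, worst_loss_min, avg_loss_min


for interval in (60, 600, 999):
    writes, worst, avg = checkpoint_tradeoff(interval)
    print(f"{interval:>3} s: up to {writes:.0f} writes/hour, "
          f"lose at most {worst:.1f} min, about {avg:.1f} min on average")
```

Under these assumptions, moving from 60 to 600 seconds on four cores cuts the upper bound from 240 to 24 checkpoint writes per hour, at the cost of losing up to 10 minutes of work on a restart.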

Some projects outside WCG are rude: they lack that routine and do not check whether it is okay to write. With their large checkpoints, that is quite a hindrance if four are running and each hits the disk every 80-90 seconds.

So much for this morning's addition. Much more on checkpointing can be found in a Start Here forum topic.

end of off topic.
[Feb 10, 2008 8:11:58 AM]