Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
BOINC Agent appears stalled

I'm running BOINC agent 5.8.15 on a few computers. On one of my Win XP machines, the agent runs a work unit to completion and has another work unit ready to go, but it never starts; it just sits there at zero CPU. If I do a manual update, the completed work unit gets uploaded fine, but the new work still doesn't start. The only way I've found to get the new work unit started is to exit and restart the BOINC agent, after which it begins the new work unit fine.

I don't see any problem messages in the log. My device profile is configured with the Maximum Output option. Virtual memory is set to 2.5 times physical memory (2.5 GB).

I'm a recent grid.org refugee who had 4M points there, and I'm trying to get up to 4k points/day here at WCG, but this manual process is slowing me down...
[May 9, 2007 5:42:08 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: BOINC Agent appears stalled

Can you find the global_prefs.xml file in the BOINC directory and copy and paste it here? Also post a piece of the message log, from the moment you hit the Update button up to the point where the next task fails to start.

In the Activity menu of the BOINC Advanced View, are the options set to always process and to always have the network available?

Very possibly a project reset fixes this:

1. In the Projects tab, set the project to not fetch work.
2. Finish the job, transmit the result, and update until the acknowledgement comes through and the WU vanishes from the Tasks list.
3. With WCG selected in the Projects tab, hit Project Reset.

Finally, try to get hold of version 5.8.16 and install it over 5.8.15. Get it here:
http://boinc.berkeley.edu/download_all.php

It's curious that work seemingly does get sent nonetheless.

cheers
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[May 9, 2007 6:00:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC Agent appears stalled

What security software do you use? That is usually the culprit in situations like this.

However, the recommended BOINC version is 5.8.16, so you may want to grab the latest version from here: http://boinc.berkeley.edu/download.php and see if that fixes your problem.
[May 9, 2007 6:01:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC Agent appears stalled

Thanks for the quick replies. I did have Always process and Network always available set. On the security software, I have Symantec Client Firewall running. I did a quick check and didn't see any hits that I thought might be related; even so, I went ahead and authorized the application and the WCG domain just to make sure.

I updated to BOINC version 5.8.16 after doing the project reset as suggested, and it is now off and running. I'll let it run overnight and see how it goes.

By the way, what dictates when a backup work unit gets loaded? I typically only see one get pre-loaded, occasionally two if it's one of the short ones. Is this configurable, for example to download enough for a full day of CPU crunching? I'd like the machines to keep crunching even if there is a temporary network outage.

thanks for the help!
[May 10, 2007 1:14:44 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC Agent appears stalled

Yes - use the "Connect to network about every x days" setting in your device profile.
[May 10, 2007 1:25:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC Agent appears stalled

OK, it's happening with the latest version as well. It stalled again after completing a work unit:

WCG Genome Comparison 5.14 10000213-10001858_1 02.19:25 100%
WCG Human Proteome Folding lb750_00022_10 00.00:01 0.00%
WCG Human Proteome Folding lb758_00014_15 ---- 0.00%

It looks like it started the next work unit and then paused after one second. Here's the message log for a couple of work units prior to the point of hangup:

5/11/2007 11:50:52 AM|World Community Grid|Computation for task lb732_00048_18 finished
5/11/2007 11:50:52 AM|World Community Grid|Starting 10000213-10001858_1
5/11/2007 11:50:52 AM|World Community Grid|Starting task 10000213-10001858_1 using fcg1 version 514
5/11/2007 11:50:54 AM|World Community Grid|[file_xfer] Started upload of file lb732_00048_18_0
5/11/2007 11:50:56 AM|World Community Grid|[file_xfer] Finished upload of file lb732_00048_18_0
5/11/2007 11:50:56 AM|World Community Grid|[file_xfer] Throughput 336352 bytes/sec
5/11/2007 11:55:52 AM|World Community Grid|Sending scheduler request: To fetch work
5/11/2007 11:55:52 AM|World Community Grid|Requesting 131 seconds of new work, and reporting 1 completed tasks
5/11/2007 11:55:57 AM|World Community Grid|Scheduler RPC succeeded [server version 509]
5/11/2007 11:55:57 AM||General prefs: from World Community Grid (last modified 2007-05-11 10:31:53)
5/11/2007 11:55:57 AM||Host location: home
5/11/2007 11:55:57 AM||General prefs: using separate prefs for home
5/11/2007 11:55:57 AM|World Community Grid|Deferring communication for 5 min 3 sec
5/11/2007 11:55:57 AM|World Community Grid|Reason: requested by project
5/11/2007 11:56:00 AM|World Community Grid|[file_xfer] Started download of file lb758_00014_lb758.fasta.gz
5/11/2007 11:56:00 AM|World Community Grid|[file_xfer] Started download of file lb758_00014_lb758.psipred.gz
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Finished download of file lb758_00014_lb758.fasta.gz
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Throughput 788 bytes/sec
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Finished download of file lb758_00014_lb758.psipred.gz
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Throughput 6709 bytes/sec
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Started download of file lb758_00014_lb758.psipred_ss2.gz
5/11/2007 11:56:01 AM|World Community Grid|[file_xfer] Started download of file lb758_00014_aalb75803_05.075_v1_3.gz
5/11/2007 11:56:02 AM|World Community Grid|[file_xfer] Finished download of file lb758_00014_lb758.psipred_ss2.gz
5/11/2007 11:56:02 AM|World Community Grid|[file_xfer] Throughput 49066 bytes/sec
5/11/2007 11:56:02 AM|World Community Grid|[file_xfer] Finished download of file lb758_00014_aalb75803_05.075_v1_3.gz
5/11/2007 11:56:02 AM|World Community Grid|[file_xfer] Throughput 796377 bytes/sec
5/11/2007 11:56:02 AM|World Community Grid|[file_xfer] Started download of file lb758_00014_aalb75809_05.075_v1_3.gz
5/11/2007 11:56:04 AM|World Community Grid|[file_xfer] Finished download of file lb758_00014_aalb75809_05.075_v1_3.gz
5/11/2007 11:56:04 AM|World Community Grid|[file_xfer] Throughput 1204938 bytes/sec
5/11/2007 2:10:41 PM|World Community Grid|Computation for task 10000213-10001858_1 finished
5/11/2007 2:10:42 PM|World Community Grid|Starting lb750_00022_10
5/11/2007 2:10:42 PM|World Community Grid|Starting task lb750_00022_10 using hpf2 version 518
5/11/2007 2:10:44 PM|World Community Grid|[file_xfer] Started upload of file 10000213-10001858_1_0
5/11/2007 2:10:47 PM|World Community Grid|[file_xfer] Finished upload of file 10000213-10001858_1_0
5/11/2007 2:10:47 PM|World Community Grid|[file_xfer] Throughput 420980 bytes/sec
*** end of log ***
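
For a long message log, a small script can point out which task was started but never reported as finished. This is only a sketch: it assumes the "timestamp|project|message" layout and the "Starting task ... using" / "Computation for task ... finished" wording seen in the excerpt above, and the function name unfinished_tasks is made up for the illustration.

```python
import re

# Message wording taken from the log excerpt above; other client
# versions may phrase these messages differently (assumption).
START = re.compile(r"Starting task (\S+) using")
FINISH = re.compile(r"Computation for task (\S+) finished")


def unfinished_tasks(lines):
    """Return tasks that were started but never logged as finished, in start order."""
    started = []
    finished = set()
    for line in lines:
        # Each log line is "timestamp|project|message"; skip anything else.
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        match = START.search(message)
        if match and match.group(1) not in started:
            started.append(match.group(1))
        match = FINISH.search(message)
        if match:
            finished.add(match.group(1))
    return [task for task in started if task not in finished]


# Four lines copied from the log above.
sample = [
    "5/11/2007 11:50:52 AM|World Community Grid|Computation for task lb732_00048_18 finished",
    "5/11/2007 11:50:52 AM|World Community Grid|Starting task 10000213-10001858_1 using fcg1 version 514",
    "5/11/2007 2:10:41 PM|World Community Grid|Computation for task 10000213-10001858_1 finished",
    "5/11/2007 2:10:42 PM|World Community Grid|Starting task lb750_00022_10 using hpf2 version 518",
]

print(unfinished_tasks(sample))  # ['lb750_00022_10']
```

On these lines the only task left open is lb750_00022_10, the HPF2 job that shows one second of CPU time in the task list.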


Task Manager shows no CPU time being used by the process. Here are the global prefs:

<global_preferences>
<source_project>http://www.worldcommunitygrid.org/</source_project>
<source_scheduler>https://secure.worldcommunitygrid.org/boinc/w...</source_scheduler>
<mod_time>1178904713</mod_time>
<cpu_scheduling_period_minutes>120</cpu_scheduling_period_minutes>
<disk_interval>60.0</disk_interval>
<disk_max_used_gb>5.0</disk_max_used_gb>
<disk_max_used_pct>90.0</disk_max_used_pct>
<disk_min_free_gb>0.5</disk_min_free_gb>
<idle_time_to_run>0.1</idle_time_to_run>
<max_bytes_sec_down>0.0</max_bytes_sec_down>
<max_bytes_sec_up>0.0</max_bytes_sec_up>
<max_cpus>16</max_cpus>
<run_if_user_active />
<cpu_usage_limit>100.0</cpu_usage_limit>
<ram_max_used_busy_pct>100.0</ram_max_used_busy_pct>
<ram_max_used_idle_pct>100.0</ram_max_used_idle_pct>
<work_buf_min_days>0.75</work_buf_min_days>
<end_hour>0</end_hour>
<net_end_hour>0</net_end_hour>
<net_start_hour>0</net_start_hour>
<start_hour>0</start_hour>
<venue name="home">
<cpu_scheduling_period_minutes>120</cpu_scheduling_period_minutes>
<disk_interval>60.0</disk_interval>
<disk_max_used_gb>10.0</disk_max_used_gb>
<disk_max_used_pct>80.0</disk_max_used_pct>
<disk_min_free_gb>0.5</disk_min_free_gb>
<end_hour>0</end_hour>
<idle_time_to_run>0.1</idle_time_to_run>
<max_bytes_sec_down>0.0</max_bytes_sec_down>
<max_bytes_sec_up>0.0</max_bytes_sec_up>
<max_cpus>16</max_cpus>
<net_end_hour>0</net_end_hour>
<net_start_hour>0</net_start_hour>
<run_if_user_active />
<start_hour>0</start_hour>
<cpu_usage_limit>100.0</cpu_usage_limit>
<ram_max_used_busy_pct>100.0</ram_max_used_busy_pct>
<ram_max_used_idle_pct>100.0</ram_max_used_idle_pct>
<work_buf_min_days>0.75</work_buf_min_days>
</venue>
<venue name="work">
<cpu_scheduling_period_minutes>120</cpu_scheduling_period_minutes>
<disk_interval>60.0</disk_interval>
<disk_max_used_gb>10.0</disk_max_used_gb>
<disk_max_used_pct>80.0</disk_max_used_pct>
<disk_min_free_gb>0.5</disk_min_free_gb>
<end_hour>0</end_hour>
<idle_time_to_run>0.1</idle_time_to_run>
<max_bytes_sec_down>0.0</max_bytes_sec_down>
<max_bytes_sec_up>0.0</max_bytes_sec_up>
<max_cpus>16</max_cpus>
<net_end_hour>0</net_end_hour>
<net_start_hour>0</net_start_hour>
<run_if_user_active />
<start_hour>0</start_hour>
<cpu_usage_limit>100.0</cpu_usage_limit>
<ram_max_used_busy_pct>100.0</ram_max_used_busy_pct>
<ram_max_used_idle_pct>100.0</ram_max_used_idle_pct>
<work_buf_min_days>0.75</work_buf_min_days>
</venue>
</global_preferences>
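
To check which work buffer a machine is actually configured with, the value can be read straight out of global_prefs.xml. The sketch below assumes that work_buf_min_days is the element behind the "Connect to network about every x days" setting and that a matching venue block takes precedence over the top-level value (the log line "using separate prefs for home" suggests as much). The function name work_buffer_days and the trimmed sample are illustrative only, and the input has to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def work_buffer_days(xml_text, venue=None):
    """Return work_buf_min_days for the given venue, else the top-level value, else None."""
    root = ET.fromstring(xml_text)
    node = None
    if venue is not None:
        # Look for a <venue name="..."> block that matches the host location.
        for venue_node in root.findall("venue"):
            if venue_node.get("name") == venue:
                node = venue_node.find("work_buf_min_days")
                break
    if node is None:
        # No venue-specific value: fall back to the top-level preference.
        node = root.find("work_buf_min_days")
    return float(node.text) if node is not None else None


# Trimmed-down version of the preferences pasted above.
sample = """
<global_preferences>
<work_buf_min_days>0.75</work_buf_min_days>
<venue name="home">
<work_buf_min_days>0.75</work_buf_min_days>
</venue>
</global_preferences>
"""

days = work_buffer_days(sample, venue="home")
print(days, "days =", days * 24, "hours")  # 0.75 days = 18.0 hours
```

With the values shown, the "home" venue asks for a 0.75-day buffer, which works out to 18 hours of work.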


Any ideas?
[May 11, 2007 11:47:14 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC Agent appears stalled

Interestingly, I did an update to upload the completed WU, and when I then suspended the apparently hung WU, the next work unit started processing immediately. I went back to the hung work unit and un-suspended it, then suspended the second work unit. The hung unit did not restart. I un-suspended the second work unit and it didn't restart either, so now both work units appeared hung. I clicked on the first work unit and did a Show Graphics: the screen sort of jumped and I got no graphics, but then I noticed the WU had been updated to 100% complete with no change in CPU time, and the next work unit was now running again. Here are the log messages:

5/11/2007 4:47:43 PM|World Community Grid|Sending scheduler request: Requested by user
5/11/2007 4:47:43 PM|World Community Grid|Reporting 1 tasks
5/11/2007 4:47:48 PM|World Community Grid|Scheduler RPC succeeded [server version 509]
5/11/2007 4:47:48 PM|World Community Grid|Deferring communication for 5 min 3 sec
5/11/2007 4:47:48 PM|World Community Grid|Reason: requested by project
5/11/2007 4:48:31 PM|World Community Grid|Starting lb758_00014_15
5/11/2007 4:48:31 PM|World Community Grid|Starting task lb758_00014_15 using hpf2 version 518
5/11/2007 4:50:18 PM|World Community Grid|Resuming task lb750_00022_10 using hpf2 version 518
5/11/2007 4:51:26 PM|World Community Grid|Computation for task lb750_00022_10 finished
5/11/2007 4:51:26 PM|World Community Grid|Output file lb750_00022_10_0 for task lb750_00022_10 absent
5/11/2007 4:51:26 PM|World Community Grid|Resuming task lb758_00014_15 using hpf2 version 518
[May 12, 2007 12:04:28 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: BOINC Agent appears stalled

Suspicious line: Output file lb750_00022_10_0 for task lb750_00022_10 absent

If the next job after the restart was an HPF2 too and froze on resume, it's very likely something in your security software (antivirus / firewall / Windows DEP / spam filter) that, in this instance, does not seem to like the HPF2 jobs.

Symantec...... cold streaks along my spine. At any rate, you may need to permit the wcg_hpf2_rosetta_5.18_windows_intel86 process as an exception. Strange that your firewall does not flare up; mine does, to tell me it's stopping the process. While you're at it, permit port 31416 on 127.0.0.1.
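
One rough way to see whether local traffic to that port gets through is to try a plain TCP connection to it. The sketch below assumes the BOINC client is running and listening on 127.0.0.1:31416; the helper name can_connect is made up for the illustration. A failed connection only says that nothing answered, not why, so it cannot tell a firewall block from a client that is simply not running.

```python
import socket


def can_connect(host="127.0.0.1", port=31416, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened, False otherwise."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if can_connect():
        print("Something is accepting connections on 127.0.0.1:31416")
    else:
        print("No answer on 127.0.0.1:31416 (client not running, or traffic blocked)")
```

If the connection is refused while the agent is running, the firewall rules for loopback traffic are a reasonable next thing to check.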

Note that if the version of a science application changes, you need to add the permission again.

Added: The error ".... absent" is also discussed here: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=13083#98594

Norton, Symantec and Kaspersky are the names that show up most often in connection with problems running the agent. Most often it's resolved by granting permission exceptions.
Lawrence posted a long collection of security software related posts in the Start Here forum: Problems with Anti-Virus and Firewall Programs
[Edit 2 times, last edit by Sekerob at May 12, 2007 9:48:21 AM]
[May 12, 2007 9:32:46 AM]