Thread Status: Active
Total posts in this thread: 156
Posts: 156   Pages: 16   [ Previous Page | 4 5 6 7 8 9 10 11 12 13 | Next Page ]
Author
Previous Thread This topic has been viewed 350007 times and has 155 replies Next Thread
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Poor crunching on this project

It might help; I was thinking about suggesting it.
[Oct 18, 2010 2:16:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Poor crunching on this project

I don't think that limiting the number of concurrent CEP2 WUs is going to help... I had just one running on my quad-core yesterday alongside 3 C4CW WUs, and the CEP2 WU still got shorted, exactly as all the others have for me over the last few weeks. Also, before this problem started, there were plenty of times when I was running 4 at once on the quad-core and got perfectly acceptable CPU times reported.
[Oct 18, 2010 3:55:11 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

Please understand that limiting concurrent CEP2 tasks was not meant to imply that the 'big' time losses would be erased by it. That is an issue to be resolved before a release for all supported platforms is put into production. The concurrency limit is meant to further reduce the inefficiency due to the large disk I/O: the more tasks running, the more bottlenecking. On my quad, running 2 or 4 makes a difference of > 10 minutes per task. I don't know at this time whether the CEP2 restriction will be optional or enforced... some might not be concerned and just want to run the science app all out. We'll see.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Oct 18, 2010 4:17:00 PM]
[Oct 18, 2010 4:13:41 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Poor crunching on this project

Just a note to say that when I suspended one of these tasks, it immediately errored out:
E200464_ 778_ A.24.C20H15NOSSi.230.0.set1d06_ 1-- four Error 17/10/10 19:20:09 18/10/10 18:57:13 3.06 60.6 / 0.0
Kubuntu x64 Q6600
[Oct 18, 2010 7:41:44 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: Poor crunching on this project

Yikes, here's the log for a CEP2 job that came up "valid." Look at how many times it encountered some kind of problem during job #2 and started job 2 over again, resetting the CPU time every time. It ended up with 6.2 hrs CPU time, but must have spent more than that on job #2 alone. (This is on a "reliable machine" that has been crunching nearly 24/7 since Feb.)

I'm going to stop crunching CEP2 for a while. I look forward to participating in its next Linux beta and rejoining when there's a better-running version.

World Community Grid

Result Log

Result Name: E200459_ 340_ A.24.C19H11NOS2Se.187.3.set1d06_ 0--
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:04:26] Number of jobs = 16
[17:04:26] Starting job 0,CPU time has been restored to 0.000000.
[17:04:26] Starting new Job
[17:04:26] Qink name = fldman
[17:04:26] Qink name = gesman
[17:04:26] Qink name = scfman
[17:07:06] Qink name = anlman
[17:07:09] End of Job
[17:07:11] Finished Job #0
[17:07:11] Starting job 1,CPU time has been restored to 88.290000.
[17:07:11] Starting new Job
[17:07:11] Qink name = fldman
[17:07:12] Qink name = gesman
[17:07:12] Qink name = scfman
[17:13:42] Qink name = anlman
[17:14:18] End of Job
[17:14:21] Finished Job #1
[17:14:21] Starting job 2,CPU time has been restored to 345.950000.
[17:14:21] Starting new Job
[17:14:21] Qink name = fldman
[17:14:21] Qink name = gesman
[17:14:21] Qink name = scfman
[17:19:00] Qink name = anlman
[17:19:00] Qink name = drvman
[17:20:19] Qink name = optman
[17:20:19] Qink name = fldman
[17:20:19] Qink name = gesman
[17:20:19] Qink name = scfman
[17:29:00] Qink name = anlman
[17:29:00] Qink name = drvman
[17:30:19] Qink name = optman
[17:30:19] Qink name = fldman
[17:30:19] Qink name = gesman
[17:30:20] Qink name = scfman
Parent was killed, exiting
[17:33:48] Number of jobs = 16
[17:33:48] Starting job 2,CPU time has been restored to 345.950000.
[17:33:48] Starting new Job
[17:33:48] Qink name = fldman
[17:33:48] Qink name = gesman
[17:33:48] Qink name = scfman
[17:35:00] Number of jobs = 16
[17:35:00] Starting job 2,CPU time has been restored to 345.950000.
Error reading in TMP file 44/0 (1120): No such file or directory
[17:35:00] Starting new Job
[17:35:00] Qink name = fldman
[17:35:01] Qink name = gesman
[17:35:01] Qink name = scfman
Parent was killed, exiting
[17:35:41] Number of jobs = 16
[17:35:41] Starting job 2,CPU time has been restored to 345.950000.
[17:35:41] Starting new Job
[17:35:41] Qink name = fldman
[17:35:41] Qink name = gesman
[17:35:41] Qink name = scfman
Parent was killed, exiting
[17:37:24] Number of jobs = 16
[17:37:24] Starting job 2,CPU time has been restored to 345.950000.
[17:37:24] Starting new Job
[17:37:24] Qink name = fldman
[17:37:24] Qink name = gesman
[17:37:24] Qink name = scfman
[17:39:06] Number of jobs = 16
[17:39:06] Starting job 2,CPU time has been restored to 345.950000.
[17:39:06] Starting new Job
[17:39:07] Qink name = fldman
[17:39:07] Qink name = gesman
[17:39:07] Qink name = scfman
[17:40:44] Number of jobs = 16
[17:40:44] Starting job 2,CPU time has been restored to 345.950000.
Error reading in TMP file 44/0 (1120): No such file or directory
[17:40:45] Starting new Job
[17:40:45] Qink name = fldman
[17:40:45] Qink name = gesman
[17:40:45] Qink name = scfman
Parent was killed, exiting
[17:43:51] Number of jobs = 16
[17:43:51] Starting job 2,CPU time has been restored to 345.950000.
[17:43:51] Starting new Job
[17:43:51] Qink name = fldman
[17:43:52] Qink name = gesman
[17:43:52] Qink name = scfman
Parent was killed, exiting
[17:47:48] Number of jobs = 16
[17:47:48] Starting job 2,CPU time has been restored to 345.950000.
[17:47:48] Starting new Job
[17:47:48] Qink name = fldman
[17:47:49] Qink name = gesman
[17:47:49] Qink name = scfman
[17:52:08] Qink name = anlman
[17:52:08] Qink name = drvman
[17:53:23] Qink name = optman
[17:53:23] Qink name = fldman
[17:53:23] Qink name = gesman
[17:53:24] Qink name = scfman
[17:57:43] Number of jobs = 16
[17:57:43] Starting job 2,CPU time has been restored to 345.950000.
[17:57:43] Starting new Job
[17:57:43] Qink name = fldman
[17:57:44] Qink name = gesman
[17:57:44] Qink name = scfman
[18:02:03] Qink name = anlman
[18:02:03] Qink name = drvman
[18:03:17] Qink name = optman
[18:03:17] Qink name = fldman
[18:03:17] Qink name = gesman
[18:03:17] Qink name = scfman
[18:11:25] Qink name = anlman
[18:11:25] Qink name = drvman
[18:12:41] Qink name = optman
[18:12:41] Qink name = fldman
[18:12:41] Qink name = gesman
[18:12:43] Qink name = scfman
Parent was killed, exiting
[18:16:33] Number of jobs = 16
[18:16:33] Starting job 2,CPU time has been restored to 345.950000.
[18:16:34] Starting new Job
[18:16:34] Qink name = fldman
[18:16:34] Qink name = gesman
[18:16:34] Qink name = scfman
[18:20:59] Qink name = anlman
[18:20:59] Qink name = drvman
[18:22:14] Qink name = optman
[18:22:14] Qink name = fldman
[18:22:14] Qink name = gesman
[18:22:14] Qink name = scfman
[18:30:11] Qink name = anlman
[18:30:11] Qink name = drvman
[18:31:22] Qink name = optman
[18:31:22] Qink name = fldman
[18:31:22] Qink name = gesman
[18:31:23] Qink name = scfman
[18:39:07] Qink name = anlman
[18:39:07] Qink name = drvman
[18:40:18] Qink name = optman
[18:40:18] Qink name = fldman
[18:40:18] Qink name = gesman
[18:40:19] Qink name = scfman
[18:48:17] Qink name = anlman
[18:48:17] Qink name = drvman
[18:49:27] Qink name = optman
[18:49:27] Qink name = fldman
[18:49:27] Qink name = gesman
[18:49:28] Qink name = scfman
[18:56:27] Qink name = anlman
[18:56:27] Qink name = drvman
[18:57:41] Qink name = optman
[18:57:42] Qink name = fldman
[18:57:42] Qink name = gesman
[18:57:42] Qink name = scfman
[19:05:52] Qink name = anlman
[19:05:52] Qink name = drvman
[19:07:06] Qink name = optman
[19:07:06] Qink name = fldman
[19:07:06] Qink name = gesman
[19:07:06] Qink name = scfman
[19:14:32] Qink name = anlman
[19:14:32] Qink name = drvman
[19:15:44] Qink name = optman
[19:15:44] Qink name = fldman
[19:15:44] Qink name = gesman
[19:15:45] Qink name = scfman
[19:22:57] Qink name = anlman
[19:22:57] Qink name = drvman
[19:24:09] Qink name = optman
[19:24:09] Qink name = fldman
[19:24:09] Qink name = gesman
[19:24:09] Qink name = scfman
[19:31:47] Qink name = anlman
[19:31:47] Qink name = drvman
[19:32:58] Qink name = optman
[19:32:58] Qink name = fldman
[19:32:58] Qink name = gesman
[19:32:58] Qink name = scfman
[19:40:43] Qink name = anlman
[19:40:43] Qink name = drvman
[19:41:53] Qink name = optman
[19:41:53] Qink name = fldman
[19:41:53] Qink name = gesman
[19:41:54] Qink name = scfman
[19:49:35] Qink name = anlman
[19:49:35] Qink name = drvman
[19:50:45] Qink name = optman
[19:50:45] Qink name = fldman
[19:50:45] Qink name = gesman
[19:50:46] Qink name = scfman
[19:57:24] Qink name = anlman
[19:57:24] Qink name = drvman
[19:58:34] Qink name = optman
[19:58:34] Qink name = fldman
[19:58:34] Qink name = gesman
[19:58:35] Qink name = scfman
[20:04:51] Qink name = anlman
[20:04:51] Qink name = drvman
[20:06:02] Qink name = optman
[20:06:02] Qink name = fldman
[20:06:02] Qink name = gesman
[20:06:02] Qink name = scfman
[20:11:34] Qink name = anlman
[20:11:34] Qink name = drvman
[20:12:46] Qink name = optman
[20:12:46] Qink name = fldman
[20:12:46] Qink name = gesman
[20:12:46] Qink name = scfman
[20:18:45] Qink name = anlman
[20:18:45] Qink name = drvman
[20:19:55] Qink name = optman
[20:19:55] Qink name = fldman
[20:19:55] Qink name = gesman
[20:19:56] Qink name = scfman
[20:25:07] Qink name = anlman
[20:25:07] Qink name = drvman
[20:26:20] Qink name = optman
[20:26:20] Qink name = fldman
[20:26:20] Qink name = gesman
[20:26:20] Qink name = scfman
[20:31:34] Qink name = anlman
[20:31:34] Qink name = drvman
[20:32:47] Qink name = optman
[20:32:47] Qink name = fldman
[20:32:47] Qink name = gesman
[20:32:48] Qink name = scfman
[20:38:31] Qink name = anlman
[20:38:31] Qink name = drvman
[20:39:43] Qink name = optman
[20:39:43] Qink name = fldman
[20:39:43] Qink name = gesman
[20:39:44] Qink name = scfman
[20:45:54] Qink name = anlman
[20:45:54] Qink name = drvman
[20:47:07] Qink name = optman
[20:47:07] Qink name = fldman
[20:47:07] Qink name = gesman
[20:47:08] Qink name = scfman
[20:53:21] Qink name = anlman
[20:53:21] Qink name = drvman
[20:54:38] Qink name = optman
[20:54:38] Qink name = fldman
[20:54:38] Qink name = gesman
[20:54:39] Qink name = scfman
[21:00:45] Qink name = anlman
[21:00:45] Qink name = drvman
[21:02:01] Qink name = optman
[21:02:01] Qink name = fldman
[21:02:01] Qink name = gesman
[21:02:02] Qink name = scfman
[21:07:10] Qink name = anlman
[21:07:10] Qink name = drvman
[21:08:22] Qink name = optman
[21:08:22] Qink name = anlman
[21:08:58] End of Job
[21:09:00] Finished Job #2
[21:09:00] Starting job 3,CPU time has been restored to 6444.680000.
[21:09:00] Starting new Job
[21:09:00] Qink name = fldman
[21:09:01] Qink name = gesman
[21:09:01] Qink name = scfman
[21:16:31] Qink name = anlman
[21:17:08] End of Job
[21:17:12] Finished Job #3
[21:17:12] Starting job 4,CPU time has been restored to 6727.660000.
[21:17:12] Starting new Job
[21:17:12] Qink name = fldman
[21:17:13] Qink name = gesman
[21:17:13] Qink name = scfman
[21:22:37] Qink name = anlman
[21:23:15] End of Job
[21:23:17] Finished Job #4
[21:23:17] Starting job 5,CPU time has been restored to 6947.350000.
[21:23:17] Starting new Job
[21:23:17] Qink name = fldman
[21:23:18] Qink name = gesman
[21:23:18] Qink name = scfman
[21:29:03] Qink name = anlman
[21:29:40] End of Job
[21:29:43] Finished Job #5
[21:29:43] Starting job 6,CPU time has been restored to 7177.480000.
[21:29:43] Starting new Job
[21:29:43] Qink name = fldman
[21:29:43] Qink name = gesman
[21:29:43] Qink name = scfman
[21:35:09] Qink name = anlman
[21:35:46] End of Job
[21:35:49] Finished Job #6
[21:35:49] Starting job 7,CPU time has been restored to 7396.620000.
[21:35:49] Starting new Job
[21:35:49] Qink name = fldman
[21:35:49] Qink name = gesman
[21:35:49] Qink name = scfman
[21:43:29] Qink name = anlman
[21:44:03] End of Job
[21:44:06] Finished Job #7
[21:44:06] Starting job 8,CPU time has been restored to 7700.540000.
[21:44:06] Starting new Job
[21:44:06] Qink name = fldman
[21:44:06] Qink name = gesman
[21:44:06] Qink name = scfman
[21:49:40] Qink name = anlman
[21:50:16] End of Job
[21:50:19] Finished Job #8
[21:50:19] Starting job 9,CPU time has been restored to 7924.880000.
[21:50:20] Starting new Job
[21:50:20] Qink name = fldman
[21:50:20] Qink name = gesman
[21:50:20] Qink name = scfman
[21:56:07] Qink name = anlman
[21:57:06] End of Job
[21:57:09] Finished Job #9
[21:57:09] Starting job 10,CPU time has been restored to 8163.430000.
[21:57:09] Starting new Job
[21:57:09] Qink name = fldman
[21:57:10] Qink name = gesman
[21:57:10] Qink name = scfman
[22:09:49] Qink name = anlman
[22:10:43] End of Job
[22:10:45] Finished Job #10
[22:10:45] Starting job 11,CPU time has been restored to 8659.740000.
[22:10:45] Starting new Job
[22:10:45] Qink name = fldman
[22:10:46] Qink name = gesman
[22:10:46] Qink name = scfman
[22:18:03] Qink name = anlman
[22:18:58] End of Job
[22:19:00] Finished Job #11
[22:19:00] Starting job 12,CPU time has been restored to 8947.300000.
[22:19:00] Starting new Job
[22:19:01] Qink name = fldman
[22:19:04] Qink name = gesman
[22:19:04] Qink name = scfman
[22:53:09] Qink name = anlman
[22:59:27] End of Job
[22:59:31] Finished Job #12
[22:59:31] Starting job 13,CPU time has been restored to 10433.970000.
[22:59:32] Starting new Job
[22:59:32] Qink name = fldman
[22:59:34] Qink name = gesman
[22:59:35] Qink name = scfman
[00:32:15] Number of jobs = 16
[00:32:15] Starting job 13,CPU time has been restored to 10433.970000.
[00:32:15] Starting new Job
[00:32:15] Qink name = fldman
[00:32:23] Qink name = gesman
[00:32:23] Qink name = scfman
Parent was killed, exiting
[02:05:43] Qink name = anlman
[02:11:41] End of Job
[02:11:45] Finished Job #13
[02:11:45] Starting job 14,CPU time has been restored to 14228.950000.
[02:11:46] Starting new Job
[02:11:46] Qink name = fldman
[02:11:48] Qink name = gesman
[02:11:49] Qink name = scfman
[03:45:50] Qink name = anlman
[03:51:41] End of Job
[03:51:44] Finished Job #14
[03:51:44] Starting job 15,CPU time has been restored to 18085.450000.
[03:51:45] Starting new Job
[03:51:45] Qink name = fldman
[03:51:47] Qink name = gesman
[03:51:50] Qink name = scfman
[05:28:40] Qink name = anlman
[05:36:43] End of Job
[05:36:47] Finished Job #15
called boinc_finish
Exiting 0

</stderr_txt>
]]>
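
One way to quantify the restarts visible in a result log like the one above is to count how often each job number appears in a "Starting job" line. What follows is only a minimal Python sketch, not part of any WCG or BOINC tooling: the function names `count_job_starts` and `restarted_jobs` are made up for illustration, and the line format is assumed from the log above.

```python
import re
from collections import Counter

# Assumed line format, copied from the stderr log above:
#   [17:33:48] Starting job 2,CPU time has been restored to 345.950000.
START_RE = re.compile(r"Starting job (\d+),CPU time has been restored")


def count_job_starts(stderr_text):
    """Return a Counter mapping job number -> how many times it was started."""
    starts = Counter()
    for line in stderr_text.splitlines():
        match = START_RE.search(line)
        if match:
            starts[int(match.group(1))] += 1
    return starts


def restarted_jobs(stderr_text):
    """Return {job number: extra starts} for jobs started more than once."""
    counts = count_job_starts(stderr_text)
    return {job: n - 1 for job, n in counts.items() if n > 1}


# A short excerpt from the log above, used as sample input.
sample = """\
[17:14:21] Starting job 2,CPU time has been restored to 345.950000.
[17:30:20] Qink name = scfman
Parent was killed, exiting
[17:33:48] Number of jobs = 16
[17:33:48] Starting job 2,CPU time has been restored to 345.950000.
[17:35:00] Number of jobs = 16
[17:35:00] Starting job 2,CPU time has been restored to 345.950000.
[21:09:00] Starting job 3,CPU time has been restored to 6444.680000.
"""

print(restarted_jobs(sample))
```

Every extra start of the same job with the same restored CPU time means the work done since that point was thrown away, which would be consistent with the reported CPU time being far lower than the wall-clock time spent.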
----------------------------------------

[Oct 19, 2010 1:41:34 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

Well that is a novel one drawing attention. **

[17:30:20] Qink name = scfman
Parent was killed, exiting
[17:33:48] Number of jobs = 16
[17:33:48] Starting job 2,CPU time has been restored to 345.950000.

I wonder what caused this. I'd be interested in the Linux event log and the stdoutdae.txt and stderrdae.txt logs from around the time when that happened.

Meantime, I've just set the swap file use of the client to zero (0). I've done that in the past under Windows 7 (to be more precise, I disabled the swap file in the OS) and only once saw an out-of-memory condition. Hoping something changes (for the better of course ;-). There are 2 CEP2 tasks running concurrently, plus 2 HCMD2 (those are minimal memory users).

Edit: ** Something was nagging me and, sure enough, I posted the very same thing on October 5: http://www.worldcommunitygrid.org/forums/wcg/...29985&offset=0#297929, but it occurred only once during that run, and I may have been playing with the system, which could have caused it. It's the only other post showing this.
[Edit 1 times, last edit by Sekerob at Oct 19, 2010 8:17:56 AM]
[Oct 19, 2010 8:13:48 AM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: Poor crunching on this project

Here's a portion of the stdoutdae.txt. I don't seem to have a stderrdae.txt. My Internet service provider was having trouble that evening. It looks as if every time BOINC made an unsuccessful attempt to contact the Internet, the running jobs had problems.

17-Oct-2010 17:32:53 [World Community Grid] Computation for task E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0 finished
17-Oct-2010 17:32:53 [World Community Grid] Resuming task HFCC_n1_02464410_n1_0001_0 using hfcc version 611
17-Oct-2010 17:32:55 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_0
17-Oct-2010 17:32:55 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_1
17-Oct-2010 17:32:58 [World Community Grid] Finished upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_0
17-Oct-2010 17:32:58 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_2
17-Oct-2010 17:33:01 [World Community Grid] Finished upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_1
17-Oct-2010 17:33:01 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_3
17-Oct-2010 17:33:04 [World Community Grid] Finished upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_2
17-Oct-2010 17:33:04 [World Community Grid] Finished upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_3
17-Oct-2010 17:33:04 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4
17-Oct-2010 17:33:47 [World Community Grid] Task HFCC_n1_02464410_n1_0001_0 exited with zero status but no 'finished' file
17-Oct-2010 17:33:47 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:33:47 [World Community Grid] Temporarily failed upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4: can't resolve hostname
17-Oct-2010 17:33:47 [World Community Grid] Backing off 1 min 0 sec on upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4
17-Oct-2010 17:33:47 [World Community Grid] Restarting task HFCC_n1_02464410_n1_0001_0 using hfcc version 611
17-Oct-2010 17:33:48 [World Community Grid] Task E200459_340_A.24.C19H11NOS2Se.187.3.set1d06_0 exited with zero status but no 'finished' file
17-Oct-2010 17:33:48 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:34:17 [---] Project communication failed: attempting access to reference site
17-Oct-2010 17:34:59 [World Community Grid] Task HFCC_n1_02464410_n1_0001_0 exited with zero status but no 'finished' file
17-Oct-2010 17:34:59 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:34:59 [---] BOINC can't access Internet - check network connection or proxy configuration.
17-Oct-2010 17:34:59 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4
17-Oct-2010 17:34:59 [World Community Grid] Restarting task HFCC_n1_02464410_n1_0001_0 using hfcc version 611
17-Oct-2010 17:35:00 [World Community Grid] Task E200459_340_A.24.C19H11NOS2Se.187.3.set1d06_0 exited with zero status but no 'finished' file
17-Oct-2010 17:35:00 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:35:40 [World Community Grid] Task HFCC_n1_02464410_n1_0001_0 exited with zero status but no 'finished' file
17-Oct-2010 17:35:40 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:35:40 [World Community Grid] Temporarily failed upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4: can't resolve hostname
17-Oct-2010 17:35:40 [World Community Grid] Backing off 1 min 0 sec on upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4
17-Oct-2010 17:35:40 [World Community Grid] Restarting task HFCC_n1_02464410_n1_0001_0 using hfcc version 611
17-Oct-2010 17:35:41 [World Community Grid] Task E200459_340_A.24.C19H11NOS2Se.187.3.set1d06_0 exited with zero status but no 'finished' file
17-Oct-2010 17:35:41 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:35:41 [World Community Grid] Restarting task E200459_340_A.24.C19H11NOS2Se.187.3.set1d06_0 using cep2 version 619
17-Oct-2010 17:36:40 [World Community Grid] Started upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4
17-Oct-2010 17:37:22 [World Community Grid] Task HFCC_n1_02464410_n1_0001_0 exited with zero status but no 'finished' file
17-Oct-2010 17:37:22 [World Community Grid] If this happens repeatedly you may need to reset the project.
17-Oct-2010 17:37:22 [World Community Grid] Temporarily failed upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4: can't resolve hostname

The same thing happened on my other machine that runs CEP2, but not on the machine that was running just C4CW, CMD2, and HCC.
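
To check this kind of pattern on a longer log, the task exits could be lined up against the network failures by timestamp. Below is a rough Python sketch under some assumptions: the line layout and message texts are taken from the excerpt above, the helper names (`parse_events`, `exits_near_network_failures`) and the 60-second default window are arbitrary choices for illustration, and `%b` month parsing assumes an English locale. It only shows that the two kinds of events happen close together in time; it does not show that one causes the other.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "17-Oct-2010 17:33:47 [World Community Grid] message"
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")

# Message fragments copied from the log excerpt above.
NETWORK_MARKERS = (
    "can't resolve hostname",
    "can't access Internet",
    "Project communication failed",
)
EXIT_MARKER = "exited with zero status but no 'finished' file"


def parse_events(log_text):
    """Return a list of (timestamp, message) for every line that matches."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # %b expects English month abbreviations such as "Oct".
            when = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
            events.append((when, match.group(3)))
    return events


def exits_near_network_failures(log_text, window_seconds=60):
    """List (timestamp, task) for task exits within the window of a network failure."""
    events = parse_events(log_text)
    failures = [t for t, msg in events if any(k in msg for k in NETWORK_MARKERS)]
    window = timedelta(seconds=window_seconds)
    hits = []
    for when, msg in events:
        if EXIT_MARKER in msg and any(abs(when - f) <= window for f in failures):
            hits.append((when, msg.split(" exited")[0]))
    return hits


sample = """\
17-Oct-2010 17:32:53 [World Community Grid] Computation for task E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0 finished
17-Oct-2010 17:33:47 [World Community Grid] Task HFCC_n1_02464410_n1_0001_0 exited with zero status but no 'finished' file
17-Oct-2010 17:33:47 [World Community Grid] Temporarily failed upload of E200459_053_A.24.C17H13N3S2Si2.38.0.set1d06_0_4: can't resolve hostname
17-Oct-2010 17:33:48 [World Community Grid] Task E200459_340_A.24.C19H11NOS2Se.187.3.set1d06_0 exited with zero status but no 'finished' file
"""

for when, task in exits_near_network_failures(sample):
    print(when.time(), task)
```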
----------------------------------------

[Oct 19, 2010 10:46:20 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

Good info... there is a long history with BOINC of jobs resetting and/or crashing completely when the internet goes down. Was the BOINC Manager loaded at the time? I have a long-standing practice of closing this sucker... a quick in and out, but never left to sit on the panel or in the icon tray (notification area, which is not a feature available on Linux for BOINC anyway). If there is security software, check that out too. I'm still on the GetDeb 6.10.58 64-bit version. I've been on scheduled networking for a long time, and when I know WCG or the ISP is off on a maintenance trip, I always take the client off-line so it does not waste time trying to knock its head through the closed door.
[Oct 20, 2010 12:32:01 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Poor crunching on this project

Sekerob wrote:
"Meantime, I've just set the swap file use of the client to zero (0). I've done that in the past under Windows 7 (to be more precise, I disabled the swap file in the OS) and only once saw an out-of-memory condition. Hoping something changes (for the better of course ;-). There are 2 CEP2 tasks running concurrently, plus 2 HCMD2 (those are minimal memory users)."

After some half dozen tasks ran mostly in "left alone" mode, the hard conclusion is that the gaps between 'elapsed' and CPU time increased. Now THAT's puzzling.
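
The gap between elapsed and CPU time can be put as a ratio of CPU seconds to wall-clock seconds. As a small worked example, the Python sketch below uses the first checkpoints from the stderr log posted earlier in this thread. It assumes that the "CPU time has been restored to" value is the cumulative CPU time in seconds at that checkpoint, and the function name `cpu_efficiency` is invented for this illustration.

```python
from datetime import datetime


def cpu_efficiency(start_clock, end_clock, cpu_start, cpu_end):
    """Return CPU seconds used per wall-clock second between two checkpoints.

    Clock values are "HH:MM:SS" strings from the same day; CPU values are
    cumulative CPU seconds (an assumption about the log's meaning).
    """
    fmt = "%H:%M:%S"
    elapsed = (
        datetime.strptime(end_clock, fmt) - datetime.strptime(start_clock, fmt)
    ).total_seconds()
    if elapsed <= 0:
        raise ValueError("checkpoints must be in order and on the same day")
    return (cpu_end - cpu_start) / elapsed


# Checkpoints taken from the earlier log:
#   [17:04:26] Starting job 0,CPU time has been restored to 0.000000.
#   [17:07:11] Starting job 1,CPU time has been restored to 88.290000.
#   [17:14:21] Starting job 2,CPU time has been restored to 345.950000.
job0 = cpu_efficiency("17:04:26", "17:07:11", 0.0, 88.29)
job1 = cpu_efficiency("17:07:11", "17:14:21", 88.29, 345.95)

print(f"job 0: {job0:.0%}, job 1: {job1:.0%}")
```

A ratio well below 100% on an otherwise idle core would point to time spent waiting (for example on disk I/O) rather than computing, though this sketch cannot say what the wait was for.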
[Oct 20, 2010 12:55:44 AM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: Poor crunching on this project

Sekerob wrote:
"Good info... there is a long history with BOINC of jobs resetting and/or crashing completely when the internet goes down. Was the BOINC Manager loaded at the time? I have a long-standing practice of closing this sucker... a quick in and out, but never left to sit on the panel or in the icon tray (notification area, which is not a feature available on Linux for BOINC anyway). If there is security software, check that out too. I'm still on the GetDeb 6.10.58 64-bit version. I've been on scheduled networking for a long time, and when I know WCG or the ISP is off on a maintenance trip, I always take the client off-line so it does not waste time trying to knock its head through the closed door."

BOINC Manager was mostly closed during that period -- open only when I sometimes peeked to see how things were progressing.

Interesting to learn that BOINC jobs tend to have problems when the Internet is out. I'll take mine offline in future at times when I know there are Internet issues.
----------------------------------------

[Oct 20, 2010 1:41:46 AM]