World Community Grid Forums
Category: Beta Testing > Forum: Beta Test Support Forum > Thread: Linux Only Beta Test
Thread Status: Locked. Total posts in this thread: 177
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline
widdershins,
Yes. Since DDDT2 beta testing, beta tests of the very long A-type WUs have been limited to one job per thread at any given time, to avoid "endless" beta test sessions and to give the widest possible variety of beta-testing machines a chance to get some. As long as a machine holds as many beta jobs as it has running threads, the servers will not send it more, even if some of them are Ready to Report.
[Edited 1 time; last edit by JmBoullier at May 29, 2010 10:22:30 AM]
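A minimal Python sketch of the rule as described, assuming the server simply counts every beta job a host still holds (whatever its state) against the host's number of running threads. The names `BetaJob` and `may_send_beta_job` are hypothetical, not part of BOINC or the WCG scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BetaJob:
    name: str
    state: str  # e.g. "In Progress" or "Ready to Report" (illustrative labels)


def may_send_beta_job(jobs_on_host: List[BetaJob], running_threads: int) -> bool:
    """Hypothetical check: at most one beta job per running thread.

    The state is deliberately ignored, so a finished job that is only
    waiting to be reported still occupies its slot.
    """
    return len(jobs_on_host) < running_threads


quad = [BetaJob(f"beta_{i}", "In Progress") for i in range(3)]
quad.append(BetaJob("beta_3", "Ready to Report"))
print(may_send_beta_job(quad, running_threads=4))  # False: all 4 slots are held
```

In this model, only reporting the finished job (removing it from the host's list) frees its slot.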
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
The timekeeping has its learning curve. Once production is running, the estimated flops will be adjusted on a daily basis, allowing a more accurate projection, as usual.
Quote: "Actually they are rather well estimated when looking at jobs waiting to start. It's just that the percentage computation seems to be based on the maximum 8-hour duration, whether the task will exceed the limit or not. That should be rather easy to correct before going to production."
I can't call that an estimate if the progress is simply computed as a fraction of 8 hours. If the same job is showing close to 8 hours on yours and on mine too (the last one shows 8:01:59 in Ready to Start), there might be another field to feed that time into the client's TTC (time to completion). Something like: "Hello, I'm client X, with benchmark Y", and the servers responding: "Hello client, here's 8 x 3600 seconds' worth of flops, with the result header fitted to your benchmark." I wonder if knreed has made the server that smart already... that would be great. It will be fun to watch when this gets mixed with HCMD2, when a series of great-grandchildren hops by.
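To make the distinction concrete, here is a small Python sketch contrasting progress computed as a fraction of the fixed 8-hour maximum with progress computed against the task's own estimate, plus the "8 x 3600 seconds' worth of flops" sizing idea. The function names are illustrative, and the benchmark figure in the example is made up.

```python
MAX_RUNTIME_S = 8 * 3600  # the 8-hour maximum discussed in the thread


def progress_vs_cap(elapsed_s: float) -> float:
    """Progress as a fraction of the 8-hour maximum, whatever the task needs."""
    return min(elapsed_s / MAX_RUNTIME_S, 1.0)


def progress_vs_estimate(elapsed_s: float, estimated_s: float) -> float:
    """Progress as a fraction of the task's own runtime estimate."""
    return min(elapsed_s / estimated_s, 1.0)


def flops_for_duration(benchmark_flops_per_s: float, seconds: float = MAX_RUNTIME_S) -> float:
    """Work sized to fill `seconds` on a host with the given benchmark speed."""
    return benchmark_flops_per_s * seconds


# A task that will really need 4 hours, checked after 2 hours:
print(progress_vs_cap(2 * 3600))                 # 0.25
print(progress_vs_estimate(2 * 3600, 4 * 3600))  # 0.5
print(flops_for_duration(2e9))                   # 57600000000000.0
```

With a fixed 8-hour denominator, a 4-hour task never shows more than 50% before it finishes, which matches the complaint that the percentage tells you little about the real remaining time.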
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline
Sekerob wrote: "If the same job on yours is showing near 8 hours and mine is too, the last one shows 8:01:59 in ready to start, there might be another field to feed that time into the client TTC."
Sorry, but since I started these beta WUs, their estimated time before starting has always been in the range of what they actually needed: say the final time has been between 60% and 140% of what was "announced". Not that precise, but much better than what we used to see during beta tests, and the DCF (duration correction factor) is currently at 0.78. I don't know why yours are announced at around 8 hours. I have one waiting to start which is estimated at 3:14:15, probably a bit below what will really be needed, but not by much. The last 4 used from 3.81 to 4.42 hours. I am using the standard version 6.10.17 delivered with Ubuntu 10.04, if that matters.
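A quick Python sketch of the accuracy check described above: convert a BOINC Manager style H:MM:SS estimate to seconds and compare it with the run time actually needed. Pairing the 3:14:15 estimate with the 4.42-hour run is purely illustrative; the post does not say they belong to the same task.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration such as '3:14:15' to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def actual_to_estimate(actual_hours: float, estimate_hms: str) -> float:
    """Ratio of the real run time to the pre-start estimate (1.0 = exact)."""
    return actual_hours * 3600 / hms_to_seconds(estimate_hms)


ratio = actual_to_estimate(4.42, "3:14:15")
print(f"{ratio:.0%}")  # 137%, inside the 60% to 140% band reported above
```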
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline
Houston, I have a more serious problem than duration estimates: my last job blew up after completing normally, with a "file too big" condition.
sam. 29 mai 2010 17:18:58 CEST World Community Grid Output file BETA_A.19.C15H10N2SSe.1.2_1_4 for task BETA_A.19.C15H10N2SSe.1.2_1 exceeds size limit.
sam. 29 mai 2010 17:18:58 CEST World Community Grid File size: 53957882.000000 bytes. Limit: 52428800.000000 bytes
Please take measures for the jobs still to come...
Edit: And one more...
sam. 29 mai 2010 18:11:16 CEST World Community Grid Computation for task BETA_A.19.C15H11NSSeSi.2.1_0 finished
sam. 29 mai 2010 18:11:16 CEST World Community Grid Output file BETA_A.19.C15H11NSSeSi.2.1_0_4 for task BETA_A.19.C15H11NSSeSi.2.1_0 exceeds size limit.
sam. 29 mai 2010 18:11:16 CEST World Community Grid File size: 55755276.000000 bytes. Limit: 52428800.000000 bytes
And since the jobs seem to be getting bigger and bigger, I think this problem is very serious.
[Edited 1 time; last edit by JmBoullier at May 29, 2010 4:26:37 PM]
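The limit in these messages, 52,428,800 bytes, is exactly 50 MiB (50 × 1024 × 1024). Below is a minimal Python sketch of the size comparison the client appears to make once the computation has finished; `check_output_size` is a hypothetical helper, not the BOINC client's actual code.

```python
MAX_NBYTES = 52_428_800  # limit shown in the client messages above (50 MiB)


def check_output_size(file_size: int, limit: int = MAX_NBYTES) -> int:
    """Return how many bytes a file exceeds the limit by (0 if it fits)."""
    return max(0, file_size - limit)


for size in (53_957_882, 55_755_276):  # the two files reported above
    over = check_output_size(size)
    print(f"{size:,} bytes: {over:,} bytes over ({over / 2**20:.2f} MiB)")
```

Both files overshoot by only a few MiB, which fits the observation that the jobs keep getting bigger and are now just crossing the limit.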
widdershins
Veteran Cruncher Scotland Joined: Apr 30, 2007 Post Count: 674 Status: Offline
One has errored out. It appears the error wasn't in the computation but in the transmission of the data back to WCG.
Result Name: BETA_A.19.C14H12N2SSi2.1_0--
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:36:00] Number of jobs = 16
[08:36:00] Starting job 0,CPU time has been restored to 0.000000.
[08:36:00] Creating Scratch Dir
[08:36:00] Copying jobfile
[08:36:00] Copying regular version
[08:36:00] Starting new Job
[08:36:00] Qink name = fldman
[08:36:00] Qink name = gesman
[08:36:00] Qink name = scfman
[08:37:35] Qink name = anlman
[08:37:35] End of Job
[08:37:38] Updating rkrun
[08:37:38] Copying Output Files
[08:37:38] Delete Scratch Dir
[08:37:38] Saving State and Checkpointing
[08:37:38] Finished Job #0
[08:37:38] Starting job 1,CPU time has been restored to 82.189136.
[08:37:38] Creating Scratch Dir
[08:37:38] Copying jobfile
[08:37:38] Copying regular version
[08:37:38] Starting new Job
[08:37:38] Qink name = fldman
[08:37:39] Qink name = gesman
[08:37:39] Qink name = scfman
[08:42:00] Qink name = anlman
[08:42:09] End of Job
[08:42:11] Updating rkrun
[08:42:11] Copying Output Files
[08:42:12] Delete Scratch Dir
[08:42:12] Saving State and Checkpointing
[08:42:12] Finished Job #1
[08:42:12] Starting job 2,CPU time has been restored to 312.967558.
[08:42:12] Creating Scratch Dir
[08:42:12] Copying jobfile
[08:42:12] Copying regular version
[08:42:12] Starting new Job
[08:42:12] Qink name = fldman
[08:42:13] Qink name = gesman
[08:42:13] Qink name = scfman
[08:45:56] Qink name = anlman
[08:45:56] Qink name = drvman
[08:46:38] Qink name = optman
[08:46:38] Qink name = fldman
[08:46:38] Qink name = gesman
[08:46:38] Qink name = scfman
[08:52:19] Qink name = anlman
[08:52:19] Qink name = drvman
[08:52:59] Qink name = optman
[08:52:59] Qink name = fldman
[08:52:59] Qink name = gesman
[08:52:59] Qink name = scfman
[08:58:26] Qink name = anlman
[08:58:27] Qink name = drvman
[08:59:08] Qink name = optman
[08:59:08] Qink name = fldman
[08:59:08] Qink name = gesman
[08:59:09] Qink name = scfman
[09:04:01] Qink name = anlman
[09:04:01] Qink name = drvman
[09:04:40] Qink name = optman
[09:04:40] Qink name = fldman
[09:04:40] Qink name = gesman
[09:04:40] Qink name = scfman
[09:09:08] Qink name = anlman
[09:09:08] Qink name = drvman
[09:09:49] Qink name = optman
[09:09:49] Qink name = fldman
[09:09:49] Qink name = gesman
[09:09:50] Qink name = scfman
[09:13:53] Qink name = anlman
[09:13:53] Qink name = drvman
[09:14:31] Qink name = optman
[09:14:31] Qink name = anlman
[09:14:40] End of Job
[09:14:43] Updating rkrun
[09:14:43] Copying Output Files
[09:14:43] Delete Scratch Dir
[09:14:43] Saving State and Checkpointing
[09:14:43] Finished Job #2
[09:14:43] Starting job 3,CPU time has been restored to 2023.570464.
[09:14:43] Creating Scratch Dir
[09:14:43] Copying jobfile
[09:14:43] Copying regular version
[09:14:44] Starting new Job
[09:14:44] Qink name = fldman
[09:14:44] Qink name = gesman
[09:14:44] Qink name = scfman
[09:19:23] Qink name = anlman
[09:19:32] End of Job
[09:19:34] Updating rkrun
[09:19:34] Copying Output Files
[09:19:34] Delete Scratch Dir
[09:19:34] Saving State and Checkpointing
[09:19:34] Finished Job #3
[09:19:34] Starting job 4,CPU time has been restored to 2283.950736.
[09:19:34] Creating Scratch Dir
[09:19:34] Copying jobfile
[09:19:34] Copying regular version
[09:19:34] Starting new Job
[09:19:34] Qink name = fldman
[09:19:34] Qink name = gesman
[09:19:34] Qink name = scfman
[09:22:45] Qink name = anlman
[09:22:55] End of Job
[09:22:57] Updating rkrun
[09:22:57] Copying Output Files
[09:22:57] Delete Scratch Dir
[09:22:57] Saving State and Checkpointing
[09:22:57] Finished Job #4
[09:22:57] Starting job 5,CPU time has been restored to 2471.374449.
[09:22:57] Creating Scratch Dir
[09:22:57] Copying jobfile
[09:22:57] Copying regular version
[09:22:57] Starting new Job
[09:22:57] Qink name = fldman
[09:22:58] Qink name = gesman
[09:22:58] Qink name = scfman
[09:26:29] Qink name = anlman
[09:26:37] End of Job
[09:26:39] Updating rkrun
[09:26:39] Copying Output Files
[09:26:39] Delete Scratch Dir
[09:26:39] Saving State and Checkpointing
[09:26:39] Finished Job #5
[09:26:39] Starting job 6,CPU time has been restored to 2667.674717.
[09:26:39] Creating Scratch Dir
[09:26:39] Copying jobfile
[09:26:39] Copying regular version
[09:26:39] Starting new Job
[09:26:39] Qink name = fldman
[09:26:40] Qink name = gesman
[09:26:40] Qink name = scfman
[09:30:10] Qink name = anlman
[09:30:21] End of Job
[09:30:23] Updating rkrun
[09:30:23] Copying Output Files
[09:30:23] Delete Scratch Dir
[09:30:24] Saving State and Checkpointing
[09:30:24] Finished Job #6
[09:30:24] Starting job 7,CPU time has been restored to 2865.491079.
[09:30:24] Creating Scratch Dir
[09:30:24] Copying jobfile
[09:30:24] Copying regular version
[09:30:24] Starting new Job
[09:30:24] Qink name = fldman
[09:30:25] Qink name = gesman
[09:30:25] Qink name = scfman
[09:34:42] Qink name = anlman
[09:34:50] End of Job
[09:34:53] Updating rkrun
[09:34:53] Copying Output Files
[09:34:53] Delete Scratch Dir
[09:34:53] Saving State and Checkpointing
[09:34:53] Finished Job #7
[09:34:53] Starting job 8,CPU time has been restored to 3115.086677.
[09:34:53] Creating Scratch Dir
[09:34:53] Copying jobfile
[09:34:53] Copying regular version
[09:34:54] Starting new Job
[09:34:54] Qink name = fldman
[09:34:54] Qink name = gesman
[09:34:54] Qink name = scfman
[09:38:07] Qink name = anlman
[09:38:16] End of Job
[09:38:19] Updating rkrun
[09:38:19] Copying Output Files
[09:38:19] Delete Scratch Dir
[09:38:19] Saving State and Checkpointing
[09:38:19] Finished Job #8
[09:38:19] Starting job 9,CPU time has been restored to 3303.686463.
[09:38:19] Creating Scratch Dir
[09:38:19] Copying jobfile
[09:38:19] Copying regular version
[09:38:19] Starting new Job
[09:38:19] Qink name = fldman
[09:38:19] Qink name = gesman
[09:38:19] Qink name = scfman
[09:45:59] Qink name = anlman
[09:46:14] End of Job
[09:46:16] Updating rkrun
[09:46:16] Copying Output Files
[09:46:16] Delete Scratch Dir
[09:46:16] Saving State and Checkpointing
[09:46:16] Finished Job #9
[09:46:16] Starting job 10,CPU time has been restored to 3745.298062.
[09:46:16] Creating Scratch Dir
[09:46:16] Copying jobfile
[09:46:16] Copying regular version
[09:46:16] Starting new Job
[09:46:16] Qink name = fldman
[09:46:17] Qink name = gesman
[09:46:17] Qink name = scfman
[09:53:13] Qink name = anlman
[09:53:26] End of Job
[09:53:29] Updating rkrun
[09:53:29] Copying Output Files
[09:53:29] Delete Scratch Dir
[09:53:29] Saving State and Checkpointing
[09:53:29] Finished Job #10
[09:53:29] Starting job 11,CPU time has been restored to 4148.239244.
[09:53:29] Creating Scratch Dir
[09:53:29] Copying jobfile
[09:53:29] Copying regular version
[09:53:29] Starting new Job
[09:53:29] Qink name = fldman
[09:53:30] Qink name = gesman
[09:53:30] Qink name = scfman
[09:58:21] Qink name = anlman
[09:58:36] End of Job
[09:58:38] Updating rkrun
[09:58:38] Copying Output Files
[09:58:38] Delete Scratch Dir
[09:58:39] Saving State and Checkpointing
[09:58:39] Finished Job #11
[09:58:39] Starting job 12,CPU time has been restored to 4426.328623.
[09:58:39] Creating Scratch Dir
[09:58:39] Copying jobfile
[09:58:39] Copying regular version
[09:58:39] Starting new Job
[09:58:39] Qink name = fldman
[09:58:40] Qink name = gesman
[09:58:40] Qink name = scfman
[10:19:55] Qink name = anlman
[10:22:08] End of Job
[10:22:10] Updating rkrun
[10:22:10] Copying Output Files
[10:22:10] Delete Scratch Dir
[10:22:11] Saving State and Checkpointing
[10:22:11] Finished Job #12
[10:22:11] Starting job 13,CPU time has been restored to 5719.677452.
[10:22:11] Creating Scratch Dir
[10:22:11] Copying jobfile
[10:22:11] Copying regular version
[10:22:11] Starting new Job
[10:22:11] Qink name = fldman
[10:22:13] Qink name = gesman
[10:22:13] Qink name = scfman
[11:37:36] Qink name = anlman
[11:40:40] End of Job
[11:40:42] Updating rkrun
[11:40:42] Copying Output Files
[11:40:42] Delete Scratch Dir
[11:40:42] Saving State and Checkpointing
[11:40:42] Finished Job #13
[11:40:42] Starting job 14,CPU time has been restored to 10083.394167.
[11:40:42] Creating Scratch Dir
[11:40:42] Copying jobfile
[11:40:42] Copying regular version
[11:40:43] Starting new Job
[11:40:43] Qink name = fldman
[11:40:45] Qink name = gesman
[11:40:45] Qink name = scfman
[13:07:37] Qink name = anlman
[13:10:29] End of Job
[13:10:31] Updating rkrun
[13:10:31] Copying Output Files
[13:10:31] Delete Scratch Dir
[13:10:32] Saving State and Checkpointing
[13:10:32] Finished Job #14
[13:10:32] Starting job 15,CPU time has been restored to 15190.037312.
[13:10:32] Creating Scratch Dir
[13:10:32] Copying jobfile
[13:10:32] Copying regular version
[13:10:32] Starting new Job
[13:10:32] Qink name = fldman
[13:10:34] Qink name = gesman
[13:10:34] Qink name = scfman
[14:45:48] Qink name = anlman
[14:50:37] End of Job
[14:50:40] Updating rkrun
[14:50:40] Copying Output Files
[14:50:40] Delete Scratch Dir
[14:50:40] Saving State and Checkpointing
[14:50:40] Finished Job #15
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
  <file_name>BETA_A.19.C14H12N2SSi2.1_0_4</file_name>
  <error_code>-131</error_code>
</file_xfer_error>
</message>
]]>
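The stderr above logs a cumulative CPU time each time a new sub-job starts, so each sub-job's CPU time is the difference between successive values. A minimal Python sketch of that calculation follows; the regular expression assumes the exact "Starting job N,CPU time has been restored to X." wording shown in this log, and `cpu_seconds_per_job` is a hypothetical helper. Applied to this log, it shows jobs 13 and 14 each needing more than an hour of CPU time, while most earlier ones needed only a few minutes.

```python
import re
from typing import Dict

RESTORED = re.compile(r"Starting job (\d+),CPU time has been restored to ([0-9.]+)\.")


def cpu_seconds_per_job(stderr_text: str) -> Dict[int, float]:
    """CPU seconds used by each sub-job, from the cumulative values logged
    when the next sub-job starts. The last sub-job has no following
    'Starting job' line, so it is not included."""
    cumulative = {int(m.group(1)): float(m.group(2))
                  for m in RESTORED.finditer(stderr_text)}
    jobs = sorted(cumulative)
    return {job: cumulative[nxt] - cumulative[job]
            for job, nxt in zip(jobs, jobs[1:])}


sample = """\
[08:36:00] Starting job 0,CPU time has been restored to 0.000000.
[08:37:38] Starting job 1,CPU time has been restored to 82.189136.
[08:42:12] Starting job 2,CPU time has been restored to 312.967558.
"""
for job, secs in cpu_seconds_per_job(sample).items():
    print(f"job {job}: {secs / 60:.1f} min")
```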
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
It's not a transmission issue. Error -131 indicates exactly what Jean wrote about:
ERR_FILE_TOO_BIG (-131): file size too big; an output file was bigger than max_nbytes.
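A minimal Python sketch of pulling the error code out of a result's message block, like the one widdershins posted, and naming it. Only -131 is mapped, with the name and meaning taken from the post above; the lookup table and `explain_error` are hypothetical, and other BOINC codes are deliberately left out.

```python
import re
from typing import Optional

# Only the code discussed in this thread; extend with care.
KNOWN_ERRORS = {-131: ("ERR_FILE_TOO_BIG", "an output file was bigger than max_nbytes")}

ERROR_CODE = re.compile(r"<error_code>(-?\d+)</error_code>")


def explain_error(result_text: str) -> Optional[str]:
    """Return 'code NAME: meaning' for the first <error_code> found, or None."""
    match = ERROR_CODE.search(result_text)
    if match is None:
        return None
    code = int(match.group(1))
    name, meaning = KNOWN_ERRORS.get(code, ("UNKNOWN", "not covered by this sketch"))
    return f"{code} {name}: {meaning}"


message = "<file_xfer_error><error_code>-131</error_code></file_xfer_error>"
print(explain_error(message))  # -131 ERR_FILE_TOO_BIG: an output file was bigger than max_nbytes
```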
pirogue
Veteran Cruncher USA Joined: Dec 8, 2008 Post Count: 685 Status: Offline
So far I've had 4 fail with error -131, with more than 22 hours of crunching down the tubes. Is this one of the types of errors for which credit is granted?
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
It's a genuine application [parameter] fault, i.e. Don't panic, Mr Mainwaring. It's not down the tubes either, since at this larger test scale they have now learned how big things can get. And however big the result files may get, I'm not noticing anything on the memory side: the largest use is 89 MB RAM and 259 MB VM, according to top.
So far the longest run I've had on the quad was 5:46 hours at 2.4 GHz. Only one of 5 has had a wingman agree so far, and 4 more are crunching now.
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1313 Status: Offline
Earlier BETAs with run times between 2.94 and 4.60 hours ended successfully.
The last good one had an upload file of 47MB. Now 2 longer-running ones have errored out with the same error Jean mentioned: exceeds size limit (52428800).
World Community Grid 29-05-2010 21:55:39 Output file BETA_A.19.C15H11NOS2.4.4_0_4 for task BETA_A.19.C15H11NOS2.4.4_0 exceeds size limit. Run time 6hr42min
World Community Grid 29-05-2010 22:34:14 Output file BETA_A.19.C16H10S2Se.1.2_1_4 for task BETA_A.19.C16H10S2Se.1.2_1 exceeds size limit. Run time 5hr25min
Dual Core Processor: Intel P8400 @ 2.26GHz
Memory: 3GB
OS: Linux 2.6.28-18-generic
Waiting for new BETAs, and I will try to increase the max_nbytes value.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
CP, are you sure you want to do that? I'm not sure the servers will accept oversized result files.
Just now my quad is sweating on a 49,382k upload, the critical level to watch for in the transfer screen presently being 51,300k. As long as that is running, and it's not going faster than 50k, no replacement B-type will be fetched /õ\
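For reference, the 52,428,800-byte limit quoted earlier in the thread is exactly 51,200 KiB, a little under the 51,300k figure mentioned here. A small Python sketch of the headroom and remaining-time arithmetic, assuming "k" means KiB and that the upload holds a steady 50 KiB/s (both are assumptions; the post gives no units):

```python
LIMIT_KIB = 52_428_800 / 1024  # 51200.0 KiB


def headroom_kib(file_kib: float, limit_kib: float = LIMIT_KIB) -> float:
    """How far below the limit a result file is (negative means over)."""
    return limit_kib - file_kib


def upload_seconds(file_kib: float, rate_kib_per_s: float) -> float:
    """Time to upload a file at a constant rate."""
    return file_kib / rate_kib_per_s


size = 49_382  # the upload in progress
print(f"headroom: {headroom_kib(size):,.0f} KiB")
print(f"upload time at 50 KiB/s: {upload_seconds(size, 50) / 60:.1f} min")
```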