World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: BETA for type C in DDD2 |
Thread Status: Active Total posts in this thread: 95
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One day we'll have a Duration Correction Factor per science rather than one at the WCG level.
Ah, I'd missed that little gem, thank you. However, with this project the different types of unit run for such vastly different durations that using the project name as a key wouldn't really help much, would it?
One DCF per WCG project (aka science) would make BOINC timing calculations work sensibly. All that matters is that the WUs have similar CPU efficiency vs the benchmark. It's likely that all the different types of DDD2 WU perform with similar efficiency on a particular platform, but this is certainly not true between different projects on WCG. |
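To illustrate the point about efficiency vs the benchmark, here is a minimal sketch of how a duration estimate scales with a correction factor. It assumes the usual BOINC-style relation (estimated floating-point work divided by benchmark speed, multiplied by the DCF). The function names and all numbers are hypothetical, and the real client adjusts its DCF more gradually than this.

```python
def estimated_runtime_s(fpops_est, benchmark_flops, dcf):
    """Estimated run time: work estimate / benchmark speed, scaled by the DCF."""
    return fpops_est / benchmark_flops * dcf


def matching_dcf(actual_s, fpops_est, benchmark_flops):
    """The DCF that would have made the estimate equal the actual run time."""
    return actual_s / (fpops_est / benchmark_flops)


# Hypothetical host benchmark and two hypothetical sciences with the same
# work estimate but different real efficiency on this host.
BENCHMARK_FLOPS = 2e9
FPOPS_EST = 7.2e12
actual_runtime_s = {"science_a": 3600, "science_b": 7200}

per_science_dcf = {
    name: matching_dcf(actual, FPOPS_EST, BENCHMARK_FLOPS)
    for name, actual in actual_runtime_s.items()
}

# A single shared DCF can only match one of them at a time.
shared_dcf = per_science_dcf["science_a"]
shared_estimate_b = estimated_runtime_s(FPOPS_EST, BENCHMARK_FLOPS, shared_dcf)
own_estimate_b = estimated_runtime_s(
    FPOPS_EST, BENCHMARK_FLOPS, per_science_dcf["science_b"]
)
```

With these made-up numbers the shared factor underestimates science_b by half, while a per-science factor matches it. Different WU types within one science would still share a factor, which only works if their efficiency is similar.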
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Checkpoint/restart definitely has a problem.
e.g. BETA_erlc_a202_pqb010_1
The WU is on a machine which isn't on all the time. The results.out file appears to show that it keeps going back to the start again every time it restarts. Looking at the "WRIDYN: RESTart file was written at step ..." lines, they range from 100-22900, then 100-2500, then 100-4500 again. (Fraction done = 0.11 at this stage.)
Edit: WU now complete. <stderr_txt> includes 4 entries of "Calling gridPlatform.init()", rather than the conventional 1.
[Edit 1 times, last edit by Former Member at Feb 8, 2010 12:59:35 PM] |
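The check described in this post can be sketched as a small script that scans results.out for the restart lines and starts a new run whenever the step counter goes backwards. The regular expression assumes an integer step number follows the quoted text directly; the real line layout may differ, and the function name is hypothetical.

```python
import re

# Assumed layout: the quoted message followed by an integer step number.
STEP_RE = re.compile(r"WRIDYN: RESTart file was written at step\s+(\d+)")


def restart_runs(lines):
    """Return (first_step, last_step) for each run of increasing step numbers.

    A step that is not larger than the previous one is treated as the start
    of a new run, i.e. the job began counting again after a restart.
    """
    runs = []
    for line in lines:
        match = STEP_RE.search(line)
        if not match:
            continue
        step = int(match.group(1))
        if runs and step > runs[-1][-1]:
            runs[-1].append(step)
        else:
            runs.append([step])
    return [(run[0], run[-1]) for run in runs]


# Synthetic lines mimicking the ranges reported in the post.
steps = (
    list(range(100, 22901, 100))
    + list(range(100, 2501, 100))
    + list(range(100, 4501, 100))
)
sample = [f"WRIDYN: RESTart file was written at step {s}" for s in steps]
runs_found = restart_runs(sample)
```

More than one run that starts at the first checkpoint step would suggest the job restarted from the beginning rather than from its last checkpoint.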
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Possible fault in one particular WU? Mine completed with no errors reported, in 1/10 the time (and 1/10 credit claim) of the wingmen.
BETA_erlc_a011_sqa004_2-- 616 Pending Validation 5/02/10 22:47:55 6/02/10 11:30:57 0.12 0.9 / 0.0 <-- mine
BETA_erlc_a011_sqa004_1-- 616 Pending Validation 5/02/10 22:47:51 7/02/10 01:43:43 0.80 13.8 / 0.0
BETA_erlc_a011_sqa004_0-- 616 Pending Validation 5/02/10 22:47:38 6/02/10 18:50:36 1.08 15.2 / 0.0
Edit: Mine failed validation, unsurprisingly. So did another WU by the same machine. I wonder if it's a Pentium 4 issue of some type?
[Edit 1 times, last edit by Former Member at Feb 10, 2010 12:54:08 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Kremmen, this well covers the concern one member expressed several times on the switch off of the validator that forced the computation of #3 in the init distribution. Will certainly provide good statistics on the fail rate... the Go Live hurdle that can't be too high.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Finally I was lucky enough to grab some Beta WUs (C types) this round. Enough to get my Beta badge for entry into Club 15.
I got one ERROR when I did a series of suspend/resume on one machine:
BETA_erlc_a211_pr91a0_3-- 616 Pending Validation 07.02.10 11:06:09 07.02.10 12:29:19 1.20 34.2 / 0.0
BETA_erlc_a211_pr91a0_2-- 616 Pending Validation 06.02.10 16:14:25 07.02.10 04:56:41 1.73 26.3 / 0.0
BETA_erlc_a211_pr91a0_0-- 616 Pending Validation 06.02.10 16:14:19 07.02.10 08:49:29 1.29 27.5 / 0.0
BETA_erlc_a211_pr91a0_1-- 616 Error 06.02.10 16:14:18 07.02.10 11:06:03 1.09 6.5 / 0.0 <-- mine
The log file indicates that the pagefile is too small to create a process:
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
CreateProcess() failed - Die Auslagerungsdatei ist zu klein, um diesen Vorgang durchzuführen. (0x5af)
</message>
]]>
I don't know if this is due to the long uptime of the machine (no reboot for several weeks) or due to a failure in the checkpoint/resume logic of the science. It definitely happened immediately after resuming the WU. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
This "Die Auslagerungsdatei ist zu klein" probably translates to the swap file being too small... actually for some days, that's what I've been thinking is happening. Make sure that pagefile.sys can grow unrestricted.
edit: was not reading... you said so about the VM being too small
[Edit 1 times, last edit by Sekerob at Feb 8, 2010 1:36:16 PM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Did not stare at it too hard, but it looks like those that have an all-3 return were allowed to validate. An example:
BETA_erlc_a004_pr78a1_2-- 616 Valid 5-2-10 19:02:26 5-2-10 21:18:21 1.75 32.4 / 32.6
BETA_erlc_a004_pr78a1_1-- 616 Valid 5-2-10 19:02:24 8-2-10 01:18:07 2.36 18.3 / 32.6
BETA_erlc_a004_pr78a1_0-- 616 Valid 5-2-10 19:02:23 5-2-10 22:03:04 1.73 32.8 / 32.6
Credit seems here to have worked as average of first 2 returners, the 3rd getting the uptick. Got 35 odd waiting on nr. 3
edit: taking that back... got many with all-3 complete. Maybe only those that already had grants before the validator pause had the 3rd copy validated.
[Edit 1 times, last edit by Sekerob at Feb 8, 2010 7:03:55 PM] |
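The "average of first 2 returners" reading can be checked against the three rows quoted in this post. The sketch below assumes the dates are day-month-year and that the last two numbers on each row are claimed and granted credit; the function name is hypothetical, and this only tests the guess, it is not the validator's actual rule.

```python
from datetime import datetime

# (result name, return time, claimed credit) copied from the rows in the post.
results = [
    ("BETA_erlc_a004_pr78a1_2", "5-2-10 21:18:21", 32.4),
    ("BETA_erlc_a004_pr78a1_1", "8-2-10 01:18:07", 18.3),
    ("BETA_erlc_a004_pr78a1_0", "5-2-10 22:03:04", 32.8),
]


def mean_claim_of_first_two(rows, fmt="%d-%m-%y %H:%M:%S"):
    """Average the claimed credit of the two earliest-returned results."""
    ordered = sorted(rows, key=lambda row: datetime.strptime(row[1], fmt))
    first_two_claims = [claimed for _, _, claimed in ordered[:2]]
    return round(sum(first_two_claims) / 2, 1)


granted_guess = mean_claim_of_first_two(results)
```

Under those assumptions the two earliest returns claimed 32.4 and 32.8, whose mean is the 32.6 granted to all three copies.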
||
|
X-Files 27
Senior Cruncher Canada Joined: May 21, 2007 Post Count: 391 Status: Offline Project Badges: |
I got 2 WUs which seem problematic, same with my wingmen.
BETA_erlc_a205_pr02a0_2 - 3 errors, 1 Inconclusive
BETA_erlc_a205_pr02a1_1 - 3 errors |
||
|
Randzo
Senior Cruncher Slovakia Joined: Jan 10, 2008 Post Count: 339 Status: Offline Project Badges: |
New beta test?
10. 2. 2010 1:22:29|World Community Grid|Finished download of BETA_ly01_a005_pcb001_ly01_a005_ly01_phi_18.dat.gzb
I suppose type C again. Am I correct? I have 8 tasks, estimated run time ~1.5 hours, deadline 3 days. |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
Randzo,
You are correct, there is another group of type C work units that need to be run through beta for the researchers. If my memory serves me correctly, there are ~1400 work units, sent as 3 copies of each...
-Uplinger |
||
|
|