| World Community Grid Forums
Thread Status: Active Total posts in this thread: 101
|
|
| Author |
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
This has gone on for 3 weeks now. Why haven't the defective WUs been pulled yet?
----------------------------------------
Cheers |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844 Status: Offline Project Badges:
|
Only 5 so far today, and 25 yesterday. Maybe we are running out of this batch.
----------------------------------------Cheers
Sgt. Joe
----------------------------------------*Minnesota Crunchers* [Edit 1 times, last edit by Sgt.Joe at Jun 7, 2023 2:33:42 PM] |
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1403 Status: Offline Project Badges:
|
I switched one laptop from Windows 10 (running DENIS@home) back to Linux running from a USB stick to get some SCC1 tasks, because SCC runs terrifically on Linux. First I got 2 tasks, from batches 4089 and 4099. Then one from 4176 crashed immediately, and 2 minutes later another 4176 went down the drain too. Those 4 tasks were all quorum 1.
After 2 errors my machine became unreliable, I think: from then on I got only quorum 2 tasks, although also two more 4176 tasks that were quorum 1 again. Those I could abort before they started. I have to build up some cache buffer to intercept the ATOM 62's to make my machine reliable again. |
||
|
|
sptrog1
Master Cruncher Joined: Dec 12, 2017 Post Count: 1592 Status: Offline Project Badges:
|
I received 5 more 4176 C tasks today. Hopefully they are running out.
|
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
While we're halfway through(*1) faulty batch 0004176, I just noticed that Replication has been set to 0 for all type C workunits from that batch, e.g.:
----------------------------------------
workunit 311933876 App: Smash Childhood Cancer
And a recent one:
workunit 314246927 App: Smash Childhood Cancer
This hasn't happened yet to batch 0004174 (also type C):
workunit 318804187 App: Smash Childhood Cancer
[*1] Each batch runs from sequence 0 to 99,999.
Adri
[Edit 3 times, last edit by adriverhoef at Jun 8, 2023 2:47:37 PM] |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844 Status: Offline Project Badges:
|
From what I see of the progress through the batch, it will be at least 10 to 12 more days before this batch is through. That is if it continues at the current pace and if the sequence it follows is strictly sequential. But I could be entirely mistaken. Time will tell.
----------------------------------------Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 983 Status: Offline Project Badges:
|
Adri, I just got this one,
SCC1_0004165_MyoD1-C_0467 same error as 4175/4176 batch! |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Indeed, Hans, while we're still ploughing through batch 0004176, there are other batches of type C waiting to get crunched and apparently batch 0004165 also crashes immediately.
This was my first one this morning:
workunit 314552778 App: Smash Childhood Cancer
Details:
---------------------------------------------------------------------------------------------------------------------------------------
SCC1_0004165_MyoD1-C_0153_2   Fedora Linux          Error        2023-06-09T03:22:53   2023-06-09T03:25:00
Followed by:
workunit 314612178
SCC1_0004165_MyoD1-C_0629_0   Linux Ubuntu Server   Aborted      2023-06-09T04:01:39   2023-06-09T06:45:06
workunit 314650293
SCC1_0004165_MyoD1-C_1008_0   Linux Ubuntu          In Progress  2023-06-09T05:52:42   2023-06-15T05:52:42
workunit 314706662
SCC1_0004165_MyoD1-C_1451_0   Linux openSU          In Progress  2023-06-09T08:24:26   2023-06-15T08:24:26
Adri |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Nice to see that somebody else (see task _3 below) is also aborting incoming tasks from the 'new' faulty batch 0004174, probably automatically(*1) (see post 686915):
----------------------------------------
workunit 314654777
SCC1_0004174_MyoD1-C_0092_0   Fedora Linux   U.Aborted   2023-06-09T06:26:19   2023-06-09T08:50:47
[*1] Aborting them automatically prevents your host from becoming unreliable, and it stops more copies of the same task from being sent out to other clients once Replication has been set to 0. In the end, it also prevents more tasks needing verification from being sent out. Of course, as soon as the error has been fixed on the server and all faulty tasks have vanished, you can/should stop looking for these faulty tasks. The guard against aborting the wrong tasks works by looking for the 'signature' of the 'faulty' batches (again, see post 686915).
Adri
[Edit 4 times, last edit by adriverhoef at Jun 9, 2023 4:23:15 PM] |
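For anyone wanting to try the "look for the signature" idea, here is a minimal sketch of what such a check might look like. It is not the script from post 686915 (which is not reproduced in this thread). It only assumes that task names follow the pattern seen above (e.g. SCC1_0004165_MyoD1-C_0153_2) and that the "signature" is simply a type C task from one of the batches reported as crashing here. The names `FAULTY_BATCHES`, `parse_task` and `is_faulty` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the batches reported in this thread as crashing immediately.
# The real list may differ and should be emptied once the server is fixed.
FAULTY_BATCHES = {4165, 4174, 4175, 4176}
FAULTY_KIND = "C"

# Pattern inferred from names quoted in the thread, e.g.
# SCC1_0004165_MyoD1-C_0153_2 (the trailing _N copy number is optional).
TASK_RE = re.compile(
    r"^(?P<app>SCC1)_(?P<batch>\d{7})_(?P<target>[A-Za-z0-9]+)"
    r"-(?P<kind>[A-Z])_(?P<seq>\d+)(?:_(?P<replica>\d+))?$"
)


class Task(NamedTuple):
    app: str
    batch: int
    target: str
    kind: str
    seq: int
    replica: Optional[int]


def parse_task(name: str) -> Optional[Task]:
    """Split a task name into its parts; return None if it does not match."""
    m = TASK_RE.match(name.strip())
    if m is None:
        return None
    replica = m.group("replica")
    return Task(
        app=m.group("app"),
        batch=int(m.group("batch")),
        target=m.group("target"),
        kind=m.group("kind"),
        seq=int(m.group("seq")),
        replica=int(replica) if replica is not None else None,
    )


def is_faulty(name: str) -> bool:
    """True only for type C tasks from a batch listed in FAULTY_BATCHES."""
    task = parse_task(name)
    return (
        task is not None
        and task.batch in FAULTY_BATCHES
        and task.kind == FAULTY_KIND
    )


if __name__ == "__main__":
    names = [
        "SCC1_0004165_MyoD1-C_0153_2",
        "SCC1_0004174_MyoD1-C_0092_0",
        "SCC1_0004089_MyoD1-C_0001_0",  # hypothetical name from a batch not reported faulty
    ]
    for n in names:
        print(n, "abort candidate" if is_faulty(n) else "keep")
```

The check is deliberately strict: anything that does not parse, or that comes from a batch not on the list, is kept, so a typo in the pattern errs on the side of not aborting good work.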
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Recently Active Project Badges:
|
Oh well, what do you know? Just in time for the weekend, yet another faulty batch (after 4175 and 4176, it's now 4174)...
And still crickets from WCG Towers...
Ralf |
||
|
|
|