World Community Grid Forums
Category: Completed Research Forum: Computing for Sustainable Water Forum Thread: CFSW tasks getting Computation Error right after starting |
Thread Status: Active Total posts in this thread: 57
gio777
Advanced Cruncher Georgia Joined: Dec 8, 2004 Post Count: 69 Status: Offline Project Badges: |
Same Here :(
8/25/2012 9:40:45 PM | World Community Grid | Reporting 155 completed tasks, requesting new tasks for CPU
8/25/2012 9:40:52 PM | World Community Grid | Scheduler request completed: got 7 new tasks
8/25/2012 9:40:54 PM | World Community Grid | Started download of cfsw_14409_14409863_D14409863.sql
8/25/2012 9:40:54 PM | World Community Grid | Started download of cfsw_14409_14409887_D14409887.sql
8/25/2012 9:40:57 PM | World Community Grid | Finished download of cfsw_14409_14409863_D14409863.sql
8/25/2012 9:40:57 PM | World Community Grid | Finished download of cfsw_14409_14409887_D14409887.sql
8/25/2012 9:40:57 PM | World Community Grid | Started download of cfsw_14410_14410043_D14410043.sql
8/25/2012 9:40:57 PM | World Community Grid | Started download of cfsw_14410_14410080_D14410080.sql
8/25/2012 9:40:57 PM | World Community Grid | Starting task cfsw_14409_14409863_0 using cfsw version 612 in slot 0
8/25/2012 9:40:57 PM | World Community Grid | Starting task cfsw_14409_14409887_0 using cfsw version 612 in slot 1
8/25/2012 9:40:58 PM | World Community Grid | Finished download of cfsw_14410_14410043_D14410043.sql
8/25/2012 9:40:58 PM | World Community Grid | Finished download of cfsw_14410_14410080_D14410080.sql
8/25/2012 9:40:58 PM | World Community Grid | Started download of cfsw_14409_14409999_D14409999.sql
8/25/2012 9:40:58 PM | World Community Grid | Started download of cfsw_14410_14410081_D14410081.sql
8/25/2012 9:40:58 PM | World Community Grid | Computation for task cfsw_14409_14409863_0 finished
8/25/2012 9:40:58 PM | World Community Grid | Output file cfsw_14409_14409863_0_0 for task cfsw_14409_14409863_0 absent
8/25/2012 9:40:58 PM | World Community Grid | Computation for task cfsw_14409_14409887_0 finished
8/25/2012 9:40:58 PM | World Community Grid | Output file cfsw_14409_14409887_0_0 for task cfsw_14409_14409887_0 absent
8/25/2012 9:40:58 PM | World Community Grid | Starting task cfsw_14410_14410080_0 using cfsw version 612 in slot 0
8/25/2012 9:40:58 PM | World Community Grid | Starting task cfsw_14410_14410043_0 using cfsw version 612 in slot 1
8/25/2012 9:40:59 PM | World Community Grid | Finished download of cfsw_14409_14409999_D14409999.sql
8/25/2012 9:40:59 PM | World Community Grid | Finished download of cfsw_14410_14410081_D14410081.sql
8/25/2012 9:40:59 PM | World Community Grid | Started download of cfsw_14410_14410046_D14410046.sql
8/25/2012 9:40:59 PM | World Community Grid | Computation for task cfsw_14410_14410080_0 finished
8/25/2012 9:40:59 PM | World Community Grid | Output file cfsw_14410_14410080_0_0 for task cfsw_14410_14410080_0 absent
8/25/2012 9:40:59 PM | World Community Grid | Computation for task cfsw_14410_14410043_0 finished
8/25/2012 9:40:59 PM | World Community Grid | Output file cfsw_14410_14410043_0_0 for task cfsw_14410_14410043_0 absent
8/25/2012 9:40:59 PM | World Community Grid | Starting task cfsw_14409_14409999_0 using cfsw version 612 in slot 0
8/25/2012 9:40:59 PM | World Community Grid | Starting task cfsw_14410_14410081_0 using cfsw version 612 in slot 1
8/25/2012 9:41:00 PM | World Community Grid | Finished download of cfsw_14410_14410046_D14410046.sql
8/25/2012 9:41:00 PM | World Community Grid | Computation for task cfsw_14409_14409999_0 finished
8/25/2012 9:41:00 PM | World Community Grid | Output file cfsw_14409_14409999_0_0 for task cfsw_14409_14409999_0 absent
8/25/2012 9:41:00 PM | World Community Grid | Computation for task cfsw_14410_14410081_0 finished
8/25/2012 9:41:00 PM | World Community Grid | Output file cfsw_14410_14410081_0_0 for task cfsw_14410_14410081_0 absent
8/25/2012 9:41:00 PM | World Community Grid | Starting task cfsw_14410_14410046_0 using cfsw version 612 in slot 0
8/25/2012 9:41:01 PM | World Community Grid | Computation for task cfsw_14410_14410046_0 finished
8/25/2012 9:41:01 PM | World Community Grid | Output file cfsw_14410_14410046_0_0 for task cfsw_14410_14410046_0 absent
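For anyone who wants to tally these failures from their own event log, here is a rough Python sketch. It assumes each log entry is on its own line in the "timestamp | project | message" shape shown in the post above, and it guesses that task names follow a cfsw_<batch>_<workunit>_<replica> pattern; that naming is an inference from the names in this log, not something documented here. The function names `failed_tasks` and `batch_of` are made up for illustration.

```python
import re

# Matches the message part of entries such as:
# "... | World Community Grid | Output file cfsw_X_Y_0_0 for task cfsw_X_Y_0 absent"
ABSENT_RE = re.compile(r"Output file \S+ for task (\S+) absent")


def failed_tasks(log_lines):
    """Return the task names whose output file was reported absent, in log order."""
    failed = []
    for line in log_lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 3:
            continue  # not a "timestamp | project | message" entry
        match = ABSENT_RE.search(parts[2])
        if match:
            failed.append(match.group(1))
    return failed


def batch_of(task_name):
    """Second underscore-separated field; ASSUMED to be the batch/series number."""
    return task_name.split("_")[1]


# Three entries copied from the log above.
sample = [
    "8/25/2012 9:40:58 PM | World Community Grid | Computation for task cfsw_14409_14409863_0 finished",
    "8/25/2012 9:40:58 PM | World Community Grid | Output file cfsw_14409_14409863_0_0 for task cfsw_14409_14409863_0 absent",
    "8/25/2012 9:40:59 PM | World Community Grid | Output file cfsw_14410_14410080_0_0 for task cfsw_14410_14410080_0 absent",
]

tasks = failed_tasks(sample)
batches = sorted({batch_of(t) for t in tasks})
print(tasks)
print(batches)
```

Grouping by the second field makes it easier to report which series are affected rather than pasting the whole log.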
||
|
LCB001
Advanced Cruncher CANADA Joined: Oct 14, 2009 Post Count: 69 Status: Offline Project Badges: |
All the latest WUs are bad with an 'output file absent' problem.
This is happening on all 8 rigs, which are known to be stable and error free, including one with more than 1500 hrs uptime. Good Luck... [Edit 1 times, last edit by LCB001 at Aug 25, 2012 5:48:19 PM] |
||
|
JCMarsh [U.S. Army]
Cruncher Joined: Feb 8, 2012 Post Count: 5 Status: Offline |
I had a batch of cfsw_1439x WUs end with computation error today, as well. After a reboot and project reset, all new cfsw WUs are giving the same result. Guess I'll load a batch from another project to crunch today while this is worked out.
||
|
guenterhb
Cruncher Joined: Sep 22, 2006 Post Count: 10 Status: Offline Project Badges: |
me too - lots of errors. I've switched to other projects
|
||
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges: |
I have a 3-day queue, so my 3 crunchers hadn't started on jobs distributed today. On each cruncher, I suspended all ready jobs, plus one of the running jobs. Then I started enabling all jobs one by one, starting with the most recently downloaded. Each errored out immediately with the message "Output file cfsw_XXX for task cfsw_XXX absent" until I worked my way back to good WUs. I hope that I have cleaned out all the bad units and now have
I have dropped my queue down from 3 days so my crunchers will stop asking for more bad WUs. All the bad units I got were in the range 14349 - 14421. Cheers [edit - more bad units added] [Edit 1 times, last edit by NixChix at Aug 25, 2012 8:05:47 PM] |
||
|
X-Pilot
Cruncher Joined: Mar 24, 2008 Post Count: 11 Status: Offline |
Yep, same here... I was so close to the silver badge...
|
||
|
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401 Status: Offline Project Badges: |
With all these WUs blowing up, maybe it's time for a federal recall! I've got two machines and everything they are touching in CFSW WUs is blowing sky high. Any more messes like this and I'll soon have to break out the hazmat suits to start mopping up all the glowing H2O left all over my house!
|
||
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges: |
Maybe this isn't entirely bad news for badge hunters. Now many reliable crunching machines will start needing double validation, so due to the duplication, there will be more WUs to crunch.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Today, I left my quad core Phenom II (Win7-64), that runs CFSW alongside CEP2, running at 100% while I went on a photography trip to the beach. It can run the average CFSW WU in ~32 mins (64 bit) or ~36 mins (32 bit), so I get a lot of work units, especially when my machines run 24/7 on the weekend.
CEP2 units have all run without issue, but I came home to find 149 CFSW WUs had failed as they started! I picked up a few older redos, and all but one ran without issue, but WUs in the following more recent series all failed (a spot check showed exit code -1 (0xffffffff)):
14349, 14352, 14358, 14361, 14378, 14379, 14380, 14381, 14385, 14389, 14390, 14396, 14399, 14408, 14413, 14414, 14415, 14425, 14430, 14431, 14436, 14437, 14440, 14444, 14447, 14453
I received at least one of each of the following series, which ran without issue today (since midnight EDT 08/25/2012):
12623, 12694, 12841, 13912, 14122, 14214, 14223, 14226, 14234, 14244, 14246, 14249, 14253, 14254, 14258, 14259, 14264, 14265, 14270, 14271, 14272, 14274, 14277, 14280, 14281, 14282, 14285, 14287, 14288, 14290, 14291, 14294, 14299, 14348
Over a dozen more have downloaded and then failed while I have been typing this post. Some of those recent series ones have _6 as a suffix, so it looks like the system is reissuing them, and they are failing repeatedly. |
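The two series lists in this post can be compared directly. The Python sketch below copies both lists from the post and checks whether they split cleanly at a single series number. It only reflects this one machine's sample (the post also mentions one older redo that failed, with no series given), so the cutoff is a rough guide rather than a confirmed boundary. The helper `looks_suspect` is hypothetical and assumes the second field of a task name is the series number.

```python
# Series that failed on this machine, as listed in the post.
failed_series = [
    14349, 14352, 14358, 14361, 14378, 14379, 14380, 14381, 14385, 14389,
    14390, 14396, 14399, 14408, 14413, 14414, 14415, 14425, 14430, 14431,
    14436, 14437, 14440, 14444, 14447, 14453,
]

# Series that ran without issue on the same day, as listed in the post.
ok_series = [
    12623, 12694, 12841, 13912, 14122, 14214, 14223, 14226, 14234, 14244,
    14246, 14249, 14253, 14254, 14258, 14259, 14264, 14265, 14270, 14271,
    14272, 14274, 14277, 14280, 14281, 14282, 14285, 14287, 14288, 14290,
    14291, 14294, 14299, 14348,
]

newest_ok = max(ok_series)
oldest_failed = min(failed_series)

# True when every good series is numbered below every failed series.
cleanly_split = newest_ok < oldest_failed


def looks_suspect(task_name, first_bad=oldest_failed):
    """Hypothetical check: ASSUMES names look like cfsw_<series>_<workunit>_<replica>."""
    series = int(task_name.split("_")[1])
    return series >= first_bad


print(newest_ok, oldest_failed, cleanly_split)
```

In this sample the newest good series and the oldest failed series are adjacent, which fits the idea that the problem started with a particular batch of work rather than with the crunching machines.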
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Afterthought: Will this ruin my machine's "reputation?" I often get assigned short deadline "redos," which, if my impressions are correct, means my machines are well-trusted.
|
||
|
|