| World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I am currently crunching a Beta WU that has the following results listed (notice the number of errors and inconclusives). Should I abort?
Workunit Status
Project Name: Beta Testing 2
Created: 08/31/2007 14:53:52
Name: BETA_ach1_1_48
Minimum Quorum: 10
Initial Replication: 15
The large number of copies sent out for this workunit is due to the unique nature of this project. We encourage you to read the FAQs about this project for more information.

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_ach1_1_48_59-- | In Progress | 09/02/2007 20:17:24 | 09/03/2007 22:47:17 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_53-- | In Progress | 09/02/2007 20:17:05 | 09/03/2007 13:13:28 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_50-- | In Progress | 09/02/2007 20:16:48 | 09/03/2007 01:40:48 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_54-- | In Progress | 09/02/2007 20:16:40 | 09/03/2007 08:31:17 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_55-- | In Progress | 09/02/2007 20:16:24 | 09/03/2007 01:40:24 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_56-- | In Progress | 09/02/2007 20:16:15 | 09/04/2007 02:37:26 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_58-- | In Progress | 09/02/2007 20:16:04 | 09/03/2007 01:40:04 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_57-- | In Progress | 09/02/2007 20:15:55 | 09/04/2007 02:43:54 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_42-- | In Progress | 09/02/2007 12:45:19 | 09/03/2007 13:46:38 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_35-- | Inconclusive | 09/02/2007 12:42:04 | 09/02/2007 20:14:21 | 6.77 | 110.4 / 0.0
BETA_ach1_1_48_31-- | In Progress | 09/02/2007 12:41:01 | 09/04/2007 00:08:26 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_25-- | Error | 09/02/2007 05:18:46 | 09/02/2007 12:15:57 | 6.47 | 73.6 / 0.0
BETA_ach1_1_48_22-- | Inconclusive | 09/02/2007 05:18:07 | 09/02/2007 12:44:08 | 7.08 | 89.0 / 0.0
BETA_ach1_1_48_21-- | No Reply | 09/02/2007 05:18:01 | 09/02/2007 10:42:01 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_15-- | Inconclusive | 09/01/2007 17:12:42 | 09/02/2007 05:16:29 | 6.42 | 125.3 / 0.0
BETA_ach1_1_48_14-- | Inconclusive | 09/01/2007 09:42:55 | 09/02/2007 03:33:21 | 6.51 | 90.6 / 0.0
BETA_ach1_1_48_13-- | Inconclusive | 09/01/2007 04:15:34 | 09/02/2007 06:53:37 | 9.75 | 88.8 / 0.0
BETA_ach1_1_48_12-- | Inconclusive | 09/01/2007 03:03:58 | 09/01/2007 13:17:54 | 6.84 | 98.7 / 0.0
BETA_ach1_1_48_11-- | Inconclusive | 08/31/2007 22:50:32 | 09/02/2007 12:39:50 | 9.88 | 126.0 / 0.0
BETA_ach1_1_48_10-- | Error | 08/31/2007 20:19:57 | 09/01/2007 17:11:23 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_1-- | Error | 08/31/2007 15:38:18 | 08/31/2007 22:50:28 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_0-- | Inconclusive | 08/31/2007 15:36:44 | 09/01/2007 03:36:11 | 6.44 | 86.0 / 0.0
BETA_ach1_1_48_2-- | Error | 08/31/2007 15:36:40 | 09/01/2007 03:02:33 | 7.35 | 82.0 / 0.0
BETA_ach1_1_48_9-- | Inconclusive | 08/31/2007 15:35:35 | 09/01/2007 08:33:07 | 7.47 | 69.4 / 0.0
BETA_ach1_1_48_3-- | Inconclusive | 08/31/2007 15:35:15 | 08/31/2007 21:52:11 | 5.98 | 81.3 / 0.0
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I think you don't need to crunch that workunit, since more than 10 crunched results have already been submitted. If the deadline of the workunit is too short, I'd abort it, though I might continue to crunch.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I think you don't need to crunch that workunit, since more than 10 crunched workunits are already submitted. If the deadline of the workunits are too short, I'd abort it, though I might continue to crunch."

Except that they seem to have reissued all 10 recently. They may have fixed the problem and reissued the WUs for further testing.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I got the WU at 20:16 and have to return it by 01:40??
Why do others have more time for the same workunit (BETA_ach1_1_48_50)?
Mine: sent 09/02/2007 20:16:48, due 09/03/2007 01:40:48
Someone else's: sent 09/02/2007 20:17:05, due 09/03/2007 13:13:28
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I completed the WU. It is showing "Too Late."
Workunit Status
Project Name: Beta Testing 2
Created: 08/31/2007 14:53:52
Name: BETA_ach1_1_48
Minimum Quorum: 10
Initial Replication: 22
The large number of copies sent out for this workunit is due to the unique nature of this project. We encourage you to read the FAQs about this project for more information.

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_ach1_1_48_97-- | In Progress | 09/03/2007 04:48:53 | 09/03/2007 10:12:53 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_92-- | In Progress | 09/03/2007 04:48:50 | 09/03/2007 10:12:50 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_88-- | In Progress | 09/03/2007 02:32:29 | 09/04/2007 07:32:44 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_85-- | In Progress | 09/03/2007 02:31:33 | 09/03/2007 07:55:33 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_84-- | In Progress | 09/03/2007 02:29:59 | 09/03/2007 07:53:59 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_81-- | In Progress | 09/03/2007 02:29:46 | 09/03/2007 07:53:46 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_73-- | In Progress | 09/03/2007 01:35:16 | 09/03/2007 22:41:56 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_70-- | In Progress | 09/03/2007 01:35:00 | 09/04/2007 03:59:09 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_62-- | No Reply | 09/02/2007 21:10:03 | 09/03/2007 02:34:03 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_61-- | No Reply | 09/02/2007 21:02:07 | 09/03/2007 02:26:07 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_59-- | Too Late | 09/02/2007 20:17:24 | 09/03/2007 05:46:24 | 8.72 | 84.7 / 0.0
BETA_ach1_1_48_53-- | Too Late | 09/02/2007 20:17:05 | 09/03/2007 01:33:19 | 5.02 | 103.5 / 0.0
BETA_ach1_1_48_50-- | Error | 09/02/2007 20:16:48 | 09/02/2007 21:18:41 | 0.85 | 6.3 / 0.0
BETA_ach1_1_48_54-- | Too Late | 09/02/2007 20:16:40 | 09/03/2007 02:28:16 | 5.32 | 130.2 / 0.0
BETA_ach1_1_48_55-- | No Reply | 09/02/2007 20:16:24 | 09/03/2007 01:40:24 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_56-- | Too Late | 09/02/2007 20:16:15 | 09/03/2007 06:43:20 | 9.96 | 100.0 / 0.0
BETA_ach1_1_48_58-- | No Reply | 09/02/2007 20:16:04 | 09/03/2007 01:40:04 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_57-- | Too Late | 09/02/2007 20:15:55 | 09/03/2007 04:47:22 | 7.84 | 81.6 / 0.0
BETA_ach1_1_48_42-- | Too Late | 09/02/2007 12:45:19 | 09/02/2007 21:08:45 | 8.15 | 116.2 / 0.0
BETA_ach1_1_48_35-- | Too Late | 09/02/2007 12:42:04 | 09/02/2007 20:14:21 | 6.77 | 110.4 / 0.0
BETA_ach1_1_48_31-- | Too Late | 09/02/2007 12:41:01 | 09/02/2007 21:00:00 | 6.75 | 63.1 / 0.0
BETA_ach1_1_48_25-- | Error | 09/02/2007 05:18:46 | 09/02/2007 12:15:57 | 6.47 | 73.6 / 0.0
BETA_ach1_1_48_22-- | Too Late | 09/02/2007 05:18:07 | 09/02/2007 12:44:08 | 7.08 | 89.0 / 0.0
BETA_ach1_1_48_21-- | No Reply | 09/02/2007 05:18:01 | 09/02/2007 10:42:01 | 0.00 | 0.0 / 0.0
BETA_ach1_1_48_15-- | Too Late | 09/01/2007 17:12:42 | 09/02/2007 05:16:29 | 6.42 | 125.3 / 0.0
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Every successfully completed result that was returned says "Too Late". This has happened before; the most probable cause is a '0' left off when entering the deadline, which is specified in seconds. knreed will surely correct it once back from the one-week conference.
----------------------------------------
cheers
WCG
Please help to make the Forums an enjoyable experience for All! |
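The '0' short explanation can be checked against the sent and due times quoted earlier in the thread for BETA_ach1_1_48_50. What follows is only a rough sketch: it assumes the deadline is a plain number of seconds added to the sent time, and the variable names are illustrative rather than anything taken from the WCG server.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"

# Sent and due times quoted in the thread for BETA_ach1_1_48_50
sent = datetime.strptime("09/02/2007 20:16:48", FMT)
due = datetime.strptime("09/03/2007 01:40:48", FMT)

# Deadline actually applied, in seconds
observed = int((due - sent).total_seconds())
print(observed, "seconds =", observed / 3600, "hours")

# Hypothetical intended value if one trailing '0' was dropped on entry
intended = observed * 10
print(intended, "seconds =", intended / 3600, "hours")
print("due date with the extra zero:", sent + timedelta(seconds=intended))
```

The observed window is 19440 seconds (5.4 hours), shorter than most of the CPU times reported for completed results, which would fit everything coming back "Too Late". With one more zero it would be 194400 seconds, or 54 hours. Whether 194400 was the value actually intended is a guess.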
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Seventeen invalid and three valid results returned (are there more results which aren't being displayed?). Replication 33. Dorothy, we ain't in Kansas no more...

Personally I think the replication should be dropped to 1 and validate the returned models by checking the energy flux and also sanity checking the individual cells for sensible temp/pressure/humidity/velocity. An unbalanced model would indicate floating point errors. OK, we'd need to upload 150MB, but that's got to be better than doing the same work 33 times. Give credit for the upload, and limit the number of outstanding workunits per host (so the host doesn't get buried with an increasing upload queue).

Project Name: Beta Testing 2
Created: 08/30/2007 19:07:31
Name: BETA_ach1_1_38
Minimum Quorum: 10
Initial Replication: 33
The large number of copies sent out for this workunit is due to the unique nature of this project. We encourage you to read the FAQs about this project for more information.

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
BETA_ach1_1_38_118-- | Invalid | 09/04/2007 17:12:24 | 09/05/2007 01:55:38 | 8.59 | 104.2 / 40.9
BETA_ach1_1_38_108-- | Valid | 09/04/2007 10:34:45 | 09/05/2007 01:52:46 | 14.52 | 133.2 / 81.9
BETA_ach1_1_38_99-- | Error | 09/04/2007 05:08:31 | 09/04/2007 18:15:07 | 12.96 | 91.4 / 0.0
BETA_ach1_1_38_98-- | Invalid | 09/03/2007 23:11:35 | 09/04/2007 09:54:12 | 8.72 | 60.3 / 40.9
BETA_ach1_1_38_89-- | Invalid | 09/03/2007 20:24:07 | 09/04/2007 16:23:54 | 9.90 | 62.4 / 40.9
BETA_ach1_1_38_88-- | No Reply | 09/03/2007 13:02:41 | 09/03/2007 18:26:41 | 0.00 | 0.0 / 0.0
BETA_ach1_1_38_79-- | Invalid | 09/03/2007 05:59:24 | 09/03/2007 22:30:09 | 16.20 | 107.1 / 40.9
BETA_ach1_1_38_72-- | Invalid | 09/03/2007 04:18:13 | 09/03/2007 11:52:45 | 7.13 | 70.0 / 40.9
BETA_ach1_1_38_84-- | Invalid | 09/03/2007 03:49:32 | 09/03/2007 13:34:50 | 7.34 | 56.8 / 40.9
BETA_ach1_1_38_70-- | Invalid | 09/03/2007 01:49:52 | 09/03/2007 11:26:34 | 9.34 | 99.5 / 40.9
BETA_ach1_1_38_66-- | Invalid | 09/02/2007 23:45:56 | 09/03/2007 13:17:25 | 7.41 | 86.3 / 40.9
BETA_ach1_1_38_67-- | No Reply | 09/02/2007 22:36:45 | 09/03/2007 04:00:45 | 0.00 | 0.0 / 0.0
BETA_ach1_1_38_68-- | Invalid | 09/02/2007 21:55:58 | 09/03/2007 07:50:13 | 9.47 | 70.0 / 40.9
BETA_ach1_1_38_65-- | Invalid | 09/02/2007 19:44:09 | 09/03/2007 10:14:15 | 14.48 | 87.5 / 40.9
BETA_ach1_1_38_69-- | Error | 09/02/2007 19:00:30 | 09/02/2007 19:03:29 | 0.00 | 0.0 / 0.0
BETA_ach1_1_38_64-- | Invalid | 09/02/2007 18:52:46 | 09/03/2007 00:58:45 | 6.03 | 64.6 / 40.9
BETA_ach1_1_38_63-- | Invalid | 09/02/2007 18:45:01 | 09/03/2007 07:09:11 | 9.20 | 84.6 / 40.9
BETA_ach1_1_38_61-- | Invalid | 09/02/2007 18:39:56 | 09/03/2007 03:17:03 | 8.13 | 77.5 / 40.9
BETA_ach1_1_38_60-- | Invalid | 09/02/2007 18:31:17 | 09/03/2007 00:19:48 | 5.78 | 103.7 / 40.9
BETA_ach1_1_38_52-- | Valid | 09/02/2007 17:31:44 | 09/03/2007 04:54:30 | 6.71 | 66.8 / 81.9
BETA_ach1_1_38_51-- | Valid | 09/02/2007 11:51:25 | 09/02/2007 21:04:52 | 8.74 | 78.8 / 81.9
BETA_ach1_1_38_48-- | Invalid | 09/02/2007 11:46:32 | 09/03/2007 11:39:29 | 7.40 | 130.2 / 40.9
BETA_ach1_1_38_38-- | Invalid | 09/02/2007 04:45:51 | 09/02/2007 11:26:05 | 6.54 | 83.5 / 40.9
BETA_ach1_1_38_41-- | Invalid | 09/02/2007 04:32:17 | 09/02/2007 11:27:02 | 6.75 | 81.8 / 40.9
BETA_ach1_1_38_33-- | Invalid | 09/02/2007 04:27:00 | 09/02/2007 18:10:06 | 11.71 | 74.3 / 40.9

[Edit 5 times, last edit by Former Member at Sep 5, 2007 7:56:32 AM]
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"Seventeen invalid and three valid results returned (are there more results which aren't being displayed?). Replication 33. Dorothy, we ain't in Kansas no more... Personally I think the replication should be dropped to 1 and validate the returned models by checking the energy flux and also sanity checking the individual cells for sensible temp/pressure/humidity/velocity. An unbalanced model would indicate floating point errors. OK, we'd need to upload 150MB, but that's got to be better than doing the same work 33 times. Give credit for the upload, and limit the number of outstanding workunits per host (so the host doesn't get buried with an increasing upload queue)."

I did not catch what you mean by the bolded part. The WCG default for connect is 0.3 days (BOINC's is 0.1), and the additional buffer, which exists only in 5.10, defaults to 0.25. The sum should not give you more than 1 per core in progress plus 1 per core in queue. My early Beta recommendation was to set connect to 0.1 days or lower, since one would expect 'active' participation from those who have subscribed to Beta testing. That should have prevented any queuing. This being a first of its kind, we learn as we go.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just a theoretical issue, might not be an issue in reality.
The example is:
* Upload size is 150MB
* The user's upload speed is too slow (say it takes 10 hours to upload)
* Model takes (for example) 6 hours to complete

First model downloads, completes after 6 hours.
PC downloads 2nd model, completes it in 6 hours.
PC downloads 3rd model, starts processing it.
First model's upload finishes, 2nd model's upload starts.
3rd model completes, PC downloads the 4th, completes it.
Second model's upload finishes.
... etc.

So the number of files to be uploaded is increasing quicker than Boinc can transfer them. But because Boinc doesn't track the upload queue, it keeps downloading and running more models. All but the first model will time out. The Boinc scheduler is concerned with keeping the CPU busy and doesn't look at the data transfers involved. So in this particular 'worst case' example the bottleneck is the upload speed rather than the CPU time. If the user's upload speed is quick enough to upload a result file in less time than the model takes to run (say, 6 hours) then the issue won't happen.
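The timeline above can be played through with a few lines of Python. This is only a toy model under stated assumptions: downloads take no time, the CPU is never idle, uploads go one at a time in the order results finish, and deadlines are ignored. The function name `pending_uploads` is made up for the illustration and has nothing to do with the BOINC client itself.

```python
def pending_uploads(models, run_hours=6.0, upload_hours=10.0):
    """For each model, count the results not yet fully uploaded
    at the moment that model finishes computing."""
    backlog = []
    upload_free_at = 0.0   # time at which the upload link is next idle
    upload_done = []       # time at which each result's upload completes
    for i in range(1, models + 1):
        finished = i * run_hours                # models run back to back
        start = max(finished, upload_free_at)   # wait for earlier uploads
        upload_free_at = start + upload_hours
        upload_done.append(upload_free_at)
        # Results (including this one) still waiting or uploading now
        backlog.append(sum(1 for t in upload_done if t > finished))
    return backlog


# 10-hour uploads against 6-hour models: the backlog keeps growing
print(pending_uploads(8))
# 5-hour uploads against 6-hour models: the backlog stays at one
print(pending_uploads(4, upload_hours=5.0))
```

With the 10 hour / 6 hour figures the count climbs steadily as more models finish, while an upload time below the run time keeps it flat at one result, which is the point made in the last sentence of the post.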
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline
|
"So the number of files to be uploaded is increasing quicker than Boinc can transfer them. But because Boinc doesn't track the upload queue, it keeps downloading and running more models. All but the first model will timeout. The boinc scheduler is concerned with keeping the CPU busy and doesn't look at the data transfers involved."

Actually, BOINC does keep track of the upload queue: if a project has more than 2 x ncpus uploads pending, work requests to that project are blocked.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
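Written out as code, the rule described in this post amounts to a single comparison. The sketch below only restates the post; it is not taken from the BOINC client source, and the function and parameter names are hypothetical.

```python
def work_request_blocked(pending_uploads: int, ncpus: int) -> bool:
    """Rule as stated in the post: stop asking a project for work once
    it has more than 2 x ncpus uploads still pending."""
    return pending_uploads > 2 * ncpus


# Single-core host: two pending uploads are tolerated, a third blocks
print(work_request_blocked(2, 1))
print(work_request_blocked(3, 1))
```

If the rule works as described, the upload queue on a single-core host in the 150MB example would stall at three pending results rather than growing without limit.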
||
|
|
|