| World Community Grid Forums
Thread Status: Active Total posts in this thread: 9
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Are there any known problems with restarting BOINC? I have several errors and Inconclusive results after restarting and coming back from hibernation.
Any hints? And why is this happening? I have 1 GB set as the maximum.
05/07/2006 10:06:07|World Community Grid|Resuming task faah0680_d475cb052_x1hpv_01_2 using faah version 509
05/07/2006 10:11:08|World Community Grid|Aborting task faah0680_d475cb052_x1hpv_01_2: exceeded disk limit: 51245484.000000 > 50000000.000000
05/07/2006 10:11:08|World Community Grid|Unrecoverable error for result faah0680_d475cb052_x1hpv_01_2 (Maximum disk usage exceeded)
05/07/2006 10:11:08|World Community Grid|Deferring scheduler requests for 1 minutes and 0 seconds
05/07/2006 10:11:14||Rescheduling CPU: application exited
05/07/2006 10:11:14|World Community Grid|Computation for task faah0680_d475cb052_x1hpv_01_2 finished
Thanks in advance,
Siegfried |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Had exactly the same 50 MB error a while back. The response was that the particular WU had been compiled with too low an allowance for disk space usage; it had nothing to do with your HD! The advice was just to get on with the next one.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That's a slightly more unusual error.
What you're seeing is that each work unit has a built-in disk limit, in addition to (and smaller than) the overall BOINC limit. This stops one work unit from eating all the space, and (in this case) the work unit isn't supposed to create such large temporary files.
It is perfectly possible that the hibernation conflicted with BOINC and corrupted the data, leading to a large, out-of-control error log using up all the space. This would be consistent with the error occurring a few minutes after the work unit was resumed.
Take a look at the work unit folder, see what files are there, and note any very large files. Of course, the work unit folder may have been cleaned up by now.
Personally, I don't find hibernation useful. It is faster to shut down and restart properly, without running any risk of corruption. However, if you can reproduce this, then it should definitely be looked into further by the WCG techs. Keep an eye on it, and if it happens again, have a look at the actual file that grew too large. |
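To make the idea of a per-work-unit limit concrete, here is a rough Python sketch. It is not BOINC's actual code: the function names and the way the folder is walked are assumptions for illustration, and only the two figures at the end come from the log quoted above.

```python
import os

def directory_usage(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # A file may vanish while we are scanning; skip it.
                continue
    return total

def exceeds_disk_limit(used_bytes, limit_bytes):
    """True when a work unit uses more disk than its own limit allows."""
    return used_bytes > limit_bytes

# Figures taken from the log in this thread (hypothetical check, same comparison).
print(exceeds_disk_limit(51245484, 50000000))  # True
```

In a check like this, the overall 1 GB BOINC setting never comes into play: the comparison is against the smaller limit attached to the individual work unit.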
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The space problem occurs when BOINC is restarted after a system restart.
Thanks for all the information. I see nothing special in the workunit folder:
Volume in drive C is Local Disk
Volume Serial Number is 5CC0-9DEE
Directory of C:\Program Files\BOINC\projects\www.worldcommunitygrid.org
05.07.2006 11:12 <DIR> .
05.07.2006 11:12 <DIR> ..
09.06.2006 16:19 39.057 avgE_from_pdb
09.06.2006 16:20 35.280 bbind00.Nov.lib
09.06.2006 16:19 561 bb_hbW
09.06.2006 16:20 479.180 disulf_jumps.dat
09.06.2006 16:20 7.722 dunsd
05.07.2006 10:16 1.489 faah0680_d475cb056_x1hpv_01_0_0
05.07.2006 10:16 75.671 faah0680_d475cb056_x1hpv_01_0_1
04.07.2006 10:22 3.878 faah0680_d475cb056_x1hpv_01_AD4_parameters.dat
04.07.2006 10:22 3.450 faah0680_d475cb056_x1hpv_01_d475cb056.pdbqt
04.07.2006 10:22 1.105 faah0680_d475cb056_x1hpv_01_d475cb056_x1hpv_01.gpf
04.07.2006 10:22 3.191 faah0680_d475cb056_x1hpv_01_faah0680_d475cb056_x1hpv_01.dpf
04.07.2006 10:22 184.555 faah0680_d475cb056_x1hpv_01_x1hpv.pdbqt
05.07.2006 07:41 3.878 faah0680_d475cb897_x1hpv_00_AD4_parameters.dat
05.07.2006 07:41 3.754 faah0680_d475cb897_x1hpv_00_d475cb897.pdbqt
05.07.2006 07:41 1.243 faah0680_d475cb897_x1hpv_00_d475cb897_x1hpv_00.gpf
05.07.2006 07:41 3.327 faah0680_d475cb897_x1hpv_00_faah0680_d475cb897_x1hpv_00.dpf
05.07.2006 07:41 184.555 faah0680_d475cb897_x1hpv_00_x1hpv.pdbqt
27.06.2006 07:34 11.994 hpf2.avgE_from_pdb.gz
27.06.2006 07:34 5.873.640 hpf2.bbdep02.May.sortlib.gz
27.06.2006 07:35 165 hpf2.Paa.gz
27.06.2006 07:34 1.831 hpf2.Paa_n.gz
27.06.2006 07:34 117.362 hpf2.Paa_pp.gz
27.06.2006 07:35 2.406 hpf2.paircutoffs.gz
27.06.2006 07:35 69.283 hpf2.pdbpairstats_fine.gz
27.06.2006 07:35 19.292 hpf2.phi.theta.36.HS.resmooth.gz
27.06.2006 07:35 11.718 hpf2.phi.theta.36.SS.resmooth.gz
27.06.2006 07:34 129.450 hpf2.plane_data_table_1015.dat.gz
27.06.2006 07:35 389.787 hpf2.Rama_smooth_dyn.dat_ss_6.4.gz
27.06.2006 07:35 1.113 hpf2.SASA-angles.dat.gz
27.06.2006 07:35 64.917 hpf2.SASA-masks.dat.gz
27.06.2006 07:35 2.475 hpf2.sasa_offsets.txt.gz
27.06.2006 07:35 34.441 hpf2.sasa_prob_cdf.txt.gz
28.06.2006 22:22 1.243 hpf2_5.07_win_paths.txt
09.06.2006 16:20 1.608 jump_templates.dat
09.06.2006 16:20 364 Paa
09.06.2006 16:19 6.272 Paa_n
09.06.2006 16:19 984.960 Paa_pp
09.06.2006 16:20 18.034 paircutoffs
09.06.2006 16:20 280.000 pdbpairstats_fine
09.06.2006 16:20 62.208 phi.theta.36.HS.resmooth
09.06.2006 16:20 41.472 phi.theta.36.SS.resmooth
09.06.2006 16:19 907.200 plane_data_table_1015.dat
09.06.2006 16:20 4.432.320 Rama_smooth_dyn.dat_ss_6.4
09.06.2006 16:20 5.382.144 rosetta_4.22_windows_intelx86
09.06.2006 16:20 13.613 SASA-angles.dat
09.06.2006 16:20 1.074.560 SASA-masks.dat
09.06.2006 16:20 6.731 sasa_offsets.txt
09.06.2006 16:20 13.260 sc_hbW
09.06.2006 16:20 17.838 template.pdb
05.07.2006 11:12 0 tree.dat
08.06.2006 15:44 1.146.880 wcg_faah_autodock_5.09_windows_intelx86
28.06.2006 22:52 11.730.944 wcg_hpf2_rosetta_5.07_windows_intelx86
09.06.2006 16:19 1.243 win_paths.txt
05.07.2006 07:41 232.261 za094_00322_aaza09403_05.075_v1_3.gz
05.07.2006 07:41 514.623 za094_00322_aaza09409_05.075_v1_3.gz
05.07.2006 07:41 222 za094_00322_za094.fasta.gz
05.07.2006 07:41 209 za094_00322_za094.psipred.gz
05.07.2006 07:41 716 za094_00322_za094.psipred_ss2.gz
05.07.2006 07:41 232.261 za094_00333_aaza09403_05.075_v1_3.gz
05.07.2006 07:41 514.623 za094_00333_aaza09409_05.075_v1_3.gz
05.07.2006 07:41 222 za094_00333_za094.fasta.gz
05.07.2006 07:41 209 za094_00333_za094.psipred.gz
05.07.2006 07:41 716 za094_00333_za094.psipred_ss2.gz
63 File(s) 35.380.726 bytes
2 Dir(s) 2.375.688.192 bytes free
Thanks in advance.
Siegfried |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hibernation... here in the land of unstable electricity, the Standby option is regarded as 'the' solution, not hibernating. It takes about 5 seconds to shut down and 15 to get back to where I was. The PSU provides sufficient standby power to maintain the RAM state for a long, long time.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For reference, running work units keep their files in the slot folders, e.g. C:\Program Files\BOINC\slots\0
It will have been cleaned out for the next work unit by now, but that's where the interesting stuff will be should it happen again. |
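If it does happen again, something like the following Python sketch could be used to spot the file that grew too large. The function name `largest_files` is made up for illustration, and the slot path is simply the example given above; adjust it to your own installation.

```python
import os

def largest_files(folder, top=10):
    """Return up to `top` (size_in_bytes, path) pairs, largest first."""
    entries = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full = os.path.join(root, name)
            try:
                entries.append((os.path.getsize(full), full))
            except OSError:
                # File disappeared or is unreadable; ignore it.
                continue
    entries.sort(reverse=True)
    return entries[:top]

if __name__ == "__main__":
    # Example slot folder from this thread; an assumption about your setup.
    slot = r"C:\Program Files\BOINC\slots\0"
    for size, path in largest_files(slot):
        print(f"{size:>12,d}  {path}")
```

Running it while the work unit is still active matters, since the slot folder is emptied once the task ends.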
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for the information. Nothing special in the C:\Program Files\BOINC\slots\0 folder.
Regarding hibernation, it seems that the "file not found" messages often appear after coming back from hibernation:
2006-07-05 05:33:37 [World Community Grid] Starting task faah0680_d475cb052_x1hpv_01_2 using faah version 509
2006-07-05 07:32:44 [World Community Grid] Task faah0680_d475cb052_x1hpv_01_2 exited with zero status but no 'finished' file
2006-07-05 07:32:44 [World Community Grid] If this happens repeatedly you may need to reset the project.
2006-07-05 07:32:44 [---] Rescheduling CPU: application exited
Error in the status information:
Checkpoint complete
call_glss(): pop_size: 200 num_evals: 10000000 start: [09:42:39]
call_glss(): end: [09:57:30]
wcg_checkpoint() called
Starting to checkpoint ...
Failed to open wcg_checkpoint.dat for reading. rc: 2. File doesn't exist?
INFO: CPU Idle Factor is 0.000000
World Community Grid AutoDock (projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.09_windows_intelx86) version
Failed to get VersionInfo size: 1812
Failed to open receptor.maps.fld for reading. rc: 2. File doesn't exist?
INFO:[10:06:12] Start AutoGrid...
autogrid: autogrid4: Successful Completion.
wcg_checkpoint() called
Starting to checkpoint ...
Checkpoint complete
INFO:[10:09:02] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 27000
Setting maxGen to 6750
Failed to open wcg_faah.state for reading. rc: 2. File doesn't exist?
Thanks in advance.
Siegfried |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
No, that's perfectly normal. It's just looking for files to see if it needs to restart from a checkpoint, or whether it should start at the beginning. All your FAAH work units will log something similar to this.
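The general pattern looks roughly like the Python sketch below. This is only an illustration of "try the checkpoint, otherwise start fresh": the JSON format, the `generation` field and the function names are assumptions, not how the FAAH application actually stores wcg_checkpoint.dat.

```python
import json
import os

def load_state(checkpoint_path):
    """Resume from a checkpoint if one exists, otherwise start from scratch."""
    try:
        with open(checkpoint_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Harmless on a first run: there is simply nothing to resume from.
        print(f"Failed to open {checkpoint_path} for reading. Starting from the beginning.")
        return {"generation": 0}

def save_state(checkpoint_path, state):
    """Write the checkpoint to a temp file first, then swap it into place."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, checkpoint_path)
```

The "Failed to open ..." message in a design like this is informational, which is why it shows up in the log of every fresh work unit.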
|
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline
|
BOINC allows the project to set a limit on each workunit that causes the workunit to abort if the amount of disk it uses exceeds this threshold. It prevents an application from running amok and dumping out tons of data if something goes wrong.
You got this message because we set the limit for FightAIDS@Home at 50000000. However, about 30 workunits (out of the tens of thousands that we have run) have needed more disk space than this. All new FightAIDS@Home batches on BOINC are now set to 75000000 (there are still a few old ones going through, though). We apologize for the problem.
[Edit 2 times, last edit by knreed at Jul 9, 2006 2:44:17 AM] |
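For anyone who wants to check their own logs against the two limits, here is a small Python sketch. The regular expression is written only against the message format quoted in this thread, the function name is made up, and treating the numbers as bytes is an assumption.

```python
import re

# Matches the client message quoted in this thread; other BOINC versions may differ.
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>\d+(?:\.\d+)?) > (?P<limit>\d+(?:\.\d+)?)"
)

def parse_disk_abort(line):
    """Return (task, used, limit) from an abort message, or None if no match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    return match.group("task"), float(match.group("used")), float(match.group("limit"))

LINE = ("05/07/2006 10:11:08|World Community Grid|Aborting task "
        "faah0680_d475cb052_x1hpv_01_2: exceeded disk limit: "
        "51245484.000000 > 50000000.000000")

task, used, limit = parse_disk_abort(LINE)
overshoot = used - limit   # how far past the old limit the task went
new_limit = 75000000.0     # the raised limit mentioned above

print(task)                # faah0680_d475cb052_x1hpv_01_2
print(int(overshoot))      # 1245484
print(used <= new_limit)   # True
```

So this particular task went over the old limit by about 1.2 million, and would have fit comfortably under the new one.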
||
|
|
|