| World Community Grid Forums
Thread Status: Active Total posts in this thread: 37
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
All my WUs are failing. I'm using an Intel Mac.
They run for a while, then report that they've used too much disk space. The error indicates that it thinks 75-odd MB is over the limit, but my preferences are set to allow 10 GB or 50% of disk (which would be 80 GB). |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello robertc99,
First, which version of BOINC are you running? Second, are you running any projects on another website, so that preferences might be migrating globally? Third, please paste the start of your messages and then a section where it fails. Fourth, please check My Grid - Device Manager - Device Profile and examine your active BOINC profile. How much disk space is allowed? Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Each work unit has an individual limit, usually far lower than the total space allocated for BOINC. This lets the projects detect unusual conditions, and prevent a misbehaving work unit from eating up all your space.
I think 75MB is about right for some of the smaller work units sent out by WCG. Most likely, the work unit is filling up an error log and failing. Have a look for the stderr file, and see if it's unusually large. |
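As a rough way to check what is taking up space while a task is still running, the files in its slot directory can be totalled and compared with the per-work-unit limit. The following is a minimal sketch, not part of BOINC: the function names `slot_usage` and `report` are made up for illustration, the directory path and the limit are supplied by the caller, and it assumes the client counts megabytes as 2^20 bytes. How the client itself measures a task's disk usage is not shown in this thread, so the total may differ from what the client reports.

```python
import os

MB = 1024 * 1024  # assumption: the client reports binary megabytes


def slot_usage(path):
    """Return (total_bytes, {name: size}) for regular files directly inside path."""
    sizes = {}
    for entry in os.scandir(path):
        # Symlinks are skipped so a linked file is not counted at its target's size.
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sum(sizes.values()), sizes


def report(path, limit_mb, top=5):
    """Return (over_limit, lines): a usage summary plus the largest files."""
    total, sizes = slot_usage(path)
    lines = [f"{total / MB:.2f} MB used of {limit_mb:.2f} MB allowed"]
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    for name, size in largest:
        lines.append(f"{size:>12d}  {name}")
    return total > limit_mb * MB, lines
```

Calling `report` on the slot directory every few minutes and printing the returned lines would show which files are growing before the task is aborted and its directory is cleaned up.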
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I'm running BOINC 5.8.8.
I'm running Rosetta, SIMAP, Einstein and of course WCG.
Mon Feb 5 10:24:53 2007|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 112055; location: (none); project prefs: default
Mon Feb 5 10:24:53 2007||General prefs: from World Community Grid (last modified 2007-02-04 13:08:42)
Mon Feb 5 10:24:53 2007||Host location: none
Mon Feb 5 10:24:53 2007||General prefs: using your defaults
Mon Feb 5 10:24:53 2007|World Community Grid|Restarting task faah1293_d229n789_x2BPZ_00_1 using faah version 510
Mon Feb 5 10:24:53 2007|Einstein@Home|Restarting task h1_0335.0_S5R1__269_S5RIa_0 using einstein_S5RI version 428
Mon Feb 5 11:08:38 2007||Restarting faah1293_d229n789_x2BPZ_00_1 - message timeout
Mon Feb 5 11:08:39 2007|World Community Grid|Task faah1293_d229n789_x2BPZ_00_1 exited with zero status but no 'finished' file
Mon Feb 5 11:08:39 2007|World Community Grid|If this happens repeatedly you may need to reset the project.
I'll keep an eye on the WU and see what's growing. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here's the failing part of the messages:
Tue Feb 6 07:23:13 2007|World Community Grid|Restarting task faah1294_d231n757_x2BPZ_00_3 using faah version 510
Tue Feb 6 07:32:15 2007|World Community Grid|Aborting task faah1294_d231n757_x2BPZ_00_3: exceeded disk limit: 81.31MB > 71.53MB
Tue Feb 6 07:32:15 2007|World Community Grid|Deferring communication for 1 min 0 sec
Tue Feb 6 07:32:15 2007|World Community Grid|Reason: Unrecoverable error for result faah1294_d231n757_x2BPZ_00_3 (Maximum disk usage exceeded)
Tue Feb 6 07:32:20 2007|World Community Grid|Computation for task faah1294_d231n757_x2BPZ_00_3 finished
Tue Feb 6 07:32:22 2007||[error] Process 19542 not found |
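The abort message carries both numbers needed to see how far over the limit the task went. Below is a small sketch that pulls them out of a message line with a regular expression. The pattern is written against the single line quoted above, so it is an assumption that other client versions word the message the same way, and `parse_disk_limit` is a name made up for this example. It also converts the limit to bytes on the assumption that "MB" here means 2^20 bytes; under that assumption 71.53 MB comes out at roughly 75,000,000 bytes, which would match the "75-odd MB" figure from the first post.

```python
import re

# Pattern modelled on the one "Aborting task" line quoted in this thread.
LIMIT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)

MB = 1024 * 1024  # assumption: binary megabytes


def parse_disk_limit(line):
    """Return task name, usage and limit from an abort message, or None."""
    match = LIMIT_RE.search(line)
    if match is None:
        return None
    used = float(match.group("used"))
    limit = float(match.group("limit"))
    return {
        "task": match.group("task"),
        "used_mb": used,
        "limit_mb": limit,
        "over_mb": round(used - limit, 2),
        "limit_bytes": limit * MB,
    }


line = (
    "Tue Feb 6 07:32:15 2007|World Community Grid|Aborting task "
    "faah1294_d231n757_x2BPZ_00_3: exceeded disk limit: 81.31MB > 71.53MB"
)
info = parse_disk_limit(line)
```

For the quoted line, `info["over_mb"]` is the gap between usage and limit (a little under 10 MB), which gives a sense of how much extra data appeared before the client noticed.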
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
Greetings,
Are you using the Intel Mac version of BOINC? I don't believe the emulated version would cause the space to reach those sizes for FAAH, but I am unsure. If you are running the Intel version, are you getting these errors only on FAAH work units? Are you initializing the graphics on these work units? That may cause extra logging and push the disk usage over the limit set on the work unit. -Uplinger |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If there is a large stderr, I'm having trouble spotting it.
The project looks fine, with space usage of about 5 Meg. Shortly thereafter it dies complaining about using 80 Meg. Presumably something grew suddenly. But when the WU dies, it gets deleted so I can't see what grew. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The BOINC application is a universal binary.
So far, WCG only appears to have tried to run FAAH WUs. I left all the sub-applications ticked, but none of the other types have shown up. I am not initialising the graphics. I don't have the screensaver enabled and I haven't fired up the graphics manually. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello robertc99,
Have you tried abort transfer (for FAAH), abort task (for FAAH), reset project (for WCG)? I am puzzled and just trying to think of something to try. Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I'll try that.
But here's some more info. I've watched the slot directory. The problem seems to be that checkpoint files just keep accumulating. Here's a directory listing before it fails:
total 58528
0 drwxrwxr-x 7 boinc_ma boinc_ma 238 Feb 6 15:09 ../
4 -rw-rw-r-- 1 boinc_ma boinc_pr 104 Feb 6 15:57 x2BPZ.pdbqt
4 -rw-rw-r-- 1 boinc_ma boinc_pr 106 Feb 6 15:57 wcg_faah_autodock_5.10_i686-apple-darwin
4 -rw-rw-r-- 1 boinc_ma boinc_pr 96 Feb 6 15:57 wcg_autodock4.dlg
4 -rw-rw-r-- 1 boinc_ma boinc_pr 96 Feb 6 15:57 wcg_ad4-result.xml
4 -rw-rw-r-- 1 boinc_ma boinc_pr 123 Feb 6 15:57 faah1297_d237n014_x2BPZ_00.dpf
4 -rw-rw-r-- 1 boinc_ma boinc_pr 114 Feb 6 15:57 d237n014_x2BPZ_00.gpf
4 -rw-rw-r-- 1 boinc_ma boinc_pr 107 Feb 6 15:57 d237n014.pdbqt
4 -rw-rw-r-- 1 boinc_ma boinc_pr 111 Feb 6 15:57 AD4_parameters.dat
0 -rw-r--r-- 1 boinc_pr boinc_pr 0 Feb 6 15:57 boinc_lockfile
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 16:48 wcg_checkpoint_03.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 16:48 wcg_checkpoint_02.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 16:48 wcg_checkpoint_01.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 16:48 wcg_checkpoint_0c.ckp
68 -rw-rw-r-- 1 boinc_pr boinc_pr 66332 Feb 6 16:48 wcg_checkpoint_0b.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1459 Feb 6 16:48 wcg_checkpoint_0a.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 16:48 wcg_checkpoint_09.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 16:48 wcg_checkpoint_08.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 16:48 wcg_checkpoint_07.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 16:48 wcg_checkpoint_06.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 16:48 wcg_checkpoint_05.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 16:48 wcg_checkpoint_04.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 16:57 wcg_checkpoint_00.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 825 Feb 6 17:21 wcg_checkpoint_17.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:21 wcg_checkpoint_16.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:21 wcg_checkpoint_15.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:21 wcg_checkpoint_14.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:21 wcg_checkpoint_13.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:21 wcg_checkpoint_12.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:21 wcg_checkpoint_11.ckp
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:21 wcg_checkpoint_10.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:21 wcg_checkpoint_0f.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:21 wcg_checkpoint_0e.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:21 wcg_checkpoint_0d.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 4096 Feb 6 17:22 wcg_checkpoint_18.ckp
4 -rw-rw-r-- 1 boinc_ma boinc_pr 3765 Feb 6 17:25 init_data.xml
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:25 receptor.maps.xyz
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:25 receptor.maps.fld
212 -rw-rw-r-- 1 boinc_pr boinc_pr 215599 Feb 6 17:25 wcg_ag.log
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:25 receptor.e.map
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:25 receptor.d.map
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:25 receptor.SA.map
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:25 receptor.OA.map
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:25 receptor.N.map
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:25 receptor.HD.map
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:25 receptor.C.map
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:25 receptor.A.map
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 17:34 wcg_faah.state
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 17:34 wcg_checkpoint_25.ckp
64 -rw-rw-r-- 1 boinc_pr boinc_pr 61807 Feb 6 17:34 wcg_checkpoint_24.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 825 Feb 6 17:34 wcg_checkpoint_23.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:34 wcg_checkpoint_22.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:34 wcg_checkpoint_21.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:34 wcg_checkpoint_20.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:34 wcg_checkpoint_1f.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:34 wcg_checkpoint_1e.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:34 wcg_checkpoint_1d.ckp
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:34 wcg_checkpoint_1c.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:34 wcg_checkpoint_1b.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:34 wcg_checkpoint_1a.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:34 wcg_checkpoint_19.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 497 Feb 6 17:34 wcg_checkpoint.dat
0 drwxrwxr-x 64 boinc_ma boinc_pr 2176 Feb 6 17:34 ./
4 -rw-rw-r-- 1 boinc_ma boinc_pr 3034 Feb 6 17:34 stderr.txt
And here's a section of the stderr log:
INFO:[17:25:01] Start AutoGrid...
autogrid: autogrid4: Successful Completion.
wcg_checkpoint() called
Skipping checkpoint
INFO:[17:25:59] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 27000
Setting maxGen to 6750
autodock4: WARNING: Unrecognized keyword in docking parameter file, in line:
compute_unbound_extended # compute extended ligand energy
INFO: No state to restore. Start from the beginning.
About to enter main loop...(dockings already completed: 0)
call_glss(): pop_size: 200 num_evals: 10000000 start: [17:26:10]
call_glss(): end: [17:34:17]
wcg_checkpoint() called
Starting to checkpoint ...
Checkpoint complete
call_glss(): pop_size: 200 num_evals: 10000000 start: [17:34:17] |
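One way to quantify the accumulation is to parse the listing and total the `.ckp` files. The sketch below is a hypothetical helper, not anything shipped with BOINC or the FAAH application: it assumes each line has ten whitespace-separated fields (a leading block count, then the usual long-listing columns), with the byte size in the sixth field. It also reports which non-checkpoint files have exactly the same size as one or more checkpoint files. In the listing above, the large checkpoint files share their sizes with the receptor.*.map files, which suggests (but does not prove) that each checkpoint round stores another full set of the maps. `SAMPLE` holds only a few lines copied from the listing; the whole listing can be pasted in its place.

```python
from collections import defaultdict

# A handful of lines copied from the listing above, for illustration only.
SAMPLE = """\
total 58528
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 16:57 wcg_checkpoint_00.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:21 wcg_checkpoint_0d.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:34 wcg_checkpoint_19.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:34 wcg_checkpoint_20.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:25 receptor.A.map
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:25 receptor.e.map
4 -rw-rw-r-- 1 boinc_ma boinc_pr 3034 Feb 6 17:34 stderr.txt
"""


def parse_entry(line):
    """Return (size_in_bytes, name) for one listing line, or None if it does not fit."""
    fields = line.split(None, 9)
    # Expected fields: blocks, mode, links, owner, group, size, month, day, time, name.
    if len(fields) != 10 or not fields[5].isdigit():
        return None
    return int(fields[5]), fields[9].strip()


def checkpoint_report(text):
    """Return (total bytes in .ckp files, {other file: same-sized .ckp files})."""
    entries = [e for e in map(parse_entry, text.splitlines()) if e]
    checkpoints = [(size, name) for size, name in entries if name.endswith(".ckp")]
    by_size = defaultdict(list)
    for size, name in checkpoints:
        by_size[size].append(name)
    same_size = {
        name: sorted(by_size[size])
        for size, name in entries
        if not name.endswith(".ckp") and size in by_size
    }
    return sum(size for size, _ in checkpoints), same_size


total_ckp_bytes, same_size = checkpoint_report(SAMPLE)
```

Matching on size alone is a weak signal, so comparing file contents (for example with a checksum) would be needed to confirm that the checkpoints really are copies of the maps.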
||
|
|
|