World Community Grid - View Thread - WU's failing with disk exceeded error.

World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: WU's failing with disk exceeded error.

Thread Status: Active
Total posts in this thread: 37

This topic has been viewed 3764 times and has 36 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


WU's failing with disk exceeded error.

All my WU's are failing. I'm using an Intel Mac.

They run for a while then report they've used too much disk space.
The error indicates that it thinks that 75 odd Megs is over the limit.
But my preferences are set to allow 10 Gigs or 50% of disk (which would be 80 Gig).

[Feb 5, 2007 6:41:54 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

Hello robertc99,
First, which version of BOINC are you running?
Second, are you running any projects on another website, so that preferences might be migrating globally?
Third, please paste the start of your messages and then a section where it fails.
Fourth, please check My Grid - Device Manager - Device Profile and examine your active BOINC profile. How much disk space is allowed?

Lawrence

[Feb 5, 2007 8:10:55 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

Each work unit has an individual limit, usually far lower than the total space allocated for BOINC. This lets the projects detect unusual conditions, and prevent a misbehaving work unit from eating up all your space.

I think 75MB is about right for some of the smaller work units sent out by WCG. Most likely, the work unit is filling up an error log and failing. Have a look for the stderr file, and see if it's unusually large.

[Feb 5, 2007 8:15:39 AM]
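
The distinction drawn above, between the overall disk allowance and the much smaller per-work-unit limit, can be illustrated with a minimal sketch. The names `DiskPrefs`, `client_budget` and `task_over_limit` are hypothetical and not part of BOINC. The 160 GB disk size is inferred from the "50% of disk (which would be 80 Gig)" remark, and the sketch assumes the more restrictive of the two preferences wins.

```python
from dataclasses import dataclass

MB = 1000 ** 2
GB = 1000 ** 3


@dataclass
class DiskPrefs:
    """Hypothetical container for the two preferences quoted in the first post."""
    max_used_bytes: int        # "allow 10 Gigs"
    max_used_fraction: float   # "or 50% of disk"


def client_budget(prefs: DiskPrefs, total_disk_bytes: int) -> int:
    """Overall allowance, assuming the more restrictive preference applies."""
    return min(prefs.max_used_bytes,
               int(prefs.max_used_fraction * total_disk_bytes))


def task_over_limit(task_usage_bytes: int, task_limit_bytes: int) -> bool:
    """Per-task check: independent of the overall allowance."""
    return task_usage_bytes > task_limit_bytes


prefs = DiskPrefs(max_used_bytes=10 * GB, max_used_fraction=0.50)
budget = client_budget(prefs, total_disk_bytes=160 * GB)  # 160 GB is inferred

# A task using about 80 MB is far below the overall allowance,
# yet still over a per-task limit of about 75 MB.
print(80 * MB < budget)                   # True
print(task_over_limit(80 * MB, 75 * MB))  # True
```

Under these assumptions, raising the disk preferences would not change the outcome, because the task is stopped by the second check rather than the first.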

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

I'm running BOINC 5.8.8.
I'm running Rosetta, SIMAP, Einstein and of course WCG.

Mon Feb 5 10:24:53 2007|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 112055; location: (none); project prefs: default
Mon Feb 5 10:24:53 2007||General prefs: from World Community Grid (last modified 2007-02-04 13:08:42)
Mon Feb 5 10:24:53 2007||Host location: none
Mon Feb 5 10:24:53 2007||General prefs: using your defaults
Mon Feb 5 10:24:53 2007|World Community Grid|Restarting task faah1293_d229n789_x2BPZ_00_1 using faah version 510
Mon Feb 5 10:24:53 2007|Einstein@Home|Restarting task h1_0335.0_S5R1__269_S5RIa_0 using einstein_S5RI version 428
Mon Feb 5 11:08:38 2007||Restarting faah1293_d229n789_x2BPZ_00_1 - message timeout
Mon Feb 5 11:08:39 2007|World Community Grid|Task faah1293_d229n789_x2BPZ_00_1 exited with zero status but no 'finished' file
Mon Feb 5 11:08:39 2007|World Community Grid|If this happens repeatedly you may need to reset the project.

I'll keep an eye on the WU and see what's growing.

[Feb 6, 2007 4:11:10 AM]
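
The pasted messages follow a `timestamp|project|text` layout, with an empty project field for client-wide messages. A small sketch for splitting such lines and filtering by project may help when reading longer logs. `Message` and `parse_message` are hypothetical helper names, and the timestamp format string is an assumption based only on the lines shown in this thread.

```python
from datetime import datetime
from typing import NamedTuple


class Message(NamedTuple):
    when: datetime
    project: str  # empty string for client-wide messages
    text: str


def parse_message(line: str) -> Message:
    """Split one 'timestamp|project|text' line into its three fields."""
    stamp, project, text = line.rstrip("\n").split("|", 2)
    # Format assumed from the sample lines, e.g. "Mon Feb 5 11:08:39 2007".
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return Message(when, project, text)


SAMPLE = [
    "Mon Feb 5 10:24:53 2007||Host location: none",
    "Mon Feb 5 11:08:38 2007||Restarting faah1293_d229n789_x2BPZ_00_1 - message timeout",
    "Mon Feb 5 11:08:39 2007|World Community Grid|If this happens repeatedly you may need to reset the project.",
]

# Keep only the messages attributed to World Community Grid.
wcg_only = [m for m in map(parse_message, SAMPLE)
            if m.project == "World Community Grid"]
for m in wcg_only:
    print(m.when.time(), m.text)
```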

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

Here's the failing part of the messages:
Tue Feb 6 07:23:13 2007|World Community Grid|Restarting task faah1294_d231n757_x2BPZ_00_3 using faah version 510
Tue Feb 6 07:32:15 2007|World Community Grid|Aborting task faah1294_d231n757_x2BPZ_00_3: exceeded disk limit: 81.31MB > 71.53MB
Tue Feb 6 07:32:15 2007|World Community Grid|Deferring communication for 1 min 0 sec
Tue Feb 6 07:32:15 2007|World Community Grid|Reason: Unrecoverable error for result faah1294_d231n757_x2BPZ_00_3 (Maximum disk usage exceeded)
Tue Feb 6 07:32:20 2007|World Community Grid|Computation for task faah1294_d231n757_x2BPZ_00_3 finished
Tue Feb 6 07:32:22 2007||[error] Process 19542 not found

[Feb 6, 2007 4:13:29 AM]
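
The abort message reports both the measured usage and the limit, so the numbers can be pulled out and compared directly. The sketch below uses a hypothetical helper, `parse_disk_abort`, with a pattern written only against the line quoted above. It also shows that a limit of 75,000,000 bytes, expressed in units of 1024 × 1024 bytes, rounds to 71.53. That is consistent with the "75 odd Megs" mentioned in the first post, though the exact byte limit is an assumption and is not stated anywhere in the thread.

```python
import re

MIB = 1024 * 1024

# Pattern written against the single abort line quoted in this thread.
_ABORT = re.compile(r"exceeded disk limit: ([\d.]+)MB > ([\d.]+)MB")


def parse_disk_abort(line: str):
    """Return (used_mb, limit_mb) from an abort line, or None if absent."""
    match = _ABORT.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


line = ("Tue Feb 6 07:32:15 2007|World Community Grid|Aborting task "
        "faah1294_d231n757_x2BPZ_00_3: exceeded disk limit: 81.31MB > 71.53MB")

used_mb, limit_mb = parse_disk_abort(line)
print(f"over the limit by {used_mb - limit_mb:.2f} MB")

# Assumed limit of 75,000,000 bytes, shown in units of 1024 * 1024 bytes.
print(round(75_000_000 / MIB, 2))  # 71.53
```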

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Re: WU's failing with disk exceeded error.

Greetings,

Are you using the Intel Mac version of BOINC? I don't believe the emulated version would cause the space to reach those sizes for FAAH, but I am unsure.

If you are running the Intel version, are you getting these errors only on FAAH work units?

Are you initializing the graphics on these work units? This may be causing more logging and causing it to get larger than the limit set on the work unit.

-Uplinger

[Feb 6, 2007 4:24:05 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

If there is a large stderr, I'm having trouble spotting it.
The project looks fine, with space usage of about 5 Meg.
Shortly thereafter it dies complaining about using 80 Meg.

Presumably something grew suddenly.
But when the WU dies, it gets deleted so I can't see what grew.

[Feb 6, 2007 5:08:05 AM]
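
Since the slot contents are deleted when the work unit dies, one way to see what grew is to record file sizes periodically to a log kept outside the slot directory. The following is a minimal sketch of that idea. `snapshot`, `growth` and `watch` are hypothetical helpers, not BOINC tools, and the polling interval is arbitrary.

```python
import os
import time


def snapshot(slot_dir):
    """Return {filename: size_in_bytes} for regular files directly in slot_dir."""
    sizes = {}
    with os.scandir(slot_dir) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
            except FileNotFoundError:
                continue  # file vanished between listing and stat
    return sizes


def growth(before, after):
    """Return {filename: bytes_added} for files that are new or have grown."""
    return {name: size - before.get(name, 0)
            for name, size in after.items()
            if size > before.get(name, 0)}


def watch(slot_dir, log_path, interval_s=30.0, rounds=None):
    """Append a size report to log_path every interval_s seconds.

    Stops after `rounds` reports, or when slot_dir disappears.
    """
    previous, done = {}, 0
    while rounds is None or done < rounds:
        try:
            current = snapshot(slot_dir)
        except FileNotFoundError:
            break  # the slot directory itself was removed
        with open(log_path, "a") as log:
            log.write(f"{time.strftime('%H:%M:%S')} total={sum(current.values())}\n")
            for name, added in sorted(growth(previous, current).items()):
                log.write(f"  +{added} {name}\n")
        previous, done = current, done + 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

A call such as `watch(slot_path, log_path)`, with both paths chosen by the user, would keep running until the slot directory is removed. The last entries in the log then show which files were growing just before the abort.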

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

The boinc application is a universal binary.
So far, WCG only appears to have tried to run FAAH WU's.
I left all the sub-applications ticked, but none of the other types have shown up.

I am not initialising the graphics.
I don't have the screensaver enabled and I haven't fired up the graphics manually.

[Feb 6, 2007 5:11:38 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

Hello robertc99,
Have you tried abort transfer (for FAAH), abort task (for FAAH), reset project (for WCG)? I am puzzled and just trying to think of something to try.

Lawrence

[Feb 6, 2007 6:34:26 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: WU's failing with disk exceeded error.

I'll try that.
But here's some more info.
I've watched the slot directory.
The problem seems to be that checkpoint files just keep accumulating.
Here's a directory listing from just before it fails:

total 58528
0 drwxrwxr-x 7 boinc_ma boinc_ma 238 Feb 6 15:09 ../
4 -rw-rw-r-- 1 boinc_ma boinc_pr 104 Feb 6 15:57 x2BPZ.pdbqt
4 -rw-rw-r-- 1 boinc_ma boinc_pr 106 Feb 6 15:57 wcg_faah_autodock_5.10_i686-apple-darwin
4 -rw-rw-r-- 1 boinc_ma boinc_pr 96 Feb 6 15:57 wcg_autodock4.dlg
4 -rw-rw-r-- 1 boinc_ma boinc_pr 96 Feb 6 15:57 wcg_ad4-result.xml
4 -rw-rw-r-- 1 boinc_ma boinc_pr 123 Feb 6 15:57 faah1297_d237n014_x2BPZ_00.dpf
4 -rw-rw-r-- 1 boinc_ma boinc_pr 114 Feb 6 15:57 d237n014_x2BPZ_00.gpf
4 -rw-rw-r-- 1 boinc_ma boinc_pr 107 Feb 6 15:57 d237n014.pdbqt
4 -rw-rw-r-- 1 boinc_ma boinc_pr 111 Feb 6 15:57 AD4_parameters.dat
0 -rw-r--r-- 1 boinc_pr boinc_pr 0 Feb 6 15:57 boinc_lockfile
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 16:48 wcg_checkpoint_03.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 16:48 wcg_checkpoint_02.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 16:48 wcg_checkpoint_01.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 16:48 wcg_checkpoint_0c.ckp
68 -rw-rw-r-- 1 boinc_pr boinc_pr 66332 Feb 6 16:48 wcg_checkpoint_0b.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1459 Feb 6 16:48 wcg_checkpoint_0a.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 16:48 wcg_checkpoint_09.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 16:48 wcg_checkpoint_08.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 16:48 wcg_checkpoint_07.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 16:48 wcg_checkpoint_06.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 16:48 wcg_checkpoint_05.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 16:48 wcg_checkpoint_04.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 16:57 wcg_checkpoint_00.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 825 Feb 6 17:21 wcg_checkpoint_17.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:21 wcg_checkpoint_16.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:21 wcg_checkpoint_15.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:21 wcg_checkpoint_14.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:21 wcg_checkpoint_13.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:21 wcg_checkpoint_12.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:21 wcg_checkpoint_11.ckp
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:21 wcg_checkpoint_10.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:21 wcg_checkpoint_0f.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:21 wcg_checkpoint_0e.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:21 wcg_checkpoint_0d.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 4096 Feb 6 17:22 wcg_checkpoint_18.ckp
4 -rw-rw-r-- 1 boinc_ma boinc_pr 3765 Feb 6 17:25 init_data.xml
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:25 receptor.maps.xyz
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:25 receptor.maps.fld
212 -rw-rw-r-- 1 boinc_pr boinc_pr 215599 Feb 6 17:25 wcg_ag.log
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:25 receptor.e.map
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:25 receptor.d.map
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:25 receptor.SA.map
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:25 receptor.OA.map
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:25 receptor.N.map
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:25 receptor.HD.map
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:25 receptor.C.map
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:25 receptor.A.map
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 17:34 wcg_faah.state
4 -rw-rw-r-- 1 boinc_pr boinc_pr 2169 Feb 6 17:34 wcg_checkpoint_25.ckp
64 -rw-rw-r-- 1 boinc_pr boinc_pr 61807 Feb 6 17:34 wcg_checkpoint_24.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 825 Feb 6 17:34 wcg_checkpoint_23.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 44 Feb 6 17:34 wcg_checkpoint_22.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 1647 Feb 6 17:34 wcg_checkpoint_21.ckp
1608 -rw-rw-r-- 1 boinc_pr boinc_pr 1643789 Feb 6 17:34 wcg_checkpoint_20.ckp
1500 -rw-rw-r-- 1 boinc_pr boinc_pr 1532871 Feb 6 17:34 wcg_checkpoint_1f.ckp
1964 -rw-rw-r-- 1 boinc_pr boinc_pr 2009683 Feb 6 17:34 wcg_checkpoint_1e.ckp
1896 -rw-rw-r-- 1 boinc_pr boinc_pr 1937862 Feb 6 17:34 wcg_checkpoint_1d.ckp
1908 -rw-rw-r-- 1 boinc_pr boinc_pr 1952829 Feb 6 17:34 wcg_checkpoint_1c.ckp
1744 -rw-rw-r-- 1 boinc_pr boinc_pr 1782522 Feb 6 17:34 wcg_checkpoint_1b.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:34 wcg_checkpoint_1a.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:34 wcg_checkpoint_19.ckp
4 -rw-rw-r-- 1 boinc_pr boinc_pr 497 Feb 6 17:34 wcg_checkpoint.dat
0 drwxrwxr-x 64 boinc_ma boinc_pr 2176 Feb 6 17:34 ./
4 -rw-rw-r-- 1 boinc_ma boinc_pr 3034 Feb 6 17:34 stderr.txt

And here's a section of the stderr log.

INFO:[17:25:01] Start AutoGrid...

autogrid: autogrid4: Successful Completion.
wcg_checkpoint() called
Skipping checkpoint
INFO:[17:25:59] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 27000
Setting maxGen to 6750
autodock4: WARNING: Unrecognized keyword in docking parameter file, in line:
compute_unbound_extended # compute extended ligand energyINFO: No state to restore. Start from the beginning.
About to enter main loop...(dockings already completed: 0)
call_glss(): pop_size: 200 num_evals: 10000000 start: [17:26:10]
call_glss(): end: [17:34:17]
wcg_checkpoint() called
Starting to checkpoint ...
Checkpoint complete
call_glss(): pop_size: 200 num_evals: 10000000 start: [17:34:17]

[Feb 6, 2007 6:39:19 AM]
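
In the listing above, several checkpoint files have exactly the same byte size as the receptor map files. For example, `wcg_checkpoint_00.ckp`, `wcg_checkpoint_0d.ckp` and `wcg_checkpoint_19.ckp` are all 1994110 bytes, the same as `receptor.A.map`. The sketch below parses a short excerpt of the listing, copied verbatim, and groups files by size to make the repetition visible. `parse_ls` is a hypothetical helper that assumes the ten-column `ls -ls` layout shown in the post. Matching sizes suggest, but do not prove, that each checkpoint stores another copy of the maps while older copies are left in place.

```python
from collections import defaultdict

# Excerpt copied from the directory listing in the post above.
LISTING = """\
total 58528
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 16:48 wcg_checkpoint_01.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 16:57 wcg_checkpoint_00.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:21 wcg_checkpoint_0e.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:21 wcg_checkpoint_0d.ckp
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:25 receptor.C.map
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:25 receptor.A.map
1952 -rw-rw-r-- 1 boinc_pr boinc_pr 1994985 Feb 6 17:34 wcg_checkpoint_1a.ckp
1948 -rw-rw-r-- 1 boinc_pr boinc_pr 1994110 Feb 6 17:34 wcg_checkpoint_19.ckp
"""


def parse_ls(text):
    """Return [(name, size_in_bytes)] from `ls -ls` style lines.

    Assumed columns: blocks, mode, links, owner, group, size, month, day,
    time, name. Lines that do not fit (such as "total ...") are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[5].isdigit():
            rows.append((parts[9], int(parts[5])))
    return rows


rows = parse_ls(LISTING)

# Group file names by exact byte size to expose repeated copies.
by_size = defaultdict(list)
for name, size in rows:
    by_size[size].append(name)

for size, names in sorted(by_size.items()):
    print(size, sorted(names))

# Total bytes held by checkpoint files in this excerpt.
checkpoint_bytes = sum(size for name, size in rows
                       if name.startswith("wcg_checkpoint_") and name.endswith(".ckp"))
print(checkpoint_bytes)
```

Running the same grouping over the full listing would show how much of the slot's disk usage is accounted for by repeated checkpoint copies.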
