Total posts in this thread: 114
This topic has been viewed 12845 times and has 113 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

15 completed, 60 to go.
However, during the last 20 hours no task got to the finish line: all ended with the 'exited with zero status but no finish file' message, killing all progress. It made no difference whether CEP2 ran on all 8 cores or only 4, hands on or off.
Tried to 'pre-build' the disk allocation first, i.e. let the task run until the data is written and the CPU starts working. Then, unloading tasks from cache (LAIM off) preserves the disk data, so they would start over smoothly later. However, this didn't prevent other tasks from exiting. Got 30 GB of CEP2 data now, but no reduction in exits.
Tasks finishing caused others to exit.
[Feb 28, 2016 1:19:45 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

'exited with zero status but no finish file'

which goes with

'if this happens often, consider resetting the project' (or something to that extent).

Surprised not to see any here on my laptop, now running 4 concurrently, down from 6 [manually paused 2]. Intel Rapid Storage Technology is running; without it, the machine really slows to a snail's pace.
[Feb 28, 2016 1:34:22 PM]
pvh513
Senior Cruncher
Joined: Feb 26, 2011
Post Count: 260
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Had 10 WUs so far from the new batch. 6 are in PV, 3 failed due to "Maximum disk usage exceeded" and 1 is still in progress. Looking at the wingmen for the failed units, I see that they very frequently fail with that error, but once in a blue moon a unit gets through. Not sure why that is. Maybe the timing of the moment when the usage is measured (i.e. some units happen to miss the peak usage)?? Anyway, it seems there is a serious problem there. The failure rate is clearly too high.
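
The timing idea above can be illustrated with a toy model. This is only a sketch under made-up assumptions: the usage curve, the 10-second sampling interval and the 2.0 GB limit are all hypothetical and are not taken from BOINC or CEP2. It shows how a periodic check can either catch or miss a short peak depending on when the sampling happens to start.

```python
def peak_seen(usage, interval, offset):
    """Largest usage value observed when sampling every `interval` ticks from `offset`."""
    return max(usage[offset::interval])

# Hypothetical disk usage per second (GB): flat at 1.0 with a 5-second spike to 3.0.
usage = [1.0] * 60
usage[22:27] = [3.0] * 5

LIMIT = 2.0     # hypothetical disk limit in GB
INTERVAL = 10   # hypothetical sampling interval in seconds

# For each possible start offset, does the sampler ever see usage above the limit?
results = {off: peak_seen(usage, INTERVAL, off) > LIMIT for off in range(INTERVAL)}

caught = sum(results.values())
print(f"{caught} of {INTERVAL} sampling offsets catch the spike")
```

In this toy setup only the offsets that land inside the spike report an overrun, so identical workloads could pass or fail purely on timing. Whether that is what actually happens with these units is not confirmed anywhere in this thread.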
[Feb 28, 2016 2:17:22 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

The comment by Crystal Pellet points that way... his observation was that only Linux shows the issue.

Looked through my completed logs so far [3] and see several heartbeat hops recorded (the 30-second loss of contact with the core client :|), but all recovered, and all skipped job #4. Seen enough cases in the past where the heartbeat skips were fatal, so happy they did not croak this time.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Feb 28, 2016 2:26:39 PM]
[Feb 28, 2016 2:26:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Using TThrottle to limit CPU to 70% (so that CPU usage is overall ~90%) and suspending GPU tasks also didn't prevent exits. BOINC stalls for about a minute, losing all progress back to the last checkpoint, which most of the time means no checkpoint at all. I aborted 48 tasks and try running the remaining 12 with only 2 tasks running concurrently.
It's been some time since I crunched CEP2, but it used to work mostly fine running on all 8 threads at the same time after I upped RAM to 8 gig on a Windows 7 i7-64 laptop. This is a Windows 10 i7-64 laptop with 8 gig, and it couldn't handle even 4. Both laptops had magnetic hard disks, no solid state disks.
[Feb 28, 2016 4:09:11 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

... I aborted 48 tasks and try running the remaining 12 with only 2 tasks running concurrently.
Suddenly I got 33 resends on my Win7 i7 2600, so I suppose they're all coming from you :D
I'll try to do my best to return them on time.
Started the 3 with a short deadline of ~33 hours. The others have a deadline of 4 days.
I see most were suffering from no heartbeat for 30 seconds.
[Feb 28, 2016 4:20:22 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2155
Status: Recently Active
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

At least you had one checkpoint, so the task continues to PV; with no checkpoint it gets slammed with a compute error status.

Clicked the WU in Results Status:
BETA_E236295_768_S.316.C33H24N10O3.KXJKZDDABLYRQF-UHFFFAOYSA-N.13_s1_14_1-- | Linux 2.6.35-32-generic | 700 | Pending Validation | 27/02/16 00:52:56 | 28/02/16 15:14:27 | 18.00 | 292.9 / 0.0
BETA_E236295_768_S.316.C33H24N10O3.KXJKZDDABLYRQF-UHFFFAOYSA-N.13_s1_14_0-- | Linux 4.3.3-303.fc23.x86_64 | 700 | Pending Validation | 27/02/16 00:52:41 | 28/02/16 13:07:01 | 18.00 | 171.9 / 0.0


Saw something new(?) under Minimum Quorum ('2') and Replication ('2'):
"Try Validation"
Clicked that and the message "Scheduled for Validation" appeared.
Waiting for any updates on this ... :)

EDIT:
Wingman didn't reach job #1:
[09:46:48] Qink name = anlman
[09:46:48] Qink name = drvman
[09:51:50] Qink name = optman
[09:51:50] Qink name = fldman
[09:51:50] Qink name = gesman
[09:51:53] Qink name = scfman
Killing job because cpu time limit has been exceeded. 0.000000||64800.280000||0.000000
[10:06:26] Finished Job #0
10:06:27 (12278): called boinc_finish

Mine was killed in job #3. To recap:
Killing job because cpu time limit has been exceeded. 64611.824290||189.070813||0.000000
[13:34:13] Finished Job #3
13:34:16 (26240): called boinc_finish
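
As a quick check on the two "Killing job" lines quoted above, the sketch below adds up the `||`-separated numbers and converts the total to hours. It assumes the three fields are CPU seconds that can simply be summed; the thread does not say what each field means, and the helper name `total_cpu_hours` is made up for illustration.

```python
def total_cpu_hours(log_line: str) -> float:
    """Sum the '||'-separated second counts at the end of a 'Killing job' line."""
    # Assumption: the last whitespace-separated token holds the three counters.
    fields = log_line.rsplit(" ", 1)[-1].split("||")
    return sum(float(f) for f in fields) / 3600.0

wingman = "Killing job because cpu time limit has been exceeded. 0.000000||64800.280000||0.000000"
mine = "Killing job because cpu time limit has been exceeded. 64611.824290||189.070813||0.000000"

for label, line in (("wingman", wingman), ("mine", mine)):
    print(f"{label}: {total_cpu_hours(line):.2f} h")
# wingman: 18.00 h
# mine: 18.00 h
```

Both totals come out at about 64800 seconds, which agrees with the 18.00 shown for both results in Results Status.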
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Feb 28, 2016 4:41:18 PM]
[Feb 28, 2016 4:29:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

If these were reported at 15:51 UT, then they most likely came from Munich :O I wondered how they all got reissued so quickly. Maybe your machines can handle them better than mine :)
[Feb 28, 2016 4:29:46 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2155
Status: Recently Active
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

Mine was reported at 14:07 local time (13:07:01 GMT), as follows:
2016-02-28T14:06:59 CET | World Community Grid | Sending scheduler request: To report completed tasks.
2016-02-28T14:06:59 CET | World Community Grid | Reporting 9 completed tasks
2016-02-28T14:06:59 CET | World Community Grid | Requesting new tasks for CPU
2016-02-28T14:07:02 CET | World Community Grid | Scheduler request completed: got 9 new tasks

My wingman reported at 15:14:27 GMT.
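
For anyone comparing the CET client log with the GMT times in Results Status, here is a small sketch of the conversion. It assumes CET is a fixed UTC+1 offset (true in February, before summer time) and uses only the timestamps quoted in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET as a fixed winter offset of UTC+1.
CET = timezone(timedelta(hours=1), "CET")

# "Sending scheduler request" line from the client log, in local time.
local_report = datetime(2016, 2, 28, 14, 6, 59, tzinfo=CET)
mine_utc = local_report.astimezone(timezone.utc)
print(mine_utc.strftime("%H:%M:%S"))  # 13:06:59

# Return times as shown in Results Status (GMT).
mine_returned = datetime(2016, 2, 28, 13, 7, 1, tzinfo=timezone.utc)
wingman_returned = datetime(2016, 2, 28, 15, 14, 27, tzinfo=timezone.utc)

gap = wingman_returned - mine_returned
print(gap)  # 2:07:26
```

The client log entry at 14:06:59 CET lands two seconds before the 13:07:01 GMT return time on the server, and the wingman reported a little over two hours later.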
[Feb 28, 2016 4:48:28 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320
Status: Offline
Re: Clean Energy Project - Phase 2 Beta Feb 24, 2016 [ Issues Thread ]

If these were reported at 15:51 UT, then they most likely came from Munich :O I wondered how they all got reissued so quickly. Maybe your machines can handle them better than mine :)
I fished them out of the river Rhine on their way back to Toronto.
I got so many because the estimated run time is way too low: 4.5 and 5.5 hours.
I started 7 tasks and no hiccups so far.
[Feb 28, 2016 5:05:44 PM]