World Community Grid - View Thread - Maximum elapsed time exceeded and exception

World Community Grid Forums

Category: Completed Research

Forum: Outsmart Ebola Together

Thread: Maximum elapsed time exceeded and exception



Thread Status: Active
Total posts in this thread: 19



This topic has been viewed 3083 times and has 18 replies

Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline


Maximum elapsed time exceeded and exception

I have a few WUs that reported "Maximum elapsed time exceeded" after 15 hours and then crashed:
OET1_0000333_xMBGP-OM_rig_9319_0--
OET1_0000333_xMBGP-OM_rig_10958_0--


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

----------------------------------------

[Feb 28, 2015 3:18:35 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Maximum elapsed time exceeded and exception

It means your machine ran the task for 40 times longer than was originally estimated, and yes, many of the 333 units are known to run long. Since it failed after 15 hours, the original estimate must have been less than 22.5 minutes (15 h × 60 / 40). The project average over the last few days has been well over one hour, so it's hard to tell how this could have happened.
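The arithmetic can be checked with a small sketch. It assumes the usual BOINC behaviour that a task is aborted once its elapsed time exceeds the bound factor (rsc_fpops_bound / rsc_fpops_est, taken here as 40) times the original runtime estimate; the function names are hypothetical.

```python
def max_elapsed_hours(estimate_hours: float, bound_factor: float = 40.0) -> float:
    """Elapsed-time limit implied by an estimate and a bound factor (assumed model)."""
    return estimate_hours * bound_factor


def implied_estimate_minutes(failed_after_hours: float, bound_factor: float = 40.0) -> float:
    """Largest original estimate consistent with hitting the limit at the given time."""
    return failed_after_hours * 60.0 / bound_factor


if __name__ == "__main__":
    # The WUs in this thread were aborted after about 15 hours.
    print(implied_estimate_minutes(15.0))  # 22.5
    # With a 1-hour estimate the limit would have been 40 hours.
    print(max_elapsed_hours(1.0))  # 40.0
```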

Does the device have a variable CPU speed? If, for instance, the task was estimated at a client benchmark of 15000 but was really running at a benchmark speed of 7500, the limit would be hit sooner. At any rate, the enormous variability may be a reason for the technicians to raise the <rsc_fpops_bound> to a factor of 50.
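To illustrate the variable-speed point, here is a sketch assuming runtime scales inversely with effective benchmark speed, using the post's example figures of 15000 and 7500. `work_fraction_at_limit` is a hypothetical helper, not part of BOINC.

```python
def work_fraction_at_limit(
    true_work_multiple: float,
    benchmark_speed: float,
    actual_speed: float,
    bound_factor: float = 40.0,
) -> float:
    """Fraction of a task completed when the elapsed-time limit is reached.

    true_work_multiple: how many times larger the task really is than its
    estimate at the benchmarked speed. Assumes runtime scales as 1 / speed.
    """
    slowdown = benchmark_speed / actual_speed  # > 1 if the CPU runs slower than benchmarked
    needed = true_work_multiple * slowdown      # runtime needed, in units of the estimate
    return min(1.0, bound_factor / needed)


if __name__ == "__main__":
    # A task 25x larger than estimated finishes within a factor-40 bound at full speed...
    print(work_fraction_at_limit(25, 15000, 15000))
    # ...but at half speed it needs 50x the estimate and is aborted partway through.
    print(work_fraction_at_limit(25, 15000, 7500))
    # A factor-50 bound would let it finish.
    print(work_fraction_at_limit(25, 15000, 7500, bound_factor=50))
```

Halving the effective speed doubles the runtime, so it halves how far the bound factor stretches; that is why a larger factor absorbs more of the variability.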

----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 28, 2015 3:50:03 PM]

[Feb 28, 2015 3:48:54 PM]

Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline


Re: Maximum elapsed time exceeded and exception

That machine has been running stably for a few years, and it's not a slow one (Core i5-750). It received several 333 and 490 units with an estimated runtime of ~24 minutes.
One issue is why the estimated runtime is so low; the other is that I believe it should not raise an exception.

----------------------------------------

[Feb 28, 2015 4:04:10 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Maximum elapsed time exceeded and exception

The explanation is in my previous post: the exceptions are governed by the upper-bound factor of 40. Other projects use a factor of 5 to 10. The unit must originally have been assigned right when a larger set of short tasks was pulling the average runtime for new work down that low. This is not reflected in the daily averages, the lowest of which was 0.69 hours on Feb. 23, hence the suggestion to the technicians.
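The thread does not describe how the server actually estimates runtimes for new work. Purely to illustrate the effect described above, this sketch assumes a hypothetical estimator that averages the most recent returned runtimes; all names and numbers are made up.

```python
from collections import deque


class RecentMeanEstimator:
    """Hypothetical estimator: mean runtime (hours) of the most recent results."""

    def __init__(self, window: int) -> None:
        # Oldest results drop out automatically once the window is full.
        self.runtimes = deque(maxlen=window)

    def record(self, hours: float) -> None:
        self.runtimes.append(hours)

    def estimate(self) -> float:
        if not self.runtimes:
            raise ValueError("no results recorded yet")
        return sum(self.runtimes) / len(self.runtimes)


def exceeds_limit(actual_hours: float, estimate_hours: float, bound_factor: float = 40.0) -> bool:
    """True if a task needing actual_hours would be aborted under the bound factor."""
    return actual_hours > estimate_hours * bound_factor


if __name__ == "__main__":
    est = RecentMeanEstimator(window=10)
    for _ in range(10):
        est.record(1.5)  # typical mix: limit is 1.5 * 40 = 60 h
    print(est.estimate(), exceeds_limit(16.0, est.estimate()))
    for _ in range(10):
        est.record(0.375)  # burst of short tasks fills the window: limit drops to 15 h
    print(est.estimate(), exceeds_limit(16.0, est.estimate()))
```

The same 16-hour task passes under the first estimate and is aborted under the second, even though nothing about the host changed.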

Of course, the client benchmark could be heavily optimistic. What are the Whetstone/Dhrystone values, and was it Linux or Windows? If a device claims to be a Ferrari but really runs like a VW, the allowed maximum runtime starts to play a role.

BTW, I've seen tasks that exceeded the maximum elapsed time being credited; I just don't know if this is policy.

[Feb 28, 2015 4:17:57 PM]

Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline


Re: Maximum elapsed time exceeded and exception

It's a WinXP machine. I don't have the results of the earlier benchmark, but a recent re-run gives:
2923 floating point MIPS (Whetstone) per CPU
7117 integer MIPS (Dhrystone) per CPU
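As a rough cross-check, this sketch uses a simplified model in which runtime is the task's floating-point operation count divided by the host's per-CPU Whetstone speed. Real BOINC clients and servers apply further corrections to this, so the implied operation count is only an approximation, not a value published by the project; the function names are hypothetical.

```python
def runtime_hours(fpops: float, whetstone_mips_per_cpu: float) -> float:
    """Hours needed for `fpops` operations at a per-CPU Whetstone speed (simplified model)."""
    ops_per_second = whetstone_mips_per_cpu * 1e6  # MIPS -> operations per second
    return fpops / ops_per_second / 3600.0


def fpops_for_runtime(hours: float, whetstone_mips_per_cpu: float) -> float:
    """Inverse of runtime_hours: operations that fill `hours` at the given speed."""
    return hours * 3600.0 * whetstone_mips_per_cpu * 1e6


if __name__ == "__main__":
    whetstone = 2923.0  # MIPS per CPU, from the re-run benchmark above
    fpops_est = fpops_for_runtime(24 / 60, whetstone)  # a ~24-minute estimate
    fpops_bound = 40 * fpops_est  # assumed bound factor of 40
    print(f"implied fpops_est: {fpops_est:.3e}")  # about 4.2e12 operations
    print(f"time limit: {runtime_hours(fpops_bound, whetstone):.1f} h")  # about 16 h
```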

----------------------------------------

[Feb 28, 2015 4:41:03 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Maximum elapsed time exceeded and exception

Of course, the other question, given WinXP: what client version?

[Feb 28, 2015 7:03:35 PM]

Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline


Re: Maximum elapsed time exceeded and exception

7.2.42

----------------------------------------

[Feb 28, 2015 7:59:16 PM]

seippel
Former World Community Grid Tech
Joined: Apr 16, 2009
Post Count: 392
Status: Offline


Re: Maximum elapsed time exceeded and exception

The OET1_*-OM batches run quite a bit longer than other OET1 batches, which causes problems for the runtime estimates. A few weeks ago we decided to hold the OET1_*-OM batches until the other batches had run (and OET1_*-OM batches won't be run on Android). A few of the OET1_*-OM batches had already been sent out, though. The vast majority of those completed without problems, but a few hit the 'maximum elapsed time exceeded' limit. Those will simply be re-run once we've completed the other batches and are running only OET1_*-OM batches.

Seippel

[Mar 5, 2015 6:48:39 PM]