Thread Status: Active
Total posts in this thread: 98
This topic has been viewed 19357 times and has 97 replies.
Psalm103
Cruncher
Joined: Jan 6, 2007
Post Count: 25
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I've got two betas (00011_1551 and 00011_1558) that have been running for 17 hours now, and not a single checkpoint yet. At first it looked like they'd finish in about 1.5 hours each. The Remaining Time went to '---' after about an hour, and they hit 100.000% at just over 15 hours of CPU time. Will they time out eventually? I'll keep them running for now and see what happens. This is a reasonably quick machine that usually comes in just under the average run time.
(Win 7 x64, 4 cores @ 2.66 GHz, 8 GB RAM @ 1600 MHz)
[Sep 19, 2014 2:16:11 PM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I see that Keith Uplinger has been browsing the thread in the last hour.

Hopefully words of wisdom will be forthcoming shortly!
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
[Sep 19, 2014 2:18:09 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Should we be aborting the ones stuck at 99.x%? Can the staff give us some direction?
----------------------------------------

[Sep 19, 2014 2:22:13 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Sorry for the long delay in responding, but we have figured out the root cause of the hanging work units. It is a work unit build problem: some of the input files were improperly formed. These input files were changed manually, outside of the build script, to replace a special character. When a file contained more than one special character, the manual update changed the length of the line from what was expected. That is why the application appeared stalled: it was technically still working, just on data that was far longer (1000000x) than normal. We are going to set all the work units currently out there to report as completed (server_abort).
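
Purely to illustrate the failure mode described above (an editor's sketch, not the actual WCG build script, whose input format is not shown in this thread): assume a hypothetical fixed-width input file in which every record line must be exactly `RECORD_WIDTH` characters long. The names `sanitize_line` and `bad_width_lines` and the width of 40 are invented for the example. Replacing each special character one-for-one keeps the line length intact, while a hand edit that expands a character into two grows the line; a width check at build time catches that before the file is packaged.

```python
# Illustrative sketch only. ASSUMPTION: a hypothetical fixed-width format where
# every record line must be exactly RECORD_WIDTH characters long. The real UGM
# input format and build script are not described in this thread.
RECORD_WIDTH = 40


def sanitize_line(line: str, replacement: str = "_") -> str:
    """Replace each non-ASCII ("special") character with exactly one ASCII
    character, so the line keeps its original length."""
    if len(replacement) != 1:
        raise ValueError("replacement must be a single character")
    return "".join(ch if ch.isascii() else replacement for ch in line)


def bad_width_lines(lines: list[str], width: int = RECORD_WIDTH) -> list[int]:
    """Return the indices of lines whose length differs from the expected
    width, so a build step can refuse to package a malformed input file."""
    return [i for i, line in enumerate(lines) if len(line) != width]


if __name__ == "__main__":
    # A record containing two special characters (\u00c9 = E-acute, \u00cf = I-diaeresis).
    record = "CAF\u00c9 NA\u00cfVE".ljust(RECORD_WIDTH)
    # A hand edit that expands each special character into two ASCII
    # characters grows the line by one character per special character.
    hand_edited = record.replace("\u00c9", "E'").replace("\u00cf", "I'")
    print(len(record), len(hand_edited))
    print(bad_width_lines([hand_edited]))
    print(bad_width_lines([sanitize_line(record)]))
```

Run as a script, this prints `40 42`, then `[0]`, then `[]`: the hand-edited line is two characters too long and gets flagged, while the one-for-one replacement passes the width check.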

I have disabled the assimilator and validator for the time being, which will allow the results to stay in the database longer than normal. On Monday I will review the data that members have returned for these batches and grant credit where someone hit the resource limit (CPU timeout). I will also look at those who manually aborted them, to see whether some partial credit can be given for time spent.

We changed the build script so that manual intervention to remove the special character is no longer needed. After we clean up from this current beta, we will send out properly built work units; there is no timetable for that yet.

Thanks,
-Uplinger
[Sep 19, 2014 2:22:25 PM]
mikefinn
Cruncher
USA
Joined: Apr 27, 2007
Post Count: 43
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

My two beta units do not have an estimated time remaining entry.

<Snipped>

I took a quick look at the stderr.txt file of one of them and it had only one line:

Unable to open checkpoint file starting from 0

I let the work units run all night. When I checked in the morning, my computer was unresponsive to keyboard and mouse and I wound up rebooting. Before reboot, the two work units were at 99.x% with no checkpoint or time remaining entry.

After the reboot, the two work units restarted from the beginning with 45 minutes of remaining time. But shortly after, the remaining time entry vanished and was replaced by "--".

I looked at one of the stderr.txt and all it had was:
Unable to open checkpoint file starting from 0
Unable to open checkpoint file starting from 0
Unable to open checkpoint file starting from 0

[Sep 19, 2014 2:26:53 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Users can manually abort them if they would like. Work units that had not started on a member's computer will abort after we trigger the server abort.

Please wait to manually abort them until we have updated the database, so that additional copies are not sent out.

Thanks,
-Uplinger
[Sep 19, 2014 2:34:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

My Windows machine aborted 5 beta WUs after 13:42 hours of CPU runtime because the maximum time was exceeded. They ran for 13:27/13:42 and are claiming 24.0 pts. Here is the error info for one of them:

Result Log

Result Name: BETA_ugm1_ugm1_00012_0950_0--

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
Unable to open checkpoint file starting from 0


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x000007FEFDC73CA2

Engaging BOINC Windows Runtime Debugger...
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 19, 2014 2:48:38 PM]
[Sep 19, 2014 2:46:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I hit the Update button and got the messages below, indicating it's now OK: the units were server-aborted whether running, not yet started, or suspended (as in my case).

2691 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_0031_1 is no longer usable
2692 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_0036_1 is no longer usable
2693 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_0029_1 is no longer usable
2694 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_0045_0 is no longer usable
2695 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_0035_1 is no longer usable
2696 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_1146_1 is no longer usable
2697 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_1143_0 is no longer usable
2698 World Community Grid 9/19/2014 5:06:07 PM Result BETA_ugm1_ugm1_00012_1166_0 is no longer usable
[Sep 19, 2014 3:09:18 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Yes, we have server-aborted any that were still in a running state. Work units that had already validated were not touched, but any that had at least one wingman still in progress were server-aborted. You can manually abort work units now if you'd like. I will be working on Monday to grant credit where it is due. Monday is the initial deadline, so most results should be in by that point.

Again, we apologize for the issues.

Thanks,
-Uplinger
[Sep 19, 2014 3:30:11 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

And those units that completed successfully but were in PVal state now say "Too Late". In this case, don't try to read anything into that phrase; it's a known get-out route (see the FAQ).
[Sep 19, 2014 3:33:35 PM]