Thread Status: Active
Total posts in this thread: 28
This topic has been viewed 6862 times and has 27 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
HSTB units all erroring

Good news: Several units unexpectedly appeared at 17:03 UTC :)
Bad news: lots of them errored within a few minutes (same error on all copies so far) :'(

Project Name: Help Stop TB
Created: 02/15/2017 17:03:04
Name: HST1_006502_000080_KC0002_T350_F00024_S00007
Minimum Quorum: 2
Replication: 2

Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 3-- | Microsoft Windows 10 | Core x64 Edition, (10.00.14393.00) | - | In Progress | 2/15/17 17:07:27 | 2/19/17 05:07:27 | 0.00 | 0.0 / 0.0
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 2-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.14393.00) | 726 | Error | 2/15/17 17:05:20 | 2/15/17 17:07:25 | 0.00 | 0.0 / 0.0
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 0-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | 726 | Error | 2/15/17 17:03:09 | 2/15/17 17:05:18 | 0.00 | 0.1 / 0.0
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 1-- | Microsoft Windows 8.1 | Core x64 Edition, (06.03.9600.00) | - | In Progress | 2/15/17 17:03:09 | 2/19/17 05:03:09 | 0.00 | 0.0 / 0.0

Result Name: HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[11:03:52] INFO: Running initial simulation
Unhandled Exception Detected...

Result Name: HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 2--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
INFO: result number = 2
INFO: No state to restore. Start from the beginning.
[17:05:32] INFO: Running initial simulation
Unhandled Exception Detected...
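
Both logs report the same exit code written two ways: -1073741819 is 0xc0000005 read as a signed 32-bit integer, and 0xC0000005 is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION. A minimal Python sketch of that conversion follows; the helper name `to_unsigned32` and the one-entry lookup table are illustrative only and are not part of BOINC or the HSTB application.

```python
# Illustrative helper (hypothetical name): reinterpret a signed 32-bit
# process exit code as the unsigned value Windows uses for NTSTATUS codes.
def to_unsigned32(exit_code: int) -> int:
    return exit_code & 0xFFFFFFFF

# Tiny lookup table for illustration; real NTSTATUS lists are far longer.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

code = to_unsigned32(-1073741819)
print(hex(code), KNOWN_STATUS.get(code, "unknown"))
```

An access violation this early ("Running initial simulation") on every copy is consistent with a problem in the work unit's input rather than in any one machine, though the log alone does not prove that.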

P.S. All the erroring units I saw were batches like HST1_ 00650x; by comparison, some more appeared at 17:33 UTC like HST1_ 0074xx and are running ok so far.
----------------------------------------
[Edit 3 times, last edit by Former Member at Feb 15, 2017 5:53:38 PM]
[Feb 15, 2017 5:47:02 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: HSTB units all erroring

Good news: Several units unexpectedly appeared at 17:03 UTC :)
Bad news: lots of them errored within a few minutes (same error on all copies so far) :'(
(...snip-snip...)
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 2-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.14393.00) | 726 | Error | 2/15/17 17:05:20 | 2/15/17 17:07:25 | 0.00 | 0.0 / 0.0
HST1_ 006502_ 000080_ KC0002_ T350_ F00024_ S00007_ 0-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | 726 | Error | 2/15/17 17:03:09 | 2/15/17 17:05:18 | 0.00 | 0.1 / 0.0
(Emphasis mine)

--Start nitpicking mode--
Actually, this is what happened:
- the server sent out WU _0 at 17:03:09, then client A started the WU right away and it errored out right away, the result was uploaded to the server right away and two minutes later the WU was reported back to the server by client A;
- the server acknowledged WU _0 at 17:05:18;
- the server sent out WU _2 at 17:05:20 (two seconds later), then client C started the WU right away and it errored out right away, the result was uploaded to the server right away and two minutes later the WU was reported back to the server by client C;
- the server acknowledged WU _2 at 17:07:25.

Recapping, you'll see a time-lapse of (at least) two minutes:
- client receives WU from the server;
- client starts executing WU;
- WU finishes;
- client uploads result back to server, WU is set to "Ready to report" on the client;
- client has to wait 2 minutes before "Ready to report"-workunits can be reported(*);
- two minutes later: the client reports the "Ready to report"-workunits to the server.
(*) Don't know for sure if this is true on Android.

Example:
Result Name                   OS       AVN  Status  Sent Time         Due / Return Time  CPUh  Claimed/Granted
SCC1_0000088_Bct-C_35834_3--  Android  708  Error   2/13/17 15:08:16  2/13/17 15:10:24   0.00  51.2/0.0
--End of nitpicking mode-- :D
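
The roughly two-minute gap can be checked against the Sent Time and Return Time values quoted in this thread. A small Python sketch, assuming both columns are server-side timestamps in the same time zone and in the M/D/YY HH:MM:SS form shown; the function name `turnaround_seconds` and the short labels are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the result tables quoted in this thread.
FMT = "%m/%d/%y %H:%M:%S"

def turnaround_seconds(sent: str, returned: str) -> float:
    """Seconds between a result being sent and being reported back."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds()

# (sent, returned) pairs copied from the posts above.
results = {
    "HST1 copy _0": ("2/15/17 17:03:09", "2/15/17 17:05:18"),
    "HST1 copy _2": ("2/15/17 17:05:20", "2/15/17 17:07:25"),
    "SCC1 (Android)": ("2/13/17 15:08:16", "2/13/17 15:10:24"),
}

for name, (sent, returned) in results.items():
    print(f"{name}: {turnaround_seconds(sent, returned):.0f} s")
```

All three results come back a little over two minutes after being sent, which fits a near-instant failure followed by a fixed wait before reporting.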
[Feb 15, 2017 7:07:32 PM]
    uplinger
    Former World Community Grid Tech
    Joined: May 23, 2005
    Post Count: 3952
    Status: Offline
    Re: HSTB units all erroring

    Tony,

    Thanks for the report. I am monitoring the results in progress right now. We submitted results that had been in error on the grid in the past. Once I collect all the data, I will be sending the data to the researchers.

    Thanks,
    -Uplinger
    [Feb 15, 2017 7:16:16 PM]
    Former Member
    Cruncher
    Joined: May 22, 2018
    Post Count: 0
    Status: Offline
    Re: HSTB units all erroring

    @adriverhoef
    --Start nitpicking mode--
    Actually, this is what happened:
    ...
    --End of nitpicking mode-- :D
    Too true, mine was the "I'm in a hurry" version :P
    [Feb 16, 2017 8:22:01 AM]
    Former Member
    Cruncher
    Joined: May 22, 2018
    Post Count: 0
    Status: Offline
    Re: HSTB units all erroring

    Picked up a few repair jobs from yesterday's 17:03 UTC batch. All copies failing, like this one.

    HST1_ 006506_ 000098_ MT0011_ T325_ F00009_ S00007_ 4-- | Microsoft Windows 10 | Core x64 Edition, (10.00.14393.00) | 726 | Error | 2/16/17 10:45:21 | 2/16/17 11:18:14 | 0.00 | 0.0 / 0.0
    HST1_ 006506_ 000098_ MT0011_ T325_ F00009_ S00007_ 3-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.10586.00) | - | In Progress | 2/16/17 01:18:37 | 2/19/17 13:18:37 | 0.00 | 0.0 / 0.0
    HST1_ 006506_ 000098_ MT0011_ T325_ F00009_ S00007_ 2-- | Microsoft Windows 8.1 | x64 Edition, (06.03.9600.00) | 726 | Error | 2/15/17 22:06:10 | 2/16/17 01:18:36 | 0.00 | 0.0 / 0.0
    HST1_ 006506_ 000098_ MT0011_ T325_ F00009_ S00007_ 1-- | Microsoft Windows 7 | Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) | 726 | Error | 2/15/17 17:03:12 | 2/16/17 10:45:19 | 0.00 | 0.0 / 0.0
    HST1_ 006506_ 000098_ MT0011_ T325_ F00009_ S00007_ 0-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | 726 | Error | 2/15/17 17:03:11 | 2/15/17 22:06:09 | 0.00 | 0.0 / 0.0
    [Feb 16, 2017 12:26:26 PM]
    xroule
    Senior Cruncher
    Joined: Nov 16, 2004
    Post Count: 284
    Status: Offline
    How do you explain this

    Workunit Status

    Project Name: Help Stop TB
    Created: 03/06/2017 14:03:07
    Name: HST1_009007_000005_KT0012_T325_F00051_S00005
    Minimum Quorum: 2
    Replication: 3

    Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
    HST1_ 009007_ 000005_ KT0012_ T325_ F00051_ S00005_ 4-- | Microsoft | x64 Edition, (06.02.9200.00) | 726 | Too Late | 3/16/17 14:05:05 | 3/17/17 12:32:01 | 11.89 | 411.0 / 411.0
    HST1_ 009007_ 000005_ KT0012_ T325_ F00051_ S00005_ 3-- | Microsoft Windows 10 | Core x64 Edition, (10.00.14393.00) | 726 | Too Late | 3/6/17 15:19:24 | 3/7/17 20:26:39 | 12.86 | 118.0 / 118.0
    HST1_ 009007_ 000005_ KT0012_ T325_ F00051_ S00005_ 2-- | Microsoft Windows 8.1 | Enterprise x64 Edition, (06.03.9600.00) | 726 | Error | 3/6/17 15:12:47 | 3/6/17 15:19:21 | 0.00 | 229.7 / 0.0
    HST1_ 009007_ 000005_ KT0012_ T325_ F00051_ S00005_ 0-- | Microsoft Windows 8.1 | x64 Edition, (06.03.9600.00) | 726 | Error | 3/6/17 14:04:58 | 3/6/17 15:12:46 | 0.00 | 229.6 / 0.0
    HST1_ 009007_ 000005_ KT0012_ T325_ F00051_ S00005_ 1-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | - | No Reply | 3/6/17 14:04:58 | 3/16/17 14:04:58 | 0.00 | 0.0 / 0.0

    Rej
    ----------------------------------------


    THE ).( IS NAKED!

    Do you know what your PC is doing ???
    [Mar 18, 2017 12:32:51 PM]
    Former Member
    Cruncher
    Joined: May 22, 2018
    Post Count: 0
    Status: Offline
    Re: How do you explain this

    Assuming it's the Too Late designation that you're querying, see the FAQ "BOINC: Results Status page - What does xyz status mean?" for an alternative / illogical use of the status Too Late.
    [Mar 18, 2017 1:23:25 PM]
    AnRM
    Advanced Cruncher
    Canada
    Joined: Nov 17, 2004
    Post Count: 102
    Status: Offline
    Re: How do you explain this

    Seeing the same problems...tasks start and immediately error out!!
    [Mar 18, 2017 6:43:23 PM]
    gb009761
    Master Cruncher
    Scotland
    Joined: Apr 6, 2005
    Post Count: 3010
    Status: Offline
    Re: How do you explain this

    AnRM, can you please expand a little...

    The issues which xroule has reported look as though they were returned 'Too Late' (i.e., it looks as though those WUs were sent on the 6th of March and returned 10 days later, on the 16th, thus making them "Too Late" if the respective repair WU had already been returned and validated).

    Therefore, please list out the WU's you're referring to, along with any error messages you're seeing.
    ----------------------------------------

    [Mar 18, 2017 8:47:44 PM]
    Former Member
    Cruncher
    Joined: May 22, 2018
    Post Count: 0
    Status: Offline
    Re: How do you explain this

    Tony has the right end of the stick... It's a _4 copy, the 5th and final one. The 10-day deadline is UNCONNECTED to the original creation date of the task, which can be days or more before the _0 is even circulated. For whatever reason, repairs for HSTB are also sent out with a 10-day deadline, so technically that copy was returned well in time.
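
The timing argument can be checked against the figures in xroule's table. A rough Python sketch, assuming the 10-day deadline mentioned in this thread and assuming the second timestamp of a returned result is its return time; the helper names `parse` and `within_deadline` are hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"
# Assumption taken from this thread, not from official WCG documentation.
DEADLINE = timedelta(days=10)

def parse(ts: str) -> datetime:
    return datetime.strptime(ts, FMT)

def within_deadline(sent: str, returned: str, deadline: timedelta = DEADLINE) -> bool:
    """True if the result came back no later than sent time + deadline."""
    return parse(returned) - parse(sent) <= deadline

# (sent, returned) for the two copies marked "Too Late" in the table.
too_late = {
    "_4": ("3/16/17 14:05:05", "3/17/17 12:32:01"),
    "_3": ("3/6/17 15:19:24", "3/7/17 20:26:39"),
}

for copy, (sent, returned) in too_late.items():
    elapsed = parse(returned) - parse(sent)
    print(copy, elapsed, within_deadline(sent, returned))
```

Both copies came back roughly a day after they were sent, so on these numbers neither one missed its own deadline. The _1 copy's sent and due times are exactly 10 days apart, which matches the deadline assumed here.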
    ----------------------------------------
    [Edit 1 times, last edit by Former Member at Mar 18, 2017 9:00:45 PM]
    [Mar 18, 2017 8:58:58 PM]