Thread Status: Active
Total posts in this thread: 31
This topic has been viewed 35025 times and has 30 replies.
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Beta Test for Help Stop TB - June 14, 2016 [Issues Thread]

Finished 9: several Valid, most PVal. Started a few more by hand and lost 3 on a test system, so I'll sit waiting 10 days before No Reply is called. (Upgraded from Windows 10 build 10240 to 10586, which zoothingly ** said "all your files will be left alone". What they did not say was that ProgramData does not fall in that class... fully erased.) The new BOINC install places the data dir in a private place.

** Zoot Allures
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jun 15, 2016 6:53:12 AM]
[Jun 15, 2016 6:51:15 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403

I got a bunch of those betas. Did some suspends (LAIM off) and BOINC stop/starts. All went fine.

Remark: the T300s, for example, use 76/143 MB (memory/virtual memory) on Linux and 400/1000 MB on Windows.
[Jun 15, 2016 6:51:54 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1403

I got a resend of BETA_HST1_004003_000065_KC0014_T325_F00089_S00005_0-- that errored after 3.95 hours of run time.

[15:34:49] INFO: Completed step 36000 of initial simulation
Writing checkpoint at step 36330.

-------------------------------------------------------
Program projects/www.worldcommunitygrid.org/wcgrid_beta22_gromacs_7, VERSION 4.6.1
Source code file: .\src\mdlib\nsgrid.c, line: 641

Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid
based on its coordinates. If your system contains collisions or parameter
errors that give particles very high velocities you might end up with some
coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot
put these on a grid, so this is usually where we detect those errors.
Make sure your system is properly energy-minimized and that the potential
energy seems reasonable before trying again.
Variable ind has value 33036. It should have been within [ 0 .. 33019 ]
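To illustrate what that explanation describes, here is a toy sketch. It is not GROMACS's nsgrid.c code; it is a hypothetical Python illustration that assumes a simple uniform grid over a box starting at the origin and ignores periodic boundary conditions. A particle whose coordinate has blown up to NaN/Infinity, or flown outside the box, cannot be given a valid cell, which is the same kind of failure as "ind" landing outside [ 0 .. 33019 ].

```python
import math

def cell_index(pos, box, ncell):
    """Return the flat grid-cell index for one particle (toy model).

    Hypothetical illustration: box is (Lx, Ly, Lz) starting at the origin,
    with ncell cells along each axis and no periodic wrapping.
    """
    coords = []
    for x, length in zip(pos, box):
        if not math.isfinite(x):
            # NaN or +-Infinity: there is no cell this particle could go into
            raise ValueError(f"non-finite coordinate: {x}")
        c = math.floor(x / length * ncell)
        if not 0 <= c < ncell:
            # A particle that left the box lands past the edge of the grid
            raise IndexError(f"cell {c} outside [0 .. {ncell - 1}]")
        coords.append(c)
    ix, iy, iz = coords
    # Flatten the (ix, iy, iz) cell into a single index, like "ind"
    return (ix * ncell + iy) * ncell + iz

box = (3.0, 3.0, 3.0)
print(cell_index((1.5, 0.2, 2.9), box, 10))  # 509
try:
    cell_index((1.0, float("nan"), 0.0), box, 10)
except ValueError as err:
    print("rejected:", err)
```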

[Jun 15, 2016 7:19:29 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

I've just realised that one of my machines (i7-4770K, Win10, BOINC 7.6.22) received 8 units of batch 3901 and promptly errored out all of them on download with the above <error_code>-120 (RSA key check failed for file). It subsequently downloaded some from other batches (3903, 3906, 3911, 3914) without a problem; 2 are already in PVal. Hence, it appears not to be a permanent problem with that machine.

Conversely, all of those errored units went on to generate repair jobs that appear successful, like this one (2 copies with the same error, one PVal with a normal Result Log, one In Progress). Hence, it appears not to be a permanent problem with those units.

BETA_HST1_003901_000009_AC0010_T000_F00091_S00006_3-- Microsoft x64 Edition, (10.00.10586.00) 720 Pending Validation 14/06/16 17:50:17 14/06/16 22:12:34 3.26 96.2 / 0.0
BETA_HST1_003901_000009_AC0010_T000_F00091_S00006_2-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) - In Progress 14/06/16 17:48:15 24/06/16 17:48:15 0.00 0.0 / 0.0
BETA_HST1_003901_000009_AC0010_T000_F00091_S00006_1-- Microsoft Windows 7 Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) 720 Error 14/06/16 17:47:38 14/06/16 17:50:08 0.00 296.9 / 0.0
BETA_HST1_003901_000009_AC0010_T000_F00091_S00006_0-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:26 14/06/16 17:48:11 0.00 296.9 / 0.0

So what's causing the error? Something intermittent somewhere? On my machine (and several others)? In the CDN?
[Jun 15, 2016 9:31:16 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Hypothesis #1 coming up. It came after I also realised that another of my machines had successfully downloaded 2 units of batch 3901 without the RSA key check failure, although 2 wingmen hit it on one of them (mine is _2, Valid):

BETA_HST1_003901_000093_AC0011_T000_F00078_S00006_3-- Microsoft x64 Edition, (06.02.9200.00) 720 Valid 14/06/16 18:48:42 15/06/16 03:52:03 3.54 97.8 / 91.9
BETA_HST1_003901_000093_AC0011_T000_F00078_S00006_2-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 720 Valid 14/06/16 17:48:08 14/06/16 21:45:14 2.13 86.0 / 91.9
BETA_HST1_003901_000093_AC0011_T000_F00078_S00006_1-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:50 14/06/16 18:48:39 0.00 296.9 / 0.0
BETA_HST1_003901_000093_AC0011_T000_F00078_S00006_0-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:39 14/06/16 17:48:01 0.00 296.9 / 0.0

Mine ran on an i5-750 (Win 10 Pro, BOINC 7.6.22), otherwise configured fairly similarly to the 4770K that had the RSA key check failure. So what differences exist that might affect BOINC? Avast Free Antivirus is running on both of them, but I wondered whether they had exactly the same configuration. They didn't: the i5-750 had a scan exclusion on the BOINC program data folder, and the 4770K didn't. Hmm.

My Avast program version is 11.2.2262 (up to date). Exclusions are entered as follows: from its home screen, go to Settings (gear icon), General tab, Exclusions, File paths, enter something like "C:\ProgramData\BOINC\*" without the quotes (using wherever your BOINC program data folder is), press Enter, then OK.
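If you want to sanity-check which files such an exclusion would cover, here is a small sketch. It is an assumption that Avast treats * the way Python's fnmatch does (matching any remaining characters, including further backslashes, so files in subfolders are covered); Avast's own matching rules may differ. The helper name is_excluded and the example paths are illustrative only.

```python
from fnmatch import fnmatchcase

# Exclusion pattern as entered in the Avast settings (adjust to your folder)
EXCLUSION = r"C:\ProgramData\BOINC\*"

def is_excluded(path: str, pattern: str = EXCLUSION) -> bool:
    """Hypothetical helper: case-insensitive wildcard match of a Windows path.

    Both sides are lower-cased because Windows paths are case-insensitive.
    fnmatch treats backslashes literally and lets '*' match any characters,
    including path separators; this is assumed, not confirmed, for Avast.
    """
    return fnmatchcase(path.lower(), pattern.lower())

print(is_excluded(r"C:\ProgramData\BOINC\projects\www.worldcommunitygrid.org\wcgrid_beta22_gromacs_7"))  # True
print(is_excluded(r"C:\Program Files\BOINC\boinc.exe"))  # False
```

Note that the second path is the program folder, not the data folder, so under this assumption it is still scanned.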

I've no idea whether or how a virus check on download could cause the RSA key check failure, or why it would hit only some workunits, so this is grasping at straws. So what about the rest of you who have suffered the RSA key check failure? Do those machines have a scan exclusion on the BOINC program data folder? Are they running Avast? For completeness, I also ought to ask: did anyone running Avast without an exclusion succeed in downloading batch 3901?
[Jun 15, 2016 11:40:38 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

I have Avast on all machines, with different clients (7.6.2, 7.6.22, 7.6.33). All have a scan exclusion. This has been manifesting since May 28: first -119 showing up in reports, and somewhat later -120. I've had one of the latter myself, and I stress that the scan exclusion has been set for ages.

Edit: exclusion, not elusion.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Jun 15, 2016 12:35:04 PM]
[Jun 15, 2016 12:33:39 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684

Error after 13.7 hours
Writing checkpoint at step 83090.
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x2094e4d]
[0x20876a0]
[0x97b7ed]
[0x75a1cf]
[0x4db05b]
[0x55c684]
[0x4115ec]
[0x420770]
[0x4141a6]
[0x47ed1b]
[0x211b1cb]
[0x400469]

Exiting...

This happened even though the host has enough available RAM.
At this time, the 5 other WUs on the same host are still computing after 15+ hours, against a duration forecast of 10.21 hours.
The duration forecast is still meaningless.
Cheers,
Yves
----------------------------------------
[Jun 15, 2016 3:19:48 PM]
marist_university
Advanced Cruncher
USA
Joined: Mar 30, 2005
Post Count: 107

Picked up 242 of these, no errors so far: all are In Progress (IP), Pending Validation (PV), or Valid (V).
----------------------------------------

[Jun 15, 2016 4:16:28 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 823

Tonyh205 and I both have a machine that hit <error_code>-120 (RSA key check failed for file) and then later successfully ran other betas. Nice to see it isn't only me.

And Tonyh205 also mentioned:
Conversely, all of those errored units went on to generate repair jobs that appear successful, like this one (2 copies with the same error, one PVal with a normal Result Log, one In Progress). Hence, it appears not to be a permanent problem with those units.
So I rechecked the ones that errored with <error_code>-120 and found the same thing: other computers were able to run them successfully.
----------------------------------------

[Jun 16, 2016 12:53:07 AM]
andgra
Senior Cruncher
Sweden
Joined: Mar 15, 2014
Post Count: 195

Picked up quite a few, which run fine on various machines.
One machine failed all of them; the difference is that it runs 32-bit Windows, while all the others are 64-bit.
(unknown error) - exit code -1073741819 (0xc0000005) on all 3.
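The two numbers in that line are the same 32-bit value: -1073741819 is the signed decimal reading and 0xc0000005 the unsigned hex reading, which is the Windows status code for an access violation (matching the "Access Violation (0xc0000005)" record in the log). A minimal sketch of the two's complement conversion, with helper names of my own choosing:

```python
def as_signed32(code: int) -> int:
    """Reinterpret a 32-bit value as a signed (two's complement) integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code

def as_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned form."""
    return code & 0xFFFFFFFF

STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows status code for an access violation

print(as_signed32(STATUS_ACCESS_VIOLATION))  # -1073741819
print(hex(as_unsigned32(-1073741819)))       # 0xc0000005
```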

[04:01:27] INFO: Finished initial simulation.
[04:01:27] INFO: Running secondary simulation


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7728291F read attempt to address 0x770DFFDE

Could there be a problem with the 32-bit version?
----------------------------------------
/andgra



[Jun 16, 2016 1:53:01 PM]