| World Community Grid Forums
Thread Status: Active Total posts in this thread: 31
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Finished 9: several valid, most PVal. Started a few more by hand and lost 3 on a test system, so those will sit waiting for 10 days before No Reply is called. (I upgraded from Windows 10 build 10240 to 10586, which zoothingly** said "all your files will be left alone". What they did not say was that ProgramData does not fall in that class... fully erased. The new BOINC install places the data dir in a private place.)
** Zoot Allures
[Edit 1 times, last edit by SekeRob* at Jun 15, 2016 6:53:12 AM] |
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1403 Status: Offline Project Badges:
|
I got a bunch of those betas. Did some suspends (LAIM off) and stopped/started BOINC. All went fine.
Remark: the T300s, for example, use 76/143 MB memory/virtual memory on Linux and 400/1000 MB on Windows. |
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1403 Status: Offline Project Badges:
|
I got a resend from BETA_ HST1_ 004003_ 000065_ KC0014_ T325_ F00089_ S00005_ 0-- that errored after 3.95 hours of run time.
[15:34:49] INFO: Completed step 36000 of initial simulation
Writing checkpoint at step 36330.
-------------------------------------------------------
Program projects/www.worldcommunitygrid.org/wcgrid_beta22_gromacs_7, VERSION 4.6.1
Source code file: .\src\mdlib\nsgrid.c, line: 641
Range checking error:
Explanation: During neighborsearching, we assign each particle to a grid based on its coordinates. If your system contains collisions or parameter errors that give particles very high velocities you might end up with some coordinates being +-Infinity or NaN (not-a-number). Obviously, we cannot put these on a grid, so this is usually where we detect those errors. Make sure your system is properly energy-minimized and that the potential energy seems reasonable before trying again.
Variable ind has value 33036. It should have been within [ 0 .. 33019 ] |
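For readers unfamiliar with this kind of failure, here is a minimal sketch of the check the log's explanation describes: a particle is assigned to a grid cell from its coordinate, and a non-finite or runaway coordinate produces an index outside the valid range. This is an illustration only, not the GROMACS code in nsgrid.c; the function `cell_index`, its parameters, and the one-dimensional grid are all assumptions made for the example.

```python
import math


def cell_index(x: float, origin: float, cell_size: float, n_cells: int) -> int:
    """Map a coordinate to a 1-D grid cell index, rejecting bad values.

    Hypothetical helper for illustration; real neighbour-search grids are 3-D
    and considerably more involved.
    """
    # NaN or +-Infinity cannot be placed on a grid at all.
    if not math.isfinite(x):
        raise ValueError(f"non-finite coordinate: {x}")

    ind = math.floor((x - origin) / cell_size)

    # A particle that has flown far outside the box lands outside the grid.
    if not 0 <= ind < n_cells:
        raise IndexError(
            f"Variable ind has value {ind}. "
            f"It should have been within [ 0 .. {n_cells - 1} ]"
        )
    return ind


if __name__ == "__main__":
    print(cell_index(2.5, origin=0.0, cell_size=1.0, n_cells=4))
```

In this sketch a well-behaved coordinate returns a cell index, while a NaN raises `ValueError` and a coordinate far outside the box raises `IndexError`, which is roughly the situation the range-checking message above reports.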
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've just realised that one of my machines (i7-4770K, Win10, BOINC 7.6.22) received 8 units of batch 3901 and promptly errored out all of them on download with the above <error_code>-120 (RSA key check failed for file). It subsequently downloaded some from other batches (3903, 3906, 3911, 3914) without a problem; 2 are already in PVal. Hence, it appears not to be a permanent problem with that machine.
Conversely, all of those errored units went on to generate repair jobs that appear successful, like this one (2 copies with the same error, one PVal (normal Result Log), one In Progress). Hence, it appears not to be a permanent problem with those units.
BETA_ HST1_ 003901_ 000009_ AC0010_ T000_ F00091_ S00006_ 3-- Microsoft x64 Edition, (10.00.10586.00) 720 Pending Validation 14/06/16 17:50:17 14/06/16 22:12:34 3.26 96.2 / 0.0
BETA_ HST1_ 003901_ 000009_ AC0010_ T000_ F00091_ S00006_ 2-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) - In Progress 14/06/16 17:48:15 24/06/16 17:48:15 0.00 0.0 / 0.0
BETA_ HST1_ 003901_ 000009_ AC0010_ T000_ F00091_ S00006_ 1-- Microsoft Windows 7 Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) 720 Error 14/06/16 17:47:38 14/06/16 17:50:08 0.00 296.9 / 0.0
BETA_ HST1_ 003901_ 000009_ AC0010_ T000_ F00091_ S00006_ 0-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:26 14/06/16 17:48:11 0.00 296.9 / 0.0
So what's causing the error? Something intermittent somewhere? On my machine (and several others)? In the CDN? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hypothesis #1 coming up. This was after I also realised that another of my machines successfully downloaded 2 of batch 3901 and didn't suffer the RSA key check fail, but 2 wingmen did for one of them (mine is _2, Valid):
BETA_ HST1_ 003901_ 000093_ AC0011_ T000_ F00078_ S00006_ 3-- Microsoft x64 Edition, (06.02.9200.00) 720 Valid 14/06/16 18:48:42 15/06/16 03:52:03 3.54 97.8 / 91.9
BETA_ HST1_ 003901_ 000093_ AC0011_ T000_ F00078_ S00006_ 2-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 720 Valid 14/06/16 17:48:08 14/06/16 21:45:14 2.13 86.0 / 91.9
BETA_ HST1_ 003901_ 000093_ AC0011_ T000_ F00078_ S00006_ 1-- Microsoft Windows 10 Core x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:50 14/06/16 18:48:39 0.00 296.9 / 0.0
BETA_ HST1_ 003901_ 000093_ AC0011_ T000_ F00078_ S00006_ 0-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) 720 Error 14/06/16 17:47:39 14/06/16 17:48:01 0.00 296.9 / 0.0
Mine ran on an i5-750, Win 10 Pro, BOINC 7.6.22, but otherwise fairly similarly configured to the 4770K that had the RSA key check fail. So what differences exist that might affect BOINC? Avast Free Antivirus is running on both of them, but I wondered whether I had exactly the same configuration. No, I didn't. The i5-750 had an exclusion on the BOINC program data folder, the 4770K didn't. Hmmm.
My Avast program version is 11.2.2262 (up to date). Exclusions are entered thusly: from its home screen, Settings (gear icon), General tab, Exclusions, File paths, enter something like "C:\ProgramData\BOINC\*" without the "" (wherever your BOINC program data folder is), Enter, OK.
I've no idea whether or how a virus check on download could cause the RSA key check fail, and why only for some workunits, so this is grabbing at straws.
So what about you others that have suffered the RSA key check fail? Do those machines have a scan exclusion on the BOINC program data folder? Are they running Avast? For completeness, I also ought to ask if anyone running Avast without an exclusion succeeded in downloading batch 3901? |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
I have Avast on all machines, with different clients: 7.6.2, 22, 33. All have a scan exclusion. This manifestation dates from May 28: first the -119 showed up in reports and somewhat later the -120, of which I have had one, with emphasis on the scan exclusion having been set for ages.
Edit: exclusion, not elusion
[Edit 1 times, last edit by SekeRob* at Jun 15, 2016 12:35:04 PM] |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Error after 13.7 hours
Writing checkpoint at step 83090.
Although the host has enough available RAM. At this time, the computation of the 5 other WUs (on the same host) is still in progress after 15+ hours, against a duration forecast of 10.21 hours. The duration forecast is still meaningless.
Cheers, Yves |
||
|
|
marist_university
Advanced Cruncher USA Joined: Mar 30, 2005 Post Count: 107 Status: Offline Project Badges:
|
Picked up 242 of these, no errors so far - either IP, PV, or V.
||
|
|
Seoulpowergrid
Veteran Cruncher Joined: Apr 12, 2013 Post Count: 823 Status: Offline Project Badges:
|
Tonyh205 and I both have a machine that hit <error_code>-120 (RSA key check failed for file) and then later successfully ran other Betas. Nice to see that it isn't only me.
And Tonyh205 also mentioned:
"Conversely, all of those errored units went on to generate repair jobs that appear successful, like this one (2 copies with the same error, one PVal (normal Result Log), one In Progress). Hence, it appears not to be a permanent problem with those units."
I then rechecked the ones that errored with <error_code>-120 and found the same thing: other computers were able to run them successfully. |
||
|
|
andgra
Senior Cruncher Sweden Joined: Mar 15, 2014 Post Count: 195 Status: Offline Project Badges:
|
Picked up quite a few, which run fine on various machines.
One machine failed all of them; the difference is that it runs 32-bit Windows, while all the others are 64-bit. (unknown error) - exit code -1073741819 (0xc0000005) on all 3.
[04:01:27] INFO: Finished initial simulation.
[04:01:27] INFO: Running secondary simulation
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7728291F read attempt to address 0x770DFFDE
Could there be a problem with the 32-bit version?
/andgra |
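As a side note on the exit code: -1073741819 and 0xc0000005 are the same 32-bit value, read as signed and unsigned respectively, and 0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation" reason in the log. A small sketch of the conversion follows; the helper names `to_unsigned32` and `to_signed32` are made up for this example.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(status: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed two's complement."""
    return status - 0x100000000 if status & 0x80000000 else status


if __name__ == "__main__":
    print(hex(to_unsigned32(-1073741819)))  # prints 0xc0000005
    print(to_signed32(0xC0000005))          # prints -1073741819
```

The same conversion works for any negative exit code BOINC reports from a Windows task, which makes it easier to look the hexadecimal status up.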
||
|
|
|