World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
I hit this pretty hard over the weekend for 12+ hours, using a lot of process of elimination to try to make the issue go away or at least isolate it. Things I tried:
1. Add the standard user account to the boinc_users group. Reboot.
2. Add the standard user account to the boinc_admins group. Reboot.
3. Add the standard user account to both of the above groups. Reboot.
4. RDP from another machine to see whether this was a Remote Desktop issue. Reboot.
5. Reinstall BOINC as a service, but without letting all users control it. Reboot.
6. A bunch of other stuff that I don't remember. Tons of rebooting.

I gave up and basically reinstalled BOINC *not* as a service, and the issue went away.

If there's still any value in it, I think these are the conditions needed to recreate the issue:

1. BOINC must be installed as a service.
2. Log in to Windows as a standard user (i.e., not a local administrator). You'll see that boinc.exe runs under the boinc_master service account, and the WCG binaries all run under the boinc_project service account, presumably for extra security.
3. Have a Microbiome Immunity Project (MIP1) work unit that's almost done, then click "Show Graphics" for it.
4. Wait for that MIP1 work unit to reach 100% completion.
5. Observe the domino effect of errors on the next MIP1 work units. Each one errors out: boom, boom, boom, boom.
   a. If you do nothing, about 25 work units error out, and the finished work unit's graphics window closes itself after about 10 seconds.
   b. If you manually close the graphics window of the work unit that just finished, the errors stop.

In other words, having a graphics window open when a WU hits 100% places a "block" or "lock" on other work units' read or write access.

I still think this is a bug in which MIP1 graphics permissions block write access for subsequent MIP1 tasks when BOINC is installed on Windows as a service and a standard user is running boincmgr.exe, boinctray.exe, and the MIP graphics. But the use case needed to recreate the error is very narrow.

@Uplinger, if you have the time: I think I've narrowed down how to recreate the bug, but I have no clue how to fix it.
Thanks for your time if you look into this further. :)
[Edit 5 times, last edit by hchc at May 29, 2019 1:43:03 PM]
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
hchc,
Thank you very much for your efforts in trying to track down the issue. From the errors that you show, it has to do with the shared memory that is set up between the science application and the graphics application. We have also run MIP1-only tests with BOINC installed as a service on Windows 10, but we have not been able to recreate the issue. I will try to look through the code to see whether there is a race condition that is being hit on your machine but not on ours. Thanks again for your help! -Uplinger
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
Uplinger,
Thank you, sir! In your testing, are you logging in to Windows with a non-admin account? That could be the difference.

Since you mention shared memory, I forgot to mention: in the Windows Defender Exploit Protection settings, the only thing I changed was setting "Mandatory ASLR" to "On by default." I know some apps can crash with Mandatory ASLR, but they usually crash right away, so it's probably not the root cause. I also have DEP turned on for "all programs and services" instead of only essential Windows programs and services. This setting is under Advanced System Settings --> Performance --> Data Execution Prevention tab.

Edit: I'm overthinking this. If it's shared memory between the graphics and science applications, maybe it's because the graphics application runs under the currently logged-in standard user account while the science application runs under the "boinc_project" account, and the two accounts don't play well with shared memory.
[Edit 3 times, last edit by hchc at May 29, 2019 3:29:33 PM]
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
We will give those a shot.
Your edit comment about the standard user and boinc_project not playing nice would, in my opinion, then be an issue for all projects. My current thinking is that when the current work unit finishes, it calls boinc_finish, which is supposed to trigger the graphics to close. Something in that is not working properly. I've looked through the graphics code and didn't see any difference between the Zika graphics and the MIP1 graphics that would cause this. Thanks, -Uplinger
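To make the "lingering graphics handle" idea in this thread concrete, here is a toy Python model of it. It is not BOINC code and does not use the BOINC API: `SegmentRegistry`, `run_tasks`, and the segment name `gfx_shmem` are hypothetical. It also assumes, without confirmation from the thread, that each MIP1 task reuses one fixed shared-memory name and that the name stays taken until every process holding it lets go. Under those assumptions, a graphics window that does not detach when boinc_finish should close it blocks every following task, which matches the domino pattern described earlier.

```python
# Toy, in-process model of the "lingering graphics handle" hypothesis.
# NOT BOINC code. Assumptions (unconfirmed for BOINC/MIP1): every task
# creates its shared-memory segment under one fixed name, and that name
# stays taken until every process holding a handle has released it.


class SegmentRegistry:
    """Stand-in for the OS namespace of named shared-memory segments."""

    def __init__(self):
        self._handles = {}  # segment name -> number of open handles

    def create(self, name):
        # Creating fails while anyone still holds the previous segment.
        if self._handles.get(name, 0) > 0:
            raise FileExistsError(name)
        self._handles[name] = 1

    def attach(self, name):
        if self._handles.get(name, 0) == 0:
            raise FileNotFoundError(name)
        self._handles[name] += 1

    def release(self, name):
        self._handles[name] -= 1
        if self._handles[name] == 0:
            del self._handles[name]


def run_tasks(n_tasks, graphics_closes_on_finish, manual_close_at=None):
    """Run tasks back to back and return 'ok' or 'error' for each one.

    Graphics are opened on the first task only. manual_close_at is the
    index of the task before which the user closes the graphics window.
    """
    registry = SegmentRegistry()
    name = "gfx_shmem"  # hypothetical fixed segment name
    graphics_attached = False
    outcomes = []
    for i in range(n_tasks):
        if graphics_attached and i == manual_close_at:
            registry.release(name)  # user closes the graphics window
            graphics_attached = False
        try:
            registry.create(name)  # science app sets up its shared memory
        except FileExistsError:
            outcomes.append("error")  # task fails right at startup
            continue
        if i == 0:
            registry.attach(name)  # user clicks "Show Graphics"
            graphics_attached = True
        registry.release(name)  # science app finishes and detaches
        if graphics_attached and graphics_closes_on_finish:
            registry.release(name)  # graphics detach when told to close
            graphics_attached = False
        outcomes.append("ok")
    return outcomes


print(run_tasks(4, graphics_closes_on_finish=True))
# -> ['ok', 'ok', 'ok', 'ok']
print(run_tasks(4, graphics_closes_on_finish=False))
# -> ['ok', 'error', 'error', 'error']
print(run_tasks(5, graphics_closes_on_finish=False, manual_close_at=3))
# -> ['ok', 'error', 'error', 'ok', 'ok']
```

The third call mirrors the observation that manually closing the finished work unit's graphics window stops the errors. The model deliberately ignores timing, so it says nothing about the race condition Uplinger mentions; it only shows why a handle that outlives its task could produce a chain of failures.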