| World Community Grid Forums
Thread Status: Active Total posts in this thread: 9
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
About two weeks ago my 32-bit Ubuntu 10.04 system started failing virtually all WCG tasks. It's not quite all of them (a CFSW task currently seems to be running fine), but most tasks from a variety of projects fail instantly. The system had previously been running them successfully for many months. The error is always the same:
<core_client_version>6.10.17</core_client_version>
Using strace, I see the task (wcgrid_cep2_6.40_i686-pc-linux-gnu in this case) read from and write to a pipe to its parent a few times, then make a large number (16K or so) of calls like this:
old_mmap(0xbf5fb000, 20480, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x955580009380108) = -1 EFAULT (Bad address)
All of these calls fail with EFAULT, after which the task seems to give up and signal its parent.
Any ideas? I suspect a software update of some kind, but I have no idea which one. I have disabled AV software. |
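For anyone wanting to confirm how many calls fail and with which error, here is a minimal sketch that tallies failed system calls in saved strace output. It is only an illustration: the function name `count_failures` and the sample list are made up for this example, the `old_mmap` lines are the one quoted in the post above, and the `write` line is an invented successful call included for contrast.

```python
import re
from collections import Counter

# Matches an strace line for a call that returned -1, for example:
#   old_mmap(...) = -1 EFAULT (Bad address)
# An optional leading PID (as written by "strace -f -o file") is skipped.
FAILED_CALL = re.compile(r"^(?:\d+\s+)?(\w+)\(.*\)\s+=\s+-1\s+([A-Z0-9]+)\s+\(")


def count_failures(lines):
    """Return a Counter of (syscall name, errno name) for failed calls."""
    counts = Counter()
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical sample: two copies of the failing call from the post and one
# invented successful call that must not be counted.
MMAP_LINE = (
    "old_mmap(0xbf5fb000, 20480, PROT_READ|PROT_WRITE|PROT_EXEC, "
    "MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x955580009380108) = -1 EFAULT (Bad address)"
)
sample = [MMAP_LINE, 'write(4, "...", 12) = 12', MMAP_LINE]

failures = count_failures(sample)
for (syscall, err), n in failures.most_common():
    print(syscall, err, n)
```

To use it on a real trace, save the output with something like `strace -f -o trace.txt <command>` and pass `open("trace.txt")` to `count_failures`.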
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Have you tried a reboot?
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
Yes, rebooted many times.
Thanks |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Suggest you do a project reset, to force re-download of all the science app components.
The latest discussion of this error was in 2009... 3 years ago: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,27915 Maybe it will give you more pointers where to look. |
||
|
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
"Suggest you do a project reset, to force re-download of all the science app components."
Thanks, but unfortunately that didn't help. Same symptoms as before.
"The latest discussion of this error was in 2009... 3 years ago: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,27915 Maybe it will give you more pointers where to look."
I have seen that thread. My laptop isn't overclocked. I can't easily try upgrading Ubuntu at the moment, as it's a work machine, so I upgrade when I'm told ;-) WCG was working on this machine before, though, and there have been no major updates recently (although there are regular minor updates that may be the cause). I guess I could go through each library it has linked to, see if any of those have changed, and then try downgrading each in turn to see if it helps...
Thanks! |
||
|
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
"I guess I could go through each library it has linked to and see if any of those have changed then try downgrading each in turn to see if it helps..."
It turns out the task binary is statically linked, so that got me nowhere. I'll try building my own BOINC client so I can get the latest code. I don't know of a good reason why that would help, but I am several versions behind... |
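One way to check a binary for static linking without extra tools is to look at its ELF program headers: a dynamically linked executable normally carries a PT_INTERP segment naming the dynamic loader. The sketch below is a heuristic under that assumption, not a definitive test, and the function names `program_header_types` and `is_probably_static` are made up for this example.

```python
import struct

PT_DYNAMIC = 2  # segment holding dynamic linking information
PT_INTERP = 3   # segment naming the dynamic loader


def program_header_types(data):
    """Return the p_type of each program header in an ELF image (bytes)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if is_64bit:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    # p_type is the first 4-byte field of every program header entry.
    return [
        struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]


def is_probably_static(data):
    """Heuristic: no PT_INTERP segment suggests a statically linked binary."""
    return PT_INTERP not in program_header_types(data)
```

For a file on disk, something like `is_probably_static(open(path, "rb").read())` would apply it, where `path` points at the task executable (for example wcgrid_cep2_6.40_i686-pc-linux-gnu in the project directory).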
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Since an EFAULT error is a bad address, maybe try memtest to see if there is a hardware failure in memory. Just a thought.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
"Since an EFAULT error is a bad address, maybe try memtest to see if there is a hardware failure in memory. Just a thought."
Thanks, I'll give that a go, but I'm fairly sure it's the address passed to mmap that isn't valid and causes the mmap to fail. If it were dodgy memory, it's more likely the mmap would succeed and the program would segfault later. Still, I have no other ideas at the moment, so it can't hurt!
Thanks |
||
|
|
cqexbesd
Cruncher Joined: Oct 13, 2008 Post Count: 14 Status: Offline Project Badges:
|
"Since an EFAULT error is a bad address, maybe try memtest to see if there is a hardware failure in memory. Just a thought."
"Thanks, I'll give that a go but I'm fairly sure its the address passed to mmap that isn't valid and causes the mmap to fail."
I ran memtest for more than 14 hours without it detecting any failures. I'm fairly sure this is a software problem of some sort, but that's about the end of my insight... |
||
|
|
|