| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 18
|
|
| Author |
|
|
Luke081515
Cruncher Joined: Apr 16, 2017 Post Count: 8 Status: Offline Project Badges:
|
Normally I would not worry about the expected runtime, but if I look at the results, the difference is clear. Before the crash, I had these statistics (just for this instance):
https://www.dropbox.com/s/ta46rhxccac3kjo/before.PNG?dl=0 After the crash, and still now, I get this: https://www.dropbox.com/s/xl6m89uy1101kkz/after.PNG?dl=0 There is no strong load on the host: if I turn BOINC off, the load is less than 4 on xenial (the instance has 16 cores). |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Please post a screenshot of task manager from the "processes" tab. That may tell us something.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Luke081515
Cruncher Joined: Apr 16, 2017 Post Count: 8 Status: Offline Project Badges:
|
I tried another way too: I removed and purged both with apt, then installed only the client and ran it via boinccmd. I did this two days ago. It causes a lot of load, but in fact the whole instance has returned no results since 02.06.17. Running via boinccmd only works fine on my second server, which has 4 cores and a constant load of 3.00.
I took two screenshots. This one is with BOINC running (high load, since it is running via boinccmd only; otherwise the load would normally be between 12 and 14): https://www.dropbox.com/s/nti91q63mj3cjll/Boinc-running.PNG?dl=0 This one was taken after suspending BOINC for 15 minutes: https://www.dropbox.com/s/4e53h76hk0m4qad/Boinc-paused.PNG?dl=0 |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I see several anomalies in the first screenshot.
1. I count 17 WUs started on what looks like a 16-core machine.
2. Processor 10 is dead for some reason, which means only 15 can run.
3. /usr/bin/boinc is using too much CPU time for some reason, which will impact the other work units.
Second screenshot: there are about 5 tasks that are using 100% CPU, including /usr/bin/boinc (that task shouldn't use that much). That means when you resume, the BOINC tasks will be impacted by those other tasks, which extends the runtime.
----------------------------------------
[Edit 1 times, last edit by Doneske at Jun 8, 2017 2:04:35 AM] |
||
|
|
Luke081515
Cruncher Joined: Apr 16, 2017 Post Count: 8 Status: Offline Project Badges:
|
Concerning:
1) When I ran BOINC with the GUI, I configured it to run only 12 tasks at once; that returned no results either.
2) It's not really dead. If you watch htop, most of the time one CPU is not running, but it is not always the same one. There are also moments where all CPUs are at 100%.
3) Any idea how I can fix that? I already tried reinstalling it several times.
Concerning screenshot two: that can happen for a second, but mostly the instance has nearly no load. Currently the load average is 0.03 0.07 0.02. |
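The load figures quoted in this thread are easier to compare once they are divided by the core count. Below is a minimal sketch of that arithmetic, not something any poster actually ran. The function names and the 0.7 "busy" cut-off are assumptions chosen for illustration only.

```python
import os


def load_per_core(load_avg, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def describe(load_avg, cores, busy_threshold=0.7):
    """Summarise a load average; busy_threshold is an arbitrary, assumed cut-off."""
    ratio = load_per_core(load_avg, cores)
    state = "busy" if ratio >= busy_threshold else "mostly idle"
    return f"load {load_avg:.2f} on {cores} cores = {ratio:.2f} per core ({state})"


if __name__ == "__main__":
    # os.getloadavg() is Unix-only and returns the 1-, 5- and 15-minute averages.
    if hasattr(os, "getloadavg"):
        one_minute, _, _ = os.getloadavg()
        print(describe(one_minute, os.cpu_count() or 1))
```

With the numbers from the posts, a load of 14 on 16 cores is close to full use, a load of 3.00 on the 4-core server is 0.75 per core, and 0.03 on 16 cores is essentially idle.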
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I'm still not sure there isn't something wrong with the machine. Your htop screenshots don't look like mine. I ran htop for 15 minutes and it never showed htop using 100% of the CPU. Do you know why the machine crashed? I had one issue with Linux where the 4.8 kernel didn't work well with one model of Xeon processor. It wouldn't dispatch work on the CPU correctly, but this was only noticeable because WCG tasks normally use 100% of the processors and they were only using about 30%. I just continued to run on the 4.4 kernel for a while. Finally, a firmware update came out along with the 4.8-52 version of the kernel. After that it was back to normal again. I suspect that somewhere in the earlier 4.8 kernel builds some firmware was dropped and then finally put back in the later builds. This was also about the time Intel was submitting updates for Kaby Lake. Since this problem cropped up after a crash, I suspect there may be some residual effects somewhere.
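To see whether a host is on one of the kernel builds discussed here, the running release string can be compared numerically. This is only a sketch: it assumes Ubuntu-style release strings such as "4.8.0-52-generic" (the post only says "4.8-52"), and the helper names are hypothetical.

```python
import platform
import re


def parse_kernel(release):
    """Turn a release string such as '4.8.0-52-generic' into a tuple of ints."""
    numbers = re.findall(r"\d+", release)
    # Keep at most four components: major, minor, patch, build (assumed layout).
    return tuple(int(n) for n in numbers[:4])


def is_older_than(release, minimum):
    """True if `release` sorts before `minimum` when compared numerically."""
    return parse_kernel(release) < parse_kernel(minimum)


if __name__ == "__main__":
    running = platform.release()
    print(running, parse_kernel(running))
```

Comparing tuples of integers avoids the trap of plain string comparison, where "4.10" would sort before "4.8".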
|
||
|
|
Luke081515
Cruncher Joined: Apr 16, 2017 Post Count: 8 Status: Offline Project Badges:
|
Hm. I've reinstalled the server. Now BOINC (via boinccmd) and the other processes produce a load of about 14, and it is likely returning the same amount of results as before (I can't say for sure, since BOINC has not been running for more than 24 hours yet). What strange behaviour...
|
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Maybe SELinux or whatever security software is excessively quizzing the BOINC processes. BOINC needs localhost IP 127.0.0.1 and port 31416 (and an assorted number of other IP ports) to run its RPC (yes, local too). In amongst this is the in-between of the core client/daemon and the BOINC Manager GUI. At any rate, I've no BOINC issues on Ubuntu.
PS: on a well-configured system, the core client (boinc) takes but a few minutes a day to do its business.
----------------------------------------
[Edit 2 times, last edit by SekeRob* at Jun 16, 2017 2:24:58 PM] |
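One quick way to test whether something is blocking the local RPC port is to try a plain TCP connection to it. The sketch below assumes the client listens on port 31416 on the loopback interface, as mentioned in this post. The function name is hypothetical, and a successful connect only shows that the port accepts connections, not that the BOINC RPC itself works.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 31416 is the port named in the post; "localhost" is an assumption.
    print("BOINC RPC port reachable:", port_is_open("localhost", 31416))
```

If this prints False while the client is running, a firewall or security module interfering with local connections would be one thing to look at.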
||
|
|
|