| World Community Grid Forums
Thread Status: Active Total posts in this thread: 8
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
I have been running "htop" lately, in "tree view" (F5 in htop), to look at the processes running on my Linux box. Normally, when the main BOINC client starts a science application, the process name of the science application is long. Simplified example of the usual output:
PID Command
But today, I saw it do something like this a couple of times.
PID Command
Does this mean that something got corrupted in BOINC, and I should either reset the project or restart my computer? Or is it safe just to let everything continue to run (including the new single-validation/zero-redundancy OpenPandemics jobs)?
----------------------------------------
[Edit 1 times, last edit by jonathandl at Jun 4, 2020 4:44:39 AM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
You don't reset unless you see such warnings in the event log.
Why the output deviated, I don't know. Maybe the task was paused, but again, that is something you would see in BOINC Manager. If you're running headless, the event log is written to the stdoutdae.txt file. There's an all-in-one BOINC manager in text format, written especially for the terminal, called boinctui: https://zoomadmin.com/HowToInstall/UbuntuPackage/boinctui
||
|
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline Project Badges:
|
I'm a fair beginner to Linux, and wow, I never knew htop had a Tree view. My mind is blown now -- very useful!
I wouldn't worry too much, honestly. I've seen similar things in certain versions of Windows 10 Task Manager, where some tasks are grouped under BOINC and some are not. If you're looking for a deeper, more technical root cause of why you're seeing that, maybe one of the *nix wizards will post in this thread. I personally wouldn't worry at all unless the Event Log shows errors or warnings, or multiple tasks start erroring out.
----------------------------------------
[Edit 2 times, last edit by hchc at Jun 4, 2020 5:50:53 AM] |
||
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
"You don't reset unless you see such warnings in the event log."
What kind of warning in the event log merits a reset? I used to reset a lot more, but now I try not to do it at all.
In any case, I have a new event-log problem with the official GUI BOINC Manager. (And I prefer to use the official software, even though htop is "unofficial.")
Just recently, I got about 6 jobs, and I noticed that one of them, OPN1_0001555_01743_0, is a job waiting for me to validate somebody else's result. I decided to do that one first, so in BOINC Manager I suspended the project, then suspended all jobs except OPN1_0001555_01743. Then I resumed the project, but guess what?
First, when I scroll all the way down in the event log viewer, my screen flickers rapidly, almost as if my mouse were fighting something. Second, the event log says "task MCM1_0163967_9483_0 resumed by user." The Tasks tab of the BOINC Manager GUI does show that OPN1_0001555_01743 is running and MCM1_0163967_9483_0 is suspended, but the event log clearly said that MCM1_0163967_9483_0 is the one that was resumed.
So I shut down my computer and powered it back on. Now the correct task is showing as running in BOINC Manager, as before. But the event log still exhibits the rapid "flickering" or "shaking" behavior when I scroll all the way to the bottom! And when I use "systemctl" to get the last few lines of the log in a Terminal window, there is no indication of the current, correct task OPN1_0001555_01743, as there should be.
systemctl status boinc-client
● boinc-client.service - Berkeley Open Infrastructure Network Computing Client
   Loaded: loaded (/lib/systemd/system/boinc-client.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-06-04 09:19:13 EDT; 6h ago
     Docs: man:boinc(1)
 Main PID: 1490 (boinc)
    Tasks: 4 (limit: 4915)
   Memory: 139.1M
   CGroup: /system.slice/boinc-client.service
           ├─1490 /usr/bin/boinc
           └─1746 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.17_x86_64-pc-linux-gnu -jobs OPN1_0001555_01743.job -input OPN1_0001555_01743.zip -seed 803319668 -wcgruns 34 -wcgdpf 1

Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] don't use GPU while active
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] suspend work if non-BOINC CPU load exceeds 88%
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Setting up project and slot directories
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Checking active tasks
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 5852410; resource share 120
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Setting up GUI RPC socket
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Checking presence of 174 project files
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 Initialization completed

If WC Grid staff are reading this, the name of the task in trouble is OPN1_0001555_01743.
Since it's just to validate work already done by somebody else, there is not too much cause for concern, but if this were to happen to a single-validation unit then maybe I really should reset? [Edit 3 times, last edit by jonathandl at Jun 4, 2020 8:44:45 PM] |
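A side note on the check described above: "systemctl status" prints only the most recent journal lines for the unit (ten by default), and here those are all startup messages from 09:19:15. Searching the unit's whole journal for the task name is a more thorough test. Below is a minimal Python sketch, assuming the client runs as the boinc-client systemd unit shown above. The helper names read_unit_log and lines_mentioning are made up for illustration, and the sample lines are copied from the output above.

```python
import subprocess
from typing import List


def read_unit_log(unit: str = "boinc-client") -> List[str]:
    """Return all journal lines for a systemd unit, not only the last few."""
    result = subprocess.run(
        ["journalctl", "-u", unit, "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def lines_mentioning(lines: List[str], task_name: str) -> List[str]:
    """Keep only the log lines that contain the given task name."""
    return [line for line in lines if task_name in line]


# Three of the ten lines from the systemctl output quoted above.
SAMPLE_LOG = [
    "Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: "
    "04-Jun-2020 09:19:15 [---] Setting up project and slot directories",
    "Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: "
    "04-Jun-2020 09:19:15 [---] Checking active tasks",
    "Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: "
    "04-Jun-2020 09:19:15 [---] Checking presence of 174 project files",
]

if __name__ == "__main__":
    # None of the startup lines name the task, so this prints [].
    print(lines_mentioning(SAMPLE_LOG, "OPN1_0001555_01743"))
    # On the machine itself, one could instead run:
    #   lines_mentioning(read_unit_log(), "OPN1_0001555_01743")
```

An empty result on its own does not prove that anything is wrong; it only shows that the client did not write a line containing that task name.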
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
On the flickering question. If it is the flickering with which I am familiar, it means your video card or onboard video is not keeping up with your scroll action. I have not seen this with BOINC, but have experienced it with a few wide spreadsheets with 100,000 plus rows while scrolling.
I suppose it could also be an inadequate refresh rate in your monitor.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"You don't reset unless you see such warnings in the event log."
The warning text in the event log is "If this happens repeatedly you may need to reset the project."...
"If WC Grid staff are reading this, the name of the task in trouble is OPN1_0001555_01743. Since it's just to validate work already done by somebody else, there is not too much cause for concern, but if this were to happen to a single-validation unit then maybe I really should reset?"
There are enough QA checks for the system to know when a single, zero-redundancy result is iffy; in that case a second copy is sent out and your copy is marked "Pending Verification". Anyway, a reset may be necessary only if you see frequent error/invalid results and all hardware and software diagnostics show no cause, but not before. All a reset does is cancel all tasks on the device, fetch a fresh copy of the science application, and pull new tasks. Frequent error/invalid results will actually cause the daily quota of work to be cut all the way down to 1. It does not read like you have that, only some inexplicable variations in how htop lists your processes.
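For anyone who wants to check a saved event log for that warning instead of scrolling through it, here is a rough Python sketch. It assumes each line looks like the client messages quoted earlier in the thread ("04-Jun-2020 09:19:15 [project] message"). The names LogEntry, parse_line and reset_hints are hypothetical, and the second sample line is invented for illustration by combining the quoted warning text with a made-up timestamp and project tag.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Lower-cased fragment of the warning text quoted in the post above.
RESET_HINT = "you may need to reset the project"

# Assumed layout: "DD-Mon-YYYY HH:MM:SS [project] message".
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one event-log line into its parts, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    return LogEntry(*match.groups()) if match else None


def reset_hints(lines: Iterable[str]) -> List[LogEntry]:
    """Return the parsed entries whose message contains the reset warning."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and RESET_HINT in e.message.lower()]


SAMPLE = [
    # Real line from earlier in the thread.
    "04-Jun-2020 09:19:15 [---] Checking active tasks",
    # Hypothetical line: timestamp and project tag are invented.
    "04-Jun-2020 10:00:00 [World Community Grid] If this happens repeatedly "
    "you may need to reset the project.",
]

if __name__ == "__main__":
    for entry in reset_hints(SAMPLE):
        print(entry.timestamp, entry.project)
```

On a headless machine the lines could be read from the stdoutdae.txt file mentioned earlier in the thread; its location depends on the install, so no path is hard-coded here.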
||
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
Probably a different flickering... I got rid of it by enlarging the window.
What I don't understand is why the resumption of the OPN1_0001555_01743 job didn't show up in the event log after I restarted. And it expired out of my "Results status" screen too! But since it was a redundant (dual-validation) task, no worries; I think we can close this thread.
----------------------------------------
[Edit 1 times, last edit by jonathandl at Jun 8, 2020 6:49:27 PM]
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
"What I don't understand is why the resumption of the OPN1_0001555_01743 job didn't show up in the event log after I restarted."
Because it doesn't by default.
In BOINC Manager (boincmgr on Linux), if you click Options → Event Log options, a window opens (BOINC Diagnostic Log Flags). If you enable "task_debug" in there, you'll get to see messages like:
Tue 09 Jun 2020 09:39:29 CEST | | Re-reading cc_config.xml
[Edit 1 times, last edit by adriverhoef at Jun 9, 2020 7:52:07 AM]
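On a headless machine without the Manager GUI, the same flag can be set in the client's cc_config.xml, under the log_flags element. Here is a hedged Python sketch that switches on task_debug in an existing (or empty) configuration. set_log_flag is a hypothetical helper, not part of BOINC, and because the file's location depends on the install, the sketch works on XML text rather than on a fixed path.

```python
import xml.etree.ElementTree as ET


def set_log_flag(xml_text: str, flag: str, enabled: bool = True) -> str:
    """Return cc_config XML with <log_flags><flag> set to 1 or 0.

    Missing <cc_config>, <log_flags> or flag elements are created;
    other settings already present in the document are left alone.
    """
    if xml_text.strip():
        root = ET.fromstring(xml_text)
    else:
        root = ET.Element("cc_config")

    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")

    node = log_flags.find(flag)
    if node is None:
        node = ET.SubElement(log_flags, flag)
    node.text = "1" if enabled else "0"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Starting from an empty file.
    print(set_log_flag("", "task_debug"))
```

After the result is written back to cc_config.xml, the client still has to re-read the file, for example with "boinccmd --read_cc_config"; presumably that is what produces a "Re-reading cc_config.xml" message like the one above.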
||
|
|
|