Thread Status: Active
Total posts in this thread: 8
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Linux question?

I have been running "htop" lately in "tree view" (F5 in "htop") to look at the processes running on my Linux box. Normally, when the main boinc client starts a science application, the process name of the science application is long. Simplified example of the usual output:
PID  Command
1420 boinc
2733 ├ wcgrid_mcm1_map_7.43_x86_64_pc_linux_gnu
2734 │ └ wcgrid_mcm1_map_7.43_x86_64_pc_linux_gnu
1942 ├ wcgrid_opn1_autodock_7.17_x86_64_pc_linux_gnu
1943 │ └ wcgrid_opn1_autodock_7.17_x86_64_pc_linux_gnu
1675 └ boinc

But today, I saw it do something like this a couple of times.
PID  Command
1420 boinc
2733 ├ wcgrid_mcm1_map_7.43_x86_64_pc_linux_gnu
2734 │ └ wcgrid_mcm1_map_7.43_x86_64_pc_linux_gnu
1675 ├ boinc
1942 └ boinc
1943   └ wcgrid_opn1_autodock_7.17_x86_64_pc_linux_gnu

Does this mean that something got corrupted in BOINC, and I should either reset the project or restart my computer? Or is it safe just to let everything continue to run (including the new single-validation/zero-redundancy OpenPandemics jobs)?
----------------------------------------
[Edit 1 times, last edit by jonathandl at Jun 4, 2020 4:44:39 AM]
[Jun 4, 2020 4:44:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux question?

You don't reset unless you see such warnings in the event log.

I don't know why the output differs. Maybe the task was paused, but again, that is something you would see in BOINC Manager.

If you're running headless, the event log is written to the stdoutdae.txt file.

There's an all-in-one, text-mode BOINC manager written especially for the terminal, called boinctui: https://zoomadmin.com/HowToInstall/UbuntuPackage/boinctui

[Jun 4, 2020 5:36:12 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Linux question?

I'm a fair beginner to Linux, and wow, I never knew htop had a tree view. My mind is blown now -- very useful!

I wouldn't worry too much, honestly. I've seen similar things in certain versions of Windows 10 Task Manager, where some tasks are grouped under BOINC and some are not.

If you're looking for a deeper, more technical root cause of why you're seeing that, maybe one of the *nix wizards will post in this thread. I personally wouldn't worry at all unless the Event Log shows errors or warnings, or multiple tasks start erroring out.
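
For anyone who wants to look one level below htop, here is a minimal Python sketch that reads parent PIDs and command names straight from /proc and prints the same kind of tree. It assumes Linux with /proc mounted, and the function names (parse_status, read_processes, build_children, render_tree) are made up for illustration. Two caveats: the "Name" field in /proc/<pid>/status is truncated by the kernel to 15 characters, and this lists processes only, not threads. It is also worth knowing that a freshly forked child keeps its parent's name until it calls exec, so a child that shows up as "boinc" is not by itself a sign of corruption. Whether that explains the listing in the first post is a guess, not something confirmed in this thread.

```python
import os
from collections import defaultdict


def parse_status(text):
    """Extract (name, ppid) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields["Name"], int(fields["PPid"])


def read_processes(proc_root="/proc"):
    """Return {pid: (name, ppid)} for every process visible under /proc."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                procs[int(entry)] = parse_status(fh.read())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return procs


def build_children(procs):
    """Map each parent PID to a sorted list of its child PIDs."""
    children = defaultdict(list)
    for pid, (_name, ppid) in procs.items():
        children[ppid].append(pid)
    for kids in children.values():
        kids.sort()
    return children


def render_tree(procs, root, children=None, depth=0):
    """Return indented 'PID name' lines for root and all its descendants."""
    if children is None:
        children = build_children(procs)
    lines = [f"{'  ' * depth}{root} {procs[root][0]}"]
    for child in children.get(root, []):
        lines.extend(render_tree(procs, child, children, depth + 1))
    return lines
```

To try it, call procs = read_processes() and then print("\n".join(render_tree(procs, pid))) with the PID of the main boinc process (1420 in the example in the first post).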
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 2 times, last edit by hchc at Jun 4, 2020 5:50:53 AM]
[Jun 4, 2020 5:45:36 AM]
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Re: Linux question?

Quote: "You don't reset unless you see such warnings in the event log."

What kind of warning in the event log merits a reset? I used to reset a lot more, but now I try not to do it at all.

In any case, I have a new event-log problem with the official GUI BOINC Manager. (I prefer to use the official software, even though htop is "unofficial.") I recently received about six jobs, and I noticed that one of them, OPN1_0001555_01743_0, is waiting for me to validate somebody else's result. I decided to do that one first, so in BOINC Manager I suspended the project, then suspended all jobs except OPN1_0001555_01743. Then I resumed the project, but guess what? First, when I scroll all the way down in the event log viewer, my screen flickers rapidly, almost as if my mouse were fighting something. Second, the event log says "task MCM1_0163967_9483_0 resumed by user."

I looked in the Tasks tab of the BOINC Manager GUI, and it does show that OPN1_0001555_01743 is running and MCM1_0163967_9483_0 is suspended, but the event log clearly said that MCM1_0163967_9483_0 is the one that was resumed.
So I shut down my computer and powered it back on. Now the correct task is showing as running in BOINC Manager, as before. But the event log still exhibits the rapid "flickering" or "shaking" behavior when I scroll all the way to the bottom! And when I use "systemctl" to get the last few lines of the log in a terminal window, there is no indication of the current, correct task OPN1_0001555_01743, as there should be.

systemctl status boinc-client
● boinc-client.service - Berkeley Open Infrastructure Network Computing Client
     Loaded: loaded (/lib/systemd/system/boinc-client.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2020-06-04 09:19:13 EDT; 6h ago
       Docs: man:boinc(1)
   Main PID: 1490 (boinc)
      Tasks: 4 (limit: 4915)
     Memory: 139.1M
     CGroup: /system.slice/boinc-client.service
             ├─1490 /usr/bin/boinc
             └─1746 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.17_x86_64-pc-linux-gnu -jobs OPN1_0001555_01743.job -input OPN1_0001555_01743.zip -seed 803319668 -wcgruns 34 -wcgdpf 1

Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] don't use GPU while active
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] suspend work if non-BOINC CPU load exceeds 88%
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Setting up project and slot directories
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Checking active tasks
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 5852410; resource share 120
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Setting up GUI RPC socket
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 [---] Checking presence of 174 project files
Jun 04 09:19:15 ProLiant-MicroServer-Gen10 boinc[1490]: 04-Jun-2020 09:19:15 Initialization completed

If WC Grid staff are reading this, the name of the task in trouble is OPN1_0001555_01743. Since it's just to validate work already done by somebody else, there is not too much cause for concern, but if this were to happen to a single-validation unit then maybe I really should reset?
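
A side note on reading this output: "systemctl status" prints only the last ten journal lines by default, while "journalctl -u boinc-client" can go further back. The Python sketch below is one way to pull more lines and split each one into its parts. The regular expression is written only against the line layout shown above, so treat it as an assumption rather than a documented format, and the function names are hypothetical. Whether the missing task line would show up this way is not something this thread settles.

```python
import re
import subprocess

# Layout assumed from the lines above:
# "<Mon DD HH:MM:SS> <host> <unit>[<pid>]: <DD-Mon-YYYY HH:MM:SS> [<project>] <message>"
JOURNAL_LINE = re.compile(
    r"^(?P<syslog_time>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?P<boinc_time>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"(?:\[(?P<project>[^\]]*)\] )?"
    r"(?P<message>.*)$"
)


def parse_journal_line(line):
    """Split one journal line into a dict, or return None if it does not match."""
    match = JOURNAL_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    # In the output above, "---" stands in for "no particular project".
    if record["project"] == "---":
        record["project"] = None
    return record


def read_unit_journal(unit="boinc-client", lines=200):
    """Return the last `lines` journal lines for a systemd unit."""
    result = subprocess.run(
        ["journalctl", "-u", unit, "-n", str(lines), "--no-pager", "-o", "short"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def messages_mentioning(lines, needle):
    """Return the parsed records whose message contains `needle`."""
    records = (parse_journal_line(line) for line in lines)
    return [r for r in records if r is not None and needle in r["message"]]
```

For example, messages_mentioning(read_unit_journal(), "OPN1_0001555_01743") would list every journal message among the last 200 that names that task.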
----------------------------------------
[Edit 3 times, last edit by jonathandl at Jun 4, 2020 8:44:45 PM]
[Jun 4, 2020 7:44:09 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Linux question?

On the flickering question. If it is the flickering with which I am familiar, it means your video card or onboard video is not keeping up with your scroll action. I have not seen this with BOINC, but have experienced it with a few wide spreadsheets with 100,000 plus rows while scrolling.
I suppose it could also be an inadequate refresh rate in your monitor.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jun 4, 2020 10:23:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux question?

Quote: "You don't reset unless you see such warnings in the event log."
The warning text in the event log is "If this happens repeatedly you may need to reset the project."
...
Quote: "If WC Grid staff are reading this, the name of the task in trouble is OPN1_0001555_01743. Since it's just to validate work already done by somebody else, there is not too much cause for concern, but if this were to happen to a single-validation unit then maybe I really should reset?"

There are enough QA checks for the system to know when a single, zero-redundancy result is iffy; in that case a second copy is sent out and your copy is marked "Pending Verification".
Anyway, a reset may be necessary only if you see frequent error/invalid results and all hardware and software diagnostics show no cause, but not before. All it does is cancel all tasks on the device, fetch a fresh copy of the science application, and pull new tasks.

Frequent error/invalid results will actually cause the daily quota of work to be cut all the way down to 1. It does not read like you have that, only some inexplicable variations in how htop lists your processes.
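
If you would rather check for that warning than scroll for it, a small Python sketch like the one below can scan a saved event log. It only looks for the warning text quoted above. The log path is left to the caller because it depends on how BOINC was installed (the stdoutdae.txt file mentioned earlier in the thread is one candidate), and the function names are made up for illustration.

```python
# Warning text quoted earlier in this thread, lowercased for matching.
RESET_HINT = "you may need to reset the project"


def find_reset_warnings(lines):
    """Return (line_number, text) for each log line containing the reset hint."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if RESET_HINT in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_reset_warnings(fh)
```

An empty list from scan_log_file(path) means the warning never appeared in that file, which by the reasoning above means there is no reason to reset.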
[Jun 5, 2020 6:22:05 AM]
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Re: Linux question?

Probably a different flickering... I got rid of it by enlarging the window. :)
What I don't understand is why the resumption of the OPN1_0001555_01743 job didn't show up in the event log after I restarted. And it expired out of my "Results status" screen too! But since it was a redundant (dual-validation) task, no worries; I think we can close this thread.
----------------------------------------
[Edit 1 times, last edit by jonathandl at Jun 8, 2020 6:49:27 PM]
[Jun 8, 2020 6:48:51 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Status: Offline
Re: Linux question?

Quote: "What I don't understand is why the resumption of the OPN1_0001555_01743 job didn't show up in the event log after I restarted."
Because it doesn't by default.


In BOINC Manager, boincmgr on Linux, if you click Options → Event Log options,
a window opens (BOINC Diagnostic Log Flags).
If you enable "task_debug" in there, you'll get to see messages like:
Tue 09 Jun 2020 09:39:29 CEST |  | Re-reading cc_config.xml
Tue 09 Jun 2020 09:39:29 CEST | | log flags: file_xfer, sched_ops, task, task_debug
Tue 09 Jun 2020 09:39:32 CEST | World Community Grid | task HST1_306457_000082_KC0026_T350_F00013_S00037_1 resumed by user
Tue 09 Jun 2020 09:39:32 CEST | World Community Grid | [task] task_state=SUSPENDED for ARP1_0008183_013_1 from suspend
Tue 09 Jun 2020 09:39:32 CEST | World Community Grid | [task] task_state=EXECUTING for HST1_306457_000082_KC0026_T350_F00013_S00037_1 from unsuspend
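
Once task_debug is on, lines like these are easy to filter with a script. Here is a minimal Python sketch that splits each event-log line on the "|" separators and pulls out the task-state changes. The pattern is written only against the sample lines above, so it is an assumption about the format rather than a documented one, and the function names are hypothetical.

```python
import re

# Matches messages such as
# "[task] task_state=EXECUTING for <task name> from unsuspend"
TASK_STATE = re.compile(
    r"\[task\] task_state=(?P<state>\w+) for (?P<task>\S+) from (?P<reason>\S+)"
)


def split_event_line(line):
    """Split 'timestamp | project | message' into a 3-tuple, or return None."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = (part.strip() for part in parts)
    return timestamp, project, message


def task_transitions(lines):
    """Return (task, state, reason) for every task_state message in `lines`."""
    transitions = []
    for line in lines:
        fields = split_event_line(line)
        if fields is None:
            continue
        match = TASK_STATE.search(fields[2])
        if match:
            transitions.append(
                (match.group("task"), match.group("state"), match.group("reason"))
            )
    return transitions
```

Fed the five sample lines, task_transitions returns one SUSPENDED entry for the ARP1 task and one EXECUTING entry for the HST1 task, which is the kind of record that was missing from the event log earlier in the thread.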
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jun 9, 2020 7:52:07 AM]
[Jun 9, 2020 7:47:11 AM]