| World Community Grid Forums
Thread Status: Active Total posts in this thread: 33
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Interesting. One task appears as running for 55h 20m, properties report 2h 39m of CPU time, and Windows Task Manager reports 0:01:42 (1 hour 42 minutes, or 1 minute 42 seconds?) of time for that PID. But a task that shows as 99% complete in three hours shows only 0:00:01 CPU time in Task Manager. So it seems that my processes aren't showing what I see in BOINC. Anyway, reducing to a heptacore didn't do anything; I still see the same behaviour.
Hi ashes999. As no one else has asked, can you shut down BOINC and restart it, then copy the first 30+ lines of the event log, up to where the tasks start, and paste them here so we can see what this rig is doing? It might help. Also, have you changed the checkpoint save time in BOINC for some reason?
[Edit 2 times, last edit by Former Member at May 23, 2013 12:44:55 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Since, like CEP2, the VINA jobs launch a new 'job' after each checkpoint, you will never see the total CPU time in the TM, just the time for the one docking. BOINC shows the accumulated CPU time. Time in TM is shown as hh:mm:ss, so when you see 0:12:30 it's 12 minutes 30 seconds.
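To make the hh:mm:ss reading concrete, here is a minimal Python sketch that converts a Task Manager CPU time string to seconds. The function name is made up for illustration, and it assumes the string always has exactly three colon-separated fields.

```python
def task_manager_time_to_seconds(text: str) -> int:
    """Convert a Task Manager CPU time string (h:mm:ss) to whole seconds."""
    # Assumption: exactly three fields, hours:minutes:seconds.
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# 0:12:30 is 12 minutes 30 seconds; 0:01:42 is 1 minute 42 seconds.
print(task_manager_time_to_seconds("0:12:30"))
print(task_manager_time_to_seconds("0:01:42"))
```

Read this way, the 0:01:42 quoted earlier in the thread is 1 minute 42 seconds, not 1 hour 42 minutes.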
It makes absolutely zero sense that, when you reduce from 8 to 7 cores in BOINC, there's still 1 VINA job exhibiting the Elapsed / CPU time discrepancy. Switch on checkpoint logging by adding the <checkpoint_debug> tag to the <log_flags> section of cc_config.xml, and start counting the entries per job. If one never checkpoints, it would never have a log entry. |
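For anyone unsure what that file should look like, a small Python sketch that builds a bare-bones cc_config.xml body with the flag switched on. It assumes the usual layout of a cc_config root element wrapping log_flags, with a value of 1 meaning "on"; the helper name is hypothetical, and an existing cc_config.xml should be edited by hand rather than overwritten.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return a minimal cc_config.xml body with the given log flags set to 1."""
    # Assumption: root element <cc_config> containing a <log_flags> section.
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(["checkpoint_debug"]))
```

The client needs to re-read its config files, or be restarted, before the new flag shows up in the "log flags:" line of the event log.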
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
BTW, you never answered what the 'System idle process' was showing when BOINC ran and used all 8 cores. Also, apropos, I assume that BOINC is still set to use 100% of CPU time.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi ashes999.
If you're using Windows, have you had a look at what the core/thread speeds are under load, with something like CPU-Z, to see if the CPU is slowing down for some reason? I just thought that might account for some of the lost time. Plus, you might want to check your BIOS to see if there is something in there that is slowing it. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
P.P.L, it's only on 1 of 8 threads, PLUS, even if you slow a CPU, the Elapsed time and CPU time still count on the same clock... jobs will just take longer.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
ashes999, is the CPU time for the processor set in BOINC to 100%? The startup log as per P.P.L. is indeed now of interest, as is a piece of log when BOINC is in full swing, with the suggested checkpoint logging activated.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
ashes999.
Another thing: since this is a work PC, is it possible that the IT person/dept has installed some software on the rig that you don't know about and, more than likely, wouldn't be able to see and fiddle with?
[Edit 1 times, last edit by Former Member at May 23, 2013 7:53:54 AM] |
||
|
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges:
|
Hi all,
Thanks for all the feedback and information. The main issue is that this happens sporadically, not all the time; I see usually none, sometimes one, and rarely two WUs in this weird state.
@SekeRob thanks for explaining about the TM stuff, that makes perfect sense now. I've integrated your cc_config.xml changes and set my CPU back to 100% of CPUs and 85% usage; let's see if that causes the problem to emerge. It seemed like switching to 75% (6/8) solved the problem, but it occurs non-deterministically, so I can't say if this is for sure or not.
As for support, I have full permission and full control to do whatever I want to my machine. There are a few rare exceptions, like disabling the firewall. It's very unlikely they installed anything on it, because the machine came to me very, very bare-bones with just an OS.
As requested @P.P.L., here's the first 30-ish lines of my log file. In cc_config.xml, I added checkpoint_debug and cpu_sched_debug. Again, this is with 100% CPUs and 85% utilization.
23/05/2013 10:53:16 AM | | Starting BOINC client version 7.0.64 for windows_x86_64
23/05/2013 10:53:16 AM | | log flags: file_xfer, sched_ops, task, checkpoint_debug, cpu_sched_debug
23/05/2013 10:53:16 AM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
23/05/2013 10:53:16 AM | | Running as a daemon
23/05/2013 10:53:16 AM | | Data directory: D:\Program Files (x86)\BOINC\Data
23/05/2013 10:53:16 AM | | Running under account boinc_master
23/05/2013 10:53:16 AM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
23/05/2013 10:53:16 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes nx lm vmx smx tm2 pbe
23/05/2013 10:53:16 AM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
23/05/2013 10:53:16 AM | | Memory: 15.96 GB physical, 31.91 GB virtual
23/05/2013 10:53:16 AM | | Disk: 186.31 GB total, 172.94 GB free
23/05/2013 10:53:16 AM | | Local time is UTC -4 hours
23/05/2013 10:53:16 AM | | No usable GPUs found
23/05/2013 10:53:16 AM | VolPEx | URL http://volpex.cs.uh.edu/VCP/; Computer ID 7122; resource share 900
23/05/2013 10:53:16 AM | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2346629; resource share 100
23/05/2013 10:53:16 AM | | General prefs: from http://bam.boincstats.com/ (last modified 27-Dec-2012 11:32:09)
23/05/2013 10:53:16 AM | | Host location: none
23/05/2013 10:53:16 AM | | General prefs: using your defaults
23/05/2013 10:53:16 AM | | Reading preferences override file
23/05/2013 10:53:16 AM | | Preferences:
23/05/2013 10:53:16 AM | | max memory usage when active: 8169.22MB
23/05/2013 10:53:16 AM | | max memory usage when idle: 14704.59MB
23/05/2013 10:53:16 AM | | max disk usage: 100.00GB
23/05/2013 10:53:16 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
23/05/2013 10:53:16 AM | | [cpu_sched_debug] Request CPU reschedule: Prefs update
23/05/2013 10:53:16 AM | | [cpu_sched_debug] Request CPU reschedule: Startup
23/05/2013 10:53:16 AM | | Not using a proxy
23/05/2013 10:53:16 AM | | [cpu_sched_debug] Request CPU reschedule: Idle state change
23/05/2013 10:53:16 AM | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
23/05/2013 10:53:16 AM | | [cpu_sched_debug] schedule_cpus(): start
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0584_0 (CPU job, priority order) (prio -1.000000)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0624_0 (CPU job, priority order) (prio -1.005208)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0778_0 (CPU job, priority order) (prio -1.010417)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0449_0 (CPU job, priority order) (prio -1.015625)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0011_0 (CPU job, priority order) (prio -1.020833)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0673_0 (CPU job, priority order) (prio -1.026042)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0708_0 (CPU job, priority order) (prio -1.031250)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0156_0 (CPU job, priority order) (prio -1.036458)
23/05/2013 10:53:16 AM | | [cpu_sched_debug] enforce_schedule(): start
23/05/2013 10:53:16 AM | | [cpu_sched_debug] preliminary job list:
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 0: SN2S_Smp102070_0000095_0584_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 1: SN2S_Smp102070_0000095_0624_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 2: SN2S_Smp102070_0000095_0778_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 3: SN2S_Smp102070_0000095_0449_0 (MD: no; UTS: yes)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 4: SN2S_Smp102070_0000095_0011_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 5: SN2S_Smp102070_0000095_0673_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 6: SN2S_Smp102070_0000095_0708_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 7: SN2S_Smp102070_0000095_0156_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | | [cpu_sched_debug] final job list:
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 0: SN2S_Smp102070_0000095_0584_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 1: SN2S_Smp102070_0000095_0624_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 2: SN2S_Smp102070_0000095_0778_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 3: SN2S_Smp102070_0000095_0449_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 4: SN2S_Smp102070_0000095_0011_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 5: SN2S_Smp102070_0000095_0673_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 6: SN2S_Smp102070_0000095_0708_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] 7: SN2S_Smp102070_0000095_0156_0 (MD: no; UTS: no)
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0584_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0624_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0778_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0449_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0011_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0673_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0708_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0156_0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0584_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0624_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0778_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0449_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0011_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0673_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0708_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0156_0 sched state 1 next 2 task state 0
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0584_0 using sn2s version 620 in slot 2
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0624_0 using sn2s version 620 in slot 11
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0778_0 using sn2s version 620 in slot 12
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0449_0 using sn2s version 620 in slot 6
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0011_0 using sn2s version 620 in slot 5
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0673_0 using sn2s version 620 in slot 9
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0708_0 using sn2s version 620 in slot 13
23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0156_0 using sn2s version 620 in slot 14
23/05/2013 10:53:16 AM | | [cpu_sched_debug] enforce_schedule: end
|
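Counting the entries per job, as suggested earlier in the thread, can be scripted once the log is saved to a text file. Below is a rough Python sketch that assumes the "timestamp | project | message" layout seen in the log above. The function name is made up, and since the exact text that checkpoint_debug writes does not appear in this thread, the tag to search for is left as a parameter; the demo uses the [cpu_sched_debug] lines that are shown.

```python
def count_entries_per_task(lines, tag, task_names):
    """Count log lines whose message contains both `tag` and a task name."""
    counts = {name: 0 for name in task_names}
    for line in lines:
        # Layout assumed from the posted log: "timestamp | project | message".
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        if tag not in message:
            continue
        for name in task_names:
            if name in message:
                counts[name] += 1
    return counts


sample = [
    "23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0584_0",
    "23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] SN2S_Smp102070_0000095_0584_0 sched state 1 next 2 task state 0",
    "23/05/2013 10:53:16 AM | World Community Grid | Restarting task SN2S_Smp102070_0000095_0584_0 using sn2s version 620 in slot 2",
    "23/05/2013 10:53:16 AM | World Community Grid | [cpu_sched_debug] scheduling SN2S_Smp102070_0000095_0624_0",
    "23/05/2013 10:53:16 AM | | [cpu_sched_debug] enforce_schedule: end",
]
tasks = [
    "SN2S_Smp102070_0000095_0584_0",
    "SN2S_Smp102070_0000095_0624_0",
    "SN2S_Smp102070_0000095_0778_0",
]
result = count_entries_per_task(sample, "[cpu_sched_debug]", tasks)
for name, count in result.items():
    print(name, count)
```

A task that stays at zero for the checkpoint tag while its neighbours keep counting up would be the one that never checkpoints.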
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The 85% utilization is the potential issue [it has been in the past]... it's a meaningless setting for desktops/servers anyhow, which is why I proposed 100% CPU time. 85% translates to running for 17/20ths of a unit of time, which is counted in whole seconds, and pausing for 3/20ths [to cool down, which is meant for laptops]. Even for laptops, 85% is ineffective anyway; only something like 50% [WCG default] will let the client run 1 second, pause 1 second, to prevent CPU fan oscillation.
I'd take the cpu_sched_debug out... it generates lots of output with little value in this situation. |
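The 17/20 figure is just the percentage reduced to its lowest terms. A small Python sketch of that arithmetic follows; the function name is hypothetical, and it only illustrates the fraction, assuming the run/pause cycle is the smallest one expressible in whole seconds. It makes no claim about how the client actually slices time internally.

```python
from fractions import Fraction


def run_pause_cycle(cpu_time_percent: int):
    """Return (run_seconds, pause_seconds, period_seconds) for a CPU time setting."""
    # Assumption: the shortest whole-second cycle that matches the percentage.
    share = Fraction(cpu_time_percent, 100)
    run = share.numerator
    period = share.denominator
    return run, period - run, period


print(run_pause_cycle(85))   # 17 s running, 3 s paused, out of every 20 s
print(run_pause_cycle(50))   # 1 s running, 1 s paused, out of every 2 s
print(run_pause_cycle(100))  # never paused
```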
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
P.S. Please edit the opening post and specify [Some VINA based WU take forever and don't checkpoint] as I did to this post. As is evident, some readers misunderstand it as being a broader problem.
|
||
|
|
|