Thread Status: Active
Total posts in this thread: 33
This topic has been viewed 142075 times and has 32 replies
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

I noticed more suspicious behaviour. After restarting, tasks that showed a huge elapsed time (e.g. 2 hours) but a small checkpoint time (e.g. 3 minutes) revert to showing the same value for both.

I have tried granting full control of the data directory to Everyone, as well as to all the BOINC users. I will monitor and keep you posted.

This seems like a recurring problem. :(
----------------------------------------

[May 15, 2013 4:50:53 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

I'm still seeing this problem.

Is it simply that BOINC sees eight threads, but one of them rarely gets scheduled?

Is there a side-effect I should see in my results? Errors?
----------------------------------------

[May 16, 2013 10:31:45 AM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

Anyone? @SekeRob?
----------------------------------------

[May 18, 2013 2:54:27 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
Re: WUs take forever and don't checkpoint

On an individual basis, you could suspend the WU and then, after another WU has started, resume it. I do not know why this works, but it has for me on the rare occasions when I have discovered a WU stuck for an unknown reason. For me it was FAAH units which were at 50+ hours of run time when they would normally finish in under 10 hours. This has only happened on one machine, which is configured identically to another machine: same hardware, same OS, same profile. The situation has been rare enough to handle on an individual basis, so I have not mentioned it before. Also, when the problem WU resumes, it no longer shows the huge amount of CPU time, but a much smaller number more in line with the average.
My only other suggestion is to back off one thread at a time and see if the problem still occurs with fewer than 8 threads running.
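
If doing the suspend/resume by hand gets tedious, it could probably be scripted. Here is a rough Python sketch that drives `boinccmd --task <url> <task_name> suspend|resume`. The function names, the 60-second wait and the injectable `run`/`sleep` parameters are my own assumptions for illustration, not anything BOINC ships, and I have not tried it against a stuck WU.

```python
import subprocess
import time


def task_command(project_url, task_name, op):
    """Build the boinccmd argument list for one operation on one task."""
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return ["boinccmd", "--task", project_url, task_name, op]


def bounce_task(project_url, task_name, wait_seconds=60,
                run=subprocess.run, sleep=time.sleep):
    """Suspend a task, wait, then resume it.

    wait_seconds is a guess at how long another WU needs to start;
    run and sleep are parameters so the function can be dry-run.
    """
    run(task_command(project_url, task_name, "suspend"), check=True)
    sleep(wait_seconds)
    run(task_command(project_url, task_name, "resume"), check=True)
```

The project URL and task name have to match what `boinccmd --get_tasks` reports for the stuck WU.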
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[May 18, 2013 3:23:02 AM]
coolstream
Senior Cruncher
SCOTLAND
Joined: Nov 8, 2005
Post Count: 475
Status: Offline
Re: WUs take forever and don't checkpoint

@ashes999, I understand what you are saying about granting everything in the BOINC folders a free hand with regard to AV scanning. However, since you are still in the first phase of diagnostics, I think it might be better to put AV scanning in a more 'locked down' mode to rule out any possible subversive sinister activities ;)
----------------------------------------

Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY.
[May 18, 2013 4:32:20 AM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

@coolstream it's a work PC and I don't have direct control over disabling the AV. I doubt it's that, since this PC has the same AV as my previous PC.

@Sgt.Joe it's frequent enough that I can't monitor it by hand.

What do I do now?
----------------------------------------

[May 18, 2013 2:58:24 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WUs take forever and don't checkpoint

"What do I do now?"

As before, investigate what is eating the real CPU time on that 8th thread, or accept that it's a freebie, courtesy of the boss [who has surely authorized the running of this software directly or through the IT department], set BOINC to 87.5% use of processors [7 threads, as suggested before], and see if the eating persists [Process Explorer, a free Sysinternals/MS tool, is very good at revealing what else runs **]. For sure the task is getting 100% of wallclock [elapsed], which means the process will take any spare cycle it has at its disposal, *after* any other more aggressively prioritized process.

** If nothing else runs, then the spare cycles not going to the science app [very lowest priority and not to be meddled with] go to the System Idle Process. Mine shows 227:04 hours accumulated since last boot on the octo, because I happen to be running BOINC as a heptacore on an octocore CPU. Also, the Task Manager Performance tab will show each core's utilization. I have 6 fully loaded and 2 at about 50/50 [because the Win-OS is very good at spreading load over threads if it can].

edit: heptacore is not me own word-invention ... many hits on Google such as Heptacore Inc. :D
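
For anyone wondering where 87.5% comes from, the arithmetic is just 7 of 8 threads. A tiny Python sketch follows. The function names are made up for illustration, and the rule that the client truncates the product and never drops below one thread is an assumption about how the percentage is applied, not something confirmed in this thread.

```python
def percent_for_threads(wanted, total):
    """Percentage to enter in the 'use at most X% of processors' preference."""
    return 100.0 * wanted / total


def threads_used(total, percent):
    """Threads the client would run at a given percentage.

    Assumption: the product is truncated to a whole number and
    never falls below one.
    """
    return max(1, int(total * percent / 100.0))


print(percent_for_threads(7, 8))  # 87.5
print(threads_used(8, 87.5))      # 7
```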
----------------------------------------
[Edit 1 times, last edit by Former Member at May 18, 2013 4:13:49 PM]
[May 18, 2013 4:10:11 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

@SekeRob I see. I'm going to see if I can find (or write) a tool that will monitor aggregate CPU over a couple of hours and report where it's going.

I'm usually sitting at that computer when it's not idling, but this happens even when I idle. It could be legitimately used up compiling code etc.; it's reasonable that 50% of one CPU goes there.

I'm just skeptical because the checkpoints show virtually nothing (3-10 minutes max) after hours and hours of running. That seems suspicious.

I'll try reducing the number of cores to 87.5% and see what happens.
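
A first rough sketch of the monitoring tool, in Python. It takes a snapshot of cumulative CPU seconds per process, waits, takes another, and ranks processes by the difference. All the function names are hypothetical. `snapshot()` assumes the third-party psutil package is installed, and a process that starts and exits between the two snapshots is missed entirely.

```python
import time


def snapshot():
    """Return {(pid, name): cumulative CPU seconds} for visible processes.

    Assumes the third-party psutil package is installed.
    """
    import psutil
    totals = {}
    for proc in psutil.process_iter(["pid", "name", "cpu_times"]):
        times = proc.info["cpu_times"]
        if times is None:  # access was denied for this process
            continue
        key = (proc.info["pid"], proc.info["name"])
        totals[key] = times.user + times.system
    return totals


def cpu_deltas(before, after):
    """CPU seconds each process consumed between two snapshots, largest first.

    A process missing from `before` is counted from zero.
    """
    deltas = {}
    for key, end in after.items():
        used = end - before.get(key, 0.0)
        if used > 0:
            deltas[key] = used
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


def monitor(duration_s, take_snapshot=snapshot, sleep=time.sleep):
    """Report where CPU time went over duration_s seconds."""
    before = take_snapshot()
    sleep(duration_s)
    return cpu_deltas(before, take_snapshot())
```

Something like `monitor(2 * 3600)[:10]` would then list the ten biggest CPU consumers over two hours.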
----------------------------------------

[May 19, 2013 2:20:19 PM]
NightBlade
Advanced Cruncher
Joined: Jun 10, 2008
Post Count: 89
Status: Offline
Re: WUs take forever and don't checkpoint

Interesting. One task appears as running for 55h 20m, properties report 2h 39m of CPU time, and Windows Task Manager reports 0:01:42 (1 hour 42 minutes, or 1 minute 42 seconds?) of time for that PID.

But a task that shows as 99% complete in three hours shows only 0:00:01 CPU time in Task Manager. So it seems that Task Manager isn't showing what I see in BOINC for my processes.

Anyway, reducing to a heptacore didn't do anything; I still see the same behaviour.
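
To put numbers on the mismatch, here is a quick Python sketch that converts the figures above to seconds and computes how much of the elapsed time was actual CPU time. The helper names are made up, and reading the Task Manager value as h:mm:ss is an assumption on my part.

```python
import re


def parse_hm(text):
    """Convert a string like '55h 20m' to seconds."""
    match = re.fullmatch(r"\s*(\d+)h\s*(\d+)m\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours * 3600 + minutes * 60


def parse_hms(text):
    """Convert '0:01:42' to seconds, assuming the format is h:mm:ss."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


elapsed = parse_hm("55h 20m")  # 199200 seconds
cpu = parse_hm("2h 39m")       # 9540 seconds
print(f"{100 * cpu / elapsed:.1f}%")  # 4.8%
print(parse_hms("0:01:42"))           # 102
```

Under that reading, the task got under 5% of its elapsed time as CPU time, and Task Manager's figure for the PID is 1 minute 42 seconds.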
----------------------------------------

[May 22, 2013 9:51:12 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
Re: WUs take forever and don't checkpoint

Other than checking the temperature I am out of suggestions.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[May 22, 2013 10:35:31 PM]