World Community Grid Forums
Thread Status: Active Total posts in this thread: 53
Lone-Wolf
Cruncher Joined: Apr 10, 2007 Post Count: 33 Status: Offline |
Unfortunately, I'm starting to suspect the motherboard is the cause: I recently changed the chipset fan because it was howling and running erratically.
Temps are great now, but the damage is likely done. :( |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Do you see both processes in Task Manager?
Normally, if a process dies prematurely or stops responding, then BOINC will detect this (it uses a heartbeat mechanism) and kill/restart the process, as well as logging the problem. Just stopping is usually symptomatic of a problem with the science application or work unit. I wasn't aware of any such issue with AutoDock, and it is also unusual that only you are affected (so far). A hardware problem is perfectly possible. Please keep us updated, and we will review this again if further reports come in. |
||
|
Lone-Wolf
Cruncher Joined: Apr 10, 2007 Post Count: 33 Status: Offline |
"Do you see both processes in Task Manager?"
Yes, in Task Manager I can see both processes. When operating properly they each show about 48-50% CPU usage, but currently one shows zero and the other is at 49%. I see no visible clues to the problem other than the counter stopping and Task Manager showing an idle core. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Lone-Wolf,
The Useful Utilities thread in Start Here at http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=2490 offers many useful system diagnostic programs such as CPU-Z, Everest and HotCPU.
Lawrence |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
On my dual core laptop running Linux, I saw a WU today that kept stopping. I had both a DDDT and a HCC WU crunching. The DDDT WU all of a sudden stopped incrementing the CPU time. I checked CPU usage using 'top' and indeed only the HCC WU was still running. I restarted BOINC and the DDDT WU resumed along with the HCC WU. There is something fishy going on. There weren't any messages in the message tab. It just seemed to stop completely as if it crashed. The DDDT app wasn't even present as far as I could tell. I'll be on the lookout for this to see if I can catch it happening again and if I can find any interesting info anywhere else.
|
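The symptom described above (one work unit's CPU time stops incrementing while the other keeps running) can be checked by comparing two CPU-time samples. The following is only a minimal sketch: the function name `find_stalled`, the task names, and the sample values are all hypothetical, and how the CPU times are collected (for example, read by hand from `top`) is left out.

```python
def find_stalled(before, after, min_delta=0.5):
    """Return names of tasks whose CPU time grew by less than min_delta seconds.

    before/after map a task name to cumulative CPU seconds at two sample times.
    A task missing from `after` is reported too, since its process is gone.
    """
    stalled = []
    for name, cpu_before in before.items():
        cpu_after = after.get(name)
        if cpu_after is None or cpu_after - cpu_before < min_delta:
            stalled.append(name)
    return sorted(stalled)


# Hypothetical samples taken 60 seconds apart (values are made up).
sample_1 = {"dddt_wu": 5400.0, "hcc_wu": 3100.0}
sample_2 = {"dddt_wu": 5400.0, "hcc_wu": 3158.7}
print(find_stalled(sample_1, sample_2))  # ['dddt_wu']
```

A task that vanished from the second sample is also flagged, which would match the case where the application is no longer present at all.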
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi saccia,
An idea: check your Preferences to see if RAM limits might have forced BOINC to unload one work unit.
Lawrence |
||
|
Lone-Wolf
Cruncher Joined: Apr 10, 2007 Post Count: 33 Status: Offline |
My scenario is exactly as saccia describes, and I have 2GB of RAM in that machine.
For what it's worth, I have noticed that after a reboot the frozen work unit will run, but often that unit then ends with a "computation error" and a new work unit starts. Although there's much about the innards of a computer that I don't fully grasp, it baffles me a bit that one unit would run fine if I really had a hardware failure. I would be inclined to think that if I cooked the chip, it would either work or not. I may just set that machine to crunch something else and monitor what happens. FWIW, I just checked the temperatures: under full load the CPU is at 35C on both cores and the chip is showing in the same range, so I highly doubt it's a heat issue.
[Edit 1 times, last edit by Lone-Wolf at Nov 11, 2007 1:59:57 AM] |
||
|
Lone-Wolf
Cruncher Joined: Apr 10, 2007 Post Count: 33 Status: Offline |
Update: I've changed the profile on the machine in question to crunch only HPF, and it has been running fine on both cores since.
At this stage I'd have to say I'm a bit suspicious. When the DDD work units first came out I set my crunchers to participate, and I kept noticing that various computers required a reboot, but I didn't have time to investigate. I simply changed them back to HPF and they kept running fine. A couple of weeks ago I thought I'd try DDD again, and my machines seemed to do better this time, except for this one. At the time I attributed that to the chipset fan needing attention. That has since been dealt with, and cooling is now far better than the stock allowance for this equipment: I've beefed it all up but still have not overclocked it. I'll keep monitoring its progress and report back if it freezes again, but it looks promising now. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Check your cc_config.xml file to see whether the default flags were set to >0< instead of >1<. That may explain why we see so few messages other than start/completed/upload.
The flags are:
<task>1</task>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
Cheers.
This FAQ explains the configuration file: How do I configure my client using the cc_config.xml file?
WCG
Please help to make the Forums an enjoyable experience for All! |
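For anyone who would rather check those three flags with a script than by eye, here is a rough sketch. It assumes the flags sit inside a `<log_flags>` element under `<cc_config>`, which the post above does not show; the sample file content and the function name `flags_set_to_zero` are made up for illustration. Flags that are absent from the file are ignored rather than reported.

```python
import xml.etree.ElementTree as ET

# The three flags named in the post above.
WANTED_FLAGS = ("task", "file_xfer", "sched_ops")


def flags_set_to_zero(xml_text, wanted=WANTED_FLAGS):
    """Return the wanted flags that are explicitly set to 0 in the config text."""
    root = ET.fromstring(xml_text)
    off = []
    for flag in wanted:
        # Assumed layout: <cc_config><log_flags><flag>...</flag></log_flags></cc_config>
        node = root.find(f"log_flags/{flag}")
        if node is not None and (node.text or "").strip() == "0":
            off.append(flag)
    return off


# Hypothetical file content, not taken from any real machine.
SAMPLE = """<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>0</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
</cc_config>"""

print(flags_set_to_zero(SAMPLE))  # ['file_xfer']
```

To run it against a real file, read the file's text and pass it in place of `SAMPLE`.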
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Lone-Wolf,
"Update: I've changed the profile on the machine in question to crunch only HPF and it's been running just fine since on both cores. At this stage I'd have to say I'm a bit suspicious."
BOINC lets you set a maximum RAM usage in Preferences. If two processes combined exceed that maximum, one of them will be suspended. Unless you have checked the 'Leave in memory' option, the suspended process is removed from memory, which means it has to start over from its last checkpoint. That is why I am suspicious of this limit. I suppose I was too terse in my previous post mentioning this possibility.
Lawrence |
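The behaviour described in this post can be illustrated with a toy model. This is only a sketch of the description given here, not BOINC's actual scheduling code: the `Task` class, the `enforce_ram_limit` function, the choice of which task gets suspended, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ram_mb: int
    cpu_seconds: float         # progress made so far
    checkpoint_seconds: float  # progress saved at the last checkpoint
    state: str = "running"


def enforce_ram_limit(tasks, max_ram_mb, leave_in_memory):
    """Suspend running tasks, last first, until the running set fits the limit."""
    running = [t for t in tasks if t.state == "running"]
    total = sum(t.ram_mb for t in running)
    while total > max_ram_mb and running:
        victim = running.pop()
        total -= victim.ram_mb
        if leave_in_memory:
            victim.state = "suspended (in memory)"
        else:
            # Unloaded: work since the last checkpoint is lost.
            victim.state = "suspended (unloaded)"
            victim.cpu_seconds = victim.checkpoint_seconds
    return tasks


# Made-up figures: two work units that together exceed a 700 MB limit.
tasks = [Task("hcc", 300, 3100.0, 3000.0), Task("dddt", 500, 5400.0, 5000.0)]
enforce_ram_limit(tasks, max_ram_mb=700, leave_in_memory=False)
for t in tasks:
    print(t.name, t.state, t.cpu_seconds)
```

With 'Leave in memory' off, the suspended unit falls back to its last checkpoint; with it on, the unit keeps its progress and simply waits.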
||
|