Thread Status: Active
Total posts in this thread: 53
This topic has been viewed 6544 times and has 52 replies
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Frozen work units?

My software firewall used to be the cause of this, until I permitted the 4 application components to use the localhost IP 127.0.0.1 and port 31416 unhindered. The Vista FAQ in the Start Here forum (it applies to Windows generally) has a paragraph on this and describes the elements that need an exemption.
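
A quick way to test this from any operating system is to try opening a TCP connection to that address and port. The following is a minimal Python sketch, not part of BOINC; the function name `can_connect` and the 2-second timeout are assumptions chosen for illustration. A failed connection only shows that nothing could be reached on 127.0.0.1:31416; it does not say whether a firewall or a stopped client is the reason.

```python
import socket


def can_connect(host: str = "127.0.0.1", port: int = 31416, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if can_connect():
        print("Connected to 127.0.0.1:31416")
    else:
        print("Could not connect to 127.0.0.1:31416 (client not running, or traffic blocked)")
```

Run it once with the firewall enabled and once with the exemption in place; if the result changes, the firewall rule is the likely culprit.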
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Nov 12, 2007 6:54:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Frozen work units?

My software firewall used to be the cause of this, until I permitted the 4 application components to use the localhost IP 127.0.0.1 and port 31416 unhindered. The Vista FAQ in the Start Here forum (it applies to Windows generally) has a paragraph on this and describes the elements that need an exemption.


Well, I'm on a Linux system so I'm not sure what to look for. Since this is my work laptop, it does have ipchains (or maybe iptables) rules set up, but I think those are mainly for inbound traffic. I've never had any outbound traffic issues, and as it's not a "personal" firewall product it doesn't prompt to allow traffic that gets caught by a rule. This laptop also has Symantec Antivirus (yeah yeah, on a Linux machine... take it up with the silly corporate people), but I've checked and it's not running too much. Those heartbeat issues seemed to coincide with a lot of disk I/O though. Does DDDT write a lot to disk, like the old HDC project that used to write files of several hundred megabytes?
[Nov 12, 2007 7:33:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Frozen work units?

DDDT doesn't do an unusual amount of disk IO (it's AutoDock, the same as FightAIDS@Home).

However, if Symantec is doing a lot of blocking IO and this coincided with a checkpoint, it could very well interrupt the heartbeat.

I don't like Symantec. Is there any way you can reduce the priority, so it doesn't interfere? Any other heavy IO processes going on?

Anyway, losing the heartbeat isn't a fatal error. It should continue from the last checkpoint.
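
To make the idea concrete, here is a toy Python model of a heartbeat timeout. It is only an illustration of the general mechanism and not BOINC's actual implementation; the class name `HeartbeatMonitor` and the 30-second timeout are assumptions. Times are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Toy model of a heartbeat timeout (illustrative only, not BOINC code)."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout   # seconds allowed between heartbeats (assumed value)
        self.last_beat = None    # time of the most recent heartbeat, if any

    def beat(self, now: float) -> None:
        """Record that a heartbeat arrived at time `now` (in seconds)."""
        self.last_beat = now

    def is_alive(self, now: float) -> bool:
        """Return True while the last heartbeat is no older than the timeout."""
        if self.last_beat is None:
            return False
        return (now - self.last_beat) <= self.timeout


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=30.0)
    monitor.beat(now=0.0)
    print(monitor.is_alive(now=10.0))  # heartbeat is still recent
    print(monitor.is_alive(now=45.0))  # heartbeat is stale, e.g. delayed by blocking disk IO
```

In this model, a process that is stalled for longer than the timeout is treated as having lost the heartbeat, even though nothing is wrong with the work itself, which is why restarting from the last checkpoint is enough.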
[Nov 12, 2007 7:55:24 PM]
Diana G.
Master Cruncher
Joined: Apr 6, 2005
Post Count: 3003
Re: Frozen work units?

Sekerob
Added: In Lone_Wolf's log I'm missing, for instance, the upload speeds, unless some other non-standard flag achieves this.


That is what is missing, d'oh! I kept scratching my head, wondering why the messages are sooooo BORING now in 5.10.28. Thanks Sek!

Diana
----------------------------------------

[Nov 12, 2007 11:47:27 PM]
Lone-Wolf
Cruncher
Joined: Apr 10, 2007
Post Count: 33
Re: Frozen work units?

I don't use Symantec and have all my machines set up pretty much the same way.

They use XP Pro, AntiVir PE, CCleaner, Firefox and the built-in Windows firewall, and all my units connect through a router.

Cooling is also beefed up on all my computers well beyond what they really require.

All have Asus motherboards, though in various incarnations.

As for what info goes into my logs I have always installed BOINC and lived by the "run what you got" motto.

I just reformatted a dual core laptop last night and installed the newest version of BOINC on it and it is running fine from what I can see.

I am certainly not ruling out the possibility that my chipset got damaged when the Northbridge fan failed. What confuses me, however, is that running 100% CPU should be 100% CPU whether I'm running HPF or DDDT, yet one works and the other clearly doesn't.
----------------------------------------

[Nov 13, 2007 12:57:41 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Frozen work units?

Hello Lone-Wolf,
What system diagnostic program have you run to check out your system's hardware?

Lawrence
[Nov 13, 2007 2:38:06 AM]
Lone-Wolf
Cruncher
Joined: Apr 10, 2007
Post Count: 33
Re: Frozen work units?

Hello Lone-Wolf,
What system diagnostic program have you run to check out your system's hardware?

Lawrence


All I've had time to do is run memtest.

This machine is a dedicated cruncher, so it generally sits under the desk without a keyboard or monitor attached and is therefore a bit of a pain to mess with repeatedly.

If you have a suggestion I'll download it and run it as soon as I get the opportunity.
----------------------------------------

[Nov 13, 2007 2:53:25 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: Frozen work units?

Sekerob
Added: In Lone_Wolf's log I'm missing, for instance, the upload speeds, unless some other non-standard flag achieves this.


That is what is missing, d'oh! I kept scratching my head, wondering why the messages are sooooo BORING now in 5.10.28. Thanks Sek!

Diana

If it does, replace the contents of your cc_config.xml with this:
<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <file_xfer_debug>1</file_xfer_debug>
    <proxy_debug>0</proxy_debug>
    <http_debug>0</http_debug>
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
  <options>
    <save_stats_days>90</save_stats_days>
    <dont_contact_ref_site>0</dont_contact_ref_site>
  </options>
</cc_config>


Shows these extras:

13/11/2007 12.59.14|World Community Grid|[checkpoint_debug] result X0000038190034200409101259_1 checkpointed
13/11/2007 13.00.15|World Community Grid|[checkpoint_debug] result faah2630_ZINC01600154_xmd04360_01_1 checkpointed
13/11/2007 13.00.43|World Community Grid|Computation for task X0000038190034200409101259_1 finished
13/11/2007 13.00.43|World Community Grid|Starting dddt0201b0175_ZINC04017563-0000_00_1
13/11/2007 13.00.43|World Community Grid|Starting task dddt0201b0175_ZINC04017563-0000_00_1 using dddt version 510
13/11/2007 13.00.45|World Community Grid|Started upload of X0000038190034200409101259_1_0
13/11/2007 13.00.45||[file_xfer_debug] URL: http://www.worldcommunitygrid.org/boinc/wcg_cgi/file_upload_handler
13/11/2007 13.00.48||[file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
13/11/2007 13.00.53||[file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
13/11/2007 13.00.53||[file_xfer_debug] file transfer status 0
13/11/2007 13.00.53|World Community Grid|Finished upload of X0000038190034200409101259_1_0
13/11/2007 13.00.53|World Community Grid|[file_xfer_debug] Throughput 11253 bytes/sec

(Some of the motivation may have come from crunchers who 'advised' that the transmission speeds were 'incorrect', so the speeds were hidden by default!)
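
If you want to pull the upload speeds out of a saved copy of these messages, something like the following Python sketch could work. It assumes the `timestamp|project|message` layout and the `Throughput N bytes/sec` wording seen in the lines pasted above; the function names `parse_line` and `throughputs` are made up for this example, and other client versions may format their messages differently.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches the "[file_xfer_debug] Throughput N bytes/sec" message seen in the pasted log.
THROUGHPUT_RE = re.compile(r"\[file_xfer_debug\] Throughput (\d+) bytes/sec")


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split a 'timestamp|project|message' line into its three fields, or return None."""
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return parts[0], parts[1], parts[2]


def throughputs(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, bytes_per_sec) for every throughput message found."""
    found = []
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue  # skip lines that do not have the expected three fields
        timestamp, _project, message = fields
        match = THROUGHPUT_RE.search(message)
        if match:
            found.append((timestamp, int(match.group(1))))
    return found


if __name__ == "__main__":
    sample = [
        "13/11/2007 13.00.53|World Community Grid|Finished upload of X0000038190034200409101259_1_0",
        "13/11/2007 13.00.53|World Community Grid|[file_xfer_debug] Throughput 11253 bytes/sec",
    ]
    for when, speed in throughputs(sample):
        print(when, speed, "bytes/sec")
```

Splitting on only the first two `|` characters keeps any later `|` inside the message text intact.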
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Nov 13, 2007 12:30:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Frozen work units?

Okay, Lone-Wolf - sorry about all the red herrings. Feel free to ignore all the stuff about heartbeats and log files. None of it seems specific to your case.

You think it may be a hardware issue, and so far I have seen nothing to contradict this.

The heavy-duty science being done by WCG puts computers through their paces like little else. It isn't unusual for an existing hardware problem to be triggered by a particular operation in one of the projects.

For now, I suggest you opt out of running DDDT, and see whether the other projects run without issues. Meanwhile, we will keep an eye out for similar issues affecting other DDDT crunchers.
[Nov 13, 2007 1:52:08 PM]
Dark Angel
Veteran Cruncher
Australia
Joined: Nov 11, 2005
Post Count: 721
Re: Frozen work units?

I get a similar problem intermittently on two different machines: one a P4 2.4GHz machine with 1 GB RAM, the other an XP3000+, also with 1 GB RAM. Neither machine is overclocked, both are running Ubuntu 7.10, and neither shows anything in any of the logs I've looked at. They just ... stop. Most units go through fine; it's just the odd one that stops for some reason.
----------------------------------------

Currently being moderated under false pretences
[Dec 7, 2007 6:35:44 AM]