World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
Gollumer
Senior Cruncher Joined: Mar 23, 2006 Post Count: 194 Status: Offline
Anyone else having problems with BOINC and HDC non-beta workunits on Core 2 Duo processors?
New PC, new install of Windows (32-bit). It runs games (BF2/UT2004) like a champ, no problems, no hangs, and the CPU never gets above 41 degrees. But on some workunits BOINC *HANGS* the PC. After I reboot and log back in, BOINC reports COMPUTATION ERRORS on those workunits. I reinstalled BOINC and forgot to save the logs; I'll post them in the troubleshooting forum when it happens again. Just curious whether anyone else has had this problem, or am I too far out on the bleeding edge?
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
Have you checked in Task Manager whether the science app, such as wcg_hdc_tma_5.05_windows_intelx86 or wcg_faah_autodock_5.26_windows_intelx86, is still counting up CPU time and only BOINCmgr.exe has become unresponsive? If so, kill BOINCmgr.exe and restart it. Also check whether your firewall is the cause, i.e. deactivate it temporarily.
----------------------------------------BOINCmgr.exe 'hanging' on the firewall may make the whole system appear stuck... Press Ctrl-Alt-Del to bring up Task Manager and check the above. Please advise.
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Oct 24, 2006 8:21:23 PM] |
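A minimal sketch of the check described above, assuming you jot down each process's accumulated CPU time from Task Manager twice, a minute or so apart. The readings below are hypothetical, and the helper name `still_accumulating` is made up for illustration. If the science app's CPU time keeps climbing while BOINC Manager is frozen, the computation itself is still alive and the manager is the more likely culprit.

```python
def still_accumulating(before: dict[str, float], after: dict[str, float]) -> dict[str, bool]:
    """For each process present in both readings, report whether its CPU time grew.

    Each reading maps a process name to its accumulated CPU time in seconds,
    e.g. copied by hand from Task Manager (hypothetical input format).
    """
    return {name: after[name] > before[name] for name in before if name in after}


# Hypothetical readings, in seconds, taken about a minute apart.
first = {"wcg_hdc_tma_5.05_windows_intelx86": 3600.0, "BOINCmgr.exe": 12.0}
second = {"wcg_hdc_tma_5.05_windows_intelx86": 3659.0, "BOINCmgr.exe": 12.0}

for name, alive in still_accumulating(first, second).items():
    print(f"{name}: {'still counting up' if alive else 'not counting up'}")
```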
Gollumer
Senior Cruncher Joined: Mar 23, 2006 Post Count: 194 Status: Offline
When this happens the PC stops responding entirely. I can't get to Task Manager, and the mouse stops moving. Once I left it for 15 minutes to see whether I could shut it down or kill the task, but the only option was to reset or power off.
----------------------------------------[Edit 1 times, last edit by Gollumer at Oct 24, 2006 8:24:27 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
There is no inherent problem with C2Ds and HDC; I have 5 running 24/7 myself, and many others on the XS team have many more. Because it runs both cores at 100% constantly, WCG is a great stability test. It could simply be that your system is building up heat over time. Can you give more details about your system, such as CPU, motherboard, RAM, FSB and RAM speed, type of cooler, number of case fans, and temperatures?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Are you overclocking by any chance? I've seen this problem on overclocked machines running projects like these. I refuse to overclock myself.
If so, this program will test reliability and detect hardware errors: http://www.passmark.com/products/bit.htm
Gollumer
Senior Cruncher Joined: Mar 23, 2006 Post Count: 194 Status: Offline
Hardware:
Intel Core 2 Duo E6600
ASUS P5B-E, Socket T, Intel P965 (latest BIOS on it)
Corsair XMS2 2GB (2 x 1GB) DDR2 800
Antec Phantom 500 ATX12V 500W power supply (retail)
Titan Robela case (waterblock for GPU and CPU)
GeForce 7950GT 512MB PCI Express x16
1x Maxtor SATA 120GB drive
1x CD/DVD reader
No overclocking; all standard/default BIOS settings. ASUS AI overclocking is turned off in the BIOS.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
I suggest you look in the files stderrdae.txt and stdoutdae.txt to see whether anything was logged there before the hang.
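A small sketch for pulling out the last lines of those two files, assuming you run it from the folder that holds them (the BOINC data directory; its exact location depends on how BOINC was installed). If the entries are timestamped, compare the last ones with the time the PC froze. The `tail` helper is illustrative, not part of BOINC.

```python
from collections import deque
from pathlib import Path


def tail(path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a text file without keeping the whole file in memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # A deque with maxlen discards older lines as newer ones arrive.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    # Assumption: the current directory is the BOINC data directory.
    for name in ("stdoutdae.txt", "stderrdae.txt"):
        log = Path(name)
        if log.exists():
            print(f"=== last lines of {name} ===")
            print("\n".join(tail(log)))
        else:
            print(f"{name} not found in {Path.cwd()}")
```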
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi Gollumer,
I don't have a clue. I am assuming you are running Windows XP SP2, so maybe some experimenting is in order. To reduce (though not eliminate) the possibility of a video device driver problem, switch the screen saver to (Blank). Also, since two HDC programs running at once could take a lot of memory and produce a lot of streaming disk I/O, try reducing the number of BOINC threads to 1 in your preferences to see if that changes things. It shouldn't, but if it does, then we have something to puzzle over.
Lawrence
RCTCGrid
Cruncher Joined: Apr 6, 2006 Post Count: 10 Status: Offline
You are not the only one seeing problems with work units failing. We are seeing similar problems on our newer-chipset machines running the BOINC client. Our problem began around 10/5, and we are losing about 50% of our CPU time on a large number of our machines to errored results. We thought it might be an issue with the new 3D screensaver in the BOINC client, but so far we have not had any luck resolving the problem. The computers run the units for anywhere from 30 minutes to 9 hours and then return an error.
Result Log
<core_client_version>5.4.11</core_client_version>
<message>
 - exit code -529697949 (0xe06d7363)
</message>
<stderr_txt>
World Community Grid AutoDock (projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.26_windows_intelx86) version
Failed to get VersionInfo size: 1812
Failed to get VersionInfo size: 1812
INFO: projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.26_windows_intelx86
Start AutoGrid...
INFO:[20:34:35] Start AutoGrid...
autogrid: autogrid4: Successful Completion.
wcg_checkpoint() called
Starting to checkpoint ...
Checkpoint complete
INFO:[20:36:00] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 27000
Setting maxGen to 6750
autodock4: WARNING: Unrecognized keyword in docking parameter file, in line: compute_unbound_extended # compute extended ligand energy
INFO: No state to restore. Start from the beginning.
About to enter main loop...(dockings already completed: 0)
call_glss(): pop_size: 200 num_evals: 10000000 start: [20:36:13]
_maxGenSeenSoFar changed: 6750
********Start app_graphics_init******** Total used = 1515585536 Difference = 1515585536
********After gfxData******** Total used = 1515585536 Difference = 0
********After DockingSpheresInit()******** Total used = 1515585536 Difference = 0
********After opengl calls******** Total used = 1515585536 Difference = 0
********After LoadTGA******** Total used = 1523838976 Difference = 8253440
********After boinc_get_init_data******** Total used = 1523838976 Difference = 0
********After loadModel******** Total used = 1517150208 Difference = -6688768
********End of app_graphics_init******** Total used = 1517150208 Difference = 0
********Before app_graphics_resize******** Total used = 1517150208 Difference = 0
********After app_graphics_resize******** Total used = 1517150208 Difference = 0
********Start app_graphics_init******** Total used = 1477865472 Difference = -39284736
....... pages of this data cut out.....
********After app_graphics_resize******** Total used = 1728442368 Difference = 0
call_glss(): end: [04:16:50]
wcg_checkpoint() called
Starting to checkpoint ...
</stderr_txt>
It seems to either never return from the checkpoint, or return with the error code and give up. Maybe it is something with the checkpoint process or the 3D rendering, or maybe the machines are running out of memory? I don't know what the crash code 0xe06d7363 might indicate. These machines are not overclocked; it's a standard configuration. Most of our machines are the new ClientPro 375: 3 GHz to 3.8 GHz P4 processors (either HT or dual core), Intel® 945G chipset, integrated Intel Graphics Accelerator 950, and 512 MB of dual-channel 533 DDR2 memory; all the other specs can be found on the MPC website.
Good luck with your bug, and let us know if you do discover the source of the problem.
-Melchior
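For the exit code question, a sketch that only does the arithmetic: the client prints the exit code as a signed 32-bit integer, and masking it to 32 bits gives the hex form shown in brackets. 0xE06D7363 is the code Visual C++ uses for a C++ exception raised with `throw` (its low three bytes spell "msc"), so it says an exception escaped the app, not which one or why. The second helper is hypothetical and assumes the "Total used" figures in the stderr output are bytes; it pulls them out so the memory trend can be compared with the 512 MB installed.

```python
import re

# Exception code Visual C++ raises for a C++ `throw`; low three bytes are ASCII "msc".
MSVC_CPP_EXCEPTION = 0xE06D7363


def exit_code_hex(code: int) -> str:
    """Show a signed 32-bit exit code as the unsigned hex value Windows uses."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def total_used_mib(stderr_text: str) -> list[float]:
    """Extract every 'Total used = N' figure (assumed bytes) and convert to MiB."""
    return [int(n) / 2**20 for n in re.findall(r"Total used = (\d+)", stderr_text)]


code = -529697949
print(exit_code_hex(code))                                        # 0xe06d7363
print((code & 0xFFFFFFFF) == MSVC_CPP_EXCEPTION)                   # True
print(MSVC_CPP_EXCEPTION.to_bytes(4, "big")[1:].decode("ascii"))   # msc

excerpt = ("********Start app_graphics_init******** Total used = 1515585536 Difference = 1515585536 "
           "********After app_graphics_resize******** Total used = 1728442368 Difference = 0")
used = total_used_mib(excerpt)
print(f"first {used[0]:.0f} MiB, last {used[-1]:.0f} MiB")         # first 1445 MiB, last 1648 MiB
```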
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
RCTCGrid,
Your problem may be that these machines are running out of RAM. As your log shows, you are using 1.5 GB of memory on a machine with only 512 MB of RAM. This shows up because of the device drivers used by the graphics part of the program. When I initially tested on my laptop I noticed only a 5 MB increase for graphics, but on another machine with an ATI graphics card it jumped to 50 MB. That machine was tested on other projects and showed the same increase for graphics. Since your machines were running fine before graphics were enabled, I would suggest changing the screen saver on those machines to something other than BOINC. Can you test this on, say, 10 of the machines that are having this problem? Please note their machine ID numbers (host IDs) from the client_state.xml file. Here is an example of what you will see in that file:
<project>
  <master_url>http://www.worldcommunitygrid.org/</master_url>
  <project_name>World Community Grid</project_name>
  <user_name>uplinger</user_name>
  <team_name>Austin Grid Team</team_name>
  <email_hash>663ab4a907a4a436c6997fbdbd9332a6</email_hash>
  <cross_project_id>d6b988571681f53f7df5536c6ac6c0a0</cross_project_id>
  <user_total_credit>909630.837965</user_total_credit>
  <user_expavg_credit>2231.992249</user_expavg_credit>
  <user_create_time>1116875469.000000</user_create_time>
  <rpc_seqno>984</rpc_seqno>
  <hostid>49967</hostid>
  <host_total_credit>15028.450168</host_total_credit>
  <host_expavg_credit>151.775644</host_expavg_credit>
In there you should see <hostid>xxxxx</hostid>. Please post these and I can monitor them if you would like.
-Uplinger
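To collect host IDs from many machines, a sketch using Python's standard `xml.etree.ElementTree`, assuming client_state.xml parses as well-formed XML and that each project's `<hostid>` sits inside its `<project>` element as in the excerpt above. The `host_ids` helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def host_ids(root: ET.Element) -> dict[str, str]:
    """Map each project's master_url to its hostid (assumed client_state.xml layout)."""
    ids = {}
    for project in root.iter("project"):
        hostid = project.findtext("hostid")
        if hostid:
            url = project.findtext("master_url", default="?").strip()
            ids[url] = hostid.strip()
    return ids


if __name__ == "__main__":
    # Assumption: run from the BOINC data directory that holds client_state.xml.
    state = Path("client_state.xml")
    if state.exists():
        for url, hostid in host_ids(ET.parse(str(state)).getroot()).items():
            print(f"{url}\thostid {hostid}")
    else:
        print(f"client_state.xml not found in {Path.cwd()}")
```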