Total posts in this thread: 13
Posts: 13   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 14775 times and has 12 replies
smiley7804
Cruncher
Joined: Dec 8, 2012
Post Count: 5
Status: Offline
Restarting tasks. Why?? TCEPP2

What should I do to avoid getting this error every now and then?

2013-11-17 08:58:53 | World Community Grid | Task E217024_985_I.47.C36F6H17N3OS.00284976.4.set1d06_1 exited with zero status but no 'finished' file
2013-11-17 08:58:53 | World Community Grid | If this happens repeatedly you may need to reset the project.
2013-11-17 08:58:53 | World Community Grid | Task E217024_683_I.46.C35H19N5O4S2.00082339.1.set1d06_1 exited with zero status but no 'finished' file
2013-11-17 08:58:53 | World Community Grid | If this happens repeatedly you may need to reset the project.
2013-11-17 08:58:53 | World Community Grid | Task E217024_500_I.47.C36F6H17N3OS.00412405.2.set1d06_1 exited with zero status but no 'finished' file
2013-11-17 08:58:53 | World Community Grid | If this happens repeatedly you may need to reset the project.
2013-11-17 08:58:53 | World Community Grid | Task E217023_366_I.49.C36F8H12N4O.00306914.1.set1d06_2 exited with zero status but no 'finished' file
2013-11-17 08:58:53 | World Community Grid | If this happens repeatedly you may need to reset the project.

CPU used 55%
RAM used 2,0GB of 3,2 GB

2013-11-17 02:17:33 | | Starting BOINC client version 7.0.64 for windows_intelx86
2013-11-17 02:17:33 | | log flags: file_xfer, sched_ops, task
2013-11-17 02:17:33 | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
2013-11-17 02:17:33 | | Data directory: H:\ProgramData\BOINC
2013-11-17 02:17:33 | | Running under account Mikael
2013-11-17 02:17:33 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9]
2013-11-17 02:17:33 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes nx lm vmx tm2 pbe
2013-11-17 02:17:33 | | OS: Microsoft Windows 8: Professional x86 Edition, (06.02.9200.00)
2013-11-17 02:17:33 | | Memory: 3.22 GB physical, 3.79 GB virtual
2013-11-17 02:17:33 | | Disk: 232.88 GB total, 167.14 GB free
2013-11-17 02:17:33 | | Local time is UTC +1 hours
2013-11-17 02:17:33 | | No usable GPUs found
2013-11-17 02:17:33 | Docking | URL http://docking.cis.udel.edu/; Computer ID 129511; resource share 100
2013-11-17 02:17:33 | MindModeling@Beta | URL http://mindmodeling.org/; Computer ID 34086; resource share 100
2013-11-17 02:17:33 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2287124; resource share 100
2013-11-17 02:17:33 | World Community Grid | General prefs: from World Community Grid (last modified 27-Sep-2013 00:08:21)
2013-11-17 02:17:33 | World Community Grid | Host location: none
2013-11-17 02:17:33 | World Community Grid | General prefs: using your defaults
2013-11-17 02:17:33 | | Reading preferences override file
2013-11-17 02:17:33 | | Preferences:
2013-11-17 02:17:33 | | max memory usage when active: 1650.27MB
2013-11-17 02:17:33 | | max memory usage when idle: 2640.43MB
2013-11-17 02:17:33 | | max disk usage: 170.84GB
2013-11-17 02:17:33 | | max CPUs used: 4
2013-11-17 02:17:33 | | don't use GPU while active
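To see how often each task is affected, the Event Log can be tallied with a short script. This is only a rough sketch: it assumes the log has been copied out as plain text in the "timestamp | project | message" layout shown above, and the function name count_zero_status_exits is made up for illustration.

```python
import re
from collections import Counter

# Matches the client's warning; the task name is the token after "Task".
ZERO_STATUS = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def count_zero_status_exits(log_text):
    """Return a Counter mapping task name -> number of zero-status exits."""
    counts = Counter()
    for line in log_text.splitlines():
        # Event Log lines look like: "<timestamp> | <project> | <message>"
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        match = ZERO_STATUS.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """\
2013-11-17 08:58:53 | World Community Grid | Task E217024_985_I.47.C36F6H17N3OS.00284976.4.set1d06_1 exited with zero status but no 'finished' file
2013-11-17 08:58:53 | World Community Grid | If this happens repeatedly you may need to reset the project.
2013-11-17 08:58:53 | World Community Grid | Task E217024_683_I.46.C35H19N5O4S2.00082339.1.set1d06_1 exited with zero status but no 'finished' file
"""

for task, n in count_zero_status_exits(sample).items():
    print(task, n)
```

A task that shows up many times over a few days is probably the one losing the most work to restarts.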
[Nov 17, 2013 2:36:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

Hello smiley7804,
I tell people not to worry about this error, but Sekerob points out that it is caused by too much contention for resources. I let BOINC use 90% of my memory when the computer is active and seldom see this message. Disk I/O is a frequent cause of contention. I only allow 2 CEP2 jobs to run simultaneously and run other, less disk-intensive projects on the remaining cores, but you have a newer computer that is probably faster than mine.

Lawrence
[Nov 17, 2013 3:17:26 PM]
captainjack
Advanced Cruncher
Joined: Apr 14, 2008
Post Count: 147
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

This happens quite often on my Windows 7 box when a CEP2 task finishes up. When the CEP2 task finishes, the Status column still shows that it is running, the Elapsed column stops moving, and the Remaining column shows "---". For about 45 seconds there is something happening (wrap-up/zip-up/whatever) that takes up quite a few system resources. When the wrap-up finishes, the Event Log shows the following messages.

11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINleA_0325258_0515_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINy3A_0325452_0054_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINy3A_0325452_0141_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINy3A_0325452_0146_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINy3A_0325452_0134_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVCbINy3B_0327447_0549_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVJaINleB_0306691_0546_2 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVJaINleB_0306691_0387_2 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVJaINleB_0306691_0532_2 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVJaINleB_0306691_0518_2 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Computation for task E217029_636_I.46.C35H19N5O4S2.00083931.4.set1d06_1 finished
11/17/2013 10:49:42 AM | World Community Grid | Restarting task FAHV_x3AVAaINleA_0325258_0515_0 using fahv version 706 in slot 5
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVAaINy3A_0325452_0054_0 using fahv version 706 in slot 8
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVAaINy3A_0325452_0141_0 using fahv version 706 in slot 4
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVAaINy3A_0325452_0146_0 using fahv version 706 in slot 6
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVAaINy3A_0325452_0134_0 using fahv version 706 in slot 10
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVCbINy3B_0327447_0549_0 using fahv version 706 in slot 7
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVJaINleB_0306691_0546_2 using fahv version 706 in slot 2
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVJaINleB_0306691_0387_2 using fahv version 706 in slot 0
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVJaINleB_0306691_0532_2 using fahv version 706 in slot 9
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVJaINleB_0306691_0518_2 using fahv version 706 in slot 11

Note that all 10 of the other WCG tasks get an error message that the task exited with a zero status but no 'finished' file. Then the CEP2 task shows that it is finished. Then all 10 of the other WCG tasks restart. Note that all of the error messages and the message that the CEP2 task is finished happen at exactly 10:49:42.
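The same-timestamp pattern can be checked mechanically. The sketch below is only an illustration under the assumption that the Event Log is available as plain text in the layout above; group_by_timestamp and collateral_restarts are hypothetical helper names, not part of BOINC.

```python
from collections import defaultdict

def group_by_timestamp(log_text):
    """Map each timestamp string to the list of messages logged at it."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        # Event Log lines look like: "<timestamp> | <project> | <message>"
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3:
            groups[parts[0]].append(parts[2])
    return groups

def collateral_restarts(log_text):
    """Yield (timestamp, finished_count, zero_status_count) for every
    timestamp at which a task finished and others exited with zero status."""
    for stamp, messages in group_by_timestamp(log_text).items():
        finished = sum(m.startswith("Computation for task") for m in messages)
        zero = sum("exited with zero status" in m for m in messages)
        if finished and zero:
            yield stamp, finished, zero

sample = """\
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINleA_0325258_0515_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | If this happens repeatedly you may need to reset the project.
11/17/2013 10:49:42 AM | World Community Grid | Task FAHV_x3AVAaINy3A_0325452_0054_0 exited with zero status but no 'finished' file
11/17/2013 10:49:42 AM | World Community Grid | Computation for task E217029_636_I.46.C35H19N5O4S2.00083931.4.set1d06_1 finished
11/17/2013 10:50:47 AM | World Community Grid | Restarting task FAHV_x3AVAaINy3A_0325452_0054_0 using fahv version 706 in slot 8
"""

for stamp, finished, zero in collateral_restarts(sample):
    print(stamp, "finished:", finished, "zero-status exits:", zero)
```

Grouping on the raw timestamp string avoids having to parse the date, which is written differently on different machines in this thread.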

This is happening with the profile set to use 75% of the available threads and to suspend processing if non-BOINC work takes more than 50% of the CPU.

My other computer is running Ubuntu on an i7 with 8 threads. It is running 100% of the threads. I have never seen this happen on the Ubuntu box when a CEP2 task finishes.

If anybody is interested in troubleshooting and needs more information from me, I will be glad to provide it.
[Nov 17, 2013 5:15:50 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

↑ I have this issue, too! /o\

Hmm, an obvious solution here I think is to use a solid state drive... or maybe use an external application (like Process Lasso) to raise the priority... or maybe, as a dirty hack, temporarily suspend the WUs with almost equal percent completed. Space them at least, say, 10 minutes apart, then resume them. Though the last one will have to be redone repeatedly... : ( This is only good if you have lots of RAM, of course.

Edit: Okay that didn't work well. Just two out of 8 WUs survived. : (
----------------------------------------
[Edit 3 times, last edit by Former Member at Nov 18, 2013 1:12:14 PM]
[Nov 18, 2013 3:06:36 AM]
Loui_h20
Cruncher
Joined: Aug 12, 2013
Post Count: 25
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

This should be fixed if you update to BOINC 7.2.28!
[Nov 22, 2013 5:46:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

Hello captainjack,
Exit with 0 status normally means too much competition for a resource. I am usually suspicious of memory. Be sure that you have your profile set to "Leave Application In Memory", then see how much memory BOINC is allowed to use when the computer is active. Subtract 1 GB for worst-case CEP2 and divide the remainder by the number of FAHV tasks to see how much each FAHV is allowed to use. You might want to increase the amount of memory that BOINC is allowed to use. (I let BOINC use 90% since the OS will grab any memory it needs from BOINC.)
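As a rough worked example of that arithmetic: the inputs below (6 GB of RAM, 75% allowed while the computer is in use, 10 FAHV tasks) are illustrative assumptions, and per_task_memory_mb is just a made-up helper name.

```python
def per_task_memory_mb(total_ram_gb, boinc_pct, cep2_reserve_gb, n_other_tasks):
    """Memory left for each non-CEP2 task, in MB.

    total_ram_gb    -- physical RAM in the machine
    boinc_pct       -- percent of RAM BOINC may use while the computer is in use
    cep2_reserve_gb -- worst-case allowance for one CEP2 task (1 GB in the post)
    n_other_tasks   -- number of other tasks sharing what is left
    """
    boinc_mb = total_ram_gb * 1024 * boinc_pct / 100
    remainder_mb = boinc_mb - cep2_reserve_gb * 1024
    return remainder_mb / n_other_tasks

# Illustrative inputs only: 6 GB RAM, 75% limit, 1 GB for CEP2, 10 FAHV tasks.
print(per_task_memory_mb(6, 75, 1, 10))  # 4608 MB - 1024 MB = 3584 MB, / 10 = 358.4
```

If the per-task figure comes out much smaller than what the tasks actually use in Task Manager, raising the memory limit is the first thing to try.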

Lawrence
[Nov 22, 2013 6:14:40 PM]
captainjack
Advanced Cruncher
Joined: Apr 14, 2008
Post Count: 147
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

Thanks for the feedback.

Loui_h20, that box has been running 7.2.28 since 11/8 so I don't think that is part of the problem.

Lawrence,

The box is a 16-thread machine with 6 GB of memory. When all threads are busy with BOINC jobs, it is normally using 3-4 GB of memory (according to the Task Manager). Right now, it is using 3.33 GB. When the example that I cited above was running, the device profile was set to:
Use no more than 75% of memory when computer is in use
Use no more than 90% of memory when computer is idle
Suspend work if CPU usage is above 50 %
On multiprocessors, use 75% of processors

Now here's the bizarre part. In the device profile on the WCG web site, LAIM was turned on. However, when I checked the computing preferences in BOINC on the computer, it said LAIM was turned off. I checked the device profiles on other GPU projects that I support and some of them said that LAIM was turned off. So I turned on LAIM on all the other projects and did a project update in BOINC on the machine. Now BOINC on the machine says that LAIM is on.

I thought BOINC was supposed to sync device profile settings across projects. Maybe it got confused. Happens to me sometimes.
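One way to see which setting the client is really using is to compare the preference files in the BOINC data directory. The sketch below rests on assumptions that should be checked against your own install: that the web preferences land in global_prefs.xml, that local changes land in global_prefs_override.xml, and that LAIM is stored in a leave_apps_in_memory element holding 0 or 1. The XML strings here are hand-written stand-ins, not copies of real files.

```python
import xml.etree.ElementTree as ET

def leave_apps_in_memory(prefs_xml):
    """Return True/False for the LAIM flag, or None if the tag is absent."""
    root = ET.fromstring(prefs_xml)
    # Assumed tag name; search the whole tree in case it is nested.
    node = root.find(".//leave_apps_in_memory")
    if node is None or node.text is None:
        return None
    return node.text.strip() not in ("0", "")

# Stand-in contents for the two files (assumed layout, for illustration only).
web_prefs = "<global_preferences><leave_apps_in_memory>1</leave_apps_in_memory></global_preferences>"
local_override = "<global_preferences><leave_apps_in_memory>0</leave_apps_in_memory></global_preferences>"

web = leave_apps_in_memory(web_prefs)
local = leave_apps_in_memory(local_override)
if local is not None and local != web:
    print("Local override disagrees with web profile: LAIM web =", web, "local =", local)
```

If a local override exists and disagrees, that would explain a web profile showing LAIM on while the client reports it off.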

Now that LAIM is turned on for BOINC on the machine, I'll crank up another CEP2 job and see what happens. I'll let you know how it turns out.

CaptainJack
[Nov 22, 2013 8:26:28 PM]
Loui_h20
Cruncher
Joined: Aug 12, 2013
Post Count: 25
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

I'll post my Windows 7 work PC specs so that you can compare systems at a later point, Captain.

I'm just glad I can get all threads running at once with no loss of data every time 1 wu finishes!
----------------------------------------
[Edit 1 times, last edit by Loui_h20 at Nov 22, 2013 10:13:14 PM]
[Nov 22, 2013 10:12:11 PM]
captainjack
Advanced Cruncher
Joined: Apr 14, 2008
Post Count: 147
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

Had a power outage last night and lost ~9 hours on my CEP2 test. It just finished and looks like it worked okay. In the event log, when the CEP2 task ended, there was a message that said "Suspending computation - CPU is busy" then a bit later, there was another message that said "Resuming computation" and everything started back up.

Looking back through my notes, I'm not exactly sure when I updated to BOINC 7.2.28. The only thing I could find is that it was downloaded on 11/8, but I'm not sure at this point when exactly it was installed. I do know that it was running for this latest test.

Either the update to 7.2.28 or turning on LAIM (or both) seemed to help. I will allow some more CEP2 tasks in and see what happens.

Lawrence and Loui_h20, thanks for all the suggestions.
CaptainJack
[Nov 24, 2013 12:29:37 AM]
Loui_h20
Cruncher
Joined: Aug 12, 2013
Post Count: 25
Status: Offline
Re: Restarting tasks. Why?? TCEPP2

Apologies, CaptainJack, I typed too soon :-(.

After switching to 8 threads over the weekend I've had to go back to 7 threads.
Too many "exited with zero status" messages.

My work PC is a Windows 7 mini tower HP Compaq Elite 8300 with an i7-3770, 8 GB of memory and a 500 GB HDD, plus a GTX 650 Ti, a GT 440 (OEM) and finally another GT 440.
I got those GPU cards to help crunch other projects but will pull them out as I'm only doing CEP2 now. Pity there isn't a version of CEP2/qchem for GPUs.

Nothing special but it won't get upgraded for the next 5 years so why not do something useful with it.

Damn it, I thought the new version of BOINC had fixed that zero status issue.

Disk usage
Use at most 12GB
Leave at least 0.05 free
Use at most 100% of total disk space
Checkpoint at most every 60 seconds
Use at most 100% swap space

Memory usage
Use at most 100% when computer is in use.
Use at most 100% when computer is idle.

[tick] Leave apps in mem while suspended.

As an update to this post, I don't think anything has changed or been improved after the BOINC update. CEP2 still resets, losing you days of work after it stops to finalise some work units. Even dropping down to running CEP2 on 5 threads on an i7 makes no difference.
----------------------------------------
[Edit 3 times, last edit by Loui_h20 at Dec 4, 2013 12:05:27 PM]
[Nov 25, 2013 1:54:00 PM]