World Community Grid - View Thread - Task exited with zero status but no 'finished' file

World Community Grid Forums

Category: Completed Research

Forum: Influenza Antiviral Drug Search

Thread: Task exited with zero status but no 'finished' file

Thread Status: Active
Total posts in this thread: 12

This topic has been viewed 3699 times and has 11 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Task exited with zero status but no 'finished' file

Restarting task flu10101b0090_100121_0 using flu1 version 604
Task flu10101b0090_100121_0 exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project.

Task flu10101b0090_100121_0 showed status "running" but accumulated zero CPU time, and the message above repeated every 2 minutes.

Eventually I had to abort it manually, after giving it 2 hours to either start properly or exit with an error. The whole time my second core sat at 0% CPU.

[May 8, 2009 5:34:51 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Task exited with zero status but no 'finished' file

And now another one:

11/05/2009 21:14:04|World Community Grid|Restarting task flu10101h0423_100152_0 using flu1 version 604
11/05/2009 21:14:45|World Community Grid|Task flu10101h0423_100152_0 exited with zero status but no 'finished' file
11/05/2009 21:14:45|World Community Grid|If this happens repeatedly you may need to reset the project.

Same machine:

03/05/2009 11:25:45||Starting BOINC client version 6.2.28 for windows_intelx86
03/05/2009 11:25:45||log flags: task, file_xfer, sched_ops
03/05/2009 11:25:45||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3
03/05/2009 11:25:45||Running as a daemon
03/05/2009 11:25:45||Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
03/05/2009 11:25:45||Running under account boinc_master
03/05/2009 11:25:47||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz [x86 Family 6 Model 15 Stepping 10]
03/05/2009 11:25:47||Processor features: fpu tsc sse sse2 mmx
03/05/2009 11:25:47||OS: Microsoft Windows XP: Professional x86 Editon, Service Pack 3, (05.01.2600.00)
03/05/2009 11:25:47||Memory: 1.96 GB physical, 3.81 GB virtual
03/05/2009 11:25:47||Disk: 111.78 GB total, 64.40 GB free
03/05/2009 11:25:47||Local time is UTC +1 hours
03/05/2009 11:25:47|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 900331; location: (none); project prefs: default
03/05/2009 11:25:47||General prefs: from World Community Grid (last modified 26-Apr-2009 03:13:29)
03/05/2009 11:25:47||Host location: none
03/05/2009 11:25:47||General prefs: using your defaults
03/05/2009 11:25:47||Reading preferences override file
03/05/2009 11:25:47||Preferences limit memory usage when active to 1003.11MB
03/05/2009 11:25:47||Preferences limit memory usage when idle to 1504.67MB
03/05/2009 11:25:47||Preferences limit disk usage to 9.31GB

Is anyone else getting this?
Any idea what's causing it?

I'll continue to run FLU on this ThinkPad for testing purposes, but until things are more stable, or a cause is identified that rules out recurrence on certain computer models, I'll be removing this project from my production machines, as I cannot afford ANY instability on those company machines.

KR,
Willem

[May 11, 2009 9:25:36 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Task exited with zero status but no 'finished' file

My lips are sealed, but if you use the local prefs to restrict BOINC networking to, say, 15-30 minutes a day and set the cache/additional buffer to about 1.5 days, your client will most probably upload and download tasks and results without finishing results and starting new jobs at the same time. My quad has trouble networking and starting jobs simultaneously. All jobs are affected.

You could of course start off by resetting the WCG project in the BOINC client as per the warning message and see if the warnings disappear. I'm curious to hear if that works. Also please check your Result Logs if there are heartbeat warning lines.

All that said, make sure in your AV software to exclude the BOINC data_dir from scanning. The path can be found in the BOINC start up message log.

Let us know.
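A minimal sketch of the settings suggested above, assuming they are applied through BOINC's preferences override file (usually `global_prefs_override.xml` in the BOINC data directory). The element names are written as I recall them from the BOINC preferences documentation and should be checked against your client version. The 02:00-02:30 window is an arbitrary example, and whether client 6.2.28 honors fractional hours is an assumption. The helper `build_override` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_override(net_start_hour, net_end_hour, buffer_days):
    """Build the XML text for a BOINC preferences override file.

    Element names are assumed from the BOINC preferences documentation;
    verify them against your client before relying on this.
    """
    root = ET.Element("global_preferences")
    # Hours of the day between which network activity is allowed.
    ET.SubElement(root, "net_start_hour").text = f"{net_start_hour:.2f}"
    ET.SubElement(root, "net_end_hour").text = f"{net_end_hour:.2f}"
    # Additional work buffer, in days.
    ET.SubElement(root, "work_buf_additional_days").text = f"{buffer_days:.2f}"
    return ET.tostring(root, encoding="unicode")


# Example: a 30-minute network window and a 1.5-day buffer.
override_xml = build_override(net_start_hour=2.0, net_end_hour=2.5, buffer_days=1.5)
print(override_xml)
```

The client has to re-read its preferences (or be restarted) before a changed override file takes effect.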

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

----------------------------------------
[Edit 1 times, last edit by Sekerob at May 12, 2009 7:07:34 PM]

[May 12, 2009 6:59:59 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Task exited with zero status but no 'finished' file

Thanks for the reply Sekerob.

Plan of attack:

1 - I left my ThinkPad as it is to see if it happens again. No changes made at all.

2 - I removed FLU from all 99 production machines.

3 - I am using 1 production machine to crunch FLU, and will closely monitor this IBM\Lenovo C2D 6300 over the next 2 weeks to see if it occurs there too.

If it happens again I'll start by resetting the project, and check the results log.

So far I'm the only one who has posted this, so I hope it's just my setup causing it.

I'll post any updates here.

[May 13, 2009 7:17:31 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Task exited with zero status but no 'finished' file

It started on my quad with Flu, which includes a benchmark routine at the beginning and end (I think all the AutoDock-based science projects have one). Seemingly BOINC transmission has a detrimental impact on this benchmarking.

As said, with my mix of WCG projects it affects all projects, including the new HCMD2, which is why I set it up as described above. On larger farms it might also be easier to manage this way, as you know which time segment to monitor... but 99 devices all cramming their UL/DL into a small time segment of 15-30 minutes is bound to cause bottlenecks unless you've got oodles of bandwidth.

Anyway, I've notified the techs of the observations. If it were me alone, it might be something host-specific. 99 devices affected points at something in the software.

Thanks for helping with the testing.

[May 13, 2009 7:38:40 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Task exited with zero status but no 'finished' file

With IBM paying for the bandwidth here I can assure you that's not the problem :) Even with 700+ computers connected and in use by employees, the speed is still crazy :)

However, I limited the network usage for all clients to the following:
- 50 kb/s down
- 15 kb/s up

This is to prevent any bottlenecks and any impact on performance. To keep the natural randomisation I'm not using any scheduling, and the clients connect every 0.1 days with a 1-day buffer.

[May 14, 2009 7:06:48 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Task exited with zero status but no 'finished' file

15/05/2009 11:20:27|World Community Grid|Started download of flu10201a0014_100323_wcgrid.00033.gpf.gzb
15/05/2009 11:20:27|World Community Grid|Started download of flu10201a0014_100323_wcgrid.00033.dpf.gzb
15/05/2009 11:20:28|World Community Grid|Finished download of flu10201a0014_100323_wcgrid.00033.gpf.gzb
15/05/2009 11:20:28|World Community Grid|Finished download of flu10201a0014_100323_wcgrid.00033.dpf.gzb
15/05/2009 15:41:47|World Community Grid|Computation for task flu10101m0722_100205_0 finished
15/05/2009 15:41:47|World Community Grid|Starting flu10101o0793_100325_0
15/05/2009 15:41:47|World Community Grid|Starting task flu10101o0793_100325_0 using flu1 version 604
15/05/2009 15:41:49|World Community Grid|Started upload of flu10101m0722_100205_0_0
15/05/2009 15:41:49|World Community Grid|Started upload of flu10101m0722_100205_0_1
15/05/2009 15:41:53|World Community Grid|Finished upload of flu10101m0722_100205_0_0
15/05/2009 15:41:53|World Community Grid|Started upload of flu10101m0722_100205_0_2
15/05/2009 15:41:55|World Community Grid|Finished upload of flu10101m0722_100205_0_2
15/05/2009 15:41:55|World Community Grid|Started upload of flu10101m0722_100205_0_3
15/05/2009 15:41:57|World Community Grid|Finished upload of flu10101m0722_100205_0_3
15/05/2009 15:42:04|World Community Grid|Finished upload of flu10101m0722_100205_0_1
15/05/2009 15:45:38|World Community Grid|Task flu10101o0793_100325_0 exited with zero status but no 'finished' file
15/05/2009 15:45:38|World Community Grid|If this happens repeatedly you may need to reset the project.
15/05/2009 15:45:38|World Community Grid|Restarting task flu10101o0793_100325_0 using flu1 version 604
15/05/2009 15:46:19|World Community Grid|Task flu10101o0793_100325_0 exited with zero status but no 'finished' file
15/05/2009 15:46:19|World Community Grid|If this happens repeatedly you may need to reset the project.
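The log excerpt above can be checked mechanically. The sketch below, which assumes the `timestamp|project|message` layout and day-first dates seen in these logs, measures how long each start or restart of a task lasted before the "zero status" exit. The function names are hypothetical helpers, not part of BOINC.

```python
from datetime import datetime

FAIL = "exited with zero status but no 'finished' file"


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three parts."""
    stamp, project, message = line.strip().split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def start_to_exit_gaps(lines):
    """Return (task, seconds) for each (re)start followed by a zero-status exit."""
    started = {}
    gaps = []
    for line in lines:
        when, _project, message = parse_line(line)
        words = message.split()
        if message.startswith(("Starting task ", "Restarting task ")):
            started[words[2]] = when
        elif message.startswith("Task ") and message.endswith(FAIL):
            task = words[1]
            if task in started:
                seconds = int((when - started.pop(task)).total_seconds())
                gaps.append((task, seconds))
    return gaps


# Lines copied from the excerpt above.
log = [
    "15/05/2009 15:41:47|World Community Grid|Starting task flu10101o0793_100325_0 using flu1 version 604",
    "15/05/2009 15:45:38|World Community Grid|Task flu10101o0793_100325_0 exited with zero status but no 'finished' file",
    "15/05/2009 15:45:38|World Community Grid|Restarting task flu10101o0793_100325_0 using flu1 version 604",
    "15/05/2009 15:46:19|World Community Grid|Task flu10101o0793_100325_0 exited with zero status but no 'finished' file",
]
gaps = start_to_exit_gaps(log)
print(gaps)
```

For these lines the restart lasted 41 seconds before the exit, the same interval as the restart logged on 11/05/2009 (21:14:04 to 21:14:45).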

And another one, again on the ThinkPad T61. So far the Lenovo desktop test machine runs fine. So I'll reset the project on the ThinkPad and see if that solves the problem.
I'll keep you updated.

[May 15, 2009 4:23:41 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Task exited with zero status but no 'finished' file

Willem, it's an identified issue that occurs when a FLU task is starting, and seemingly only with FLU. Combined with a network component in BOINC that is not robust, it delays the science application's progress enough to incur a comms interruption of more than 30 seconds, and thus a heartbeat issue. I can replicate it and prevent it (take the client off-line and only do scheduled networking, as I described previously).
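A toy model of the heartbeat explanation, not BOINC's actual code: it assumes the science application gives up once more than 30 seconds pass without a heartbeat from the client, which would leave a clean exit with no 'finished' file. The function name, the timeline values, and the exact timeout handling are illustrative assumptions.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds; the "> 30 second" figure from the post


def first_timeout(heartbeat_times, run_until):
    """Return the time at which the task would give up, or None if it never does.

    heartbeat_times: sorted times (seconds since task start) of received heartbeats
    run_until: time (seconds since task start) at which observation ends
    """
    last = 0.0  # treat task start as the first heartbeat
    for t in heartbeat_times:
        if t - last > HEARTBEAT_TIMEOUT:
            return last + HEARTBEAT_TIMEOUT
        last = t
    if run_until - last > HEARTBEAT_TIMEOUT:
        return last + HEARTBEAT_TIMEOUT
    return None


# Heartbeats stall for 37 seconds while the client is busy with transfers.
stalled = first_timeout([1, 2, 3, 40], run_until=60)
# Heartbeats arrive every 10 seconds; the task keeps running.
healthy = first_timeout([10, 20, 30, 40, 50], run_until=60)
print(stalled, healthy)
```

Under this model, anything that stalls the client for longer than the timeout while a task is starting would produce the restart loop described in this thread.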

But, a thought has occurred, which I'll touch base on with the techs.

ttyl

----------------------------------------
[Edit 1 times, last edit by Sekerob at May 15, 2009 4:52:22 PM]

[May 15, 2009 4:51:38 PM]

JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline


Re: Task exited with zero status but no 'finished' file

Willem,
Are these machines using WAN to access your local network?
I have already seen a (not very powerful) PC stall when using a USB WiFi adapter to connect to an ADSL2 box. In that particular case we finally had to use a LAN connection to be able to use the PC.

Just wondering... Jean.

[May 16, 2009 8:27:09 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Task exited with zero status but no 'finished' file

(take the client off-line and only do scheduled networking as what I described previously).

Yeah, it is possible, since the ThinkPads are laptops that move around a lot from network to network on WLAN.

So I'm currently testing different network settings as you described. I saw that "connect every XX day" was set to 0, causing exactly this issue. Scheduling is now in place, so let's see how that works out :)

THZ

[May 18, 2009 6:38:17 AM]
