World Community Grid Forums
Thread Status: Active Total posts in this thread: 107
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2990 Status: Offline
Firstly, please don't misunderstand this post - it's not meant to undermine the project in any way - just an attempt to gain an understanding of why a 12 hour limit was set (as opposed to, say, a 15, 18, or a 24 hr limit).
I do understand why there's a cut-off, and I'm also happy if the scientists are gaining useful information, even if some WUs aren't processing all 16 steps. Still, I've been wondering what percentage of WUs complete fully compared with those that don't. After all, if there are 16 steps in a WU, wouldn't it be even more useful to the scientists if all 16 were processed, rather than 15, 14 or even fewer?
Now that the project has been running (successfully, I believe) for a reasonable time, are there some stats (even just internal ones) for the following:
* percentage of WUs that complete fully (i.e. both results in the quorum pair finish in under 12 hrs),
* percentage of WUs where only 1 result of the pair finishes within the time limit,
* percentage of WUs where neither result finishes before the 12 hr cut-off point,
* for those WUs which didn't complete fully, how far off they were - 1 step, 2 steps, 3 steps etc.
After all, if I've interpreted things correctly (and please, feel free to correct me if my understanding is incorrect), if a WU gets killed on its 16th step (even if it's 1-2 hrs into it), then only the 15 preceding steps are valid, and thus useful to the scientists.
From some very basic stats I've pulled together (i.e., my last 29 WUs), I've had the following:
Both pairs completed in the time limit = 4
Only 1 of the pairs completed in the time limit = 20
Neither pair completed in the time limit = 5
Of the 30 individual results (mine and my wingmen's) which reached the time limit, 23 were on their last step - and thus, say, another 3 hrs would have increased the percentage of fully completed WUs.
Basically, what I'm asking is: has 'upping the time limit' been considered?
Here's my base information:

Both completed okay
E200708_931_A.27.C22H13NO2S2.175.3.set1d06_0 -- me = 11.87 (completed okay) wingman = 6.64 (completed okay)
E200709_280_A.25.C21H15NSSeSi.31.3.set1d06_0 -- me = 11.15 (completed okay) wingman = 6.78 (completed okay)
E200709_506_A.26.C21H13N3SSe.38.1.set1d06_1 -- me = 7.38 (aborted in step #13) wingman = 3.94 (aborted in step #13)
E200765_673_A.28.C20H11N5OS2.381.3.set1d06_1 -- me = 7.34 (cut short in step #13) wingman = 5.69 (cut short in step #13)

Only 1 completed okay
E200708_682_A.25.C21H15NSSeSi.152.0.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 6.28 (completed okay)
E200709_050_A.26.C21H15NOS2Si.29.0.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 9.05 (completed okay)
E200709_185_A.25.C21H15NSSeSi.102.1.set1d06_1 -- me = 11.04 (completed okay) wingman = 12.00 (killed in step #15)
E200709_436_A.26.C22H15NS3.86.1.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 6.20 (completed okay)
E200712_299_A.27.C22H13NO2S2.182.3.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 7.77 (completed okay)
E200713_137_A.27.C22H13N3SSi.33.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 11.05 (completed okay)
E200762_606_A.27.C21H13N3S3.54.0.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 8.62 (completed okay)
E200763_132_A.28.C20H11N5OS2.173.3.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 9.51 (completed okay)
E200763_338_A.27.C21H13N3S3.61.4.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 8.19 (completed okay)
E200763_620_A.27.C20H13N3OS2Si.108.0.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 9.47 (completed okay)
E200764_082_A.28.C19H11N7S2.62.4.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 8.95 (completed okay)
E200764_838_A.28.C20H11N5OS2.345.1.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 11.83 (completed okay)
E200766_272_A.27.C21H13N3S3.599.1.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 6.90 (completed okay)
E200767_495_A.28.C21H11N3O2S2.159.0.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 10.35 (completed okay)
E200767_603_A.28.C20H11N5OS2.331.1.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 11.44 (completed okay)
E200767_882_A.26.C20H13N3S2Se.13.4.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 11.29 (completed okay)
E200768_238_A.26.C21H15NS3Si.300.1.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 7.18 (completed okay)
E200768_393_A.26.C21H13NOS2Se.150.0.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 6.35 (completed okay)
E200768_584_A.27.C22H13NOS3.539.0.set1d06_1 -- me = 11.41 (completed okay) wingman = 12.00 (killed in step #14)
E200771_910_A.26.C22H14N2SSe.23.0.set1d06_1 -- me = 11.91 (completed okay) wingman = 12.00 (killed in step #16)

Neither completed okay
E200708_749_A.27.C21H13N3OS2.130.2.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 12.00 (killed in step #09)
E200708_938_A.27.C22H13NO2S2.112.3.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 12.00 (killed in step #16)
E200709_181_A.27.C21H13N3OS2.268.3.set1d06_1 -- me = 12.00 (killed in step #16) wingman = 12.00 (killed in step #15)
E200766_411_A.26.C21H13NOS2Se.186.0.set1d06_0 -- me = 12.00 (killed in step #16) wingman = 12.00 (killed in step #15)
E200770_179_A.27.C23H14N2SSi.35.4.set1d06_0 -- me = 12.00 (killed in step #15) wingman = 12.00 (killed in step #04)
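For anyone who wants to re-check the tallies, the listing above can be summarised with a short script. This is only a sketch: the `pairs` list is a hand transcription of the listing (an assumption, so worth double-checking against the raw entries), and `tally` is a hypothetical helper name. Each pair holds the step at which my copy and the wingman's copy were killed at the 12 hr limit, with `None` for a copy that was not killed by the limit.

```python
from collections import Counter

# (me, wingman): step at which each copy was killed at the 12 hr limit,
# or None if that copy was not killed by the limit.
# Hand-transcribed from the listing above (assumption: no transcription slips).
pairs = (
    [(None, None)] * 4                      # "Both completed okay"
    + [(16, None)] * 17                     # "Only 1": my copy killed in step 16
    + [(None, 15), (None, 14), (None, 16)]  # "Only 1": wingman's copy killed
    + [(16, 9), (16, 16), (16, 15), (16, 15), (15, 4)]  # "Neither completed okay"
)


def tally(pairs):
    """Count WUs by number of killed copies, and killed copies by step."""
    # Key 0, 1 or 2 = how many copies of the WU hit the limit.
    wus_by_killed_copies = Counter(
        sum(step is not None for step in pair) for pair in pairs
    )
    # Key = step number in which a copy was killed.
    killed_by_step = Counter(
        step for pair in pairs for step in pair if step is not None
    )
    return wus_by_killed_copies, killed_by_step


wus_by_killed_copies, killed_by_step = tally(pairs)
total_killed = sum(killed_by_step.values())

print("WUs with 0 / 1 / 2 copies killed:",
      wus_by_killed_copies[0], wus_by_killed_copies[1], wus_by_killed_copies[2])
print("copies killed at the limit:", total_killed)
print("of which killed in step 16:", killed_by_step[16])
```

Under that transcription the 29 WUs split 4 / 20 / 5, and the 30 and 23 figures count individual copies (mine plus wingmen's) rather than WUs.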
toss
Senior Cruncher New Zealand Joined: Jan 3, 2007 Post Count: 220 Status: Offline
gb, I have made similar observations.
This has made me wonder whether, at 12 hrs, the instruction could be to finish the current job and then exit. I have assumed that chopping off a job after 1-2 hrs of work is a bit of a waste.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I have also seen tasks killed for the time limit about 3 hrs after their last checkpoint.
Maybe (if it's even possible) kill at the next checkpoint after 11 hrs, or something similar.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
You might want to revisit an exchange between a member and cleanenergy on upload bandwidth limits (problem 1) and the continued growth of the result files beyond 12 hours, without much added value when one gets millions upon millions of results.
You may have read that, for storage, the group is looking to the DoE for help... hundreds of TB (problem 2). The data will give pointers on where to look closer for a next iterative run of 750,000 results, or whatever the number was/is that is needed to get a new, maybe 2D/3D histogram type of chart (seen in the CEP2 graphics) with directions for the next run, and so on... this is "live" science.
0.2 Eurocents
The right one is a good example, I think, for identifying areas with optimal signal, if one knows what signal is being looked for.
WCG
Please help to make the Forums an enjoyable experience for All!
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
gb009761,
I agree that the 12 hour maximum seems a bit too short to me as well. After doing almost a year's work on CEP2, I went to the Result Status to get a clearer picture of how much had been processed for the WUs which reached 12 hours (there were only 10 WUs which had reached 12 hours). Of the 10 reaching 12 hours, eight were terminated in JOB 15 and one was terminated in JOB 14. The other WU returned
* The application is unavailable at this time, please try again later.
This is out of a total of 96 WUs (Valid or PV). It would seem that the number of WUs reaching 12 hours is small enough for this not to be a problem. I don't recall this many WUs hitting the limit in the first part of CEP2, for which I no longer have any stats, but seeing as how most of these reached the sixteenth job, it would seem that one hour per job, for 16 hours, would have been a better choice for the maximum time to allow all JOBs in a WU to complete.
I have gone over the information in Sekerob's post, but I will need to spend more time with it to see if I can better understand what he is saying.
Off topic: my family and friends took a vacation trip to Scotland to see what life is like over there, and they were really impressed. My wife liked the highland cows in particular. They took hundreds of pictures to communicate all of this to me.
ng.louismarvin
Cruncher Joined: Sep 15, 2010 Post Count: 4 Status: Offline
Hi guys,
Is the 12 hour limit automatic? I have several jobs already exceeding that limit, some even as high as 22 hours. Right now I have 2 running that are already 16 hours in, with just ~60% completion. I don't want to waste too many cycles.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
The tasks view in the BOINC Manager shows over 15 hours of projected time for the laptop, but the amount of time shown in the Results Status is normally less than 12 hours for Valid or PV completions. At least, this is what I am seeing in Results Status.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
ng.louismarvin,
What you see in the BOINC Manager tasks window is quasi wall-clock time that the task is "allowed" to run. If you select such a task and hit the properties button on the left, it will open a window that also shows the CPU time, the actual time the process has used on the CPU. That is what is limited to 12 (CPU) hours. If the client is set to a default of 60% CPU time, the elapsed time can indeed go as high as 100/60 * 12 hours, or 20 hours, but that is at 100% efficiency. Given that we use the system for other things too, 22 hours "Elapsed" is very well possible before the 12 hours of CPU time is reached.
--//-- edit: added quotes to highlight
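The arithmetic in this post can be sketched in a few lines. This is a rough illustration only, not anything taken from the BOINC client: `max_elapsed_hours` and its `efficiency` parameter are hypothetical names, with `efficiency` standing for the fraction of the allowed CPU share that the task actually receives when the machine is busy with other things.

```python
def max_elapsed_hours(cpu_limit_hours, cpu_throttle_pct, efficiency=1.0):
    """Wall-clock hours needed to accumulate cpu_limit_hours of CPU time.

    cpu_throttle_pct: the "use at most X% of CPU time" setting (e.g. 60).
    efficiency: hypothetical fraction (0-1] of that allowed share which the
                task really gets; 1.0 means nothing else competes for the CPU.
    """
    if not 0 < cpu_throttle_pct <= 100:
        raise ValueError("cpu_throttle_pct must be in (0, 100]")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_limit_hours * 100.0 / cpu_throttle_pct / efficiency


# The case from the post: 12 CPU hours at a 60% throttle, 100% efficiency.
print(max_elapsed_hours(12, 60))

# Same limit and throttle, but the task only gets about 91% of its allowance.
print(round(max_elapsed_hours(12, 60, efficiency=20 / 22), 2))
```

With a 60% throttle, the 12 hour CPU limit corresponds to 20 elapsed hours at best, and an efficiency of roughly 91% is enough to stretch that to the 22 hours mentioned above.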
[Edit 1 times, last edit by Sekerob at Dec 16, 2010 12:57:13 PM]
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline
If the Harvard group is able to get funding for much larger storage, I too hope that the 12-hr maximum can be extended.
My 2.53 GHz Core 2 Duo always finishes CEP2 units in under 12 hrs. At the opposite extreme, I don't run CEP2 on my Atom 330 because it gets stopped during job #2 -- almost certainly not a useful contribution. In between is my 1.8 GHz machine, which usually gets to around job 9 in 12 hrs. It is a reliable machine -- crunches 24/7 without errors and can handle the bandwidth and disk access challenges of CEP2. It would be great if machines like this could get closer to the end of CEP2 units.
ng.louismarvin
Cruncher Joined: Sep 15, 2010 Post Count: 4 Status: Offline
Sekerob,
Thanks, I've got it now. Those 16 hour times were just 7 hours of CPU time; I guess the machine has too many tasks at the moment.