| World Community Grid Forums
Thread Status: Active Total posts in this thread: 172
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
|
Sek,
I am thinking about your idea of having 1 result per core, similar to beta, but at the moment we will probably let it play like a normal project. We will discuss as a group and decide.
Kremmen,
The concept of soft stop is something new we are trying. I believe there is an issue where someone who was 80% done at 3 days should not have been given a soft stop. I will be reviewing the logs and will correct it. The idea of soft stops is to help speed up the overall completion of the steps from 0 to 3 million or above. Unfortunately, we are still gathering the data and, as mentioned before, we will be tweaking the parameters to get the best throughput for the researchers.
Thanks,
-Uplinger |
||
|
|
Eric_Kaiser
Veteran Cruncher Germany (Hessen) Joined: May 7, 2013 Post Count: 1047 Status: Offline
|
In my opinion, having only 1 WU per core might be too low. In case of a longer downtime of WCG, the computers might run dry.
If the runtimes of the WUs remain the same as in this beta, I recommend at least 2 WUs per core to prevent computers from running dry. |
||
|
|
OldChap
Veteran Cruncher UK Joined: Jun 5, 2009 Post Count: 978 Status: Offline
|
I kind of understand the premise of the soft stop, however passing the remainder of the work to another for faster completion surely needs that job to be prioritised to run at the soonest opportunity otherwise it could languish on the second rig for as many hours as it might have taken to complete anyway. On the subject of first in first out, I have never seen that any wu is so urgent as to need another already running wu to be paused. Surely it is enough just to go to the head of the queue of waiting to run wu's.
|
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline
|
"I kind of understand the premise of the soft stop, however passing the remainder of the work to another for faster completion surely needs that job to be prioritised to run at the soonest opportunity otherwise it could languish on the second rig for as many hours as it might have taken to complete anyway. On the subject of first in first out, I have never seen that any wu is so urgent as to need another already running wu to be paused. Surely it is enough just to go to the head of the queue of waiting to run wu's."
I see your point, but I also understand the issue Uplinger is trying to resolve. Since every job starts from the completion point of the prior job, it is quite a task to get through the minimum of 30 jobs (to complete 3 million steps), or more if jobs are soft stopped before the 100,000 point. It seems there will be a balancing act, probably depending on the level reached and the time taken to reach the cutoff level.
We still don't know how many days they will set for the deadline for each job. Given the length of the jobs, it probably will not be 4 days like the betas, but in the 7 to 10 day range like other jobs. Maxing out 7 day deadlines for 30 sequential jobs could mean 210 days before the 3 million step sequence is completed.
At this point I would say let the techs do their data analysis and specify the parameters they believe will work best. After all, they can always tweak the settings after the project is up and running.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
KLiK
Master Cruncher Croatia Joined: Nov 13, 2006 Post Count: 3108 Status: Offline
|
I still think that 50,000 points would be more than enough for a WU... do 60 WUs for the 3 million steps!
|
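The job-count arithmetic in the posts above can be checked with a rough sketch. It only assumes what the thread states: a 3 million step sequence, jobs that run strictly one after another, and a worst case where every job takes its full deadline. The function names `jobs_needed` and `worst_case_days` are made up for illustration and are not part of any WCG or BOINC tool.

```python
TOTAL_STEPS = 3_000_000  # length of one full sequence, as stated in the thread


def jobs_needed(steps_per_job, total_steps=TOTAL_STEPS):
    """Number of sequential jobs needed to cover total_steps (rounded up)."""
    return (total_steps + steps_per_job - 1) // steps_per_job


def worst_case_days(steps_per_job, deadline_days, total_steps=TOTAL_STEPS):
    """Days to finish the sequence if every job uses its whole deadline."""
    return jobs_needed(steps_per_job, total_steps) * deadline_days


if __name__ == "__main__":
    print(jobs_needed(100_000))          # 30 jobs at 100,000 steps each
    print(worst_case_days(100_000, 7))   # 210 days with 7 day deadlines
    print(jobs_needed(50_000))           # 60 jobs at 50,000 steps each
```

If soft stops cut jobs short on average (say at 70,000 steps, a purely hypothetical figure), `jobs_needed(70_000)` shows the chain growing to 43 jobs, which is the balancing act between job length and deadline described above.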
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"The concept of soft stop is something new we are trying. I believe there is an issue where someone who was at 80% done at 3 days should not have been given a soft stop. I will be reviewing the logs and correct. The idea of soft stops is to help speed up the overall completion of the steps from 0 to 3million or above. Unfortunately, we are gathering the data and as mentioned before, we will be tweaking the parameters to get the best throughput for the researchers."
Absolutely. For data gathering, it makes sense to try aggressive parameters now to test them out. In a live project, I would guess they might want to be even less tight. The betas on a shorter time-frame than other work are being prioritised above other work, so are starting almost immediately.
Is the soft stop sent 3 days after the job was sent to the client or 3 days after the job started running? |
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
|
What does it mean when a task is returned and marked as valid but there is nothing written into the result log?
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Got an example of any result log that has a validation statement in the result log [it being the output from the client science app]?
----------------------------------------[Edit 1 times, last edit by SekeRob* at Sep 26, 2015 1:46:40 PM] |
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline
|
"Got an example of any result log that has a validation statement in the result log [it being the output from the client science app]?"
Don't know what you're looking for but here is what I'm looking at.
BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2-- Miner3 Valid 9/24/15 22:58:09 9/26/15 02:35:08 8.20 / 0.00 272.0 / 272.0
Result Log
Result Name: BETA_ FAHB_ avx17556-ls_ 000086_ 0018_ 001_ 2--
This particular task went out 5 times. 3 were returned as invalid. Mine was returned as valid and 1 more is listed as in progress. It seems like all 3 of the invalid tasks never got past step 10k out of 100k. They all had the "INFO: received message from server to exit after next major checkpoint" flag in their result log. That's why I asked why my result log had nada. Judging by my run time I didn't finish this task either. I'm guessing that it exited at around 60k or 70k of 100k completed.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
----------------------------------------[Edit 6 times, last edit by nanoprobe at Sep 26, 2015 2:37:44 PM] |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Now there's background and contextual meaning to your words. The number of 10K steps would normally be recorded in the trickle upload event log messages. If nothing is there either [the stdoutdae.txt file], then that would suggest a problem case, though maybe only with the result log output itself and not with the actual output files.
|
||
|
|
|