World Community Grid Forums
Thread Status: Active Total posts in this thread: 102
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline |
"Generally with WCG, if re-sends go out after the halfway point of a deadline, the time allowed is only half the standard time, so they only go to those machines which are considered 'reliable', which effectively is those which regularly return units within 3.5 days."

Unless I am misunderstanding your statement, I do not believe that is correct. I believe that a resend would only be sent out if there is an error, an invalid, or no reply. The no reply would only come after the deadline of 7 days has passed. The resend would then have a deadline of 3.5 days (half of the original deadline), which should still be sufficient for most machines deemed reliable for re-sends by the WCG algorithm.

Edit: If you notice on mine, which ran for a little over 5 days, it completed within the 7 days and no resend was sent out. The quorum is 2 and the other one only ran 22.39 hours.

Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Aug 17, 2020 4:46:45 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I can't remember whether resends issued before the halfway point of the original deadline (error, invalid) get the full original deadline or something that equates to the original deadline minus the time already passed. The resends I see always seem to have half the original deadline, whether issued on the first day or on the last day before a no reply. Checking what it was requires querying the database, which affects performance.

BTW, most of the No Reply resends I get are server-aborted before my 1-day-deep buffer gets to them, meaning that on the 8th day the original is still good, almost biblical. Great, no double redundancy due to slow returners, and 24 hours of crunching going to waste at that.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 17, 2020 4:59:27 PM] |
||
|
blyons123
Cruncher Joined: Jan 2, 2007 Post Count: 9 Status: Offline |
"That same comment came from someone else a week or so ago, but no, 18 hours of crunching on a 7-day deadline is still only about 2.5 hours a day, were it not that ARP1 checkpoints only once per 12.5% progress, so you have to run BOINC at least 3 hours uninterrupted to reach the next save point. Why 7 days? The next step depends on the previous result, i.e. until you finish your result and report it, there won't be a next task to send out. There are 180 48-hour simulations in a sequence. If everyone returned the result at the maximum allowed time, it would take 1260 days (7x180) to do a full 1-year simulation. Too long. Can't do that, then don't opt in."

You're wrongly assuming a lot! I'm running many projects at the same time at only 25% CPU. Also, I didn't know how long a task would take. BOINC doesn't show an actual estimate based on actual time per computation. 1 hr remaining could be 4 actual hours based on settings! And like microbiome, it could run for hours showing no progress. 3-4 weeks would work. 1 year is ridiculous. |
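The arithmetic in the quoted reply can be checked with a short sketch. The figures (7-day deadline, 180 steps of 48 simulated hours, 18 hours as the example runtime) are taken from the post; the variable names are illustrative only and are not part of any BOINC or WCG tooling.

```python
# Figures quoted in the thread; the names are illustrative, not from BOINC/WCG.
DEADLINE_DAYS = 7                # deadline per ARP1 task
STEPS_IN_SEQUENCE = 180          # 48-hour simulations in one sequence
SIMULATED_HOURS_PER_STEP = 48
EXAMPLE_RUNTIME_HOURS = 18       # example crunch time used in the post

# Each step waits for the previous result, so worst-case wall time is
# one full deadline per step.
worst_case_days = DEADLINE_DAYS * STEPS_IN_SEQUENCE

# Simulated time covered by the whole sequence.
simulated_days = STEPS_IN_SEQUENCE * SIMULATED_HOURS_PER_STEP / 24

# Average crunch time per day needed to finish one task within the deadline.
hours_per_day_needed = EXAMPLE_RUNTIME_HOURS / DEADLINE_DAYS

print(worst_case_days, simulated_days, round(hours_per_day_needed, 2))
```

Under these figures the sequence covers 360 simulated days, which is the "full 1-year simulation" the post refers to, and the per-day requirement comes out slightly above the 2.5 hours quoted.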
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
If I remember back many, many moons, all resends were sent out with half the original time due. Then folks started using AWS instances that allowed the instance to be taken if needed for higher-priority work. This caused a major problem due to the number of resends and not enough reliable machines being available to execute them. The feeder ground to a halt. About that time, only certain types of resends were designated as true errors and got the expedited return time. WUs that were designated as "Detached" did not get the expedited return and were resent with the standard 7-day turnaround. I think this happened around the time HCC was running on the grid. Uggh!!! I think I've been here too long.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 17, 2020 7:27:16 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"That same comment came from someone else a week or so ago, but no, 18 hours of crunching on a 7-day deadline is still only about 2.5 hours a day, were it not that ARP1 checkpoints only once per 12.5% progress, so you have to run BOINC at least 3 hours uninterrupted to reach the next save point. Why 7 days? The next step depends on the previous result, i.e. until you finish your result and report it, there won't be a next task to send out. There are 180 48-hour simulations in a sequence. If everyone returned the result at the maximum allowed time, it would take 1260 days (7x180) to do a full 1-year simulation. Too long. Can't do that, then don't opt in."

"You're wrongly assuming a lot! I'm running many projects at the same time at only 25% CPU. Also, I didn't know how long a task would take. BOINC doesn't show an actual estimate based on actual time per computation. 1 hr remaining could be 4 actual hours based on settings! And like microbiome, it could run for hours showing no progress. 3-4 weeks would work. 1 year is ridiculous."

Your OP gave such an abundance of information, and in the absence of further replying on your part...

1) BOINC does give an initial estimated runtime based on the compute capability it tells the project server it has, and then gives the remaining estimated runtime.
2) MIP not showing progress for hours: that is about the time they take to finish on my machine. It shows regular progress in finely granulated percent fractions.
3) At 25% of CPU time or CPU threads? Assuming time, the crunch would take around 4 calendar days if on 24/7. Running many projects, I'd say that if an ARP hits your machine and BOINC learns the science's specifics, it's likely to run in high priority.

More assuming on my part: I think your BOINC configuration is broken. If you have not already done so, at the very least switch on Leave Application In Memory when suspended; otherwise an interrupted task unloads and regresses to the previous checkpoint, losing the progress it has made since. |
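The checkpoint regression described above can be illustrated with a minimal sketch. It assumes that ARP1 checkpoints fall on exact multiples of 12.5% progress (the interval quoted in the thread) and that an unloaded task resumes from the most recent checkpoint; the function names are hypothetical and not part of BOINC.

```python
import math

CHECKPOINT_INTERVAL_PERCENT = 12.5  # ARP1 checkpoint spacing quoted in the thread


def last_checkpoint(progress_percent: float) -> float:
    """Progress the task would resume from if it were unloaded from memory."""
    completed_intervals = math.floor(progress_percent / CHECKPOINT_INTERVAL_PERCENT)
    return completed_intervals * CHECKPOINT_INTERVAL_PERCENT


def progress_lost(progress_percent: float) -> float:
    """Progress thrown away when an interrupted task regresses to its checkpoint."""
    return progress_percent - last_checkpoint(progress_percent)


print(last_checkpoint(30.0), progress_lost(30.0))
```

Under these assumptions, a task interrupted at 30% falls back to 25%, and one interrupted just before its first checkpoint loses everything done so far, which is why keeping the application in memory matters on machines that suspend often.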
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"If I remember back many, many moons, all resends were sent out with half the original time due. Then folks started using AWS instances that allowed the instance to be taken if needed for higher-priority work. This caused a major problem due to the number of resends and not enough reliable machines being available to execute them. The feeder ground to a halt. About that time, only certain types of resends were designated as true errors and got the expedited return time. WUs that were designated as "Detached" did not get the expedited return and were resent with the standard 7-day turnaround. I think this happened around the time HCC was running on the grid. Uggh!!! I think I've been here too long."

Detached tasks at times get recovered by the same machine ('lost task' or something) and then indeed get the original deadline. There can't be much time between the detach and the 'lost task' recovery, or else the task gets assigned to a different machine. I'm not in the know about what deadline those get. It does not really bother me, whatever it is, since my buffer is only 1 day deep... whatever arrives is crunched within 24 hours, except ARP. I always have one ready to start to replace the one finishing, so they get back no later than 48 hours, soon enough to still get the occasional repair, which then often gets cancelled because the original no reply still came back while the copy sat waiting for 1 day. |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12439 Status: Offline |
lavaflow
My post of 22 hours ago about re-sends after the halfway point referred to errors, invalids and aborts after the halfway point, and also to no reply, which would be at the end point of the 7-day deadline. All of those circumstances would generate a re-send with half the deadline. If before the halfway point, then they would be sent with the full deadline.

Mike |
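The rule as stated in this post can be written down as a small sketch. This is only one poster's description, which other replies in the thread question; it is not confirmed WCG server logic. The function name, the choice to measure the halfway point from the time the original task was sent, and the treatment of a resend issued exactly at the halfway point are all assumptions.

```python
from datetime import datetime, timedelta

ORIGINAL_DEADLINE = timedelta(days=7)  # standard deadline quoted in the thread


def resend_allowance(
    original_sent: datetime,
    resend_issued: datetime,
    original_deadline: timedelta = ORIGINAL_DEADLINE,
) -> timedelta:
    """Time allowed for a resend under the rule described in the post.

    Hypothetical helper: before the halfway point the resend gets the full
    deadline; at or after it (including a no reply at the end of the
    deadline) it gets half.
    """
    halfway = original_sent + original_deadline / 2
    if resend_issued < halfway:
        return original_deadline
    return original_deadline / 2


sent = datetime(2020, 8, 10)
print(resend_allowance(sent, sent + timedelta(days=1)))  # error on day 1
print(resend_allowance(sent, sent + timedelta(days=7)))  # no reply at deadline
```

With a 7-day deadline, an error reported on day 1 would give the resend the full 7 days, while a no reply at day 7 would give it 3.5 days.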
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"lavaflow

My post of 22 hours ago about re-sends after the halfway point referred to errors, invalids and aborts after the halfway point, and also to no reply, which would be at the end point of the 7-day deadline. All of those circumstances would generate a re-send with half the deadline. If before the halfway point, then they would be sent with the full deadline.

Mike"

That logic doesn't make a lot of sense to me. What does the execution length have to do with the return deadline? An error is an error, and the WU would need to be re-executed in its entirety. I'm not saying it doesn't happen; it just doesn't make a lot of sense. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"If before the half way point then they would be sent with the full deadline."

Novel to me. Until just now I always thought that the base deadline of a resend was half the original deadline (7 days in the case of OPN1). Until not too long ago it was 30-35%, but that caused the feeder to clog in some circumstances, so uplinger upped it to half, 50%. Doubting Thomas that I am, I went to look at the logs for OPN1 _2 copies that recently got issued to my machine. There were more than I ever thought, so I guess that answers the reliable state as well:

Result Name Receive Date Deadline

The timespan allowed is all over the place, just not the full 7 days the original had. It seems to suggest what I speculated on, "Can't remember if resends issued before the original deadline halftime (error, invalid) get full original deadline or something that equates to original deadline minus time already passed.", i.e. the deadline is the same as the original's. Oddly, there are those that get less than 3.5 days, sometimes close to or less than 2. Maybe it's half of the half time in some instances. Clear as mud.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 18, 2020 3:14:53 AM] |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12439 Status: Offline |
I was referring to the deadline and not to execution time.
|
||
|