Total posts in this thread: 171
This topic has been viewed 664737 times and has 170 replies
zdnko
Senior Cruncher
Joined: Dec 1, 2005
Post Count: 225
Re: Invalid GPU work units

Can someone explain how the Too Late status works?

[...]

After a few seconds the wu was sent 2 more times.

What Causes Too Late?
If my Return Time is Too Late why was the wu sent again?

That usually happens when the server has sent the "server abort" command to a task that has already started. The command from the server was "Too Late" to abort the task.

OK, but why was the WU resent 2 more times after a server abort?
[Apr 16, 2021 6:06:25 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 769
Re: Invalid GPU work units

Too Late is set when a unit is returned after the maximum number of Invalids has already been returned.
It is also set when other copies have already validated OK before the unit is returned, so it is too late to validate.
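
The two conditions above can be written down as a small check. This is only a sketch of the rule as described in this post, not the actual server logic: the function name, the `max_invalids` parameter, and the "Pending Validation" placeholder for the non-late case are assumptions made for illustration.

```python
def status_on_return(invalids_so_far: int, max_invalids: int, quorum_reached: bool) -> str:
    """Hypothetical status for a result at the moment it is returned."""
    # Too Late if enough other copies already validated, or if the
    # work unit has already hit its (assumed) limit of Invalid results.
    if quorum_reached or invalids_so_far >= max_invalids:
        return "Too Late"
    # Placeholder for "still eligible to be compared with a wingman".
    return "Pending Validation"


print(status_on_return(invalids_so_far=3, max_invalids=3, quorum_reached=False))
print(status_on_return(invalids_so_far=1, max_invalids=3, quorum_reached=True))
print(status_on_return(invalids_so_far=1, max_invalids=3, quorum_reached=False))
```

The limit of 3 is an arbitrary value chosen for the example; the real limit, if there is one, is set by the project.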

Paul.
----------------------------------------
Paul.
[Apr 16, 2021 6:39:56 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 769
Re: Invalid GPU work units

Valid is counted per project, so you need to return valids before the send rate increases.

Paul.
----------------------------------------
Paul.
[Apr 16, 2021 6:42:08 PM]
zdnko
Senior Cruncher
Joined: Dec 1, 2005
Post Count: 225
Re: Invalid GPU work units

Too Late is set when a unit is returned after the maximum number of Invalids has already been returned.
It is also set when other copies have already validated OK before the unit is returned, so it is too late to validate.

That doesn't answer my question.

Before my Too Late:
_0 User Aborted
_1 Invalid
No one validated OK

If "max Invalids already returned", why was the WU sent again?
After my Too Late, why was the WU sent again 2 more times?
[Apr 16, 2021 6:57:41 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Invalid GPU work units

If "max Invalids already returned", why was the WU sent again?

That is a question I can't answer. Other projects like ARP1 resend only one more copy when an invalid happens. I'm unsure why OPNG resends more than one extra task on an invalid, then server-aborts an extra one because of "too late" or too many invalids.

In the end, I'd like to see a fix for the very frequent invalids, server aborts, and "too late" results caused by too many other invalids.
[Apr 16, 2021 7:51:03 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2155
Re: Invalid GPU work units

Can someone explain how the Too Late status works?

https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=620639614
Result Name             AVN  Status        Sent Time         Due / Return Time  CPUh  Claimed/Granted
OPNG_0002331_00101_3--  728  Invalid       4/16/21 15:48:11  4/16/21 15:50:15   0.01  0.3/0.0
OPNG_0002331_00101_4--  728  Invalid       4/16/21 15:48:11  4/16/21 15:50:14   0.02  1.3/0.0
OPNG_0002331_00101_2--  728  Too Late      4/16/21 15:41:12  4/16/21 15:48:03   0.06  0.1/0.0
OPNG_0002331_00101_1--  728  Invalid       4/16/21 15:41:06  4/16/21 15:43:12   0.01  0.2/0.0
OPNG_0002331_00101_0--  728  User Aborted  4/15/21 06:40:03  4/16/21 15:41:04   0.00  0.0/0.0


0002331_00101_2
Sent Time: 15:41:12
Return Time: 15:48:03
Less than 7 minutes!

After a few seconds the wu was sent 2 more times.

What Causes Too Late?
If my Return Time is Too Late why was the wu sent again?
(reformatted for easier reading)

WU _0 was user aborted and was immediately sent out in two copies as _1 and _2 (15:41:06 and 15:41:12).
WU _1 ended up as Invalid when returned at 15:43:12.
When WU _2 was returned, it didn't turn out as Valid yet, so two more copies were sent out as _3 and _4 (at 15:48:11).
Both _3 and _4 ended up Invalid.
That's where the generation of new WU copies ends, because it stops at _4.
So there was no possibility to check the validity of _2 against another WU, it was Too Late to validate.
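
The timeline above can be rebuilt from the quoted status table. The following is a minimal Python sketch that assumes the column layout shown in that table; `Result`, `parse_row` and `ROWS` are illustrative names, not part of any WCG or BOINC tool.

```python
from datetime import datetime
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    status: str
    sent: datetime
    returned: datetime


# Rows copied from the status page quoted in this post.
ROWS = """\
OPNG_0002331_00101_3-- 728 Invalid 4/16/21 15:48:11 4/16/21 15:50:15 0.01 0.3/0.0
OPNG_0002331_00101_4-- 728 Invalid 4/16/21 15:48:11 4/16/21 15:50:14 0.02 1.3/0.0
OPNG_0002331_00101_2-- 728 Too Late 4/16/21 15:41:12 4/16/21 15:48:03 0.06 0.1/0.0
OPNG_0002331_00101_1-- 728 Invalid 4/16/21 15:41:06 4/16/21 15:43:12 0.01 0.2/0.0
OPNG_0002331_00101_0-- 728 User Aborted 4/15/21 06:40:03 4/16/21 15:41:04 0.00 0.0/0.0
"""

FMT = "%m/%d/%y %H:%M:%S"


def parse_row(line: str) -> Result:
    parts = line.split()
    # Layout: name, AVN, status (one or two words), sent date, sent time,
    # return date, return time, CPU hours, claimed/granted credit.
    name = parts[0].rstrip("-")
    status = " ".join(parts[2:-6])
    sent = datetime.strptime(" ".join(parts[-6:-4]), FMT)
    returned = datetime.strptime(" ".join(parts[-4:-2]), FMT)
    return Result(name, status, sent, returned)


# Order the copies by the time they were sent out.
results = sorted((parse_row(line) for line in ROWS.splitlines()), key=lambda r: r.sent)

for r in results:
    print(f"{r.name[-2:]}  {r.status:<12}  sent {r.sent:%H:%M:%S}  "
          f"returned {r.returned:%H:%M:%S}  turnaround {r.returned - r.sent}")
```

Sorted this way, _2 shows up as sent six seconds after _1 and returned 6 minutes 51 seconds later, eight seconds before _3 and _4 went out.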
[Apr 16, 2021 9:46:17 PM]
zdnko
Senior Cruncher
Joined: Dec 1, 2005
Post Count: 225
Re: Invalid GPU work units

That's where the generation of new WU copies ends, because it stops at _4.
So there was no possibility to check the validity of _2 against another WU, it was Too Late to validate.

So it was Too Late to validate, not returned Too Late.

OK, thanks for the explanation.
----------------------------------------
[Edit 1 times, last edit by zdnko at Apr 16, 2021 10:13:10 PM]
[Apr 16, 2021 10:08:55 PM]
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 145
Re: Invalid GPU work units

But it prompted a couple of thoughts.
(1) Every single one of these tasks that I spot-checked was paired with other Linux machines. I spot-checked a Windows machine, and every single wingmate there was another Windows machine. We know (Uplinger has told us) that a task requiring confirmation is flagged to need the same class of GPU: it appears that applies to OS as well. Coming from the SETI stable, where we struggled mightily to ensure that every version (CPU or GPU; NV, AMD or Intel; Windows, Linux or Mac; stock, optimised, or third-party) produced compatible and validatable results, these isolated 'bubbles' of validation feel very strange. I do hope that cross-bubble verification is being performed elsewhere in the system.

It seems strange to me that 2 results for the same work unit can be different but both be valid. That makes it seem like results are not being verified even within the same bubble. Perhaps I am missing something about how validation works, though. I know Uplinger said that credit is granted based upon the % of expected calculations that were actually completed. So I would expect wingmates to get the same amount of credit, since I would expect them to be running the same calculations. However, this is not the case: they (almost?) always do a different amount of calculations and thus get a different amount of credit.

Note: I am not complaining about credit (points); I do not care about credit (points). I do care about science being valid and verifiable.
----------------------------------------

[Apr 17, 2021 8:55:16 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325
Re: Invalid GPU work units

My understanding is that docking programs are trying to filter 30 million possible chemicals down to a few hundred or thousand which can interact with the target ligand. WCG is rare in validating work units against a wingman. Processing using different CPU chips, operating systems, compilers (and now GPU chips) can produce a different numerical result when compared on a bit-for-bit basis; however, provided the results are within a certain tolerance (say less than a fraction of a percent), the filtering can be achieved.
The next stage of more precise and time-consuming docking programs can then proceed.
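
To illustrate the difference between a bit-for-bit comparison and a tolerance-based one, here is a small Python sketch. It is not WCG's validator: the `results_agree` helper, the 0.1% relative tolerance, and the score values are all made up for the example.

```python
import math


def results_agree(a, b, rel_tol=1e-3):
    """Return True if two equal-length score lists match within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Made-up scores from three hypothetical hosts for the same work unit.
host_a = [-7.4312, -6.9801, -5.1220]
host_b = [-7.4309, -6.9805, -5.1221]  # differs from host_a only in the last digits
host_c = [-7.4312, -6.2000, -5.1220]  # second value is clearly different

print(host_a == host_b)               # exact comparison
print(results_agree(host_a, host_b))  # comparison within 0.1%
print(results_agree(host_a, host_c))
```

Under these assumptions host_a and host_b fail the exact comparison but agree within tolerance, while host_c does not agree with host_a.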
[Apr 17, 2021 10:02:16 AM]
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 145
Re: Invalid GPU work units

I might understand that more if we were talking about differing precision causing slight differences with rounding on different GPUs.

However, I had one that was the same OS and GPU, but did a significantly different % of calculations. I should have saved the details because it has fallen off the work unit history, so I get an error when I try the viewWorkunitStatus for it.
----------------------------------------

[Apr 17, 2021 6:10:06 PM]