Thread Status: Active
Total posts in this thread: 95
This topic has been viewed 208060 times and has 94 replies
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: GPU Work has resumed

Not a huge number of invalids, but some are still worrying. Sample:

https://www.worldcommunitygrid.org/contribution/workunit/820394403

Mine is the Linux Mint host, replication _0, with a GTX 1660 Ti, driver 470. That's a purpose-built cruncher, normally rock-solid stable. So why do two others outvote me? And why am I shown with 0.5 granted credit despite being marked invalid?
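For anyone wondering how one result can be "outvoted" by two others, here is a minimal sketch of quorum voting with a tolerance. It is an illustration only, not the project's actual validator: the function name `validate_quorum`, the single-energy-per-result layout, the replication labels, and the `rel_tol` value are all assumptions made for the example.

```python
import math

def validate_quorum(results, rel_tol=1e-4, min_quorum=2):
    """Split results into (valid, invalid) by finding the largest agreeing group.

    results: dict mapping a replication label to one energy value (hypothetical layout).
    """
    names = list(results)
    best = []
    for ref in names:
        # Everyone whose value lies within tolerance of this reference result.
        group = [n for n in names
                 if math.isclose(results[n], results[ref], rel_tol=rel_tol)]
        if len(group) > len(best):
            best = group
    if len(best) < min_quorum:
        # No consensus at all: nothing can be declared valid yet.
        return [], names
    return best, [n for n in names if n not in best]

# Made-up numbers: _1 and _2 agree with each other, _0 is slightly off.
valid, invalid = validate_quorum({"_0": -7.1300, "_1": -7.1201, "_2": -7.1200})
```

In this toy case `_0` ends up invalid simply because the other two agree with each other and not with it, which says nothing about which host is actually closer to the true answer.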
[Sep 23, 2021 7:37:10 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: GPU Work has resumed

Edit: Not good so far though. I still get invalids.

Quote from knreed:
Note that due to the results being returned by different hosts for the same workunit have slight variances in their energy computations, this check cannot be an exact check but it is instead checking if values are within a narrow range. As a result there are some[sic] times where you will see one of your results get marked invalid even though the device has a history of running well and unfortunately this is the nature of these type of "fuzzy" validators.
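To make the "within a narrow range" idea concrete, here is a small sketch of a fuzzy comparison between two results. It is not the validator knreed describes; the function name `energies_agree`, the list-of-energies layout, and both tolerance values are assumptions chosen for illustration.

```python
import math

def energies_agree(a, b, rel_tol=1e-4, abs_tol=1e-6):
    """Return True if two lists of energies match element by element within tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, b))

# Made-up values: a tiny difference passes, a larger one does not.
close_enough = energies_agree([-7.1234, -6.5], [-7.1235, -6.5])
too_far = energies_agree([-7.12], [-7.20])
```

A result that sits just outside the chosen tolerance is rejected even if the host is healthy, which is the behaviour the quote is warning about.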

[Sep 23, 2021 7:40:03 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: GPU Work has resumed

Quote from knreed:
Note that due to the results being returned by different hosts for the same workunit have slight variances in their energy computations, this check cannot be an exact check but it is instead checking if values are within a narrow range. As a result there are some[sic] times where you will see one of your results get marked invalid even though the device has a history of running well and unfortunately this is the nature of these type of "fuzzy" validators.

Yes, I saw that. And I'm worried by it.

My entry to distributed computing was via the SETI@Home project - initially as a stand-alone application, later as a participant in BOINC. The following comments relate to the BOINC version only.

From the beginning, SETI embraced the concepts of open source programming and 'anonymous platform' working under BOINC. Anyone was welcome to compile their own science application, their own BOINC client, and to run them on whatever hardware and operating system they had to hand. With that freedom, the only constraint on scientific credibility of the results was the validator. They cared about the validator.

SETI was one of the very first projects to embrace GPU processing, from late 2008/early 2009. NVIDIA helped them out by doing the heavy lifting of the initial code transfer to the new platform, but after the first three iterations (up to and including the 'Fermi' - 4xx - series), NVIDIA left the project to its own devices.

I don't think that SETI ever developed an in-house capability for coding GPU applications. All subsequent applications were developed by volunteers, and I operated on the fringes of that group. We cared about the validator, too.

At SETI, all tasks were treated the same: and any task could be sent to any hardware, any software. They all had to validate against each other, to be accepted as valid science. That's a far higher bar than is set here, where separate tribes of hardware and software are kept separate, and tasks are only expected to validate against other members of the same tribe.

So, all our volunteer-generated applications were tested offline, using an offline comparator which simulated the complex floating-point arithmetic used in the SETI validator. Only once an acceptable tolerance level had been demonstrated was that application accepted by the project for general, public, use.
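As a rough sketch of what such an offline check can look like (this is not the SETI comparator itself; the name `offline_compare`, the task-id-to-values layout, and the tolerance are assumptions for illustration):

```python
import math

def offline_compare(reference, candidate, rel_tol=1e-5):
    """Return the task ids where the candidate app disagrees with the reference app.

    reference, candidate: dicts mapping a task id to a list of floats (hypothetical layout).
    """
    failures = []
    for task_id, ref_vals in reference.items():
        cand_vals = candidate.get(task_id)
        ok = (
            cand_vals is not None
            and len(cand_vals) == len(ref_vals)
            and all(math.isclose(r, c, rel_tol=rel_tol)
                    for r, c in zip(ref_vals, cand_vals))
        )
        if not ok:
            failures.append(task_id)
    return failures

# Made-up test set: the candidate matches on wu1 but drifts on wu2.
reference = {"wu1": [1.0, 2.0], "wu2": [3.0]}
candidate = {"wu1": [1.000001, 2.0], "wu2": [3.01]}
failed = offline_compare(reference, candidate)
```

An application would only be put forward for release once the failure list stays empty across a representative set of test tasks.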

So the problems of "fuzzy validators" that knreed describes are well known within the BOINC community, and can - with care and attention to detail - be overcome.

I hope that this is something that Krembil can address as they gain experience in running such an important BOINC project.
[Sep 23, 2021 10:22:32 AM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Re: GPU Work has resumed

Quote from Richard Haselgrove:
At SETI, all tasks were treated the same: and any task could be sent to any hardware, any software. They all had to validate against each other, to be accepted as valid science. That's a far higher bar than is set here, where separate tribes of hardware and software are kept separate, and tasks are only expected to validate against other members of the same tribe.

That is certainly a far higher bar, but is it necessary? Doesn't that depend on the science? If you validate against another machine of the same type, that would preclude obvious errors due to overclocking, wrong libraries, etc. It would seem that would be enough for some purposes. Having all results agree out to the nth decimal place may be necessary for SETI, but is that generally true?
[Sep 23, 2021 11:25:03 AM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Re: GPU Work has resumed

Still way too many invalids IMHO.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


[Sep 23, 2021 11:37:24 AM]
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: GPU Work has resumed

Quote from Jim1348:
Quote from Richard Haselgrove:
At SETI, all tasks were treated the same: and any task could be sent to any hardware, any software. They all had to validate against each other, to be accepted as valid science. That's a far higher bar than is set here, where separate tribes of hardware and software are kept separate, and tasks are only expected to validate against other members of the same tribe.

That is certainly a far higher bar, but is it necessary? Doesn't that depend on the science? If you validate against another machine of the same type, that would preclude obvious errors due to overclocking, wrong libraries, etc. It would seem that would be enough for some purposes. Having all results agree out to the nth decimal place may be necessary for SETI, but is that generally true?

It was the personal, professional, choice of the administrators who set up the project - who were astronomers by trade. In their eyes, it was necessary, and they made it work.

Other projects which use comparison validation will have to make their own choices, in the context of their own branch of science. My point is that if the variation between different apps supplied by the project is greater than the variation acceptable to the validator, then something is out of balance - either the tolerance of the validator could be relaxed, or (preferably, in my view) the accuracy of the applications could be tightened.

To do neither would be wasteful of the volunteers' donated resources.
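
One way to picture the "out of balance" condition is to compare the spread between applications on the same workunit with the tolerance the validator allows. This is a sketch under stated assumptions, not anything the project runs: the function names, the app labels, and the numbers are all made up for illustration.

```python
def max_relative_spread(values):
    """Largest gap between any two values, relative to the largest magnitude."""
    lo, hi = min(values), max(values)
    scale = max(abs(lo), abs(hi))
    return 0.0 if scale == 0 else (hi - lo) / scale

def out_of_balance(per_app_results, validator_rel_tol):
    """True if the apps disagree with each other by more than the validator accepts."""
    spread = max_relative_spread(list(per_app_results.values()))
    return spread > validator_rel_tol

# Hypothetical energies from three app versions for the same workunit.
results = {"cpu": -7.1200, "opencl_nvidia": -7.1210, "opencl_amd": -7.1195}
strict = out_of_balance(results, validator_rel_tol=1e-4)
relaxed = out_of_balance(results, validator_rel_tol=1e-3)
```

With these numbers the spread is roughly 2e-4, so the strict tolerance flags an imbalance and the relaxed one does not. Either loosening the tolerance or tightening the apps closes the gap.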
[Sep 23, 2021 12:32:17 PM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 300
Re: GPU Work has resumed

Quote from Richard Haselgrove:
To do neither would be wasteful of the volunteers' donated resources.

I fully agree.

After a while, it becomes frustrating to wait a long time for WUs to finally download, only for some (or many) of them to end up invalid.
[Sep 23, 2021 12:40:19 PM]
Sphynxx
Cruncher
Joined: Nov 24, 2010
Post Count: 47
Re: GPU Work has resumed

I started running the GPU app again last night. Since then I've been seeing a 7.5% invalid rate on machines that have produced very few invalids over the last 2 months. It's better, but still needs some work.
[Sep 23, 2021 12:44:31 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: GPU Work has resumed

I'm going to let it run overnight at this speed and confirm it is working as intended first. If it is, then I'll increase the speed in the morning.

Pump up the volume! :)
[Sep 23, 2021 3:28:13 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: GPU Work has resumed

Is this done by design... waiting until I'm processing the last WU to send me more?
[Edit 1 times, last edit by BladeD at Sep 23, 2021 5:11:13 PM]
[Sep 23, 2021 5:10:46 PM]