Thread Status: Active
Total posts in this thread: 171
This topic has been viewed 664970 times and has 170 replies
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2165
Re: Invalid GPU work units

Well, they clearly dropped the ball on this issue. I have tons and tons of invalids.
Does not bode well for the upcoming move to Krembil.
Edit: Where is Uplinger in all this? Haven't heard a word from him since the big OPNG stress test ended back in April/May.

Edit2: OK someone woke up, and shut down GPU crunching it seems. Now, my computer only asks for CPU work.
----------------------------------------
[Edit 6 times, last edit by Grumpy Swede at Sep 19, 2021 8:12:47 PM]
[Sep 19, 2021 6:40:24 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Invalid GPU work units

I apologize for the spike in invalids over the past two days. The GPU project is currently working through a set of jobs that have a "tight" fit, and with the way validation is currently implemented, these are being marked as invalid. We will be discussing and implementing a different approach to validation that will address this case and prevent so many results from these jobs being marked as invalid.

Until we can get this implemented, I've suspended distribution of GPU work. Once we get the change in place we will run the GPU work at a faster pace for a period of time to make up for the time that it was stopped.

Note: The Krembil staff are not yet working on this aspect of the system (that will occur in the coming weeks), so please direct your frustration at us (the IBM staff) not them.
----------------------------------------
[Edit 4 times, last edit by knreed at Sep 20, 2021 1:23:25 AM]
[Sep 19, 2021 9:30:03 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2165
Re: Invalid GPU work units

Thanks for the explanation, knreed. Looking forward to GPU work coming back at a faster pace, even if it's just for a short time. :-)
[Sep 20, 2021 3:28:00 PM]
Maxxina
Advanced Cruncher
Joined: Jan 5, 2008
Post Count: 124
Re: Invalid GPU work units

I apologize for the spike in invalids over the past two days. The GPU project is currently working through a set of jobs that have a "tight" fit, and with the way validation is currently implemented, these are being marked as invalid. We will be discussing and implementing a different approach to validation that will address this case and prevent so many results from these jobs being marked as invalid.

Until we can get this implemented, I've suspended distribution of GPU work. Once we get the change in place we will run the GPU work at a faster pace for a period of time to make up for the time that it was stopped.

Note: The Krembil staff are not yet working on this aspect of the system (that will occur in the coming weeks), so please direct your frustration at us (the IBM staff) not them.



So GPU work for COVID will be down for a couple of weeks?
[Sep 20, 2021 5:52:21 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Re: Invalid GPU work units

Is there any update on changing the GPU client to write less frequently to solid state drives?
[Sep 21, 2021 5:32:29 AM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 295
Re: Invalid GPU work units

Is there any update on changing the GPU client to write less frequently to solid state drives?
+1
[Sep 21, 2021 11:05:25 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 953
Re: Invalid GPU work units

Is there any update on changing the GPU client to write less frequently to solid state drives?

As I understand it, the [full] output produced is necessary both for validation and credit calculation, so it isn't going away. The trouble is that it appears to be extremely verbose (or, at least, it was when I last looked!)

When this topic came up in the past, I had a look at a running OPNG task to see what all the I/O was about, and the file(s) seem to be designed to be human-readable without any need for post-processing; lots and lots of text and white space as well as the actual data...

One possible mitigation might be to borrow a technique I first came across in the days when mainframe computers wrote error logs to magnetic tape :-) To condense the logs as much as possible, each entry consisted of just the error number and a list of parameters. The programs that displayed or printed the errors would look up the error number to find a format string, which was then used to output the full text. (For OPNG it might, however, make more sense to use code words (acronyms? abbreviations?) instead of numbers.)
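
To make the idea concrete, here is a minimal Python sketch of that kind of condensed logging. Everything in it is an assumption made up for illustration: the code words, the format strings and the tab-separated record layout are hypothetical and are not taken from the real OPNG output files.

```python
# Hypothetical message catalogue: code word -> format string.
# These entries are invented for illustration, not real OPNG messages.
CATALOGUE = {
    "RUN": "Run {0} of {1}: best energy so far {2} kcal/mol",
    "EVAL": "Number of energy evaluations performed: {0}",
}


def encode(code, *params):
    """Build one compact record: the code word, then tab-separated parameters.

    Assumes no parameter contains a tab character.
    """
    return "\t".join([code, *map(str, params)])


def decode(record):
    """Expand a compact record to full text using the catalogue's format string."""
    code, *params = record.split("\t")
    return CATALOGUE[code].format(*params)


compact = encode("RUN", 3, 20, -7.42)  # what the task would write to disk
full = decode(compact)                 # expanded only when a human wants to read it
print(repr(compact))
print(full)
```

Only the compact record would be written while the task runs; the expansion step could happen later, wherever the human-readable text is actually needed.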

Unfortunately, at the moment there's no data to look at, so I can't tell how much of a reduction in size might be achieved against the current file format by doing something like that to the OPNG output. In an ideal world, the files would not be expanded out unless/until the human-readable output was required, thus reducing file sizes throughout the process!

And, presumably, the programs that currently analyse the dialogue file (and others?) at the WCG end might not need a major overhaul to process the condensed format!

@knreed - any chance something like this could be done as a side-effect of the OPNG down-time? I appreciate that it would require an application version change (so the validator [et cetera] would know what format the file was in!) and a beta test, so perhaps not...

Just a thought (and probably offered in completely the wrong place...)

Cheers - Al.

[Edit: added question to knreed]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Sep 21, 2021 4:06:27 PM]
[Sep 21, 2021 3:59:53 PM]
ttt67
Cruncher
Joined: Nov 6, 2010
Post Count: 7
Re: Invalid GPU work units

Is there any update on changing the GPU client to write less frequently to solid state drives?

I can confirm this workaround for Ubuntu (20.04):
https://www.worldcommunitygrid.org/forums/wcg...ad,43399_offset,60#657626
[Sep 22, 2021 6:45:27 AM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Re: Invalid GPU work units

The output file doesn't have to be any smaller, it just shouldn't write to the disk faster than the user allows. It can be stored in RAM.
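
A rough Python sketch of that idea, purely as an illustration: output is collected in RAM and appended to disk no more often than a user-chosen interval. The `IntervalWriter` class, its parameters and the injectable `clock` are hypothetical names invented here; this is not how the OPNG application or the BOINC client actually handles its output.

```python
import io
import time


class IntervalWriter:
    """Collect output in RAM and append it to disk at most once per interval."""

    def __init__(self, path, min_interval_s, clock=time.monotonic):
        self.path = path
        self.min_interval_s = min_interval_s  # user-chosen minimum gap between writes
        self.clock = clock                    # injectable so the timing can be tested
        self.buffer = io.StringIO()
        self.last_flush = clock()
        self.flush_count = 0

    def write(self, text):
        # Always buffer in memory; touch the disk only if the interval has elapsed.
        self.buffer.write(text)
        if self.clock() - self.last_flush >= self.min_interval_s:
            self.flush()

    def flush(self):
        data = self.buffer.getvalue()
        if data:
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(data)
            self.buffer = io.StringIO()
            self.flush_count += 1
        self.last_flush = self.clock()

    def close(self):
        # Write whatever is still buffered when the task ends.
        self.flush()
```

The obvious trade-off is that anything still held in RAM is lost if the task is killed or the machine loses power before the next flush, so the interval would have to be balanced against how much rework is acceptable.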
[Sep 23, 2021 6:28:14 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Invalid GPU work units

@knreed - any chance something like this could be done as a side-effect of the OPNG down-time?


This is something that we will pass along to the Krembil staff as we transition everything over to them. The next few months will be an intensive focus on migrating the system over to Krembil's infrastructure and cross-training their staff to become familiar with the different pieces of the system (some of this has been going on for a while now, but some couldn't start until the change became official).

As a result, I don't think there will be much capacity to take this on in the short term, but I think that is something that can be revisited once the migration is complete.
[Sep 23, 2021 8:57:41 PM]