World Community Grid Forums
Thread Status: Active Total posts in this thread: 171
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2165 Status: Offline |
Well, they clearly dropped the ball on this issue. I have tons and tons of invalids. Does not bode well for the upcoming move to Krembil.
Edit: Where is Uplinger in all this? Haven't heard a word from him since the big OPNG stress test ended, back in April/May.
Edit2: OK, someone woke up and shut down GPU crunching, it seems. Now my computer only asks for CPU work.
----------------------------------------
[Edit 6 times, last edit by Grumpy Swede at Sep 19, 2021 8:12:47 PM] |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
I apologize for the spike in invalids over the past two days. The GPU project is currently working through a set of jobs that have a "tight" fit, and with the way validation is currently implemented, these are being marked as invalid. We will be discussing and implementing a different approach to validation that will address this case and prevent these jobs from marking so many results as invalid.
Until we can get this implemented, I've suspended distribution of GPU work. Once we get the change in place, we will run the GPU work at a faster pace for a period of time to make up for the time that it was stopped.
Note: The Krembil staff are not yet working on this aspect of the system (that will occur in the coming weeks), so please direct your frustration at us (the IBM staff), not them.
----------------------------------------
[Edit 4 times, last edit by knreed at Sep 20, 2021 1:23:25 AM] |
||
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2165 Status: Offline |
Thanks for the explanation knreed. Looking forward to GPU work coming back, at a faster pace. Even if it's just for a short time.
|
||
|
Maxxina
Advanced Cruncher Joined: Jan 5, 2008 Post Count: 124 Status: Offline |
[Quoting knreed:] Until we can get this implemented, I've suspended distribution of GPU work. [...] Note: The Krembil staff are not yet working on this aspect of the system (that will occur in the coming weeks), so please direct your frustration at us (the IBM staff), not them.
So GPU work for COVID will be down for a couple of weeks? |
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 452 Status: Offline |
Is there any update on changing the GPU client to write less frequently to solid state drives?
|
||
|
erich56
Senior Cruncher Austria Joined: Feb 24, 2007 Post Count: 295 Status: Offline |
[Quoting Dayle Diamond:] Is there any update on changing the GPU client to write less frequently to solid state drives?
+1 |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 953 Status: Offline |
[Quoting Dayle Diamond:] Is there any update on changing the GPU client to write less frequently to solid state drives?
As I understand it, the [full] output produced is necessary both for validation and credit calculation, so it isn't going away. The trouble is that it appears to be extremely verbose (or, at least, it was when I last looked!)
When this topic came up in the past, I had a look at a running OPNG task to see what all the I/O was about, and the file(s) seem to be designed to be human-readable without any need for post-processing; lots and lots of text and white space as well as the actual data...
One possible mitigation might be to borrow a technique I first came across in the days when mainframe computers wrote error logs to magnetic tape :-) In order to condense them as much as possible, the output consisted of the error number and a list of parameters, and the programs to display or print the errors would look up the error number to find a format string which would then be used to output the full text. (It might, however, make more sense here to use code words (acronyms? abbreviations?) instead of numbers.)
Unfortunately, at the moment there's no data to look at, so I can't tell how much of a reduction in size might be achieved against the current file format by doing something like that to the OPNG output.
In an ideal world, the files would not be expanded out unless/until the human-readable output was required, thus reducing file sizes throughout the process! And, presumably, the programs that currently analyse the dialogue file (and others?) at the WCG end might not need a major overhaul to process the condensed format!
@knreed - any chance something like this could be done as a side-effect of the OPNG down-time? I appreciate that it would require an application version change (so the validator [et cetera] would know what format the file was in!) and a beta test, so perhaps not...
Just a thought (and probably offered in completely the wrong place...)
Cheers - Al.
[Edit: added question to knreed]
[Edit 1 times, last edit by alanb1951 at Sep 21, 2021 4:06:27 PM] |
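[Editor's illustration] The condensed-log idea Al describes above could be sketched roughly as follows. This is only a minimal sketch: the code words (`GEN`, `RUN`), the templates, and the tab-separated record layout are all hypothetical and are not taken from the real OPNG output files.

```python
# Hypothetical code words and templates; NOT the actual OPNG file format.
FORMATS = {
    "RUN": "Run {0} of {1} started",
    "GEN": "Generation {0}: best energy {1} kcal/mol",
}

def encode(code, *params):
    """Build a condensed record: a code word plus tab-separated parameters."""
    return "\t".join([code, *map(str, params)])

def expand(record):
    """Look up the code word's format string and rebuild the readable text."""
    code, *params = record.split("\t")
    return FORMATS[code].format(*params)

compact = encode("GEN", 12, -7.35)   # what the client would write to disk
readable = expand(compact)           # done only when a human needs the text
print(repr(compact))
print(readable)
```

Only the short record would be written during crunching; the expansion step could run later, and only when someone actually wants the human-readable form.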
||
|
ttt67
Cruncher Joined: Nov 6, 2010 Post Count: 7 Status: Offline |
[Quoting Dayle Diamond:] Is there any update on changing the GPU client to write less frequently to solid state drives?
I can confirm this workaround for Ubuntu (20.04): https://www.worldcommunitygrid.org/forums/wcg...ad,43399_offset,60#657626 |
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 452 Status: Offline |
The output file doesn't have to be any smaller; it just shouldn't be written to disk more often than the user allows. Until then, it can be held in RAM.
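[Editor's illustration] A rough sketch of the buffering approach suggested here: output accumulates in memory and is appended to disk no more often than a user-chosen interval. The class name `IntervalWriter` and its parameters are hypothetical; this is not how the OPNG client is implemented.

```python
import io
import time

class IntervalWriter:
    """Hold output in RAM; append to disk at most once per min_interval_s."""

    def __init__(self, path, min_interval_s, clock=time.monotonic):
        self.path = path
        self.min_interval_s = min_interval_s
        self.clock = clock              # injectable so the timing can be tested
        self.buffer = io.StringIO()
        self.last_flush = clock()
        self.flush_count = 0            # number of actual disk writes

    def write(self, text):
        self.buffer.write(text)
        # Touch the disk only once the user-allowed interval has elapsed.
        if self.clock() - self.last_flush >= self.min_interval_s:
            self.flush()

    def flush(self):
        data = self.buffer.getvalue()
        if data:
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(data)
            self.buffer = io.StringIO()
            self.flush_count += 1
        self.last_flush = self.clock()

    def close(self):
        # Write whatever is still buffered when the task ends.
        self.flush()
```

The trade-off is that anything still in the buffer is lost if the task is killed before the next flush, so the interval would have to be balanced against checkpointing needs.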
|
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline |
[Quoting alanb1951:] @knreed - any chance something like this could be done as a side-effect of the OPNG down-time?
This is something that we will pass along to the Krembil staff as we transition everything over to them. The next few months will be an intensive focus on migrating the system over to Krembil's infrastructure and cross-training their staff to become familiar with the different pieces of the system (some of this has been going on for a while now, but some couldn't start until the change became official). As a result, I don't think there will be much capacity to take this on in the short term, but I think it is something that can be revisited once the migration is complete. |
||