World Community Grid Forums

Thread Status: Active. Total posts in this thread: 781.

Richard Haselgrove
Senior Cruncher United Kingdom Joined: Feb 19, 2021 Post Count: 360 Status: Offline
> Yeah, I got 13 GPUs (each with 4 tasks in tandem) on it since roughly when it started and have had to rely heavily on back-up projects due to stalled/slow transfers in both directions (currently hundreds of pending uploads). Far cry from a steady workflow. I hope whatever insights gained from this server pounding are put into making things more efficient down the line. :)

I agree with that observation: with modern NVidia GPUs, I was producing upload files far faster than the server could accept them. Downloads were also a problem, but less severe than uploads. I've withdrawn my fast machines from this test, and uploaded/reported all outstanding tasks. I'll restart my Windows machines to run on iGPU only, so I can monitor how things go later in the day.

My observations relate to between about 05:30 UTC and 07:00 UTC, which is normally a relatively quiet time: I hate to think what will happen when the USA starts to wake up again. I may dip in and out again with a fast Linux machine, to keep in touch with the wider picture.

There are other side effects from the stress test: this forum is much slower than normal, and I think we've lost at least one scheduled statistics export.
[Edit 1 times, last edit by Richard Haselgrove at Apr 27, 2021 8:55:59 AM]
hnapel
Advanced Cruncher Netherlands Joined: Nov 17, 2004 Post Count: 82 Status: Offline
Lots of uploads go to 100% but somehow do not complete.
----------------------------------------
[Edit 1 times, last edit by hnapel at Apr 27, 2021 10:35:19 AM]
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 786 Status: Offline
You are adding to the problem with a 120-second loop to retry transfers. 900 seconds would be more reasonable: that is enough to stop transfers going into multi-hour backoffs, but won't hammer servers that are already overloaded.
----------------------------------------
Paul.
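For anyone rolling their own retry loop, here is a minimal sketch of the kind of timer being discussed, assuming a local client with boinccmd on the PATH (--network_available is the stock boinccmd call that retries deferred network communication):

```python
#!/usr/bin/env python3
# Minimal retry-loop sketch; assumes boinccmd is on the PATH and the
# local BOINC client is running. Not anyone's actual script.
import subprocess
import time

RETRY_INTERVAL = 900  # seconds; a 120 s loop hammers overloaded servers

while True:
    # Ask the client to retry any deferred (backed-off) transfers.
    subprocess.run(["boinccmd", "--network_available"], check=False)
    time.sleep(RETRY_INTERVAL)
```

At 900 seconds this sends at most four retry nudges an hour per host, versus thirty an hour with a 120-second loop.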
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 286 Status: Offline
Not part of the GPU test (I haven't any GPUs that qualify), but I ended up here to ascertain why ALL of my uploads and downloads were stalling and the forums were so slow. Those interested in the GPU WUs probably knew about the test's potential impact, but what about the rest of us who are severely impacted by a test that has nothing to do with us and weren't informed, i.e. those with no interest in GPU crunching?
----------------------------------------
Anyway, having fitted 10 Linux machines with a retry file-transfer script, they can now transfer files VERY SLOWLY, with multiple retries, until all files for a given WU finally get uploaded/downloaded. I have a Windows laptop on 2.4 GHz wifi that has crunched WCG WUs for 7 years with no problem. Nothing I could do would stop it stalling until I moved it to a 5 GHz AP.
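A rough sketch of what a multi-machine retry script of this sort might look like (not the actual script described above): it assumes each client permits GUI RPC from the controlling box via remote_hosts.cfg, and the hostnames and RPC password below are placeholders:

```python
#!/usr/bin/env python3
# Hypothetical multi-host retry loop; NOT the script described above.
# Assumes each client allows GUI RPC from this machine
# (remote_hosts.cfg) and that the placeholders below are replaced.
import subprocess
import time

HOSTS = ["linuxbox01", "linuxbox02"]  # placeholder hostnames
RPC_PASSWORD = "changeme"             # placeholder GUI RPC password
INTERVAL = 900                        # seconds between retry rounds

while True:
    for host in HOSTS:
        # Tell each client to retry its deferred uploads/downloads.
        subprocess.run(
            ["boinccmd", "--host", host, "--passwd", RPC_PASSWORD,
             "--network_available"],
            check=False,
        )
    time.sleep(INTERVAL)
```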
Run Time Stats https://grassmere-productions.no-ip.biz/
squid
Advanced Cruncher Germany Joined: May 15, 2020 Post Count: 56 Status: Offline
Today my GPU got many GPU tasks and processed them without problems.
The upload of some tasks gave an error like the one below. I think it is WCG server overload.

27-Apr-2021 10:40:28 [World Community Grid] Temporarily failed upload of OPNG_0004774_00156_0_r1196970475_0: transient HTTP error
27-Apr-2021 10:40:28 [World Community Grid] Backing off 00:18:04 on upload of OPNG_0004774_00156_0_r1196970475_0
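To gauge how often the server is refusing uploads, a quick sketch that tallies those messages per hour from the client's event log; the log path is an assumption (stdoutdae.txt is the usual name on Linux installs):

```python
#!/usr/bin/env python3
# Count transient upload failures per hour in the BOINC event log.
# LOG_PATH is an assumption; adjust for your install.
from collections import Counter

LOG_PATH = "/var/lib/boinc-client/stdoutdae.txt"

failures = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Temporarily failed upload of" in line:
            # Lines look like: 27-Apr-2021 10:40:28 [World Community Grid] ...
            hour = line.split()[1].split(":")[0]
            failures[hour] += 1

for hour, count in sorted(failures.items()):
    print(f"{hour}:00  {count} transient upload failures")
```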
goben_2003
Advanced Cruncher Joined: Jun 16, 2006 Post Count: 146 Status: Offline
> You are adding to the problem with a 120-second loop to retry transfers. 900 seconds would be more reasonable: that is enough to stop transfers going into multi-hour backoffs, but won't hammer servers that are already overloaded. Paul.

Sorry Paul, but iirc I have 2 more undersea cables to jump through to get to the servers than you do. So even without the stress test I semi-regularly go into project backoff.
_heinz
Cruncher Joined: Apr 5, 2020 Post Count: 10 Status: Offline
I opened the doors of my V8-Xeon with 3 GTX Titans
will see how the units run :-)
tux93
Cruncher Germany Joined: Jan 5, 2012 Post Count: 9 Status: Offline
> Another option would be to put your boinc directory on a cheap spinny drive or iSCSI NAS.

That's what I ended up doing for the time being: copied the boinc dir to a spinning-rust partition and bind-mounted it to the original location.
----------------------------------------
Primary: Intel i7-4790 + nVidia GTX 1060
Secondary: Intel i7-2600 + nVidia GTX 750 Ti
OS: openSUSE Tumbleweed
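For anyone wanting to try the same move, a hedged sketch of the steps, assuming the usual Linux data dir and a placeholder HDD path; run it as root with the client stopped (the service name varies by distro):

```python
#!/usr/bin/env python3
# Sketch of relocating the BOINC data dir onto a spinning disk and
# bind-mounting it back over the original path. Paths are assumptions;
# run as root. The systemd service name may differ by distro.
import subprocess

SRC = "/var/lib/boinc-client"   # usual Linux data dir (assumption)
DST = "/mnt/hdd/boinc-client"   # placeholder spinning-disk location

def run(*cmd):
    subprocess.run(cmd, check=True)

run("systemctl", "stop", "boinc-client")   # stop the client first
run("rsync", "-a", SRC + "/", DST + "/")   # copy data to the HDD
run("mount", "--bind", DST, SRC)           # overlay the old path
run("systemctl", "start", "boinc-client")
```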
aegidius
Cruncher Joined: Aug 29, 2006 Post Count: 25 Status: Offline
So are the OPNG WUs going to keep coming after the 3-day stress test?
If they are, I'll go buy a better GPU :-)
Chooka
Cruncher Australia Joined: Jan 25, 2017 Post Count: 49 Status: Offline
FWIW, for those commenting on stats: the stats haven't exported for Einstein@Home either, so it might not be limited to WCG... or it's just coincidence.