Thread Status: Active
Total posts in this thread: 781
This topic has been viewed 945026 times and has 780 replies
Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Re: OpenPandemics - GPU Stress Test

Yeah, I got 13 GPUs (each with 4 tasks in tandem) on it since roughly when it started and have had to rely heavily on back-up projects due to stalled/slow transfers in both directions (currently hundreds of pending uploads). Far cry from a steady workflow. I hope whatever insights gained from this server pounding are put into making things more efficient down the line. :)
I agree with that observation: with modern NVIDIA GPUs, I was producing upload files far faster than the server could accept them. Downloads were also a problem, but less severe than uploads. I've withdrawn my fast machines from this test, and uploaded/reported all outstanding tasks.

I'll restart my Windows machines to run on iGPU only, so I can monitor how things go later in the day. My observations relate to between about 05:30 UTC and 07:00 UTC, which is normally a relatively quiet time: I hate to think what will happen when the USA starts to wake up again. I may dip in and out again with a fast Linux machine, to keep in touch with the wider picture.

There are other side effects from the stress test: this forum is much slower than normal, and I think we've lost at least one scheduled statistics export.
----------------------------------------
[Edit 1 times, last edit by Richard Haselgrove at Apr 27, 2021 8:55:59 AM]
[Apr 27, 2021 8:54:37 AM]
hnapel
Advanced Cruncher
Netherlands
Joined: Nov 17, 2004
Post Count: 82
Re: OpenPandemics - GPU Stress Test

Lots of uploads go to 100% but somehow do not complete.
----------------------------------------
[Edit 1 times, last edit by hnapel at Apr 27, 2021 10:35:19 AM]
[Apr 27, 2021 8:54:53 AM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786
Re: OpenPandemics - GPU Stress Test

You are adding to the problem with a 120-second loop to retry transfers.
900 seconds would be more reasonable: that is enough to stop transfers going to multi-hour backoffs but won't hammer the servers that are already overloaded.

Paul.
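
A minimal sketch of the gentler retry loop suggested here, in Python. It assumes `boinccmd` is on the PATH and allowed to talk to the local BOINC client, and it uses `boinccmd --network_available` to ask the client to retry deferred transfers. The function names and the `max_rounds` parameter are hypothetical, added only for illustration; this is not the script anyone in this thread is actually running.

```python
import subprocess
import time

# 900 seconds is the interval suggested in this post; 120 seconds is the
# value it argues is too aggressive for an overloaded server.
RETRY_INTERVAL_S = 900


def retry_transfers(run=subprocess.run):
    """Ask the local BOINC client once to retry deferred network activity.

    Assumption: boinccmd is installed, on PATH, and authorised to reach
    the client. Returns True if the command exited with status 0.
    """
    result = run(["boinccmd", "--network_available"], check=False)
    return result.returncode == 0


def retry_loop(interval_s=RETRY_INTERVAL_S, max_rounds=None,
               run=subprocess.run, sleep=time.sleep):
    """Nudge the client every interval_s seconds.

    max_rounds=None loops forever; a number stops after that many rounds.
    run and sleep are injectable so the loop can be tried out without
    touching a real client or actually waiting.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        retry_transfers(run)
        rounds += 1
        sleep(interval_s)
    return rounds
```

Calling `retry_loop()` with no arguments runs indefinitely at one nudge per 15 minutes, which is frequent enough to keep transfers out of multi-hour backoffs without adding much load to the servers.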
----------------------------------------
Paul.
[Apr 27, 2021 9:05:32 AM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Re: OpenPandemics - GPU Stress Test

Not part of the GPU test (I haven't any GPUs that qualify) - but I ended up here to ascertain why ALL of my uploads and downloads were stalling and the forums were so slow. Those interested in the GPU WUs probably knew about the test's potential impact - but what about the rest of us, who are not interested in GPU crunching, were not informed, and are severely impacted by a test that has nothing to do with us?
Anyway - having fitted 10 Linux machines with a retry file-transfer script, I can now transfer files VERY SLOWLY, with multiple retries, until all files for a given WU finally get uploaded/downloaded.
I have a Windows laptop on 2.4 GHz Wi-Fi that has run for 7 years crunching WCG WUs with no problem. Nothing I did would get it un-stalled until I moved it to a 5 GHz AP.
----------------------------------------
[Apr 27, 2021 9:09:55 AM]
squid
Advanced Cruncher
Germany
Joined: May 15, 2020
Post Count: 56
Re: OpenPandemics - GPU Stress Test

Today my GPU got many GPU tasks. It processed the tasks without problems.
The upload of some tasks gave an error like the one below. I think it is a WCG server overload.

27-Apr-2021 10:40:28 [World Community Grid] Temporarily failed upload of OPNG_0004774_00156_0_r1196970475_0: transient HTTP error
27-Apr-2021 10:40:28 [World Community Grid] Backing off 00:18:04 on upload of OPNG_0004774_00156_0_r1196970475_0
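
For anyone who wants to track how long the client is backing off, here is a rough Python sketch that pulls the duration out of a line like the second one above. It assumes the message keeps the `Backing off HH:MM:SS on upload of <file>` shape shown in this log (and that downloads use the same wording); other client versions may phrase it differently. The name `parse_backoff` is hypothetical.

```python
import re

# Matches the "Backing off HH:MM:SS on upload of <file>" part of an event-log line.
# Assumption: the wording matches the log excerpt in this post.
BACKOFF_RE = re.compile(
    r"Backing off (\d+):(\d{2}):(\d{2}) on (upload|download) of (\S+)"
)


def parse_backoff(line):
    """Return direction, file name and backoff in seconds, or None if no match."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    return {
        "direction": match.group(4),
        "file": match.group(5),
        "seconds": hours * 3600 + minutes * 60 + seconds,
    }


line = ("27-Apr-2021 10:40:28 [World Community Grid] Backing off 00:18:04 "
        "on upload of OPNG_0004774_00156_0_r1196970475_0")
info = parse_backoff(line)
print(info["seconds"])  # 18 minutes 4 seconds -> 1084
```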
[Apr 27, 2021 9:12:35 AM]
goben_2003
Advanced Cruncher
Joined: Jun 16, 2006
Post Count: 146
Re: OpenPandemics - GPU Stress Test

You are adding to the problem with a 120-second loop to retry transfers.
900 seconds would be more reasonable: that is enough to stop transfers going to multi-hour backoffs but won't hammer the servers that are already overloaded.

Paul.

Sorry Paul, but IIRC I have 2 more undersea cables to jump through to get to the servers than you do. So even without the stress test I semi-regularly go into project backoff.
----------------------------------------

[Apr 27, 2021 9:19:12 AM]
_heinz
Cruncher
Joined: Apr 5, 2020
Post Count: 10
Re: OpenPandemics - GPU Stress Test

I opened the doors of my V8-Xeon with 3 GTX Titans
will see how the units run :-)
[Apr 27, 2021 9:21:46 AM]
tux93
Cruncher
Germany
Joined: Jan 5, 2012
Post Count: 9
Re: OpenPandemics - GPU Stress Test

Another option would be to put your BOINC directory on a cheap spinny drive or iSCSI NAS.

That's what I ended up doing for the time being: I copied the BOINC dir to a spinning-rust partition and bind-mounted it to the original location.
----------------------------------------


Primary: Intel i7-4790 + nVidia GTX 1060
Secondary: Intel i7-2600 + nVidia GTX 750 Ti
OS: openSUSE Tumbleweed
[Apr 27, 2021 9:30:20 AM]
aegidius
Cruncher
Joined: Aug 29, 2006
Post Count: 25
Re: OpenPandemics - GPU Stress Test

So are the OPNG WUs going to keep coming after the 3-day stress test?
If they are, I'll go buy a better GPU :-)
[Apr 27, 2021 9:34:36 AM]
Chooka
Cruncher
Australia
Joined: Jan 25, 2017
Post Count: 49
Re: OpenPandemics - GPU Stress Test

FWIW, for those commenting on stats: the stats haven't exported for Einstein@Home either. It might not be limited to WCG... or it's just coincidence.
----------------------------------------


[Apr 27, 2021 9:38:51 AM]