William Albert
Cruncher
Joined: Apr 5, 2020
Post Count: 39
Re: OpenPandemics - GPU Stress Test

I've shut down two machines equipped with an Nvidia GeForce GT 720 and an Intel HD Graphics 530.

There wasn't a large enough supply of Intel WUs to keep the Intel GPUs running, and the GT 720 is slow enough that about half of the Nvidia WUs were timing out at the 98% mark or so.

These are bog-standard Lenovo desktop PCs (with a basic video card for additional displays) that you'd find in any number of schools and businesses throughout the world. While these GPUs are no match for the powerful Nvidia and AMD GPUs used in workstations and enthusiast rigs, they're still more powerful than common desktop CPUs, and there are enough of them that writing them off as too slow might end up excluding a lot of aggregate power from OPNG (or any future GPU-powered applications).

Anyway, I've documented my issues with the GeForce GT 720 if an admin wants to follow up.
----------------------------------------
[Edited once; last edit by William Albert at Apr 27, 2021 10:42:11 PM]
[Apr 27, 2021 10:41:57 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: OpenPandemics - GPU Stress Test

> Right now I have 0 GPU work units and have not received any in the last 20 hours, so are the GPU work units still going out to people?

Can you post your message log? There should be a good supply for any GPUs able to process the tasks.
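
(For reference, one quick way to grab those messages on Linux, assuming the stock boinccmd tool that ships with the client is installed:

# Print the client's recent event-log messages (same text as the Manager's Event Log)
boinccmd --get_messages

This talks to the local client over GUI RPC, so it has to run on the machine itself, with access to the client's RPC password if one is set.)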
[Apr 27, 2021 10:47:07 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: OpenPandemics - GPU Stress Test

> There wasn't a large enough supply of Intel WUs to keep the Intel GPUs running, and the GT 720 is slow enough that about half of the Nvidia WUs were timing out at the 98% mark or so.


Can you post your message log from when your computer attempted to request work for the Intel GPUs? There should be plenty of supply available at this point for everyone who asks for work.
[Apr 27, 2021 10:48:12 PM]
William Albert
Cruncher
Joined: Apr 5, 2020
Post Count: 39
Re: OpenPandemics - GPU Stress Test

I don't have those particular PCs online anymore, but an example log from another PC that is also unable to get any Intel WUs is below.

This computer has identical hardware specs to another one that is happily crunching Intel WUs.

Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Starting BOINC client version 7.16.6 for x86_64-pc-linux-gnu
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] log flags: file_xfer, sched_ops, task
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Libraries: libcurl/7.68.0 OpenSSL/1.1.1f zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Data directory: /var/lib/boinc-client
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] OpenCL: Intel GPU 0: Intel(R) Gen9 HD Graphics NEO (driver version 1.0.0, device version OpenCL 2.1 NEO, 6280MB, 6280MB available, 211 GFLOPS peak)
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] libc: Ubuntu GLIBC 2.31-0ubuntu9.3 version 2.31
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Host name: redacted-hostname
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz [Family 6 Model 94 Stepping 3]
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] OS: Linux Ubuntu: Ubuntu 20.04.2 LTS [5.8.0-50-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.3)]
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Memory: 7.67 GB physical, 4.00 GB virtual
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Disk: 233.24 GB total, 212.81 GB free
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Local time is UTC -7 hours
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Config: GUI RPCs allowed from:
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] General prefs: from World Community Grid (last modified 17-Apr-2021 10:05:23)
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] Host location: none
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] General prefs: using your defaults
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Reading preferences override file
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Preferences:
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] max memory usage when active: 3925.31 MB
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] max memory usage when idle: 7693.60 MB
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] max disk usage: 209.91 GB
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Setting up project and slot directories
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Checking active tasks
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 6926287; resource share 100
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Setting up GUI RPC socket
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [---] Checking presence of 87 project files
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 Initialization completed
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] Sending scheduler request: To fetch work.
Apr 27 22:55:45 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:45 [World Community Grid] Requesting new tasks for Intel GPU
Apr 27 22:55:47 redacted-hostname boinc[7688]: dir_open: Could not open directory 'locale' from '/var/lib/boinc-client'.
Apr 27 22:55:47 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:47 [World Community Grid] Scheduler request completed: got 0 new tasks
Apr 27 22:55:47 redacted-hostname boinc[7688]: 27-Apr-2021 15:55:47 [World Community Grid] Project requested delay of 121 seconds
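
(Aside: the syslog prefixes above suggest this client logs to the system journal; a minimal sketch for capturing a similar excerpt on a systemd-based install, assuming the boinc-client service name used by the Ubuntu package:

# Pull the client's recent log lines from the journal
journalctl -u boinc-client --since "15 minutes ago"

# Or keep only the scheduler/work-request lines
journalctl -u boinc-client | grep -E "Requesting new tasks|Scheduler request")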

[Apr 27, 2021 11:00:21 PM]
Azmodes
Cruncher
Joined: Apr 4, 2017
Post Count: 3
Re: OpenPandemics - GPU Stress Test

> Thanks uplinger. Any comment on what's going on with the low GPU utilization (lots of GPU idle time) of the 5-digit batches? You had mentioned that you thought they should run fast.
>
> I even confirmed that the process is constantly coming on and off the GPU. You can catch times running nvidia-smi where it shows the WCG application isn't even running on the GPU, while BOINC shows it running. And it'll constantly pop in and out. This is much different from all the tasks before, where even if the sub-jobs were starting and stopping, nvidia-smi still recognized that the application was running on the GPU.

I'm seeing the exact same thing. GPU utilization is way down, CPU time ends up being only about a quarter of the task's run time (whereas it was about 100% before), and the processes keep showing up and vanishing again in nvidia-smi. Unsurprisingly, runtimes appear to be longer.
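
(For anyone who wants to reproduce the observation, a sketch for polling the GPU's compute-process list, assuming a reasonably recent nvidia-smi:

# Print the compute apps on the GPU once per second; the WCG process
# disappearing from and reappearing in this list is the "popping in and out"
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -l 1

# Or just refresh the full summary view
watch -n 1 nvidia-smi)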
----------------------------------------
[Edited once; last edit by Azmodes at Apr 27, 2021 11:03:57 PM]
[Apr 27, 2021 11:02:11 PM]
m0320174
Cruncher
Joined: Feb 13, 2021
Post Count: 11
Re: OpenPandemics - GPU Stress Test

> > Right now I have 0 GPU work units and have not received any in the last 20 hours, so are the GPU work units still going out to people?
>
> Can you post your message log? There should be a good supply for any GPUs able to process the tasks.

That's a bit optimistic. I'm trying to build up a cache of (Nvidia) GPU work units, but it's impossible because the majority of my requests are unsuccessful:


04/28/21 01:07:50 | World Community Grid | Requesting new tasks for NVIDIA GPU
04/28/21 01:07:51 | World Community Grid | Scheduler request completed: got 0 new tasks
04/28/21 01:07:51 | World Community Grid | No tasks sent
04/28/21 01:07:51 | World Community Grid | No tasks are available for OpenPandemics - COVID 19
04/28/21 01:07:51 | World Community Grid | No tasks are available for OpenPandemics - COVID-19 - GPU
04/28/21 01:07:51 | World Community Grid | No tasks are available for Africa Rainfall Project
04/28/21 01:07:51 | World Community Grid | No tasks are available for Microbiome Immunity Project
04/28/21 01:07:51 | World Community Grid | No tasks are available for Help Stop TB
04/28/21 01:07:51 | World Community Grid | No tasks are available for Smash Childhood Cancer
04/28/21 01:07:51 | World Community Grid | Tasks for Intel GPU are available, but your preferences are set to not accept them


This is not that much of an issue because I currently have a buffer of 8 GPU work units, sufficient for roughly 1 hour of processing. But this is not what I'd call a good supply.

I also ran out of work twice in the last couple of hours. So I think there are still some optimizations to be done on the server side.
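
(For anyone tuning their cache: the buffer size comes from the client's work-buffer preferences. A minimal sketch, assuming you set them locally via a global_prefs_override.xml in the BOINC data directory (here /var/lib/boinc-client, per the log upthread) instead of on the website:

<global_preferences>
    <work_buf_min_days>0.5</work_buf_min_days>                 <!-- "store at least" ~12 hours of work -->
    <work_buf_additional_days>0.25</work_buf_additional_days>  <!-- "store up to an additional" ~6 hours -->
</global_preferences>

Then apply it without restarting via boinccmd --read_global_prefs_override. Of course, a bigger buffer only helps if the scheduler actually has tasks to send.)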
[Apr 27, 2021 11:13:13 PM]
gordonbb
Cruncher
Canada
Joined: May 14, 2019
Post Count: 19
Re: OpenPandemics - GPU Stress Test

I just spun up 10 more Nvidia GPUs across 5 systems that had been on F@H during the expensive time-of-use electricity rates here (nice that these tasks are lighter in terms of power load), and I had no issues getting their job queues loaded with GPU tasks. The TOU period is ending, so I've set the systems to "No New Tasks" and am emptying the queues before putting them back on F@H. At least these GPU jobs are short compared to the CPU jobs, so I won't run out of CPU tasks even on my 3950x :-)
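
(Side note: the "No New Tasks" toggle can also be flipped from the command line; a sketch assuming boinccmd and the WCG project URL from the logs upthread:

# Stop fetching new WCG work while finishing what's already queued
boinccmd --project http://www.worldcommunitygrid.org/ nomorework

# Re-enable fetching later
boinccmd --project http://www.worldcommunitygrid.org/ allowmorework)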

Things are much better with the big jobs compared to this time yesterday. I've stopped the scripts that were forcing transfers as they are no longer needed. I'm seeing only the very occasional transfer back-off.

Looking forward to these tasks coming out of "beta-beta" and into production, just in time for the air-conditioning season here in the Northern Hemisphere.

Too bad they're "Not Quite Ready for Prime Time" as this would have been an excellent candidate for the BOINC Pentathlon GPU event.
----------------------------------------

AMD - 2600x, 2 x 2700, 2700x, 3900x, 3950x, 2 x 5900x, 5950x
Intel - E3-1231v3, 9900K
NVidia - GTX 1060 6GB, 1660ti, 1070ti; RTX 2060, 2060s, 2070a, 5 x 2070s
[Apr 27, 2021 11:22:21 PM]
maeax
Advanced Cruncher
Joined: May 2, 2007
Post Count: 142
Re: OpenPandemics - GPU Stress Test

> > No problem with Einstein@Home!
>
> And Einstein has longer-running tasks, which might not expose the issues with your SSD. You can't really compare apples and oranges.
>
> Like I said, I'm processing at a MUCH higher volume on OPNG, with no SSD issues. If it were a generic SSD issue, someone like me with many more writes would see it too, but we don't. That points to your issue being related to something specific to your system.


Nothing changed on the hardware side, but the OPNG tasks have been running well for a few hours now. I don't know why, but that's fine :-)
----------------------------------------
AMD Ryzen Threadripper PRO 3995WX 64-Cores/ AMD Radeon (TM) Pro W6600. OS Win11pro
[Apr 27, 2021 11:25:28 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Re: OpenPandemics - GPU Stress Test

Uplinger, I've reported this before: I've been periodically getting "Scheduler request failed: Couldn't connect to server" since the GPU project entered beta.
[Apr 27, 2021 11:38:35 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1000
Re: OpenPandemics - GPU Stress Test

Got an error with this WU (OPNG_0015013_00013_4). I wasn't the only one to error on it, so I'm guessing it's the WU and not me.
[Apr 28, 2021 12:09:31 AM]