World Community Grid - View Thread - OpenPandemics GPU Beta Test

World Community Grid Forums

Category: Beta Testing

Forum: Beta Test Support Forum

Thread: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Thread Status: Locked
Total posts in this thread: 511

This topic has been viewed 1251721 times and has 510 replies

Mamajuanauk
Master Cruncher
United Kingdom
Joined: Dec 15, 2012
Post Count: 1900
Status: Offline
Project Badges:

2 year badge for Human Proteome Folding - Phase 2

45 day badge for Help Fight Childhood Cancer

50 year badge for The Clean Energy Project - Phase 2

45 day badge for Computing for Clean Water

2 year badge for Drug Search for Leishmaniasis

2 year badge for GO Fight Against Malaria

100 year badge for Mapping Cancer Markers

100 year badge for Uncovering Genome Mysteries

100 year badge for Outsmart Ebola Together

50 year badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

50 year badge for Microbiome Immunity Project

20 year badge for Africa Rainfall Project

50 year badge for OpenPandemics - COVID-19


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Uplinger wrote (6th post from the bottom of page 8 of the March 12, 2021 thread):

I have toyed with the idea of granting points based on how many ligands are processed per work unit. However since it's not apples to apples with the CPU version, that seems incorrect. But not ruled out entirely.

I think this is a fair way to allocate points since work is processed so much faster.
Have the research team given any indication of how much time will be saved when the GPU application launches?

All of my work has completed successfully. Keep up the great work

So the points are still an issue; I am working towards fixing that. I am planning to credit time based on elapsed time rather than CPU time, as that makes more sense for the GPU project. That would be the 0.37 hours you see on your report.

Note: I am not going to hold up the release of the GPU app over points... it is on my priority list near the top, and I would like to have it fixed before launching.

Thanks,
-Uplinger

I think it would be fair for the points to be the same per GFLOP, since more powerful processors (CPU or GPU) should get more points in the same amount of time. I do not really care so much about the points, though. Definitely do not hold up the release to figure out the points! It can be worked on after we start chewing through the rest of OPN(G).

As for time spent, elapsed time is definitely better than CPU time. Really, though, it would be fairer to treat the GPU like a CPU. The way it is now does not treat them equally. Using one of my computers as an example (just CPU or just GPU):

4c/8t i7 x 1hr = 8 hrs credit. This averages ~5.8 jobs total in that hour.
24c igpu x 1 hr = 1hr credit. This averages ~66.3 jobs total in that hour

This illustrates that the GPU is not just more powerful in total, but actually more powerful per core too, since it does a lot more jobs per core. Yet it gets treated as if it were a 1-core CPU. It really should be given time credit per core. I do understand that it is probably nontrivial to actually make that happen, though. A compromise could be a multiplier, meaning elapsed time x multiplier = time credit given.
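A minimal Python sketch of the arithmetic above, using only the figures quoted for this one machine. The `time_credit` rule (elapsed time x multiplier) is the compromise proposed in this post, not anything WCG has announced, and the function and constant names are illustrative.

```python
"""Compare the two devices from the post under different time-credit rules."""

# Figures quoted in the post: jobs completed in one hour of wall-clock time.
CPU_THREADS = 8            # 4c/8t i7: credited as 8 hours per elapsed hour
CPU_JOBS_PER_HOUR = 5.8
GPU_CORES = 24             # "24c" iGPU: currently credited as 1 hour per elapsed hour
GPU_JOBS_PER_HOUR = 66.3


def time_credit(elapsed_hours: float, multiplier: float = 1.0) -> float:
    """Hypothetical compromise rule: time credit = elapsed time x multiplier."""
    return elapsed_hours * multiplier


def jobs_per_credited_hour(jobs: float, credited_hours: float) -> float:
    """How many jobs it takes to earn one hour of time credit."""
    return jobs / credited_hours


if __name__ == "__main__":
    cpu_rate = jobs_per_credited_hour(CPU_JOBS_PER_HOUR, time_credit(1, CPU_THREADS))
    gpu_now = jobs_per_credited_hour(GPU_JOBS_PER_HOUR, time_credit(1))
    gpu_per_core = jobs_per_credited_hour(GPU_JOBS_PER_HOUR, time_credit(1, GPU_CORES))
    # Multiplier that would give the GPU the same credit per job as the CPU.
    parity = GPU_JOBS_PER_HOUR / cpu_rate
    print(f"CPU: {cpu_rate:.3f} jobs per credited hour")
    print(f"GPU, elapsed time only: {gpu_now:.1f} jobs per credited hour")
    print(f"GPU, per-core multiplier of {GPU_CORES}: {gpu_per_core:.2f} jobs per credited hour")
    print(f"Work-parity multiplier: {parity:.1f}")
```

On these figures the CPU earns one credited hour per 0.725 jobs, while the GPU currently needs 66.3 jobs for the same hour; a per-core multiplier of 24 would bring that down to roughly 2.76 jobs per credited hour, still well short of full per-job parity.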

Just want to point out that running just the GPU results in way more science happening than running just the CPU or running both the CPU and GPU. But if I just run the GPU, I will get way less time credit. I do plan on running just the GPU, since it is better for science, of course. But I can't help regretting the loss of time credit: I was going to get my first 20Y diamond badge, but now I will not, running just the GPU with the current loss of time credit.

Some projects that have an option to do GPU crunching allocate the same points for each task, irrespective of the time taken on CPU/GPU. That works for me, as it all comes out in the wash, so to speak. A long task and a short task get the same points value, but on average it works out...

----------------------------------------

Mamajuanauk is the Name! Crunching is the Game!

[Mar 28, 2021 11:50:29 AM]

motech
Cruncher
Joined: Mar 30, 2007
Post Count: 23
Status: Offline
Project Badges:

180 day badge for Human Proteome Folding - Phase 2

14 day badge for Help Cure Muscular Dystrophy

45 day badge for Discovering Dengue Drugs - Together

90 day badge for Nutritious Rice for the World

14 day badge for The Clean Energy Project

180 day badge for Help Fight Childhood Cancer

14 day badge for Influenza Antiviral Drug Search

180 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Discovering Dengue Drugs - Together - Phase 2

1 year badge for The Clean Energy Project - Phase 2

180 day badge for Computing for Clean Water

180 day badge for Drug Search for Leishmaniasis

180 day badge for GO Fight Against Malaria

10 year badge for Mapping Cancer Markers

180 day badge for Uncovering Genome Mysteries

2 year badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

2 year badge for Microbiome Immunity Project

14 day badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Long-time cruncher, first-time beta tester. Today I received four tasks for my NVIDIA GT 730, but all ended quickly with computation errors. I looked around, trying to find out what data I should send. If there's a relevant post, or if someone can suggest something additional, please let me know. I'll be happy to help if I can. From the event log:

3/27/2021 3:51:48 PM | World Community Grid | Computation for task BETA_OPNG_0000121_00360_0 finished
3/27/2021 3:51:48 PM | World Community Grid | Output file BETA_OPNG_0000121_00360_0_r2144017857_0 for task BETA_OPNG_0000121_00360_0 absent

[snip]
I've found that when an output file is absent, the cause is usually not reported in the event log, but is reported in a different output file.

You may need to copy all files in the slot into a different directory so you can check the other files for error messages.
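A small Python sketch of the copy-the-slot suggestion above. It assumes you already know which slot directory the failing task runs in (on a default Windows install these usually sit under `C:\ProgramData\BOINC\slots\`, but check your own BOINC data directory), and `snapshot_slot` is a hypothetical helper, not part of BOINC. The app's `stderr.txt` in the slot is usually the first file worth reading.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_slot(slot_dir: Path, dest_root: Path) -> Path:
    """Copy every file in a BOINC slot directory to a new timestamped folder.

    The copy survives after the client cleans the slot out when the task ends.
    """
    if not slot_dir.is_dir():
        raise FileNotFoundError(f"slot directory not found: {slot_dir}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_root / f"{slot_dir.name}-{stamp}"
    # copytree creates dest (and any missing parents) and copies recursively.
    shutil.copytree(slot_dir, dest)
    return dest
```

For example, `snapshot_slot(Path(r"C:\ProgramData\BOINC\slots\3"), Path(r"C:\wcg-debug"))` would copy slot 3 into a folder such as `C:\wcg-debug\3-<timestamp>`. On Windows, files the running app still holds open may fail to copy, so it can take a retry.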

I figured that the event log is just an indication of a problem but not much help in finding the reason. However, I still don't know exactly what data I can provide that might shed some light on this. For all I know perhaps Grid staff already have the diagnostic data they need/want from my BOINC client.

If uplinger or other staff reach out and ask for specific data, I'd be happy to provide it if I can. Alternatively, if my GPU turns out to be too old and is to be unsupported, it would be best if these units weren't offered for such cards. Although I didn't mention it earlier, this is a Windows client.

[Mar 28, 2021 12:31:41 PM]

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

10 year badge for Help Fight Childhood Cancer

90 day badge for Influenza Antiviral Drug Search

5 year badge for Help Cure Muscular Dystrophy - Phase 2

2 year badge for Discovering Dengue Drugs - Together - Phase 2

5 year badge for The Clean Energy Project - Phase 2

20 year badge for Computing for Clean Water

5 year badge for Drug Search for Leishmaniasis

5 year badge for GO Fight Against Malaria

5 year badge for Computing for Sustainable Water

20 year badge for Mapping Cancer Markers

20 year badge for Uncovering Genome Mysteries

50 year badge for Outsmart Ebola Together

10 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

20 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

I didn't see how it could either, but what do I know. gpu_ cpu_usage in the same line of code?

Well, I get this in the Event Log:

28/03/2021 09:35:16 | GPUGRID | Unparsed line in app_config.xml: gpu_gpu_usage
28/03/2021 09:35:16 | GPUGRID | Unparsed line in app_config.xml: gpu_cpu_usage

That event log message is telling you that you have a problem with the syntax in your app_config file and what does GPUGRID have to do with WCG?
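To catch this kind of mistake before the client does, here is a rough Python check. It assumes the usual `app_config.xml` layout, in which `<gpu_versions>` holds only `<gpu_usage>` and `<cpu_usage>`, and flags any other tag name found there; `find_unknown_gpu_tags` is a hypothetical helper, not a BOINC tool.

```python
import xml.etree.ElementTree as ET

# Child tags expected under <gpu_versions> in app_config.xml.
GPU_VERSIONS_TAGS = {"gpu_usage", "cpu_usage"}


def find_unknown_gpu_tags(xml_text: str) -> list[str]:
    """Return tag names under <gpu_versions> that are not in GPU_VERSIONS_TAGS."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    unknown = []
    for gpu_versions in root.iter("gpu_versions"):
        for child in gpu_versions:
            if child.tag not in GPU_VERSIONS_TAGS:
                unknown.append(child.tag)
    return unknown
```

Fed a file containing `<gpu_gpu_usage>` and `<gpu_cpu_usage>`, it returns exactly the two names the event log complained about.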

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

----------------------------------------
[Edit 1 times, last edit by nanoprobe at Mar 28, 2021 12:57:01 PM]

[Mar 28, 2021 12:55:49 PM]

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

There was talk about your card either in this thread or in the one dated 3/12/21. Searching them may help sort out your issue. IIRC other people had issues with the 730.

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

[Mar 28, 2021 1:07:46 PM]

Richard Haselgrove
Senior Cruncher
United Kingdom
Joined: Feb 19, 2021
Post Count: 360
Status: Offline
Project Badges:

2 year badge for OpenPandemics - COVID-19


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

That event log message is telling you that you have a problem with the syntax in your app_config file and what does GPUGRID have to do with WCG?

I downloaded BoincTasks, installed it, taught myself how to use it, set an event log flag, and tried it out on a project that had an app_config.xml file but was currently idle.

BoincTasks sends an rpc to the core client, and the client writes out a new copy of app_info.xml

With multiple formatting errors, of which those two are the worst. BoincTasks is doing its part of the transaction correctly, but the BOINC code which was written specifically for BoincTasks to use is horribly broken.

https://github.com/BOINC/boinc/commit/5b6f648570cdef551b77bbabf3e69243dac69de4

[Mar 28, 2021 1:10:06 PM]

cpalmer
Cruncher
England
Joined: Feb 14, 2021
Post Count: 16
Status: Offline
Project Badges:

180 day badge for Mapping Cancer Markers


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Two topics that I've been resisting saying much on as there is already a lot of comment, but I think they are worth saying:

First, when running multiple copies on an NVidia GPU, it makes far more sense to me to set gpu_usage=0.5 (or whatever value matches the number of copies), but always set cpu_usage=1. Why? Because with OpenCL on NVidia, the CPU busy-waits for the GPU for each copy of the task, so you should always have a complete CPU allocated for each copy. I always override the just-below-one values sent by some apps. Without that, you can easily end up over-allocating your CPUs if you are also running some CPU tasks. I'm only saying this about NVidia OpenCL, though; I don't think it applies to NVidia CUDA (which OPN isn't), and I don't know what the situation is on AMD/Intel. The optimal settings may also be different when the full-sized tasks arrive.
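A sketch of the setting described above, as a Python helper that writes the `app_config.xml` text with `gpu_usage` = 1/copies and `cpu_usage` = 1 per copy. The helper is hypothetical, and so is the app name in the example (`opng`); check `client_state.xml` for the app name your client actually uses.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name: str, copies_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Build app_config.xml text that runs `copies_per_gpu` tasks on each GPU
    and reserves `cpu_per_task` CPU threads (a full thread by default) per copy."""
    if copies_per_gpu < 1:
        raise ValueError("copies_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # The client fits tasks onto a GPU while their fractions sum to at most 1,
    # so 1/n (rounded down slightly for n = 3, 6, ...) allows n copies.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / copies_per_gpu:.3g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

`make_app_config("opng", 2)` gives `gpu_usage` 0.5 and `cpu_usage` 1, matching the two-copies case in the post. The result would go into `app_config.xml` in the project's directory, followed by a re-read of config files in the BOINC Manager.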

Secondly, the dreaded points. My rationale:
1. Don't use CPU time. It will be distorted for Nvidia (see above). It incentivises running things inefficiently (e.g. over-allocating CPUs) which achieves less work.
2. Don't use GPU time. It doesn't reflect work done. It also incentivises over-committing your GPU to extend GPU times at the expense of reducing work done. Worse still, it incentivises using the slowest, lowest-power-consumption GPU you can, resulting in much less useful work.
3. Don't use elapsed time. Similar arguments to above.
4. Do use Gflops if you can readily obtain them. It reflects useful work done. I don't really see why the same sort of per-Gflop value shouldn't apply as for CPU.
5. If you don't have Gflops available, just use a fixed value per task. Overs and unders will even out.
6. Don't use zero. I don't care about points but many do. I do care about having their GPUs working on perhaps the most important project ever. If you have to give points to get them to do that then give points. It's not going to cost any real money.
7. Give enough points to remain competitive with other GPU projects. Same rationale as previous point.
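Rules 4 and 5 can be sketched in Python. This assumes BOINC's cobblestone definition (200 credits per day of work on a 1 GFLOPS reference machine) as the per-GFLOP rate; WCG's own points scale may differ. The task sizes below are made up, purely to show that a fixed per-task value and a FLOP-based value converge over many tasks when the fixed value matches the average task.

```python
import random

# BOINC's cobblestone: 1/200 day on a 1 GFLOPS reference machine,
# i.e. 200 credits per GFLOPS-day of work.
CREDITS_PER_GFLOPS_DAY = 200.0
FLOP_PER_GFLOPS_DAY = 1e9 * 86_400


def credit_from_flops(flops: float) -> float:
    """Rule 4: credit proportional to the floating-point work done."""
    return flops / FLOP_PER_GFLOPS_DAY * CREDITS_PER_GFLOPS_DAY


def fixed_vs_flops(task_flops: list[float], credit_per_task: float) -> float:
    """Rule 5: ratio of fixed per-task credit to FLOP-based credit for a batch."""
    fixed = credit_per_task * len(task_flops)
    exact = sum(credit_from_flops(f) for f in task_flops)
    return fixed / exact


if __name__ == "__main__":
    rng = random.Random(42)
    mean_flops = 5e14  # hypothetical average task size
    # Hypothetical spread: tasks between 0.5x and 1.5x the average size.
    tasks = [mean_flops * rng.uniform(0.5, 1.5) for _ in range(10_000)]
    per_task = credit_from_flops(mean_flops)
    print(f"fixed credit per task: {per_task:.1f}")
    print(f"fixed / FLOP-based credit over the batch: {fixed_vs_flops(tasks, per_task):.3f}")
```

Individual tasks are over- or under-paid by up to 50% here, but across the batch the ratio lands very close to 1, which is the "overs and unders will even out" argument.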

And a final thought.... I applaud WCG working on all 9 platform combinations. But I'd really like to see each of those 9 platforms moved into production as soon as it is ready, rather than waiting until all 9 are. I believe it is an urgent enough project to warrant that.

Thanks

----------------------------------------
[Edit 2 times, last edit by cpalmer at Mar 28, 2021 2:38:40 PM]

[Mar 28, 2021 1:37:32 PM]

motech
Cruncher
Joined: Mar 30, 2007
Post Count: 23
Status: Offline

Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

There was talk about your card either in this thread or in the one dated 3/12/21. Searching them may help sort out your issue. IIRC other people had issues with the 730.

Thank you for making me aware of that. Scanning through the beta forums but not reading every post, it appears that widdershins has been dealing with this since February. However, from what I can tell there has still been no resolution. I understand the WCG is focusing on the most pressing problems first, and the GT 730 is almost certainly a minor consideration at this time. I guess I'll take my old GPU back to Folding@Home for now as it is capable of contributing there. I would just add that before the GPU app comes out of beta I hope that WCG will have this sorted out, at the very least not distributing these work units to this GPU if it can't handle them.

[Mar 28, 2021 4:23:22 PM]

uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Project Badges:

10 year badge for Human Proteome Folding

45 day badge for Help Cure Muscular Dystrophy

2 year badge for Discovering Dengue Drugs - Together

20 year badge for Nutritious Rice for the World

2 year badge for The Clean Energy Project

5 year badge for Help Fight Childhood Cancer

2 year badge for Influenza Antiviral Drug Search

2 year badge for Help Cure Muscular Dystrophy - Phase 2

10 year badge for The Clean Energy Project - Phase 2

5 year badge for Computing for Clean Water

10 year badge for Drug Search for Leishmaniasis

20 year badge for GO Fight Against Malaria

2 year badge for Computing for Sustainable Water

50 year badge for Mapping Cancer Markers

50 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

100 year badge for FightAIDS@Home - Phase 2


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

Good afternoon,

I am not going to be adding any new work on to the beta test right now or the rest of this weekend. I have plenty of results and I have quickly read through a lot of the comments and suggestions from everyone. I will take more time on Monday morning going through and examining what we have.

There are a few things I may want to try... One of which would be a target for fixing the GT cards on NVIDIA... The other is, I want to at least put in place a point system for the next round of beta... y'all deserve credit and time for the hours your systems spend computing work units.

At the moment, depending on what the researchers say, this might be our first GA release... the numbers on everything look very promising and well within reason for acceptable error/invalid rates. I personally have an aggressive timeline I wish to get this to production, but I know we will at least have one more beta test, which will be to stop the dlg files from being sent back and to confirm backend processes work correctly with that.

Thanks,
-Uplinger

[Mar 28, 2021 5:20:01 PM]

cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Project Badges:

90 day badge for Human Proteome Folding - Phase 2

90 day badge for Help Cure Muscular Dystrophy - Phase 2

45 day badge for Discovering Dengue Drugs - Together - Phase 2

1 year badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

1 year badge for GO Fight Against Malaria

90 day badge for Computing for Sustainable Water

2 year badge for Uncovering Genome Mysteries

5 year badge for Outsmart Ebola Together

2 year badge for FightAIDS@Home - Phase 2

5 year badge for Microbiome Immunity Project

90 day badge for Africa Rainfall Project

5 year badge for OpenPandemics - COVID-19


Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

At the moment, depending on what the researchers say, this might be our first GA release... the numbers on everything look very promising and well within reason for acceptable error/invalid rates. I personally have an aggressive timeline I wish to get this to production, but I know we will at least have one more beta test, which will be to stop the dlg files from being sent back and to confirm backend processes work correctly with that.
Thanks,
-Uplinger

Hey, great news!!! Thanks for the help and support... especially on a Sunday!!!

CJSL
Crunching away on the thin ice of a new day ...

----------------------------------------

I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team

[Mar 28, 2021 6:22:20 PM]

nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline

Re: OpenPandemics GPU Beta Test - March 26 2021 [ Issues Thread ]

That event log message is telling you that you have a problem with the syntax in your app_config file and what does GPUGRID have to do with WCG?

I downloaded BoincTasks, installed it, taught myself how to use it, set an event log flag, and tried it out on a project that had an app_config.xml file but was currently idle.

BoincTasks sends an rpc to the core client, and the client writes out a new copy of app_info.xml  <- Typo? If not, that won't work either.

With multiple formatting errors, of which those two are the worst. BoincTasks is doing its part of the transaction correctly, but the BOINC code which was written specifically for BoincTasks to use is horribly broken.

https://github.com/BOINC/boinc/commit/5b6f648570cdef551b77bbabf3e69243dac69de4

I don't know what core client you're using that writes its own xml files, but I would not use it under any circumstances. I know of no BOINC code that was written specifically for BT, so I'll have to take your word for that. More than likely the horribly written code is your problem, so why use it? I don't understand why someone is trying to reinvent the wheel. Stand-alone BOINC with the BT app has worked fine for a long time. My suggestion would be to install the most up-to-date version of BOINC from the Berkeley site, write your own xml files as needed, and dump whatever it is you're trying to make work here. Good luck.

----------------------------------------

In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.

[Mar 28, 2021 7:05:14 PM]
