World Community Grid - View Thread - Project Status (First Post Updated)

World Community Grid Forums

Category: Community

Forum: Chat Room

Thread: Project Status (First Post Updated)


Thread Status: Active
Total posts in this thread: 573


This topic has been viewed 43380 times and has 572 replies

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Project Badges:

14 day badge for Human Proteome Folding - Phase 2

14 day badge for Help Fight Childhood Cancer

14 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Computing for Clean Water

200 year badge for Mapping Cancer Markers

180 day badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

180 day badge for Microbiome Immunity Project

20 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: Project Status (First Post Updated)

Files are transferring; the database issue is no longer active.

EDIT to add: the new computer still does not have the correct config. Still investigating.

----------------------------------------
[Edit 1 times, last edit by bfmorse at Oct 30, 2025 3:21:23 AM]

[Oct 30, 2025 3:19:28 AM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

14 day badge for Discovering Dengue Drugs - Together

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

1 year badge for The Clean Energy Project - Phase 2

180 day badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

180 day badge for GO Fight Against Malaria

14 day badge for Computing for Sustainable Water

50 year badge for Mapping Cancer Markers

2 year badge for Uncovering Genome Mysteries

5 year badge for Outsmart Ebola Together

10 year badge for FightAIDS@Home - Phase 2

10 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: Project Status (First Post Updated)

Database accessible again...

I wonder if Dylan took it out of the BOINC environment to dig into the database and identify WUs/tasks that would need coaxing through their Kafka-based processing path; it would be a lot easier for him to do that without BOINC jostling his elbows, after all :-)

Interesting times...

Cheers - Al.

[Oct 30, 2025 3:24:35 AM]

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

It appears that I have 22 of these file transfer errors, but when I tried to copy/paste the message from my WU's ERROR column, I got this:

Forbidden
You don't have permission to access this resource.

Apache/2.4.58 (Ubuntu) Server at www.worldcommunitygrid.org Port 443

As for the problem WU transfer times: 2025-10-29 15:54:04 UTC through 2025-10-29 18:56:35 UTC.

[Oct 30, 2025 3:57:31 AM]

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 442
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

Re: computer profile updates.

In the past, when a computer's profile was updated, the change would propagate through the system and, if I recall correctly, ALL my systems would display their current profile information in the log file when that computer polled the host.

That is not happening.

Have things changed, or is that function still on the TO DO list?

[Oct 30, 2025 4:13:31 AM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

I see that new tasks are still going out with that enormous estimated FPOPs value.

After an interval of dealing with retries that were all getting proper Claimed credit, presumably because they had a sane FPOPs value and hence a sane estimated run time, the majority of tasks being returned by my fastest system are back to getting that 202.5 Claimed Credit value. So far, none of the other systems seem to be going back to 202.5, but...

I don't care about the credit (it doesn't buy anything, after all!) but I do worry about the way-too-high run time estimates showing up in BOINC Manager; at a minimum, that will reduce the number of items that will be downloaded to "fill" the buffer, and at worst users might decide to [unnecessarily] abort stuff on slower systems. Whilst it will adapt over time, it's not currently using a suitable baseline :-(
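A rough illustration of why the inflated estimate matters for buffer filling. This is a sketch under loud assumptions: it treats the runtime estimate as simply <rsc_fpops_est> divided by a per-core effective speed (BOINC's duration correction and other adjustments are ignored), and the 10 GFLOPS per core figure is hypothetical, picked because it roughly matches a ~50-minute MCM1 task. The two estimate values are the pre-migration and post-restart MCM1 numbers reported later in this thread.

```python
# Sketch: how an inflated rsc_fpops_est shrinks the number of tasks fetched.
# ASSUMPTION: runtime estimate ~= rsc_fpops_est / effective per-core FLOPS.
# The real BOINC client applies further corrections that are ignored here.

def estimated_runtime_s(rsc_fpops_est: float, effective_flops: float) -> float:
    """Estimated run time in seconds for one task."""
    return rsc_fpops_est / effective_flops


def tasks_to_fill_buffer(buffer_days: float, cores: int, runtime_s: float) -> int:
    """Whole tasks needed to keep `cores` busy for `buffer_days`."""
    return int(buffer_days * 86400 * cores // runtime_s)


GOOD_EST = 30_879_253_383_882.0  # pre-migration MCM1 estimate (FP operations)
BAD_EST = 87_497_425_433_226.0   # post-restart MCM1 estimate (FP operations)
FLOPS = 10e9                     # HYPOTHETICAL effective speed: 10 GFLOPS per core

for label, est in (("sane", GOOD_EST), ("inflated", BAD_EST)):
    rt = estimated_runtime_s(est, FLOPS)
    n = tasks_to_fill_buffer(buffer_days=1.0, cores=8, runtime_s=rt)
    print(f"{label}: {rt / 3600:.1f} h per task, {n} tasks for a 1-day, 8-core buffer")
```

Under these assumptions the inflated value makes each task look almost three times longer, so roughly a third as many tasks get downloaded for the same buffer setting.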

Cheers - Al.

[Oct 30, 2025 11:37:02 AM]

Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2501
Status: Offline
Project Badges:

10 year badge for Mapping Cancer Markers

14 day badge for FightAIDS@Home - Phase 2

1 year badge for Microbiome Immunity Project

90 day badge for Africa Rainfall Project

2 year badge for OpenPandemics - COVID-19


Re: Project Status (First Post Updated)

I see that new tasks are still going out with that enormous estimated FPOPs value.

@alanb1951

All MCM1 tasks since the restart have an estimated size of 87 497 GFLOPs, which in client_state.xml translates to <rsc_fpops_est>87497425433226.000000</rsc_fpops_est>

That is around 2.7 times too high.

The real value should be around 32 406 GFLOPs, and therefore the right value in client_state.xml should be:

<rsc_fpops_est>32406453864157.000000</rsc_fpops_est>
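The "Estimated task size" shown in BOINC Manager is just <rsc_fpops_est> divided by 10^9, so the figures above can be cross-checked with a few lines. This is only a quick arithmetic check of the two values quoted in this post, not anything WCG uses internally.

```python
# Convert between BOINC Manager's "Estimated task size" (GFLOPs) and the raw
# <rsc_fpops_est> value (floating-point operations) in client_state.xml.

def gflops_to_fpops(gflops: float) -> float:
    return gflops * 1e9


def fpops_to_gflops(fpops: float) -> float:
    return fpops / 1e9


inflated = 87_497_425_433_226.0  # post-restart MCM1 <rsc_fpops_est>
expected = 32_406_453_864_157.0  # value suggested in this post

print(f"inflated: {fpops_to_gflops(inflated):,.0f} GFLOPs")
print(f"expected: {fpops_to_gflops(expected):,.0f} GFLOPs")
print(f"ratio:    {inflated / expected:.2f}")
```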

The BOINC client doesn't seem to adjust this down to correct values by itself, as long as WCG sends out MCM1 tasks with those extremely high values.
It creates problems for crunchers when the estimated runtimes for tasks are 2.7 times higher than they are in reality. As you say, some people may not be able to get enough tasks, while others may abort their tasks, thinking that they will never be able to crunch them in time.

I'm adjusting <rsc_fpops_est> manually in the client_state.xml file from time to time, though.
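For anyone tempted to automate that manual adjustment, here is a minimal sketch. Everything about it is an assumption: the helper name rescale_fpops, the "MCM1_" task-name prefix filter and the 1/2.7 factor are illustrative only, and it assumes the BOINC client has been stopped and client_state.xml backed up first, since the running client rewrites that file. It works on the text with regular expressions rather than an XML parser so the rest of the file's layout is left untouched.

```python
import re

# HYPOTHETICAL helper: rescale <rsc_fpops_est> inside <workunit> blocks whose
# <name> starts with a given prefix. Only run against a copy of client_state.xml
# taken while the BOINC client is stopped.
WU_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
NAME_RE = re.compile(r"<name>([^<]*)</name>")
EST_RE = re.compile(r"<rsc_fpops_est>([0-9.eE+-]+)</rsc_fpops_est>")


def rescale_fpops(xml_text, prefix, factor):
    """Return xml_text with matching work units' rsc_fpops_est multiplied by factor."""

    def fix_wu(match):
        block = match.group(0)
        name = NAME_RE.search(block)
        if not name or not name.group(1).startswith(prefix):
            return block  # leave other apps' and projects' work units alone
        return EST_RE.sub(
            lambda e: f"<rsc_fpops_est>{float(e.group(1)) * factor:.6f}</rsc_fpops_est>",
            block,
        )

    return WU_RE.sub(fix_wu, xml_text)


# Usage on a made-up fragment (the task name is hypothetical):
sample = """<workunit>
    <name>MCM1_0000001_0001</name>
    <rsc_fpops_est>87497425433226.000000</rsc_fpops_est>
</workunit>"""
print(rescale_fpops(sample, "MCM1_", 1 / 2.7))
```

Only the estimate is scaled here; other per-task limits such as <rsc_fpops_bound> are deliberately left alone.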

I have just alerted Igor and Dylan about this issue.

----------------------------------------
[Edit 2 times, last edit by Grumpy Swede at Oct 31, 2025 6:23:17 AM]

[Oct 31, 2025 1:16:51 AM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

@Grumpy Swede,

Thanks for the follow-up -- you've done the same calculations I did but reported with much more precision than I did when making my two posts on this topic! As you've already alerted them I can now move on to something else!

I have just surfaced after some overdue sleep and was about to start an exercise to find out roughly how many total instructions (not just FP!) a typical MCM1 task executes during a complete run, with a view to publishing some numbers regarding actual throughput.

Whilst that's no longer necessary, I did do a few 60-second snapshots on my second fastest system[*1], which usually takes just under 50 minutes to complete a task. This is on Linux, so I used perf stat...

In all samples it managed around 824.67 GigaOps per minute, so the total instruction count over a run was likely to be somewhere in the vicinity of 41,233 GigaOps. Of course, only a [small?] proportion of those are going to be F.P. instructions, so even the estimates of 30768849498855 FPOPs or 30879253383882 FPOPs I was seeing for old task retries might arguably be considered too high (which would explain why initial estimates were always somewhat too high even on pre-migration tasks!)
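The arithmetic behind that estimate, spelled out as a sketch. The inputs are the perf stat rate and the "just under 50 minutes" run time from this post (treated as exactly 50 minutes for simplicity), and the two FPOPs values are the old-retry estimates quoted above; the function name is illustrative only.

```python
# Back-of-envelope check: total instructions per MCM1 run vs. the old FPOPs estimates.
ops_per_minute_g = 824.67  # perf stat sample rate (giga-instructions per minute)
run_minutes = 50           # approximate run time (the post says just under 50 minutes)
total_g = ops_per_minute_g * run_minutes  # total giga-instructions per run


def share_of_instructions(est_fpops: float, total_giga_ops: float) -> float:
    """Fraction of all executed instructions the FP-op estimate would represent."""
    return est_fpops / 1e9 / total_giga_ops


print(f"total instructions per run ~ {total_g:,.0f} G")
for est in (30_768_849_498_855, 30_879_253_383_882):
    print(f"{est / 1e9:,.0f} GFLOPs estimate = "
          f"{share_of_instructions(est, total_g):.0%} of all instructions")
```

So the old estimates correspond to roughly three quarters of every instruction executed; if only a small share of those instructions are floating-point, the estimate is high even before the recent inflation.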

That said, I can live with an estimate that is about 50% too high :-) so just getting back to the old numbers would be great!

Cheers - Al.

P.S. On a couple of occasions I've tried to pass things like this to WCG via the contact mechanism, but beyond an automated response I have no idea whether they actually got delivered -- do I gather you use direct email?

*1 That is a Ryzen 7840HS with maximum clock software-limited to 4.1GHz; when I tried to put that information in the main post, bracketed thus -- (7840HS with clock limited to 4.1GHz) -- it threw up the dreaded Forbidden error, so I've been building this post piecemeal ever since!

[Oct 31, 2025 5:50:14 AM]

Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2501
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

@alanb1951

Very good Al.

Yes, I'm using direct email to Igor, and I just got a reply from Igor saying "we will adjust"

Edit, added: They seem to have adjusted new tasks already. I just downloaded a bunch of MCM1 tasks, and they have the following values:
Estimated task size 30 879 GFLOPs
And in client_state.xml: <rsc_fpops_est>30879253383882.000000</rsc_fpops_est>
The exact value you calculated. :-)

So, let's hope it stays around those values.

----------------------------------------
[Edit 4 times, last edit by Grumpy Swede at Oct 31, 2025 7:09:57 AM]

[Oct 31, 2025 6:34:48 AM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Project Badges:


Re: Project Status (First Post Updated)

Thanks, Grumpy Swede!

I'd gathered a fix was in because I'd just checked one of my systems that runs a 64-task cache and it seemed to start getting much smaller estimates from around 04:00 UTC! So I came in here to see if you'd had a formal response.

And so far, all of the tasks Zen7900 (previously plagued by 202.5 credit claims) has returned that have the new FPOPs estimate have ended up with sensible claimed credit numbers (unlike some tasks with the higher estimate). I am hoping the better FPOPs number and runtime estimate have been sufficient to get rid of that problem without needing a separate fix!

All in all, that's some good headway made so (once again) thanks for sending it to WCG!

By the way, I didn't calculate those values -- I dug them out of the fe field in one of the WCG job logs. They seemed to be very common values for pre-migration MCM1 tasks and led to sensible looking estimated runtime (ue) values.
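For anyone wanting to dig the same numbers out of their own job logs, a minimal parsing sketch. As I understand it, the BOINC client appends one line per completed task to a job_log file for each project in its data directory, laid out as `<unix time> ue <estimated runtime> ct <CPU time> fe <rsc_fpops_est> nm <task name> et <elapsed time> es <exit status>`; treat that layout as an assumption, and note the sample line below is made up, not taken from a real log.

```python
# Sketch: parse one BOINC job-log line of the (assumed) form
#   <unix time> ue <est. runtime> ct <cpu time> fe <fpops est> nm <name> et <elapsed> es <exit status>


def parse_job_log_line(line: str) -> dict:
    """Split '<timestamp> key value key value ...' into a dict."""
    parts = line.split()
    record = {"time": int(parts[0])}
    for key, value in zip(parts[1::2], parts[2::2]):
        record[key] = value if key == "nm" else float(value)
    return record


# Made-up example line (hypothetical task name and timings):
line = ("1761900000 ue 3087.925338 ct 2950.120000 fe 30879253383882 "
        "nm MCM1_0000001_0001_0 et 2990.500000 es 0")
rec = parse_job_log_line(line)

# The speed the runtime estimate implicitly assumed: fe / ue (FLOPS).
implied_flops = rec["fe"] / rec["ue"]
print(rec["nm"], f"fe={rec['fe'] / 1e9:,.0f} GFLOPs",
      f"ue={rec['ue'] / 60:.1f} min", f"implied {implied_flops / 1e9:.2f} GFLOPS")
```

Comparing fe/ue across old and new tasks is one way to see when an estimate changed without having to catch tasks in client_state.xml.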

Cheers - Al.

P.S. I don't think I'd have the confidence to mail Igor directly, despite how verbose I might get in the forums! :-)

[Oct 31, 2025 9:32:17 AM]

Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1296
Status: Offline
Project Badges:

180 day badge for Smash Childhood Cancer

45 day badge for Microbiome Immunity Project

1 year badge for Africa Rainfall Project

1 year badge for OpenPandemics - COVID-19


Re: Project Status (First Post Updated)

Thank you Grumpy Swede and Al for diagnosing the problem, figuring out the fix and communicating it. My WU values are back to normal.

I had my first error on MCM:
https://www.worldcommunitygrid.org/contribution/workunit/767776183

It looks like others saw the error, and it was fixed quickly.

----------------------------------------
[Edit 2 times, last edit by Unixchick at Oct 31, 2025 3:29:31 PM]

[Oct 31, 2025 3:20:04 PM]
