| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 573
|
|
|
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline
|
Files are transferring; the database issue is no longer active.
EDIT to add: the new computer still does not have the correct config. Still investigating.
[Edit 1 times, last edit by bfmorse at Oct 30, 2025 3:21:23 AM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
|
Database accessible again...
I wonder if Dylan took it out of the BOINC environment to dig in the database and identify WUs/tasks that would need coaxing through their Kafka-based processing path; it would be a lot easier for him to do that without BOINC jostling his elbows, after all :-)
Interesting times...
Cheers - Al. |
||
|
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline
|
It appears that I have 22 of these file transfer errors,
but when I tried to copy/paste the message from my WU's ERROR column, I got this:
Forbidden
You don't have permission to access this resource.
Apache/2.4.58 (Ubuntu) Server at www.worldcommunitygrid.org Port 443
As to the problem WU transfer times: 2025-10-29 15:54:04 UTC through 2025-10-29 18:56:35 UTC |
||
|
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 442 Status: Offline
|
Re: computer profile updates.
In the past, when a computer's profile was updated, the change would propagate through the system and, if I recall correctly, ALL my systems would display their current profile information in the log file the next time each computer polled the host.
That is not happening. Have things changed, or is that function still on the TO DO list? |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
|
I see that new tasks are still going out with that enormous estimated FPOPs value.
After an interval of dealing with retries that were all getting proper Claimed credit, presumably because they had a sane FPOPs value and hence a sane estimated run time, the majority of tasks being returned by my fastest system are back to getting that 202.5 Claimed Credit value. So far, none of the other systems seem to be going back to 202.5, but...
I don't care about the credit (it doesn't buy anything, after all!) but I do worry about the way-too-high run time estimates showing up in BOINC Manager; at a minimum, that will reduce the number of items that will be downloaded to "fill" the buffer, and at worst users might decide to [unnecessarily] abort stuff on slower systems. Whilst it will adapt over time, it's not currently using a suitable baseline :-(
Cheers - Al. |
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2501 Status: Offline
|
"I see that new tasks are still going out with that enormous estimated FPOPs value."
@alanb1951
All MCM1 tasks since the restart have an estimated size of 87 497 GFLOPs, which in client_state.xml translates to
<rsc_fpops_est>87497425433226.000000</rsc_fpops_est>
That is around 2.7 times too high. The real value should be around 32 406 GFLOPs, so the right value in client_state.xml would be:
<rsc_fpops_est>32406453864157.000000</rsc_fpops_est>
The BOINC client doesn't seem to adjust this down to correct values by itself, as long as WCG sends out MCM1 tasks with those extremely high values.
It creates problems for crunchers when the estimated runtime for tasks is 2.7 times higher than it is in reality. As you say, some people would not be able to get enough tasks, while others may abort their tasks, thinking that they will never be able to crunch them in time.
I'm adjusting <rsc_fpops_est> manually in the client_state.xml file from time to time, though.
I have just alerted Igor and Dylan about this issue.
[Edit 2 times, last edit by Grumpy Swede at Oct 31, 2025 6:23:17 AM] |
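For anyone who wants to check the arithmetic in the post above, here is a minimal Python sketch. The two FPOPs values are the ones quoted in the post; the host speed, buffer size and core count are made-up assumptions for illustration, and the buffer model (buffer time divided by estimated runtime) is a simplification of what the BOINC client and scheduler really do.

```python
# FPOPs estimates quoted in the post above (per MCM1 task).
INFLATED_FPOPS_EST = 87_497_425_433_226
EXPECTED_FPOPS_EST = 32_406_453_864_157

# Hypothetical values, NOT from the thread: chosen only to illustrate the effect.
ASSUMED_HOST_FLOPS = 10e9    # assumed effective speed per core, FLOPs per second
ASSUMED_BUFFER_DAYS = 0.5    # assumed work buffer setting
ASSUMED_CORES = 8            # assumed number of cores running tasks


def estimated_runtime_s(fpops_est: float, host_flops: float) -> float:
    """Simplified runtime estimate: operations divided by speed."""
    return fpops_est / host_flops


def tasks_to_fill_buffer(fpops_est: float, host_flops: float,
                         buffer_days: float, cores: int) -> int:
    """Simplified model: how many whole tasks fit in the buffer."""
    buffer_core_seconds = buffer_days * 86_400 * cores
    return int(buffer_core_seconds // estimated_runtime_s(fpops_est, host_flops))


ratio = INFLATED_FPOPS_EST / EXPECTED_FPOPS_EST
tasks_expected = tasks_to_fill_buffer(
    EXPECTED_FPOPS_EST, ASSUMED_HOST_FLOPS, ASSUMED_BUFFER_DAYS, ASSUMED_CORES)
tasks_inflated = tasks_to_fill_buffer(
    INFLATED_FPOPS_EST, ASSUMED_HOST_FLOPS, ASSUMED_BUFFER_DAYS, ASSUMED_CORES)

print(f"inflation factor: {ratio:.2f}")
print(f"tasks to fill buffer: {tasks_expected} (expected) vs {tasks_inflated} (inflated)")
```

Under this model the number of tasks fetched shrinks by roughly the same factor as the estimate grows, whatever host speed is assumed.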
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
|
@Grumpy Swede,
Thanks for the follow-up -- you've done the same calculations I did but reported with much more precision than I did when making my two posts on this topic! As you've already alerted them I can now move on to something else!
I have just surfaced after some overdue sleep and was about to start an exercise to find out roughly how many total instructions (not just FP!) a typical MCM1 task executes during a complete run, with a view to publishing some numbers regarding actual throughput. Whilst that's no longer necessary, I did do a few 60 second snapshots on my second fastest system[*1], which usually takes just under 50 minutes to complete a task. This is on Linux, so I used perf stat...
In all samples, it managed around 824.67 GigaOps per minute, so the total instruction count over a run was likely to be somewhere in the vicinity of 41,233 GigaOps; of course, only a [small?] proportion of those are going to be F.P. instructions, so that even the estimates of 30768849498855 FPOPs or 30879253383882 FPOPs I was seeing for old task retries might arguably be considered too high (and would explain why initial estimates were always somewhat too high even on pre-migration tasks!)
That said, I can live with an estimate that is about 50% too high, so just getting back to the old numbers would be great!
Cheers - Al.
P.S. On a couple of occasions I've tried to pass things like this to WCG via the contact mechanism, but beyond an automated response I have no idea whether they actually got delivered -- do I gather you use direct email?
*1 That is a Ryzen 7840HS with maximum clock software-limited to 4.1GHz; when I tried to put that information in the main post, bracketed thus -- (7840HS with clock limited to 4.1GHz) -- it threw up the dreaded Forbidden error, so I've been building this post piecemeal ever since! |
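The instruction-count reasoning above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the rate and the old FPOPs estimate come from the post, the run length is rounded up to 50 minutes, and treating one instruction as at most one floating-point operation is an assumption that ignores SIMD instructions.

```python
# Figures taken from the post above.
GIGA_OPS_PER_MINUTE = 824.67          # from the 60-second perf stat snapshots
RUN_MINUTES = 50                      # "just under 50 minutes" per task
OLD_FPOPS_EST = 30_879_253_383_882    # estimate seen on old task retries

# Total instructions (of all kinds) over one complete run, in GigaOps.
total_giga_ops = GIGA_OPS_PER_MINUTE * RUN_MINUTES

# Share of ALL instructions that would have to be floating-point operations
# for the old estimate to be exact (assumes one FP op per instruction).
fp_fraction_needed = OLD_FPOPS_EST / (total_giga_ops * 1e9)

print(f"total instructions per run: about {total_giga_ops:,.0f} GigaOps")
print(f"FP share implied by the old estimate: {fp_fraction_needed:.0%}")
```

Under those assumptions roughly three quarters of all executed instructions would need to be floating-point for the old estimate to be exact, which is consistent with the suspicion that even the old numbers run somewhat high.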
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2501 Status: Offline
|
@alanb1951
Very good Al.
Yes, I'm using direct mail to Igor, and I just got a reply from Igor saying "we will adjust".
Edit, added: They seem to have adjusted new tasks already. I just downloaded a bunch of MCM1 tasks, and they have the following values:
Estimated task size 30 879 GFLOPs
And in client_state.xml:
<rsc_fpops_est>30879253383882.000000</rsc_fpops_est>
The exact value you calculated. So, let's hope it stays around those values.
[Edit 4 times, last edit by Grumpy Swede at Oct 31, 2025 7:09:57 AM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline
|
Thanks, Grumpy Swede!
I'd gathered a fix was in because I'd just checked one of my systems that runs a 64-task cache and it seemed to start getting much smaller estimates from around 04:00 UTC! So I came in here to see if you'd had a formal response.
And so far, all of the tasks Zen7900 (previously plagued by 202.5 credit claims) has returned that have the new FPOPs estimate have ended up with sensible claimed credit numbers (unlike some tasks with the higher estimate). I am hoping the better FPOPs number and runtime estimate have been sufficient to get rid of that problem without needing a separate fix!
All in all, that's some good headway made so (once again) thanks for sending it to WCG!
By the way, I didn't calculate those values -- I dug them out of the fe field in one of the WCG job logs. They seemed to be very common values for pre-migration MCM1 tasks and led to sensible looking estimated runtime (ue) values.
Cheers - Al.
P.S. I don't think I'd have the confidence to mail Igor directly, despite how verbose I might get in the forums! |
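For readers who want to look at the fe and ue fields themselves, here is a rough Python sketch. It assumes each line of a BOINC client job log is a timestamp followed by key/value pairs (ue, ct, fe, nm, et, es); that layout is an assumption here, so check it against your own log before relying on it. The sample line is invented for illustration, and only its fe value comes from this thread.

```python
def parse_job_log_line(line: str) -> dict:
    """Parse one job log line, assuming '<timestamp> key value key value ...'."""
    tokens = line.split()
    record = {"timestamp": int(tokens[0])}
    # Walk the remaining tokens as (key, value) pairs.
    for key, value in zip(tokens[1::2], tokens[2::2]):
        # 'nm' is assumed to be the task name; everything else is numeric.
        record[key] = value if key == "nm" else float(value)
    return record


# Hypothetical sample line: the timestamp, task name and timings are invented.
SAMPLE_LINE = (
    "1761890000 ue 5400.000000 ct 2900.000000 fe 30879253383882 "
    "nm MCM1_hypothetical_task_0 et 2950.000000 es 0"
)

record = parse_job_log_line(SAMPLE_LINE)

# fe divided by ue gives the speed (FLOPs per second) implied by the estimate.
implied_flops = record["fe"] / record["ue"]

print(f"fe = {record['fe']:.0f} FPOPs, ue = {record['ue']:.0f} s")
print(f"implied speed: {implied_flops / 1e9:.2f} GFLOPS")
```

Running something like this over a whole log would show when the fe values changed, which is one way to spot the point at which the estimates went back to normal.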
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1296 Status: Offline
|
Thank you Grumpy Swede and Al for diagnosing the problem, figuring out the fix and communicating it. My WU values are back to normal.
I had my first error on MCM:
https://www.worldcommunitygrid.org/contribution/workunit/767776183
It looks like others saw the error, and it was fixed quickly.
[Edit 2 times, last edit by Unixchick at Oct 31, 2025 3:29:31 PM] |
||
|
|
|