Thread Status: Active
Total posts in this thread: 159
Posts: 159   Pages: 16
This topic has been viewed 14575 times and has 158 replies.
Boca Raton Community HS
Advanced Cruncher
Joined: Aug 27, 2021
Post Count: 125
Re: Regarding ARP1 and MCM1 download issues since ARP1's launch on Monday Nov 4th, 2024

We are starting to see a minor improvement. If I could quantify the improvement, I would put it at 10%.
After having it pretty bad over the weekend, when some hosts "died of fuel starvation" ;) due to both stuck downloads and stuck uploads, it is much better today, though not without the need for some manual transfer retries.
The time it takes to download a whole set of files for one (1) ARP WU has dropped today from about half a day on average to 15-20 minutes. Not sure how you were calculating your 10% estimate though... ;)


Ralf



My 10% estimate was created using some advanced calculus.

Kidding... We returned about 10% more work units today versus Saturday or Sunday.

I think I know one specific issue for us: it is related to the core count of many of our systems. Let me be clear, I am thrilled to work with our high-core-count systems, and we are not trying to take work away from any other volunteers (there seems to be plenty of MCM1 work to go around). I am not (ever) going to complain about the hardware our students get to work with!

But, because they require SO many work units to stay "full", they starve faster in a time of drought. Then, because they request so much work, they have a higher chance to stall a download and then get stuck. Rinse and repeat.
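The "more requests, more stalls" reasoning can be checked with a toy model. This is a minimal sketch that assumes every file transfer stalls independently with the same probability p; the value of p below is made up for illustration and is not a measured WCG figure.

```python
# Toy model (assumption): every file transfer stalls independently with
# probability p. The rate used below is hypothetical, not measured.


def prob_any_stall(p: float, k: int) -> float:
    """Probability that at least one of k independent transfers stalls."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # The chance that all k transfers succeed is (1 - p)**k.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    p = 0.02  # hypothetical per-transfer stall probability
    for k in (10, 50, 200):  # files requested by a small, medium and large host
        print(f"{k:4d} files -> P(at least one stall) = {prob_any_stall(p, k):.2f}")
```

With these assumed numbers the chance of hitting at least one stalled transfer climbs from about 18% for a host fetching 10 files to about 98% for one fetching 200, which matches the intuition that big hosts get stuck first.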

I am really referring to MCM1 work here, not ARP1. We could run only ARP1 on low-core-count systems and only MCM1 on high-core-count systems, but I am not sure this would help at all, since many of the MCM1 work units are also still trying to download, regardless of the ARP1 work.

Are these valid conclusions? I am trying to sort it out in my head for our systems.
----------------------------------------
[Edit 1 times, last edit by Boca Raton Community HS at Nov 12, 2024 4:32:37 AM]
[Nov 12, 2024 2:57:47 AM]
ericinboston
Senior Cruncher
Joined: Jan 12, 2010
Post Count: 258


But are you honestly telling me that you have 15 headless machines sitting there without the ability to access them remotely? :? :(


Yes. The Macs are all in my basement, near each other. Normally (when WCG works), they just crunch away 24x7 without any issues. Power goes out?...they start right back up without human intervention. If I notice 1 machine hasn't submitted results in a few days (this doesn't happen on the Macs...only on my Wintels when Windows decides to have a nag screen after a Windows Update), I just power it off and on and it's back up. If for some bizarre reason the Mac continues not to report results, I will connect my kb and monitor.

The pain is circumstances like right now...I don't want to connect a kb and monitor to each of 15 Macs...click Retry...wait 1-4 mins...likely have to click Retry again...wait 1-4 mins...then when/if it works, unplug and go to the next Mac. It's just a total time waster.

So it appears in this case I can just let them sit and this issue will (hopefully) fix itself within the next few days.
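For headless hosts, the "click Retry" step can in principle be done over the network with BOINC's command-line tool `boinccmd`, which accepts `--host` and `--passwd` and has a `--file_transfer URL filename retry` operation. The sketch below only builds and optionally runs those commands; the host name, password, project URL and file name are hypothetical placeholders, and it assumes each client has remote GUI RPC enabled (for example via `remote_hosts.cfg` or `<allow_remote_gui_rpc>` in cc_config.xml) with the password from that host's `gui_rpc_auth.cfg`. The stuck project URL and file names would come from `boinccmd --get_file_transfers` on each host.

```python
import subprocess
from typing import Iterable, List, Tuple


def retry_commands(host: str, password: str,
                   transfers: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Build one 'boinccmd ... --file_transfer URL FILE retry' command per stuck transfer."""
    return [
        ["boinccmd", "--host", host, "--passwd", password,
         "--file_transfer", url, filename, "retry"]
        for url, filename in transfers
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: one failed retry should not stop the rest of the loop
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Hypothetical host name, password, project URL and file name.
    stuck = [("PROJECT_URL", "example_input_file")]
    run(retry_commands("mac01.local", "gui_rpc_password", stuck))
```

Looping this over all 15 host names would replace the keyboard-and-monitor round trip, at the cost of a one-time remote-access setup on each Mac.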
----------------------------------------
[Edit 1 times, last edit by ericinboston at Nov 12, 2024 3:33:39 AM]
[Nov 12, 2024 3:31:52 AM]
rjs5
Cruncher
Joined: Jan 22, 2011
Post Count: 6

I hope I am not contributing to the problem with my farm set to a 10-day buffer.
Considering the 6-day deadline for MCM, you might be a bit, as some WUs are likely not started before the deadline and need to be resent to someone else. The buffer should always be significantly below the deadline.


Thank you! I have all machines set to 10 days and 'up to 10 days more'; however, I just looked at my Results page ( https://www.worldcommunitygrid.org/contribution/results?validationStatusIds=5 ) sorted by 'too late', and there are 0 results, so I don't seem to be wasting their bandwidth.

No errors (if that's the errors page here; all other projects use state=6 for that) means that you start all tasks before the deadline, not that you finish and report them to the server before the deadline, which is what keeps replacement tasks from being created and sent out to another computer. I sometimes see such "aborted by server" tasks in my task list that were returned just a bit too late. Tasks returned after the deadline, but before the replacement task is returned, still validate and do not appear as errors in the list.
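The "buffer well below the deadline" rule of thumb can be written as a quick check. This sketch assumes the largest queue is roughly "store at least X days" plus "store up to an additional Y days" (BOINC's two work-buffer preferences); the 0.5 safety margin is an arbitrary assumption, and the 6-day MCM1 deadline comes from the post above. Server-side limits on tasks per host are ignored.

```python
def buffer_risks_deadline(min_days: float, extra_days: float,
                          deadline_days: float, margin: float = 0.5) -> bool:
    """Return True if the work buffer could hold tasks that start or finish too late.

    Assumes the queue can grow to about min_days + extra_days of work.
    `margin` is a hypothetical safety factor: keep the buffer below
    margin * deadline so tasks can also finish and report in time.
    """
    return (min_days + extra_days) > margin * deadline_days


print(buffer_risks_deadline(10, 10, 6))   # the 10 + 10 day setup described above: True
print(buffer_risks_deadline(1, 0.5, 6))   # a 1.5-day buffer: False
```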


I was having problems downloading files for the last week or so (hadn't paid much attention). It seemed to happen when they released the ARP1 work units.

Yesterday, I decided to open more xfer channels per project, from the default "2" to "4". ARP1 downloads continued to completion once started, and MCM cleared out pretty quickly. The download transfers have been empty for hours now.

cc_config.xml (inside <cc_config><options> ... </options></cc_config>)
was:
<max_file_xfers_per_project>2</max_file_xfers_per_project>
is now:
<max_file_xfers_per_project>4</max_file_xfers_per_project>
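For anyone applying this change on several machines, here is a minimal sketch that sets `<max_file_xfers_per_project>` inside `<cc_config><options>` while keeping any other options already in the file. Reading and writing the actual file in the BOINC data directory is left out because its location differs by OS.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_xfers_per_project(xml_text: Optional[str], value: int) -> str:
    """Return cc_config.xml text with <max_file_xfers_per_project> set to `value`.

    Other options are kept; <cc_config> and <options> are created if the
    file is empty or missing (xml_text is None or "").
    """
    root = ET.fromstring(xml_text) if xml_text else ET.Element("cc_config")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    node = options.find("max_file_xfers_per_project")
    if node is None:
        node = ET.SubElement(options, "max_file_xfers_per_project")
    node.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_xfers_per_project(None, 4))
```

After saving the file, the client can pick up the change via "Read config files" in the Manager's Options menu or `boinccmd --read_cc_config`. Note that the overall `<max_file_xfers>` limit (default 8) still caps the total across all projects.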
[Nov 12, 2024 5:21:31 AM]
catchercradle
Advanced Cruncher
Joined: Jan 16, 2009
Post Count: 125

Having 16 real cores, I can be downloading quite a few ARP tasks at a time. It would be nice if BOINC could concentrate on the downloads for one task at a time; that could help. I often seem to have four or more tasks downloading with only 6 files left.
(This was submitted as a feature request on GitHub in the past. I'm not sure why it was rejected.)
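The requested behaviour amounts to ordering pending downloads so the task closest to completion is served first. This is a hypothetical sketch of that ordering, not how the BOINC client actually schedules transfers; task and file names are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def order_downloads(pending: List[Tuple[str, str]]) -> List[str]:
    """Order pending (task, file) pairs so whole tasks finish one at a time.

    Tasks with the fewest remaining files go first (ties broken by task
    name), and all files of a task are grouped together, so the limited
    transfer slots complete one task before spreading across others.
    """
    by_task: Dict[str, List[str]] = defaultdict(list)
    for task, fname in pending:
        by_task[task].append(fname)
    tasks = sorted(by_task, key=lambda t: (len(by_task[t]), t))
    return [fname for t in tasks for fname in by_task[t]]


if __name__ == "__main__":
    pending = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("C", "c1"), ("C", "c2"), ("C", "c3")]
    slots = 2  # e.g. the default max_file_xfers_per_project
    print(order_downloads(pending)[:slots])
```

With two transfer slots, task B finishes immediately and task A is next, instead of four tasks each sitting with a few files missing.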
[Nov 12, 2024 9:22:25 AM]
erich56
Senior Cruncher
Austria
Joined: Feb 24, 2007
Post Count: 295


[Quoting rjs5's post above about raising max_file_xfers_per_project in cc_config.xml from the default 2 to 4.]

I even have 8 xfer channels on one of my hosts. Since yesterday I've been trying to download MCM1, and not a single task has come in. Really too bad :(
[Nov 12, 2024 1:12:26 PM]
ericinboston
Senior Cruncher
Joined: Jan 12, 2010
Post Count: 258

If anyone is listening, it's 10am ET and the results from yesterday for all 15 machines are still about 1/3 of what they normally are...so this problem is not fixed yet. :(
----------------------------------------

[Nov 12, 2024 3:04:00 PM]
thunder7
Senior Cruncher
Netherlands
Joined: Mar 6, 2013
Post Count: 232

One wonders how long this fiasco can go on before they recognize that it doesn't work and switch ARP off again, and then hopefully use their new insights to test more, or differently, before switching it on again.
----------------------------------------
[Edit 1 times, last edit by thunder7 at Nov 12, 2024 3:12:05 PM]
[Nov 12, 2024 3:10:58 PM]
Boca Raton Community HS
Advanced Cruncher
Joined: Aug 27, 2021
Post Count: 125

I know this has been mentioned before, but is there any technical reason why the files cannot be combined into one file for download?

If every work unit is ONE file, then:

1. Once a download for a work unit begins because a connection was made, then it SHOULD complete (no more work units that are missing files).
2. About 10x fewer connection requests to the servers allow more work to move freely.
3. More completed tasks = more valids = fewer resends = less traffic.

Is there a technical limitation of the project that prevents this from being a possibility?

I would be THRILLED to download one file, even if it is large and takes a while, versus the fragmented process. Some other projects out there require the download of massive files, and few participants complain about file size.
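To make the single-file idea concrete, here is a minimal sketch that packs all input files of one work unit into one zip archive and unpacks it again. It is only an illustration: it assumes the server would build such an archive and that the client or science app would unpack it, neither of which is confirmed for WCG, and the compression step costs server CPU time.

```python
import io
import zipfile
from typing import Dict


def bundle_work_unit(files: Dict[str, bytes]) -> bytes:
    """Pack all input files of one work unit into a single zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def unbundle(archive: bytes) -> Dict[str, bytes]:
    """Recover the original files from the archive on the client side."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


if __name__ == "__main__":
    # Hypothetical file names and contents.
    wu_files = {"input_01.dat": b"x" * 1000, "input_02.dat": b"y" * 1000}
    archive = bundle_work_unit(wu_files)
    print(len(wu_files), "files ->", len(archive), "bytes in one download")
```

One transfer per work unit means the download either completes as a whole or is retried as a whole, so there are no half-downloaded tasks.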
[Nov 12, 2024 7:26:13 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320

[Quoting thunder7's post above about switching ARP off.]

Real work is done in spite of the hassle:
Statistics date   Total run time    Points generated   Results returned
11/11/2024        7:168:11:08:46    17.537.130         4.236
11/10/2024        7:017:18:27:06    16.901.101         4.063
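The day-over-day change in the table above can be computed directly. This small sketch uses only the points and results columns; the run-time column is left out because converting its format to a single unit would require an assumption about how WCG counts a year.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Figures copied from the statistics table above.
points = {"11/10/2024": 16_901_101, "11/11/2024": 17_537_130}
results = {"11/10/2024": 4_063, "11/11/2024": 4_236}

print(f"points:  {pct_change(points['11/10/2024'], points['11/11/2024']):+.1f}%")   # +3.8%
print(f"results: {pct_change(results['11/10/2024'], results['11/11/2024']):+.1f}%")  # +4.3%
```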

[Nov 12, 2024 10:02:12 PM]
ericinboston
Senior Cruncher
Joined: Jan 12, 2010
Post Count: 258

[Quoting Boca Raton Community HS's post above asking whether each work unit's files could be combined into a single download.]


My guess the challenges are:

1) The server would need to combine/zip/rar all your tasks into a bundle, requiring more (possibly a lot more) CPU time across thousands of users.

2) Possibly file size, and hence where to store "the bundle", which essentially means 2x the storage WCG needs.

3) Some WCG projects' bundles could be massive.


One answer to #1 is for the server to simply give every client a "bundle" of X WUs. But what is X? I don't think WCG wants to keep Small, Medium, and Large bundles all over their storage and constantly try to predict when the bundles are running dry, with some users then not getting bundles for a few days.

Ideally, as I've mentioned before, Krembil should have just rewritten this relatively simple client/server app in modern code and plopped it on a modern cloud environment. Heck, the darn website takes 10 seconds to load ANY non-stats page, while the stats pages take far longer! I bet this website gets about 1000 visits an hour on any given day, which is tiny.

There is no reason stats pages should take 20+ seconds to load in 2004, let alone 2024. The stats areas of this site should either 1) be rewritten to show stats as of 24 hours ago, so that the stats are very simple db lookups into static summary table entries, or 2) have the databases moved to far faster CPUs, storage, and/or RAM, possibly along with a better db platform and/or better table and db design. I sense the db is on IBM DB2 (which has a world of drawbacks compared to other dbs), was simply migrated from one OS to a new OS, AND is on low-performance hardware. I'm all ears to hear more details about the technology behind WCG if someone has them.
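Option 1) above, a once-a-day summary table, can be sketched in a few lines. This uses SQLite purely as a stand-in (not DB2), and the table and column names are hypothetical; the point is that the expensive aggregation runs once per day, while each stats page load becomes a primary-key lookup.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with a raw results table and a daily summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE results (member TEXT, day TEXT, points INTEGER);
        CREATE TABLE daily_summary (
            member TEXT, day TEXT, points INTEGER, results INTEGER,
            PRIMARY KEY (member, day)
        );
    """)
    return conn


def refresh_summary(conn: sqlite3.Connection, day: str) -> None:
    """Batch job, run once per day: aggregate that day's raw results."""
    conn.execute("DELETE FROM daily_summary WHERE day = ?", (day,))
    conn.execute("""
        INSERT INTO daily_summary (member, day, points, results)
        SELECT member, day, SUM(points), COUNT(*) FROM results
        WHERE day = ? GROUP BY member, day
    """, (day,))
    conn.commit()


def member_stats(conn: sqlite3.Connection, member: str) -> List[Tuple[str, int, int]]:
    """What a stats page would run: a cheap lookup, no aggregation."""
    return conn.execute(
        "SELECT day, points, results FROM daily_summary WHERE member = ? ORDER BY day",
        (member,),
    ).fetchall()
```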
----------------------------------------
[Edit 3 times, last edit by ericinboston at Nov 13, 2024 12:22:31 AM]
[Nov 13, 2024 12:12:00 AM]