World Community Grid Forums
Category: Completed Research Forum: Microbiome Immunity Project Thread: Something's broken in Rosetta |
Thread Status: Active Total posts in this thread: 9
jetkins
Cruncher US Joined: Nov 9, 2006 Post Count: 7 Status: Offline Project Badges: |
Hi, folks.
I run several instances of WCG, including a couple which I share with Charity Engine. Recently, my primary workstation has encountered several 0xc0000005 errors when running Rosetta work units from Charity Engine. I notified their support folks, and they have temporarily disabled Rosetta work units while they investigate the problem.

The reason I'm mentioning it here is because my work laptop, which is running the WCG client and is exclusively dedicated to WCG projects, this afternoon encountered an identical 0xc0000005 error while processing an MIP work unit, and looking at the application log, I note that the failing process was wcgrid_mip1_rosetta_intel86.

Methinks this is more than a coincidence, and I have subsequently disabled MIP in my list of WCG projects. I love providing resources for stuff like this, but not at the expense of possible impact to my systems' stability. Once Rosetta fixes their issues, I'll enable MIP, but not before. |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12120 Status: Offline Project Badges: |
There are plenty more projects on WCG, especially now that scc is back.
Mike |
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 447 Status: Offline Project Badges: |
Just for the record MIP runs on Rosetta software.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Actually, something is completely nuts with MIP. A volunteer observed that one MIP job came with a 97 MEGABYTE file, and I thought it might have been some database file used by multiple MIP tasks. Well, I had suspended a task and forgotten about it, so this morning my queue was empty except for HST and ARP, which I only allow one at a time. After I resumed the suspended task, six MIP tasks came down, each with an input file of between 30 and 71 MEGABYTES. That much for just 2 hours or less of crunching. NUTS. Removed from the job selection.
|
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges: |
"Actually, something is completely nuts with MIP."

Works for me. I have a fast Internet connection, and file size is never a problem. More to the point, I never run more than two MIP at a time, along with the usual selection of other WCG work. And you are right, there is some interaction between MIP and Rosetta, though I have never seen errors on either. But I generally avoid running the two together, since there is some slowdown there also (but not as bad as running more than two MIP).

EDIT: You may have had memory problems due to the recent large Rosettas. The developmental ones (as for COVID-19) can run up to about 3 GB. They then trend downward to a few hundred MB.
[Edit 1 times, last edit by Jim1348 at Mar 12, 2020 2:35:13 PM] |
||
|
peugeotforever
Cruncher Joined: Oct 30, 2014 Post Count: 7 Status: Offline Project Badges: |
I've also had some exception errors in the past. They stopped once I lowered the maximum memory that BOINC is allowed to use.

Once I had a 250 MB upload for mip1. It was insane, but hopefully necessary. That matters to me because I have data-usage issues, and I have noticed the following about downloads:

First, the BOINC client downloads the project files, executables and so on.
Next, it downloads a task whose name contains two numbers: mip1_x_y.
Then, for the first number (x), it downloads some files that are shared by several y tasks. (They used to be about 30 MB at most, but over time they have grown to around 90 MB.)
Finally, it downloads some small files for the second number (y).

When the client downloads multiple tasks to fill its task queue, for example three tasks, getting
mip1_1_6 mip1_2_3 mip1_3_4
creates more traffic than getting
mip1_2_1 mip1_2_2 mip1_2_4
because in the second case x stays the same and only y varies on the client.

I know that more y numbers could be sent to the same client: the tasks and file numbers exist, because I have noticed them going to different computers at almost the same time. Instead, the client gets some other x number, which creates another 90 MB of traffic.

I wonder whether data traffic could be limited if the server preferred to keep x the same for a given client and to change only y, as far as possible, when that client requests a subsequent task.
[Edit 1 times, last edit by peugeotforever at Mar 14, 2020 11:36:41 AM] |
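To put rough numbers on the comparison in this post, here is a minimal Python sketch. It assumes that the large files depend only on x and cost about 90 MB per distinct x, and that the small y files cost 1 MB per task. The 1 MB figure and the function name estimate_traffic_mb are made up for illustration; none of this is taken from the WCG or BOINC code.

```python
def estimate_traffic_mb(task_names, big_file_mb=90, small_file_mb=1):
    """Estimate download traffic for a batch of tasks named mip1_x_y.

    Assumption: the large files depend only on x and are fetched once per
    distinct x; the small files are fetched once per task.
    """
    distinct_x = set()
    for name in task_names:
        # "mip1_2_3" -> prefix "mip1", x "2", y "3"
        _prefix, x, _y = name.split("_")[:3]
        distinct_x.add(x)
    return len(distinct_x) * big_file_mb + len(task_names) * small_file_mb


mixed = ["mip1_1_6", "mip1_2_3", "mip1_3_4"]
same_x = ["mip1_2_1", "mip1_2_2", "mip1_2_4"]
print(estimate_traffic_mb(mixed))   # 3 * 90 + 3 * 1 = 273
print(estimate_traffic_mb(same_x))  # 1 * 90 + 3 * 1 = 93
```

Under these assumptions, the queue with three different x values costs roughly three times as much traffic as the queue that shares one x.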
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Files are only downloaded once. If a WU can use an existing file, it will not be downloaded again. There have been times when I have received X WUs but no files were downloaded. That's because they were already on my machine.
|
||
|
peugeotforever
Cruncher Joined: Oct 30, 2014 Post Count: 7 Status: Offline Project Badges: |
On the server side there is (I think) no optimization that decides which WU to send based on the files that are already downloaded (or being downloaded) on the client.

I'll write the task names as mip1_x_y_z. The big project files are tied to the number x. For example, take four WUs (x from 1 to 2, y from 1 to 2) to be completed on two different computers. Forcing a distribution like

computer1: mip1_1_1 mip1_1_2
computer2: mip1_2_1 mip1_2_2

is better than (randomly?) distributing

computer1: mip1_1_1 mip1_2_1
computer2: mip1_1_2 mip1_2_2

In the first case, the big files for each x are downloaded only once and then reused on the same client. In the second case, because of the way the work is distributed, the big files for each x are downloaded twice (once per client).
[Edit 1 times, last edit by peugeotforever at Mar 14, 2020 3:19:34 PM] |
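The two distributions in this post can be compared with a short Python sketch. It is only an illustration of the idea of grouping work by x; the function names assign_grouped_by_x and count_big_downloads are hypothetical, and nothing here reflects how the WCG scheduler actually assigns work.

```python
from collections import defaultdict


def count_big_downloads(assignment):
    """Count big-file downloads: one per distinct (client, x) pair."""
    pairs = set()
    for client, tasks in assignment.items():
        for name in tasks:
            x = name.split("_")[1]
            pairs.add((client, x))
    return len(pairs)


def assign_grouped_by_x(task_names, clients):
    """Give all tasks that share an x to one client, cycling clients per x."""
    by_x = defaultdict(list)
    for name in task_names:
        by_x[name.split("_")[1]].append(name)
    assignment = {client: [] for client in clients}
    for i, x in enumerate(sorted(by_x)):
        assignment[clients[i % len(clients)]].extend(by_x[x])
    return assignment


tasks = ["mip1_1_1", "mip1_1_2", "mip1_2_1", "mip1_2_2"]
clients = ["computer1", "computer2"]

grouped = assign_grouped_by_x(tasks, clients)
interleaved = {
    "computer1": ["mip1_1_1", "mip1_2_1"],
    "computer2": ["mip1_1_2", "mip1_2_2"],
}
print(count_big_downloads(grouped))      # 2
print(count_big_downloads(interleaved))  # 4
```

With the grouped assignment each set of big files is fetched once in total; with the interleaved one, both clients fetch both sets.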
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges: |
"But I generally avoid running the two together, since there is some slowdown there also (but not as bad as running more than two MIP)."

I think I can update my numbers a bit. On my Ryzen 1700 and 2700, I can do three MIP at a time with no slowdown. On my Ryzen 3700x, it looks like at least four; I have not tried more. My original numbers came from my Haswells. And maybe they have changed the work units (or data) a bit. |
||