World Community Grid Forums

Thread Status: Active · Total posts in this thread: 18

Luke081515
Cruncher · Joined: Apr 16, 2017 · Post Count: 8
Some weeks ago, I installed BOINC on my server. According to World Community Grid, this server normally returns between 9 and 12 days of CPU time per day. One day the server crashed. BOINC restarted normally, but since then it has behaved strangely. The server now returns results with only about 6 minutes of runtime, and several tasks take far longer than normal (tasks that used to take about 2 hours now take 10 hours, and some show an expected runtime of 80 days or more). I have already tried limiting projects, aborting tasks with estimated runtimes of 80 days or more, and uninstalling and reinstalling BOINC (including apt-get purge), but the behaviour persists. BOINC uses the same amount of resources as before but returns fewer results. It starts several tasks, pauses them, and starts processing others; it even pauses some tasks that are already at 100% and does not start uploading them. Can somebody help me with this?

katoda
Senior Cruncher · Poland · Joined: Apr 28, 2007 · Post Count: 172
First I would try resetting the local settings and, if that doesn't help, resetting the entire project. Clearly something went wrong with one of the BOINC files, so it is better to refresh them from scratch.

Luke081515
Cruncher · Joined: Apr 16, 2017 · Post Count: 8
That is what I tried: I cleared all settings with apt-get purge for BOINC, installed it again, and it pulled the data from World Community Grid again. It uses the default local settings (except that I set CPU usage to about 70%, as before), but the instance still behaves the same way. My other clients (another server and 2 laptops) run fine with my configuration, and the second server is only a quarter the size of the one I'm having problems with. I opted out of beta testing too, but I still have these problems...

katoda
Senior Cruncher · Poland · Joined: Apr 28, 2007 · Post Count: 172
I have never tried uninstalling BOINC on Linux, so my question is: does the uninstallation remove the local config files as well? Did you have to register again (enter the project name, user and password) after reinstalling the client?

Luke081515
Cruncher · Joined: Apr 16, 2017 · Post Count: 8
Yes, apt-get purge removes all configuration; I had to add the project, the data, the config etc. again.

Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0
On which projects are you seeing this behavior? If you are running the FAH Phase 1 project, it has some work units that are VERY QUICK and some that run a lot longer (RIGID vs NON-RIGID). This mix has been known to "mess up" the BOINC scheduler, which can take a few days to stabilize.
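To illustrate why a mix of very quick and much longer work units can throw off runtime estimates, here is a small arithmetic sketch in Python. The runtimes are hypothetical numbers, not measured FAH1 data, and a plain average is an assumed stand-in for whatever estimator the scheduler actually uses; the point is only that one shared estimate fits neither kind of unit.

```python
from statistics import mean

# Hypothetical runtimes in hours: many quick ("rigid") units and a few long ("non-rigid") ones.
quick_units = [0.1] * 8
long_units = [6.0] * 2


def relative_error(estimate: float, actual: float) -> float:
    """Ratio of estimated to actual runtime (1.0 would be a perfect estimate)."""
    return estimate / actual


# One estimate shared by both kinds of unit, taken as the plain mean (an assumption).
estimate = mean(quick_units + long_units)

print(f"shared estimate: {estimate:.2f} h")
print(f"quick unit: estimate is {relative_error(estimate, 0.1):.1f}x its real runtime")
print(f"long unit: estimate is {relative_error(estimate, 6.0):.2f}x its real runtime")
```

With these numbers the quick units look more than ten times longer than they are, while the long units look about five times shorter, so the client's queue planning is off in both directions until the estimates settle.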

Luke081515
Cruncher · Joined: Apr 16, 2017 · Post Count: 8
Before this happened, the server was mainly working on Mapping Cancer Markers; those tasks took 2-4 hours at most. I'm opted in to all currently available projects, and I'm seeing these long runtimes on all of them, as you can see in the screenshot. It shows one task that has completed but is not being uploaded, as well as the runtimes of the other tasks. (The interface is in German, so I marked the column with the remaining runtime.)
I uploaded the screenshot here: https://www.dropbox.com/s/8wxpwmuwjsupx0g/BOINC.PNG?dl=0

Col323
Senior Cruncher · Joined: Nov 4, 2008 · Post Count: 372
As Doneske said, perhaps a small FAH1 unit slipped in there and messed with your estimated completion times. I also see Outsmart Ebola Together work in your queue. In my experience, those can vary in duration even more than FAH1 work, which may also have affected your estimated completion times.
I say let it run a few days and it should sort itself out. (Until the next batch of tiny work. ;-) )

Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0
Regardless of the estimated runtime, if the machine is behaving normally, the tasks should finish in the same amount of time as before. If tasks are using more elapsed time than before, a background process may be interfering with the science tasks. Another possibility is that whatever caused the "server crash" is still present and is causing issues with your work units; hardware problems would be my guess. Do you know why the server crashed?
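One way to test the background-process theory is to compare each task's CPU time with its elapsed (wall-clock) time: a task that gets a full core to itself should show a ratio close to 1, while a task competing with other processes shows a much lower one. The Python sketch below assumes you copy the two values per task by hand, for example from the task properties in BOINC Manager; the task names, sample values and the 0.8 threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    name: str             # hypothetical task label
    cpu_hours: float      # CPU time the task has consumed
    elapsed_hours: float  # wall-clock time the task has been running


def cpu_efficiency(task: TaskTimes) -> float:
    """Fraction of elapsed time the task actually spent running on a CPU."""
    if task.elapsed_hours <= 0:
        raise ValueError("elapsed time must be positive")
    return task.cpu_hours / task.elapsed_hours


def starved_tasks(tasks: list[TaskTimes], threshold: float = 0.8) -> list[str]:
    """Names of tasks whose CPU/elapsed ratio falls below the (arbitrary) threshold."""
    return [t.name for t in tasks if cpu_efficiency(t) < threshold]


# Hypothetical sample values, not taken from the thread.
sample = [
    TaskTimes("task_a", cpu_hours=1.9, elapsed_hours=2.0),
    TaskTimes("task_b", cpu_hours=2.1, elapsed_hours=10.0),
]
print(starved_tasks(sample))
```

Here `task_b` is flagged because it received only about a fifth of a core over its 10 hours. Note that a CPU-usage limit below 100% in the BOINC preferences may also lower this ratio, so a low value is a hint to look at other processes and hardware, not proof of interference.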

SekeRob
Master Cruncher · Joined: Jan 7, 2013 · Post Count: 2741
Quoting Col323: "As Doneske said, perhaps a small FAH1 unit slipped in there and messed with your estimated completion times. I also see Outsmart Ebola Together work in your queue. In my experience, those can vary in duration even more than FAH1, which may also have impacted your estimated completion times. I say let it run a few days and it should sort itself. (Until the next batch of tiny work. ;-) )"
As the DCF (duration correction factor) is locked to 1.000000 by WCG on standard clients, the client does not adjust its runtime estimates to actual throughput, so whatever messing up happens is server-driven. The lag between work generation and the point where the fpops estimates are slotted in, combined with the current average runtime per science being used as the basis for setting those fpops, makes for chaos on any science whose runtimes vary widely. HST1 is no stranger to the issue either.
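As a rough illustration of the point about DCF, a BOINC client's initial runtime estimate for a task is essentially the work unit's estimated floating-point operations (`rsc_fpops_est`) divided by the speed the client assumes for the application, scaled by the duration correction factor. The Python sketch below is a simplified model, not the actual client code, and the FLOPS and fpops values (including the 960x inflation) are hypothetical. It shows that with DCF pinned at 1.0, an inflated server-side fpops estimate passes straight through to the displayed runtime.

```python
SECONDS_PER_DAY = 86_400


def estimated_runtime_seconds(rsc_fpops_est: float, app_flops: float, dcf: float = 1.0) -> float:
    """Simplified initial estimate: estimated FLOPs / assumed FLOPS, scaled by the DCF."""
    return rsc_fpops_est / app_flops * dcf


# Hypothetical values: an app assumed to run at 4 GFLOPS on one core.
app_flops = 4e9
typical_fpops = app_flops * 2 * 3600   # sized for a ~2 hour task
inflated_fpops = typical_fpops * 960   # hypothetically skewed server-side estimate

for label, fpops in [("typical", typical_fpops), ("inflated", inflated_fpops)]:
    secs = estimated_runtime_seconds(fpops, app_flops)  # dcf stays at 1.0, as on WCG
    print(f"{label}: {secs / 3600:.1f} h ({secs / SECONDS_PER_DAY:.1f} days)")
```

Because the client never corrects the factor, a task whose fpops were set too high keeps showing a very long estimate (80 days in this made-up case) even though it may finish in the usual couple of hours.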