World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
Greetings..

Everything was working fine, then I went on vacation. (Have you heard this one before?)

I am running Linux, and when I came back and did updates, BOINC could not see the GPU card. After 4 days of messing around I rebuilt the system (twice) with a different flavor of Ubuntu and started over with a clean slate. Now BOINC recognizes the GPU, but other problems popped up.

I changed the default number of CEP programs from 1 to 2 and all was well for a while. At the same time I started getting BOINC Manager timeouts when it was communicating with the BOINC client. For example, going from the Projects page to the Tasks page did not happen for a while, and I got the 'Communicating with Boinc - Please wait.' message. It took about 20 seconds to come up.

From a Linux prompt, I ran 'ps -ef', which showed 6 tasks with CEP in their names that did not show up on the Tasks page. I suspended all tasks (the BOINC client should close all WU tasks as well), but the 6 tasks were still there, not running. Call them orphans. A reboot cleared matters: there were no CEP tasks running.

What are the probabilities that the two problems are related? Note that 5 of 6 CEP tasks completed with an error in the last day (29 June 2015).

Configuration stuff:
----------------------
BOINC version 7.6.2 (from the BOINC PPA)
wxWidgets version 3.0.2
The CPU has 8 cores. I'm running 7 CPU tasks and have 1 core reserved for GPU loading and unloading. The GPU is running 1 task.
System Monitor may be running, but usually no other tasks are running.
Memory: 11.6 GiB. Running the 7 tasks takes only 8.9%.

I had seen the temporary hang on all 3 Linux systems:
Linux Mint
Ubuntu MATE 15.04
Ubuntu 14.04 (Gnome)

Another factor that may have an effect: I had attached to 7 projects, 2 of them GPU only:
DENIS@Home
FIND@Home
World Community Grid
Pogs
Malaria Control
Einstein (GPU)
SETI (GPU)

Another factor that may have an effect: I live in Florida, and thunderstorms and lightning have caused resets of the internet 3 or 4 times today.

After the CEP WUs finished, I could not recreate the timeout problem.

Question: Is this really a problem, or just something to live with while crunching CEP WUs, or something else?

I plan on doing full memory diagnostics when the WUs finish.

Thanks in advance to the many helpers, crunchers and staff....

Jay
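For anyone who wants to repeat the orphan check described above, here is a minimal Python sketch. It assumes the usual 8-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) and does a crude case-insensitive substring match on the command column. The function names, the sample listing, and the process paths in it are hypothetical, made up for illustration; they are not taken from the actual system.

```python
import subprocess

# Hypothetical sample of `ps -ef` output, for illustration only.
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
boinc     1201     1  0 09:00 ?        00:00:05 /usr/bin/boinc --dir /var/lib/boinc-client
boinc     1350  1201 99 09:01 ?        01:10:00 ../../projects/example/wcgrid_cep2_example
jay       2001  1999  0 10:00 pts/0    00:00:00 ps -ef
"""


def find_processes(ps_output, needle="cep"):
    """Return (pid, command) pairs whose command contains `needle`.

    Crude substring match: it can also hit unrelated commands that
    happen to contain the same letters.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # CMD is the 8th column and may contain spaces
        if len(fields) < 8:
            continue
        pid, cmd = int(fields[1]), fields[7]
        if needle.lower() in cmd.lower():
            matches.append((pid, cmd))
    return matches


def current_ps_output():
    """Run `ps -ef` on the local machine and return its text output."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for pid, cmd in find_processes(SAMPLE_PS_OUTPUT):
        print(pid, cmd)
```

On a live system, `find_processes(current_ps_output())` gives the candidates to compare against the Tasks page. Anything listed here but missing there, after all tasks are suspended, would be a candidate orphan.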
Yarensc
Advanced Cruncher USA Joined: Sep 24, 2011 Post Count: 136 Status: Offline Project Badges:
The tasks being left in memory could have come from two things: either you have "Leave Applications in Memory While Suspended" checked in your profile, or the work units hadn't reached their first checkpoint, in which case they stay in memory anyway. If either of these is the case, then I'd say the two problems probably aren't related.

Do you know if the rest of the system was hung when you were experiencing these lags? You might have had one or more CEP tasks starting up or checkpointing, which could have caused enough disk activity to create your 20-second hang. A related question: is it always that unresponsive when you're navigating around BOINC Manager, or just occasionally? You could try using a different manager such as BOINCTasks.
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
Thanks for the answers!

I didn't think of "Leave Applications in Memory While Suspended" because I usually disable it. BUT I had done a fresh install and had not set up the venue or read in my settings...

The rest of the system was not hung while the manager was waiting on the client.

When? That's the answer that drives me crazy: sometimes. Most of the time I can zip through the tabs with no wait. I remember doing a project remove that kicked it off (waiting). It froze and took about 3 or 4 minutes to clear up.

I tried to recreate that just now: get a project and then remove it. BOINC Manager hung on getting the project after the name and password were entered:

Tue Jun 30 11:54:32 EDT 2015
j$ date
Tue Jun 30 11:57:24 EDT 2015

Then an error message said it could not communicate with the project. (I can call up Google in less than 1/2 second.)

Tried again. Success. It took about 5 seconds to respond and load the new project (denis@home).

Now, try to reset and remove...

Hung on reset:

date
Tue Jun 30 12:18:26 EDT 2015
to
$ date
Tue Jun 30 12:19:03 EDT 2015

About 45 seconds, since it took me a while to issue the first 'date'.

And to remove... No hang. It took about 1 second.

This is not related to CEP2: none were downloaded.
Running: 3 Ebola, 1 Genome Mystery, 1 pogs, 1 Einstein (GPU)

Still interesting. I'll go to the BOINC forum.

Thanks again,
Jay
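The hang times above were measured by running 'date' before and after. As a rough sketch of the arithmetic, the Python below turns two such lines into an elapsed time. It assumes the default six-field `date` output format shown in the post and that both stamps share the same time zone (the zone field is simply ignored). The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_date_output(line):
    """Parse a line such as 'Tue Jun 30 11:54:32 EDT 2015'.

    The weekday and time zone fields are dropped, so both stamps
    must come from the same zone for the difference to be meaningful.
    """
    _weekday, month, day, clock, _zone, year = line.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def elapsed(start_line, end_line):
    """Return the time between two `date` output lines as a timedelta."""
    return parse_date_output(end_line) - parse_date_output(start_line)


if __name__ == "__main__":
    # Stamps copied from the post above.
    attach = elapsed("Tue Jun 30 11:54:32 EDT 2015", "Tue Jun 30 11:57:24 EDT 2015")
    reset = elapsed("Tue Jun 30 12:18:26 EDT 2015", "Tue Jun 30 12:19:03 EDT 2015")
    print("attach hang:", attach)
    print("reset hang:", reset)
```

With the stamps from the post, the attach hang comes to 2 minutes 52 seconds, and the reset stamps are 37 seconds apart, a little under the estimated 45 seconds because the first 'date' was issued late.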
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
Update..

Set the number of tasks to 1; still errors.
Ran a memory test. No errors.
Should I run disk tests?
(Also, still getting the communication wait messages that clear up in 20 seconds..)

Any suggestions for debugging?

Thanks,
Jay
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There's some discussion over at devs on the responsiveness of the BM when large tasks or many files are involved. CEP2 has 6700 or so files per job; picture the effort to set up, close, or resume. I keep an exclusive partition and periodically defrag it so that the slot and job files go to contiguous space. Of course, there's different logic for RAM drives and SSDs. I read that the 4.2 Linux kernel will have improved handling of the latter.
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
Thanks to all for the insights.

Jay
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
It was linked to a problem with a loose cable.

Sorry,
Jay