Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 1828 times and has 6 replies
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Status: Offline
3 CEP tasks were orphaned -- Linked to another problem?

Greetings..

Everything was working fine - then I went on vacation.

( Have you heard this one before?)

I am running Linux - and when I came back and did updates,
BOINC could not see the GPU card.
After 4 days of messing around I rebuilt (twice) the system with a different flavor of Ubuntu and started over with a clean slate.

Now BOINC recognizes the GPU, but other problems popped up.

I changed the default number of CEP programs from 1 to 2 and all was well for a while.

At the same time I started getting BOINC Manager timeouts when it was communicating with the BOINC client. For example, going from the Projects page to the Tasks page did not happen for a while, and I got the 'Communicating with Boinc - Please wait.' message. It took about 20 seconds to come up.

From a Linux prompt, I ran 'ps -ef', which showed 6 tasks with CEP in their names that did not show up on the Tasks page.
I suspended all tasks (the BOINC client should close all WU tasks as well) -
the 6 tasks were still there - not running - call them orphans.
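
For anyone who wants to repeat that check without reading the full 'ps -ef' listing by eye, here is a minimal sketch. It assumes the standard Linux 'ps -ef' column layout (UID PID PPID C STIME TTY TIME CMD); the function name and the "cep" pattern are illustrative choices, not anything provided by BOINC.

```python
import re


def find_matching_processes(ps_output, pattern):
    """Return (pid, command) pairs from `ps -ef` text whose command matches pattern.

    Assumes the usual Linux layout: UID PID PPID C STIME TTY TIME CMD.
    """
    matches = []
    lines = ps_output.splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most 8 fields so the command keeps its own spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, cmd = fields[1], fields[7]
        # Case-insensitive search; "cep" is only a guess at the task naming.
        if re.search(pattern, cmd, re.IGNORECASE):
            matches.append((int(pid), cmd))
    return matches
```

To use it on a live system, feed it the text returned by `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` and compare the result with what the Tasks page shows. A short pattern like "cep" can also match unrelated commands, so the hits still need a quick look.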

A reboot cleared matters - there were no CEP tasks running.

What is the probability that the two problems are related?

Note that 5 of 6 CEP tasks completed with error in the last day (29 June 2015).

configuration stuff:
----------------------
BOINC version 7.6.2 ( from the BOINC PPA)
wxwidgets version 3.0.2

The CPU has 8 cores. I'm running 7 CPU tasks and have 1 core reserved for GPU loading and unloading. The GPU is running 1 task. I may be running System Monitor, but usually no other tasks are running.

Memory - 11.6 GiB of memory - running the 7 tasks takes only 8.9%

I had seen the temporary hang on all 3 linux systems:
Linux Mint
Ubuntu Mate 15.04
Ubuntu 14.04 (Gnome)

Another factor that may affect this...
I had attached to 7 projects - 2 were GPU only:
DENIS@Home
FIND@Home
World Community Grid
Pogs
Malaria Control
Einstein (GPU)
SETI (GPU)

Another factor that may affect this...
I live in Florida, and thunderstorms and lightning have caused resets of my internet connection 3 or 4 times today.

After the CEP WU have finished, I can not recreate the timeout problem.

Question: Is this really a problem, or just something to live with while crunching CEP WU, or something else?

I plan on doing full memory diagnostics when the wu finish.

Thanks in advance to the many helpers, crunchers and staff....
Jay
----------------------------------------

[Jun 30, 2015 12:27:18 AM]
Yarensc
Advanced Cruncher
USA
Joined: Sep 24, 2011
Post Count: 136
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

The tasks being left in memory could have been caused by one of two things: either you have "Leave Applications in Memory While Suspended" checked in your profile, or the work units hadn't reached their first checkpoint, in which case they stay in memory anyway. If either of these is the case, then I'd say the two problems probably aren't related.

Do you know if the rest of the system was hung when you were experiencing these lags? You might have had one or more CEP tasks starting up or checkpointing, which could have caused enough disk activity to create your 20-second hang. A related question: is it always that unresponsive when you're navigating around BOINC Manager, or just occasionally?

You could try using a different manager such as BOINCTasks.
[Jun 30, 2015 2:00:11 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

Thanks for the answers!

I didn't think of the "Leave Applications in Memory While Suspended" setting because
I usually disable it. BUT, I had done a fresh install and had not set up the venue
or read in my settings...

The rest of the system was not hung while the manager was waiting on the client.

When? that's the answer that drives me crazy: Sometimes.
Most of the time I can zip through the tabs with no wait.
I remember doing a project remove that kicked it off (waiting).
It froze and took about 3 or 4 minutes to clear up.

I tried to recreate that just now - get a project and then remove it.

BOINC Manager hung on getting the project after the name and password were entered.

Tue Jun 30 11:54:32 EDT 2015
j$ date
Tue Jun 30 11:57:24 EDT 2015
Then an error msg said it could not communicate with the project.
(I can call up google in less than 1/2 second.)

Tried again. Success. Took about 5 seconds to respond and load the new project (denis@home)

Now, try to reset and remove...
Hung on reset...
date
Tue Jun 30 12:18:26 EDT 2015
to
$ date
Tue Jun 30 12:19:03 EDT 2015 -- about 45 seconds - it took me a while to issue the first 'date'

and to remove...
No hang. took about 1 second.
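
As a worked check on the timings above, here is a small sketch that computes the elapsed time between two pasted 'date' outputs. It assumes the default 'date' format shown in this post and that both stamps come from the same time zone; the zone token is simply dropped, and the function names are made up for illustration.

```python
from datetime import datetime


def parse_date_output(text):
    """Parse default `date` output such as 'Tue Jun 30 12:18:26 EDT 2015'.

    The weekday and time-zone tokens are ignored, so both stamps being
    compared must come from the same zone.
    """
    parts = text.split()
    # parts: weekday, month, day, time, zone, year
    stamp = " ".join(parts[1:4] + parts[5:6])
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y")


def elapsed_seconds(start, end):
    """Seconds between two `date` output strings."""
    return (parse_date_output(end) - parse_date_output(start)).total_seconds()


# Attach attempt, then project reset, using the stamps pasted above.
print(elapsed_seconds("Tue Jun 30 11:54:32 EDT 2015", "Tue Jun 30 11:57:24 EDT 2015"))  # 172.0
print(elapsed_seconds("Tue Jun 30 12:18:26 EDT 2015", "Tue Jun 30 12:19:03 EDT 2015"))  # 37.0
```

The 37 seconds for the reset is a lower bound, since the first 'date' was issued late; that fits the "about 45 seconds" estimate.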

This is not related to CEP2 - they were not downloaded
Running: 3 Ebola, 1 Genome Mystery, 1 pogs, 1 Einstein (GPU)

Still interesting.
I'll go to the BOINC forum..

Thanks again,
Jay
----------------------------------------

[Jun 30, 2015 4:28:42 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

Update..
Set number of tasks to 1; still errors
Ran memory test. No errors.
Should I run disk tests?
(Also, still getting the communication wait messages that clear up in
20 seconds..)

Any suggestions for debug?

Thanks,
Jay
----------------------------------------

[Jul 1, 2015 12:38:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

There's some discussion over at devs on the responsiveness of BOINC Manager (BM) when large tasks with many files are involved. CEP2 has 6700 or so files per job; picture the effort to set up, close, or resume one. I keep an exclusive partition and periodically defrag it so that the slot and job files go to contiguous space. Of course, there's different logic for RAM drives and SSDs. I read that the 4.2 Linux kernel will have improved handling of the latter.
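
To see how many files a running job actually keeps in its slot, a rough sketch like the following could be used. The helper names are hypothetical, and the slots path mentioned afterwards is an assumption based on the usual Ubuntu package layout, so adjust it for your install.

```python
import os


def count_files(root):
    """Count non-directory entries under root, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def files_per_slot(slots_dir):
    """Map each subdirectory of slots_dir to the number of files it holds."""
    counts = {}
    for entry in sorted(os.listdir(slots_dir)):
        path = os.path.join(slots_dir, entry)
        if os.path.isdir(path):
            counts[entry] = count_files(path)
    return counts
```

On an Ubuntu package install the call would probably be `files_per_slot("/var/lib/boinc-client/slots")`, likely run with enough permissions to read the BOINC data directory. A slot showing thousands of files would be consistent with the setup and resume cost described above.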
[Jul 1, 2015 1:59:54 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

Thanks to all for the insights.
Jay
----------------------------------------

[Jul 3, 2015 8:03:43 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Status: Offline
Re: 3 CEP tasks were orphaned -- Linked to another problem?

Was linked to a problem with a loose cable.
Sorry,
Jay
----------------------------------------

[Jul 8, 2015 3:35:38 PM]