Total posts in this thread: 43
Posts: 43   Pages: 5
This topic has been viewed 16095 times and has 42 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: very short tasks

Not to worry, I moved them over to other projects for now.
[Apr 1, 2012 3:49:33 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: very short tasks

All my Linux machines have run dry - only getting WUs on Windows machines at the moment.

Are you sure that your Linux machines are really running dry?
Over the last few days, I have several times experienced a "system upside-down" state on machines crunching HCMD2 only.
  • BOINC Manager no longer shows any tasks (as if the queue were empty)
  • No task is running (CPU load is very low)
  • After rebooting the machine, I noticed the following:
    • The task queue is full
    • The machine starts computing again (CPU load 100%)
    • But one or two tasks show progress far beyond 100% (e.g. over 2'000%)
    • After manually aborting the suspicious tasks, everything runs perfectly again

There could be two reasons for this problem, which occurred four times on three different hosts during the last two days:
  • BOINC may have trouble managing a very big task queue (around 2'000 tasks for a 6-core machine with a 1-day buffer).
  • A few HCMD2 tasks could be faulty and could cause incorrect BOINC behaviour.

I would be interested to know whether other members are experiencing similar trouble.
For my part, I support this very particular phase of HCMD2, since I am confident that the WCG techs are not producing work for fun.
Enjoy,
Yves
----------------------------------------
[Apr 3, 2012 7:54:10 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
HCMD2: very short tasks

There's a button for that problem... BOINC spending more time on managing the task queue and refreshing the screen than on the tasks... Hit the "Show active tasks" button in the left margin. It is not supposed to be a problem as long as the BM is minimized to the system tray [not minimized to the start bar], as then no screen refresh is needed.

The alternative multi-client manager BOINCTasks is superior [runs on all 3 main supported WCG platforms], as you can collapse tasks by their class in the view. I've got it set to filter the Ready to Start tasks, so the view only shows the total tasks *per science app* and their estimated total run time, whilst showing the detail for the running jobs. BOINCTasks also shows, in the project view, the total estimated run time per core... so you always know how much work there is until the client goes idle. Refresh times can be set. I've got mine on low, for reduced use of CPU cycles. Mine has used 9:34 minutes according to Task Manager, has been up for 3 days since the last boot, and monitors remote clients too.

--//--
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 3, 2012 8:48:12 AM]
[Apr 3, 2012 8:14:29 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: HCMD2: very short tasks

Hi SekeRob,
I know BoincTasks and have been using it for over a year.
The problem I described (btw it happened again today) is not related to BOINC Manager's display, since no user was logged on to the machine concerned - i.e. no BOINC GUI was active - and BoincTasks is not able to connect to this particular machine.
Indeed, the whole BOINC stack is done for.
Again, after the reboot, two tasks were running at over 100% (475% and 6'200% respectively) and I aborted them. At the same time, I noticed that the queue was again around 2'000 tasks long. I reduced the buffer again in order to operate the machine under this limit. I will try to keep it under 1'850. The queue issue is caused by these very short tasks, which confuse BOINC when it estimates how much work is needed. From time to time BOINC suddenly inflates the queue from one day to a couple of days. Two hosts which computed a mix of HCMD2 and SNTS fetched more than 10 days of work within a few hours, with the buffer set to 1.5 days.
Since this problem has happened several times in a "reproducible" way, I think that it is important to investigate it closely.
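For anyone wanting to pick a buffer that keeps the queue under a chosen size, here is a minimal sketch of the arithmetic. It assumes every task takes roughly the same run time and that all cores crunch around the clock; the function name and the 3.5-minute run time are illustrative assumptions only, not something BOINC provides.

```python
MINUTES_PER_DAY = 24 * 60

def max_buffer_days(max_tasks, cores, minutes_per_task):
    """Largest buffer (in days) that keeps the queue at or below max_tasks.

    Assumes uniform task run times and all cores busy 24/7.
    """
    tasks_per_day = cores * MINUTES_PER_DAY / minutes_per_task
    return max_tasks / tasks_per_day

# Hypothetical host: 6 cores, 3.5-minute tasks, queue limit of 1850 tasks
print(round(max_buffer_days(1850, 6, 3.5), 2))
```

In practice the real queue will drift from this estimate, since BOINC's own run time estimates are what drive work fetch.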
Cheers,
Yves
----------------------------------------
[Apr 3, 2012 5:36:39 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: HCMD2: very short tasks

It's known... the core client is not able to handle queues of several thousand tasks, and the button was added to reduce the additional load imposed by the BM. That said, some asynchronous file handling elements are being introduced in version 7, so this may be a good chance to test on alpha 7.0.24. Think it's a candidate to get pegged for Beta. Been running it for 2 days, but not with HCMD2. Get it here: http://boinc.berkeley.edu/dev/forum_thread.php?id=6698&nowrap=true#43219

--//--

edit: added link to official announcement post. Linux build of either 7.0.24 or 7.0.25 coming soon.
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 3, 2012 5:45:49 PM]
[Apr 3, 2012 5:43:13 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: HCMD2: very short tasks

Another crash/hang-up last night.
I have reduced the buffer size again.
Definitely, 2000 tasks in the queue kill BOINC.
Thank you, SekeRob, for the information about version 7.0.24.
As an exception, I will try to test an early version of BOINC as soon as it is available for Linux.
The problems experienced show how complicated the deployment of GPU-based projects can be, since both the BOINC platform at WCG and the BOINC clients will have to deal with a dramatic increase in the number of tasks to manage.
Yves
---
FYI: my statistics on WCG (My Grid), with around 26 cores, now run to over 1'000 pages (usually it was no more than 50 pages). I would not be surprised if the web and database servers managing the statistics pages experience some trouble in the near future.
---
edited FYI
----------------------------------------
[Edit 1 times, last edit by KerSamson at Apr 4, 2012 7:24:12 AM]
[Apr 4, 2012 7:08:36 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: HCMD2: very short tasks

Hi Yves!
I have no answer to this real problem, but I can confirm it does exist, even when running with BOINC Manager only showing active tasks (this is for Rob ;-) ); anyway, you also said that it happened to you on machines not running BOINC Manager at all.

For the past few days I have had it once a day, at various times, and already twice today.
Every time the BOINC client failed I rebooted this quad machine (Ubuntu 64-bit 11.10 + BOINC v6.12.33), and after the restart I also had one or two tasks running beyond 100% that I had to kill. I think this is a separate problem, which just shows up more easily these days because HCMD2 WUs are so short that there is always one completing during the shutdown.

After each restart I looked at the message log files for messages logged just before BOINC failed. Unfortunately, before today I do not think I saw the same message twice, which is why I have not reported it yet: no consistent clues to offer to the techs.
However, every time the messages were related to something going wrong between the server and the client. For example, the last two times today (0:34:35 UTC and 15:37:43 UTC):
Requesting new tasks for CPU
Can't open client_state_next.xml: fopen() failed
Couldn't write state file: fopen() failed; giving up
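To check whether the same failure recurs across restarts, one could scan the log for these messages. The following is only a sketch: the function name is hypothetical, and it simply matches the "fopen() failed" text quoted above rather than relying on any documented BOINC log format.

```python
import re

# Matches the failure text seen in the messages quoted above
FAILURE_PATTERN = re.compile(r"fopen\(\) failed")

def find_state_file_failures(lines):
    """Return (line_number, text) pairs for lines reporting a failed fopen()."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if FAILURE_PATTERN.search(line)
    ]

sample = [
    "Requesting new tasks for CPU",
    "Can't open client_state_next.xml: fopen() failed",
    "Couldn't write state file: fopen() failed; giving up",
]
for number, text in find_state_file_failures(sample):
    print(number, text)
```

To run it against a real log, pass an open file object instead of the sample list; the location of the stdoutdae file depends on the installation.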


I am also running HCMD2 with a 1-day cache (because of announced maintenance interruptions and also because HCMD2 tasks are not always available throughout the day), but since this is a quad the number of queued WUs only varies between 700 and 1,200, and this machine has no main memory or disk storage contention problems.

PS: I am sorry I cannot offer more error messages for earlier failures, but with so many tasks a day the stdoutdae files can hardly contain more than one day of messages. Messages for this morning's failure are already in the stdoutdae.old file.
----------------------------------------
[Apr 4, 2012 7:23:19 PM]
Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Status: Offline
Re: HCMD2: very short tasks

I've not had an HCMD2 unit for an age.

Am I using the wrong soap or the wrong bait?
[Apr 5, 2012 12:04:10 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: HCMD2: very short tasks

Hi Jean,
it is nice to read you again after such a long time. :-)
Your observations fit perfectly with mine. I think we are dealing with the same issue.
Reading your post gave me the idea that the log file might be the cause of the problem. Could a very big log file be something that BOINC fails to manage correctly?
At this time, I experience the problem mostly on two machines: Athlon II x2 (3.1 GHz), Phenom II x6 (3.2 GHz).
On the Phenom II x6, the average running time for a WU is 3.5 minutes; i.e. over 2'400 WU/day (without crash), around 1.7 WU/min to manage (download, compute, upload, report).
Is this (reporting) workload the reason for the problem? ...
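The workload figures can be double-checked with a few lines of arithmetic. This is a sketch assuming the numbers refer to the 6-core Phenom II, with all cores busy around the clock and a uniform 3.5-minute run time; the function name is made up for illustration.

```python
def work_unit_rates(cores, minutes_per_wu):
    """Return (WUs per day, WUs per minute) for a host crunching non-stop."""
    per_minute = cores / minutes_per_wu
    per_day = per_minute * 24 * 60
    return per_day, per_minute

# Assumed host: 6 cores at 3.5 minutes per WU
per_day, per_minute = work_unit_rates(6, 3.5)
print(f"{per_day:.0f} WU/day, {per_minute:.2f} WU/min")
```

Each of those WUs goes through download, compute, upload and report, so the client handles several scheduler and file operations per minute on top of the crunching itself.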
Yves
----------------------------------------
[Edit 1 times, last edit by KerSamson at Apr 5, 2012 12:06:17 AM]
[Apr 5, 2012 12:05:13 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: HCMD2: very short tasks

Hi Mysteron347,
at this time, I only run HCMD2 on my hosts.
Yves
----------------------------------------
[Apr 5, 2012 12:08:01 AM]