World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Recently I have noticed that jobs are now being assigned with estimated run times of 4 to 6 hours. Why such long jobs? One of these jobs was showing as running but with no change in CPU time or any other indicator. When I checked it on the console, the job had been running for over 40 hours of CPU time.
I went back to check what my chosen parameters of reference are and found that they had all changed, so that I was now being assigned all types of jobs, not just the two I selected. Is this a common occurrence? I have found that long-running jobs seem to be the ones that get into difficulty. My solution is to abort any job I see in that kind of time frame. I hope this is just an anomaly and will not last long. I realize that I have a fast machine, but that should not change the job sets.
Mac Pro, Quad-Core Intel Xeon, 3 GHz, 5 GB memory |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hi stupidname,
All jobs get an estimated number of iops/flops when sent to a client. From these estimates, the client computes the expected duration using past performance figures for the machine. The problem is that non-deterministic calculations can end up doing many more operations than expected, extending the duration considerably. 4-6 hours for FA@H and HPF2 sounds quite reasonable for the present work, as they have had more attempts and seeds packed in since early this week. GC sizes remain unchanged, but fluctuate quite a bit in duration based on complexity.

Your preferences are not being changed, and if they were, that could have been a program fluke. But given that there are several places to select choices, My Projects (for overall control) and Device Profile (for client-level control), it's probable you looked at the other one.

Even the long-running jobs are fine 99.999% of the time; it's the patience that is being tested very hard. Aborting a job does not help anyone, as you lose the computing time and the WCG technicians will not know why it happened. We want you to post the BOINC message log that shows any anomaly. We'd also like you to look at the "Result Status" page and click on the suspect job's Work Unit Name (first column) to see the quorum. If any other jobs of the standard set of 3 were returned (all projects except HPF2, which has a standard of 19 at present), and they were returned without error, it's more likely that a random or local problem arose. Again, WCG technicians like to see a description, a piece of the message log, and client information such as whether the CPU is/was clocking up time for the job.

Which project was running for 40 hours, how far had the job progressed, and were graphics viewed at any time?

And yes, FA@H and HPF2
Sekerob
PS Where normally HPF2 runs 6+ hours on my C2D, yesterday I had a few which took 4.5 hours..... no duration is fixed.
WCG
Please help to make the Forums an enjoyable experience for All! |
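For anyone who wants to see the arithmetic behind the estimate described in the post above, here is a minimal sketch. It assumes the estimate is simply the job's estimated operation count divided by the machine's measured speed, scaled by a correction factor learned from past jobs. The function name, the parameter names, and the sample numbers are all made up for illustration; this is not the BOINC client's actual code, which may apply further adjustments.

```python
def estimated_duration_seconds(estimated_flops: float,
                               host_flops_per_second: float,
                               correction_factor: float = 1.0) -> float:
    """Rough expected run time for one job.

    estimated_flops       -- operation count the server attaches to the job (hypothetical name)
    host_flops_per_second -- speed the client measured for this machine (hypothetical name)
    correction_factor     -- multiplier from past estimate-vs-actual history (hypothetical name)
    """
    if host_flops_per_second <= 0:
        raise ValueError("host speed must be positive")
    if correction_factor <= 0:
        raise ValueError("correction factor must be positive")
    return estimated_flops / host_flops_per_second * correction_factor


if __name__ == "__main__":
    # Illustrative numbers only: 5.4e13 operations on a core doing 3e9 per second.
    seconds = estimated_duration_seconds(5.4e13, 3e9)
    print(f"estimated run time: {seconds / 3600:.1f} hours")
```

Because the operation count is itself only an estimate, a non-deterministic job that does more work than predicted will overrun whatever this calculation says, which matches the behaviour described in the post.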
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello stupidname,
"I went back to check to see what my chosen parameters of reference are and found that they had all changed so that I now was being assigned all types of jobs not just the two I selected. Is this a common occurrence?"

The only change that has recently been made is for users who had selected 'Help Defeat Cancer'. When the project ended they were reset to 'All Projects'. But we have been wrestling with the database since early in May, when we started being hit with errors after an update, so we may have done something to your profile by mistake. What changes do you see in your profile?
Lawrence |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
.... forgot about that one
Added: About 2,000 members had that profile condition.
WCG
Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at May 12, 2007 12:33:26 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for taking the time to provide some clues. I appreciate it.
The job at 40+ hours was still clocking time, as shown on the Console. According to BOINC it was not doing anything. Determining all the details you suggest is very difficult and time consuming. The name of the job shown on the Console is only a high-level description. In this case I had six WCG jobs running, all showing the same name. One has to inspect each job and then compare the CPU times shown in BOINC, by process of elimination, to determine which job is actually at fault.

Having done all that, I am not going to go to the WCG web site, which I find very confusing and difficult to follow, to do another elimination routine to see what that site might be telling me. When I made this post I had just gone through hours of the WCG web site not allowing access. The attempt to log in would just time out, with the web site asking me to try later. I might be retired, but I am not looking for a full-time job.

I run other jobs under BOINC and have found that when things go screwy I just suspend that site and run something else until things sort themselves out. Right now I have 8 jobs running, all WCG jobs, and only one in the queue for some strange reason (usually there are four or six in the queue). The one in the queue is estimated at 4:49:14 hours. I will watch it if I can identify it when it is running. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello stupidname,
"I will watch it if I can identify it when it is running."

That will not be easy unless you use BOINC Manager. Otherwise, all you will see is a number of same-name application programs. The easy way to see what went wrong is to read the error messages stored on the 'Results Status' page with each result. Do it the easy way. If it has been running too long, abort it and then look at the 'Results Status' page to see if anybody else in your quorum had that problem.
Lawrence |
||
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline |
I'm running the UD agent on an old AMD K6 overclocked to about 900MHz. I've just dl'd a job that is at 2% after 2 hours (roughly). That means I'm looking at a run time of at least 4 days - probably more like 5 or 6 since it's actually reading 1% and I'm rounding up.
I have to run UD agent because BOINC seems to cause problems as a screen saver on older machines with AGP (ATI brand) graphics cards and I need the screen saver to run so I can tell at a glance if everything is copasetic. Of the 6 machines running now, only about half are used for other purposes and I don't always remember to check the device statistics to make sure everyone is working. |
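The run-time guess in the post above is a straight-line extrapolation from the progress percentage. A minimal sketch of that arithmetic follows, assuming progress advances at a constant rate, which the thread itself notes is not guaranteed. The function name is made up for illustration.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total run time from elapsed time and reported progress.

    Assumes the job progresses linearly, which real work units may not do.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


if __name__ == "__main__":
    # The two readings mentioned in the post: 2% (rounded up) and 1% (as displayed).
    for percent in (2.0, 1.0):
        hours = projected_total_hours(2.0, percent)
        print(f"{percent:.0f}% after 2 hours -> {hours:.0f} hours, "
              f"about {hours / 24:.1f} days")
```

With 2 hours elapsed, a 2% reading projects to 100 hours (a little over 4 days) and a 1% reading to 200 hours (a little over 8 days). The "5 or 6 days" guessed in the post lies between those two bounds.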
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello twilyth,
Regardless of the official minimum requirements, I would only run the project with the shortest time (Genome Comparison). HCMD is working through really big proteins now, and will continue to do so until the next phase, which is months away. Even so, it sounds as though you will get your current work unit done in time, whatever project it is for. Lawrence |
||
|
twilyth
Master Cruncher US Joined: Mar 30, 2007 Post Count: 2130 Status: Offline |
"Hello twilyth, Regardless of the official minimum requirements, I would only run the project with the shortest time (Genome Comparison). HCMD is working through really big proteins now, and will continue to do so until the next phase, which is months away. Even so, it sounds as though you will get your current work unit done in time, whatever project it is for. Lawrence"

Yeah - if it takes a week to run - I don't really mind. It's nice to see the stats grow on a daily basis but getting a big bump every once in a while is worth waiting for. Now if I could just get real time stats downloaded directly into my brain . . . Ahhhh . . . Nirvana!!! |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I think so far one (female) member has indicated being perfectly okay with the extremely long run times...... don't you love the satisfaction of working through a 100-hour unit, hibernate, resume, hibernate, resume, and that enormous flush on the 7th day, leaving all your teammates in the exhaust fumes as you're speeding by, riding that ol' P3?
BTW Lawrence, southpaws first check the Result Status page of BOINC to see if others returned a WU without error and how long it took them. If I see others delivering with Pending Validation, I'm running on.
cheers
PS, my slowest/longest/toughest record stands at 110 hours on the UD agent for an HCMD.... I took an occasional peek at the graphics screen to see if the percentages moved. There you can see even 0.1%. With BOINC one even sees 0.001% in the Tasks tab.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|