Total posts in this thread: 86
This topic has been viewed 10550 times and has 85 replies.
Vuj
Cruncher
Joined: Nov 21, 2004
Post Count: 33
Status: Offline
Re: Consolidated Feature Wish List

> 36. Ability for the program to be used as a system service so Win 2K/XP users don't have to log in to start it. (I currently have it set up as a scheduled task that runs at boot.)
>
> Check this webpage: http://www.tacktech.com/display.cfm?ttid=197
> OK, there is a way to run the client as a service or on the second processor, but does anybody know a way to check the progress of the task (since there is no icon in the taskbar)?
> Maybe it would be easy for the WCG folks to write a little program that does this?

If you're running XP Pro, go to Administrative Tools > Services, find the entry you set up to run WCG as a service, right-click it and select Properties.
The second tab (Log On) has an option to allow the service to interact with the desktop. HTH
[Jun 10, 2005 2:47:54 PM]
Viktors
Former World Community Grid Tech
Joined: Sep 20, 2004
Post Count: 653
Status: Offline
Re: Consolidated Feature Wish List

Regarding the Linux ETA, we have promised it before the end of the year. If things go smoothly, you might be pleasantly surprised much sooner. That is about all I can say right now. Sorry.
[Jun 11, 2005 11:00:13 PM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Consolidated Feature Wish List

It would be great to have a client that saves the current WU every, say, 10%, so that even if the WU gets corrupted and aborted, part of it can still be returned.
I've lost a dozen ultra-long WUs to corruption, after anywhere from more than 20 hours to as much as 100 hours of crunching (they would have been shorter and less likely to get corrupted if I had cleaned the fans earlier; I have a P4A 2.6 GHz notebook). That is really counter-productive and annoying.

A simple shutdown and reboot can corrupt the WU, as I have seen more than once, most recently yesterday with a WU that aborted after ~22 hours at ~80% (WU 2615455). The UD Monitor reports that when the agent is started again, it saves the WU once, and some time later it aborts.
[Sep 9, 2005 10:10:42 AM]
Viktors
Former World Community Grid Tech
Joined: Sep 20, 2004
Post Count: 653
Status: Offline
Re: Consolidated Feature Wish List

Some work units do take much longer (10 times or more) than others. However, I have seen runaway processes on some machines that consume 100% of the CPU, and since they don't run at the lowest priority, Rosetta gets no CPU time (although the "Run time" keeps growing). "Run time" is the wall-clock time during which the agent is able to run as long as no other processing is occurring on the machine. If you see no progress for many hours, I would use the Task Manager to check for something else that might be consuming CPU time. I have seen virus-scan software get stuck in an infinite loop, print spoolers consume 100% of the CPU while waiting for an unconnected printer, and other, stranger things.

Rebooting does not corrupt a work unit. Rosetta simply resumes from the last checkpoint. Checkpoints normally occur every few minutes (depending on the speed of the machine), and at most about an hour apart (or longer on slower machines) if the particular protein fold is non-converging. The progress percentage is updated at the time of each checkpoint.
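To make the checkpoint/resume behaviour concrete, here is a minimal Python sketch of the general technique. It is not Rosetta's actual code: the JSON file format, the function names, and the step counts are assumptions for illustration only. Progress only advances when a checkpoint is written, so a simulated reboot loses just the work done since the last checkpoint. Writing to a temporary file and then renaming it means a crash in the middle of a save leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` as JSON: readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0, "progress": 0.0}
    with open(path) as f:
        return json.load(f)


def run_work_unit(path, total_steps, checkpoint_every, stop_after=None):
    """Run a hypothetical work unit, resuming from the last checkpoint.

    `stop_after` simulates a shutdown after that many steps in this session.
    Returns (saved_state, steps_executed_this_session).
    """
    state = load_checkpoint(path)
    executed = 0
    for step in range(state["step"], total_steps):
        if stop_after is not None and executed == stop_after:
            break  # simulated reboot: work since the last checkpoint is lost
        executed += 1  # stand-in for one unit of real computation
        done = step + 1
        if done % checkpoint_every == 0 or done == total_steps:
            state = {"step": done, "progress": done / total_steps}
            save_checkpoint(path, state)  # the progress % only changes here


    return state, executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "wu_checkpoint.json")
        # First session is interrupted after 7 steps; only the step-5 checkpoint survives.
        print(run_work_unit(ckpt, 20, 5, stop_after=7))  # ({'step': 5, 'progress': 0.25}, 7)
        # Second session resumes after the step-5 checkpoint and runs the remaining 15 steps.
        print(run_work_unit(ckpt, 20, 5))  # ({'step': 20, 'progress': 1.0}, 15)
```

The point of the sketch is that a reboot by itself only costs the steps since the last checkpoint; the saved state on disk is never half-written.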

If you are using UD Monitor, what can happen is that a work unit times out because it has taken too long (2 weeks of run time, 3 weeks of wall-clock time). If something else modifies the files in any way (including a sector going bad on your disk), the agent simply quits the current work unit and gets new work. This also happens if Rosetta crashes for any reason. Normally, crashes are a sign of a hardware failure or of running out of virtual memory. You might want to increase the maximum paging file size by 200 MB or more to be on the safe side. Otherwise, most relevant hardware problems can be discovered using tools such as memtest86, scandisk, and the "hot cpu tester". See: Tools
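As a rough illustration of the "give up on this work unit" conditions described above, here is a minimal Python sketch. It is not the UD agent's actual logic: the function names, the use of SHA-256 digests to detect modified files, and the strict "greater than" comparison at the limits are all assumptions. Only the 2-week run-time and 3-week wall-clock limits come from the post.

```python
import hashlib
from datetime import timedelta

# Limits quoted in the post; how the real agent measures them is not described there.
MAX_RUN_TIME = timedelta(weeks=2)
MAX_WALL_CLOCK = timedelta(weeks=3)


def file_digest(data: bytes) -> str:
    """Hypothetical integrity fingerprint recorded whenever the agent writes a file."""
    return hashlib.sha256(data).hexdigest()


def abandon_reason(run_time, wall_clock, files, expected_digests):
    """Return why a work unit would be dropped, or None to keep crunching.

    `files` maps file name -> current bytes on disk; `expected_digests` maps the
    same names -> digests recorded when the agent last wrote those files.
    """
    if run_time > MAX_RUN_TIME:
        return "timeout: run time"
    if wall_clock > MAX_WALL_CLOCK:
        return "timeout: wall clock"
    for name, digest in expected_digests.items():
        data = files.get(name)
        if data is None or file_digest(data) != digest:
            # Anything else touching the file (or a bad sector) changes the digest.
            return f"modified or missing file: {name}"
    return None


if __name__ == "__main__":
    digests = {"wu.dat": file_digest(b"state")}
    print(abandon_reason(timedelta(days=3), timedelta(days=5), {"wu.dat": b"state"}, digests))  # None
    print(abandon_reason(timedelta(days=3), timedelta(days=5), {"wu.dat": b"st4te"}, digests))  # modified or missing file: wu.dat
    print(abandon_reason(timedelta(days=15), timedelta(days=16), {"wu.dat": b"state"}, digests))  # timeout: run time
```

Note that in this model a file changed by something outside the agent looks exactly like corruption, which is why the symptoms can be hard to tell apart from the user's side.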

If you are seeing what you call "work unit corruption," exactly what are the symptoms? Can it be explained by any of the above?
[Sep 9, 2005 3:03:05 PM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Consolidated Feature Wish List

> Some work units do take much longer (10 times or more) than others.
Yes.
> If you see no progress for many hours, I would use the Task Manager to check for something else which might be consuming CPU time. I have seen virus scan software get stuck in an infinite loop, print spoolers consume 100% of the cpu while waiting for an unconnected printer, and other stranger things.
I know that, but none of those apply here, sorry.

> Rebooting does not corrupt a work unit.
Of course, but more than once I have seen a WU get corrupted shortly after (within one or two checkpoints) the PC was shut down and rebooted (strictly speaking, after the agent restarted), and that happened even before I installed Linux on my hard disk. In any case, my Linux cannot be the cause of the corruption: it has read support for NTFS, but not write support.
> Rosetta simply resumes from the last checkpoint. Checkpoints normally occur every several minutes (depending on the speed of the machine) and at most after about an hour (or more on slower machines) if the particular protein fold is non-converging. The progress percentage is updated at the time of the checkpoint.
Yes.

> If you are using UD monitor, what can happen is that a work unit can timeout because it has taken too long (2 weeks run time, 3 weeks wall clock time).
Yes, but this never happened to me, as I always kept a reasonable number of slots (formerly 6, now 4) and never got too many long WUs at a time on the computer.
> If something else modifies the files in any way (including a sector going bad on your disk), the agent simply quits on the current work unit and gets new work.
Well, the disk is scanned from time to time, and it has never had any bad sectors.
> This also happens if Rosetta crashes for any reason.
Well, it has never crashed here in more than 200 WUs. That said, I noticed at least once that a WU aborted shortly after another application crashed (that was the application's fault - it was a very buggy beta - not the hardware's).
> Normally, crashes are a sign of some hardware failure or running out of virtual memory. You might want to increase the maximum virtual memory paging file size by 200MB or more to be on the safe side.
I have run out of VM before while WCG was running, due to a buggy GreaseMonkey under Firefox (Firefox had allocated more than 350 MB of VM), but Windows XP smoothly increased the amount of VM, and the WU did not abort.

As for the whole computer being too hot when saving a WU... maybe. Before I cleaned it (I made a topic about that on the Member-to-member forum, IIRC), it overheated all the time and aborted most WUs; the UDMon logs show it pretty well. When it was working in a very hot office, it aborted a number of WUs the following way: shut down the PC at the end of the day, reboot, and the WU is soon corrupted.

> If you are seeing what you call "work unit corruption," exactly what are the symptoms?
The WU aborts, a directory and a number of files are erased, and the cache slot becomes empty, with no apparent valid reason; in particular, there is no timeout (wall-clock time, non-convergence, etc.). I know this is different from a timeout, because I have seen a WU abort cleanly (it hit the maximum time between two checkpoints), and a small result was returned. That WU was definitely too long for my older computer.

I estimate that my corrupted WUs add up to at least three weeks of CPU time in total since 2004/12/31, peaking just before I cleaned the computer, which was in dire need of cleaning. If I had had "reliable checkpoints" (what I'm suggesting: checkpoints every 10% or so, which could still be sent to the server when the WU gets corrupted), neither WCG nor I would have lost all that crunching time...
Actually, I did try once to restore files from the UDMon backups, and it worked for a while (the WU got further than the percentage it had aborted at), but it aborted again later, and I gave up.
My point is that such a feature could be built in the official software.
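The "reliable checkpoints" idea could look something like the following minimal Python sketch. It is purely hypothetical; nothing in the thread says the agent works this way. It assumes each 10% milestone is saved as a separate snapshot carrying a checksum, so that when the newest state turns out to be corrupted, the client can fall back to the furthest snapshot that still verifies and report it as a partial result. The snapshot format, the SHA-256 checksum, and the `models_done` payload field are illustrative assumptions.

```python
import hashlib
import json


def crossed_milestone(old_percent, new_percent, every=10):
    """True when progress passes a multiple of `every` percent (10, 20, ...)."""
    return int(new_percent // every) > int(old_percent // every)


def make_snapshot(percent, payload):
    """Package partial results with a checksum so later corruption can be detected."""
    body = json.dumps({"percent": percent, "payload": payload}, sort_keys=True)
    return {"body": body, "sha256": hashlib.sha256(body.encode()).hexdigest()}


def is_valid(snapshot):
    """A snapshot is usable only if its body still matches the stored checksum."""
    return hashlib.sha256(snapshot["body"].encode()).hexdigest() == snapshot["sha256"]


def best_partial_result(snapshots):
    """Return the furthest snapshot that still passes its checksum, or None.

    Instead of discarding the whole work unit when the newest state is bad,
    fall back to the last good milestone and report that.
    """
    valid = [json.loads(s["body"]) for s in snapshots if is_valid(s)]
    return max(valid, key=lambda s: s["percent"], default=None)


if __name__ == "__main__":
    # Hypothetical payload: the real content of a partial result is not specified.
    snaps = [make_snapshot(p, {"models_done": p // 10}) for p in (10, 20, 30)]
    # Simulate corruption of the newest snapshot on disk.
    snaps[-1]["body"] = snaps[-1]["body"].replace("30", "3O")
    print(best_partial_result(snaps))  # {'payload': {'models_done': 2}, 'percent': 20}
```

Keeping several milestones rather than overwriting a single file is what would let a partial result survive the kind of corruption described above; whether the server could make use of such a partial result is a separate question.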
[Sep 10, 2005 10:33:16 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Consolidated Feature Wish List

> Regarding Linux ETA, we have promised this before the end of the year. If things go smoothly, you might be pleasantly surprised much sooner. That is about all I can say right now. Sorry.

WOOOOOOOOOOOOOHOOOOOOOOOOOOOOOOOOOOOOO

Bring it on
[Sep 11, 2005 5:35:36 AM]
debrouxl
Advanced Cruncher
France
Joined: Dec 31, 2004
Post Count: 61
Status: Offline
Re: Consolidated Feature Wish List

I tried again to restore a WU from the UDMon backups. This time, the WU had aborted for no reason within seconds after a save, while I was doing nothing fancy with my computer (Explorer, etc.)...
I restored the backup from ~40 minutes earlier. So far, it has crunched more than 4 hours since it started working again on that WU, and is now ~15% further than it was when it aborted (now at ~88%). We'll see if it turns into a returned result. The WU number is 2644630.
[Sep 13, 2005 3:06:52 PM]
retsof
Former Community Advisor
USA
Joined: Jul 31, 2005
Post Count: 6824
Status: Offline
Re: Consolidated Feature Wish List

> Maybe an option in the screensaver configuration to blank the screen after some time, like the SETI client has. That way it won't waste time showing images.

Alther replied to a query like this, saying that the screensaver can be changed using the normal Windows settings without affecting the Rosetta program. In other words, the screensaver is optional.

Many of us remove the WCG screensaver entirely and run with the screensaver set to (None). That gives the largest share of time to crunching. You can always hover over the icon to check the status, or click it to open the large screen. The energy-saving feature of this monitor is set to turn it off after 30 minutes... good for overnight work.
----------------------------------------
SUPPORT ADVISOR
Work+GPU i7 8700 12threads
School i7 4770 8threads
Default+GPU Ryzen 7 3700X 16threads
Ryzen 7 3800X 16 threads
Ryzen 9 3900X 24threads
Home i7 3540M 4threads50%
----------------------------------------
[Edit 1 times, last edit by retsof at Sep 13, 2005 4:20:55 PM]
[Sep 13, 2005 4:18:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Consolidated Feature Wish List

PepeG seems to have solved the “Unable to Process Task Data - Backing Off” error at http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=1858#31030
This problem shows up right after installation. We could add an explanation to the long list under 'Trouble-shooting' at http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=trouble

Looking over this page, the section titled 'Why does my PC show 100% CPU use?' could be followed by one that says 'Why does my PC show 50% CPU use?' that reassures members whose computers use hyperthreading. The next section titled 'My CPU is overheating running while running the agent.' could contain a link to http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=2683 , the post by Viktors explaining the CPU Throttle Feature. Also, the section title could be changed from 'My CPU is overheating running while running the agent.' to 'My CPU is overheating while running the agent.' without running any risk of confusing the reader.

Added: Trouble-shooting is a gigantic page, but if you try to get to it from HELP, it is treated as a title and you can only reach a sub-page. And it can be difficult to get from the sub-page back to Trouble-shooting. Try it and see for yourself. This is not acceptable. We need to make it possible for prospective members having trouble to at least reach our help pages. Otherwise they will probably grow frustrated and give up in disgust.
----------------------------------------
[Edit 2 times, last edit by Former Member at Sep 19, 2005 5:38:55 PM]
[Sep 19, 2005 5:26:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Consolidated Feature Wish List

I agree about the max processor load. I was running this on my work laptop, and ended up uninstalling it because it kept overheating to the point of shutting itself off.
[Sep 29, 2005 3:49:13 PM]