World Community Grid Forums
Category: Retired Forums Forum: UD Windows Agent Support [Read Only] Thread: Wide variance in my results. Is it normal? |
Thread Status: Active Total posts in this thread: 11
cchilaquil
Cruncher Joined: Apr 5, 2007 Post Count: 5 Status: Offline Project Badges: |
In the last two weeks or so I have been seeing wide variance in my results. For example, I registered a very low result last Monday, at 12K points; today it is 20K, and I have been unable to reach the 22-26K that I was averaging daily. I have not changed much in my setup, except for two devices that had their memory upgraded. I have no idea whether the problem is limited to just one platform (I run several devices, about half UD and half BOINC, on Windows XP and Linux), but I am posting here because the only change was made on machines running UD. Perhaps it is a strange but normal variance? Or a glitch, or something else? Does anyone have suggestions on what to check?
Best regards, and thanks for your attention. Chilango Chilaquil
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It sounds like normal variance to me.
One way to get more detailed information would be to check your device statistics page: http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=points#24 (if you click on a device name, you can see its history over the last few weeks). Hopefully that will let you spot any problem devices. If you pin down a problem and need further help, please post again.
||
|
cchilaquil
Cruncher Joined: Apr 5, 2007 Post Count: 5 Status: Offline Project Badges: |
Hi there,
Thanks for your answer, Didactylos. I have checked, and it seems that six of my devices (including the upgraded computers I mentioned before) are now crunching barely one unit per day, instead of the 4-8 they averaged three weeks ago. Perhaps it is still normal variance, but there is other evidence that makes me think these particular devices may be rebooting or stopping the UD process. I'll monitor these devices with SNMP to check that the process is running, as well as memory, CPU temperature and voltages. I'll post if and when I find anything that may help pinpoint the problem.
On a related topic: while checking the device statistics, I just noticed that one of my devices is listed there twice. In other words, I see two entries for a device named Regulus, when in fact I have just one device by that name. How can this be corrected?
Best regards, Chilango Chilaquil
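A minimal sketch of how the planned monitoring could flag reboots, assuming you already poll each machine periodically (for example reading the host uptime, `hrSystemUptime` from the standard HOST-RESOURCES-MIB, which is reported in hundredths of a second) and record whether the UD process appears in the process table. The idea is simply that uptime going *down* between two polls means the machine restarted. The `Sample` record and the function names are hypothetical illustrations, not part of the UD agent or any particular SNMP tool, and the code does not do the polling itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One poll of a machine (hypothetical record format)."""
    poll_time: float   # when the poll happened, in seconds
    uptime_ticks: int  # host uptime in hundredths of a second (SNMP TimeTicks)
    ud_running: bool   # was the UD agent process seen in the process table?


def find_reboots(samples: List[Sample]) -> List[float]:
    """Return the poll times at which a reboot was detected.

    A reboot shows up as uptime decreasing between two consecutive polls.
    """
    return [cur.poll_time for prev, cur in zip(samples, samples[1:])
            if cur.uptime_ticks < prev.uptime_ticks]


def find_ud_stops(samples: List[Sample]) -> List[float]:
    """Return poll times where UD was running at the previous poll but not now."""
    return [cur.poll_time for prev, cur in zip(samples, samples[1:])
            if prev.ud_running and not cur.ud_running]


# Example: polls every 300 s; the third poll sees uptime reset to 120 s.
samples = [
    Sample(0, 360000, True),
    Sample(300, 390000, True),
    Sample(600, 12000, False),  # uptime dropped: the machine restarted
    Sample(900, 42000, True),
]
print(find_reboots(samples))   # [600]
print(find_ud_stops(samples))  # [600]
```

Comparing the two lists helps separate "the whole machine rebooted" from "only the UD process stopped", which are the two possibilities raised above.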
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Firstly, we are currently seeing some larger than usual work units. Normally, you get full credit for the extra time on a long work unit. However, if crunching is interrupted and has to restart from a checkpoint, you don't get credit for the lost time. Large work units often have much larger gaps between checkpoints than normal, so if you shut down your computers, you may lose quite a lot of time.
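To put a rough number on the lost time, here is a small sketch under simplifying assumptions: a restart always resumes from the most recent checkpoint, and checkpoints are written at a fixed interval (real work units do not necessarily checkpoint that evenly). The time lost in one interruption is then the time since the last checkpoint, and if interruptions can land at any moment, the average loss is about half the checkpoint interval. The function names and the minute figures are hypothetical, for illustration only.

```python
def lost_time(interrupt_at: float, checkpoint_interval: float) -> float:
    """CPU time thrown away when crunching is interrupted after `interrupt_at`
    minutes and restarts from the last checkpoint.

    Assumes a checkpoint is written every `checkpoint_interval` minutes.
    """
    return interrupt_at % checkpoint_interval


def expected_loss_per_interruption(checkpoint_interval: float) -> float:
    """Average loss if an interruption is equally likely at any moment
    between two checkpoints: half the interval."""
    return checkpoint_interval / 2


# A unit checkpointing every 60 minutes, shut down after 170 minutes of
# crunching, restarts from the 120-minute checkpoint: 50 minutes are lost.
print(lost_time(170, 60))                  # 50
print(expected_loss_per_interruption(60))  # 30.0
# A unit with 10-minute checkpoints loses far less per shutdown on average.
print(expected_loss_per_interruption(10))  # 5.0
```

This is why frequent shutdowns hurt much more on long units than on units with small, frequent checkpoints.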
If your work units were actually failing in some way, you would see the opposite: lots of work units being completed unusually quickly. I don't think that's happening here. I would suggest switching these troublesome devices to run only one of the projects with short work units, using a custom device profile. Genome Comparison has reliably small units. Have a look at this graph to get a sense of work unit sizes: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=13733 Sorry, you can't do anything about the extra device registrations. You get a new one every time you reinstall the agent.
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
On the double device names, I would advise renaming the one not generating points. This can be done in the Device Manager: select the duplicate, change its name and save.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
cchilaquil
Cruncher Joined: Apr 5, 2007 Post Count: 5 Status: Offline Project Badges: |
Hi there,
"Perhaps it is still normal variance, but there is other evidence that prompts me to think that these particular devices may be rebooting or stopping the UD process. I'll monitor these devices with SNMP to watch for the process running, as well as memory, CPU temperature and voltages. I'll post if and when I find anything that may be helpful to pinpoint the problem."
Hello people. It's me again, reporting back. I have been able to trace the problem on these computers to the fact that they are continuously rebooting, thus losing quite a bit of the work done. By now I know that the problem seems to be that UD interacts quite badly with another application running on those computers. The other application is a monitoring tool that constantly receives updates via UDP and makes requests to a map server. When both the application and UD are running, the machine reboots shortly after the screensaver activates. If I deactivate the screensaver, the machine reboots randomly every 45-120 minutes. The computer runs smoothly when only UD or only the application is running.
As we recently upgraded the application, I thought it might be a contention or CPU-sharing problem, but that does not seem to be the case: the latest SNMP reports consistently show that the process using the most CPU (and, by my logic, accounting for most of the CPU use in that particular collection cycle) is the UD process. There are no paging issues either, as virtual memory is only very rarely used. UD seems to run fine with a previous version of the monitoring application. However, I cannot keep using that version, as it uses TCP instead, and a new, strict network management policy prevents me from using too many TCP connections.
The funny thing is that UD and the very same application seem to run fine on other computers with the same processor and motherboard, the only difference being that those other computers have less memory (and no, it is not a memory problem, as I have already hot-tested it for more than 72 hours). I am quite lost and do not know where to go from here. Any suggestions? Best regards, Chilango Chilaquil
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Stumping... UD has lots of encryption going on, and since it still runs fine with an older version of your monitor, no doubt you have some collision happening. Try BOINC, as that does not use encryption; the downside for corporate networks is that it needs HTTPS/port 443 open for outgoing traffic, for secure communications with the servers. I've never had issues banking securely from the office, so I presume that if the setup is correct, it should work fine. UD worked fine on the office laptop too, btw.
If the problem machine has more RAM, is all of the RAM of the same spec?
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hey ya, Sekerob!
I had a total system failure and had to reload everything from scratch. How do I download a new agent? I want to stay with UD because I have an older system.
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
In theory, at least, mere software can't cause random hard reboots. However, grid software does put computer hardware under an unusual amount of stress. Some people actually find it useful as a burn-in tool.
So, my first suggestion is to throttle the grid agent, maybe to 20 or 30%, and see if that reduces or eliminates the problem. Other things to try are underclocking the processor or running some really intensive hardware tests. Looking for a hardware fault that only occurs under maximum load can be very difficult. If this doesn't work, or you don't want to go to the trouble of further diagnosis, then please stop crunching on that machine. I'd hate for you to damage the computer or lose any data.
Oh, and check the power supply. Sudden power spikes or loads are one of the few things that can cause a sudden reset.
A final thought: you mention a map server. That implies some kind of GIS application, which probably uses some fancy graphics card features. The screensaver also kicks the graphics card into gear. Coincidence?
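For anyone wondering what throttling to 20 or 30% means in practice: one common way to throttle a CPU-bound worker is a duty cycle, where in each short period it computes for the chosen fraction of the time and sleeps for the rest, so average CPU load (and heat) drops roughly in proportion. This is a generic illustration under that assumption, not how the UD agent implements its own throttle setting; `do_work_slice`, the function names and the 1-second period are hypothetical.

```python
import time
from typing import Callable, Tuple


def duty_cycle(percent: float, period_s: float = 1.0) -> Tuple[float, float]:
    """Split one period into (work seconds, sleep seconds) for a throttle level."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    work = period_s * percent / 100
    return work, period_s - work


def run_throttled(do_work_slice: Callable[[], None], percent: float,
                  periods: int, period_s: float = 1.0) -> None:
    """Call do_work_slice() repeatedly during the work part of each period,
    then sleep for the remainder, so average CPU use is about `percent`."""
    work_s, sleep_s = duty_cycle(percent, period_s)
    for _ in range(periods):
        deadline = time.monotonic() + work_s
        while time.monotonic() < deadline:
            do_work_slice()  # one small chunk of computation
        time.sleep(sleep_s)  # idle time lets the CPU cool down


if __name__ == "__main__":
    print(duty_cycle(25))  # (0.25, 0.75)
    run_throttled(lambda: sum(range(1000)), percent=25, periods=3, period_s=0.2)
```

Keeping the period short (around a second) smooths out the load instead of alternating long bursts of full load with long idle stretches.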
||
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline Project Badges: |
"Oh, and check the power supply. Sudden power spikes or loads are one of the few things that can cause a sudden reset."
It happened to me... random reboots... I replaced the power supply last year, and have had no problems since.
SUPPORT ADVISOR
----------------------------------------
Work+GPU i7 8700 12 threads School i7 4770 8 threads Default+GPU Ryzen 7 3700X 16 threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24 threads Home i7 3540M 4 threads 50%
[Edit 1 times, last edit by retsof at Jun 13, 2007 1:46:24 AM]
||
|
|