| World Community Grid Forums
Thread Status: Active Total posts in this thread: 352
|
|
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
going fine on all fronts at the moment
|
||
|
|
Eurwin
Cruncher Joined: Apr 28, 2007 Post Count: 17 Status: Offline Project Badges:
|
Hello,
This evening everything ran smoothly for the first time in days. Uploading, downloading AND reporting without problems. Let's hope the problem is almost countered. |
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
Unfortunately, I can confirm that the issue is not resolved. Here is what is going on:
1) We are seeing the network adapters on the two servers that handle file uploads/downloads filling up their cache and then starting to drop packets. These particular adapters have some known driver issues under load, but the question we are looking at is why they are backing up.
2) We are doing some problem determination work on the switches that handle communications between servers. Unfortunately, the data collection crashed one of the switches and caused a 10-minute outage earlier today.
3) We have connected a network sniffer to monitor congestion and other issues during an 'episode'.
The challenging thing about this issue is that once it starts, the only way to clear it is to stop all access to the filesystem and let things settle down. Once everything is calm, we are able to start everything back up, and it performs fantastically for many hours before the issue occurs again. We have not been able to correlate the episodes with any scheduled task (backups, application scripts, etc.).
In the short term, we are implementing scripts to do this stop-and-restart automatically, but that is a workaround until the fundamental issue is identified and fixed. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
ty for the updated information
|
||
|
|
nanoprobe
Master Cruncher Classified Joined: Aug 29, 2008 Post Count: 2998 Status: Offline Project Badges:
|
Maybe Mario can help.
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
|
||
|
|
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 716 Status: Offline Project Badges:
|
FWIW I did go about half dry with a 0.1 day cache, so I went to 0.2 days and have had enough to keep busy. That said, I might go all crazy today and bump to 0.5 just in case.
I got burned running at 0.1 and came close to getting burned with 0.25 . . . call out the white shirts with the service cases. Nod to a minimum cache cruncher! Sitting at 0.5 until it's sorted. Also a big nod to the folks working on this; we know you're busting B's. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've been running with a 0.1 day cache on my two-core laptop with no problem. I have been running CFSW exclusively, for which the WUs are short (under one hour). Might longer-running WUs be a cause of the problem with a smaller cache size?
|
||
|
|
Bearcat
Master Cruncher USA Joined: Jan 6, 2007 Post Count: 2803 Status: Offline Project Badges:
|
Sounds like the switches need replacing or a different vendor if these aren't up to the task.
----------------------------------------
Crunching for humanity since 2007!
|
||
|
|
Bearcat
Master Cruncher USA Joined: Jan 6, 2007 Post Count: 2803 Status: Offline Project Badges:
|
Maybe Mario can help. BFH is the tool of choice when computers don't want to cooperate.
Crunching for humanity since 2007!
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Only had a couple this AM where I had to hit retry to upload.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 26, 2012 10:38:37 AM] |
||
|
|
|