This topic has been viewed 2793 times and has 9 replies.
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Network Problem - WU upload hangs at 100% completed

Greetings.
I need help debugging a network problem.
I have had failures when trying to ping my closest DNS server.
It appears my network goes down for minutes to hours at a time - then recovers.
When it recovers, I see several WUs hanging at 100% complete and the rest pending.
This can last for several hours, and then all transmissions complete.
Question 1:
Is there anything special about sending that last packet - or its ACK?

Question 2:
Any suggestions for debugging?

Thank you very much,
Stay safe,
Jay

PS
When pinging the DNS server once every 5 seconds:
2835 packets transmitted
2182 received
+412 errors
23.0335% packet loss
time 14208925 ms
= 3 Hours, 56 Minutes, 48 Seconds, 925 ms
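These figures are consistent with each other. Loss is (transmitted - received) / transmitted, so 653 lost out of 2835 gives 23.0335%. That match also shows the +412 errors are counted separately and are not part of the loss formula. 2835 pings over about 14,209 s is one roughly every 5 s. Below is a minimal bash/awk sketch that redoes this arithmetic from the three counters. `ping_stats` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute loss %, elapsed time and ping interval
# from the counters in ping's summary.
# Usage: ping_stats TRANSMITTED RECEIVED ELAPSED_MS
ping_stats() {
  awk -v s="$1" -v r="$2" -v ms="$3" 'BEGIN {
    loss = (s - r) / s * 100          # lost = transmitted - received
    secs = int(ms / 1000)             # whole seconds
    printf "loss: %.4f%%\n", loss
    printf "elapsed: %dh %02dm %02ds %03dms\n",
           int(secs / 3600), int(secs % 3600 / 60), secs % 60, ms % 1000
    printf "interval: %.2f s per ping\n", ms / 1000 / s
  }'
}
```

For example, `ping_stats 2835 2182 14208925` reproduces the 23.0335% loss and the 3h 56m 48s 925ms duration above.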
----------------------------------------

[Nov 19, 2020 4:06:41 PM]
BobbyB
Veteran Cruncher
Canada
Joined: Apr 25, 2020
Post Count: 638

What does "It appears my network goes down for minutes to hours at a time" mean?
To me this means no Internet at all: the ISP is down or the router is disconnected from the ISP.
I'm sure this is not what you mean.

Your data says there were:
+412 errors
23.0335% packet loss

That's to the DNS server. Which one is it?
Can you ping an IP address like Google's DNS 8.8.8.8 or Cloudflare's DNS 1.1.1.1?
Maybe you could use one of these as your DNS. IBM has one too: Quad9. But ugh.
If you can't even ping an IP, then you are really dead.

When you say the network is down, can you ping 169.47.63.74, which is www.worldcommunitygrid.org?
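This test separates two failure modes. If a raw IP address answers but a hostname does not, name resolution is the suspect. If even the IP fails, the connection itself is down.

Below is a minimal bash sketch of that decision. `diagnose`, `reachable` and the `PING_CMD` override are hypothetical names. The ping flags assume Linux iputils, where `-W` is the per-reply timeout in seconds and `-q` prints only the summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether an outage is DNS or plain connectivity.
# PING_CMD defaults to the system ping (Linux iputils flags assumed).
PING_CMD=${PING_CMD:-ping}

reachable() {
  # 3 echo requests, wait up to 3 s per reply, quiet; only the exit status matters
  "$PING_CMD" -c 3 -W 3 -q "$1" >/dev/null 2>&1
}

diagnose() {
  local ip=${1:-8.8.8.8} name=${2:-www.worldcommunitygrid.org}
  if ! reachable "$ip"; then
    echo "no connectivity: cannot reach $ip by IP address"
  elif ! reachable "$name"; then
    echo "IP works but $name fails: suspect DNS (or that host)"
  else
    echo "ok: $ip and $name both reachable"
  fi
}
```

Running `diagnose 169.47.63.74 www.worldcommunitygrid.org` pings the same server both ways. If the first succeeds and the second fails, DNS is the problem.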
----------------------------------------
[Edit 1 times, last edit by BobbyB at Nov 19, 2020 6:58:41 PM]
[Nov 19, 2020 6:37:49 PM]
jay_Orlando

Greetings,

Sorry I'm late - I lost the thread ID.
I live in Orlando, Florida. The nearest DNS server is, I assume, in Miami:
ping -c 3 -l 3 dns.mia.bellsouth.net
(My ISP is AT&T / BellSouth.)


Pings to World Community Grid were OK.
My guess is that the network was busy.
A few years ago, I had a similar problem with the final ACK.

A well-used script is:

#!/bin/bash
# Ping each host 3 times and print the results, with blank lines between hosts.
for i in asteroidsathome.net \
weather-and-climate.com \
trendmicro.com \
einsteinathome.org \
aei.mpg.de \
dns.mia.bellsouth.net \
google.com \
downloads1.kaspersky-labs.com \
dnl-eu10.kaspersky-labs.com \
wzw.tum.de \
security.ubuntu.com \
worldcommunitygrid.org \
srv4.bakerlab.org \
einstein.phys.uwm.edu \
ssl.berkeley.edu \
ADNS1.BERKELEY.EDU \
ADNS2.BERKELEY.EDU \
DNS2.UCLA.EDU \
PHLOEM.UOREGON.EDU \
dyndns.com
do
    echo
    echo "$i"
    # ping -v -W 3 -c 4 -s 100 -i 1.0 -p deadbeef "$i"
    # -c 3: send 3 packets; -l 3: send all 3 without waiting for replies;
    # -p deadbeef: fill the packet payload with the pattern 0xdeadbeef
    ping -c 3 -l 3 -p deadbeef "$i"
    echo
    echo
    echo
    sleep 2
done
exit


These were OK: slower than usual, but OK.
This problem seems to come and go...

Jay

PS
The -l 3 option (preload) sends 3 ping packets at once without waiting for replies, then counts the responses. (Non-root users are allowed a preload of at most 3.)
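The loop prints ping's full output for every host. Ping ends each run with a one-line summary, the same line that produced the statistics in the first post. A small filter can reduce each host to one record, which makes runs easier to compare.

Below is a sketch that assumes the Linux iputils summary format. `summarize` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical filter: turn ping's summary line into one record per host.
# Assumes the iputils format, e.g.
#   "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"
summarize() {
  awk -v h="$1" '/packets transmitted/ {
    # $1 = transmitted, $4 = received; loss is the field ending in "%"
    for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
    printf "%s sent=%s recv=%s loss=%s\n", h, $1, $4, loss
  }'
}
```

Inside the loop, `ping -c 3 -l 3 -p deadbeef "$i" | summarize "$i"` would print one line per host instead of the raw output.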

PPS

Merry Christmas!
----------------------------------------

[Dec 25, 2020 6:43:51 AM]
BobbyB

I'm going to concentrate on just the WCG stuff. I'm making a presumption here that this machine is just used for crunching WCG.

I would assign a fixed DNS IP in my system and use Google's 8.8.8.8 (I doubt they will ever be down!), and if you can assign a secondary DNS, use Cloudflare's 1.1.1.1.

With these two there should not be a DNS problem. This removes any doubt about your ISP's DNS server.
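After such a change, you can confirm which DNS servers the machine actually uses by reading the resolver configuration. With static entries, `/etc/resolv.conf` would contain `nameserver 8.8.8.8` and `nameserver 1.1.1.1`.

On systems running systemd-resolved, the file usually lists only the local stub `127.0.0.53`. In that case `resolvectl status` shows the real upstream servers.

Below is a sketch assuming a classic `resolv.conf`. `nameservers` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the nameserver addresses in a resolv.conf-style file.
nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```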

Next, I would hard-code www.worldcommunitygrid.org to 169.47.63.74 in your OS's hosts file. I doubt this IP will ever change. With this in place, YOU are the DNS server for WCG.
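On Linux the hosts file is `/etc/hosts`; on Windows it is `C:\Windows\System32\drivers\etc\hosts`. The suggested entry is a single line: `169.47.63.74   www.worldcommunitygrid.org`.

Below is a sketch of a check that the entry is in place. `hosts_maps` is a hypothetical helper. Note that a pinned entry has to be edited by hand if the server's address ever does change.

```shell
#!/usr/bin/env bash
# Hypothetical check: does FILE map NAME to IP? Exit status 0 if yes.
# Usage: hosts_maps IP NAME [FILE]
hosts_maps() {
  # Field 1 is the address; any later field may be the name or an alias.
  # Comment lines start with "#", so their first field never equals the IP.
  awk -v ip="$1" -v name="$2" '
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }' "${3:-/etc/hosts}"
}
```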

As for the script, which seems to be how you determine that the network is down or slow: just ping 169.47.63.74, 8.8.8.8 and 1.1.1.1.
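That simplified check can be a loop over the three addresses. It writes one timestamped line per target, so outage windows can be found in a log afterwards.

Below is a sketch assuming bash and Linux iputils ping. `probe_ips`, `PING_CMD` and the log path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical logger: ping raw IPs only (no DNS involved) and print one
# timestamped line per target. PING_CMD defaults to the system ping.
PING_CMD=${PING_CMD:-ping}

probe_ips() {
  local ip stamp
  # Default targets: the WCG address from this thread, Google DNS, Cloudflare DNS
  [ "$#" -gt 0 ] || set -- 169.47.63.74 8.8.8.8 1.1.1.1
  stamp=$(date '+%Y-%m-%d %H:%M:%S')
  for ip in "$@"; do
    if "$PING_CMD" -c 3 -W 3 -q "$ip" >/dev/null 2>&1; then
      echo "$stamp $ip ok"
    else
      echo "$stamp $ip FAIL"
    fi
  done
}
```

For example, `while sleep 60; do probe_ips >> "$HOME/ping-ips.log"; done` records a sample every minute. `grep FAIL` on the log then shows when each address was unreachable.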

Then, if there are problems connecting to WCG, it is not DNS but something between your machine and the WCG server... or the machine itself. Is it still crunching while you observe the slowdown?


Let's see how this works out.
----------------------------------------
[Edit 1 times, last edit by BobbyB at Dec 25, 2020 8:30:24 PM]
[Dec 25, 2020 8:24:44 PM]
BobbyB

Ah! I see from your other thread that you run Einstein@Home.
[Dec 25, 2020 8:28:10 PM]
jay_Orlando

Bobby,
The DNS is not my problem.
Missing the final ACK is the problem.
I used DNS pings just to show/infer that there was some net traffic, but not unrecoverable TCP failures.

Have you encountered that final response going missing on your machine(s)??

What are your thoughts on the failure to complete the uploads??

Thanks, Jay
PS
Happy Boxing Day.
----------------------------------------

[Dec 26, 2020 9:34:48 AM]
BobbyB

I have not seen what you described. I sometimes see a few WUs sit in "ready to report" for a short time, but they are gone when I check a while later. If I click Update on the Projects tab, they disappear and downloads start. I just did that to see what happens.

I zoomed in on DNS because you said you had problems pinging your closest DNS server.

When you say "It appears my network goes down for minutes to hours at a time - then recovers.", does it apply to all the machines on your LAN or just these WCG machines? I presumed it was just these WCG machines.

If it's everything, everywhere, then I can see why the WUs hang. It's connectivity.

To debug, I would start at the router and look at its logs; disconnects and reconnects would show up there.
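Pinging the router itself is a quick companion to reading its logs. If the router answers while 8.8.8.8 does not, the fault is beyond the LAN.

Below is a sketch that picks the default gateway out of `ip route show default` output (Linux iproute2). `default_gateway` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ip route show default" output on stdin and
# print the gateway address that follows the word "via".
default_gateway() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) if ($i == "via") { print $(i + 1); exit }
  }'
}
```

For example: `ping -c 10 "$(ip route show default | default_gateway)"`.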
----------------------------------------
[Edit 2 times, last edit by BobbyB at Dec 26, 2020 4:15:11 PM]
[Dec 26, 2020 4:03:36 PM]
BobbyB

Here is what I observed if it can help.

I watched one WU as it neared completion and counted down the seconds to zero (100% progress). It transitions from working to "uploading" to "ready to report".
I also observed, using a packet sniffer, the transmission of a "ready to report" WU. Yes, it ends with an ACK or two, but I don't see that as relevant to the problem; it's just normal TCP.
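To look at the final segments of an upload yourself, limit the capture to the WCG server's address. That keeps the trace small enough to read in Wireshark, where retransmissions or a missing ACK/FIN at the end of a stalled transfer would be visible.

Below is a sketch assuming Linux tcpdump run as root. `capture_wcg`, the output file name and the `TCPDUMP` override (useful for a dry run with `TCPDUMP=echo`) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical capture helper: record only TCP traffic to/from one server.
# Needs root on Linux; set TCPDUMP=echo to print the command instead.
TCPDUMP=${TCPDUMP:-tcpdump}

capture_wcg() {
  local ip=${1:-169.47.63.74}   # WCG address quoted in this thread
  local out=${2:-wcg.pcap}      # raw packets, readable later with Wireshark
  # -i any: all interfaces; -nn: no DNS or port-name lookups; -w: write file
  "$TCPDUMP" -i any -nn -w "$out" host "$ip" and tcp
}
```

Start it before an upload, and stop it with Ctrl-C once the upload has hung.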

I interpret "uploading" as writing the WU output from memory to disk, or preparing the output on disk ready to transmit, because a task can then stay in the "ready to report" state for a while; so it is not really uploading (to WCG).
2020-12-27 10:49:29 | World Community Grid | Started upload of MIP1_00327497_15475_0_r1325880629_0
2020-12-27 10:49:33 | World Community Grid | Finished upload of MIP1_00327497_15475_0_r1325880629_0
At 11:28, as I write this, it is still sitting there "ready to report". If I click Update on the Projects tab, it will transmit to WCG.

update:
Sun 27 Dec 2020 11:50:11 AM | World Community Grid | Sending scheduler request: To report completed tasks.
Sun 27 Dec 2020 11:50:11 AM | World Community Grid | Reporting 1 completed tasks
Sun 27 Dec 2020 11:50:11 AM | World Community Grid | Not requesting tasks: don't need (job cache full)
Sun 27 Dec 2020 11:50:13 AM | World Community Grid | Scheduler request completed
Sun 27 Dec 2020 11:50:13 AM | World Community Grid | Project requested delay of 121 seconds

They were gone at 11:50.
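The same "Update" can be triggered from a terminal with `boinccmd`, and its task listing shows which tasks are still waiting to be reported.

Below is a sketch that counts those tasks. It assumes `boinccmd --get_tasks` prints a `ready to report: yes` field for each such task. `count_ready` is a hypothetical helper, and the project URL should be copied exactly as `boinccmd --get_project_status` reports it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count tasks flagged "ready to report: yes" in
# "boinccmd --get_tasks" output read from stdin (field name assumed).
count_ready() {
  grep -c '^[[:space:]]*ready to report: yes'
}

# Usage (BOINC command-line client assumed to be installed):
#   boinccmd --get_tasks | count_ready
#   boinccmd --project http://www.worldcommunitygrid.org/ update
```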
----------------------------------------
[Edit 4 times, last edit by BobbyB at Dec 27, 2020 5:27:22 PM]
[Dec 27, 2020 4:25:50 PM]
jay_Orlando

Bobby,
Thanks for the data - especially on the sniffer.
It's been a long while since I worked on TCP/IP.

The net problem was on all of my machines.

I live at the end of the line. I assumed that many people working from home and kids in virtual classrooms contributed to full or near-full network capacity.


My ISP is AT&T.
Too bad they don't have a network capacity status or graphic.
The problem has not happened recently.
Other people in my neighborhood have told me that they had problems when it rained.

THANKS AGAIN!!
Jay
----------------------------------------

[Dec 27, 2020 8:00:11 PM]
BobbyB

It seems I was wrong about my interpretation of "uploading".

I read the fine manual:
https://boinc.berkeley.edu/wiki/How_BOINC_works
https://boinc.berkeley.edu/wiki/Preferences

When it says "uploading", it really is uploading to the data server; I've seen this on a sniffer. "Ready to report" is "waiting for its points", I guess.
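Because "uploading" is a real transfer to the data server, a hung upload should appear in the event log as a "Started upload" line with no matching "Finished upload" line.

Below is a sketch that lists such files. It assumes log lines in the `timestamp | project | message` layout shown in the excerpts earlier in this thread. `stalled_uploads` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read event-log lines on stdin and print the start time
# and file name of every upload that started but never finished.
# Assumed line layout: "2020-12-27 10:49:29 | World Community Grid | Started upload of NAME"
stalled_uploads() {
  awk -F ' [|] ' '
    $3 ~ /^Started upload of /  { pending[substr($3, 19)] = $1 }  # name follows 18 chars
    $3 ~ /^Finished upload of / { delete pending[substr($3, 20)] } # name follows 19 chars
    END { for (f in pending) print pending[f], f }'
}
```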

Good that the problem is solved.
----------------------------------------
[Edit 3 times, last edit by BobbyB at Dec 27, 2020 8:59:35 PM]
[Dec 27, 2020 8:57:42 PM]