| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 264
|
|
| Author |
|
|
coolstream
Senior Cruncher SCOTLAND Joined: Nov 8, 2005 Post Count: 475 Status: Offline Project Badges:
|
Gandalf: Are the deadlines on these new WUs 3 days by any chance? Re-sends are still being sent, it seems... which makes sense, as otherwise the guy that has already completed is left in limbo. This is the 13th and the deadline indicates the 16th, so I guess that makes it three days. I can only assume that I ran into a timing differential. The MCM1 WUs were halted just after the one PC had received its downloads. Otherwise, why weren't re-sends sent to my other three PCs? Apart from looking at the time left and doing calculations, it is far easier to look at the name of the WU. If it ends with _0 or _01, it isn't a resend. If it ends with anything higher than _01, it is a resend. The higher the number, the more times it has been resent. Crunching in memory of my Mum PEGGY, cousin ROPPA and Aunt AUDREY. |
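The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the post, not anything taken from the BOINC client: the function names, the `initial_copies` parameter and the example task names are all made up for the sketch, and it assumes the task name ends in an underscore followed by digits.

```python
import re


def replication_index(task_name: str) -> int:
    """Return the trailing numeric suffix of a task name, e.g. 3 for 'X_3'."""
    match = re.search(r"_(\d+)$", task_name)
    if match is None:
        raise ValueError(f"no numeric suffix in {task_name!r}")
    # int() treats '01' and '1' the same, so both spellings are covered.
    return int(match.group(1))


def is_resend(task_name: str, initial_copies: int = 2) -> bool:
    """Apply the rule from the post: suffixes 0 and 1 are the initial copies.

    initial_copies is an assumption (two copies sent at first); anything
    with a suffix at or above it is treated as a resend.
    """
    return replication_index(task_name) >= initial_copies


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    for name in ("MCM1_0000033_4508_0", "MCM1_0000033_4508_01", "MCM1_0000033_4508_2"):
        print(name, "resend" if is_resend(name) else "initial copy")
```

A higher return value from `replication_index` means the unit has been sent out more times, which matches the "the higher the number" remark.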
||
|
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Woke this morning to four more tasks sitting there, chugging away on elapsed time with no CPU time. Fortunately they were all on the same machine, so I just rebooted it.
Hope this gets addressed when the WUs start flowing again. It could lead to a lot of lost CPU cycles.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Is there any environment commonality in these "elapsed counting, no CPU time" incidents? You say you rebooted... did that set CPU time in motion? Is it always the same OS, version, service/user (as was found to be an issue during previous betas of MCM)? When looking in with Process Explorer (Properties > Performance), any oddities?
These 4 were repairs, no doubt. What did the previous copies do? ATM, the present beta 7.26 seemingly focuses only on the random seeding issue [to prevent 'some duplication']; at least I have not heard of anything else being in view. 2 of 3 batches have circulated [0000000 and 9999979, turning out to be 9999975]; we are still waiting on the 3rd special build batch, no one having broken the silence about it going around.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 16, 2013 5:42:48 PM] |
||
|
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Most of my systems are running Ubuntu. All of them are running in service mode.
There appears to be no commonality between WUs, and it appears to be quite random across systems. The machine in question is a 12 core HT server and only those four got stuck. After reboot the elapsed time reset to the CPU time and they're happily crunching along.
This has been an ongoing issue since Beta 7.24; I did not encounter it before that. In the past I would just abort the units that got stuck and the wingmen/repairs would continue without issue.
Like I say, it's quite random. Less than 5% of the cores ever experience this problem when it occurs, and it can be days between instances. My only hunch is that it has something to do with an unavailable internet connection.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
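For anyone wanting to spot these stuck tasks without watching the manager, the symptom (elapsed time keeps growing, CPU time does not) can be checked by comparing two readings taken some minutes apart. The sketch below is only that comparison; how the readings are collected is left open, and the `TaskSample` type, the `find_stalled` function and both thresholds are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    name: str         # task name as shown in the manager
    elapsed_s: float  # elapsed (wall clock) time in seconds
    cpu_s: float      # CPU time in seconds


def find_stalled(before, after, min_elapsed_gain_s=600.0, max_cpu_gain_s=1.0):
    """Return names of tasks whose elapsed time advanced but CPU time did not.

    Both thresholds are illustrative guesses, not values from any BOINC tool.
    """
    earlier = {sample.name: sample for sample in before}
    stalled = []
    for sample in after:
        previous = earlier.get(sample.name)
        if previous is None:
            continue  # task was not present in the first reading
        elapsed_gain = sample.elapsed_s - previous.elapsed_s
        cpu_gain = sample.cpu_s - previous.cpu_s
        if elapsed_gain >= min_elapsed_gain_s and cpu_gain <= max_cpu_gain_s:
            stalled.append(sample.name)
    return stalled


if __name__ == "__main__":
    # Two hypothetical readings taken 1000 seconds apart.
    first = [TaskSample("task_a", 1000, 900), TaskSample("task_b", 1000, 900)]
    second = [TaskSample("task_a", 2000, 1850), TaskSample("task_b", 2000, 900)]
    print(find_stalled(first, second))
```

Comparing two readings rather than looking at a single one avoids flagging tasks that are merely slow, since a healthy task still gains some CPU time between readings.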
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
WIFI? >>>> sudo iwconfig wlan0 power off (and almost all my internet-related BOINC instabilities went away)
|
||
|
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Nope, I gave up on Ubuntu wifi years ago. This has happened on my Windows systems as well. Just moved into a new house and the DSL has been spotty. It seems to be working better now though.
----------------------------------------
Distributed computing volunteer since September 27, 2000 [Edit 1 times, last edit by KWSN - A Shrubbery at Nov 17, 2013 2:48:01 AM] |
||
|
|
Rutor
Cruncher Joined: Apr 27, 2010 Post Count: 1 Status: Offline Project Badges:
|
Just recognized, I also have one work unit with status 100% but running without checkpointing for 134 hours now.
System: Intel i5-3320M, Win7 32bit, Boinc 7.0.28
<stderr.txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.24_windows_intelx86 -SettingsFile MCM1_0000033_4508.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
So far there is also no result from my wingman.
https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=885072792
The deadline will be reached in two days. Do you expect any reasonable result or shall I abort it?
Regards, Rutor |
||
|
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4894 Status: Offline Project Badges:
|
Just recognized, I also have one work unit with status 100% but running without checkpointing for 134 hours now... Do you expect any reasonable result or shall I abort it?
Abort it and get on to more worthy crunching. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
New work is downloading, but none of it can start because some files are failing to download ...
"Temporarily failed download of wcgrid_mcm1_7.26_i686-pc-linux-gnu: http error" |
||
|
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges:
|
Kremmen,
The download servers are serving lots of large files right now, and clients are having to back off so that files can be served as quickly as possible.
Thanks,
-Uplinger |
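As a rough picture of what "backing off" means for a client, here is a generic exponential backoff with random jitter in Python. It is not the schedule the BOINC client actually uses; the function name, the 60 second base and the one hour cap are assumptions chosen only to show the idea that each failed attempt waits longer, with randomness so that clients do not all retry at the same moment.

```python
import random


def backoff_delays(attempts, base_s=60.0, cap_s=3600.0, rng=None):
    """Return one randomized wait time (seconds) per failed attempt.

    The ceiling doubles with every attempt up to cap_s, and the actual
    delay is drawn uniformly between 0 and that ceiling ("full jitter").
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    for number, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {number}: wait about {delay:.0f} s")
```

Spreading retries out this way keeps a busy download server from being hit by every waiting client at once, which is the effect described in the post.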
||
|
|
|