Total posts in this thread: 211
This topic has been viewed 29094 times and has 210 replies
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

Since I am at work, I can't check the progress, so just remote observations for now:

- Full (down)load: 8 for i7 + 4 for i5.
- 2 already errored: 1 on my i7-3770 Win7 64b 7.2.26 after 2.63 hours, 1 on my i5-2500S Mac OS X 10.9 (Mavericks) 7.0.65 after 2.68 h. Wingmen still In progress.
- The other 10 In progress.
...
Good luck and cheers


ETA1: 1 on Win Valid (CPU Time 1.30 h), 2 on Mac PVal (CPU time 3.57 and 3.48 h)
ETA2: methinks a 10-day deadline for Betas is a bit long


ETA3: the last 2 WUs I caught unfinished when I came back from work were resends (one on Win, the second one on Mac), so both of them errored out. But checkpoints worked fine and the RAM usage was around 250 MB.
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Nov 1, 2013 3:24:54 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

I've had one Beta WU for nearly a day and I just noticed in the log that it has been restarting itself every ~10 minutes or so. Each time it restarts, the "estimated completion time" resets to about 10 hours. Absolutely no progress has been made. Do I abort or just let it go?


Rich,

What OS are you running, and do you have any security software installed on your computer? If so, can you check whether the BOINC data directory is excluded from its scanning? It sounds like an outside process is killing the task, which then restarts.

Thanks,
-Uplinger
[Nov 1, 2013 3:33:53 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

Unfortunately there is nothing you can do to fix this. This is a setting from the server. We have limited the result files to 10 MB, but as some have reported, the result file has grown to 100 MB in some cases. The researchers have a list of results that have this issue and will be looking into getting this file size down.

Thanks,
-Uplinger

I don't know if this limit is for the benefit of the users, or WCG. But I have a fast (2 Mbps) upload speed, and could easily do the 100 MB if that will help the science. You could make it user-selectable, like the number of tasks downloaded for CEP2.


It's kind of a double-edged sword. First, not everyone has a good connection, and some actually pay for transfers. This means some members are limited to, say, 10 GB per month.

But as you say, there may be a way to accommodate members who are willing to send back larger result files. The problem on our end is that we don't have infinite storage. Also, large file transfers would tie up ports that others need to request and send back results. The servers can only handle so many connections at one time.
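To put rough numbers on the trade-off above, here is a small back-of-the-envelope sketch. It only uses the figures mentioned in this thread (10 MB and 100 MB result files, a 2 Mbps uplink, a 10 GB monthly cap). The function names are made up for illustration, decimal units are assumed, and protocol overhead is ignored.

```python
# Illustrative arithmetic only. Assumes decimal units (1 MB = 10**6 bytes,
# 1 GB = 1000 MB) and ignores protocol overhead. Function names are hypothetical.

def upload_seconds(size_mb, uplink_mbps):
    """Seconds needed to upload size_mb megabytes over an uplink_mbps link."""
    return size_mb * 8 / uplink_mbps


def results_per_month(cap_gb, size_mb):
    """How many result files of size_mb fit under a monthly cap of cap_gb."""
    return int(cap_gb * 1000 // size_mb)


if __name__ == "__main__":
    for size_mb in (10, 100):
        print(
            f"{size_mb} MB result: "
            f"{upload_seconds(size_mb, 2):.0f} s at 2 Mbps, "
            f"{results_per_month(10, size_mb)} results under a 10 GB cap"
        )
```

Under these assumptions a 100 MB result takes ten times as long to upload and uses up a capped connection ten times as fast as a 10 MB one.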

Another issue with the large result files is that they generally use up LOTS of memory. The application then writes very large checkpoint files about every 10 minutes. We are working on a solution that would limit this AND provide the results back to the researchers without putting too much stress on the uploads or on memory usage on the member's machines.

Hope this helps.

Thanks,
-Uplinger
[Nov 1, 2013 3:42:43 PM]
Gil II
Senior Cruncher
Canada
Joined: Dec 6, 2006
Post Count: 368
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

SekeRob

One more for your list. I've seen this issue in other comments. Tasks restart every few minutes, as in:

01/11/2013 11:28:28 AM World Community Grid Restarting task BETA_BETA_9999986_0563_4 using beta17 version 719
01/11/2013 11:28:38 AM World Community Grid Restarting task BETA_BETA_9999987_0516_4 using beta17 version 719
01/11/2013 11:31:35 AM World Community Grid Restarting task BETA_BETA_9999986_0563_4 using beta17 version 719
01/11/2013 11:31:45 AM World Community Grid Restarting task BETA_BETA_9999987_0516_4 using beta17 version 719

I have aborted 8 jobs in total with this problem.
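For anyone who wants to check their own log for the same pattern, here is a minimal Python sketch that counts "Restarting task" lines per task. It assumes the message text looks like the excerpt above. The function name and the idea of pasting log lines into a list are illustrative only, and the timestamp is deliberately not parsed because its format differs between machines.

```python
import re
from collections import Counter

# Matches the message text seen in the log excerpt above. The leading
# timestamp is skipped on purpose: its format varies between machines.
RESTART_RE = re.compile(r"Restarting task (\S+) using (\S+) version (\d+)")


def count_restarts(lines):
    """Return a Counter mapping task name to its number of restart lines."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: the four log lines quoted above.
sample = [
    "01/11/2013 11:28:28 AM World Community Grid Restarting task BETA_BETA_9999986_0563_4 using beta17 version 719",
    "01/11/2013 11:28:38 AM World Community Grid Restarting task BETA_BETA_9999987_0516_4 using beta17 version 719",
    "01/11/2013 11:31:35 AM World Community Grid Restarting task BETA_BETA_9999986_0563_4 using beta17 version 719",
    "01/11/2013 11:31:45 AM World Community Grid Restarting task BETA_BETA_9999987_0516_4 using beta17 version 719",
]

if __name__ == "__main__":
    for task, restarts in count_restarts(sample).most_common():
        print(task, restarts)
```

A task that shows up many times in a short window is a likely candidate for the restart issue.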

1) Output file too large (Error -131)
2) Maximum Disk Use Exceeded (disk_bound overstepped)
3) Memory model exceeded (memory_bound overstepped)
4) Loss of -large- portions of CPU time at time of reporting, which looks to happen at end.
5) Progress % erratic (e.g. it can jump from 0.5% to 50% only at the end of the 1st pass when there are only 2 passes)
6) Related to 5), checkpoints at times multiple hours apart... not good for part time crunchers.
7) Jobs seem stuck in memory at times, [when seemingly no more progress is made]... won't unload, even when "Leave application in memory when suspended" is off. Full client restart required to get them to unload.
8) Some tasks freeze on the CPU time use when running [is it the display, or does the CPU time in Task Manager indicate no CPU time use?], while elapsed time keeps accumulating and progress % goes backward. Users of BOINC Manager won't see this easily; to users of BOINCTasks it's obvious since both Elapsed and CPU time are shown.
9) Running 4 concurrently (i.e., using all available cores), appears to be very inefficient.
10) Tasks restart every few minutes
----------------------------------------

[Nov 1, 2013 3:44:45 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]


1) Output file too large (Error -131)
2) Maximum Disk Use Exceeded (disk_bound overstepped)
3) Memory model exceeded (memory_bound overstepped)

These 3 are pretty much the same problem: the result output is growing larger than it needs to. The researchers are working to fine-tune the filtering that is needed within the work units. Also, as a fail-safe, we are working on a way to detect this mid work unit and exit gracefully, so that work done is returned to the researchers and they can evaluate how to proceed with these monster result files.
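As a rough illustration of what "detect this mid work unit and exit gracefully" could look like, here is a hypothetical Python sketch. It is not the WCG application's code. The function names, the step-based loop and the callback are all assumptions, and the limit is taken as 10 * 1024 * 1024 bytes, which matches the 10485760-byte limit shown in a client log quoted in this thread.

```python
import os

# Assumed limit: 10 * 1024 * 1024 = 10485760 bytes.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def output_too_large(path, limit=SIZE_LIMIT_BYTES):
    """Return True if the file at path exists and is larger than limit bytes."""
    try:
        return os.path.getsize(path) > limit
    except OSError:
        # A missing or unreadable file is treated as "not too large".
        return False


def run_with_size_guard(steps, output_path, write_step, limit=SIZE_LIMIT_BYTES):
    """Run steps one by one, stopping early once the output exceeds limit.

    write_step is a hypothetical callback that does one unit of work and
    appends its results to output_path. Returns (steps_done, stopped_early).
    """
    done = 0
    for step in steps:
        write_step(step)
        done += 1
        if output_too_large(output_path, limit):
            # Stop here so the partial output can still be returned.
            return done, True
    return done, False
```

The point of checking after each step is that the work already done is kept, rather than being discarded when the size limit is hit at upload time.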
4) Loss of -large- portions of CPU time at time of reporting, which looks to happen at end.

We are looking into the checkpointing issue. We believe we have a fix, but it will need to be tested in the next round.
5) Progress % erratic (e.g. it can jump from 0.5% to 50% only at the end of the 1st pass when there are only 2 passes)
6) Related to 5), checkpoints at times multiple hours apart... not good for part time crunchers.

5 and 6 are similar to 4 in that we have a potential fix and will be testing it next round.
7) Jobs seem stuck in memory at times, [when seemingly no more progress is made]... won't unload, even when "Leave application in memory when suspended" is off. Full client restart required to get them to unload.

This is something I need more information on. Is this an issue with Windows only? What flavor? (ex. Windows 8 32bit or Windows Vista 64bit) I have not been able to recreate it on my machines, but that could be because I'm looking at the wrong OS.
8) Some tasks freeze on the CPU time use when running [is it the display, or does the CPU time in Task Manager indicate no CPU time use?], while elapsed time keeps accumulating and progress % goes backward. Users of BOINC Manager won't see this easily; to users of BOINCTasks it's obvious since both Elapsed and CPU time are shown.
I believe some of this might be due to the large results and checkpoints. We are still investigating it.

Wish list: Printing of OS and CPU details in Result Log.
Yes, we are thinking of adding this information to the result status page, not in the result log as that would not require us to recompile the older applications on WCG to support this change.

Thanks,
-Uplinger
[Nov 1, 2013 3:54:53 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

If you are encountering a restart issue of your result, please let us know a few things.

1. What OS are you running? (ex. Windows 8 64 bit)
2. Do you have security software on your computer? One report from Gil was McAfee.
3. Check your security software to see if you can exclude either this application or the BOINC data directory. On Windows this is usually C:/ProgramData/BOINC/.

All of the errors up to this point on the restart issue are on Windows machines. My assumption at this time is that, since it's a new application, the security software on your machine is killing the process and BOINC is trying to restart it. When this kill and restart happens too many times, an error is reported to the server.

Thanks,
-Uplinger
[Nov 1, 2013 4:00:22 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

9) Running 4 concurrently (i.e., using all available cores), appears to be very inefficient.
I believe this is due to really large files needing to be checkpointed, which brings us back to the solution we are working on at this time: detecting the memory usage and exiting based on that.

10) Tasks restart every few minutes

Please see my post above, requesting more information.

Thanks,
-Uplinger
[Nov 1, 2013 4:02:32 PM]
Gil II
Senior Cruncher
Canada
Joined: Dec 6, 2006
Post Count: 368
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

Restart issue additional info: I am running Windows 7
----------------------------------------

[Nov 1, 2013 4:11:24 PM]
pramo
Veteran Cruncher
USA
Joined: Dec 14, 2005
Post Count: 703
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

If you are encountering a restart issue of your result, please let us know a few things.

1. What OS are you running? (ex. Windows 8 64 bit)
2. Do you have security software on your computer? One report from Gil was McAfee.
3. Check your security software to see if you can exclude either this application or the BOINC data directory. On Windows this is usually C:/ProgramData/BOINC/.

All of the errors up to this point on the restart issue are on Windows machines. My assumption at this time is that, since it's a new application, the security software on your machine is killing the process and BOINC is trying to restart it. When this kill and restart happens too many times, an error is reported to the server.

Thanks,
-Uplinger

These few results are from various OSes, all running Symantec Endpoint Protection V12.1. The same package is pushed to each machine (well, one for 32-bit and one for 64-bit). I can't modify the settings, but the logs show no issues. This was the only w/u I had with restart issues.

My aborted task that was restarting:
(XP 32 bit)
11/1/2013 4:54:12 AM World Community Grid Restarting task BETA_BETA_9999984_0541_0 using beta17 version 719
11/1/2013 4:57:18 AM World Community Grid Restarting task BETA_BETA_9999984_0541_0 using beta17 version 719




Another XP 32 bit machine didn't have the restart problem.
10/31/2013 4:24:55 PM World Community Grid Computation for task BETA_BETA_9999985_0055_1 finished
10/31/2013 4:24:55 PM World Community Grid Output file BETA_BETA_9999985_0055_1_0 for task BETA_BETA_9999985_0055_1 exceeds size limit.



There were a few on Win7 64bit and Server 2008 R2, no restarts.

A few valid:
10/31/2013 8:09 World Community Grid Computation for task BETA_BETA_9999987_0131_1 finished
10/31/2013 8:11 World Community Grid Started upload of BETA_BETA_9999987_0131_1_0
10/31/2013 8:11 World Community Grid Finished upload of BETA_BETA_9999987_0131_1_0

Some errors:
10/31/2013 8:56 World Community Grid Computation for task BETA_BETA_9999984_0548_1 finished
10/31/2013 8:56 World Community Grid Output file BETA_BETA_9999984_0548_1_0 for task BETA_BETA_9999984_0548_1 exceeds size limit.
10/31/2013 8:56 World Community Grid File size: 12270504.000000 bytes. Limit: 10485760.000000 bytes
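For reference, the numbers in that last log line can be checked with a few lines of Python. This is only a sketch: the regular expression is written against the one log line quoted above, and the function name is made up for illustration.

```python
import re

# Written against the single "File size ... Limit ..." line quoted above.
SIZE_RE = re.compile(r"File size: ([\d.]+) bytes\. Limit: ([\d.]+) bytes")


def parse_size_limit(line):
    """Return (size_bytes, limit_bytes) as ints, or None if the line does not match."""
    match = SIZE_RE.search(line)
    if match is None:
        return None
    return int(float(match.group(1))), int(float(match.group(2)))


line = (
    "10/31/2013 8:56 World Community Grid File size: 12270504.000000 bytes. "
    "Limit: 10485760.000000 bytes"
)
size, limit = parse_size_limit(line)

if __name__ == "__main__":
    print("limit is exactly 10 * 1024 * 1024 bytes:", limit == 10 * 1024 * 1024)
    print("bytes over the limit:", size - limit)
    print("percent over the limit:", round(100 * (size - limit) / limit, 1))
```

So this particular result was only modestly over the limit, unlike the 100 MB cases mentioned earlier in the thread.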
----------------------------------------

[Nov 1, 2013 4:59:18 PM]
Thargor
Veteran Cruncher
UK
Joined: Feb 3, 2012
Post Count: 1291
Status: Offline
Re: New Beta Test starting Oct 31, 2013 [Issues Thread]

If you are encountering a restart issue of your result, please let us know a few things.

1. What OS are you running? (ex. Windows 8 64 bit)
2. Do you have security software on your computer? One report from Gil was McAfee.
3. Check your security software to see if you can exclude either this application or the BOINC data directory. On Windows this is usually C:/ProgramData/BOINC/.

All of the errors up to this point on the restart issue are on Windows machines. My assumption at this time is that, since it's a new application, the security software on your machine is killing the process and BOINC is trying to restart it. When this kill and restart happens too many times, an error is reported to the server.

Thanks,
-Uplinger

1. Windows 7 Home Premium 64-bit
2. Yes, spybot S'n'D & ESET NOD32 (64-bit) A/V
3. Not at home, but can give this a try when I get in...

In the meantime, here's an excerpt from the HUGE error-log attached to the WU which finally failed on my Windows box at home:
---
Running 
ERROR: could not initialize graphics pointer in shared memory.
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.19_windows_x86_64 -SettingsFile BETA_9999986_0743.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 500000
Running
ERROR: gfxData is null.

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000000013F4A3A24 write attempt to address 0x00000010
---
I'm assuming you can view the full log from the WU-name listed below? If not, let me know where I can c&p the full log...
----------------------------------------

[Nov 1, 2013 5:01:04 PM]