Thread Status: Active
Total posts in this thread: 40
This topic has been viewed 9229 times and has 39 replies
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Re: I aborted one of these work units today

> Click on "messages" in BOINC Manager.

D'oh! Thanks.

> Did it complete or did it time out.... anything in the BOINC message log?

Unfortunately I can't answer that question now. I made some changes to my computer yesterday and installed BOINC 5.10.30 (I was running 5.10.28).
----------------------------------------

[Jan 29, 2008 1:05:16 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: I aborted one of these work units today

Unless you did a clean install, the messages can be recovered from various files. This is covered in the Start Here forum, in the "BOINC: Message & Error Log Reporting" topic.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Jan 29, 2008 1:10:26 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Re: I aborted one of these work units today

I pretty much did a clean install: I deleted 5.10.28 from my C drive and installed 5.10.30 on a different drive. It looks like the problem may have been something in the configuration on my computer or in 5.10.28. ach1_18_65 finished yesterday with no problems. It took 5.05 hours. It's showing inconclusive right now, but I know not to worry about that. I have 3 more work units that should run and complete today. Let's see how those go.
----------------------------------------

[Jan 30, 2008 5:19:02 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Re: I aborted one of these work units today

I don't understand why I'm having problems with this project. Three more work units that don't look very promising...


Anyone have any ideas?


QX6700
3GB RAM
208GB hard drive space (installed on a separate drive, not on the same drive as the OS).
----------------------------------------

[Jan 30, 2008 10:11:59 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: I aborted one of these work units today

This is a very demanding project. You should not be running 3 climate models simultaneously. Just suspend 2 and release them as each one completes.

Is QX6700 a 4 core? How much L2 has it got?

Others will be jealous of getting 1 let alone 3.
----------------------------------------
[Jan 30, 2008 10:17:23 PM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Re: I aborted one of these work units today

> This is a very demanding project. You should not be running 3 climate models simultaneously. Just suspend 2 and release them as each one completes.
>
> Is QX6700 a 4 core? How much L2 has it got?
>
> Others will be jealous of getting 1 let alone 3.


The QX6700 is a 4 core CPU. It has 8MB of L2 cache.

I'll try suspending 2 and see what happens. I'm willing to try almost anything. I didn't choose to get 3 in a row. That's just the way it happened.


Thanks for the advice. If you have any other ideas let me know.
----------------------------------------

[Jan 30, 2008 11:18:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: I aborted one of these work units today

Do you happen to know the checkpoint interval?

With work units of this size, it is possible that between ones that get aborted and ones that never complete, we are getting nowhere fast. I will ask the techs for a status update.

Can you check the page fault delta, too?
[Jan 31, 2008 12:06:16 AM]
David_L6
Senior Cruncher
USA
Joined: Aug 24, 2006
Post Count: 296
Re: I aborted one of these work units today

> Do you happen to know the checkpoint interval?
>
> With work units of this size, it is possible that between ones that get aborted and ones that never complete, we are getting nowhere fast. I will ask the techs for a status update.
>
> Can you check the page fault delta, too?



If you don't mind walking me through how to find the information you want, I'll get it for you.

Checkpoint interval and page fault delta: I don't know what they are or where to find them.
----------------------------------------

[Jan 31, 2008 12:35:42 AM]
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Re: I aborted one of these work units today

Add me to the list of people with WUs that seem to go nowhere:

WU: ach1_18_73_43
Started: 01/27/08 14:35:44 Starting task ach1_18_73_43 using acah version 514 (local time Z-6)
Aborted: 01/29/08 20:48:13 (local time)
CPU: 53.54

I tried suspending and resuming the WU, with no change in % complete (78.020).

Then I tried suspending all WUs, closing BOINC, rebooting, and resuming all WUs, with the same result.

I aborted the WU.

Environment:
BOINC 5.10.30
CPU: C2D E6750 L2 4096 x 1
RAM: 4GB
HD: C: 172 GB actual / 14.2 GB used

Running nothing but BOINC for last 15 days
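
For anyone comparing numbers, the wall-clock time between the start and abort timestamps above can be worked out with a few lines of Python. This is only a rough sketch: it assumes both timestamps are in the same local time zone and that the "CPU: 53.54" figure is in hours, neither of which is stated outright. The function name `elapsed_hours` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the post above (MM/DD/YY HH:MM:SS).
FMT = "%m/%d/%y %H:%M:%S"


def elapsed_hours(start: str, end: str) -> float:
    """Wall-clock hours between two timestamps in the same time zone."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 3600


# Start and abort times copied from the post.
wall = elapsed_hours("01/27/08 14:35:44", "01/29/08 20:48:13")

# Assumption: the reported CPU figure is in hours.
cpu = 53.54

print(f"wall clock: {wall:.2f} h, CPU time as share of wall clock: {cpu / wall:.0%}")
```

If those assumptions hold, the task had the CPU almost the whole time it was on the machine, which fits with nothing else running.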
----------------------------------------
Bill P

[Jan 31, 2008 3:35:02 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: I aborted one of these work units today

> Do you happen to know the checkpoint interval?
>
> With work units of this size, it is possible that between ones that get aborted and ones that never complete, we are getting nowhere fast. I will ask the techs for a status update.
>
> Can you check the page fault delta, too?

By my own count from the logs of 3 jobs at the default 60-second minimum, about 50 checkpoints are written per job (see the Matrix post in the Start Here forum for the counts for all WCG projects). That puts them about 14 minutes apart on my Q6600 and 40 minutes apart on the P4, but the latter produced 6.5 billion PFs (page faults) in a single non-stop run, with some delta.

I've had only a 50% validation rate on the 6 I ran in a series on the P4 (1.3 GB RAM allowed), with one still sitting in inconclusive. Looking at the individual logs on the Result Status page, 2 had heartbeat issues and 3 had restarts/resumes but, more concerning, 1 had a smooth single run and went invalid notwithstanding. Not very good. The 1 Q6600 job ran in 7 hours on Vista and validated: a clean single run.
<core_client_version>5.10.35</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 2
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/8/13::0:0:0 1

</stderr_txt>
]]>


Update: The remaining inconclusive result validated, so I have an exact 50% 'invalid' rate. All of them completed on the P4 in 17 to 21 hours, with run times 2 to 3 times longer than the others in their quorums!
----------------------------------------
[Edit 2 times, last edit by Sekerob at Jan 31, 2008 4:14:41 AM]
[Jan 31, 2008 4:09:28 AM]