Thread Status: Active
Total posts in this thread: 51
Posts: 51   Pages: 6
This topic has been viewed 225696 times and has 50 replies.
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Tired and frustrated - time for leaving?

I have tried to understand why your computers are generating invalid results. As part of that effort, I pulled the data for 4 hours' worth of completed workunits and looked at the ratio of valid to invalid results, focusing on narrower subgroups of machines similar to yours. Here are the findings:

All results for workunits completed in past 4 hours
num_results validation_outcome 
----------- -------------------
92152 VALID (99.2%)
713 INVALID ( 0.8%)


All results for workunits completed in past 4 hours with quorum of 2 or higher
num_results validation_outcome 
----------- -------------------
47276 VALID (98.6%)
665 INVALID ( 1.4%)


All results for workunits completed in past 4 hours with quorum of 2 or higher on Linux
num_results validation_outcome 
----------- -------------------
3598 VALID (99.8%)
7 INVALID ( 0.2%)


All results for workunits completed in past 4 hours with quorum of 2 or higher on Linux with AMD processors
num_results validation_outcome 
----------- -------------------
687 VALID (99.6%)
3 INVALID ( 0.4%)


All results for workunits completed in past 4 hours with quorum of 2 or higher on Linux with the same types of processors as KerSamson*
num_results validation_outcome 
----------- -------------------
78 VALID (98.7%)
1 INVALID ( 1.3%)

*These are:
  • AMD Athlon(tm) II X4 640 Processor [Family 16 Model 5 Stepping 3]
  • AMD Phenom(tm) II X6 1055T Processor [Family 16 Model 10 Stepping 0]
  • AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
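The percentages in these tables follow directly from the counts. The short Python sketch below (counts copied from the tables; the names `SUBGROUPS` and `invalid_share` are illustrative, not WCG code) recomputes the invalid share per subgroup and prints the sample size next to it. That makes one point easy to see: the last subgroup holds only 79 results, so a single invalid result moves its rate by about 1.3 percentage points.

```python
# Counts copied from the tables above: (description, valid, invalid).
SUBGROUPS = [
    ("all results", 92152, 713),
    ("quorum >= 2", 47276, 665),
    ("quorum >= 2, Linux", 3598, 7),
    ("quorum >= 2, Linux, AMD", 687, 3),
    ("quorum >= 2, Linux, KerSamson's CPU types", 78, 1),
]


def invalid_share(valid: int, invalid: int) -> float:
    """Return invalid results as a percentage of all validated results."""
    total = valid + invalid
    if total == 0:
        raise ValueError("no results in this subgroup")
    return 100.0 * invalid / total


if __name__ == "__main__":
    for name, valid, invalid in SUBGROUPS:
        # Print the sample size next to the rate: small groups swing a lot.
        print(f"{name:45s} n={valid + invalid:6d} "
              f"invalid={invalid_share(valid, invalid):.1f}%")
```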


With this information, I cannot find any symptom of a larger issue. I do not dispute that you are seeing the issue on your computers. I have looked at them and seen the invalids previously. However, as far as I can see the issue is isolated to those machines when they run VINA.

Unfortunately, FightAIDS@Home is about to enter one of its phases of only having VINA work. We will run out of AutoDock work in ~10 days. More information will be coming out tomorrow.

However, we are within 2.5 weeks (+/- a week) of starting the beta test for our next research project (focusing on cancer research), and it will not be VINA-based. This should give you a good project to run if you are willing to check back in a few weeks.
----------------------------------------
[Edit 1 times, last edit by knreed at Aug 1, 2013 2:34:12 AM]
[Aug 1, 2013 1:20:03 AM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Re: Tired and frustrated - time for leaving?

cjslman, I do believe KerSamson's issue is with the AMD cores throwing a 'wobbly' every now and then (by the sounds of it, it's more 'now' than 'then')


I have three Intel machines where every single VINA-based WU they return is invalid. So it is not limited to AMD machines. The original FAAH WUs run just fine, as did DSFL, and before that HPF2 and HCC. Now if I get a VINA WU on them, I just dump it. I might as well, since I already know it will be invalid.


Applying the same analysis as above.

All results for workunits completed in past 4 hours with quorum of 2 or higher on Linux with Intel processors

num_results validation_outcome 
----------- -------------------
2911 VALID (99.93%)
2 INVALID ( 0.07%)

All results for workunits completed in past 4 hours with quorum of 2 or higher on Linux with the same types of processors as lanbrown*

<none> - no results were returned during this time frame in this category

*These are:
  • x86 Family 6 Model 8 Stepping 6 865MHz [x86 Family 6 Model 8 Stepping 6]
  • Intel(R) Celeron(TM) CPU 1400MHz [x86 Family 6 Model 11 Stepping 1]
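Each narrower table is the same computation applied to a filtered subset of results. A minimal sketch of that filtering, assuming a hypothetical `Result` record carrying the OS, CPU model string and validation outcome (the actual WCG schema is not shown in this thread). Returning `None` for an empty subgroup keeps "no data", as in the lanbrown case, distinct from "no invalids".

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Result:
    # Hypothetical record layout; the real WCG database schema is not shown here.
    os_name: str
    cpu_model: str
    valid: bool


def invalid_rate(results: Iterable[Result], os_name: str,
                 cpu_models: Set[str]) -> Optional[float]:
    """Invalid share (%) among results from os_name hosts with a CPU in cpu_models.

    Returns None for an empty subgroup, the case reported as '<none>'.
    """
    matching = [r for r in results
                if r.os_name == os_name and r.cpu_model in cpu_models]
    if not matching:
        return None
    invalid = sum(1 for r in matching if not r.valid)
    return 100.0 * invalid / len(matching)


if __name__ == "__main__":
    # Toy records for illustration only, not real WCG data.
    sample = [
        Result("Linux", "Intel(R) Core(TM) i7 CPU", True),
        Result("Linux", "Intel(R) Core(TM) i7 CPU", False),
        Result("Windows", "Intel(R) Celeron(TM) CPU 1400MHz", False),
    ]
    lanbrown_cpus = {"Intel(R) Celeron(TM) CPU 1400MHz"}
    print(invalid_rate(sample, "Linux", lanbrown_cpus))  # None: no Linux results
    print(invalid_rate(sample, "Linux", {"Intel(R) Core(TM) i7 CPU"}))  # 50.0
```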


Again - with this data I am hard pressed to find a generalized issue. Looking at your machines specifically I do see that they return 100% invalids. I would encourage you to upgrade from the 5.10 client you are running to a more current version. A lot has changed since the 5.10 client and there is a chance that is impacting things.
----------------------------------------
[Edit 1 times, last edit by knreed at Aug 1, 2013 2:34:42 AM]
[Aug 1, 2013 1:39:08 AM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Re: Tired and frustrated - time for leaving?

One of my systems is also a Phenom 1090T running Linux. What I have noticed is that AMD chips tend to be slightly more sensitive to heat than Intel chips.

Whenever one of my machines, Intel or AMD, starts throwing errors or invalids, blowing out the dust bunnies has ALWAYS resolved the issue.

To those of you having problems, I do wish you the best and hope your issues can be resolved.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Aug 1, 2013 3:11:45 AM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Re: Tired and frustrated - time for leaving?

Top marks to knreed for taking the time to analyse an individual member's results in an attempt to resolve an issue. Sharing that with the wider community will go a long way toward countering recent criticisms of WCG.

I look forward to the "more information coming out tomorrow" regarding FightAIDS@Home entering one of its VINA-only phases, with AutoDock work running out in ~10 days.

Advance dissemination of this important information in the broadest way possible will avert a lot of grief.
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
[Aug 1, 2013 8:49:30 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Re: Tired and frustrated - time for leaving?

Hi everybody,
First of all, thank you for the kind words many of you wrote after my last post.
My hosts are still crunching, mainly on CEP2 for the AMD and Linux-based machines.

Knreed! I appreciate the time you spent investigating.
Unfortunately, since I changed the project allocation in the device profiles at the beginning of the week, the last few days and hours are not really representative.

I do not have accurate monitoring or records of invalid WUs. However, my estimate is around 50 invalid WUs over 3 or 4 weeks between the end of June and July 27th.
Invalid WUs are particularly frustrating since there is no indication of why a WU was declared invalid.
Depending on the project, that represents a lot of hours.

I've tried several times to compute SN2S, DSFL, and GFAM, with very similar results:
- Sometimes it is good
- Sometimes it is bad
- Sometimes it is really bad.

I observe a similar behaviour with VINA-based FA@H WUs.

Over the years, I have not had any real concerns with invalid or errored WUs (probably a few ppm). At the beginning of a new project some issues can occur, and I accept that.
The current problem with VINA is that suddenly a host (sometimes several hosts) computes correctly, but in the end a large share of its WUs (sometimes every WU) are declared invalid without any reason given.
Temperature could be an issue, but the trouble also occurred during the winter. Additionally, the machines are well ventilated and the CPU temperatures (for the AMD systems) are below 60°C:
- Phenom II x6 1090: 57°C
- Phenom II x6 1055: 50°C
- Athlon II x4 640: 40°C

After I noticed this problem, I avoided VINA-based projects. Today, I have no choice except CEP2, which is not the easiest project in terms of resources and efficiency (even with an SSD).

My post was also prompted by WCG's silence. We did not know anything except vague indications about some possible future projects. Reading here, I notice that I am not alone in feeling left in the "fog".

Again I thank you all for the feedback and I hope that we will experience again more satisfying times in the future.

Cheers,
Yves
----------------------------------------
[Aug 1, 2013 11:18:11 AM]
RCC_Survivor
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 1337
Re: Tired and frustrated - time for leaving?

There is a good tool to analyze problem patterns.
WCGDAWS
WCG Device and Workunit Stats
http://www.wcgdaws.com/forums/index.php

I am running a mix of Intel, AMD, and SnapDragonPro processors.
Windows 7 and Android 4.1.2 OS.
Running FAAH exclusively.
I am seeing errors across all of them, but primarily on the SnapDragonPros.
I am seeing invalids only on the SnapDragonPros.

The i7-2700K and FX-8350 are overclocked to 4.5 GHz.
All others are running stock.
I do not run BOINC as a service.

If I was having a problem with a particular chip and OS I would also look at drivers and software versions.
VINA is new and there may be issues with earlier versions of OS, drivers, or BOINC.
If all of my devices fail then it still could be my problem so I would make sure everything is in order before assigning the problem elsewhere.

Having said that, WCG has a tool called Beta testing that could help narrow the problem.
They also have ways to identify the failure in a more specific manner.
Based on the info available to me I can't tell you the cause of any single error.
I would expect WCG and IBM to do that.
Here is an example:
Result Log

Result Name: FAHV_x3AO1_IN_FBP_0047561_1177_3--
<core_client_version>7.2.8</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[15:50:10] Number of tasks = 5
[15:50:10] Starting task 0,CPU time is 0.000000.
[15:50:10] ./ZINC32738145.pdbqt size = 27 6 ../../projects/www.worldcommunitygrid.org/fahv.x3AO1_IN_FBP.pdbqt size = 2637 0
[16:19:04] Finished task #0 cpu time used 1716.310000
[16:19:04] Starting task 1,CPU time is 1716.310000.
[16:19:04] ./ZINC32738147.pdbqt size = 36 9 ../../projects/www.worldcommunitygrid.org/fahv.x3AO1_IN_FBP.pdbqt size = 2637 0
SIGSEGV: segmentation violation

Exiting...

</stderr_txt>
]]>

Code 193?
Segmentation violation?
I don't have a clue.
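The three numbers in "process exited with code 193 (0xc1, -63)" are one value written three ways: decimal, hexadecimal, and the low byte read as a signed 8-bit integer (193 - 256 = -63). The "SIGSEGV: segmentation violation" and "Exiting..." lines suggest the application's own signal handler caught the crash and exited with a fixed code; my understanding is that BOINC reserves 193 for this case (a constant in BOINC's error_numbers.h), but treat that mapping as an assumption and check it against the BOINC source. A small sketch of the number conversion (`describe_exit_code` is a hypothetical helper):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Format an exit status like the BOINC log line: decimal, hex, signed byte."""
    if not 0 <= code <= 255:
        raise ValueError("exit statuses are in the range 0..255")
    signed = code - 256 if code >= 128 else code  # reinterpret as signed 8-bit
    return f"{code} ({code:#x}, {signed})"


if __name__ == "__main__":
    print(describe_exit_code(193))  # 193 (0xc1, -63)
    # A process killed outright by SIGSEGV is reported by POSIX shells such as
    # bash as 128 + 11 = 139, so 193 looks like a value the application passed
    # to exit() itself after catching the signal.
    print(128 + int(signal.SIGSEGV))  # 139 where SIGSEGV is 11, as on Linux
```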
----------------------------------------
Be kinder than necessary, for everyone you meet is fighting some battle.

Please join the team The survivors hugs
Bilateral Renal, Melanoma, and Squamous Cell cancers
[Aug 1, 2013 5:53:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tired and frustrated - time for leaving?

Again - with this data I am hard pressed to find a generalized issue. Looking at your machines specifically I do see that they return 100% invalids. I would encourage you to upgrade from the 5.10 client you are running to a more current version. A lot has changed since the 5.10 client and there is a chance that is impacting things.


I would but I cannot:
http://boinc.berkeley.edu/wiki/Release_Notes_...e_with_Domain_Controllers

BOINC 7 incompatible with Domain Controllers

The present range of BOINC 7 is incompatible with Domain Controllers, meaning that you cannot install it on your system if it is a DC. This is because the developers used the Local Account APIs instead of the Global Account APIs.

Install BOINC 5.10.45 instead, even though this doesn't support GPUs or multi-threading applications.

BOINC 5.10.45 32bit version
BOINC 5.10.45 64bit version


These systems run AD (Active Directory) for a home setup. Due to restrictions on the BOINC client, I'm already on the latest version that can be installed. Thanks for checking though.
[Aug 2, 2013 4:47:05 AM]
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Re: Tired and frustrated - time for leaving?

Eagerly awaiting the update set for today.
----------------------------------------


- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Aug 2, 2013 12:11:19 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Re: Tired and frustrated - time for leaving?

Temperature could be an issue but the troubles occurred also during the Winter time. Additionally, the machines are well ventilated and the CPU temperatures (for the AMD systems) are below 60°C
- Phenom II x6 1090: 57°C
- Phenom II x6 1055: 50°C
- Athlon II x4 640: 40°C

57°C on the X6 1090T is hot. I've got one, and even with the stock AMD cooler the maximum temperature was around 45°C as measured by Core Temp, which also reports the absolute maximum allowed temperature for this CPU as 67°C.

With a better cooler (at least when it comes to fan noise), the current temperature is 41°C with the room at 27°C.


As for VINA, I've not tried the new FAAH application yet, but at least for the previous WCG projects using VINA I don't remember any problems with them on my AMD X6 1090T running Windows 7 64-bit.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Aug 2, 2013 12:51:51 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Re: Tired and frustrated - time for leaving?

Since Knreed performed some investigations, I also did my own "homework".
I compiled around 42'000 results collected since November 2012 using WCGDAWS.
Because I was often travelling on business, I probably missed around 4 to 6 weeks of work between November 2012 and July 2013.

I calculated the (error + invalid) / valid ratio for various projects, CPUs, and operating systems. I excluded FA@H from this analysis since I did not have time to separate AutoDock-based WUs from VINA-based WUs.
Likewise, I excluded HCC from some investigations, since I was not able to distinguish between CPU and GPU WUs. I also excluded Beta WUs.

I have to admit that the average ratio is never higher than 3.5% over a period of several months, based on CPU elapsed time (which I used as the reference for this investigation).
Even if the ratio is not that high (though it is higher with VINA-based projects than with non-VINA projects), the "wasted" CPU time exceeds 2'500 hours over 8 calendar months.
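Weighting by CPU time rather than by result count means one invalid 20-hour WU counts as much as twenty invalid 1-hour ones. A minimal sketch of that weighting, assuming a hypothetical per-result export with project, CPU hours and outcome (`WorkUnit` and `time_weighted_loss` are illustrative names). It reports lost time as a share of all CPU time; dividing by valid CPU time instead gives a slightly larger figure for the same data. The last line shows why a small ratio can still mean many hours: if both figures describe the same set of results, at most 3.5% combined with more than 2'500 lost hours implies at least roughly 71'400 CPU hours in total.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class WorkUnit:
    # Hypothetical fields, e.g. from a WCGDAWS or spreadsheet export.
    project: str      # project code such as "GFAM" or "DSFL"
    cpu_hours: float  # CPU time spent on this result
    outcome: str      # "valid", "invalid" or "error"


def time_weighted_loss(wus: Iterable[WorkUnit]) -> Tuple[float, float]:
    """Return (percent of CPU time lost to errors/invalids, lost CPU hours)."""
    items = list(wus)
    total = sum(w.cpu_hours for w in items)
    lost = sum(w.cpu_hours for w in items if w.outcome in ("invalid", "error"))
    if total == 0:
        return 0.0, 0.0
    return 100.0 * lost / total, lost


if __name__ == "__main__":
    # Toy numbers, not the data from this thread.
    sample = [
        WorkUnit("GFAM", 10.0, "valid"),
        WorkUnit("GFAM", 6.0, "invalid"),
        WorkUnit("DSFL", 4.0, "error"),
    ]
    print(time_weighted_loss(sample))  # (50.0, 10.0)
    # lost = ratio * total, so total = lost / ratio: 2500 h / 0.035
    print(round(2500 / 0.035))  # 71429
```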

Overall analysis: November 2012 - July 2013

VINA projects (without FA@H): November 2012 - July 2013

I performed this investigation using LibreOffice spreadsheets. As soon as I have more free time, I will try to put this information into a database to generate reports.
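For the database step, Python's built-in `sqlite3` module is one low-effort option. A sketch under assumptions: the table layout, column names and the `build_report` helper are hypothetical, and the report applies CPU-time weighting per project, OS and CPU, worst first. Keying on the result name with `INSERT OR IGNORE` means re-importing an overlapping export does not double count results.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    name      TEXT PRIMARY KEY,  -- result name, so re-imports do not double count
    project   TEXT NOT NULL,     -- e.g. 'GFAM', 'DSFL', 'SN2S'
    os        TEXT NOT NULL,
    cpu       TEXT NOT NULL,
    cpu_hours REAL NOT NULL,
    outcome   TEXT NOT NULL CHECK (outcome IN ('valid', 'invalid', 'error'))
)
"""

# CPU-time-weighted loss per project / OS / CPU, worst first.
REPORT = """
SELECT project, os, cpu,
       COUNT(*) AS n,
       ROUND(100.0 * SUM(CASE WHEN outcome <> 'valid' THEN cpu_hours ELSE 0 END)
             / SUM(cpu_hours), 2) AS lost_pct,
       SUM(CASE WHEN outcome <> 'valid' THEN cpu_hours ELSE 0 END) AS lost_hours
FROM results
GROUP BY project, os, cpu
ORDER BY lost_pct DESC
"""


def build_report(rows):
    """Load (name, project, os, cpu, cpu_hours, outcome) tuples and run the report."""
    conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
    try:
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(REPORT).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Toy rows for illustration only.
    rows = [
        ("r1", "GFAM", "Linux", "Phenom II X6 1090T", 10.0, "valid"),
        ("r2", "GFAM", "Linux", "Phenom II X6 1090T", 10.0, "invalid"),
        ("r3", "DSFL", "Linux", "Athlon II X4 640", 4.0, "valid"),
    ]
    for row in build_report(rows):
        print(row)
```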

Regarding the Phenom II X6 temperature: 57°C is below 67°C.
With the current room temperature around 30°C (outside between 33°C and 40°C), the cooling is OK.
I would like to emphasize that the problem of invalid results with VINA also occurred during the winter, with a CPU temperature of around 52°C.
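Both temperature posts compare readings against Core Temp's reported 67°C limit; subtracting room temperature as well gives a rough sense of how hard each cooler is working. A small sketch using only figures quoted in this thread. Pairing the 57°C reading with the ~30°C room is an assumption, since the two were posted at different times and possibly read with different tools, and the 67°C limit is Core Temp's figure as reported by Ingleside, not a verified datasheet value.

```python
# Readings quoted in this thread, in °C. Not simultaneous measurements.
LIMIT_C = 67.0  # maximum for the X6 1090T as reported by Core Temp (per Ingleside)

READINGS = {
    # label: (cpu_temp_c, room_temp_c)
    "Ingleside, X6 1090T, aftermarket cooler": (41.0, 27.0),
    "KerSamson, X6 1090T": (57.0, 30.0),
}


def headroom(cpu_c: float, limit_c: float = LIMIT_C) -> float:
    """Degrees left before the reported maximum."""
    return limit_c - cpu_c


def rise_over_ambient(cpu_c: float, room_c: float) -> float:
    """CPU temperature minus room temperature, a rough measure of cooling."""
    return cpu_c - room_c


if __name__ == "__main__":
    for label, (cpu_c, room_c) in READINGS.items():
        print(f"{label}: headroom {headroom(cpu_c):.0f} °C, "
              f"rise over room {rise_over_ambient(cpu_c, room_c):.0f} °C")
```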

Cheers,
Yves
----------------------------------------
[Aug 3, 2013 11:02:37 AM]