| World Community Grid Forums
Thread Status: Active Total posts in this thread: 27
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
"Hi adriverhoef, IKWYM. However, I do not see why the problem should be related to the content of the boinc data. Nevertheless, if nothing else helps, I will try it. Cheers, Yves"

OK Yves, let me explain my thinking: a boinc directory is tied to the CPU/hardware and to the OS (the operating system, with its programs, libraries and what have you) on which boinc runs. So if you switch the boinc directories between the machines (which should be possible if the OSs are compatible), there is no need to install a complete operating system just to look for differences, if any.

So, if you switch the boinc directories, will the Invalids move to the other machine along with the boinc directory, or will they stay on the same machine?

[Edit 1 times, last edit by adriverhoef at Jul 20, 2019 9:28:24 AM]
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi adriverhoef,

would a project detach / fresh boinc install / project attach have a similar impact?

My feeling is that you raise an interesting possible cause, since the boinc directory is very old (about 10 years) and has been migrated several times through multiple hardware and OS updates:
- Athlon II x2, Athlon II x4, Ryzen 2700
- Ubuntu 10.04, Ubuntu 14.04, LinuxMint 17, Ubuntu 18.04

Cheers,
Yves
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
In the BOINC log, at startup it prints the versions of several libraries that BOINC uses. Are they the same between the two machines (Debian vs Ubuntu)? If not, I would see if the errors (invalids) follow the OS.
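A minimal sketch of that comparison, assuming each machine's startup log contains one line with the text `Libraries:` followed by `name/version` tokens. The two sample lines are hypothetical and not taken from the machines in this thread; the function names are made up for illustration.

```python
def parse_library_versions(log_text):
    """Return {library: version} from the first log line containing 'Libraries:'."""
    for line in log_text.splitlines():
        if "Libraries:" in line:
            tokens = line.split("Libraries:", 1)[1].split()
            return dict(t.split("/", 1) for t in tokens if "/" in t)
    return {}


def diff_versions(a, b):
    """Return {library: (version_a, version_b)} for every library that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Hypothetical sample lines (assumed format, invented version numbers).
log_machine_a = "20-Jul-2019 09:00:00 [---] Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11"
log_machine_b = "20-Jul-2019 09:00:00 [---] Libraries: libcurl/7.47.0 OpenSSL/1.0.2g zlib/1.2.11"

differences = diff_versions(
    parse_library_versions(log_machine_a),
    parse_library_versions(log_machine_b),
)
print(differences)
```

An empty result would mean the two logs report the same libraries, which would point away from the OS as the cause.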
|
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
"Hi adriverhoef, would a project detach / fresh boinc install / project attach have a similar impact?"

On the offending device? We can't tell, Yves, because we don't yet know what causes the Invalids or where the problem lies. Doneske also has a good suggestion with which you could start investigating.

You mention that you have a very old boinc directory. The interesting question is indeed whether the age of the boinc directory (i.e. the contents of the configuration files) matters at all. I would say, and I hope, that it doesn't, since otherwise more people would probably be affected by this. The oldest boinc directory that I can find at home dates from February 2018.
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Hi Doneske,

Thank you! I will check the libraries next week (the machines are located in my office).

Cheers,
Yves
||
|
|
Country Bumkin
Cruncher Australia Joined: May 14, 2008 Post Count: 14 Status: Offline Project Badges:
|
A complete guess but use the package manager to confirm the Ubuntu machine has the package amd64-microcode installed and is up to date. It is standard with Linux Mint 19 but confirm it is up to date.
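One way to script that check on a Debian-family system is sketched below. It assumes `dpkg-query` is available and only reports whether the package is installed and at which version; whether that version is the newest one still has to be confirmed with the package manager. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_dpkg_status(output):
    """Split '<want> <error> <state> <version>' into (installed, version)."""
    parts = output.split()
    installed = parts[:3] == ["install", "ok", "installed"]
    version = parts[3] if installed and len(parts) > 3 else None
    return installed, version


def query_package(name="amd64-microcode"):
    """Ask dpkg-query about one package; (False, None) if dpkg does not know it."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status} ${Version}", name],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False, None
    return parse_dpkg_status(result.stdout)


if shutil.which("dpkg-query"):  # skip quietly on systems without dpkg
    print(query_package())
```

Running it on both machines and comparing the two version strings shows whether they carry the same microcode package.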
----------------------------------------
Regards C Bumkin
----------------------------------------[Edit 1 times, last edit by Country Bumkin at Jul 22, 2019 3:05:23 AM] |
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Update #1
--------------------------------------------
After close verification and comparison between the two 2700 machines:
- Libraries are OK
- Microcode is OK
- CPU release/masks are identical

Yesterday I ran the machine dry. Afterwards:
- Project reset (clean boincdata directory)
- Machine detached from WCG
- Reboot
- Machine reattached to WCG

The machine computed SCC WUs for a couple of hours:
- at this time about 10% of the work is considered valid
- the rest of the performed work is invalid or pending

The next step will be to switch to LinuxMint. If that does not help, the cause must be a CPU failure or a very curious RAM failure.
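The microcode and CPU comparison in this update could be scripted roughly as follows. This is a sketch under assumptions: it reads the `model name`, `stepping`, `microcode` and `flags` fields that Linux exposes in `/proc/cpuinfo` on x86, and it treats "CPU release/masks" as meaning stepping and flags, which is a guess. The sample texts and their values are invented.

```python
FIELDS = ("model name", "stepping", "microcode", "flags")


def cpu_fingerprint(cpuinfo_text):
    """Collect the chosen fields from the first processor entry of /proc/cpuinfo text."""
    found = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if found:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key in FIELDS:
            # Sort the flags so that a different ordering is not reported as a mismatch.
            found[key] = " ".join(sorted(value.split())) if key == "flags" else value.strip()
    return found


def mismatches(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {k: (a.get(k), b.get(k)) for k in FIELDS if a.get(k) != b.get(k)}


# Invented sample data; on real machines use open("/proc/cpuinfo").read() on each host.
machine_a = (
    "processor\t: 0\nmodel name\t: AMD Ryzen 7 2700 Eight-Core Processor\n"
    "stepping\t: 2\nmicrocode\t: 0x800820b\nflags\t\t: fpu vme sse2\n\nprocessor\t: 1\n"
)
machine_b = machine_a.replace("0x800820b", "0x8008206")

print(mismatches(cpu_fingerprint(machine_a), cpu_fingerprint(machine_b)))
```

An empty dictionary corresponds to the "identical" result reported above.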
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It could be bad or intermittent RAM or CPU caches; even the swap file could be sitting in a corrupt disk area.
|
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
With 16 GB RAM, the swap file stays at 0.

In the meantime I have about 60 valid WUs, about 25 invalid WUs and about 120 pending WUs.
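To put those rough counts in proportion, a small sketch that computes how much of the work is still pending and what share of the already decided results came back invalid. The function name is hypothetical and the inputs are the approximate figures quoted above.

```python
def summarize(valid, invalid, pending):
    """Return the total, the pending share, and the invalid rate among decided results."""
    total = valid + invalid + pending
    decided = valid + invalid
    return {
        "total": total,
        "pending_share": pending / total if total else 0.0,
        "invalid_rate_of_decided": invalid / decided if decided else 0.0,
    }


stats = summarize(valid=60, invalid=25, pending=120)
print(
    f"{stats['total']} results, {stats['pending_share']:.0%} pending, "
    f"{stats['invalid_rate_of_decided']:.0%} of decided results invalid"
)
```

Since most results are still pending, the invalid rate among decided results can shift a lot as validation catches up.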
||
|
|
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline Project Badges:
|
Update #2
-------------------------------------------
Good news and an interesting case!

After the purge of the boinc data directory (project reset) at about 2019-07-26 00:30 UTC, the machine generated only 21 invalid WUs (in the end, all within the first day); the rest of the performed work (SCC and ZIKA) is valid.

Many thanks to Adri for suggesting that a possible cause could lie in the content of the boinc data directory, including the boinc client related files (exactly what the cause was remains open). As mentioned, this directory was about 10 years old and went through multiple migrations over the years related to boinc upgrades, hardware changes and OS changes (from Ubuntu 10.04 to LinuxMint 19, via 12.04, 14.04 and LinuxMint 17).

Thank you all for your valuable input. I would be very interested in feedback from the TechTeam, just to understand (if possible) what happened over the years.

Cheers,
Yves

---
PS: I will continue to monitor closely what this machine does in the near future.
----------------------------------------
[Edit 2 times, last edit by KerSamson at Jul 28, 2019 7:45:35 AM]
||
|
|
|