| World Community Grid Forums
Thread Status: Active Total posts in this thread: 11
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Hi everybody
Similar to a thread posted almost a year ago, I'm having some difficulty running BOINC on FreeBSD. Since I couldn't see a resolution there, I'm starting a new thread. I am using the old BOINC port 6.4.5, compiled with Linux emulation support.

What I'm witnessing is quite odd. Some tasks run and complete apparently successfully (pending validation at the moment), whilst others go more or less straight from "Starting task xxx" to "Computation for task xxx finished". Looking at the BOINC logs under results status, in each failure case I'm seeing one of two signals, Illegal Instruction or Segmentation Violation, along with messages such as "process exited with code 193 (0xc1, -63)", "process got signal 4" etc.

Here's the top of messages:

Tue Nov 8 04:26:14 2011||Starting BOINC client version 6.4.5 for x86_64-pc-freebsd
Tue Nov 8 04:26:14 2011||log flags: task, file_xfer, sched_ops
Tue Nov 8 04:26:14 2011||Libraries: libcurl/7.21.3 OpenSSL/0.9.8q zlib/1.2.3 libssh2/1.3.0
Tue Nov 8 04:26:14 2011||Data directory: /var/db/boinc
Tue Nov 8 04:26:14 2011||Processor: 4 amd64 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz [] [sse sse2 sse3 mmx]
Tue Nov 8 04:26:14 2011||Processor features:
Tue Nov 8 04:26:14 2011||OS: FreeBSD: 8.2-RELEASE
Tue Nov 8 04:26:14 2011||Memory: 15.91 GB physical, 0 bytes virtual
Tue Nov 8 04:26:14 2011||Disk: 1.93 GB total, 1.66 GB free
Tue Nov 8 04:26:14 2011||Local time is UTC +0 hours
Tue Nov 8 04:26:14 2011||Not using a proxy
Tue Nov 8 04:26:14 2011||Can't load library libcudart
Tue Nov 8 04:26:14 2011||No coprocessors
Tue Nov 8 04:26:14 2011||No general preferences found - using BOINC defaults
Tue Nov 8 04:26:14 2011||Preferences limit memory usage when active to 8144.11MB
Tue Nov 8 04:26:14 2011||Preferences limit memory usage when idle to 14659.40MB
Tue Nov 8 04:26:14 2011||Preferences limit disk usage to 0.97GB
Tue Nov 8 04:26:14 2011||This computer is not attached to any projects
Tue Nov 8 04:26:14 2011||Visit http://boinc.berkeley.edu for instructions
Tue Nov 8 04:29:00 2011||Fetching configuration file from http://www.worldcommunitygrid.org/get_project_config.php
Tue Nov 8 04:29:12 2011||Running CPU benchmarks
Tue Nov 8 04:29:12 2011||Suspending computation - running CPU benchmarks
Tue Nov 8 04:29:19 2011|World Community Grid|Master file download succeeded
Tue Nov 8 04:29:24 2011|World Community Grid|Sending scheduler request: Project initialization. Requesting 1 seconds of work, reporting 0 completed tasks
Tue Nov 8 04:29:29 2011|World Community Grid|Scheduler request completed: got 1 new tasks
Tue Nov 8 04:29:29 2011||General prefs: from World Community Grid (last modified 20-Oct-2011 23:01:56)
Tue Nov 8 04:29:29 2011||Host location: none
Tue Nov 8 04:29:29 2011||General prefs: using your defaults
Tue Nov 8 04:29:29 2011||Preferences limit memory usage when active to 8144.11MB
Tue Nov 8 04:29:29 2011||Preferences limit memory usage when idle to 11401.76MB
Tue Nov 8 04:29:29 2011||Preferences limit disk usage to 0.97GB

I know there isn't much disk space allocated - I'll shift it to another partition shortly - but I'm pretty sure that isn't the primary issue here. I'm not seeing any disk space exceeded messages anywhere, and even my 8 thread box only uses ~ 0.9 - 1.1GB.

Tue Nov 8 04:32:53 2011|World Community Grid|Starting M0000003620519201104141719_0
Tue Nov 8 04:32:53 2011|World Community Grid|Starting task M0000003620519201104141719_0 using hcc1 version 640
Tue Nov 8 04:32:54 2011|World Community Grid|Finished download of DSFL_00000059_0000054_0429_DSFL_00000059_0000054_0429.job
Tue Nov 8 04:32:54 2011|World Community Grid|Finished download of DSFL_00000059_0000054_0429_DSFL_00000059_0000054_0429.zip
Tue Nov 8 04:32:54 2011|World Community Grid|Starting oy911_00062_1
Tue Nov 8 04:32:54 2011|World Community Grid|Starting task oy911_00062_1 using hpf2 version 640
Tue Nov 8 04:32:55 2011|World Community Grid|Starting DSFL_00000059_0000054_0429_0
Tue Nov 8 04:32:55 2011|World Community Grid|Starting task DSFL_00000059_0000054_0429_0 using dsfl version 619
Tue Nov 8 04:32:56 2011|World Community Grid|Computation for task DSFL_00000059_0000054_0429_0 finished
Tue Nov 8 04:32:56 2011|World Community Grid|Output file DSFL_00000059_0000054_0429_0_0 for task DSFL_00000059_0000054_0429_0 absent
Tue Nov 8 04:33:03 2011|World Community Grid|Computation for task oy911_00062_1 finished
Tue Nov 8 04:33:03 2011|World Community Grid|Output file oy911_00062_1_0 for task oy911_00062_1 absent
Tue Nov 8 04:34:06 2011|World Community Grid|Sending scheduler request: To fetch work. Requesting 28604 seconds of work, reporting 2 completed tasks

Any ideas?

edit: For clarification, I should add that it is pulling in Linux tasks from WCG, as per the doc linked to from the BOINC wiki:

Tue Nov 8 04:29:31 2011|World Community Grid|Started download of wcg_hcc1_img_6.40_i686-pc-linux-gnu
Tue Nov 8 04:30:31 2011|World Community Grid|Started download of wcg_hpf2_rosetta_6.40_i686-pc-linux-gnu
Tue Nov 8 04:30:48 2011|World Community Grid|Started download of wcg_dsfl_6.19_i686-pc-linux-gnu
----------------------------------------
[Edit 2 times, last edit by IlluminAce at Nov 11, 2011 3:59:09 PM] |
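
For anyone decoding the numbers in those result logs: 193, 0xc1 and -63 are the same byte printed three ways, and signals 4 and 11 are the usual numbers for Illegal Instruction and Segmentation Violation. Here is a minimal shell sketch to check that locally. The helper name signal_name is made up for illustration, and reading 193 as "the app was killed by a signal" is an assumption about how the BOINC client reports it, not something confirmed in this thread.

```shell
#!/bin/sh
# signal_name is a hypothetical helper: map a signal number to its short name.
signal_name() {
    kill -l "$1"
}

# 0xc1 and 193 are the same value written in hex and in decimal.
printf '%d\n' 0xc1

# -63 is 193 read as a signed 8-bit value (193 - 256).
echo $((193 - 256))

# The two signals mentioned in the post.
signal_name 4    # Illegal Instruction
signal_name 11   # Segmentation Violation
```

So the three numbers in "code 193 (0xc1, -63)" carry no extra information beyond each other; the signal number is the useful part.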
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi IlluminAce,
My mind is blank on this one. You show an HPF2 and a DSFL unit failing. May I assume that there is no pattern among projects? I mean that work units succeed or fail at random, regardless of which project they belong to? Lawrence |
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Hi Lawrence,
Thanks for your response. I haven't let many run yet, but that seems to be correct. For example, I have a successful HPF2 in pending validation.

Seeing as I'm at a loss too, I have just ditched the existing build, ditched my cc_config.xml and all other files under the BOINC dir, and have rebuilt with a couple of minor changes:
> set the CONFIGURE_ARGS as per the BOINC wiki
> shifted the partition to one with a bit more space
> left all the BOINC computing preferences at their defaults.

Let's see how this goes...

edit: This already looks better. All 4 threads are now running; the most I had running simultaneously before was 2. I'll let some WUs complete, configure as per my requirements and post back with the results. Maybe it was the CONFIGURE_ARGS after all - I thought that setting the cc_config.xml appropriately would take care of that.
----------------------------------------
[Edit 1 times, last edit by IlluminAce at Nov 8, 2011 5:19:46 PM] |
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Ah, the folly of misplaced optimism - if only it were that easy...
An HCC errored after about 30 minutes, followed by a CEP after about 30 seconds, and then a FAAH after 25 minutes. All of them failed with signal 4 (Illegal Instruction), and in all cases the client only reports "Computation error" under the status field, with nothing abnormal in Messages. Meanwhile, two HCC WUs have completed successfully.

I can only think of two candidates. The first would be bad hardware, yet that seems unlikely. This is all brand new, high quality hardware with a top of the range PSU & surge protection/noise filtering, all temperatures are fine, and no other issues have been encountered, including with a CPU stress test under FreeBSD. The second would be a bug in this client port, version 6.4.5, which has since been fixed.

I notice there is mention on the port maintainer's page that: "Alternatively, you can download Linux app, brandelf it to Linux type, write a app_info.xml file for it and place it to the right place under /var/db/boinc." It sounds like he's saying I can download the Linux client, install as usual, and then run something like:
and finally create a custom app_info.xml. Has anybody any experience of this? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
You might want to visit any of the "Start Here" forum topics in this search, except the top one which is the index.
https://secure.worldcommunitygrid.org/forums/...=0&sort=1&rows=20 Maybe one of these, or the links provided in there, gives the right pointer. If I remember correctly, the proper platform identifier was needed as well, and of course the 32-bit libs, but you probably have those already since you had some good results. --//-- |
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Thanks for the pointer SekeRob. Alas, I've already perused most of those links to no avail, and the other bases are covered.
Yet I think I have identified the issue - for which I am quite embarrassed - as it seems to be hardware related after all. I tried shifting from Turbo (default) to Standard CPU and RAM settings, after which everything errored. memtest86 is bombing out instantly, and another boot-time memory test is bombing out with Invalid Opcode errors - bingo. It looks like there is an issue with one of the RAM modules after all.

edit: All modules are OK; looks like a motherboard issue now. Most odd; memtest passed fine just a month ago.

edit: It seems I jumped the gun: no hardware faults are present. I was using v4.1 of Memtest, which doesn't support Intel SB (Sandy Bridge). Version 4.2 is showing no errors.
----------------------------------------
[Edit 2 times, last edit by IlluminAce at Nov 8, 2011 8:25:51 PM] |
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Seeing as I still couldn't get the port running reliably, I tried to compile 6.13.10 from svn. This was not without issue. Some of the steps that I needed to run:

svn co http://boinc.berkeley.edu/svn/tags/boinc_core_release_6_13_10
# to build clientgui (boincmgr) with wxwidgets
ln -s /usr/local/bin/wxgtk2u-2.9-config /usr/local/bin/wx-config
./_autosetup
./configure --disable-server &> configure-output
# fix incorrect use of -ldl in Makefiles
find . -type f -print0 | xargs -0 sed -i '' -e 's/\-ldl //g'
# use stdlib.h (not malloc.h) for malloc
sed -i '' -e 's/malloc.h/stdlib.h/' clientgui/stdwx.h
# use socket.h (not network.h) for sockaddr_storage
# (note the literal newline after the backslash)
sed -i '' -e '23i\
#include "sys/socket.h"' clientgui/AccountManagerPropertiesPage.cpp
# nasty - manually hack wxwidgets 2.9 to avoid conflict with gtk
# add "#ifdef __WXMSW__ [...] #endif" around the #define on line 106, also the two #defines starting at line 133 in /usr/local/include/wx-2.9/wx/taskbar.h
# fix use of non-FreeBSD include xlocale.h
find . -type f -print0 | xargs -0 sed -i '' -e 's/xlocale.h/locale.h/g'
make &> make-output

I couldn't build boincmgr due to xlocale.h. However, even the client which did build couldn't be run, due to its use of ioctl to obtain the MAC address - this returned null, which blew up the C lib.

In the end I gave up and used one of Lars Bausch's pre-compiled FreeBSD clients, based on much more recent source than the FreeBSD port. I followed the detailed instructions on his howto page. Aside from a CA error and a couple of directory points, that worked perfectly, resulting in a flow like this:

su
# probe linux compat kernel module linux.ko
kldload linux
sysctl compat.linux.osrelease=2.6.32
# remove any boinc lockfile if need be:
rm /boinc/active/lockfile
# run boinc with alt platform specified, avoids need for app_info.xml - not WCG compatible
cd /boinc/bin/opt/boinc
su boinc
./boinc_client.alt --dir /boinc/active
# run boincmgr of any version from any source, e.g. the port
exit # (to root)
boinc_gui
# use the Select Computer menu option and provide the
# hostname from boinccmd --get_host_info
# password from cat /boinc/active/gui_rpc_auth.cfg
# if you get the error "peer certificate cannot be authenticated with known CA certificates"
# copy the file ca-bundle.crt from a recent svn checkout of boinc, e.g.
cd /boinc/src
svn co http://boinc.berkeley.edu/svn/tags/boinc_core_release_6_13_10
cp boinc_core_release_6_13_10/curl/ca-bundle.crt /boinc/active
chown boinc /boinc/active/ca-bundle.crt

However, even with this much more recent (6.12.26) precompiled client, I am encountering similar problems to the port. As I write, CEP has errored instantaneously twice, DSFL once, but a FAAH WU is running OK. I am now out of ideas. |
||
|
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline Project Badges:
|
IlluminAce,
I am sorry I can't give you useful tips, since I know nothing about FreeBSD. However, I want to warn you about your choice of BOINC clients: unless the version naming scheme has changed recently, versions with an odd second number (like 6.13.10) are Alpha versions. And, personally, if I had a tricky problem to sort out, I would stay away from alpha software, which can/will add a bunch of additional problems to the global picture. Best wishes for finding a solution. |
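
The odd/even rule described in this post is easy to check before picking a tag to build. Below is a minimal shell sketch, assuming that convention still holds for the version in question; is_dev_version is a made-up helper name, not part of BOINC.

```shell
#!/bin/sh
# is_dev_version is a hypothetical helper: succeeds when the second
# component of a dotted version string is odd (assumed to mean alpha/dev).
is_dev_version() {
    minor=$(printf '%s\n' "$1" | cut -d. -f2)
    [ $((minor % 2)) -eq 1 ]
}

# Versions mentioned in this thread.
for v in 6.4.5 6.12.26 6.13.10; do
    if is_dev_version "$v"; then
        echo "$v: development version"
    else
        echo "$v: release version"
    fi
done
```

This only looks at the number; it says nothing about whether a given build is actually stable on FreeBSD.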
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
Hi JmBoullier. I did notice a message in the stdout of 6.13.10 stating it was a development version, but wasn't aware of the naming scheme. Thanks for the info, if/when I try building boinc again I'll make sure to pick an actual release version! As for FreeBSD ignorance, you're not exactly alone: this is my first FreeBSD installation anyway!
Partly due to that, I can't help but feel the cause is going to be a dodgy system configuration somewhere - ulimit or something similar... but without being able to debug the actual WCG binaries I don't know how to get more detail! (For the sake of experimentation, I did try to attach gdb to one, but that itself caused a SIGILL when trying to continue - I wasn't exactly surprised.)

As an update, I've limited the projects to the three that have succeeded so far (HCC, FAAH & HPF2). I'm not sure that's due to anything other than coincidence & HCC/FAAH being shorter WUs (hence getting "more chances" to succeed). Additionally, I've posted the issue on the newsgroup mailing.freebsd.ports. |
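
To judge whether the per-project pattern is more than coincidence, it may help to tally finished and failed tasks per application from the client messages. The following is a rough shell/awk sketch, assuming the messages are saved one per line in the "date|project|message" layout quoted earlier in this thread, and treating an "Output file ... absent" message as a failure. tally_tasks is a made-up name, and the log file location is left to the reader.

```shell
#!/bin/sh
# tally_tasks is a hypothetical helper: count finished and failed tasks per app.
# Input: BOINC client messages, one per line, fields separated by "|".
tally_tasks() {
    awk -F'|' '
        # "Starting task <task> using <app> version <n>": remember the app for this task
        $3 ~ /^Starting task / { split($3, w, " "); app[w[3]] = w[5] }
        # "Computation for task <task> finished": count every finished task
        $3 ~ /^Computation for task / { split($3, w, " "); fin[app[w[4]]]++ }
        # "Output file <file> for task <task> absent": count as a failed task
        $3 ~ /^Output file .* absent$/ { split($3, w, " "); failed[app[w[6]]]++ }
        END { for (a in fin) printf "%s finished=%d failed=%d\n", a, fin[a], failed[a] }
    ' "$@" | sort
}
```

Run it as tally_tasks followed by the path of the saved messages file, or pipe the messages into it. Tasks still running are not listed, and a task that finishes with a computation error but does write its output file would be counted as a success here, so this is only a first approximation.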
||
|
|
IlluminAce
Cruncher UK Joined: Jan 25, 2009 Post Count: 24 Status: Offline Project Badges:
|
An extended Prime95 torture test run under FreeBSD has thrown up an error.
I am considering this resolved. Cause: unstable SB (Sandy Bridge) CPU. |
||
|
|
|