World Community Grid Forums
Thread Status: Active Total posts in this thread: 28
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 981 Status: Offline Project Badges:
|
Hi!
A quick question: what's the maximum replication for an ARP1 WU? I just got one (ARP1_033892_129). So far it shows 3 No Reply, 2 User Aborted, 2 In Progress, and 1 Error! Thanks 😎
||
|
|
Dinesh123
Cruncher Joined: Sep 2, 2013 Post Count: 3 Status: Offline Project Badges:
|
Facing similar error issues. Around 90% of the results are showing up as errors for me. I have a total of 47 errors for the African Rainfall Project in the past few days. Previously, when IBM hosted the project, I saw maybe 1 error in 50 results, but never on this scale.
|
||
|
|
spRocket
Senior Cruncher Joined: Mar 25, 2020 Post Count: 280 Status: Offline Project Badges:
|
I just checked my currently shown ARP units, and the only "interesting" one was ARP1_0015034_132_5, Server Aborted, but that one had been successfully validated before my system got to crunching it. Everything else is either valid or pending validation.
I babysat stuck transfers, though (particularly the downloads). I suspect a lot of errors in general (not just ARP) are transfers that got stuck for too long.
Edit: As of this writing, the general supply of work units has dried up. I guess there's some more behind-the-scenes work going on, but today there has been a noticeable overall improvement in the stuck transfer department.
[Edit 1 times, last edit by spRocket at Aug 29, 2022 8:45:14 PM]
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
What sort of errors? And how long does it usually take to run an ARP1 task on your system(s)?
If you are suffering file download errors (file size error or checksum error, not transient errors), there's not a lot that can be done about that, but it should be possible to evade some other errors...
If you are getting lots of errors where the only text in the report sent to WCG is a line describing the client in use, that probably means the task hadn't started before the deadline, so the client cancelled it and sent a "Not started by deadline" message to the server. As there might be [long] delays downloading work at present, and the deadline is based on request time, not file receipt time, it may be a good idea to shrink your work cache size to reduce the likelihood of jobs being unable to start in time, or to alter your device profile(s) on the server to reduce the total number of ARP1 tasks it will send to a system.
If you are getting application failures (SIGSEGV or SIGILL on Linux; Access violation or Illegal instruction [I think] on Windows), that's a completely different matter, and if you give more details someone might be able to help...
Cheers - Al.
[Edit 1 times, last edit by alanb1951 at Aug 29, 2022 8:47:08 PM]
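To make the deadline point concrete, here is a minimal Python sketch of the timing argument: the deadline clock starts at request time, so download delay and the work already queued in the cache both eat into it. The function name, its parameters, and the example durations are all hypothetical and chosen only for illustration; this is not how the BOINC client computes anything internally.

```python
from datetime import datetime, timedelta

def can_start_before_deadline(requested_at, deadline, download_delay, queued_work):
    """Rough check of whether a task could start before its deadline.

    Hypothetical model: the task becomes usable only once its files have
    arrived, and it then waits behind the work already in the cache.
    """
    ready_at = requested_at + download_delay   # files usable only after download
    earliest_start = ready_at + queued_work    # tasks ahead of it run first
    return earliest_start < deadline

# Illustrative numbers only (the real ARP1 deadline is not stated in this thread).
requested = datetime(2022, 8, 29, 12, 0)
deadline = requested + timedelta(days=3)

small_cache = can_start_before_deadline(
    requested, deadline, timedelta(hours=12), timedelta(days=1))
large_cache = can_start_before_deadline(
    requested, deadline, timedelta(hours=12), timedelta(days=3))
print(small_cache, large_cache)
```

Under these made-up numbers, the smaller cache leaves room to start in time while the larger one does not, which is the reasoning behind shrinking the work cache while downloads are slow.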
||
|
|
Vester
Senior Cruncher USA Joined: Nov 18, 2004 Post Count: 325 Status: Offline Project Badges:
|
I have had no errors with over 800 valid results. Intel i9-10850K running at 4.9GHz.
||
|
|
Dinesh123
Cruncher Joined: Sep 2, 2013 Post Count: 3 Status: Offline Project Badges:
|
Each task takes a little over a day to complete. I'm running the tasks on an i7-3610QM; it has 4 cores and 8 threads, with 8 GB RAM. It completes 8 tasks per day, and I'm on Ubuntu 20.04 LTS.
The tasks that have been downloaded, run for a day, completed successfully, and uploaded to WCG are showing up as "Error" under Status on the WCG website's results page, so no credit is given for the completed tasks.
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
What does the Error status link show for these tasks? There will be something there if you click on the link; if nothing else, it will show whether the job got going properly and, if so, how far it got before failing. If it says it's an Error, either it didn't complete successfully, or it had trouble during task wrap-up (perhaps "Finish file present too long"...).
Having seen your machine specification, there are so many possibilities I can think of that I'm not going to list the rest. For anyone to help you, we need to know what the error messages are!
Cheers - Al.
P.S. One of my larger systems is a 4-core, 8-thread i7-7700K with 16 GB RAM, on XUbuntu 20.04. I only let BOINC have 6 out of 8 threads, and won't allow it to run more than two ARP1 tasks at once, and that system can turn an ARP1 task around in 11 to 15 hours.
[Edit 1 times, last edit by alanb1951 at Aug 30, 2022 6:50:24 PM]
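One common way to cap concurrent tasks for a single application on a BOINC client is an `app_config.xml` file with a `max_concurrent` entry, placed in the project's directory. The post does not say which mechanism is used here, so the Python sketch below is only an illustration: it builds such a file's contents, the application short name `arp1` is an assumption that should be checked against the client's own task listing, and `build_app_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml contents limiting one app's concurrent tasks.

    app_name is the project's short application name; "arp1" below is an
    assumption, not something confirmed in this thread.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")

config_text = build_app_config("arp1", 2)
print(config_text)
```

The client would need to re-read its config files (or be restarted) before a change like this could take effect.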
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2492 Status: Offline Project Badges:
|
Well, one can't possibly run 8 ARP1s at a time on an 8 GB RAM computer without getting into serious resource trouble. Check the System Requirements at the following link. Each ARP1 can use as much as 1 GB of memory. The rest of the system needs memory too, of course.
With 8 ARP1s at a time on an 8 GB computer, there's got to be some serious swapping going on all the time, and errors will occur, I'm sure.
https://www.worldcommunitygrid.org/help/topic.s?shortName=minimumreq
[Edit 4 times, last edit by Grumpy Swede at Aug 30, 2022 7:10:47 PM]
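The memory arithmetic above can be sketched in a few lines of Python. The 1 GB per task figure comes from the post; the 2 GB reserved for the operating system and everything else is purely an assumed placeholder, and `max_arp1_tasks` is a hypothetical helper name.

```python
def max_arp1_tasks(total_ram_gb, per_task_gb=1.0, reserved_gb=2.0):
    """Estimate how many ARP1 tasks fit in RAM without swapping.

    per_task_gb: worst-case memory per task (up to 1 GB per the post).
    reserved_gb: assumed headroom for the OS and other programs.
    """
    usable = max(total_ram_gb - reserved_gb, 0.0)
    return int(usable // per_task_gb)

print(max_arp1_tasks(8))    # 6 with the assumed 2 GB reserve
print(max_arp1_tasks(16))   # 14 with the same reserve
```

Even with this fairly generous estimate, 8 simultaneous tasks do not fit on an 8 GB machine, and a larger reserve lowers the count further.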
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
I am running an i7-3770 but with 20 GB RAM. The jobs slow down significantly with more than 4 ARP tasks, although I can run 8. I try to limit it to 4 ARP and run other projects on the other 4 threads.
ARP is very intense at every checkpoint (each 12.5%) and also when downloading and uploading. If I find units checkpointing at about the same time, I suspend the second unit for a couple of minutes to keep them apart. It doesn't matter which checkpoint; they are all intense.
There is also a possibility that the upload times out, depending on the bandwidth available at each end.
Mike
||
|
|
Dinesh123
Cruncher Joined: Sep 2, 2013 Post Count: 3 Status: Offline Project Badges:
|
Thanks for the reply. I checked now and it says download error. It might be bandwidth issues, and the website also seems to have a few bugs, as valid tasks seem to disappear from the list on the results page after a few days. I guess the error message was for the tasks that weren't downloaded properly, not for the completed tasks. Completed tasks might have become valid and disappeared from the results page.
Here is the report:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>ARP1_0004208_134_ARP1_0004208_input_d02</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
</message>
]]>
I'm running 8 tasks and my RAM usage is around 6.2 GB out of 7.7 GB available, and there seems to be no issue with L2 cache usage either (I checked it using cachestat). So I thought there was no issue in running 8 tasks. I have been running like this for years and have not faced any issues. I also run climateprediction projects. When those tasks are of lower resolution I run 8 of them, and when the resolution is higher I run 4.
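For anyone sifting through many such reports, a small Python sketch like the following could pull the file name and error code out of the text. It treats the report as plain text rather than well-formed XML (the pasted fragment has no single root element), and `parse_transfer_errors` is a hypothetical helper written for this illustration, not part of BOINC.

```python
import re

# The report quoted in the post, reproduced as a plain string.
REPORT = """<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>ARP1_0004208_134_ARP1_0004208_input_d02</file_name>
  <error_code>-200 (wrong size)</error_code>
</file_xfer_error>
</message>
]]>"""

def parse_transfer_errors(report):
    """Return (file_name, error_code, reason) tuples found in a report."""
    pattern = re.compile(
        r"<file_name>(?P<name>[^<]+)</file_name>\s*"
        r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>"
    )
    return [(m["name"], int(m["code"]), m["reason"])
            for m in pattern.finditer(report)]

errors = parse_transfer_errors(REPORT)
print(errors)
```

A "wrong size" result like this one falls into the file download error category described earlier in the thread, as opposed to an application failure.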
||
|
|
|