Total posts in this thread: 28
This topic has been viewed 7029 times and has 27 replies
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 981
Re: Multiple Errors

Hi!
A quick question: what's the maximum replication for an ARP1 WU? I just
got one (ARP1_033892_129).
So far: 3 No Reply, 2 User Aborted, 2 In Progress, and 1 Error!

Thanks😎
[Aug 29, 2022 4:35:05 PM]
Dinesh123
Cruncher
Joined: Sep 2, 2013
Post Count: 3
Re: Multiple Errors

I'm facing similar error issues. Around 90% of my results are showing up as errors. I have had a total of 47 errors for the African Rainfall Project in the past few days. Previously, when IBM hosted the project, I saw maybe 1 error result in 50, but never on this scale.
[Aug 29, 2022 7:17:27 PM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 280
Re: Multiple Errors

I just checked my currently-shown ARP units, and the only "interesting" one was ARP1_0015034_132_5, Server Aborted, but that one had been successfully validated before my system got to crunching it. Everything else is either valid or pending validation.

I babysat stuck transfers, though (particularly the downloads). I suspect a lot of errors in general (not just ARP) are transfers that got stuck for too long.

Edit: As of this writing, the general supply of work units has dried up. I guess there's some more behind-the-scenes work going on, but today there has been a noticeable overall improvement in the stuck transfer department.
----------------------------------------
[Edit 1 times, last edit by spRocket at Aug 29, 2022 8:45:14 PM]
[Aug 29, 2022 8:40:56 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Re: Multiple Errors

I'm facing similar error issues. Around 90% of my results are showing up as errors. I have had a total of 47 errors for the African Rainfall Project in the past few days. Previously, when IBM hosted the project, I saw maybe 1 error result in 50, but never on this scale.

What sort of errors? And how long does it usually take to run an ARP1 task on your system(s)?

If you are suffering file download errors (file size error or checksum error, not transient errors), there's not a lot that can be done about that, but it should be possible to evade some other errors...

If you are getting lots of errors where the only text in the report sent to WCG is a line describing the client in use, that probably means the task hadn't started before its deadline, so the client cancelled it and sent a "Not started by deadline" message to the server. As there might be [long] delays downloading work at present, and the deadline is based on request time, not file receipt time, it may be a good idea to shrink your work cache size to reduce the likelihood of having jobs unable to start in time, or to alter your device profile(s) on the server to reduce the total number of ARP1 tasks it will send to a system.
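As a rough illustration of why a large cache plus slow downloads can push a task past its deadline, here is a minimal sketch. It is only back-of-envelope arithmetic, not how the BOINC client schedules work, and every name and number in it (function names, the 24-hour runtime, the 6-hour download delay, the 48-hour deadline) is a hypothetical chosen for the example.

```python
def hours_until_start(tasks_ahead: int, hours_per_task: float,
                      concurrent_slots: int, download_delay_hours: float) -> float:
    """Estimate the wait, measured from request time, before a queued task starts.

    Assumes (for illustration only) that queued tasks run in full "waves"
    of `concurrent_slots` tasks and that every task takes the same time.
    """
    full_waves_ahead = tasks_ahead // concurrent_slots
    return download_delay_hours + full_waves_ahead * hours_per_task


def starts_before_deadline(tasks_ahead: int, hours_per_task: float,
                           concurrent_slots: int, download_delay_hours: float,
                           deadline_hours: float) -> bool:
    """True if the task would start before a deadline counted from request time."""
    wait = hours_until_start(tasks_ahead, hours_per_task,
                             concurrent_slots, download_delay_hours)
    return wait < deadline_hours


if __name__ == "__main__":
    # Hypothetical numbers: 16 tasks queued ahead, 24 h each, 8 running at once,
    # a 6 h download delay, and a 48 h deadline counted from the request.
    print(hours_until_start(16, 24.0, 8, 6.0))
    print(starts_before_deadline(16, 24.0, 8, 6.0, 48.0))
```

With a smaller cache (fewer tasks queued ahead), the same arithmetic leaves more slack before the deadline, which is the point of shrinking the cache.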

If you are getting application failures (SIGSEGV or SIGILL on Linux; Access violation or Illegal instruction [I think] on Windows) that's a completely different matter, and if you give more details someone might be able to help...

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Aug 29, 2022 8:47:08 PM]
[Aug 29, 2022 8:43:46 PM]
Vester
Senior Cruncher
USA
Joined: Nov 18, 2004
Post Count: 325
Re: Multiple Errors

I have had no errors with over 800 valid results. Intel i9-10850K running at 4.9GHz.
----------------------------------------

[Aug 29, 2022 9:40:16 PM]
Dinesh123
Cruncher
Joined: Sep 2, 2013
Post Count: 3
Re: Multiple Errors

Each task takes a little over a day to complete. I'm running the tasks on an i7-3610QM; it has 4 cores, 8 threads, and 8 GB of RAM. It completes 8 tasks per day, and I'm on Ubuntu 20.04 LTS.

Tasks that have been downloaded, run for a day, completed successfully, and uploaded to WCG are showing up as "Error" in the status column of the WCG website's results page, so no credit is given for the completed tasks.
[Aug 30, 2022 5:34:49 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Re: Multiple Errors

Each task takes a little over a day to complete. I'm running the tasks on an i7-3610QM; it has 4 cores, 8 threads, and 8 GB of RAM. It completes 8 tasks per day, and I'm on Ubuntu 20.04 LTS.

Tasks that have been downloaded, run for a day, completed successfully, and uploaded to WCG are showing up as "Error" in the status column of the WCG website's results page, so no credit is given for the completed tasks.

What does the Error status link show for these tasks? There will be something there if you click on the link. If nothing else, it will show whether the job got going properly and, if so, how far it got before failing. If it says it's an Error, either it didn't complete successfully, or it had trouble during task wrap-up (perhaps "Finish file present too long"...)

Having seen your machine specification, there are so many possibilities I can think of that I'm not going to list the rest - for anyone to help you we need to know what the error messages are!

Cheers - Al.

P.S. One of my larger systems is a 4-core, 8 thread i7-7700K with 16GB RAM, on XUbuntu 20.04 -- I only let BOINC have 6 out of 8 threads, and won't allow it to run more than two ARP1 tasks at once, and that system can turn an ARP1 task around in 11 to 15 hours.
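For anyone wanting to cap concurrent ARP1 tasks in a similar way, BOINC reads an app_config.xml file from the project's directory, and its max_concurrent setting limits how many tasks of one application run at once. The sketch below just builds that file's text. The short application name "arp1" is an assumption here (check the app name in your own client_state.xml), and the helper name build_app_config is made up for the example.

```python
def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the text of a minimal BOINC app_config.xml.

    One tag per line is used on purpose, since BOINC's XML reader has
    traditionally been happiest with simple line-oriented files.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{app_name}</name>\n"
        f"    <max_concurrent>{max_concurrent}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # "arp1" is an assumed short name for the African Rainfall Project app.
    print(build_app_config("arp1", 2))
```

The printed text would be saved as app_config.xml in the World Community Grid project folder and picked up after telling the client to re-read its config files or restarting it.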
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Aug 30, 2022 6:50:24 PM]
[Aug 30, 2022 6:47:21 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2492
Re: Multiple Errors

Well, one can't possibly run 8 ARP1s at a time on an 8 GB RAM computer without getting into serious resource trouble. Check the System Requirements at the link below. Each ARP1 can use as much as 1 GB of memory, and the rest of the system needs memory too, of course.
With 8 ARP1s at a time on an 8 GB computer, there has to be serious swapping going on all the time, and errors will occur, I'm sure.
https://www.worldcommunitygrid.org/help/topic.s?shortName=minimumreq
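The memory arithmetic can be sketched in a few lines. The 1 GB per ARP1 task is the upper figure mentioned above; the 2 GB set aside for the operating system and everything else is purely an assumption for the example, as is the function name.

```python
def max_concurrent_tasks(total_ram_gb: float, per_task_gb: float,
                         reserved_gb: float) -> int:
    """How many tasks fit in RAM after reserving some for the rest of the system."""
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // per_task_gb)


if __name__ == "__main__":
    # 8 GB machine, up to 1 GB per ARP1 task, 2 GB reserved (assumed).
    print(max_concurrent_tasks(8, 1, 2))
    # 16 GB machine with the same assumptions.
    print(max_concurrent_tasks(16, 1, 2))
```

Under those assumptions an 8 GB machine has room for fewer than 8 ARP1 tasks once the rest of the system is accounted for.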
----------------------------------------
[Edit 4 times, last edit by Grumpy Swede at Aug 30, 2022 7:10:47 PM]
[Aug 30, 2022 7:02:39 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Re: Multiple Errors

I am running an i7-3770, but with 20 GB of RAM. The jobs slow down significantly with more than 4 ARP tasks, although I can run 8. I try to limit it to 4 ARP tasks and run other projects on the other 4 threads.

ARP is very intense at every checkpoint (each 12.5%) and also at downloading and uploading.

If I find units checkpointing at about the same time, I suspend the second unit for a couple of minutes to keep them apart. It doesn't matter which checkpoint - they are all intense.

There is also a possibility that the upload is timed out, depending on the bandwidth available at each end.

Mike
[Aug 31, 2022 1:52:59 AM]
Dinesh123
Cruncher
Joined: Sep 2, 2013
Post Count: 3
Re: Multiple Errors

Thanks for the reply. I checked now and it says download error, which might be a bandwidth issue. The website also seems to have a few bugs, as valid tasks seem to disappear from the list on the results page after a few days. I guess the error message was for tasks that weren't downloaded properly, not for the completed tasks. The completed tasks might have become valid and then disappeared from the results page.
Here is the report:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>ARP1_0004208_134_ARP1_0004208_input_d02</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
</message>
]]>
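If there are many reports like this to go through, a small script can pull out the file name and error code from each one. This is only a sketch: it assumes other reports follow the same layout as the one above, and the function name parse_transfer_errors is invented for the example. A regular expression is used rather than an XML parser because the report text has no single root element.

```python
import re

# The report text exactly as pasted above.
REPORT = """<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>ARP1_0004208_134_ARP1_0004208_input_d02</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
</message>
]]>"""

# Matches one <file_xfer_error> entry: file name, numeric code, and the
# short reason given in parentheses after the code.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>"
)


def parse_transfer_errors(report: str) -> list[tuple[str, int, str]]:
    """Return (file_name, error_code, reason) for each transfer error found."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(report)
    ]


if __name__ == "__main__":
    for name, code, reason in parse_transfer_errors(REPORT):
        print(f"{name}: error {code} ({reason})")
```

For the report above this finds one entry, the input file with error code -200 and the reason "wrong size", which matches the file size download error described earlier in the thread.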

I'm running 8 tasks, and my RAM usage is around 6.2 GB out of the 7.7 GB available. There seems to be no issue with L2 cache usage either (I checked it using cachestat). So I thought there was no issue in running 8 tasks. I have been running like this for years without any problems. I also run climateprediction projects: when the tasks are of lower resolution I run 8 of them, and when the resolution is higher I run 4.
[Aug 31, 2022 4:03:43 PM]