Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 2517 times and has 4 replies.
jlrobins58@gmail.com
Cruncher
Joined: Jan 2, 2021
Post Count: 5
Status: Offline
OPNG units failing with "Server Abort" about 10% of the time

I hadn't had this problem until about a week ago. In general, I have seen more jobs error out since the stress test back in April/May, and almost all of them are GPU jobs. But that is just an impression; I don't track it that closely.

I don't have much computing power, but I noticed it and thought it would be good to raise the issue.

It also appears that the jobs are only using my NVIDIA graphics card and not my Intel GPU. But I don't know where to check that for certain, for example whether the jobs write logs somewhere on my disk.

CPU work seems to be going fine.
[Jun 7, 2021 2:52:29 PM]
nanoprobe
Master Cruncher
Classified
Joined: Aug 29, 2008
Post Count: 2998
Status: Offline
Re: OPNG units failing with "Server Abort" about 10% of the time

As a rule, server aborts happen when there is a problem with a particular task and it has already returned numerous invalid or error results. I have seen many in the last week. There is nothing to be concerned about. The link below is an example.
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=702824287

EDIT: By default, BOINC only uses the most capable GPU in a machine with multiple GPUs. Adding <use_all_gpus>1</use_all_gpus> to the <options> section of your cc_config.xml file and then selecting Options > Read config files in the BOINC Manager will enable all of them. Just an FYI: many Intel GPUs will not finish a task in the allowed time. If the result log shows errors saying the time limit was exceeded, you may have to disable the Intel GPU.
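For anyone who has never edited this file: a minimal cc_config.xml that does only this might look like the sketch below. The <cc_config> and <options> wrapper elements follow the BOINC client configuration documentation. If you already have a cc_config.xml, add just the <use_all_gpus> line inside its existing <options> section rather than replacing the whole file. On Windows the file normally goes in the BOINC data directory (by default C:\ProgramData\BOINC); on other systems the location depends on how BOINC was installed, so treat that path as an assumption to check on your own machine.

```xml
<!-- cc_config.xml: minimal sketch that only enables all GPUs -->
<cc_config>
  <options>
    <!-- 1 = let BOINC use every GPU, not just the most capable one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

After saving the file, use Options > Read config files (or restart BOINC) so the client picks up the change.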
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.


----------------------------------------
[Edit 1 times, last edit by nanoprobe at Jun 7, 2021 3:53:44 PM]
[Jun 7, 2021 3:00:13 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786
Status: Offline
Re: OPNG units failing with "Server Abort" about 10% of the time

This can happen when a workunit is sent out to replace one that timed out, and the original unit is then returned before the new one has started.
I am also seeing cases where two units were sent out to replace a unit that returned an error, and one of them was returned before the other started.
Both cases have become more common recently with the larger workunits, which some Intel GPUs time out on.
I posted a command to increase the limit in another thread:
https://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=659839

Paul.
----------------------------------------
[Edit 1 times, last edit by PMH_UK at Jun 7, 2021 4:27:55 PM]
[Jun 7, 2021 4:25:03 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: OPNG units failing with "Server Abort" about 10% of the time

As Paul said, Server Aborts usually hit duplicate units, where the original was returned before the duplicate started. He could also have added "or before the duplicate reached its first checkpoint", because only then does the server know that the duplicate has started. This is so that your time is not wasted on a unit that has already succeeded. However, were your units re-sends? If not, there must be a different explanation. You can check that on the results status page for the unit.

As for your Intel GPU, I have given up on my only Intel GPU because all of its units errored out. They all got close to 100% (or reached it) but then carried on for hours afterwards, and finally timed out for exceeding the time limit.

Mike
----------------------------------------
[Edit 1 times, last edit by Mike.Gibson at Jun 7, 2021 10:48:27 PM]
[Jun 7, 2021 10:44:56 PM]
jlrobins58@gmail.com
Cruncher
Joined: Jan 2, 2021
Post Count: 5
Status: Offline
Re: OPNG units failing with "Server Abort" about 10% of the time

Thank you for all of the quick and direct responses!

The replies that explain how to change parameters go a step beyond my skills with BOINC, though. I do not know where the parameter files mentioned are located. I am a new, casual user. I know the drive and the general directory BOINC is in.

I have enough computer experience to know that just going in and mucking with parameters without any understanding is for the skilled and foolhardy.


(-;

Thank you!
[Jun 7, 2021 11:20:43 PM]