World Community Grid Forums
Thread Status: Active | Total posts in this thread: 9
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
One of my Win10 Pro x64 clients on BOINC 7.14.2 x64 just suddenly wigged out, and ETAs for all ARP1 tasks went from pretty accurate (i.e. 15-20 hours per task) to almost 8 days per task. As a result, the client is no longer requesting new tasks, saying "job cache full."
----------------------------------------
I manually went to Tools > Run CPU benchmarks hoping it would then re-adjust, but that didn't work. I waited for one task to complete, upload and report, thinking it would re-adjust, but nope. Other than restarting either the BOINC client or the computer itself, some questions:

1. What caused the estimates to go WAY out of whack?
2. What's the solution?
3. Is there a workaround in lieu of a solution?

Thanks! I don't want to restart the BOINC client or the computer because for this host ARP1 checkpoints occur every 2-3 hours.
[Edit 1 times, last edit by hchc at Nov 2, 2019 7:36:00 AM]
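As a rough illustration of why the numbers can jump like this (a simplified sketch only; the real BOINC client also applies per-project and per-app corrections): the client's initial estimate is roughly the task's estimated floating-point operations divided by the speed it believes the host delivers, so either a much larger fpops estimate from the server or a collapsed speed estimate turns a 15-20 hour ETA into one of several days. The figures below are invented for illustration.

    # Simplified sketch of how a BOINC-style client derives a task's initial ETA.
    # All numbers are illustrative, not read from this host.

    def estimated_runtime_hours(rsc_fpops_est, effective_flops):
        """Estimated wall time = estimated operations / estimated speed."""
        return rsc_fpops_est / effective_flops / 3600.0

    host_flops = 3.0e9  # speed the client believes one core delivers (FLOPS)

    # With an fpops estimate sized for ~17 hours, the ETA looks right:
    print(estimated_runtime_hours(1.8e14, host_flops))   # ~16.7 hours

    # If the server's fpops estimate is bumped ~11x (or the speed estimate
    # drops by the same factor), the identical formula yields nearly 8 days:
    print(estimated_runtime_hours(2.0e15, host_flops))   # ~185 hours, about 7.7 days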
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I can't really help, and I don't recall having this happen so blatantly to me, but I have had estimates double. Every time it happened it sorted itself out over time, but ARP1 tasks run so long that you may have to wait a day or two, I'm afraid.
I also seem to recollect that WCG stopped using the local benchmark data because it could be manipulated by the user, so they use feedback from returned results instead. This is another reason that things do, albeit slowly, adjust over time. I think you'll find that, if you're patient, the mess will clear up of its own accord. [Though I do wonder if there's an underlying bug which is still breeding in the shadows ...]
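To make the "it sorts itself out over time" point concrete, here is a minimal sketch (an illustration of the idea only, not WCG's actual server logic): each returned result nudges a correction factor toward the observed ratio of actual to estimated runtime, so with roughly one 17-hour ARP1 result per day per thread the inflated estimate comes down only over weeks.

    # Minimal sketch of feedback-based estimate correction (illustrative only;
    # not the actual WCG/BOINC algorithm).

    def update_correction(correction, estimated_h, actual_h, weight=0.1):
        """Nudge the correction factor toward the observed actual/estimated ratio."""
        observed = actual_h / estimated_h
        return (1 - weight) * correction + weight * observed

    correction = 1.0
    estimate_h = 188.0   # roughly 7 days 20 hours, the inflated estimate
    actual_h = 17.0      # a typical real ARP1 runtime on this host

    for n in range(1, 31):
        correction = update_correction(correction, estimate_h, actual_h)
        print(n, round(estimate_h * correction, 1), "hours")

    # After ~30 returned results the corrected estimate is still around a day,
    # which matches the slow convergence reported later in this thread.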
JmBoullier
Former Community Advisor | Normandy - France | Joined: Jan 26, 2007 | Post Count: 3716 | Status: Offline
hchc, from reading all your posts, you seem to interact with the server very often, and I am afraid you have triggered its "give-me-a-break" routine.
----------------------------------------
The last time I observed what you describe was when I was trying to process as many very short SCC WUs as I could. My fastest machine had its queue at its 70-WUs-per-thread maximum, so every time it returned one result it requested a new one. In practice that was about 30 times per hour because of the 2-minute delay. As Apis said, it will fix itself after several returned results. Unfortunately for you, several returned ARP1 results will probably take a looooong while.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Whatever you have in app_config.xml for ARP1, replace it with the snippet below, where nn is the number of tasks you want to run concurrently for this science (0 = unlimited).
----------------------------------------
Whatever you call it (ETA, ETC, RTC), the estimated time left will then adjust, most of the time fairly accurately.
[Edit 1 times, last edit by Former Member at Nov 2, 2019 11:53:45 AM]
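The XML of the snippet did not survive in the post above; a minimal sketch of what such an app_config.xml could look like, assuming the standard BOINC app_config.xml format and that arp1 is the short application name used by the project (the exact name can be checked in client_state.xml), is:

    <app_config>
        <app>
            <name>arp1</name>
            <max_concurrent>nn</max_concurrent>
            <fraction_done_exact/>
        </app>
    </app_config>

With <fraction_done_exact/> set, the client bases the remaining-time estimate on the fraction done reported by the science application rather than on its own projection, which is what makes the time left adjust as described above.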
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
I'm at the one month mark for this issue. The ETA for one ARP1 work unit has gone from 7 days, 20 hours to 1 day, 3 hours. The actual runtime for an ARP1 work unit is about 15-18 hours on this device.
----------------------------------------
1. How long must I wait for this to normalize?
2. What's the root cause of why this happened in the first place?
3. Is there a way to fix this on either the client side or server side?

Thanks.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Wait till you're blue in the face, or start by forcing a CPU benchmark run on the client and applying the fraction_done_exact change to app_config.
uplinger
Former World Community Grid Tech | Joined: May 23, 2005 | Post Count: 3952 | Status: Offline
hchc, originally the estimated times were way too short, then I bumped them up, which caused hosts that already had tasks on hand to go out of sync. Then, while looking into a points issue, I saw that the estimated fpops sent to the clients weren't getting adjusted automatically. This was fixed about 1-2 weeks ago.
Next are the workunits themselves. They vary in runtime based on a few things inside the workunits; a major one that can swing a runtime is whether the simulation is rainy or dry. The fpops estimates are based on all workunits returned from all members as well.

Thanks,
-Uplinger
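As a rough illustration of that last point (a sketch of the idea only, not the actual WCG estimator): if the fpops estimate for new workunits is derived from results returned by all members, each result contributes an estimate of operations as elapsed time multiplied by the reporting host's measured speed, and the genuine spread between rainy and dry regions shows up as spread in that average.

    # Rough sketch: deriving a per-workunit fpops estimate from returned results
    # (illustrative only; invented numbers, not the actual WCG server logic).

    results = [
        # (elapsed_seconds, host_speed_flops) for returned ARP1 results
        (15 * 3600, 3.0e9),   # drier region, faster host
        (18 * 3600, 2.8e9),
        (20 * 3600, 2.5e9),   # rainier region, slower host
    ]

    fpops_samples = [secs * flops for secs, flops in results]
    fpops_est = sum(fpops_samples) / len(fpops_samples)
    print("estimated fpops per workunit: %.2e" % fpops_est)   # ~1.7e14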
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 865 | Status: Offline
Thanks uplinger, but that still doesn't explain why new work units received on this device -- even today 12/3/19 -- are still showing an estimate of 1 day 1 hour to complete on this device. It's taken quite a few completed work units to bring it down from 7 days 20 hours to this. I'm wondering if there's something I can delete in client_state.xml or somewhere to fix this?
----------------------------------------
Edited to Add: Speak of the devil, as of 9:45 AM CST or so (within the last 15 minutes), the estimate for a new work unit is 14 hours 15 minutes, which seems normal. Earlier this morning it said it would take 1 day 3 hours. It took about a month, but I think this machine is finally accurate!
[Edit 1 times, last edit by hchc at Dec 3, 2019 3:58:40 PM]
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1407 | Status: Offline
"A major one that can swing a runtime is if it is rainy in the simulation or dry." -Uplinger

Interesting. Although checkpoints are set at every 12.5% of progress (a 6-hour period of data), I have already noticed different run times between the checkpoints. The difference between the shortest and longest run time between checkpoints is almost 50%.

The longest periods are during the processing of the data from 06 UTC - 12 UTC (twice in 48 hrs), i.e. the second and sixth pass; the shortest periods are between 18 UTC - 00 UTC, which are the periods between the 3rd and 4th checkpoint and between the 7th and last checkpoint.

This may vary in future when other areas and months are processed, but for now I would not be surprised, since the current batch is mainly grid data from the Kenya area.
[Edit 3 times, last edit by Crystal Pellet at Dec 3, 2019 5:07:16 PM]
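For anyone who wants to check the same thing on their own host, a small sketch (the checkpoint times below are invented; in practice they can be read from the Event Log, e.g. with the checkpoint_debug log flag enabled in cc_config.xml):

    # Sketch: spread between ARP1 checkpoint-to-checkpoint run times.
    # The values are invented: hours of run time at task start and at each of
    # the eight checkpoints (every 12.5% of progress).

    checkpoint_hours = [0.0, 2.0, 4.9, 7.0, 8.9, 11.1, 13.9, 16.2, 18.1]

    intervals = [b - a for a, b in zip(checkpoint_hours, checkpoint_hours[1:])]
    shortest, longest = min(intervals), max(intervals)
    print("intervals (h):", [round(i, 1) for i in intervals])
    print("longest is %.0f%% longer than shortest" % ((longest / shortest - 1) * 100))
    # With these example values the longest intervals (the 2nd and 6th passes)
    # are roughly 50% longer than the shortest, matching the observation above.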