World Community Grid Forums
Thread Status: Active | Total posts in this thread: 83
DerLetzteGermane
Cruncher Joined: Mar 31, 2020 Post Count: 6 Status: Offline
Is there an update on how things went at the weekend?
TigerLily
Senior Cruncher Joined: May 26, 2023 Post Count: 280 Status: Offline
Hi everyone,
A member of the tech team has shared that some unexpected issues arose when we tried to restart. They are in the process of debugging the issues that came up. We will provide another update over the next couple of days.
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 853 Status: Offline
Thank you for the latest update, we'll wait for new work anyway 😉
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12561 Status: Offline
Thank you, TigerLily.
Trying to restart is definite progress. We are ready, willing and able to crunch.

Mike
TLD
Veteran Cruncher USA Joined: Jul 22, 2005 Post Count: 824 Status: Offline
Any news on when ARP will restart?
imakuni
Advanced Cruncher Joined: Jun 11, 2009 Post Count: 103 Status: Offline
Any news on when ARP will restart?

In a couple of days.

Want to have an image of yourself like this? Check this thread: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,29840
TigerLily
Senior Cruncher Joined: May 26, 2023 Post Count: 280 Status: Offline
Hi all,
The tech team have shared the following update regarding the ARP restart:

There are two remaining challenges: a networking issue in our cloud environment, and issues running the software that we built locally to support the extended pipeline that the scripting from TU Delft relies on.

We have validated most steps in the new pipeline and adjusted it to produce all workunits on a single internal research server. At TU Delft, the ~35k domains were split between two machines, each generating inputs for the subsequent generations of its slice, and each of the two servers had slightly different scripting and environments. After integrating those workflows and building out our own, we discovered that the new box cannot access the shared /science filesystem that the BOINC components and the WCG workunit management pipeline/scripting rely on to build, load, send, and validate/assimilate workunits for all projects. Although the inputs themselves can be sent over the network without issue to a worker that already has access to the /science filesystem, the quantity of intermediate files required is too large to store locally or to send over the network.

We could migrate the pipeline again, to the workers that already have access to this filesystem. However, we do not want to pursue this option because those workers are on the critical path in production and do not have the resources to spare. We are also unsure whether it would be faster than waiting for hosting at our data centre to fix the networking issue so that we can again expose ports on the storage network; we need that fix for other planned maintenance regardless.

The most likely outcome is that we will be able to put the new ARP1 input generation workflow on the storage network, so it can read/write to the /science filesystem, once hosting figures out what is wrong, but we will adjust if it becomes clear that issue will not be resolved anytime soon.
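To make the constraint a bit more concrete, here is a rough, purely illustrative sketch of the kind of check and domain split the update describes. Every path, name, and function in it is an assumption made for this example; it is not the actual WCG or TU Delft scripting.

```python
# Illustrative sketch only: all paths, names, and helpers here are assumptions,
# not the actual WCG or TU Delft scripting described in the update.
import os
from pathlib import Path

SCIENCE_ROOT = Path("/science")               # hypothetical shared filesystem mount
ARP1_INPUT_DIR = SCIENCE_ROOT / "arp1" / "next_generation"


def science_fs_writable(root: Path = SCIENCE_ROOT) -> bool:
    """True if the shared filesystem is mounted and writable from this host."""
    return os.path.ismount(root) and os.access(root, os.W_OK)


def split_domains(domains, machines=2):
    """Split the full domain list into per-machine slices, as was done across two servers at TU Delft."""
    return [domains[i::machines] for i in range(machines)]


def generate_next_generation_input(domain: str, out_dir: Path) -> Path:
    """Stand-in for the real per-domain input-generation step of the pipeline."""
    out_file = out_dir / f"{domain}_gen_next.in"
    out_file.write_text(f"placeholder inputs for {domain}\n")
    return out_file


if __name__ == "__main__":
    domains = [f"domain_{i:05d}" for i in range(35000)]   # ~35k domains per the update

    # The merged single-server workflow only works if intermediate files can land
    # on /science; they are too large to stage locally or ship over the network.
    if not science_fs_writable():
        raise SystemExit("cannot write to /science from this host -- aborting")

    ARP1_INPUT_DIR.mkdir(parents=True, exist_ok=True)

    # One machine, one slice (the merged workflow); machines=2 reproduces the old split.
    (full_slice,) = split_domains(domains, machines=1)
    for domain in full_slice[:5]:                          # demo: only a handful
        generate_next_generation_input(domain, ARP1_INPUT_DIR)
```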
giba
Veteran Cruncher Brazil Joined: Dec 2, 2004 Post Count: 851 Status: Offline
Again bad news, but a valid update. Thanks, and good luck with these issues.
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1114 Status: Offline
Thank you for the update, TigerLily. I'm sending good thoughts and luck to the tech team.
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2990 Status: Offline
Thank you for the update, TigerLily. I'm sending good thoughts and luck to the tech team.

Yes, thanks for the update - it's much appreciated. Hopefully, this latest issue can be overcome shortly, and the learnings can be used going forward.