World Community Grid Forums
Thread Status: Active | Total posts in this thread: 45
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7846 | Status: Offline
I would caution against using consumer storage prices to judge how much back-end server storage for WCG costs. They are not equal. Just for a little perspective, even if you were using consumer-grade storage: a 1 TB drive is about $50, and it takes 1,000 drives of that size to make one petabyte, so that would be $50,000 for a petabyte of storage (a rough sketch of this arithmetic follows below the post). Not a trivial amount. You will also need some pretty hefty software to manage that much storage, which is probably somewhat expensive as well. Not to mention the electricity to keep that many drives and enclosures running and to keep them cool. Just my 2 cents worth. Cheers
Sgt. Joe
*Minnesota Crunchers*
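A minimal sketch of the back-of-the-envelope arithmetic above, assuming $50 per 1 TB consumer drive and a decimal petabyte (1 PB = 1,000 TB); the prices are illustrative, not quotes.

```python
# Back-of-the-envelope cost of the raw consumer-grade disks for 1 PB.
# Assumed figures: $50 per 1 TB drive, 1 PB = 1,000 TB (decimal).
PRICE_PER_DRIVE_USD = 50     # assumed consumer price of a 1 TB drive
DRIVE_CAPACITY_TB = 1        # 1 TB drives
PETABYTE_IN_TB = 1_000       # decimal petabyte

drives_needed = PETABYTE_IN_TB // DRIVE_CAPACITY_TB
raw_disk_cost_usd = drives_needed * PRICE_PER_DRIVE_USD

print(f"Drives needed: {drives_needed}")          # 1000
print(f"Raw disk cost: ${raw_disk_cost_usd:,}")   # $50,000
```

This covers only the bare drives; enclosures, controllers, management software, power and cooling come on top, as the post notes.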
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dayle: "No-one seems to be backing you up here!"
Mike: "I'm backing him up.... He has a valid point."

It isn't possible to judge whether or not the point is valid without being able to evaluate the claims being made. It is possible I missed it elsewhere, but I haven't seen any evidence or data showing throttling, to say nothing of throttling due to lack of storage space.

Sgt.Joe: "I would caution against using consumer storage prices to judge how much back-end server storage for WCG costs. They are not equal."

Especially if WCG is using an IBM mainframe storage solution. (I don't know if they are or not.)

From the Jan 2 monthly call: "We just wrapped up our first official, non-launch related, monthly call with the research team.
1. We discussed the project's long-term storage needs, which are quite large given the amount of data and the size of the work units. We're exploring different options and will let everyone know once we have a solution.
2. Once there's a storage solution, we will be able to increase the number of work units sent out to volunteers. One of the tech team will make an announcement when we're able to do this.

Early Stats
Total Runtime: ~203 years
Workunits Completed: ~62,988
Average Runtime per Workunit: ~28 hours"
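As a quick sanity check of the quoted average, here is a minimal sketch assuming "Total Runtime" is the summed compute time of the completed work units and a year of 365.25 days; all figures are the approximate ones quoted above.

```python
# Sanity check of the "Early Stats" figures quoted above (all values approximate).
HOURS_PER_YEAR = 365.25 * 24          # ~8,766 hours

total_runtime_years = 203             # "~203 years"
workunits_completed = 62_988          # "~62,988"

total_runtime_hours = total_runtime_years * HOURS_PER_YEAR
avg_hours_per_workunit = total_runtime_hours / workunits_completed

print(f"Average runtime per workunit: ~{avg_hours_per_workunit:.0f} hours")  # ~28 hours
```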
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1684 | Status: Offline
Sgt.Joe,
your estimate is very "rough". You should consider that such storage units are operated with RAID 6, so depending on the number of disks per RAID bundle you should add at least 25%, more likely 33% or 40%, of additional storage space. Such storage units are also never operated stand-alone; they are mirrored, i.e. everything x2. Server-class HDDs do not cost USD 50/TB but a little more, and you have to add the price of the SAN (storage unit) including all necessary expansion units. Since I do not have current IBM prices for SANs and HDDs, I cannot calculate the grand total, but I can assure you that we are far away from USD 50,000 per PB.

Based on 4 TB HDDs, the number of required disks works out roughly as follows (a sketch of this count follows below the post):
- 4000 GB -> 4 TB (which is inexact anyway)
- RAID 6 with 8 disks per bundle -> 24 TB usable per bundle
- 1 PB -> 1000 TB (again inexact)
- 1000 TB / 24 TB = 42 RAID 6 bundles -> 336 disks, plus about 40 hot-spare disks
- about 380 disks x2 (SAN mirroring) = 760 disks

For database performance, additional disks (SSDs) should probably be planned for caching data; estimated at about 50 SSDs (including hot spares and mirroring).

Such a pile of HDDs does not fit into a single storage unit (288 HDDs per storage unit; see the V5030). At least 2 mirrored storage units must be planned (i.e. 4 storage units with 4 expansion units each).

The rebuild of an 8-disk RAID 6 bundle may also take too long. In that case, 6-disk RAID bundles should be considered instead of 8-disk bundles. This implies that about 420 to 440 disks x2 should be planned, i.e. roughly 880 disks plus the caching SSDs.

It is just an estimate. Quite apart from the electricity consumption, you also have to consider the manpower required to set up and commission such a configuration. Later, every time you update the firmware of the storage controllers, you will surely also have to plan a firmware update for each disk (which is really time-consuming and not much fun).

The next person who comes along to say that 1 PB is not a big issue should think a little deeper and try to consider the complete picture.

Cheers,
Yves
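A minimal sketch of the disk count above, assuming 4 TB HDDs, 8-disk RAID 6 bundles (2 parity disks each), about 40 hot spares per SAN, and a fully mirrored second SAN; the figures simply reproduce the estimate in the post and are not vendor numbers.

```python
# Rough disk count for ~1 PB usable on RAID 6 with SAN mirroring (illustrative only).
import math

HDD_CAPACITY_TB = 4
DISKS_PER_BUNDLE = 8                                              # RAID 6: 2 of the 8 disks hold parity
USABLE_TB_PER_BUNDLE = (DISKS_PER_BUNDLE - 2) * HDD_CAPACITY_TB   # 24 TB usable per bundle
TARGET_USABLE_TB = 1_000                                          # 1 PB, decimal
HOT_SPARES = 40                                                   # assumed, as in the estimate above

bundles = math.ceil(TARGET_USABLE_TB / USABLE_TB_PER_BUNDLE)      # 42 bundles
data_disks = bundles * DISKS_PER_BUNDLE                           # 336 disks
disks_per_san = data_disks + HOT_SPARES                           # 376 ("about 380" in the post)
mirrored_total = disks_per_san * 2                                # 752 ("= 760" in the post)

print(f"RAID 6 bundles: {bundles}")
print(f"Disks per SAN (incl. hot spares): {disks_per_san}")
print(f"Total disks with mirroring: {mirrored_total}")
```

Caching SSDs (about 50 in the estimate) and the expansion enclosures needed to house the drives would come on top of this count.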
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7846 | Status: Offline
KerSamson: "Sgt.Joe, your estimate is very 'rough'."

It was meant merely to show that having 1 PB of storage is no small matter. You are entirely correct in your further estimate of the costs of the various forms of RAID, which are fully necessary for redundancy and backup purposes, not to mention security. The $50,000 would only reflect the cost of the disks, not all of the associated costs you detailed. Thanks for the additional input. As a little sidelight, I just found out that the U.S. Library of Congress has a 16 PB storage system for its digitized information. It is mostly on a robotically controlled tape system, not on spinning disks. That is a scale which is difficult to comprehend. Cheers
Sgt. Joe
*Minnesota Crunchers*
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1684 | Status: Offline
Indeed, 16 PB on tape is still expensive, but more "affordable". The access time is not the same, though. Nevertheless, tape libraries need to be secured as well. In this specific case, do the 16 PB represent the overall storage capacity, i.e. 2 x 8 PB, or are the 16 PB duplicated?
Yves
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
Yves/Sgt.Joe
Having worked, many years ago, for a company that had a robotic tape system, I can assure people that the access time would be sufficient for most storage purposes. Continuous access might not be, but I don't think that is what is needed here.
Mike
cjslman
Master Cruncher | Mexico | Joined: Nov 23, 2004 | Post Count: 2082 | Status: Offline
Good news from today's update: https://www.worldcommunitygrid.org/forums/wcg...ead,42064_offset,0#621879

"1. The research team has obtained more storage. This allows us to increase the speed of the project again, so Uplinger is planning to double the speed today."

CJSL
Crunching for the fun of it...
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1684 | Status: Offline
Thank you, cjslman, for relaying the news.

@Dayle: it confirms that WCG is/was not responsible for the storage limitation!

Cheers,
Yves
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7846 | Status: Offline
KerSamson: "In this specific case, do the 16 PB represent the overall storage capacity, i.e. 2 x 8 PB, or are the 16 PB duplicated?"

I don't know the answer to that question, but I will try to research it. It is the storage system which powers the U.S. Library of Congress online system. Cheers
Sgt. Joe
*Minnesota Crunchers*
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
I understood that WCG were to provide storage, but storage can be at different levels, so it seems that it was at Delft rather than WCG.
Mike