Total posts in this thread: 45
This topic has been viewed 9083 times and has 44 replies
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

I would caution against using consumer storage prices to judge how much back-end server storage for WCG costs. They are not equal.

Just for a little perspective, suppose you were using consumer-grade storage. A 1 TB drive is about $50. It takes 1,000 drives of this size to make 1 petabyte, so that would be $50,000 for a petabyte of storage. Not a trivial amount. And you will need some pretty hefty software to manage that much storage, which is probably somewhat expensive too. Not to mention the electricity to keep that many drives and enclosures running and cool.
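The same arithmetic as a quick sketch. The $50 per 1 TB drive and the decimal convention (1 PB = 1,000 TB) are the figures from this post; the function name is only illustrative, and the result covers bare drives and nothing else.

```python
import math

def raw_drive_cost(capacity_tb, drive_tb=1.0, price_per_drive=50.0):
    """Return (drive count, total price) for bare drives only."""
    drives = math.ceil(capacity_tb / drive_tb)
    return drives, drives * price_per_drive

drives, cost = raw_drive_cost(1000)  # 1 PB taken as 1,000 TB
print(f"{drives} drives, ${cost:,.0f}")  # 1000 drives, $50,000
```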
Just my 2 cents worth.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Mar 4, 2020 6:05:40 PM]
[Mar 4, 2020 6:04:48 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Dayle

No-one seems to be backing you up here!

Mike

I'm backing him up.... He has a valid point.

It isn't possible to judge whether or not the point is valid without being able to evaluate the claims being made.

It is possible I missed it elsewhere, but I haven't seen any evidence or data showing throttling, to say nothing of throttling due to lack of storage space.

I would caution against using consumer storage prices to judge how much back-end server storage for WCG costs. They are not equal. Especially if WCG is using an IBM mainframe storage solution. (I don't know if they are or not)

From Jan 2 monthly call:
"We just wrapped up our first official, non-launch related, monthly call with the research team.

1. We discussed the project's long-term storage needs, which are quite large given the amount of data and the size of the work units. We're exploring different options and will let everyone know once we have a solution.
2. Once there's a storage solution, we will be able to increase the number of work units sent out to volunteers. One of the tech team will make an announcement when we're able to do this.


Early Stats
Total Runtime: ~ 203 years
Workunits Completed: ~ 62,988
Average Runtime per Workunit: ~ 28 hours"
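As a sanity check, the three early stats quoted above agree with one another. A minimal sketch, assuming "years" means calendar years of 365.25 days (the update does not say):

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: calendar years, not stated in the update

total_runtime_years = 203      # "~ 203 years"
workunits_completed = 62_988   # "~ 62,988"

avg_hours = total_runtime_years * HOURS_PER_YEAR / workunits_completed
print(f"about {avg_hours:.0f} hours per work unit")  # about 28 hours per work unit
```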
[Mar 4, 2020 7:24:48 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Sgt.Joe,
your estimation is very "raw".
Keep in mind that such storage units run on RAID 6, i.e. depending on the number of disks in each RAID bundle, you should add at least 25%, more probably 33% or 40%, additional storage space.
Such storage units are never operated "stand-alone"; they are mirrored, i.e. x2.
Server-class HDDs cost more than USD 50/TB.
You have to add the price of the SAN (storage unit), incl. all necessary expansion units.
Since I do not have the current IBM prices for SAN and HDD, I cannot calculate the grand total, but I can assure you that we are far from USD 50'000.- /PB.
---
Based on 4 TB HDDs, the number of required HDDs is:
- 4000 GB -> 4 TB (which is inaccurate anyway)
- RAID 6: 8 disks / RAID bundle -> 24 TB usable
- 1 PB -> 1000 TB (which is again inaccurate)
- 1 PB / 24 TB = 42 RAID 6 bundles -> 336 disks + 40 hot spare disks
- about 380 disks x2 (SAN mirroring) = 760 disks
For database performance, additional disks (SSDs) should probably be planned for caching data; an estimated 50 SSDs (incl. hot spares and mirroring).
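The disk count above can be reproduced with a short sketch. All inputs (4 TB disks, 8-disk RAID 6 bundles, 40 hot spares, x2 mirroring, 1 PB = 1000 TB) are the figures from this post; the function and parameter names are hypothetical. The caching SSDs are left out.

```python
import math

def raid6_disk_estimate(capacity_tb=1000, disk_tb=4, bundle_size=8,
                        hot_spares=40, mirrored=True):
    """Return (bundles, disks per side, total disks) for a RAID 6 layout."""
    # RAID 6 spends two disks' worth of each bundle on parity.
    usable_per_bundle_tb = (bundle_size - 2) * disk_tb
    bundles = math.ceil(capacity_tb / usable_per_bundle_tb)
    disks_per_side = bundles * bundle_size + hot_spares
    total = disks_per_side * 2 if mirrored else disks_per_side
    return bundles, disks_per_side, total

bundles, per_side, total = raid6_disk_estimate()
print(bundles, per_side, total)  # 42 376 752
```

The exact result is 376 disks per side and 752 in total; the post rounds these to about 380 and 760.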
---
That many HDDs will not fit in a single storage unit (288 HDDs per storage unit; see V5030).
At least 2 mirrored storage units must be planned (i.e. 4 storage units with 4 expansion units each).
---
Rebuilding an 8-disk RAID bundle may take too long. In that case, 6-disk RAID bundles should be considered instead.
This implies planning about 420 to 440 disks x2, i.e. up to 880 disks plus the caching SSDs.
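To see how the bundle size changes the picture, here is a rough comparison for 1 PB (taken as 1000 TB) on 4 TB disks, counting disks before hot spares and mirroring. The names are illustrative, and the 10-disk row is added only for contrast; it is not a configuration proposed in this thread.

```python
import math

CAPACITY_TB = 1000  # 1 PB in decimal units, as in the post
DISK_TB = 4

def raid6_bundle(bundle_size):
    """Return (bundles, disks before spares, parity overhead) for RAID 6."""
    data_disks = bundle_size - 2  # two disks' worth of parity per bundle
    bundles = math.ceil(CAPACITY_TB / (data_disks * DISK_TB))
    overhead = 2 / data_disks     # extra raw space relative to usable space
    return bundles, bundles * bundle_size, overhead

for size in (6, 8, 10):
    bundles, disks, overhead = raid6_bundle(size)
    print(f"{size}-disk bundles: {bundles} bundles, {disks} disks, "
          f"{overhead:.0%} parity overhead")
```

With 6-disk bundles this gives 63 bundles and 378 disks per side. The 420 to 440 in the post would then correspond to roughly 40 to 60 hot spares on top, though that split is my reading rather than something the post states.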
---
It is just an estimation.
---
Electricity consumption aside, you have to consider the manpower required for setting up and commissioning such a configuration.
Later, each time you update the firmware of the storage controllers, you will surely have to plan a firmware update for each disk as well (which is really time-consuming and not much fun).
Anyone who next claims that 1 PB is not a big issue should think a little deeper and consider the complete picture.
---
Cheers,
Yves
----------------------------------------
[Mar 4, 2020 7:59:31 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Sgt.Joe, your estimation is very "raw".

It was meant merely to show that having 1PB of storage is no small matter. You are entirely correct in your further estimation of the costs of various forms of RAID, fully necessary for redundancy and backup purposes, not to mention security. The $50,000 would only reflect the cost of the disks, not all of the associated costs you detailed. Thanks for the additional input.
As a little sidelight, I just found out the U.S. Library of Congress has a 16PB storage system for its digitized information. It is mostly on a robotically controlled tape system, not on spinning disks. That is a scale which is difficult to comprehend.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Mar 5, 2020 2:05:38 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Indeed, 16 PB on tape is still expensive but more "affordable".
The access time is not the same, though.
Nevertheless, tape libraries need to be secured as well. In this specific case, do the 16 PB represent the overall storage capacity, i.e. 2x8 PB, or are the 16 PB duplicated?
Yves
----------------------------------------
[Mar 5, 2020 9:02:56 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Yves/Sgt.Joe

Having worked for a company that had a robotic tape system, many years ago, I can assure people that the access time would be sufficient for most storage purposes. Continuous access might not be sufficient, but I don't think that is what is needed here.

Mike
[Mar 5, 2020 10:00:05 AM]
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

1. The research team has obtained more storage. This allows us to increase the speed of the project again, so Uplinger is planning to double the speed today.
Good news from today's update
https://www.worldcommunitygrid.org/forums/wcg...ead,42064_offset,0#621879

CJSL
Crunching for the fun of it...
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Mar 5, 2020 5:40:17 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

Thank you cjslman for relaying the news.
Cheers,
Yves
---
@Dayle: it confirms that WCG is/was not responsible for the storage limitation !!!
----------------------------------------
[Mar 5, 2020 6:26:37 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

In the specific case, do the 16 PB represent the overall storage capacity - i.e. 2x8 PB - or are the 16 PB duplicated?

I don't know the answer to that question, but I will try to research it. It is the storage system which powers the U.S. Library of Congress online system.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Mar 5, 2020 7:04:42 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Throughput Throttled Until a Storage Space Solution is Found

I understood that WCG were to provide storage, but storage can be at different levels, so it seems that it was at Delft rather than WCG.

Mike
[Mar 5, 2020 7:47:22 PM]