The Blog Posts of a Storage Industry Veteran Mike Workman
Blog Posts

1. Why We Started Pillar Data Systems – October 10, 2007
2. Green Machines – Value or Valueless Bandwagon – October 15, 2007
3. Sun’s So What – October 16, 2007
4. Should Pillar buy EMC? – October 19, 2007
5. 16 Ways Geeks Agree – October 25, 2007
6. Behind the Wall of Sleep: Sleepy Drives – October 29, 2007
7. Different Strokes – November 5, 2007
8. Straight Talk – Even When it May Cost You – November 10, 2007
9. Storage Performance Benchmarks – Worthwhile or Worthless? – November 19, 2007
10. Oracle VM and Pillar Data Systems – November 25, 2007
11. FC, SAS, SATA – Who cares? – November 28, 2007
12. Engineering! Space Shuttle Atlantis and the ISS – December 11, 2007
13. Compromise – December 17, 2007
14. Happy New Year! – December 31, 2007
15. Spindles Count or Count Spindles – January 16, 2008
16. Solid State Disk (SSD) – January 24, 2008
17. When It’s Time to Relax – January 25, 2008
18. Application-Aware Storage – Pillar’s Axiom – February 5, 2008
19. STS-122; This time for REAL! – February 12, 2008
20. Application-Aware Thin Provisioning – Part of a Bigger Story – February 28, 2008
21. Changing the Industry – March 9, 2008
22. Fox Business News – March 12, 2008
23. Changing the Industry – Distributed RAID – March 21, 2008
24. Changing the Industry – Software Licenses – March 26, 2008
25. Changing the Industry: QoS - The Simple View – April 2, 2008
26. QoS – The Even Simpler View! – April 9, 2008
27. Axiom 300 for up to 20TB – April 18, 2008
28. Who cares about Performance under Fault Conditions? – April 23, 2008
29. EMC has a Multiprotocol Array? Good one. – May 1, 2008
30. QoS Contention, Or Sharing of High Performance LUNs and Filesystems – May 9, 2008
31. Green Noise – May 23, 2008
32. Smokin’ The Strong Stuff – May 28, 2008
33. Tiering on Disk – June 11, 2008
34. Access Density – June 19, 2008
35. Pillar Axiom 600 – July 1, 2008
36. Six-Lane Driveway for your Home – July 21, 2008
37. Cloudy with a Chance of Meatballs – August 27, 2008
38. I Love a Parade – August 29, 2008
39. Application-Specific vs. Application-Aware – October 3, 2008
40. Storage Guarantees and Fine Print – September 25, 2008
41. Economic Downturn – October 29, 2008
42. The Age of Skeptical Empiricism – And That’s a Good Thing! – November 3, 2008
43. Two Ways to Skin a Cat? – November 14, 2008
44. Every Cloud has a Silver Lining – November 26, 2008
45. Is FCoE our Savior or Another Empty Promise? – January 8, 2009
46. SPC-1 Top Performance – January 13, 2009
47. InfiniBandwagon – January 26, 2009
48. Cheating at Poker – January 28, 2009
49. The Economy and the IT Industry – February 6, 2009
Wisdom, Wisecracks & Weener Dogs – Mike Workman, CEO, Pillar Data Systems – Fourth Edition
BLOG POST NO. 1 – OCTOBER 10, 2007
Why We Started Pillar Data Systems

Over the last 5–6 years I have been asked many times why I started Pillar, especially since at the time (2001) it was not obvious that the world needed another storage company. The Yankee Group reported that year that there were about 128 storage startups. I was working then at IBM as the general manager of the OEM Subsystems business, an amalgamation of engineering teams in Tucson, a company that IBM bought (Mylex), and some labs in Yamato, Japan. We were building components and systems (External RAID Controllers, Storage Shelves (Controller and JBOD), Host-bus Based RAID, NAS, and SAN subsystems). Our customers included IBM, Dell, SGI, NEC, Fujitsu-Siemens and a host of others. After analyzing the entrenched competitors, I realized that the real competition was the big boys: EMC, IBM, HPQ, Hitachi, LSI, and NetApp. My job was to build a product and a company that could beat them.

It all came down to what customers need and could not get from any of these players. If you are at a big company, and you have 10+ million lines of legacy code which determine your hardware architecture, you know the barriers to change. Every damn thing you want to do is a big deal, and affects 20,000 people. Living under these conditions, people get frustrated. They yearn for a clean sheet of paper. Embodied on the rumpled old sheets is the wisdom of many researchers, engineers, customers, and rectified mistakes. Pillar represented an opportunity to take much of that knowledge and couple it with the lessons learned and new technologies to build a better mousetrap.

Pillar was also a chance to break a business paradigm: specifically, the one that says you offer technology and advancement on a given platform in a way that protects your gross margin. Pillar was also a chance to break some long-standing arguments that took on a religious tone, like NAS versus SAN (more on this in a later post). All storage is block based, but pushing a volume manager and file system into a storage system is great for some applications. At the other end of the file system is block-based storage. There is no reason that one excludes the other, and it is possible to build both at the same time onto the same storage pool as Pillar has done.

The other day someone told me that all storage systems were the same because they all used disk drives from the same vendors. Yup. All airlines are the same because they use equipment from Boeing or Airbus or whomever. Yup, Roger that. What a riot. In the end a storage system is defined by the technologies that are applied in a particular configuration to solve the storage problem. Like airlines who use Boeing, storage vendors can differentiate themselves in many ways, right down to cashews versus peanuts. Those differentiators have helped Pillar succeed in a competitive marketplace.

This blog is my vehicle to talk to you about why we’ve made certain decisions and why we believe these are the right moves for our customers. We’re happy to take on industry issues. We have strong opinions and we can back them up. I certainly can talk technology down to the bit level, if you’d like. I enjoy a good debate, and I like to win. So feel free to agree or disagree.
Let the discussion begin.
BLOG POST NO. 2 – OCTOBER 15, 2007
Green Machines – Value or Valueless Bandwagon?

Professor O.J.M. Smith once told me that you could tell what the impact of a technology, or product, was on the planet by looking at its real cost (no subsidies, tariffs or taxes included). If a piece of recycled paper cost more to manufacture than a piece of new paper, it was worse for the planet than the new paper. If you think about this, it makes total sense, and it becomes obvious. So, if your intent is to produce a storage system that stores more for less cost, it is equivalent to producing a more environmentally friendly product. And if you do that with lower-power disk drives that gain their density because of lower signal-to-noise ratio (SNR) of slower spinning disks, the result is a product that is more efficient than alternative systems.
Sure, everyone builds storage with SATA now; the arguments against it have subsided substantially. Not everyone builds enterprise storage with SATA disk though. And nobody gets performance and versatility out of SATA disk like Pillar does. When we started Pillar, most in the industry said you can’t get performance out of SATA; SATA disk wouldn’t work in the enterprise arena. Well, they were wrong about that. Score one for the greening of the planet. Speaking of green – we also chose green as our corporate color. We did that 5 years ago. We did it because we liked it, not because we were jumping on a bandwagon. We were born green, but we have never claimed that we were prescient.
When we started Pillar, we intended to bust the paradigm that high performance necessitated high RPM. We intended to prove that SATA could be made into an enterprise class product and not relegated to “near store”. We didn’t intend to build a far more efficient platform for on-line storage, but we did.
BLOG POST NO. 3 – OCTOBER 16, 2007
Sun’s So What
Jonathan Schwartz at Sun shocked no one a few days ago by announcing that Sun has decided to do a reorganization. Well, stop the presses! It turns out that Sun is going to fold the Storage group into the Server group because it realizes that what they sell is a “System”. Time to re-think this whole business, and, as Mario Apicella wrote, watch out EMC, NetApp, and all of us littler guys like Pillar. The times they are a changin’.

Yes, this is the kind of rock-your-world announcement that can have you staring at the bottom of a bottle, wondering how the rest of us missed such a critical observation and organizational structure. Well, the rest of us except IBM, who has done this at least three times in the last 15 years. And every time IBM did it, they subsequently un-did it. Why? Because a company’s organizational structure depends on how they view their business rather than technology or the needs of their Customers. Organizations aren’t products; they should be immaterial to the Customer. Perhaps Steve Duplessie said it best when he said, “Whoopdee Doo” in his normal eloquent fashion.

Server people look at storage as “clothing” for Servers. Storage people look at storage as, well, storage. The storage industry is quite large, and storage companies and storage divisions of larger companies look at the total available market (TAM) as their customer base, not just the part of the TAM that has their company’s Servers, or Switches, or whatever in it. So in the end, if your company’s attach rate is say, 30%, meaning that 30% of your servers are sold with your own storage instead of somebody else’s, you could argue this two ways: 1) Build a better storage subsystem to improve your attach rate, or 2) Sell your storage product on everyone else’s servers in addition to your own – since that is a lot larger opportunity anyway. Both of these arguments are reasonable, hence the shuffling around of these groups inside companies like IBM and Sun.

The truth is, the shuffling has more to do with internal politics, sales force structure, and business growth targets than it does with some technological shift or customer requirements. Perhaps my annoyance is too obvious, but for crying out loud! It seems like the internal machinations of our companies are not really relevant to our Customers; it is goods and services that matter. Who reports to whom should not matter: If it does, we have gone nowhere, because all storage manufacturers were part of their respective “server” (read computer mainframe) groups 50 years ago.
BLOG POST NO. 4 – OCTOBER 19, 2007
Should Pillar buy EMC?

Have you heard the rumor that we are down to our last $20 million? What about the one that says Larry Ellison, our sole investor, wants to pull the plug on us? Must have been a Hell of a dinner our sales guys just expensed. I knew they just had a wonderful event at Storage Expo in the UK but wasn’t aware it had cost that much.
Pulling the plug on Pillar would be as preposterous as the headline to this post is for Pillar. It would be as inane as deciding, when you no longer wanted to live in a house, to bulldoze it. A simple message to our customers and prospects: Pillar has money. It is well funded. We pay our bills. We are going to be around for a very long time.
Every few months it seems, some competitor starts a rumor that Pillar is broke. Such rumors seem to get stronger at the same pace as our market position strengthens. Let’s cut the crap. Pillar is alive and flourishing. We are succeeding by every measure. Rumors of our death are just old-fashioned FUD. If Pillar were out of money, we wouldn’t be hiring. We wouldn’t be gaining in customers and sales. I would never assume to know what Larry’s investment plans may be, but, if he decided he no longer wanted to invest in Pillar, would the logical course of action be to pull the plug on something this valuable? I think he’s smarter than that.
BLOG POST NO. 5 – OCTOBER 25, 2007
16 Ways Geeks Agree

People say getting techies to agree on anything is a bit like pulling the molars from a Rottweiler. Well, I’m a geek, and I disagree with that. Here are 16 ways…

1. A Switched Fabric is better than Loop topology for all but the most trivial applications (Disk Drive attachment).
2. Off the shelf parts and standards are less costly to build a system around than proprietary ones.
3. Modular is better, more flexible, than monolithic.
4. Hardware-assisted RAID usually out-performs Software RAID, especially in re-build.
5. SAN and NAS are both block based storage at some point.
6. FC, Ethernet attached storage both have their place (File or Block).
7. To share storage, a SAN is optimal for some applications, NAS for others, both often fit in the same shop.
8. A storage pool is much more flexible than dedicated hardware for each LUN or file system – at least virtualize your own if you can’t virtualize everyone else’s stuff!
9. Cache memory is a weapon in the war on performance, and it isn’t too expensive.
10. Quality of Service (QoS) concepts in Networking are valuable.
11. Quality of Service concepts applied to Storage can be very useful in many applications (Pillar QoS).
12. All LUNS and File systems probably do not have equal business priority.
13. “Ease of use” is very hard to do, but necessary as systems get more complex.
14. Single points of failure (SPOF) are not allowed in Storage Subsystems; highly available (HA) systems are.
15. Reliability, Availability, and Serviceability (RAS) are becoming more and more important; Uptime is imperative.
16. Maintenance can be made easy if you want it to be, but it is a lot of work.
17. The iPhone is way overrated. I said 16, not 17.
Hey. This whole thing was a set up. We started Pillar knowing full well technology pros would agree with this list of Pillar premises. Think there’s a reason to go with a more expensive, or a less capable solution from one of our competitors? I disagree.
BLOG POST NO. 6 – OCTOBER 29, 2007
Behind the Wall of Sleep: Sleepy Drives

I am sorry I coined the phrase “Sleepy Drives”. As soon as I did it, the Pillar marketing team howled, and everyone in the company started saying “Sleepy Drives”. The term comes from a wonderful cartoon that my daughter and I loved when she was a kid (I still do love it) called “Porky’s Bear Facts” with Porky Pig, or as she said then “Porkity Pig”. She couldn’t remember the real title, so she called it “Sleepy Bear” or sometimes “Lazy Bear”. For me, it is just fun to say “Sleepy Drives”, and the very folks that were complaining about the term said it to members of the technical press, and now you have the term “Sleepy Drives” popping up in articles.

Well, the guy who started talking about it assumed our implementation would spin drives down to save power, but not stop them. The problem with partial spin-down is, only Hitachi offers that function, so unless you are prepared as a storage vendor to accept a single source for disk drives, you are outta luck. Pillar is developing Sleepy Drives, and as COPAN says, this is not MAID. We enthusiastically support that interpretation. MAID, as COPAN describes it, crams so many disks into a cubic meter of space that they cannot all be on simultaneously for several reasons:

1. The designs aren’t typically adequate to get the heat out of them if they are all on (they would be “Blown to Smithereens”).
2. They typically aren’t packaged and configured to provide anything but streaming performance like tape.
3. They aren’t serviceable (based on Pillar’s standards).
4. The FRU size is ginormous.

Now, all these problems could be solved, but the concept of MAID is “cheap dense disk is better and faster than tape”, so why bother? Well, there are reasons to bother, but that is another topic. Pillar’s Sleepy Drives implementation is similar to Hu Yoshida’s description of that which Hitachi employs. We have some better management, QoS architecture and storage pool segmentation than Hitachi, but that too is another topic. Suffice it to say that Sleepy Drives was meant to follow this definition: “If you have hard disk drives that do not need to operate continuously throughout each day, you do not need to keep them spinning continuously at full RPM.”

The catch in the above statement is “What do you mean by continuously?” The answer involves disk drive physics, but the drives are electromechanical and take about 30 seconds to spin up from a dead stop. So it makes no sense to think of time frames much shorter than a few minutes. But, we run into another problem: Disk drives have finite start-stop cycle lives. Depending on the manufacturer, the drives can start and stop with the read-write heads “in contact” with the disks (they drag on the disk) or they can be lifted off and plopped onto the disk while at flying speed. Either way, manufacturers specify start-stop cycles at somewhere between 20,000 and 50,000 cycles. A quick calculation shows that a 5 year life span would allow only about 11 cycles a day! Thus – if you are thinking about allowing the drives to spin all the way down for periods of less than about 2 hours, you had probably better think again.

Pillar’s implementation allows you to “rope off” sections of the storage pool called a “storage domain”. The power saving characteristics are assignable by domain. Sporadically used domains can be set as Sleepy Drives, while normal “continuous duty” domains would not. Some vendors stovepipe solutions into entire arrays that either do or don’t feature Sleepy Drives. Sticking with the Utility storage model, and the Axiom’s ability to use QoS to multi-tier and provide multi-tenancy in one (or multiple) arrays, we decided that we can turn back a tanker with the same platform that provides continuous duty, highly available, high performance storage. Well, perhaps half a tanker – but it is definitely more efficient than buying a different solution for every type of storage application. How much you save depends on how much of the disk you “rope off” and how much time they spend sleepin’. That’s our take... what do you think? Sleepy Drives aside, if you have young kids, check out the ‘toon.
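To put the start-stop arithmetic above in concrete numbers, here is a minimal back-of-the-envelope sketch. The 20,000-cycle figure is the low end of the spec range quoted above, and the 5-year service life is an assumption for illustration:

```python
# Back-of-the-envelope check of the start-stop budget discussed above.
# Assumptions: a 5-year service life and the 20,000-cycle end of the
# quoted 20,000-50,000 start-stop spec.
rated_cycles = 20_000
service_days = 5 * 365

cycles_per_day = rated_cycles / service_days
print(f"{cycles_per_day:.1f} start-stop cycles per day")        # ~11 per day

# ~11 cycles spread over 24 hours means spin-down periods much shorter
# than about 2 hours would burn through the budget before 5 years are up.
print(f"~{24 / cycles_per_day:.1f} hours between spin-downs")   # ~2.2 hours
```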
BLOG POST NO. 7 – NOVEMBER 5, 2007
Different Strokes

Unfortunately, disk drives are not solid-state devices like RAM or microprocessors. Disk drives are more like blenders, or doorbells. Disks spin, and the read-write heads move because they access data they write in concentric circles called tracks. There are about 25,000 tracks in a disk drive of current vintage. The distance the read-write heads move to access the track closest to the outside of the disk from the ones closest to the center of the disk is often referred to as the “stroke length”. To read or write data stored on a track, the disk drives have to move the heads from the track they are on, to the track to be accessed. Easy enough, but it is slow; it takes milliseconds, during which an Opteron or Xeon could execute a million instructions. In fact, it is so slow that a lot of Storage system engineering, and database engineering, is put into trying to hide the relative slowness of disk.

One way to cut the time to access data on a disk is to cut the distance the heads need to travel to get from track to track (access time), and another is to cut the distance the heads need to seek over – stroke length. The head move time goes as the square root of the distance traveled, so you can cut the move time by a factor of two if you cut the stroke (distance) down by a factor of four. This is called “short stroking”. To double the speed, you throw away 75% of your storage space. In the old days, people made storage “drums,” where fixed heads just floated over rotating magnetic cylinders, then later over a disk surface. Why? No move time!

Anyway, the problem with short stroking is that you end up throwing away capacity. Being greedy, I always wondered why we couldn’t pretend to throw away capacity to make it appear we had achieved faster access time, but sneak in, every now and then, to the “thrown away part” and read/write stuff. Well, that’s the basis for Pillar’s QoS Operating System. Essentially, we partition the disk off so important, high priority, fast-accessed data is stored in the short stroke regions and less important stuff is stored in the rest of the space. We call this “statistical short-stroking.”

Many systems can specify LBA ranges (which translate to track or data band ranges) where data is to reside. With Pillar’s Axiom QoS, you don’t have to tell the system where to put the data. Who does then? Well, not you Ms. or Mr. Database Admin. Do you really want to assign LBA ranges to provision 100 LUNS or Filesystems? Probably not – unless you are trying to get out of going to your neighbor’s kids’ piano recital again. With Axiom, the system asks you the application type, a few other parameters like read, write, or mixed bias, redundancy level, and relative business priority to lay the data out for you. It stripes it, determines RAID level, write performance level, and business priority. The trick to statistical short stroking is that you have queues that make sure the disk drive heads stay over the highest priority bands when they need to be, most of the time. Since Axiom has the business priority and queue managers for each file system and LUN, there is no reason why we cannot also assign cache, network bandwidth, and CPU utilization, all aligned with business priority and application characteristics as well. In fact, this is exactly what Axiom does.

If you have a “Write anywhere” anything in your NAS system, it is a little difficult to plop “Write here” technology into your product. Write anywhere is clever, and had its day, but it is the worst thing you can do from a data layout point-of-view. Fragmentation/non-contiguous data layout is not a good thing for performance. For a great SAN product, the last thing you would do is “Write anywhere”; it is inefficient as hell. While underutilized NAS can survive some fragmentation and meet NAS performance norms, SAN cannot. Hence, making a SAN product out of an underlying NAS structure is not a great idea. In fact, doing so is akin to having a hammer and convincing yourself that everything is a nail. But, a good SAN product can be interfaced to a volume manager, a file system, and a protocol stack to make a NAS system, so this is exactly what we built at Pillar. Why not? As long as the SAN doesn’t go through the NAS for data or have NAS determine its data structure and layout, all is good. This is the Pillar Axiom and what in my mind is meant by “converged storage.” Stacking stuff in the same rack is not “convergence” – nor is driving a screw in with a hammer.
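To make the square-root relationship behind short stroking concrete, here is a rough sketch with made-up numbers; the 8 ms full-stroke seek is an assumption for illustration, not a figure from the post:

```python
import math

FULL_STROKE_MS = 8.0   # hypothetical full-stroke seek time, for illustration only

def seek_ms(fraction_of_stroke: float) -> float:
    """Rough model: head move time scales as the square root of distance moved."""
    return FULL_STROKE_MS * math.sqrt(fraction_of_stroke)

print(seek_ms(1.00))   # 8.0 ms - full stroke
print(seek_ms(0.25))   # 4.0 ms - confine data to 25% of the stroke, halve the seek
```

Classic short stroking buys that 2x by abandoning the other 75% of the capacity; statistical short stroking, as described above, keeps using that space for lower-priority data.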
BLOG POST NO. 8 – NOVEMBER 10, 2007
Straight Talk – Even When it May Cost You

We’ve all been there before. You see something that doesn’t seem right… or that you know is not right, but hesitate to say anything about it. Especially if it’s someone we know… like a friend, family member or Customer. Well, I think sometimes there is no better way to deal with the situation than to just blurt it out.
Look, at the end of the day when you are selling stuff, you have three choices: 1) Sell the Customer what they want, 2) Sell them what you honestly believe is the best choice for them, and 3) Walk away before you become part of a situation where you know the Customer is making a serious mistake.
We had a Customer who was trying to configure their data center in a way that frankly was not smart. I said, “Tell them so.”
If a Customer is making a mistake with serious ramifications, you owe it to them, and to the company you work for, to do your best to intervene, even if it means giving up the sale.
Of course I realize there is a natural tendency for people inside a company, especially Sales and Marketing people, to tread so carefully around a customer that they won’t tell them that what they want – is not what they need. I don’t believe in the old adage “The Customer is always right.” Sure, I understand and endorse the spirit of that saying, but it isn’t always true. There comes a time when you have to say “If it were me, there is no way I would do what you’re proposing.” And say it with conviction. And hold your ground. You should stop short of saying, “Hey, that’s stupid,” but you should not be quiet or hide behind the drapes. Someone who is going to damage their business should be stopped. Why not? Because you might offend them? Or worse... lose the sale?
Besides – if you are committed to the Customer’s welfare – perhaps this is the only way you can get through to them – by putting your money where your mouth is. Oh, and by the way, the technique of “just blurt it out” referenced above – Don’t try this with your spouse or significant other. After 22 years of marriage I ended up naming a wine I make “Dog House” after such an event.
BLOG POST NO. 9 – NOVEMBER 19, 2007
Storage Performance Benchmarks – Worthwhile or Worthless?

At Oracle OpenWorld last week, a customer asked me about storage benchmarks. I told him to approach performance benchmarks with a high degree of skepticism... especially if they were published by someone trying to sell him something! The fact is benchmarks are often not good representations of actual workloads or environments. That said, good benchmarks don’t necessarily have to represent real-world applications in order to be useful. EPA estimates for automobile mileage get criticized for this, but the estimates are useful comparisons for two vehicles even though no one drives the way the tests are defined, except my mother-in-law. Good benchmarks, like the EPA performs, have extreme controls and allow no vendor “tuning”. And what about SPC benchmarks? Well, they are very tunable, and most configurations are downright ludicrous for most of the market. See for example this config – 1536 Spindles for 24TB… for Pete’s sake!… only $29/GB of user capacity, and a whopping $58/GB for the storage used in the benchmark! Yeah, right. Is this a typical real-world config? Nope. Not even close. My advice to anyone taking these benchmarks too seriously is this: Don’t. I think it is important to point out that SPC benchmark engineers work hard to build representative workloads; workloads are engineered and are often close enough to reality that they represent meaningful tests. But representative workloads are not the problem. The problem is more centered on extreme configuration of solutions as in the $3M+ case cited above. Most vendors who participate in these types of benchmarks put together corner-case configurations to win marketing bragging rights. Enough said.

It is interesting to read Dave Hitz’ Blog on this subject. The shape of the IO response curve is indeed critical, and Dave points this out by referring to the IO rate when the response time is 1 ms. The simplest characterization of response time versus IO rate curves is two numbers: 1) The minimum response time (Y-Axis intercept) and 2) The maximum IO rate (ridiculously large response time). I find it amusing that Dave cannot even use this characterization of the response time curve – he has to summarize it as one number. His point is valid, but for crying out loud! Are we all so pressed for time that we cannot look at the curve, two points, or more data than one number? One respondent to Dave made a critical observation: How many spindles? How many filesystems? These are major influencers on the result! The hilarity of this is that the whole give and take on the subject reflects the problem of “benchmarks”: they aren’t well enough controlled. In general, people don’t want to read and understand lots of data or analysis, they want a simple number. When you try and boil the performance of a complex system down to a simple number, the results are extremely easy to manipulate and often misleading. So what should you do in trying to assess performance? Proof-of-concept tests or “bake offs” are useful if done in your shop by your own folks while trying to get the most out of all the systems tested as if you owned them. I think a key question for any benchmark is this: Who ran it, and for what purpose? Weight your interpretation of the results accordingly.
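For what it’s worth, the two-number characterization mentioned above (minimum response time plus maximum IO rate) is easy to play with. The following is an illustrative queueing-style toy model with hypothetical numbers, not SPC methodology:

```python
def response_time_ms(offered_iops: float, min_rt_ms: float, max_iops: float) -> float:
    """Toy model: response time starts at the Y-axis intercept and grows without
    bound as the offered IO rate approaches the saturation rate."""
    if offered_iops >= max_iops:
        raise ValueError("offered load at or beyond saturation")
    return min_rt_ms / (1.0 - offered_iops / max_iops)

# Hypothetical array: 2 ms at light load, saturating around 50,000 IOPS.
for load in (10_000, 25_000, 45_000, 49_500):
    print(f"{load:>6} IOPS -> {response_time_ms(load, 2.0, 50_000):5.1f} ms")
```

Two arrays can quote the same headline IOPS number and still look completely different at the response times real applications care about, which is why boiling the curve down to one number is so easy to game.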
BLOG POST NO. 10 – NOVEMBER 25, 2007
Oracle VM and Pillar Data Systems

Like all storage vendors, Pillar Data Systems was at Oracle OpenWorld in force a couple of weeks ago. Amusingly, I had many people ask me about Pillar supporting Oracle VM. My answer was, and is, “Of course.”
Oracle is a $17B+ Global 500 Company. Is there any small company that sells in the Enterprise that shouldn’t ensure it works with Oracle Database and Applications? Doesn’t this sound like a reasonable business proposition?
That answer provoked many responses, including laughter, raised eyebrows, and comments like “I guess you have to.”
Of course we support Red Hat Linux as well because many of our Customers use it, and therefore it would be dumb not to, right?
For God’s sake. Pillar is in no position to choose not to support vendors who our customers want us to support. Our support of Oracle has nothing to do with a relationship between anyone at Oracle, even Larry Ellison, and Pillar. We love VMware, and we support the heck out of it because over 20% of our customers use it, not because we love EMC.
I suppose it is a lot more fun to bring personalities like Larry into it, and there is nothing wrong with that. Frankly, the association with Pillar and Larry is fun most of the time. Celebrities like Larry and Bill Gates are fun, but along with the positives there are negatives…you have to take the good with the bad.
Do people work with Microsoft because they love Microsoft? Hmmm. Sometimes, but their personal affinity for Microsoft or the executives they employ should really have nothing to do with the business decision of working with Microsoft.
[Figure: drive interface types (FC/Fibre Channel, SAS/Serial Attached SCSI, SATA) arranged by access time and latency, from 15000 RPM high-performance/QoS drives down to 7200 RPM drives for virtual tape libraries]
BLOG POST NO. 11 – NOVEMBER 28, 2007
FC, SAS, SATA – Who cares?

At Oracle OpenWorld a few weeks ago, I was asked more than once if we provide SAS drives. Although I answered politely, another thought came to mind: Why do you care? The fact is any disk drive can be made to work with any interface. The fundamental performance characteristics of the drive – access time and rotational speed – have nothing to do with the communication interface you use. Interfaces are chosen for their relative cost and physical packaging related attributes way more than they are for fundamental performance. FC is not meant for a laptop environment; it has drivers and receivers that are meant for much bigger, physically distributed systems. Desktop computers are made in the hundreds of millions, so computer manufacturers notice even a 10 cent difference in price. Simple, point-to-point connections of limited distance suffice for desktop applications.

So why do people ask? Well, I think they aren’t really asking about the interface, they are asking about the type of drive. In other words, I think they want to know: is this a 7200 RPM drive, a 10K RPM drive, or a 15K RPM drive? Is this a high capacity drive or a lower capacity drive? Is this “fast” or not? From the interface type, they draw conclusions, which are not unreasonable, about system performance or most likely application target. The mindset has been that SATA drives are high capacity, lower performance on a relative scale, so they are great for applications like back-up and virtual tape libraries (VTL). When we introduced high performance arrays based on SATA, many raised their eyebrows because it sounded oxymoronic. After all, people were using them in storage arrays called “near store,” so how could they be used for serious “real store” systems? The truth is, we get a hell of a lot of performance out of SATA disk drives; we wanted to, that’s all. Our QoS architecture allows the system to overachieve or exceed most people’s expectations for performance on “SATA” disk (read 7200 RPM, slower access time).

However, at the end of the day, the drive physics can get ya. Regardless of the interface, the access time and latency will become visible in certain applications. So, in those applications, if you can afford it, you use low latency, fast access time drives. The trick is realizing which applications need what, rather than using the most expensive power hogs for everything. Storage system designers try to get the most out of the components they use – at Pillar, the QoS architecture allows fantastic performance from slower but much more cost effective disk, at half the power. When applications push the system, like all the other vendors, we resort to higher performance drives like FC and SAS drives, or “Server” disk drives. Why are these choices of last resort? Because they are more expensive and burn more power, that’s all.

So what about SAS then? Well, SAS (Serial Attached SCSI) is associated with “Server” drive attributes, and is a better alternative than FC for certain applications, mostly due to the electromechanical packaging requirements of the enclosures the drives are packaged in. Specifically, small, 2.5 inch form factor, higher performance drives are being packaged into an enclosure that provides good density and response time characteristics for storage subsystems. SAS isn’t what people are asking about – they are asking if you are offering storage bricks (Pillar’s nomenclature) that have more actuators per TB with faster response time. Oh… and the short answer is, yes. We have SATA and FC drives, and SAS is in our plans, along with the rest of the industry.
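A quick sketch of the “drive physics” point: estimate random IOPS from seek time and rotational latency alone, with the interface nowhere in the formula. The seek and RPM figures below are typical-sounding assumptions, not vendor specs:

```python
def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """First-order random IOPS for a single drive: one average seek plus half a
    revolution of rotational latency per IO; the interface never appears."""
    half_rev_ms = 0.5 * 60_000 / rpm
    return 1000.0 / (avg_seek_ms + half_rev_ms)

print(round(random_iops(avg_seek_ms=8.5, rpm=7_200)))    # ~80  "SATA-class" drive
print(round(random_iops(avg_seek_ms=3.5, rpm=15_000)))   # ~180 "FC/SAS-class" drive
```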
BLOG POST NO. 12 – DECEMBER 11, 2007
Engineering! Space Shuttle Atlantis and the ISS

Thanks to NASA, I had the pleasure of visiting Kennedy Space Center last week to view the launch of STS-122. Unfortunately, the launch was scrubbed and not rescheduled before I had to return home. The trip was not in vain however; a group of us were afforded the opportunity to tour the preparation facilities and view the back-up shuttle Atlantis up close as it was being readied for its next flight. The STS-122 mission continues construction of the International Space Station. We were able to see many of the pieces of the ISS being readied for their journey into orbit. It is hard to describe the feelings one has when being up close to such a magnificent undertaking, especially without spewing factoids that are fun but can trivialize the grandeur of a project like this. I have to hand it to those who created the movie “Contact” based on Carl Sagan’s novel, as the feelings I got watching that movie were very similar to those I had traveling around Kennedy Space Center: awe and astonishment at the size and complexity of a large space travel project being undertaken by mankind.
Melanie, Larry and Mike
The night before the planned launch, we were driven out to view the Shuttle Atlantis on the pad. As the bus wound its way past the Vehicle Assembly Building, “the Crawler” (for transport to the pad which also hauled Saturn rockets for Apollo), we passed visual obstructions and the Shuttle came into view; gasps, people uttering “Oh My God”, and then a strange silence fell over those on the bus as we all took in the breathtaking view of the brightly lit Atlantis standing on its tail poised for launch. My eyes began to get teary; I could not help but feel proud to be an American. I was also proud to be an Engineer. I was feeling a bit nerdy for thinking that, when Larry Ellison said quite out loud “That’s Engineering”. Indeed. What’s the tie in with Pillar? Well, me I guess, and hey, it’s my blog. But I will say that nearly half of Pillar’s employees are Engineers, and I am damn proud of them, and their work.
Mike, Max and Nancy
Footnote: I am very grateful to the wonderful folks at NASA who afforded us the opportunity to visit the Kennedy Space Center. Their professional and passionate pursuit of their mission is inspiring.
Photo by Mike Workman
BLOG POST NO. 13 – DECEMBER 17, 2007
Compromise
My last post was on STS-122, thanking NASA for the pleasure of visiting Kennedy Space Center two weeks ago to view the launch of the Space Shuttle. The launch was scrubbed and has now been rescheduled to January! Good thing I didn’t wait around Cocoa Beach for NASA to light the fuse on this baby. As we viewed the Shuttle on the pad in the daylight, a guide told us that the external hydrogen fuel tank was its natural color; by leaving it unpainted, they could add an additional 600 pounds of payload to the orbiter! Thinking about that, it means that NASA has decided that it is OK to have an ugly external fuel tank, but they drew the line at leaving the Shuttle itself looking like crap. I am sure that someone can offer a reason it has to be painted so pretty that rationalizes the payload loss, but I bet it is pride. Don’t get me wrong, I think they made the right choice. Who wants to see some ugly contraption sitting on the launch pad with the letters USA on it?
Personally, I have struggled with the same decision at Pillar (obviously in a far less important or grandiose way), but the look of our product costs us a bit of money. We spent time and energy trying to make it look pretty. We have had hundreds of customers, analysts, even competitors opine on the aesthetic appeal of our product. At the same time, there have been more than a few who say “who cares.” Well, I suppose I care. It’s sort of like a car, I care that I am not drivin’ around in a car that looks like a dog’s rear end, and I care that our data centers look sharp. Who the heck wants to pay forty grand – or worse, a million dollars – for something that looks like you bought it at Walmart in the clearance baskets? I mean, the world can only stand so much purple, or chrome plated plastic right?
At the end of the day, we all make compromises. NASA could have saved a few pounds perhaps on the Shuttle, but hey, pride is important and we deserve to doll it up a bit, don’t we?
BLOG POST NO. 14 – DECEMBER 31, 2007
Happy New Year!

I know you don’t come to this site for Season’s Greetings, but what the hell: Happy New Year to all! We at Pillar will continue to strive for Customer satisfaction, innovations in storage that make your jobs easier to do, and being a partner you are proud to work with. From all of us – thank you for working with Pillar.
BLOG POST NO. 15 – JANUARY 16, 2008
Spindles Count or Count Spindles

I was reading a competitor’s brochure a few days ago, and it struck me how dishonest people can be talking about performance in a storage array. I am going to give a couple of rules of thumb for performance here, and out of context they could be said to be “wrong”, but I think they are true enough to be illuminating:

1. IOs per second or IOPS to a first order depend on how many spindles you have in a disk array. This can be exceeded with really effective cache on the right workloads, but IO performance is fundamentally based on Spindle count.
2. SATA drives (7200 RPM) are capable of 100–125 IOPS if you have an algorithm like Pillar’s QoS intelligently laying out the data on the disk and accessing it optimally.
3. Fibre Channel or SAS drives at 15K RPM can give you 2.5X that of a 7200 RPM SATA drive.
4. A typical storage shelf (2U) with 12 SATA drives therefore gives you about 1000–1500 IOPS typically.
5. A typical storage shelf (2U) with 12 15K RPM FC drives in it will yield about 2500–3000 IOPS.
6. Other than effective cache utilization, storage controllers at some point limit performance of spindles. In other words, you aren’t going to get more than about 3000 IOPS from 12 15K RPM FC or SAS drives. You can put the world’s most capable controller on it, but you aren’t going to coax more IOPS out of them with a bigger controller. Of course, if you have enough cache, the disk can eventually idle but that is not reality in any system.
7. Putting 1000 spindles of any type on a pair of RAID controllers, or in a single Storage Controller, will not net you 100,000 IOPS: nobody makes a single controller (even an active-active pair) that handles that many spindles today. So saying your system scales because you can support 1000 spindles on your Storage Controller is deceptive at best: you may support the capacity but you will not get the performance those spindles can deliver – this is a waste.
8. If you expand your definition of storage controller to mean a confederation of controllers, in Pillar’s case 2, 4, 6, or 8 controllers on the same storage pool, expanding to 1000 spindles may give you both the capacity and the performance the drives can deliver. While some companies besides Pillar offer multiple controllers and RAID engines, nobody offers 128 hardware RAID engines as Pillar does. Even with multiple controllers, most companies charge you for software that you load on those “expansion” controllers, and as I always tell our customers, buying two of something isn’t scaling. Pillar does not charge you for the software when you non-disruptively add controllers to a system.
9. Cache is a weapon in the war on performance. Cache can help avoid using slower disk, and it can help in organizing operations to disk in a way that makes the disk more efficient. Cache is probably more important than ever with larger numbers of servers sharing the data of the same storage array. There is nobody in the midrange market that makes more cache available than Pillar (96 GB per storage pool). There are storage companies who offer a whole 4GB of cache on 1PB of disk; this is just downright lame – embarrassing to say the least. Actually, let’s be brutally honest, it is pathetic, as it is about twice as much as most people have on their laptop.
10. Throttling IO is different than maximizing performance through intelligent architecture. Throttling is just a limitation, and can be useful in preventing an application from hogging too much storage resource with other applications that share the storage array. Pillar’s QoS applies business priorities to LUNS and Filesystems to give applications the attention deserved from a business perspective, in the event of contention for resource. Even better, Pillar’s QoS optimizes all system resources to fit the application: cache, disk striping, queuing, network resources, and layout of data on the disk.

There is a lot more to say about performance, but the above list of 10 items pertains to almost any storage purchase, not just those from Pillar. There is too much BS out there – some of the claims made by some vendors are just downright ridiculous; what’s worse, they are misleading.
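Putting rules 1 through 7 together as a quick sanity check; the per-spindle figure comes from the list above, while the controller cap is an illustrative assumption, not a Pillar spec:

```python
def array_iops(spindles: int, iops_per_spindle: int, controller_cap_iops: int) -> int:
    """First-order array IOPS: spindle count times per-spindle IOPS, capped by
    what the controller (or controller pair) can actually drive."""
    return min(spindles * iops_per_spindle, controller_cap_iops)

# Rule 4: a 12-drive SATA shelf at ~110 IOPS per spindle.
print(array_iops(12, 110, controller_cap_iops=50_000))      # ~1,320 IOPS

# Rule 7: 1,000 SATA spindles behind a single controller pair. The spindles
# could deliver ~110,000 IOPS, but the (assumed) 50,000 IOPS cap wins.
print(array_iops(1_000, 110, controller_cap_iops=50_000))   # 50,000, not 110,000
```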
BLOG POST NO. 16 – JANUARY 24, 2008
Solid State Disk (SSD)

So EMC announced SSD on DMX last week. I have had a few press interviews recently, and of course I get asked about Pillar’s position or my thoughts on the announcement in general. For some applications, some SSD attached to an array is a good deal, albeit expensive. Certainly one can get a ton of IOPS at low latency from solid state memory versus hard disk drives with their moving parts. Eventually, when solid state memory prices get low enough and write-cycle limitations are overcome, it will be nice to replace disk drives that vibrate, make funny noises, and throw an occasional piston rod through the ceiling. Indeed, laptops are already being sold using SSD instead of hard disk drives. The important thing to keep in mind is that storage arrays were not designed to fully leverage Solid State Disk; today’s array architectures from any vendor will quickly become the bottleneck for too much SSD. Why? The reason is that the bandwidth and latency of SSD are so much better than those of hard disk as to quickly surpass the capability of the controller it is attached to. Furthermore, heavy access of the SSD on the array will overshadow the disk that is on it – you can look at it as though SSD can quickly consume much of the array resources. Pillar’s QoS will somewhat alleviate this problem, but at the expense of not fully leveraging the SSD you have in the Pillar storage pool.

In some ways Solid State Disk reminds me of VTL. VTL is Virtual Tape – making a faster, lower latency substitute for tape, but retaining the way systems perceive the storage – as tape. In other words, we are taking a great technology with lots of advantages and hiding it under the guise of one that is slower and clunkier – all to keep from having to redesign the way we use “tape”. Solid State Memory under the guise that it is disk is no different. There are a lot more efficient ways to use solid state memory other than making it appear as though it is disk! That being said, both SSD and VTL allow the IT community to bridge between existing huge infrastructures (for disk array and tape respectively) and the ultimate best realization of the technologies we can build. Oh – and Pillar’s position includes SSD on our Axiom Storage Array. With QoS, customers will be able to provision LUNS and Filesystems at Hyperdrive speeds when they have SSD resources in the storage pool.
BLOG POST NO. 17 – JANUARY 25, 2008
When It’s Time to Relax

So I had the pleasure of working with Amy Haimerl at Fortune Small Business on an article about executives with “interesting” hobbies. I was honored, but I have to admit I am afraid that it made me look a bit goofy, if not downright eccentric. This is not what I had in mind when we were contacted by Amy – after all, this is Fortune – they wouldn’t write an article like this, would they? Well they did, and as my friend Shel Israel says – when you blog, you get to have your own say in what’s written, and if you want to, you can add your own commentary. So to explain a little bit, I don’t really relax by “blowing things up.” Well, I guess you could say Pillar is trying to blow up the status quo within the storage industry… (more on that in my next post).

When it comes to fireworks, I build “star shells” mostly. These are the aerial fireworks you see on the 4th of July, or Football games, or New Years celebrations. It is a load of work, but it is great fun at the same time. Physics, Chemistry, Art, Mechanical Engineering, and even some Electrical Engineering are all involved as we build our own tools for production.

Before I get questions about legalities, we are indeed fully legal. We have a facility in the desert, we are licensed by the proper authorities, and we are regularly inspected as well. Same for wine – and we just barely stay under the legal limit for “hobby” home winemaking, which is 200 gallons per household per year (about 3.5 standard barrels). It’s pretty good wine – Syrah, Petit Syrah, and Cabernet – and if you are interested in learning more about what we can do for your business, I’d be happy to share some with you when you visit Pillar. And I promise to leave the explosives in the desert! I suppose the article could have taken a worse tack – that I get wasted and blow things up! Seriously, thanks Amy. I enjoyed talking with you. It was a fun interview.
BLOG POST NO. 18 – FEBRUARY 5, 2008
Application-Aware Storage – Pillar’s Axiom

At Pillar, we have always prided ourselves on innovation. We invented Quality of Service (QoS) for Storage, and 5 years ago, we built our storage administration around application templates. Essentially, when someone wants to set up LUNs for Oracle, Exchange, or SQL for example, they can follow best practices configurations for any storage array. So, we felt at the time, if one follows best practices, why not just use a pull-down menu selection against standard applications that does all the work for you? Well, this is exactly the approach we took 5 years ago, and started shipping with the Pillar Axiom two and a half years ago. It was and still is a great idea – it avoids re-inventing the wheel every time you lay out storage for an application by allowing the admin to fill out the variables, like Capacity, I/O characteristics and such, and avoid having to configure disk drives in arrays like all the old school systems out there.

From our perspective, application-awareness implies configuration of disk, but in the case of Pillar’s Axiom, it also implies things like cache configuration, network bandwidth, CPU priority, and layout of data on the disk platters. In other words, all the system resources are tailored to the application – set up to make the application see the best possible disk attributes out of the resources in the array.

Let’s take an example. Let’s say you want to configure the LUNs for an Oracle 11g database and Oracle Apps. How would you do this? We know that the live database will require good random read and write performance on high performance disk, while archive partitions can use lower performance disk, allowing the core database to stay optimally tuned. Finally, the applications themselves can reside at a lower priority than the database but higher than the archive partitions. These are all performance characteristics that the DBA can tell the storage administrator (or the storage administrator already knows). That storage administrator can choose to create profiles within the Axiom for these various workloads, and then simply provision capacity (as shown below); or even better, allow the Oracle DBA to assign capacity using the newly created profiles via Pillar’s integration with Oracle Enterprise Manager. And, just in case there is a temporary need to improve performance (discovery motion or data mining) for an archive partition, the Axiom allows that administrator to change queuing and cache tuning (relative performance) on the fly, newly empowering that application for the duration of the required performance bump.

What if you change your mind, or circumstances in your business change? No problem. Axiom migrates the data on the fly from one disk configuration to another, and unlike any other storage in the marketplace, it also changes CPU priority, queueing priority, cache configuration, etc. So with Axiom, today’s choice is not tomorrow’s problem. Thus, Application-Aware storage is essentially storage that takes best practices into account and does all the work to configure your array resources to best meet those needs: Pillar’s Axiom does exactly that – automatically and dynamically. Cool, right? I suppose some people would rather write a bin file, or be forced into using RAID 4, but most of us would rather leave that piece of our past in the past.
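To illustrate the profile idea in pseudo-configuration form – this is purely a hypothetical sketch, not the Axiom GUI or any Pillar API; every name and parameter below is invented for illustration:

```python
# Hypothetical application profiles: each one bundles the QoS decisions the post
# describes (tier, read/write bias, business priority) behind a single name.
PROFILES = {
    "oracle11g_db":      {"tier": "high",   "bias": "mixed", "priority": "premium"},
    "oracle_apps":       {"tier": "medium", "bias": "mixed", "priority": "medium"},
    "oracle11g_archive": {"tier": "low",    "bias": "read",  "priority": "low"},
}

def provision_lun(name: str, capacity_gb: int, profile: str) -> dict:
    """Provision by naming an application profile instead of configuring disks.
    An application-aware array would also derive RAID level, stripe width,
    cache policy, and queue priority from the profile at this point."""
    return {"lun": name, "capacity_gb": capacity_gb, **PROFILES[profile]}

print(provision_lun("erp_db01", 500, "oracle11g_db"))
print(provision_lun("erp_archive01", 2_000, "oracle11g_archive"))
```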
BLOG POST NO. 19 – FEBRUARY 12, 2008
STS-122; This time for REAL!

Well… the second time was a charm! Nancy Holleran, Pillar’s President and COO, and I traveled out to Cape Canaveral to see STS-122 blast off (I love that term – I guess people are too sophisticated to say it anymore, but it’s fun). Nancy and I decided it wasn’t optional – we both were recovering from the flu, but how many chances does one get to see the dang Shuttle blast off, anyway? I wrote about my first visit to the Kennedy Space Center in an earlier post. That launch was postponed, but it was still an emotional visit. This one was for REAL. Take a look at the attached pictures; I am sure you’ve seen similar ones before, so I suppose the whole point is you have to be there to fully appreciate them.
At liftoff, I suppose I stopped breathing for a while – I am not sure. At first it is dead silent because it takes about 20 seconds for the sound to hit you, but when it hits, it really hits you. Watching on TV or even IMAX can’t replicate the experience. There are two technology related reasons why. The first reason you have to be there is that the experience has a huge sub-sonic component. I would imagine the sound pressure spectrum extends down to 3 or 4 Hertz. This means that you feel it; you don’t just hear it. Much like a serious fireworks display – if you feel the concussion, ground shake, and heat on your face from a Cremora bomb – you are only then really experiencing the full fireworks experience.
The second reason you have to be there is because today’s technology and simulated experiences are just too damn good. Digital special effects like those of Industrial Light and Magic and other Hollywood studios are so damn good that we are treated to this kind of spectacle and drama all the time. The difference is, when you are there watching in person, the experience hits you. This is real. This is very, very real. People are really risking their lives on this thing. The Earth is shaking, my chest is pounding and I am four miles away from it!

My God, this is real! I know I am a softy – but my eyes filled with tears once again. I guess I can’t help it. I love engineering and the human spirit that causes people to rise to this kind of challenge. Thanks NASA. Thanks for the experience… and for 40 wonderful, heart-pounding, inspiring years.
Photo by Mike Workman
BLOG POST NO. 20 – FEBRUARY 28, 2008
Application-Aware Thin Provisioning – Part of a Bigger Story

There’s a lot of discussion in the marketplace today about Thin Provisioning, but before we get into it, let’s set some ground rules on definitions:

Allocation: The logical capacity or size of a LUN.
Provision: The physical reservation of space on disk – to guarantee that space exists and is spoken for.

Now that we have key terms defined, there are essentially two ways to look at Thin Provisioning: Thin Provisioning in a storage subsystem allows a disparity between capacity allocation and provisioning. In other words, one can allocate 2TB for a LUN, but provision significantly less than that. The details of how much less, and how the provisioned storage approaches or grows toward that allocated, depend on the design of the system.

Here is the goofy part: To the unfamiliar, the idea that you should be able to increase the size of a LUN on the server using it and the storage system providing it doesn’t seem very sophisticated. In fact, it seems kinda dumb. Of course you can do that, can’t you? No, not so simply on all operating systems and storage systems. Also – the idea that just because someone says they want a 2TB LUN, you dedicate all that storage, even if they are only using 200GB of that allocation, seems kinda old fashioned (it is). In other words, when you show up at the airport and expect to have a seat on the plane because you own tickets, well, yes, that used to be the way it was; same for a hotel. But now, there is this thing called overbooking... airlines and hotels make sure their seats and rooms are fully utilized by assuming that they can overbook since usually not everyone shows up to claim their seat or their bed. Similarly, not everyone who allocates 2TB for a LUN uses all the space.

So what seems obvious isn’t. Some operating systems make it very difficult to increase the size of a LUN, and in fact, it may require taking the system offline to do so. Of course this is really bad, and people usually overallocate to prevent ever having to do so. Well, if you over allocate a bit, it’s not too harmful. But people don’t often forecast storage for an application really well. In fact, they have been burned many times underallocating, so the disparity is often large. People often allocate 2X and 3X what they think they will use. If your storage system reserves that space, the unused space is wasted, plain and simple. Underutilization is waste. The strange thing for most of us to get over is: the waste is avoidable without Thin Provisioning. But most people don’t want to worry about it, so they waste anyway. A storage system that doesn’t provision the storage until there is data to fill it up solves this problem, to the degree it is a problem. Some storage administrators work hard not to allow underutilization, so for them this may be easier, but it doesn’t change the efficiency of their storage system. Instead, Thin Provisioning merely shifts the work from their planning and careful management to software inside the storage system that is flexible. Not a bad thing, either.

While Thin Provisioning can be a great function, it does not solve the general utilization problem, because utilization gets compromised by admins using spindles of a given capacity to build LUNs with certain IOP capability.
they may need 500GB of capacity, they need 2000 IOPS. Getting 2000 IOPS out of a disk is not going to happen (see the blog on Spindles Count or Count Spindles). With FC disk, you will need about 10–12 disks, depending on RAID type, read-write bias, and a few other factors. Building a LUN out of 10–12 disks to support this performance will waste a TON of space. For 146GB drives, you would waste about 7–9 disks' worth of space! Unless you have Pillar's QoS. With Pillar's Axiom, the
data gets laid out over the number of spindles you need to get the performance, and the disks are shared amongst other applications in such a way as to parcel out the performance that those applications need. Utilization nearing 100% can be achieved, although that allows no headroom for growth and such, so 80% is more advisable. But can't other systems share disks to build LUNs? Sure, as long as you don't mind LUNs interfering with each other's performance. This is why disks are usually dedicated to applications! Without QoS, interference is bound to happen, and dedicated disks almost always imply underutilization – which has nothing to do with Thin Provisioning.
We have competitors that think that QoS is data placement on disk. Well, it's part of it. Without a prioritized queue manager in front of the disk, copying Pillar's data placement on the platter and striping over spindles can (and usually does) result in WORSE performance instead of better! Pillar not only has a prioritized queue manager, but we optimize all the other system resources as well, like cache (which some competitors have shockingly little of because they built their architectures in the days of core memory). The meat behind Pillar's claim as the first and only true Application-Aware storage is the combination of technologies such as Thin Provisioning, prioritized queuing, and intelligent data layout. One without the others does not deliver on the promise we're all driving for.
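For readers who like to see the arithmetic, here is a rough sketch of both kinds of waste in Python. The per-drive figures (about 180 random IOPS from a 15K FC spindle, 146GB per drive) are my assumptions for illustration, not a spec sheet:

import math

# Illustrative sketch only -- the per-drive figures below are assumptions, not measured specs.
FC_IOPS_PER_DRIVE = 180        # rough random IOPS from one 15K RPM FC spindle
FC_DRIVE_CAPACITY_GB = 146

def spindles_for_iops(target_iops):
    """Drives needed to hit an IOPS target, ignoring RAID and cache effects."""
    return math.ceil(target_iops / FC_IOPS_PER_DRIVE)

# The 500GB / 2000 IOPS example: buy spindles for the IOPS,
# and most of the capacity you paid for sits stranded.
drives = spindles_for_iops(2000)
stranded_gb = drives * FC_DRIVE_CAPACITY_GB - 500
print(f"{drives} drives for 2000 IOPS, about {stranded_gb} GB stranded")

# Thin Provisioning is the other half: allocate 2TB, provision only what is written.
allocated_gb, written_gb = 2048, 200
print(f"fat provisioning reserves {allocated_gb - written_gb} GB of empty space up front")

One number is the cost of buying spindles for IOPS, the other is the cost of reserving space nobody has written yet – the two kinds of waste the combination above is meant to attack.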
BLOG POST NO. 21 – MARCH 9, 2008
Changing the Industry
It might sound hyperbolic, but we built Pillar to change a few things in the storage industry. Our Customers know it – they experience it. But the "Industry" we are changing in large part doesn't want to be changed. This includes just about everyone in the industry: Customers, Analysts, Reporters, but overwhelmingly, the resistance comes from entrenched competitors who feed all kinds of BS to the rest of the ecosystem to prevent change. There is a lot of money in storage, and the dilemma is in bringing about change when people (not necessarily Customers) are doing just fine, thank you. If you look over time, there have been a lot of changes brought about in the IT industry: Mainframes to Minicomputers, minis to Workstations, Workstations to PCs. Disk drives have gone from high gross margins to the high teens or low twenties. The PC business started off with high gross margins and has hovered around 18–24% now for a long time. Business executives didn't want that to happen of course, but competition brought it about. Score one for the free market system.
[Figure: storage service levels – Service Level A: Mission Critical, Service Level B: Business Critical, Service Level C: Task Critical]
So, given all the change in the IT industry, what about storage? Well yes, there has been an evolution of change: Proprietary to Open, DASD to Networked, SAN to SAN+NAS and iSCSI. Of course we are describing a 40 year journey here, not 10 years as in the PC, or Workstation business. For those of you who aren’t convinced, what margins do NetApp and EMC brag about? Well, try 60%! This is
because their businesses are so small? Storage systems, and the software that runs on them, for just those two companies are $14B per year. I should be so small (no short jokes please). So, if you were running a business making 65% gross margins and your top line revenue was growing, why would you want to change? Well, you might change if you thought you could get the jump on your competitors and steal more market share. However, the firmware and software that storage subsystems are comprised of includes in the neighborhood of 5–15M lines of code. Change that? Sure, can I work the weekend? Talk about a self-imposed barrier to change!
OK, so what do we at Pillar want to change? Here's the short list:
Purpose-built storage. You don't need to support, manage and maintain 10 different storage platforms in a data center to get your job done. This is inefficient at best.
Platinum-coated hardware for reliability and performance. Active-active, redundant systems with modern architectures can coax more performance out of off-the-shelf hardware and desktop disk drives than the big guys want to... it eats their gross margins. This is a fact; I have worked in these companies and watched this behavior all the time.
Scalability of typical storage systems sucks. When you have two RAID controllers for hundreds of spindles, you are doomed. Instead of adding more spindles to an array, storage companies want you to trade up (forklift, usually) or buy another system and split the work. What happens then? You have to buy all the software again. This is a ridiculous treadmill that
companies get on. There are shops with hundreds of nearly identical EMC or NetApp boxes where a modern architecture would allow 20, and would save many millions of dollars, headaches, staff attention, and space.
Proof point: Pillar's single software license model. This scares the crap out of incumbent suppliers, and if you are a Customer, you should get excited by this. Pillar scales to 8 control units in a midrange product, and we use distributed hardware RAID to let performance scale linearly from 24 to 832 spindles on a single frame. We also use QoS to deliver consistent performance to applications at the same time, allowing them to share their spindles. Everyone fought us there, but they were wrong, and hence are now copying us. It makes too much damn sense. A shared, virtualized storage pool with QoS and thin provisioning encourages high utilization rates 2–3 times the industry average. This is serious coin, folks.
Proof point: Pillar's Bricks (storage enclosures) have hardware RAID, so as you add spindles, you add RAID engines. Rebuild times under load are 4X better than competitors. Performance during rebuild is 2X better than competitors. Instead of fixing the problem, our competitors ask you to add another disk drive that adds no capacity to your little group of drives and loses performance so you can withstand double failures in a 6 or 12 drive array. For Pete's sake! Then they feature it!!
Maintenance. This is a doozie... and a prime mover of inefficiency. First, the standard in the industry is "we'll do that for you," or "Don't touch this system as it will void your warranty." Big installations end up with EMC on staff! Well of course they do! The systems are as easy to maintain as the Space Shuttle. They are flexible, but within the confines of a 15-year-old architecture and writing bin files to configure them. (I love my PDA, but just wish I could program it in FORTRAN.)
Let's face it – maintenance and support are big business for the incumbents. Why make it easy?
Proof point: Pillar has built in Guided Maintenance that allows users to replace drives, or almost every other component in the system, if they want to. Of course if you don't want to, we will do it for you. I can repair my sprinklers too, but I don't want to... the choice is mine. Sometimes, when I see the gardener's bill for replacing the sprinkler my kid ran over, I rethink that one. The good news is, it doesn't void my warranty if I do, because I would fire my gardener if he told me that... hey?
Easing Storage Administration. How many Microsoft Exchange Servers or Oracle installations do we need to configure per hour before we might just automate this task? Again, for Pete's sake, take best practices and put them into the machine.
Proof point: Pillar put policy-based, Application-Aware storage into the market almost 3 years ago. When customers see how easy it is – you wait and see – competitors will start copying it just like they tried to copy our QoS. However, they will charge you more for it instead of baking it into the array, but they will offer it. EMC tried to copy QoS, and they often waive the charge if they are up against Pillar. Lucky for us, they did a lousy job of copying it, too; free stuff that is difficult to use, and doesn't work well, is worth exactly what you pay for it.
There's more! For those of you who found this interesting, don't worry, I'll be back with more. For those of you who didn't, please comment, and thank my daughter, who ran over the sprinkler again. Cement and rebar protection just might be in order...
BLOG POST NO. 22 – MARCH 12, 2008
Fox Business News
View my recent interview with Cody Willard of the FOX Business Happy Hour show that broadcasts from the Bull and Bear Bar in NY’s Waldorf Astoria hotel. http://www.youtube.com/user/PillarDataSystems
BLOG POST NO. 23 – MARCH 21, 2008
Changing the Industry – Distributed RAID A good friend of mine described a childhood experience, which involved the waiting room in a hospital. Although not the point of the story, it reminded me of our product. OK, I know… I am a geek, but hey, at least I am self-aware.
In hospitals 45 years ago, there was one TV in an "Entertainment Room." TVs used to be quite a luxury, and relatively expensive, so you needed to share them amongst the patients. And, of course, there was one in the waiting room. In today's health care facilities, patients not only have their own TV, but they often have a computer of their own. Waiting rooms have Wi-Fi and of course, PDA connectivity. Sharing is not necessary anymore because the economics of technology have changed considerably.
So 20 years ago, RAID was invented, and it was relatively expensive. As a result, RAID controllers were designed to handle a hundred disk drives or more. Amortizing the cost of a RAID controller over lots of spindles was great, but the performance was compromised as the number of spindles increased (they taxed the RAID controllers as one would expect).
Unfortunately, one or two RAID controllers per storage system is a bit too similar to one or two TVs per hospital… nobody quite gets what they want and everyone waits for their turn much too often. Our competitors still build their storage systems in this fashion.
Pillar includes a RAID controller for every 6 drives (12 drives for FC disk) and scales to 128 RAID controllers in a single system, providing redundancy, linear scaling of performance with capacity, and unsurpassed performance under fault conditions. If a disk drive fails, we rebuild at least four times faster than our competitors when the system is busy.
If you think this isn't a big change to the industry, try adding another 10 storage shelves to one of our large competitors' systems. And if you expect a commensurate increase in performance, the salesperson will most likely tell you that you should buy another storage system. If you do, you will pay for all your software all over again. NAS, SAN, iSCSI protocols, Thin Provisioning, Management Software, the works. Ka-ching! Ka-ching! Ka-ching! Isn't it time to change this model? And, if you want to talk on that PDA or cell phone, please go outside – you are disturbing the guests in the waiting room….
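To make the TV-per-patient point concrete, here is a toy scaling model in Python. The rebuild rate and drive size are numbers I picked purely for illustration; the only thing the sketch is meant to show is how the rebuild work per controller behaves when you add a RAID engine with every handful of drives instead of funneling everything through one central pair:

# Toy scaling model -- assumed numbers, not measured results from any product.
DRIVE_GB = 500
REBUILD_MBPS_PER_ENGINE = 100   # rebuild bandwidth one engine can spare while still serving host IO

def rebuild_hours(total_drives, raid_engines, drives_per_engine_design=6):
    """Rough rebuild time for one failed drive.

    The fewer drives an engine is responsible for, the more of its bandwidth
    is left over for the rebuild instead of everyday host IO.
    """
    drives_per_engine = total_drives / raid_engines
    busy_factor = min(1.0, drives_per_engine_design / drives_per_engine)
    effective_mbps = REBUILD_MBPS_PER_ENGINE * busy_factor
    return DRIVE_GB * 1024 / effective_mbps / 3600

print(f"central controller pair, 256 drives : {rebuild_hours(256, 2):.1f} hours")
print(f"one RAID engine per 6 drives        : {rebuild_hours(256, 256 // 6):.1f} hours")

The absolute hours are made up; the shape of the comparison – rebuild work that stays constant per enclosure versus rebuild work that piles onto two shared controllers – is the point.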
BLOG POST NO. 24 – MARCH 26, 2008
Changing the Industry – Software Licenses So, let’s say you own a storage system and you want to triple its capacity. Well, unfortunately for you, that means another system, because it is very likely that your system doesn’t support another 16 storage shelves, or if you added them, the performance of the system would go into the crapper. So you buy another system instead of expanding the old one. At first this may not seem so bad, as you need the storage shelves anyway, so adding them to an old system versus getting a new shiny one doesn’t seem so bad….until you have to buy WAFL, ONTAP, NFS and CIFS protocols, all over again. Wait a minute – can’t I just put those shelves on my old system – I already bought that stuff!
Well, no you can't, Mr. Network Appliance Guy. You could upgrade platforms, and then get that performance, as the new one has a bigger back-end so you can attach the disk you want. Well, not only do you need a forklift for that upgrade, but you will pay more for the software again, because they charge more for the same software on the bigger platform than when it is loaded onto the smaller one. Get ya either way. Unless you own an Axiom. With Axiom you can add up to 64 Bricks of SATA and/or FC disk to scale your back-end performance and capacity, as well as up to 4 Slammers to scale your front-end performance, and use the software you bought originally – same license, under the premise that it "scales"… because voilà! It does. Not a nice change for most storage vendors, but certainly a nice change for storage customers.
BLOG POST NO. 25 – APRIL 2, 2008
Changing the Industry: Quality of Service (QoS) – The Simple View One of Pillar’s innovations is the extension of Quality of Service concepts often seen in networking, to storage. When we first introduced our QoS, we were the only company that allowed storage administrators to set priority levels of LUNs and filesystems relative to others. In terms of the inherent nature of system-wide QoS versus a chargeable, bolt-on feature, we are still the only one.
IT folks used to avoid the sharing of a storage pool amongst applications because it was very difficult to predict the interference that might occur between various applications when they share storage resources. Without QoS, there is a "first-come, first-served" approach to reading or writing. Depending on the applications sharing the storage, this can produce very bad results, hence sharing was simply avoided, and given a bad name: contention. This, in turn, has led to an industry average of less than 40% utilization. Try to imagine another industry where the buyer was perfectly OK with a 40% yield. Would you go to the grocery store and buy the strawberries where the label reads... "60% of these strawberries are not edible and may cause contention with your digestive tract!" But I digress. Without sharing and contention, the whole concept of a "storage pool" becomes moot.
Pillar's QoS allows certain applications to take preference over others. There are five levels of priority if 15K RPM Fibre Channel disk is present, four if not. Essentially, most of the pitfalls of sharing are eliminated, and thus efficiency or storage utilization can drastically improve. At customer sites we see up to 80% utilization without performance implications! A standard trick people use to try to effect the same result without QoS is to define LUNs and file systems on separate spindles. This helps, but not much, because it says nothing about all the other resources which drastically impact the performance of the system. These include CPU utilization, cache (how much, if any), network, and Pillar's disk platter layout. Furthermore, isolating spindles often causes a huge waste: stranded storage. The number of spindles determines the IO rate. And, unless you are very lucky, this nearly always implies far more capacity than you really need, given that today's 300GB 15K Fibre Channel drive is no faster than its 9GB grandfather was 8 years ago. On a standard system, this is utilization hell. Yet you still use the same metrics when buying storage ($/GB) without taking into consideration that your utilization rates are decreasing. On Pillar Axiom, this space is there to be used, and actualized $/GB is half that of our competition!
Let's list a few fun facts about Pillar's QoS on Axiom:
1. It is not all about the layout of data on disk. We do put higher priority LUNs and filesystems on the outside of the disk because data rate and data density (sectors per track) are higher there, yielding better performance.
2. QoS is not just about where the data is on the disk. Actually placing data on the disk platters without a prioritized queue manager will often make things worse, not better. Here's a tip: If someone comments on Axiom QoS, ask them what a "Prioritized Queue Manager" is. If they don't know, but still feel comfortable talking about Pillar's QoS, they are a boob and/or you are listening to a load of crap.
3. Part of being Application-Aware implies knowing that some applications need more cache than others, while still others run better without it; this is part of our QoS.
4. Allowing some test applications to hog up resources from production applications is a bad idea. Instead of buying separate systems, use QoS to protect your resources according to business priority.
5. If you think our QoS doesn't work when many servers share the storage pool, then we need to talk. The whole point of QoS is to differentiate between applications with varying service levels. In fact, Pillar guarantees minimum service levels so applications and hosts don't time out just because higher priority stuff is in contention with lower priority stuff – if you want to starve applications, your bolted-on version of QoS has to come from the Northeastern US.
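For anyone curious what a prioritized queue manager actually does, here is a minimal sketch in Python. It is an illustration of the concept only – not Pillar's firmware – with a crude service floor so the lowest band is never starved:

import heapq
import itertools

# Conceptual sketch of a prioritized queue manager -- not Pillar's implementation.
# Lower band number = higher priority (Premium ahead of Archive).
class PrioritizedQueueManager:
    def __init__(self, service_floor=3):
        self._heap = []                  # (band, arrival order, io)
        self._order = itertools.count()  # preserves FIFO order within a band
        self._priority_run = 0           # consecutive dispatches taken in pure priority order
        self._service_floor = service_floor

    def submit(self, band, io):
        heapq.heappush(self._heap, (band, next(self._order), io))

    def dispatch(self):
        """Serve in priority order, but every few dispatches pull the oldest IO from
        the lowest band so low-priority LUNs keep a guaranteed minimum of service."""
        if not self._heap:
            return None
        if self._priority_run >= self._service_floor:
            i = max(range(len(self._heap)), key=lambda k: (self._heap[k][0], -self._heap[k][1]))
            band, _, io = self._heap.pop(i)
            heapq.heapify(self._heap)
            self._priority_run = 0
            return io
        band, _, io = heapq.heappop(self._heap)
        self._priority_run += 1
        return io

q = PrioritizedQueueManager()
for i in range(5):
    q.submit(band=3, io=f"archive-{i}")   # low priority filesystem
for i in range(5):
    q.submit(band=1, io=f"oltp-{i}")      # Premium LUN
print([q.dispatch() for _ in range(10)])  # Premium goes first, archive still trickles through

Data placement on the platter decides how fast an IO completes once it reaches the disk; something like the sketch above decides which IO gets to the disk first. You need both, which is the point of fun fact number 2.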
BLOG POST NO. 26 – APRIL 9, 2008
Quality of Service (QoS) – The Even Simpler View!
We can all remember 36GB FC 15K RPM drives, and 160GB SATA desktop drives. I say remember because you can't really even buy these anymore, unless you shop on eBay. Today, we talk of 300GB and 1TB capacities instead. The HDD industry is spectacular in my opinion. Much of what the IT industry has been able to accomplish owes itself to our ability to store massive amounts of stuff very economically. I am biased here, as I spent 20 years of my life working in the HDD industry, and I loved it.
HDD performance has not kept pace with areal density increases. We all know this one. Performance in an HDD means motors and things that move, and we all know that those things will not keep pace with solid state technology advances. So what can you do about it? Buy the drives you can buy, convince yourself that you can use the capacity, and then forgo the capacity because you need IOs. This means underutilization – don't even think you are going to fill the disk drives to their capacity.
Unless you buy Pillar Axiom and use networked storage differently than you have for the last 10 years. Pillar allows you to extract both the capacity and the IOs from a set of spindles as long as you share those spindles amongst applications that have varying requirements for speed and capacity. If you want to use Axiom to do what our competitors do, you can. In fact it is extremely competitive. But with Axiom you can, and in my opinion should, use storage differently than you are used to: define a storage pool, tier your applications, and point them at the pool for their storage. The result? Higher utilization, full use of the disk you paid for, less stove-piping of platforms, lower cost. For all of us, this represents some important relief from the ever-increasing HDD capacity march that heightens the disparity between that capacity and the performance with which it can be delivered.
BLOG POST NO. 27 – APRIL 18, 2008
Axiom 300 for up to 20TB
Contrary to the belief of some of our competitors, Axiom 300 is a full-fledged product. The software, look and feel, ease of use, and guided maintenance our customers love are the same as on the Axiom 500. Besides being Application-Aware, the Axiom 300 has many features and functions that most competitors reserve for the Enterprise, built into a great product for smaller applications. The difference between the AX300 and the AX500 lies in the Slammer hardware; unique motherboards reduce power, cache, and back-end connectivity to appropriate levels for smaller applications requiring fewer spindles (up to 54). There is not a lot of use in scaling to 832 spindles for smaller applications, so powering up all kinds of infrastructure to allow such scaling would be wasteful and more costly than necessary.
While the Axiom 500 has unprecedented scalability in the marketplace, making it smaller did require a modified set of CPU boards for the Slammer. The great news for our customers who care to trade up is that they can, and with data in place. So in the event that their business scales much larger than they dreamed, they are protected! Typically, as storage vendors shrink their product offerings for smaller budgets, many features and functions are dropped. In fact, at the bottom end of the scale, "storage products" end up being a tray of disk drives without much software at all. While this is affordable (cheap), it also means the customer will spend lots of time managing their storage manually. With the Axiom 300 software, customers have lots more help than they get with a RAID box. Try to get a RAID box to "Call Home"... or feature 6GB of cache memory.
BLOG POST NO. 28 – APRIL 23, 2008
Who cares about Performance under Fault Conditions?
Well, you do – if you own or operate a storage array. Storage systems have lots of components, including mechanical ones like disk drives. The whole point of RAID is to deal with failures of parts. Moving parts fail most often. Most systems today (not all) have redundancy built in throughout the entire system. In an Axiom, all those redundant parts do work for the array all the time because it is an active-active architecture. Some arrays have active-passive architectures that waste those components that just sit around waiting for a failure.
So, what happens when a component fails? Well, in HA systems like Axiom, Clariion, and NetApp products, customers still have access to their data. Paramount in all but the most trivial storage applications is being able to get at your data regardless of any single failure. What goes mostly unspoken is the effect of a failure on performance. Systems from NetApp and EMC can take a long time under load to rebuild their failed disks onto a spare drive. In fact, it can take more than a day! This matters because while the rebuild is in progress, the array is running without protection against another failure. The odds of a second failure are small, but get proportionately larger as the rebuild times grow longer. To solve this problem, some storage manufacturers put yet another redundant disk drive into their arrays. So you pay for more unusable capacity and power to protect
yourself against the vendor’s long rebuild times. This is a great technique against loss of data, but wasteful and expensive. In contrast, Pillar’s Axiom drastically reduces the drive rebuild time by using a distributed hardware RAID architecture. Distributed RAID gives the following clear, demonstrable benefits that have been measured by outside laboratories against our competitors: We rebuild faster than any array on the market. We perform better under faulted conditions by a HUGE margin, factors of 2–3. We perform under all faulted conditions with minor loss of performance, on the order of 0-8%, versus 50% loss of performance from some vendors. While most everyone guarantees continuous access to data under fault, they really don’t want to talk about the systems’ performance under those fault conditions. Why does this matter? Well, backup window integrity, customer perceived performance, boot times, the list goes on and on. They all depend on predictable, reasonable performance of the system, not 3 to 1 variations in performance under fault. If you want great performance under fault conditions of any type, buy the Axiom. Your mileage may vary, but it will vary a hell of a lot less with the Axiom than with our competitors.
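To put rough numbers on the second-failure argument, here is a small Python calculation. The drive MTBF and the two rebuild times are assumptions chosen purely to illustrate the shape of the risk, not measurements of anyone's array:

import math

MTBF_HOURS = 500_000   # assumed per-drive MTBF; real drives and real duty cycles vary

def p_second_failure(surviving_drives, rebuild_hours):
    """Chance that another drive in the RAID group fails before the rebuild completes,
    treating drive failures as independent exponential events."""
    group_rate = surviving_drives / MTBF_HOURS
    return 1 - math.exp(-group_rate * rebuild_hours)

for label, hours in (("fast rebuild (distributed RAID)", 2), ("day-long rebuild under load", 30)):
    p = p_second_failure(surviving_drives=11, rebuild_hours=hours)
    print(f"{label:32s}: {p:.4%} chance of a second failure in the window")

Both probabilities are small, which is exactly the argument vendors lean on; the point is that the exposure grows in direct proportion to the rebuild time, so a rebuild that takes 15X longer carries roughly 15X the risk – before you start adding extra parity drives to paper over it.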
BLOG POST NO. 29 – MAY 1, 2008
EMC has a Multiprotocol Array? Good one. Our goal is to make your life simpler. We put the concept of true NAS/SAN integration on the map, in my opinion. I think NetApps* beat us with the function, but their implementation was poor. Of course, we had a great SAN product, but 2.5 years ago we had one customer, so nobody really took us seriously either. So I will give them credit for being the first, but Pillar, I think, deserves credit for having a real SAN. Being fair, NetApp has improved their SAN implementation quite a bit in the last couple years. One criticism I have heard from a BIG independent storage vendor is that we build a utility platform that competes against NS, CX and low-end DMX products. Duh? That is the whole point! The critics say we “have to” support SAN/NAS together. I say we do support SAN/NAS together… and the competition doesn’t. But I am speaking of facts, albeit with a bias, so I will quote Brian Garrett of ESG, taken from the March cover story in Storage Magazine by Jacob Gsoedl. “With Symmetrix, Clariion, Celerra and Centera, EMC has four different solutions, each with its own code base and architecture, and it would make sense for EMC to head to a unified solution,” says Brian Garrett, technical director, ESG Lab at Enterprise Strategy Group.
Why buy several platforms to serve all your requirements when one will do? Really! Stove-piping just jacks up the TCO to the benefit of the storage vendor, not the customer. The article in Storage Magazine goes on to say all kinds of nice stuff about Pillar – and frankly, we deserve it, but if I say any more, you guys will think the marketing team at Pillar wrote this instead of me. For those of you who are suspicious, check out the footnote below....you can be sure they didn’t write that one. OK – I will put one good one in for the marketing team – here she goes: The same story describes the Pillar Axiom in the following way, “Pillar has supported FC, iSCSI, NFS and CIFS in its Axiom arrays from day one. A scalable architecture, offloading of file-system protocol and RAID processing to so-called Slammers, and cluster support make the Axiom array family a great fit for SMBs and enterprises. Tight integration with Oracle tools like Oracle Enterprise Manager makes Axiom arrays a perfect fit in Oracle environments.”
* I know it is NetApp now, but for crying out loud could they be more sensitive about their name? Who cares about their stinking name? NTAP, Network Appliance, NetApp, they're all just fine. What nobody can understand is why they changed their logo to one that looks familiar if you've been to Stonehenge. Perhaps they have a Druid on board over there? If they ever commission a sculpture for a stage prop with this logo – they better make sure they have their feet and inches straight or they could end up imitating the scene from Spinal Tap.
BLOG POST NO. 30 – MAY 9, 2008
QoS Contention, Or Sharing of High Performance LUNs and Filesystems OK, so I will assume you accept that the old model of no prioritization of applications using storage in a storage pool is dumb. I will assume you’ve been reading my blog a bit, and you now realize that there is a reason why networking instituted QoS, why airlines have first, business, and economy classes of service. Right? If not, start here. If you don’t want to follow another stinking link, then let me summarize: Not all passengers, packets, LUNs or Filesystems are equal, and this inequality allows the Axiom to extract more value for the customer out of the disk drives using QoS for Storage. So, for the skeptical folks out there, they try and shoot holes in the concept – which is a reasonable thing to do. Let’s pick a favorite: how is the Axiom any better than another system without QoS if for example, you are sharing two high priority (call it Premium) LUNs? To first order, it is not! If all you have is people that want first class tickets, there is no gain in differentiation, because you have defined a problem that allows none. That is simple! So if every filesystem or LUN is equal, an old style system is just about as good as you can get anyway. Make sense? Well, with Axiom, the data placement on the platter and purposeful automatic striping across different spindles will give you a better and more automatic result than most of our competitors. On the other hand, the differentiation under those conditions is obtainable on any system as long as the data is laid out properly on the disk (manually, if you aren’t using an Axiom), and you have enough cache.
OK, so let's say we don't define a silly example of two equal, high priority LUNs forced to share spindles. Let's say we have 10 LUNs, and two are high priority relative to the rest. Let's also assume that the Axiom is forced to share the spindles that they all reside on (it won't share them unless it has to). Is this different? How can this be better than a system without QoS since there is sharing going on? The answer is simple and clear. To lay it out would be complex, but an example should make it abundantly obvious: Let's say you are about to walk up to the ticket counter at the airport. Would you rather be in the first class line, or the main cabin? Well, assuming a normal distribution of passengers, the answer is obviously first class. We all intuitively know that although you may be sharing the agent(s) with two or three other passengers, you aren't sharing with 100 other passengers. The same applies to Axiom's QoS. The difference is real, and worth money. Look, this is not some mumbo jumbo cloud of smoke. Spindles and cache equal IOPS. If you can differentiate the application requirements by IO, capacity, and workload, a storage pool can meet a much wider range of simultaneous application requirements using QoS than without it for a given quantity of spindles. This means that in a world of ever-increasing disk drive sizes, Axiom is the architecture that extracts the maximum value out of the storage technology you can afford. Period. And this, my friends, is worth money – to you.
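A toy version of the ticket-counter arithmetic, in Python. The LUN counts, the total IOPS of the shared spindles, and the 60/40 split are all assumptions for illustration:

# Toy model of sharing spindles with and without QoS -- every number here is assumed.
TOTAL_IOPS = 2400                       # what the shared spindles can deliver together
luns = ["premium"] * 2 + ["standard"] * 8

# Without QoS: first-come, first-served degenerates to a roughly equal split.
fifo_share = TOTAL_IOPS / len(luns)

# With QoS: hand the premium band, say, 60% of the capability and split the rest.
premium_each = TOTAL_IOPS * 0.60 / luns.count("premium")
standard_each = TOTAL_IOPS * 0.40 / luns.count("standard")

print(f"FIFO sharing : {fifo_share:.0f} IOPS per LUN, premium or not")
print(f"QoS sharing  : {premium_each:.0f} IOPS per premium LUN, {standard_each:.0f} per standard LUN")

Same spindles, same aggregate IOPS; the premium LUNs simply stop standing in the coach line.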
BLOG POST NO. 31 – MAY 23, 2008
Green Noise
Well, I have just about had it with “Green”. Why? Too much hype. Everyone on the planet is running around explaining why their products are “green”, including Pillar. So if you are buying storage, who should you believe? What’s real, and what is just a load of crap? Here are a few observations regarding all the Green Noise out there regarding storage systems:
1. If you want to save power, use the capacity you own. Don't allow 60% of it to sit there spoken for, but not used (Thin Provisioning), along with technology that allows performance expectations to be met when the disk is full…
2. If your storage system performance falls apart as you fill the disk, consider Pillar Axiom. It is designed to give great performance when it is highly utilized, even as high as 85%.
3. If you can use SATA 7200 RPM disk for an application instead of FC, use it; it draws roughly half the power of FC 15K RPM disk.
4. If you can use de-duplication to keep from wasting a lot of space, use it.
5. If you can use 1TB disk for applications versus 500GB or 750GB, use it. The higher the capacity, the better in terms of Watts/GB. Use the biggest drives you can, unless you need spindles for IOPS and cannot afford the higher capacity models.
6. If you need some small databases with relatively high IO requirements, consider sharing the spindles from a disk-to-disk backup solution, a VTL solution, or any other, giving high priority to the database. Pillar's QoS will allow you to parcel out IOs from all those spindles you own to applications that need the IO but not the capacity, and thinly provision the capacity to the applications that are data hogs. Array sharing is green, and without the drawbacks one normally encounters with non Application-Aware systems.
7. Turn the lights off when you aren't in the room.
8. Let Boise-Cascade plant trees; your storage company is probably not the best way to reforest the planet.
People seem to want to say anything to claim they are green. I commit that we will put our engineers and our efforts to work on this problem. Buying carbon credits is a nice gesture, but for god's sake it is not the best way to tackle this problem.
BLOG POST NO. 32 – MAY 28, 2008
Smokin’ the Strong Stuff Chris Mellor quoted me regarding an EMC spokesperson saying that SSD would be at price parity with HDD by 2010 – about 18 months from now. The implication was that SSD will replace HDD in 18 months time.
While I think this is incorrect, I did point out that this happened once before and there are reasons other than cost per GB that might drive this crossover.
My IBM team 10 years ago or so developed a 1-inch drive we called the "MicroDrive." Well, before you could buy 256MB of Flash, we sold 1GB MicroDrives. Although the cost per MB was much cheaper with the MicroDrive, the marginal value of the capacity was not large enough to motivate its purchase by a large enough proportion of the users. Only in extreme situations, where the largest capacity was of value, would someone choose the "cheaper" solution. Essentially, the value of a robust solid state solution was higher than the one that gave a lower cost per MB. It's kinda like making a trip to a Warehouse Store that sells groceries; it may be cheaper by the pound to buy a 10 pound block of cheddar cheese, but unless you're feeding an army, the stuff will turn green before you can use it. In this situation, the one pounder is easier to keep in the fridge, and you can use it up before you get sick of trying to add cheddar cheese to everything you make. Never mind the large quantity value proposition.
Such will be the case for laptops – 128GB of SSD is a lot, and its speed and physical robustness are far more valuable than an extra 128GB for all but the most geeky users.
How about the Enterprise? Well sure, there are a lot of customers who will find the speed and size of SSD to be ideal for certain circumstances, but with the capacity demands today it is unlikely that we can afford to substitute all the Petabytes of HDD with SSD for at least another 4–5 years. So I am not sure Tucci was blowin' smoke, but at the least his assertion seems a bit aggressive. Perhaps if I started a "Pinheads and Patriots" section of the Blog? Nah, one of those is enough.
BLOG POST NO. 33 – JUNE 11, 2008
Tiering on Disk
So what in the hell does this mean anyway? To many, it means using expensive disk for high performance applications, midrange disk storage for mid-performance applications, and low cost SATA desktop drives for archive or disk backup applications. (You may recall this as ILM, HSM, or SRM, depending on which vendor's Flavor-Aid you were drinking at the time. BTW, did you know that despite conventional wisdom, it was not in fact Kool-Aid that was ladled out at Jonestown?) If you have a long memory and have been around awhile like me, you can actually recall the days when this meant mainframes and minicomputers (and 8 track tape!). Tiering was by platform, and they were all very expensive relative to today. Most of our competitors tier by platform, although it is pretty much all open, networked modular arrays. Few people are building closed-interface, monolithic storage anymore, unless they have a cash cow to milk. (DMoooooX) So to some, the modern idea of tiering is allowing different types of disk on the same platform and allowing the storage for less critical applications to be SATA on the same platform where FC disk resides. You can see a lot of confusion in chat rooms and blog posts around this. Tiering on an array – well, just about everyone has this, don't they? Sure, if you define it as having SATA and FC disk on the same array. We have finally reached the point where people at companies like NetApps (I know, but I like saying it that way since they are so officially uptight about their damn name) have stopped saying SATA will never make it into the enterprise. WhoopDee Doo!
So to me, tiering on disk goes much further. For those of you who think I am going to say down to the location on the platter, you're wrong. Tiering on disk, to me, means that you don't have to think about what platter, and define LUNs or filesystems to reside on certain types of disk. Tiering on the array means you have a storage pool, and the array picks the type of disk you need out of the pool based on your application requirements. It also means that the system will move or migrate LUNs and filesystems based on changing requirements. For those of you who think this is standard, that QoS, Application-Aware, and auto-migration of data from FC to SATA are standard, you need to look again. They are far from standard; in fact, they are basically not there unless you buy a Pillar Axiom. Tiering on disk means you don't have to buy 2, 3, or 4 different platforms to meet disparate needs in your IT shop; you buy one platform that meets those needs out of a disk storage pool according to the application requirements you specify when you set it up. The only thing you need to do is pick some disk resources that encompass the needs of your applications to put into the pool, like SATA 1TB disk and FC 300GB 15K RPM disk, to span a wide range of high capacity and high performance for QoS to work with for all your applications. Why? Efficiency. Utilization is driven up with proper application of a storage pool in your data center. You can use both the capacity and the IOPS of the spindles you own using Axiom instead of one or the other.
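Here is a minimal sketch of the idea in Python. The profiles, disk classes, and placement rules are invented for illustration – this is not the Axiom's policy engine, just the shape of "tell the pool what the application needs and let it pick":

# Illustrative tier-selection sketch -- the classes and rules below are invented, not Axiom policy.
POOL = ["FC 300GB 15K RPM", "SATA 1TB 7200 RPM"]

def place(priority, io_bias):
    """Pick a disk class and platter region from the pool based on the application profile,
    instead of hand-carving LUNs onto a dedicated platform per workload."""
    if priority == "high" and io_bias == "random":
        return "FC 300GB 15K RPM", "outer tracks"    # highest data rate and density
    if priority == "high":
        return "SATA 1TB 7200 RPM", "outer tracks"
    return "SATA 1TB 7200 RPM", "inner tracks"       # archive / backup style workloads

print(place("high", "random"))       # OLTP-style database
print(place("low", "sequential"))    # disk-to-disk backup target

The real system would also re-evaluate these decisions as requirements change and migrate data accordingly; the sketch only shows the placement half of the story.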
BLOG POST NO. 34 – JUNE 19, 2008
Access Density
OK, we’ve talked about SAS drives and the fact that what people really want out of them is “fast”. Let’s expound a bit. What anyone wants out of an HDD is access density; IOPS/GB, and a High GB/$ (more normally specified as low $/GB, but I inverted this to make a point on the following graph). Unfortunately, we can see from this that as GB/$ has gone up sharply, access density has dropped precipitously. What is access density? Access density translates to the number of actuators chasing data. Thus, with the relentless pursuit of more GB per disk drive, access density has gone in the tank. We all know that the low capacity models of server drives last a lot longer than a mere $/GB consideration can explain, and the reason is that the more capacity you have under an actuator, the lower the performance of a subsystem comprised of these sorts of disks. So 24 spindle “shelves”, or in Pillar’s parlance, Bricks, have higher access density than the same capacity, low RPM larger platter incarnations. Hence, better performance. There are a few interesting and very enlightening points you can make about this: 1. Access density is always at odds with cost for HDD-based subsystems of a given RPM. 2. Access density always gets better with smaller platter sizes for an equal number of platters. 3. Smaller form factor drives of the high RPM variety don’t yield as big an improvement as you might think, because 95mm (3.5” form factor) 15K RPM drives already use smaller platters – closer to the 2.5” form factor that the SFF HDD uses anyway.
4. What Small Form Factor Drives give you is the ability to package them in a serviceable enclosure that puts more actuators per TB in the familiar storage tray! Serviceability is key for small, medium, and large Enterprise. 5. There has always been an option of stacking drives in some high density fashion to optimize actuators per unit volume, but serviceability is just about always compromised – hence it is not a common practice. OK good, now the question is… what else can we do to get around access density? Well, with legacy architectures, the answer is cache (don’t use the disk if we can avoid it). We can pay more and use smaller disks – oh, the wonderful days of 9GB disk drives. Or, we could take a large capacity disk with great $/GB, and short-stroke the disk. As an example, let’s say we use 10% of the drive’s capacity. The access density will go up by about 20X!! (~2X from access time reduction, 10X from the fact that the actuator is chasing 10% of the capacity). Wow! And to think we made that HUGE improvement by only increasing the $/GB by 10X. Oops. Well, here is what the Axiom QoS does for you. It allows you to short stroke the drive, and get a HUGE access density improvement. But instead of throwing away the other 90% of the capacity, it allows you to sneak in and access that part of the disk in a way that only causes the access density of the high performance capacity (10% share) to drop by say, 10%, from 10X better to only 9X better. WOW! You get the whole drive space back, but have a portion of it that behaves like it has 9X the access density!! You see, the truth is that stove-piped storage doesn’t get you anything if you can buy a single array that can mitigate contention while increasing access density.
[Chart: HDD Access Density (IOPS per GB), 2002–2011, for SFF 2.5-inch 10,000/15,000 RPM, 3.5-inch 15,000 RPM, 3.5-inch 10,000 RPM (SCSI, SAS and FC), and 3.5-inch 7,200 RPM drives]
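For anyone who wants to check the short-stroking arithmetic from the post, a quick sketch with assumed drive figures (about 80 random IOPS from a full 1TB SATA drive; the ~2X access-time gain is taken from the text):

# Back-of-the-envelope access density math -- the drive figures are assumptions.
FULL_STROKE_IOPS = 80
CAPACITY_GB = 1000

full_density = FULL_STROKE_IOPS / CAPACITY_GB                # IOPS per GB over the whole drive

# Short-stroke to the outer 10%: ~2X IOPS from shorter seeks, 10X less capacity per actuator.
short_density = (FULL_STROKE_IOPS * 2) / (CAPACITY_GB * 0.10)
print(f"full drive        : {full_density:.2f} IOPS/GB")
print(f"short-stroked 10% : {short_density:.2f} IOPS/GB  (~{short_density / full_density:.0f}X)")

# The QoS twist: keep the other 90% usable for low-priority data and give back only ~10%
# of the premium slice's gain, rather than throwing 90% of the capacity away.
print(f"premium slice shared via QoS : {short_density * 0.90:.2f} IOPS/GB, "
      f"with the remaining {CAPACITY_GB * 0.90:.0f} GB still in use")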
BLOG POST NO. 35 – JULY 1, 2008
Pillar Axiom 600 We announced the Axiom 600 and Axiom 600MC today, in the same month of our three year anniversary. I am proud of what we’ve achieved in our first three years, especially our fast growing installed base of 400 customers, many of whom are running a suite of applications with Axiom as their storage target. When people pile lots of applications on a Slammer – it takes horsepower to virtualize a large storage pool. The Axiom 600 is about twice as fast as the Axiom 500, so for bigger installations our customers can apply the AX600 to their advantage. The AX600 even improves CIFS and NFS performance on smaller NAS applications as well. Axiom was designed to work better with larger workloads, hence our customers can drive up utilization and pool efficiency with many (especially disparate) applications connected to it. Thus, we built the Axiom 600 to extend the number of applications, and spindles that can be supported before you have to add another Slammer (you can add 4 Slammers of any type or model to a storage pool). Of course, our Axiom 500 customers can upgrade (about 12 have so far) to the Axiom 600 with their data in place. Mike Vizard, SVP and Editorial Director for Ziff Davis Media, wrote the following blog today in response to our announcement. The Storage Virtualization Conundrum The problem that most people are trying to solve when they start to dabble with storage virtualization is basically utilization. IT managers realize that most of their storage assets are way under utilized in terms of capacity, so they naturally want to look at ways of creating pools of storage across multiple devices that can be easily shared by any number of applications and servers.
To accomplish that, IT organizations have typically had to either upgrade to new storage hardware that supports the ability to pool storage or look for a software-based approach that would allow them to create virtual pools of storage from arrays supplied by different vendors. In either event, there are tradeoffs to be made. The upgrade to new storage virtualization is expensive, and the systems themselves can be difficult to manage in terms of allocating pools of storage resources across application sets that can change dynamically. It takes a lot of finesse and skill to make that scale across the entire enterprise. Furthermore, if they take a software-based approach to storage virtualization, they basically wind up turning off a lot of the value-added capabilities of their storage hardware as the intelligence for managing the storage infrastructure moves from being embedded in the hardware into what amounts to a generic storage application. Against that backdrop the folks at Pillar Data Systems have decided to issue an interesting challenge as part of the release of their new Axiom 600MC storage controller. The company is guaranteeing customers that they will see 80 percent disk utilization, which they say is the highest in the industry and about twice as much utilization the average customer sees from any other storage solution. The reason Pillar Data Systems can make this claim is because of their approach to policy-based storage management. By using a set of pull-down menus to essentially profile the storage requirements of each application, the Axiom 600MC makes sure that applications with similar performance requirements don't sit on the same array. This means that customers should only see a very limited amount of storage bottlenecks because the applications accessing any given disk array at the same time will have substantially different I/O performance characteristics. Essentially,
this means that Pillar Data Systems is managing the process of intelligently distributing where the data resides across the arrays to both maximize utilization and reduce bottlenecks. This really opens up the whole question of the value of storage virtualization given the fact that at least one vendor is now guaranteeing utilization rates of 80 percent. Essentially, IT organizations now need to ask themselves if they really need to go through the pain of creating and managing virtual pools of storage when they now have a more efficient way of basically throwing more hardware at the storage growth problem. Time will tell if the whole issue of storage virtualization is going to be carried along by the rising momentum of server virtualization. There’s certainly a lot of powerful marketing behind the storage virtualization concept. But the first question that any IT organization should really ask before they follow everybody down the virtualization path is exactly what problem are they trying to solve? And when they stop to think about that, they may just want to pause long enough to consider all the possibilities. Mike Vizard clearly gets it. VMware/Server Virtualization is about increasing server utilization to save costs in the evolving, more agile datacenter. Axiom is about storage utilization to the same end – through an intelligent operating system that embeds QoS and Application-Awareness in the entire platform. Oh, and the Axiom 600MC is a Mission-Critical five-nines capable platform for very high-end Tier 1 applications. Guaranteed.
BLOG POST NO. 36 – JULY 21, 2008
Six-Lane Driveway for your Home
Regardless of your personal wealth, I doubt anyone would tell you that it makes sense to have a six-lane driveway to your home. No matter how many vehicles you own, the classic one-lane driveway probably will do just fine. A one-lane driveway certainly won't lengthen your commute time, unless the kids leave it strewn with skateboards and bikes where a bit of weaving is required, but I digress.
Such is the logic of one of our competitors who claims that "they have 4 Gb throughout their array, and Pillar doesn't." Wow! Could I be your best friend? Is there any chance that if I studied under you for a few years I could amass that keen insight? The architects of our Axiom never thought of that, it is just too esoteric. If they could have only anticipated the need to have 80 jagabytes per second everywhere, they could have eliminated all bottlenecks? If only I could have drawn that pirate, I too could have been accepted into your college. (Sorry, that was mean, but gawd).
The line of reasoning leads to six-lane driveways and trees with trunks and branches that are all the same diameter. It's stupid. I guess the reason why I get so bent out of shape about it is because it is not a service to customers to tell them something like that. Tell the truth, if you know it. What limits one architecture is not what limits another – for some, it is memory bandwidth and for others, the number of RAID controllers, or a pitiful amount of write cache. But don't make up some dumb thing like "we have 4 Gb everywhere," implying that it makes you better.
Some architectures would benefit from 4, 8 or higher Gb-per-second connections to their disk. Why? Because they do all operations (RAID rebuilds and the sort) over their back-end fabric connections, like this particular competitor does. How many disks share the fabric? The back-end architecture has more than a lot to do with the traffic on the interconnect – and some older architectures need lots more bandwidth than Pillar's Axiom to accomplish the same thing.
To wit, RAID calculations, drive sparing, and rebuilds are not accomplished using the Axiom's Fibre Channel fabric connecting the Slammers to its Bricks. These drive-intensive, bandwidth-hogging, and latency-provoking operations are contained within the Brick for SATA drives, and 2 Bricks for FC drives. This great topology is what yields Pillar's industry-leading rebuild, performance under fault, and general performance specs. This is real.
It turns out that some topologies also employ fewer, higher performance channels while others employ more numerous, lower performance channels. The issue is the aggregate bandwidth of the fabric, not the bandwidth of any individual connection. The term fabric here is enlightening – you don't tell someone that the thread size of your rope is bigger than someone else's, therefore your rope is stronger… lest they ask, how many threads in each rope?
So, while some people might find appeal in the symmetry of six lanes everywhere, or trees whose branches and trunk are all the same diameter, it is too simple-minded an assertion from which to draw conclusions. I will confess that my driveway is wide enough to weave a bit to avoid inappropriate obstacles, but my wife still manages to park her car smack dab in the middle of everything for "just a second".
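If you want to see why "we have 4 Gb everywhere" says so little, the arithmetic is one line. The link counts, speeds, and traffic figures below are assumptions chosen only to make the aggregate-versus-per-link point, not any vendor's real topology:

# Toy arithmetic -- assumed link counts, speeds, and traffic.
def aggregate_gbps(links, gbps_per_link):
    return links * gbps_per_link

print(f" 4 links x 4 Gb = {aggregate_gbps(4, 4)} Gb/s aggregate")
print(f"16 links x 2 Gb = {aggregate_gbps(16, 2)} Gb/s aggregate")   # thinner threads, stronger rope

# What the fabric carries matters as much as its width: if rebuilds and sparing ride the
# back-end fabric, they compete with host IO; if they stay inside the enclosure, they don't.
host_mbps, rebuild_mbps = 1200, 400
print(f"fabric load with rebuild traffic on it : {host_mbps + rebuild_mbps} MB/s")
print(f"fabric load with rebuilds kept in-Brick: {host_mbps} MB/s")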
BLOG POST NO. 37 – AUGUST 27, 2008
Cloudy with a Chance of Meatballs Ever read this one to your kid? Great story! I recommend it because it is silly and lots of fun. Well, that story reminds me of the Cloud Computing / Cloud Storage BS flying around these days. I’m sorry to rain on some people’s parade, but for Pete’s sake, do we have to keep inventing new buzz words for stuff? (Yes, of course we do, because it’s fun.) I’m not the only one who thinks this Cloud thing has gone too far. Mike Elgan’s great article in Datamation expresses a similar opinion. We differ in our adjectives for “Cloud computing”: he chose dangerous and misleading, while I prefer silly. Mike sees danger because the ambiguity of the term confuses rather than enlightens. I agree wholeheartedly that popular, ill-defined buzzwords distort communication, and hence can be dangerous. But I doubt serious people write “cloud this” or “cloud that” into contracts or quotes for services. In fact, I doubt that language using the word “cloud” survives serious environments at all, so it probably isn’t all that dangerous.
I mean, who suffers if two researchers sit around saying cloud-this and cloud-that to impress each other about how their ideas pertain to the cloud structure of the universe? What if they both fell forward into their plates of cloud soup and fell asleep? Who uses this term, then? People who want to be “with it” and appear to be “in the know.” People who don’t want to be left behind. People who want to frame ideas as something new, rather than technicalities of something old. We draw a piece of a system block diagram as a cloud to mean “there’s a bunch of crap going on in here and the details don’t matter.” Of course this infuriates cloud people, because to them it’s all about the cloud. Pardon me for not getting that. The problem is, I do get it – and it’s still not all about the cloud.
BLOG POST NO. 38 – AUGUST 29, 2008
I Love a Parade
Chuck Hollis over at EMC stirred up a hornets’ nest with his “Your Usable Capacity May Vary” blog today. Seems EMC is touting its storage efficiency superiority over rivals NetApps and HP by publishing the results of an in-house efficiency comparison. Let’s set aside, for a moment, the fact that Pillar was the first storage vendor to take up the storage efficiency discussion, way back in early 2007, and welcome EMC and NetApps to the discussion... Let’s also set aside the irony that this is the same EMC that finds it absolutely acceptable that their Clariion customers are currently only utilizing an average of 30-40% of the storage capacity they purchased. For as much as I love a good competitive bake-off, especially when it comes to overall storage efficiency, how can
you promote the efficiency of your disk array when you don't address the most fundamental question storage administrators should be asking: "Exactly how much of the storage I just purchased can I actually use?" (More on that later.) There are many different ways to frame a 'like to like' comparison, and when it comes to Storage Efficiency, Pillar leads the pack on most of them. In prior posts, I have addressed some of these frameworks. In future posts, I will address more. For now, let's focus on what EMC tested, the results they published, how Pillar compares… and then, and perhaps most importantly, offer an alternative solution that is easily 2X better than what EMC, NetApps, or HP has to offer.
The table below follows Hollis' format. I don't like this format, but to avoid confusing everyone by flopping it around, this is apples-to-apples with his blog presentation. Except: we don't need to add drives and segregate SNAP, Vault, etc. with Axiom. Thus – we get the same spindle count for performance as EMC and anyone else (actually more, but that too is another blog entry).

Pillar Axiom          No. of Drives    Capacity (GB)
Usable Space          120              16,148
SNAP                  From Usable      1,752
Hot Spares            12               1,752
RAID                  12               1,752
Raw Capacity          144              21,404
Disk Shelves          12
Percent Efficiency    75%

Pillar's Axiom Application-Aware storage array nails a 75% efficiency score because there is no overhead for such things as WAFL (NetApps), Vault Drives (EMC), or Occupancy Alarm (HP). All system persistence information is kept and mirrored across all drives within the Axiom and is accounted for in the Usable Space from the 120 drives of 146GB each. So, we get 75% vs EMC's 70%, HP's 48% and NetApps' 34%. Of companies noted in Chuck's blog, I think this kicks some buttski. Nice, thanks for bringing it up Chuck (hee hee).
[Chart: Percent Usable capacity by vendor – Pillar 75%, EMC 70%, HP 48%, NetApp 34%]
Now that we have established that Pillar is more efficient than the other arrays using this approach, let me say that it is really a shame. In other words, the premise of Chuck's entire blog is an EMC view of the world – FC is how you should skin this cat using their machine. Yup, it is.
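The 75% in the table is simple arithmetic, and it is easy to check. A quick sketch using the capacities from the table above and the competitor percentages quoted in the post:

# Reproduce the efficiency score from the table above.
usable_gb, snap_gb, spares_gb, raid_gb = 16_148, 1_752, 1_752, 1_752
raw_gb = usable_gb + snap_gb + spares_gb + raid_gb      # 21,404 GB across 144 drives

scores = {"Pillar": usable_gb / raw_gb, "EMC": 0.70, "HP": 0.48, "NetApp": 0.34}
for vendor, pct in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{vendor:7s} {pct:5.0%} of raw capacity usable")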
BLOG POST NO. 39 – SEPTEMBER 25, 2008
Application-Specific vs. Application-Aware
I suppose when people read that Oracle is now selling storage, fear wells up if they are in the storage business. Well, at the risk of being like Apple when they welcomed IBM to the PC business, the Exadata product is fine with me – and Pillar. Since Larry funds Pillar, Pillar is a great place to deflect concerns regarding "Oracle is in the Storage Business." After all, isn't Larry putting Pillar out of business? No. Exadata is aimed at Netezza and Teradata. Why? Because those companies build Database appliances that have impacted Oracle, and Oracle needed a strategy and product to sell against them. It has one now. Good for Oracle.

Pillar doesn't build special purpose machines. We never have. Frankly, Pillar doesn't lose deals with Oracle to Netezza or Teradata or GreenPlum or Blooming Pumpkin. I am not saying that the business is small, or uninteresting, or not good for Oracle – I think this will be a great thing for Oracle, and perhaps for HP. It's just not the business Pillar is in, that's all.

Pillar sells modular, scalable networked storage, virtualized for ease of management and configuration, with quality of service built in to manage contention in a way that drastically increases the utilization capability while maintaining performance. Exadata is not utility storage; it is a special purpose Oracle Database Appliance. Exadata doesn't need quality of service – it doesn't entail sharing storage amongst multiple applications.

There is a strong difference between Application-Aware and Application-Specific. In fact, the difference is rather humorous: Application-Specific means that your storage is so aware of one application that it is oblivious to all other applications. Well, if that's what you want or need, then I think Exadata looks to be an awesome product. If, on the other hand, Oracle is only one of your applications, and you want a virtualized storage pool supporting high performance OLTP, BI/DW, and a host of other applications, Pillar Axiom is your best choice. End of crisis. Whew!
BLOG POST NO. 40 – OCTOBER 3, 2008
Storage Guarantees and Fine Print
In March 2008, Pillar decided to guarantee our customers could actually use 80% of their "Useable Capacity" – that is, they could run 80% full – with minimal deleterious effects on Axiom performance. This was a first in the industry. I blogged about it, and customers and prospects found the offer compelling.

Recently, EMC rattled NetApps and HP by posting a Storage Efficiency number which, of course, did not include Pillar because we score better... The remarkable thing is that while we top the charts on efficiency as defined in Hollis' blog, we blow the competitors away by allowing – and guaranteeing – a reasonable written-capacity utilization of 80%.

NetApps honestly does poorly at utilization. Out of the box, they recommend setting aside huge amounts of storage, like 40% or more, because of their architecture. While they spin it as a customer benefit, WAFL simply dies hard if there is not a lot of free disk space. You don't want to run a NetApps system more than about 60% "full" if you are accustomed to any sort of performance.

So here is the corker: We issue a guarantee that customers can get 80% utilization, and NetApps follows with a 50% storage saving guarantee. In summary, NetApps offered a guarantee that says you only need to buy half the raw storage with their systems as you do with others' storage. Honestly, I've never seen such disingenuous BS in my life. Here are the conditions on the NetApps "guarantee" – otherwise known as the "fine print":

1. It's a raw capacity comparison, and the competitor's array must be configured using RAID 10 vs. NetApp's RAID 6 (DP). Absolutely ridiculous. RAID 10 imposes 100% overhead and is much faster than RAID DP (which has much less capacity overhead) – the arithmetic is sketched below.

2. Must run VMware. OK, but why? Who gives a damn? Why isn't their storage better in non-VM environments? After all, we are comparing RAID 10 to RAID 6. With a reasonably sized array you are 90% there without VMware, de-dupe, and other associated flim-flammery, aren't ya?

3. Must use dedupe. So is this a NetApps benefit, or can all of us with dedupe do this?

4. Must use Thin Provisioning. I'm not entirely certain why this is a requirement. With Pillar you can use it or not – your choice.

5. Must engage Professional Services to validate the config. Oh, for God's sake – is this that hard? You can't find this out without using Professional Services? Do you need an accountant, too?

6. Must be less than a $2,000,000 deal. I love the reason they limit this guarantee: "Rogers responded that they're (>$2M systems) also the most likely to see dramatic efficiency gains, and so they don't need an assessment service to calculate what those gains might be." Stunning. Never guarantee something that is a shoo-in. Always hedge your bets by placing money on stuff that is riskier. Never bet on a sure thing.

7. No more than 10% of the following data types: images and graphics, XML, database data, Exchange data and encrypted data. Seriously? No databases? No Exchange? Is this a laundry list of data they can't compress well?

I keep thinking to myself – NetApps is a good company. They have some good stuff. Who in the hell made this up? Really. This is embarrassing.
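To see why condition 1 stacks the deck, here is a back-of-the-envelope sketch of the arithmetic. The 14+2 RAID 6 group width and the 10TB usable target are my own illustrative assumptions, not figures from the NetApps guarantee.

```python
# Illustrative only: raw capacity required to deliver the same usable capacity
# under RAID 10 (mirroring) versus RAID 6 / RAID DP (double parity).

def raw_for_raid10(usable_gb: float) -> float:
    # Mirroring: every usable GB costs two raw GB (100% overhead).
    return 2.0 * usable_gb

def raw_for_raid6(usable_gb: float, group_width: int = 16) -> float:
    # Double parity: 2 parity drives per group (assumed 14 data + 2 parity).
    data_fraction = (group_width - 2) / group_width
    return usable_gb / data_fraction

usable = 10_000  # GB we actually want to use (assumed target)
r10, r6 = raw_for_raid10(usable), raw_for_raid6(usable)
print(f"RAID 10 raw: {r10:,.0f} GB ({r10 / usable:.2f}x)")
print(f"RAID 6  raw: {r6:,.0f} GB ({r6 / usable:.2f}x)")
print(f"Raw 'saved' by the RAID comparison alone: {100 * (1 - r6 / r10):.0f}%")
```

In other words, under these assumptions a large share of the advertised 50% raw-capacity saving falls out of the RAID 10 requirement alone, before VMware or dedupe ever enter the picture.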
BLOG POST NO. 41 – OCTOBER 29, 2008
Economic Downturn
I think the headlines this past month pretty much say it all – expect rough seas ahead. There is no escaping that fact for just about every sector of the economy. I envy a friend of mine who is a cardiologist – because people simply cannot postpone cardiovascular issues just because medical bills might be inconvenient.

Pillar is anticipating a tough economic climate will persist throughout 2009 and into 2010, so we have decided to batten down the hatches and lay off some of our employees. These decisions are always among the toughest decisions a CEO has to make. But our jobs require we do what's necessary to fulfill our duties to the company and its shareholders.

The good news for Pillar and our Customers is that our long term viability is ensured by prudent financial actions such as these. At Pillar, we have maintained our Account Reps and added forces to our channel and our distributors. We have also maintained and enhanced our Customer support groups. Our large installed base of over 420 Customers in sixteen countries is in great shape. Our development roadmaps have actually been strengthened through partnerships and some added outsourcing, so our products will continue to enjoy superior features and functions, and the timely introduction of enhancements people expect from Pillar.

At the end of the day, I am sorry to say we are in a long line of companies doing the same thing to protect their Customers and shareholders. But good management teams prepare for the worst, and drive for the best, and I am happy to say that Pillar is no different.
BLOG POST NO. 42 – NOVEMBER 3, 2008
The Age of Skeptical Empiricism – And That's a Good Thing!
I recently received a reply from Marvel Jones to one of my previous blog posts that warrants a post of its own. First, thanks for your question/comment, Mr. Jones. It read:

"This sounds great with a platform like the Axiom mixed storage with a single interface and 80% utilization of usable storage there has to be a catch. Is there a disadvantage at all to this system? Can you tell me that there is no technical disadvantage to going with an Axiom platform? I don't mean to sound skeptical but well I am a bit skeptical, if you can explain how a system so great hasn't hit the market with greater force in the time this has been on the market or give me some technical deep-dive as to how a system like this gets 80% utilization with no degradation in performance and why more companies have not adapted to the Pillar way of storage. I know quite a bit about the architecture and just want some of the gaps filled in."

My response: Yes, there is a catch… if by "catch," you mean there is something that may not be obvious or deserves clarification. Let me explain:

1) As I have said before, if you are spindle IO limited in all your applications, then reducing the number of spindles by consolidation makes things worse. In other words, if you are capacity underutilized because you are IO/spindle limited, then storing more data on those spindles further overtaxes the IO you were counting on for other applications. You can't get blood from a turnip. (There is a small sizing sketch at the end of this post.)

2) If all your applications need the same level of service, and you do not want any contention whatsoever, you need to dedicate spindles and arrays to those applications. In this case, consider DAS, because sharing always risks contention. Just ask any parent with children who are "sharing" just about anything. Servers aren't any better, but you don't have to send them to college.

Now on to the good news: Most IT shops have a wide variety of apps with disparate requirements for IO, bandwidth and capacity. Rather than dedicating spindles to your applications, you can consolidate your storage onto a Pillar Axiom, and let it manage the performance and contention among those applications. It does this with our patented QoS.

Even on a single application, differentiable QoS extracts far more performance from a spindle than the unmanaged FIFO queuing of most storage systems. See these previous blog posts for more details:

Apr 2: Simple overview of QoS – The Simple View. Policy-based selection of filesystem and LUN attributes determines how Network, Cache, RAID, and Disk are used to meet Application workload requirements.

Mar 9: Big Picture view of Axiom & Pillar – Changing the Industry. Axiom is an Enterprise-class machine. Available, Reliable, and Serviceable; the Axiom is a Tier 1 networked storage machine built on a virtualized pool.
Mar 21: Reducing Contention for Controller and RAID resources – Distributed RAID. Nearly all storage systems employ centralized RAID controllers with FC loops connecting Storage shelves; not Axiom. The Axiom employs RAID controllers in the Storage shelves themselves. Why? Because RAID controllers are the bottleneck in spindle expansion on standard systems. Furthermore, centralized RAID controllers use system bandwidth for sparing, parity, and rebuild, which is wasteful and hinders performance under fault and scalability.

Aug 29: Utilization Regardless of QoS – I Love a Parade.

Regarding hitting the market with great force... Well, we are the youngest in-the-market of any of the "new" storage companies (by years). We have won more than 400 enterprise customers of all sizes with a first-class product in our first three years in the market. Comparably, this is a great achievement. We have competitors that don't have as many customers after six years in the market. Others sell low-end systems to more customers at a much lower average selling price. I say good for them! The storage market has plenty of room for more than one alternative to the big legacy providers.

I think your point is: if we offer better performance and utilization, why aren't people adopting our technology even faster than they already are? I think that to take full advantage of our utilization superiority, you have to understand QoS and be willing to use it. Even for customers not using QoS, Pillar Axiom offers many advantages over standard arrays from our competitors. Regarding QoS use, it takes a long time to change old ways of doing business. Let's face it – what we advocate is a bad idea with old-fashioned arrays using FIFO queuing. Our customers have found it simple, easy, and extremely effective. It takes time to break old habits, but as is often the case in life, it is well worth it. Economically, our technology saves money big-time, and we guarantee it.
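To make point 1 of the response concrete, here is a small sizing sketch. The per-spindle IOPS and capacity figures are generic assumptions for illustration, not Axiom (or anyone else's) specifications.

```python
import math

# Drives needed is driven by whichever runs out first: IOPS or capacity.
IOPS_PER_DRIVE = 180   # assumed small-block IOPS per spindle
GB_PER_DRIVE = 300     # assumed usable GB per spindle

def drives_needed(required_iops: float, required_gb: float) -> int:
    by_iops = math.ceil(required_iops / IOPS_PER_DRIVE)
    by_capacity = math.ceil(required_gb / GB_PER_DRIVE)
    return max(by_iops, by_capacity)

# An IO-heavy, capacity-light application is spindle (IO) limited:
print(drives_needed(required_iops=9_000, required_gb=2_000))
# -> 50 drives for the IOPS, even though 7 would hold the data.
# Consolidating more data onto those same 50 spindles adds IO demand without
# adding spindles -- the "blood from a turnip" case in point 1 above.
```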
BLOG POST NO. 43 – NOVEMBER 14, 2008
Two Ways to Skin a Cat?
Do people actually skin cats? On second thought, I don't really want to know.

This may sound obvious – but there are two valid ways to look at utilization and how it can save money.

But first, let's start off by saying this: All storage systems allow 100% utilization of the useable capacity; all systems – this is by definition. So what the heck is an 80% utilization guarantee then? The point is this: on most systems, as you fill the arrays, the performance goes in the tank:

[Chart: Performance vs. % Utilized Capacity (Written/Usable Space), 0–100%, comparing Pillar against a competitor.]

Pillar's Utilization guarantee says that you can use 80% of the capacity before applications experience degradation in expected storage performance. Most of our competitors can't do that. In an upcoming blog post I will explain how we do that – and why others don't get these results. Pillar Axiom's ability to use 80% of its storage without performance loss saves money for customers. You get to use more of your disk space before being forced to add spindles, and therefore you buy fewer disks (which translates to capacity and spindles) than the other guys force you to buy in the first place. Fewer disks means lower purchase price, less rack space, and lower power and cooling costs.

Examples will illustrate the two ways of looking at utilization:

1. You own a NetApps array and you have allocated 60% of its useable capacity. Whoa, don't add more LUNs or file systems onto this array without adding more shelves of disks. The syrup drips off the ol' WAFL when you start using all that space they want you to "reserve for snapshots," because you need lots of free space to "write anywhere." If you allocate more of your space, you're in the crapper on performance. With Pillar, you can use another 20 points (80% total), guaranteed! Heck, 15% of our existing customers run at >85% utilization. The savings here are obvious. So this aspect of utilization is "Use more of what you own on an Axiom before having to buy more."

2. You need 40TB of storage. You can buy 50TB with Pillar (actually you can buy as little as 40TB, as I said, but this is not a great idea). Don't even think about trying to get by with 50TB from NetApps – you need to buy 67TB of usable capacity (the RAW numbers made my knees weak, so I didn't even bother stating them – remember that SNAP overhead, RAID DP, etc.). At a competitive or equal $/GB, that means with the NetApps example you are looking at paying at least 33% more for the system. Add to that the additional support costs and power, cooling and footprint costs and you're talking some real money!! So this aspect of utilization is "Buy less up front with an Axiom than you need to buy with traditional arrays."

This is just fun! If you compare us to 3PAR using SATA, you are looking at more spindles because they don't even support the 1TB disks that have been in the market for a year! (They also don't support LUNs greater than 2TB, but hey, that is a Blog for another day. I am straying off topic – sorry. Split-brain I guess.)
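The arithmetic behind example 2 is simple enough to sketch; the 80% and 60% ceilings are the utilization figures discussed in this post, and the sketch is mine, not a sizing tool.

```python
# Usable capacity you must buy to hold a given amount of data, given the
# utilization ceiling you can run at before performance suffers.

def usable_to_buy(data_tb: float, utilization_ceiling: float) -> float:
    return data_tb / utilization_ceiling

data_tb = 40
print(f"80% ceiling: buy {usable_to_buy(data_tb, 0.80):.0f} TB usable")   # 50 TB
print(f"60% ceiling: buy {usable_to_buy(data_tb, 0.60):.1f} TB usable")   # ~66.7 TB
```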
BLOG POST NO. 44 – NOVEMBER 26, 2008
Every Cloud has a Silver Lining
In a previous post entitled "Cloudy with a Chance of Meatballs," I poked fun at the hyperbole that exists today because so many people have their heads in the Cloud (sorry). The hyperbole rolls on: Salesforce has declared itself a Cloud Computing company. Well, in their case, they really are a Cloud Computing Company, but up until now they limited themselves to their own apps for SFA and CRM. Most Cloud providers let you run apps of any kind on their compute, store, and connectivity resources.

I am going to try to outline this whole phenomenon and discussion in terms that I can relate to. Perhaps you can too (see the table below).

My friend Tom Mornini of Engine Yard pointed out that the Type 2 analogy was a bit pejorative; he thought it was a negative slant on Cloud computing. So I put the Type 1 analogy in; talk to anyone who owns a boat – four out of five will tell you that it might be a lot less work and more bang for the buck to ride around in one than to have the headaches of owning one. By the way, Tom wrote a great article on Cloud computing.

Of course, to some, owning anything and staffing it is an advantage, especially if it includes proprietary "secret sauce." So, the beauty of the Cloud is in the eye of the beholder. My mother-in-law uses gmail – and if she could get rid of her computer, she would. We've been through this before. Remember WebTV? Your computer was a set-top box. Or your set-top box was your computer.

For lots of IT infrastructure companies, it doesn't really matter. If Pillar sells storage to end users or to people who sell the storage as a service, all is well. People still need to store stuff. We have many customers who do just that.

Pillar sells an Enterprise class product – the Axiom. This matters because data centers that offer cloud computing must be highly reliable, fault tolerant, performance resilient (under fault), serviceable, and virtualized. Pillar's QoS offers Cloud providers far more than just storage; it gives them the ability to gain the huge efficiencies they need from their capital assets that classical storage solutions don't allow.

It seems to me that the story around the Cloud is about the efficiencies that can be gained using distributed computing and virtualization. If a Customer is big enough to have an efficient IT infrastructure, outsourcing brings no more efficiency than the standard "this isn't a core competency" argument. For small organizations, the efficiency of sharing the Cloud with lots of other small customers can be significant.

So, bring on the Cloud! Of course, a slight interruption in Google's Cloud and gmail service, say for preventative maintenance, for a year or so, would also be appreciated; I would just have to do without hearing from my mother-in-law for a year. Gad!! (You see, one man's downtime is another's silver lining!!)
Personalized Computing
- Apps: Applications run by you on your own stuff (compute, disk).
- Connect: Connect to local hardware.
- Example: Running Turbo Tax on your PC with Intuit Software.
- Pros: It's yours, and you can run it until the cows come home for one price. You don't really need much connectivity – everything runs local.
- Cons: You mostly have to be at your PC to use it. For some people, advantages are disadvantages.
- Type 1 Analogy: Owning your own boat.
- Type 2 Analogy: Owning your own home.

Centralized
- Apps: Applications run for you by your IT shop on their compute, disk.
- Connect: Connect via LAN to the corporate data center.
- Example: Microsoft Exchange Server in your company's IT data center.
- Pros: IT runs it for you. You don't have to know much about it, nor how it works. Connectivity is local to the data center. You don't have to know anything about the application other than how to use it.
- Cons: IT runs it for you – and you have no choices. You take what they give you, and that's that.
- Type 1 Analogy: Renting a boat.
- Type 2 Analogy: Renting a home.

Cloud
- Apps: Applications run for you by a third party on their shared stuff.
- Connect: Connected by Web to somewhere – you probably don't care where.
- Example: Using Gmail through a web interface – on Google's software and hardware you know nothing about.
- Pros: A third party owns everything. You pay by the "drink" or annoying ad. You don't worry about space, power, cooling, backup or restore.
- Cons: You and your tools are no longer together with your company. Everything is outsourced: data storage, compute, connectivity, disaster preparedness. You need a ton of connectivity if you are running a hybrid system (partly cloudy).
- Type 1 Analogy: Riding around on someone else's boat, paying them for gas and the wine you drink.
- Type 2 Analogy: Residence Inn. Hopefully one of the nicer ones.
BLOG POST NO. 45 – JANUARY 8, 2009
Is FCoE our Savior or Another Empty Promise?
OK, I'll admit that sometimes I am a bit cynical. And skeptical too; teenagers will do that to a guy. So I am squinting down on this whole FCoE thing, wondering what Promised Land it will lead to… and what the cost will be. Do I have to throw out all my gear and get new stuff? Is it good for this, but not for that? If we build systems around it, are we going to be sued for patent infringement by people who standardized it? Can we have a bailout?

OK, the last question is a cheap shot against, well, everyone who wants a bailout, which these days seems like just about everyone. My Uncle Al wants a bailout because he forgot to drain the water from his sprinkler system before it froze. Al figures Uncle Sam is good for 3,000 rescue bucks. After all, if AIG is worth $50 billion, Al is worth at least three grand.

Before addressing the other questions, I figured I better review what FCoE is first. Al knows damn near everything, but he wasn't any help on this one. And if you read this Blog, you know that my mother-in-law isn't talking to me because of my little joke about her Gmail going down for a year. Needless to say, that source of knowledge dried up like Uncle Al's front lawn. Are you ready? FCoE stands for Fibre Channel over Ethernet. Well, that was easy. Great idea huh? Seems like Al should've known this one. Ya gotta love Wikipedia.
Appendix (i.e. content)
If you have a clean sheet of paper, do you design two or three different networks to connect your gear, or do you just change the protocols and layers on a single network as the applications demand? Of course you design one physical network and layer the protocols to accomplish the networking task at hand. The bad news: it wasn't done that way. (The parallels between standard storage products and Pillar Axiom are almost too strong to ignore, but in the interest of leaving the Axiom plug out of this, I will stick to networking. You can thank me later.)

Instead of one, we have multiple networks. We have the two "biggies," namely Ethernet and Fibre Channel. They are physically different, and operate at slightly different speeds. Ethernet, the more popular of the two, probably has an install base of over half a billion ports. To say this garners the power of volume is an understatement. Because of the layering of functionality, Ethernet equipment ranges from $39 hubs to $539,000 big-iron switches. Nice. Everyone loves it because you plug stuff into Ethernet and it just works.

Then there's Fibre Channel. Fibre Channel has been the purview of the Enterprise storage gang. It is expensive, requires very costly HBAs, and is far from ubiquitous. It has a deserved reputation for being finicky and easy to foul up without even trying. On the plus side, it has the key properties of high performance and ensured delivery that are very desirable for connecting Servers to storage.

FCoE attempts to deliver Fibre Channel's strengths while maintaining all the advantages of Ethernet. We already agreed that if you could deploy one set of technologies to solve networking requirements, you would. But originally, you couldn't. Therefore, there are two incompatible networks out there, and we should look at what that means.
[Figures 1A and 1B: Figure 1A shows today's two separate switch infrastructures (Ethernet and Fibre Channel); Figure 1B shows a single converged set of switches carrying both.]

Figure 1A shows our current world, while Figure 1B shows what the world might look like if we simplified and integrated FCoE into our current datacenter. In Figure 1A, you have two sets of switches, thus double the management and configuration. In Figure 1B, you only have one set of switches.

Today, different people with different skills usually manage your Ethernet and fibre channel networks, and therein lies some of the problem: turf battles. The first thing people say when you propose combining storage and TCP/IP traffic is that you don't want to share network resources between some dude surfing the net and a server doing serious application work on a Storage Array. Well, chillax... because, besides QoS parameters (like Axiom has – sorry, couldn't resist) that prioritize correctly, high-end Ethernet switches allow you to isolate the storage traffic from TCP/IP using VLANs. But people still worry, "If the switch is still shared, how can you do that?" The answer is that if you have a big enough switch, and the network connections are sized properly (let's say 1Gbit to clients via distribution switches, 10Gbits to Axioms and servers, for example), the switch resources will handle both kinds of traffic just fine. These big switches are often underutilized anyway, so combining them reduces waste.
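As a rough illustration of the "big enough switch, properly sized links" point, here is a toy load check. Every number in it (port counts, utilization levels, fabric capacity) is an assumption for the example, not a recommendation or a measurement.

```python
# Illustrative only: rough check that client and storage traffic fit on a
# shared switch when links are sized as described above.

CLIENT_PORTS = 48          # 1 Gbit/s client/distribution links (assumed)
CLIENT_AVG_UTIL = 0.10     # assume clients average 10% of line rate
STORAGE_PORTS = 4          # 10 Gbit/s links to servers and arrays (assumed)
STORAGE_AVG_UTIL = 0.40    # assume storage links average 40% of line rate
FABRIC_GBPS = 160          # assumed switch fabric capacity

client_load = CLIENT_PORTS * 1 * CLIENT_AVG_UTIL       # ~4.8 Gbit/s
storage_load = STORAGE_PORTS * 10 * STORAGE_AVG_UTIL   # ~16 Gbit/s
total = client_load + storage_load

print(f"offered load ~ {total:.1f} Gbit/s of {FABRIC_GBPS} Gbit/s fabric "
      f"({100 * total / FABRIC_GBPS:.0f}% used)")
```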
Wait a Minute… What about iSCSI?
Isn't this what iSCSI was supposed to accomplish: storage connected through the Ethernet using TCP/IP? Well yes it is, but it missed the mark for the Enterprise. And Dell paid $1.4B for Equalogic, an iSCSI storage company. While some people interpreted Dell's purchase as iSCSI taking over the planet, most people decided that iSCSI was locked into the SMB (Small/Medium Business) market, and the data center still needed a better solution. The problem is that TCP/IP is a "lossy" protocol; it's OK to drop packets and resend them later. TCP/IP also has unpredictable latency; while adequate for client-server communication, it is not the tight coupling that high performance applications on servers need for their storage attachments in the data center.
So let's do a comparison between Fibre Channel, iSCSI and FCoE:

Fibre Channel
- Physical Transport: Fibre Channel
- Layer 3: Not Routable
- Layer 4: No L4 Encapsulation
- Data Type: Fibre Channel
- Pros: Built for the job. Lossless, predictable latency. Less complex than sharing all traffic – physically isolates storage from other networks.
- Cons: Lower volume than Ethernet (by far). Requires separate network management and capital equipment.
- Analogous to driving a car on a... Racetrack

iSCSI
- Physical Transport: Ethernet
- Layer 3: IP
- Layer 4: TCP
- Data Type: iSCSI
- Pros: Uses standard IP network. Shares existing resources. Simple. Uses only Ethernet adapters in clients and servers. You already own the infrastructure.
- Cons: Lossy, unpredictable. Loose coupling relative to FC. Lower performance.
- Analogous to driving a car on a... City Streets

FCoE
- Physical Transport: Ethernet
- Layer 3: Not Routable
- Layer 4: No L4 Encapsulation
- Data Type: Fibre Channel
- Pros: Lossless and predictable latency. Shares Ethernet resources, fewer network adapters per server or host. Improved switch utilization.
- Cons: New. Requires new Ethernet switches and adapters.
- Analogous to driving a car on a... Freeway

[Packet diagrams: Ethernet used for three purposes – 1) Client communication (Ethernet/IP frame with VLAN tag, user priority/QoS, VLAN ID, data, CRC); 2) Proposed storage connection with predictable latency and lossless properties (FCoE for a LUN on the local subnet, not routable: Ethernet header, FCoE header, FC header, FC payload, CRC, EOF, FCS); 3) Routable, lossy iSCSI connection to less demanding storage applications (iSCSI for a remote LUN over WAN IP routing: Ethernet header, IP header, TCP header, iSCSI header, iSCSI data, Ethernet trailer).]

So what does the advent of FCoE mean to the storage industry? As you can see, it's all about consolidation. Server virtualization enables you to use more of your server resource, Application-Aware storage means that you can use 80% of your storage resource, and FCoE means that you can use more of your Ethernet infrastructure resource and eliminate your fibre channel infrastructure. Consolidation eliminates the costs of a second infrastructure, such as power and cooling, management costs, maintenance, training, spares, etc. Like all new technologies, it should make our lives a little simpler.

Timing… let's acknowledge that these things always take longer than the proponents claim they will. While the world will not be FCoE by the end of 2009, or 2010 for that matter, you will begin to see people buying FCoE-ready equipment rather than FCoE-incompatible gear. Count on Pillar to accept and support FCoE in your infrastructure. Truth is, we don't care either way; we make what Customers want to buy. On the other hand, we do have opinions, and our opinion is that FCoE should be a good thing. In an upcoming post I will put forward a simple economic analysis of FCoE versus currently available alternatives. After all, until the economics of FCoE are a clear incentive, it will be a false promise.

Related Links: In my opinion, one of the best overview/comparison articles out there was written by Chris Mellor.
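To make the packet diagrams above a little more concrete, here is a toy sketch that just lists the layering for the three cases shown. The field names mirror the diagrams; no byte-level accuracy is claimed, and this is not a parser for any of these protocols.

```python
# Toy view of the three encapsulations from the packet diagrams above.
ENCAPSULATIONS = {
    "Ethernet/IP for client communication": [
        "Ethernet header (with VLAN tag: TPID, user priority/QoS, VLAN ID)",
        "client data",
        "CRC",
    ],
    "iSCSI for a remote LUN (routable over WAN IP)": [
        "Ethernet header",
        "IP header",
        "TCP header",
        "iSCSI header",
        "iSCSI data",
        "Ethernet trailer",
    ],
    "FCoE for a LUN on the local subnet (not routable)": [
        "Ethernet header",
        "FCoE header",
        "FC header",
        "FC payload",
        "CRC / EOF / FCS",
    ],
}

for name, layers in ENCAPSULATIONS.items():
    print(name)
    for layer in layers:
        print(f"  {layer}")
    print()
```

The point of the comparison: iSCSI pays for routability by riding IP and TCP on top of Ethernet (with the lossy, variable-latency behavior that comes with it), while FCoE drops the Fibre Channel frame directly into an Ethernet frame and keeps the lossless, predictable-latency behavior – at the price of staying on the local subnet.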
BLOG POST NO. 46 – JANUARY 13, 2009
SPC-1 Top Performance
While I am excited that Pillar has topped the Storage Performance Council's performance charts for $/IOP for business-class storage, I am sure it is temporary. All of the systems benchmarked use "currently available" drive technology, but both drive technology and controller technology are constantly evolving. The point of the benchmark, in my opinion, is that a vendor is competitive enough to be in the game. Clearly we are in the game, and are, in fact, leading the business-class pack for now.

This brings me to another point – what about EMC? Well, the truth is, EMC doesn't play this game. Chuck Hollis pointed out in his blog that EMC finds the benchmarks to be too far off the real-world workload to consider them useful, and hence refuses to play. I think Chuck has a point, and I have blogged about it before. The problem with this is, you can't have it both ways; you can't just play when it suits you, and pass when it doesn't. You either play or you don't. EMC decided to set up their own utilization efficiency benchmark, and then roasted NetApps over the coals, along with HP and IBM, but they won't play in other benchmarks like SPC-1.

Well, if they don't play, how did their results get posted? It's old news, but NetApps bought an EMC Clariion, ran the benchmarks, and posted it for them! Damn that's funny. What balls do you have to have to do that? Anyway, I suspect that's why the Clariion showed so poorly in the configuration that we beat them in by over 2.5X... not because their product is really that bad. So please don't use these benchmark results and draw the conclusion that our product is 250% better than EMC Clariion; it isn't. It is our belief that our product is typically 20–40% better than a mid-range Clariion under real workloads and a reasonable price assumption, like the 37% off list price HP used in their posting.

Wait a minute – can you really get results this inferior in a benchmark based on the setup assumptions, tweaking and "configuration parameters" that figure into these things? Really? YES!! Truth be told, people work hard to optimize their configs and setups to get competitive results; they don't just happen. So "setting one up for EMC" is a scam – nobody has the knowledge and motivation to set up a competitor's system to perform its best. Doing so is almost as dishonest as comparing your RAID DP (the NTAP marketing term for RAID 6) system against another's RAID 10 configuration for storage efficiency. Yikes, who would do that?

While we are excited about our first results, I also have to say that the Axiom does far better in real workloads than it does on SPC-1 benchmark workloads (we mentioned this in our press release on the subject). I believe that the Clariion will do better too, but not as well as the Axiom; perhaps the Clariion won't even beat the NetApps machine. It turns out that if you were to construct a workload for NetApps WAFL, you couldn't construct a much better workload than the SPC-1 benchmark... which might give us all a tad of insight as to why EMC refuses to participate, really.

There is another point worth making here, for those who like to see BIG numbers (bragging rights). 3PAR lost to Pillar in this posting, but only on $/IOP. 3PAR actually posted the biggest IOP count of all the players. Well, remember when I said in a previous blog that these benchmarks were all fundamentally limited by spindle count? 3PAR posted a result that took over 1200 spindles and cost over 2M smackers (that's US smackers, not Euro-smackers). If we chose to build a $2M configuration of Axiom, we would post a larger number than 3PAR. So we say. Truth is, I am not sure you are ever going to see us do this, as I think it is kinda silly. It would temporarily get us bragging rights for sure (we could post about 260,000 SPC-1 IOPS for that price), but who really gives a damn?
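Since $/IOP is the metric at issue, here is the trivial arithmetic. The prices and IOPS below are made-up round numbers to show the shape of the comparison – they are not actual SPC-1 submissions, which you can look up on the SPC site.

```python
# Price-performance is just total tested price divided by SPC-1 IOPS.
# All numbers here are placeholders, not published results.

def dollars_per_iop(total_price_usd: float, spc1_iops: float) -> float:
    return total_price_usd / spc1_iops

# A huge-spindle-count config can post the biggest raw IOPS number...
big = dollars_per_iop(total_price_usd=2_000_000, spc1_iops=220_000)
# ...and still lose on $/IOP to a smaller, cheaper config.
small = dollars_per_iop(total_price_usd=500_000, spc1_iops=65_000)

print(f"big config:   ${big:.2f}/IOP")    # ~$9.09/IOP
print(f"small config: ${small:.2f}/IOP")  # ~$7.69/IOP
```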
BLOG POST NO. 47 – JANUARY 26, 2009
InfiniBandwagon
A great question was asked by "Joe Smith" (if that's his real name, and let's just say, yah, and the DJIA is back at 11,000 too) as a response to the FCoE blog:

"Mike, I'd like to know your opinion of Infiniband. It's clearly superior technology when compared to Ethernet and less expensive to deploy when compared to 10 GigE. Why do you feel it hasn't taken off?"

I think everyone has thought about it, even my Uncle Al, who is still waiting for his bailout. Uncle Al says it's because it isn't popular enough. Thanks Al, keen insight.

I agree that oftentimes what is superior doesn't win in the marketplace, and I agree that Infiniband is a great technology. When we first started designing our Axiom storage array, it employed Infiniband (IB) on the back-end to connect the Slammers to our RAID controllers (which are in our Storage Enclosures). I was sad to see Intel and Microsoft pull their support for a time, which effectively caused us to redesign using FC. Of course now support has grown significantly, but the specter of 10Gbit Ethernet is causing the adoption of IB to slow for the same reason the HD DVD versus Blu-ray battle caused the high density follow-on to DVDs to stall; few people want to make much of a bet – lest they end up on the wrong side of the outcome.

In low volume applications which are less cost-sensitive, the technology advantage mattered more than the cost, hence IB was adopted by folks that built HPC clusters and were linking Servers or Control Units together, just as Pillar was going to do. The problem now is that you are betting on the relative power of Ethernet and its massive volume versus a technology that has had relatively minor volume. Conventional wisdom holds that the "big boys" will win with volume over the technologists who might prefer IB.

So to be honest, IB is a niche of the network market. To break out of that niche, some powerful economic barriers will need to be broken. With 10Gbit Ethernet coming on line it is hard for me to believe the world will shift large volumes into IB technology, hence I fear that a niche market may be all it gets to serve. And that niche might even shrink.
BLOG POST NO. 48 – JANUARY 28, 2009
Cheating at Poker?
Paul Travis over at Byte and Switch published an interesting article on SPC-1 benchmarks. A comment that likened the benchmark process to a Poker game where everyone is cheating piqued his interest. Rather than cheat, I think that vendors play their absolute best hand within the rules (mostly). Of course, some think that participants cheat for this, that, or some other reason. Others feel the rules favor the house (established storage vendors). But the implication that everyone cheats at benchmark wars is not fair. I don't think everyone cheats – I know Pillar doesn't.

It's amazing the vitriol you see in the responses to these articles and blogs. NetApps posers, EMC posers – I'm sure even Pillar posers – all jump in. While some are balanced and circumspect, others need to report to anger management. Most people clearly don't understand or want to understand the benchmarks. They focus on aspects of the benchmark that favor their company and ignore those that don't. Reminds me of the George Carlin routine where he says "Some people don't want to talk about certain things. Some people don't want you to say this. Some people don't want you to say that. Some people think if you mention some things they might happen,... and some people are just really f^%$* stupid."
OK, perhaps they aren’t stupid, but for Pete’s sake, do they need to blow a gasket when somebody trumps them on a benchmark they apparently disrespect? Or say they could have done better if they wanted to (this one fractures me, as one commenter to a blog said, “Who’s stopping them?”)? Or mix in NAS results on a SAN benchmark? Chuck Hollis’ position on benchmarks looks really good to me right now... he posted some great comments about our SPC-1 press release (I mean sensible – not good for Pillar particularly), and his position on the circus surrounding these things is pretty much right on. I love how he goads us all to keep it up,
“You're not gonna let Pillar run away with this thing, are you?” What could be better than having your competitors waste time and money? Like I said before, we can play the game and our performance looks good. If we had a benchmark that showed off our prodigious cache or sequential reads, we would really kick the crap out of NetApps. But we don’t and I don’t worry about it. To answer Chuck’s question “Why?” (referring to participating in the benchmarks at all) I will just say this: 1) a material number of our prospects continue to ask us for it; and 2) some of our competitors’ salespeople say that if we don’t post we must be hiding something. We aren’t in the same position as EMC, so a discussion about the ridiculous benchmark circus can sound defensive, or deflective. So we participated... and, sure enough, the circus around it started, and I’m sure will continue. The problem is you get hooked. Once you’ve trounced NetApps, they will post some benchmark that asserts they are the best, and various constituencies will hound the rest of us to respond. Will we be tempted to beat them again? Bet on it.
BLOG POST NO. 49 – FEBRUARY 6, 2009
The Economy and the IT Industry
A while back I had a call with Tiernan Ray at Barron's. Of course he and the folks at Barron's are focused on the state of the world economy – but I guess that doesn't take a rocket scientist to figure out. Besides, who the hell isn't focused on it anyway? Tiernan and I chatted about the effect the fiscal woes are having on the storage industry. I suppose it's obvious to people that if you have 20% market share, and 10% of your business is in the financial sector, things don't look great for the year ahead. There simply isn't enough new business to compensate for the slowdown in repeat business you would normally expect.

At Pillar, we look forward to the year ahead. Our "next generation" storage architecture and the cost savings it represents are more attractive now than ever. In fact, the vigorous interest in our product has surprised us a bit, but it is a welcome surprise. Customers who were comfortable with their legacy storage suppliers are not so any longer. One of our execs was at the Gartner Data Center Conference recently, where 8 out of 10 (actually 84%) storage managers said that they were looking for emerging storage vendors. And they were characterized as "impatient" for change. They want to reduce costs and complexity and know the old way of doing things will not work for them in the long run.

One of our marketing execs uses the term "complacent" instead of comfortable; I don't think this is fair. New things are always a risk because they move away from something that you know how to do, and move toward something you don't. With finite time and resources the tradeoffs don't always line up with "pick something new." But when you have to get more done with a lot less money, the business tradeoffs change. So instead of no longer complacent, I would say people are "no longer as comfortable." New technologies that promise greater performance and overall value per dollar spent start to look much more attractive.

So for us at Pillar, this is good news. I just attended a briefing with a prospect in New York, and we did a comprehensive TCO and storage efficiency analysis for them. The analysis showed that with Pillar Axiom, their costs would be at least ~40% lower than with their current legacy storage provider... and this is after all price reduction negotiations. In other words, it was not based on a list price comparison, which is not worth discussing. Who pays list price anyway? Our TCO analysis with other customers has shown even more potential savings... and who wouldn't want that in this current economic climate? In fact, Beth Schultz from Network World interviewed me a couple of weeks ago on this topic and ways customers can both improve their storage efficiency and lower their TCO. (The podcast, "Know Your Storage Efficiency Quotient," is now posted on the NW Data Center site.)

On the other hand, many businesses that cannot get the loan, lease, or cash to support their infrastructure will stall many IT purchases and wait for better times. Unfortunately, some of these infrastructure-dependent companies will also collapse in the year ahead. We have been through this before... at least many of us have, anyway. Pillar was founded after all of the "newer" storage companies were launched. In fact, Pillar was founded three months before the 9/11 tragedy. I used to tell our developers during 2002–2003 that it was a good time not to have a product to sell. Well, we started shipping our product three years ago, and it is now a damn "good time to have a product to sell" – especially one like Pillar Axiom, which was designed to save customers the cold hard cash it takes to build and/or operate their data centers.

We'll get through this economic mess, especially if we can convince the politicians not to buy bowling balls with the bailout funds and to create an economic and regulatory environment that spurs growth instead. Either that or we could decorate each other's offices with a million dollars worth of furniture and rugs, which would be good too. Seriously, we all need to keep a cool head, take delight in our health, family and friends, and help each other, as these are the most important considerations.
Join the Conversation. blog.pillardata.com/
Mike Workman Chairman & CEO, Pillar Data Systems Mike Workman has spent his career breaking new technical ground in the storage industry. In his 25+ years in the storage business, Mike’s appointments have included Vice President of Worldwide Development for IBM’s storage technology division, Senior Vice President and CTO of Conner Peripherals, and Vice President of OEM Storage Subsystems for IBM. He has a PhD and Master’s from Stanford, a Bachelor’s degree from Berkeley and holds over 15 technology patents.
© 2009 Pillar Data Systems. All Rights Reserved. Pillar Data Systems, Pillar Axiom, AxiomONE and the Pillar logo are trademarks or registered trademarks of Pillar Data Systems. Other company and product names may be trademarks of their respective owners. Illustrations and photographs have been specifically created or licensed for this book and may not be reproduced separately.