One question that comes up very often is when one should use a SAN with MySQL. It is especially popular among people who are used to Oracle or other enterprise database systems, which are quite commonly deployed on SAN.
My question in such cases is always: what exactly are you trying to get by using a SAN?
Depending on the goals, the advice can be very different.
Manageability SAN allows you to manage storage easily compared to directly attached storage. If you have one box which starts to require more IO, you can have more resources allocated to it. This pattern is fairly typical for an enterprise managing a lot of relatively small MySQL installations, or for a managed hosting provider. For large web MySQL installations using sharding or replication, IO needs are typically rather predictable. There are also often nice monitoring tools available to see disk utilization, latencies, queues and so on. There is, however, also a downside compared to directly attached storage in terms of performance management - the SAN is a shared resource (or at least has shared parts), so it is possible for one application to impact another, which means you can’t always analyze the local system’s performance independently from everything else.
Performance This is perhaps the most interesting one. I constantly see SANs sold as magic equipment which will magically solve all performance problems, where “magically” means nobody knows exactly how. I would suggest always asking yourself where you expect these performance gains to come from.

If we’re speaking about a purely disk-based SAN (no Flash), the drives are the same drives you will see in directly attached storage, and each can do only so many IO requests per second. True, the SAN can have many more hard drives than directly attached storage, but usually also at a much higher price per drive.

The next possible advantage is software - can the SAN have some very smart software, instead of the simple RAID you use with directly attached storage, which can magically improve performance? There are cases when it can, but you should certainly ask this question. For example, if you share the same physical drives among applications which have different peak usage patterns, there can be an advantage. The question again is whether it is large enough to justify the price premium.

The third one is caching. A SAN can have a lot of cache, though servers can typically have more. If you can afford a SAN, you should be able to afford 128GB of memory or so on the server too, which makes read caching more efficient because it happens in the server’s own memory, while write buffering can also be done quite efficiently by MySQL and a local RAID controller (with BBU). It is worth saying there is one benefit of read caching on the SAN - after a MySQL or server restart, warmup may be considerably shorter than with local storage.
It is also worth noting that SAN has not only advantages compared to directly attached storage but also downsides - SAN typically has better throughput (because of the larger number of drives) but longer latency because of the extra processing (and extra round trip) involved. This hurts log writes in particular, which are very latency critical.
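Since log writes are where the extra latency shows up, it can help to measure it directly on the candidate volume. Below is a minimal sketch, not a replacement for a proper benchmark: the function name, sample count and payload size are my own choices for illustration, and a small write followed by `fsync()` only roughly approximates what a log flush does.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latency(directory, samples=200, payload_size=512):
    """Return a list of latencies (seconds) for small write + fsync calls."""
    payload = b"x" * payload_size
    latencies = []
    # Create the scratch file on the volume we want to test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush the write to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # Hypothetical target: the system temp directory; point this at the
    # directory that would hold the log files instead.
    results = measure_fsync_latency(tempfile.gettempdir())
    print(f"median: {statistics.median(results) * 1000:.3f} ms")
    print(f"max:    {max(results) * 1000:.3f} ms")
```

Running it once against a local RAID volume with BBU and once against the SAN volume gives a rough idea of the latency difference a commit would see.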
Now what about MySQL/Innodb specifics? First, in MySQL, if you’re looking for durable transactions, log write latency is triply important compared to other database systems. MySQL has to do more than one fsync() for the log because it internally implements XA to synchronize with the binary log, and if you’re looking for maximal data consistency (as SAN users often are) you had better have the binary log flushed on commit too. MySQL also has broken group commit (for which we have a partial fix), meaning concurrent transaction commits need to be serialized.
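To see why this matters, consider the ceiling that serialized commits put on throughput. The sketch below is simple arithmetic under stated assumptions: the function is hypothetical, and both the number of fsync() calls per commit and the latency figures are illustrative inputs, not measurements.

```python
def max_serialized_commits_per_second(fsync_latency_s, fsyncs_per_commit):
    """Upper bound on commit rate when commits cannot share an fsync."""
    if fsync_latency_s <= 0 or fsyncs_per_commit <= 0:
        raise ValueError("latency and fsync count must be positive")
    # Each commit waits for all of its fsyncs, one after another.
    return 1.0 / (fsync_latency_s * fsyncs_per_commit)


if __name__ == "__main__":
    # Assumed, illustrative latencies: a fast local write cache vs. a
    # storage path with an extra network round trip.
    for label, latency in (("local", 0.0001), ("remote", 0.0005)):
        rate = max_serialized_commits_per_second(latency, fsyncs_per_commit=3)
        print(f"{label}: at most {rate:.0f} commits/sec")
```

Because the bound is inversely proportional to latency, a storage path that is a few times slower per fsync() cuts the maximum commit rate by the same factor, no matter how many drives sit behind it.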
Second, MySQL is often not able to submit a lot of outstanding IO requests, which is what you need to utilize a SAN efficiently. This is especially a problem if you’re running a replication slave, as the slave thread will issue a single IO request most of the time, making it sensitive to latency rather than throughput. You should also plan on any query having only one IO in flight at a time if Innodb or MyISAM tables are used. There is read-ahead functionality, but usually it is not able to raise the number of outstanding requests significantly. For write-intensive workloads you will have a problem with the number of outstanding writes too, though for that at least we have a fix.
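The relationship between outstanding requests, latency and throughput follows Little’s law: the IO rate you can achieve is roughly the number of requests in flight divided by the latency of each, capped by what the device can deliver. A minimal sketch, with a hypothetical function name and made-up figures for illustration:

```python
def achievable_iops(outstanding_requests, latency_s, device_max_iops):
    """Estimate IO rate from concurrency and per-request latency."""
    if outstanding_requests <= 0 or latency_s <= 0:
        raise ValueError("concurrency and latency must be positive")
    # Little's law: throughput = requests in flight / time per request,
    # limited by the capacity of the storage itself.
    return min(outstanding_requests / latency_s, device_max_iops)


if __name__ == "__main__":
    # Assumed numbers: 5 ms per request, storage capable of 10000 IOPS.
    for in_flight in (1, 8, 64):
        iops = achievable_iops(in_flight, 0.005, 10000)
        print(f"{in_flight} outstanding -> about {iops:.0f} IOPS")
```

With a single request in flight, as for the slave thread, the estimate depends only on latency; the capacity of the storage behind it never comes into play.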
Scalability “We need more IO or space than we can get from 6-8 built-in drives… so let’s do SAN” is the story here. As I mentioned in the Performance section, you may be surprised that performance does not “scale” as much as you expected. You should also consider external directly attached storage, which is a cheaper alternative in most cases and is especially good if your IO needs are predictable; if you need 3TB of space per box, for example, a plain external disk enclosure does the job well. My advice for scaling IO capacity with SAN (this applies to directly attached storage too) is to really understand what you’re trying to scale and analyze things accordingly. I have seen many cases where it was a poor choice, either because it did not allow scaling at all (e.g. an attempt to scale replication) or because adding memory or SSDs would have been the better option.
High Availability Some people are used to SAN-based active-passive clusters for availability purposes and look to do the same with MySQL. This works, though I do not think it is the best choice. A SAN is just another object which can fail completely. Sure, it is more reliable, in the same way that a server with multiple power supplies, ECC memory, Chipkill and RAID is more reliable than one without these technologies, but in my HA architectures I always treat it as a single “point” which can experience “failure”. Note that even if the hardware is fully redundant, the firmware may have a glitch causing failure and data corruption (and this is not just speculation, some clients really had it).
Even if you do not run any “cluster”, with a SAN you can always “connect” the storage to another node - if the server fails, or if you just want to upgrade to a bigger box - which can be more complicated with directly attached storage.
If you want SAN-like high availability I would seriously consider DRBD, which can do storage-level replication between directly attached volumes. Now, with support for Dolphin Interconnect, it can also offer very decent latency. It also has the benefit of giving you a system which you can actually split in two (e.g. for recovery purposes). Sure, it needs double the storage, but you get 2 copies of the data too, and using SAN you probably pay more than a 2x premium anyway.
Yet another approach is to use MySQL replication with something like MMM or Flipper. I think this actually works best in most cases (unless the asynchronous nature of MySQL replication is a showstopper), and it also solves the other big pain of semi-online MySQL/OS upgrades and schema changes.
Backups SAN has a number of advantages for operations (and I guess these are the people who often push for it). It often offers snapshots with low overhead, which makes taking backups convenient. Also, if you keep the last uncompressed backup (or the last snapshot) on the SAN, you may be able to cut recovery time significantly by switching to that backup - no data copy is required, which can be a significant concern for 1TB+ databases.
My Take: I view SAN as a solution for niche circumstances when it comes to MySQL. It may be that you need it, but it is not a silver bullet and not the best solution for all problems at once. When evaluating SAN you should also evaluate external directly attached storage, SSDs and increasing system memory, as well as sharding and replication as scaling solutions.
Entry posted by peter | No comment
Even though we haven’t yet gotten a schedule online, we’re still accepting proposals for the Percona Performance Conference.
As a teaser, let me just share one accepted proposal with you: Cary Millsap. If you are even vaguely involved with Oracle, you should know who he is. He is one of the world’s foremost authorities on Oracle performance. Cary will be giving a technology-agnostic session on Performance Instrumentation Beyond What You Do Now. Don’t miss this one.
But that’s only the beginning; there’s a lot more to come. There are good reasons why we created our own conference specifically on performance, and if you attend, you’ll see for yourself. In the meantime, you can submit your own proposal. Oh, and sign up as an attendee, too. We haven’t gotten the attendee list online either, but we will soon.
Entry posted by Baron Schwartz | No comment
We had too many requests for deb builds of Percona releases to ignore, so I added scripts to build binaries on our Ubuntu 8.10 box. These are going to be 64bit releases only (32bit is dead, isn't it?). I tested the binaries on a Debian Lenny system and they seem to work fine.
So I prepared binaries for our Build13 (which I did not actually announce, so as not to spam PlanetMySQL; I will figure out another channel for announcements). You can get them at http://www.percona.com/mysql/5.0.77-b13/deb/, and your feedback is welcome!
As for changes, Build13 contains the InnoDB Data Dictionary restriction patch, which I mentioned before (http://www.mysqlperformanceblog.com/2009/02/11/limiting-innodb-data-dictionary/).
Speaking of deb binaries, I tried to keep them compatible with the native deb binaries, but I had to make some changes:
1. There were strange configure parameters
which I was not sure how to interpret, so I removed every mention of the embedded server, as 5.0 is actually not supposed to work in embedded mode and our patches can't be compiled in that case.
2. I removed the NDB cluster engine from the builds. We have not tested the combination of our patches with NDB, so I really do not know how they work together.
Enjoy!
Entry posted by Vadim | No comment
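A while ago I recommended several friends for the “Skills for Social Entrepreneurs” training, co-organized by the Cultural and Education Section of the British Embassy and the YouChange foundation (友成企业家扶贫基金会).
Some of these friends have since been discussing it on their blogs. Oliver has collected some links here:
February 23: Recommending three friends for the “Skills for Social Entrepreneurs” training programme (by Oliver)
March 2: Recreating the appeal of the “village library” (by 小石)
March 5: To the “Skills for Social Entrepreneurs” programme: about the “Xinhuo Growth Plan” (薪火成长计划) (by Amy)
March 6: Facing growing pains without backing down (by Amy)
March 7: Xinhuo Growth Plan (薪火成長計劃) (by Ken)
March 7: The current state of the “Growth Plan” and the questions Amy cares about (by Amy)
March 9: 小石’s application to the YouChange Skills for Social Entrepreneurs training was not accepted (by 小石)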
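Today I also received a letter from a friend saying that they had not been selected for the training programme, and calling it “a small failure”. Oliver thinks there is no need to be too discouraged. Many friends’ projects are in fact very good; compared with the selected projects, they may simply not yet be outstanding enough. It is also worth mentioning that friends at the YouChange foundation told Oliver earlier that the training is only their first step in promoting the spirit of social entrepreneurship, and that other programmes will follow. So those who were not selected this time should not lose heart; there will be more chances to take part in similar activities.
This friend also complained in the email that the organizers’ notice did not include “the specific reasons for not being selected”. The friend wrote:
Why can’t the experts’ assessments of each submitted project be compiled and fed back to the applicants? Apart from on-site training, isn’t this also a good way to help applicants recognize the limits of their projects, improve the project design and move implementation forward? It may add to the workload of the people involved, but I personally feel it would be extremely helpful to the majority of social enterprise projects that were not selected.
Oliver thinks we should be understanding toward them here. They are presumably swamped right now preparing a seven-day residential training for 50 people. That involves many details, such as accommodation, fund management, preparing teaching materials, contacting and receiving participants, setting up the venue and so on. Friends who have organized a training or a conference will surely sympathize. Once the training period is over, they may well get in touch with everyone who applied, one by one, and continue the conversation.
This friend’s “complaint” also shows, from another angle, that social venture projects today badly need outside advice and professional coaching. So Oliver’s suggestion is to share your social venture plan. If your plan is good enough, people will naturally applaud you, and others will bring constructive criticism.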
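To protect applicants, the organizers have promised not to sell or disclose applicants’ personal information to third parties for commercial or other reasons.
Social entrepreneurs themselves, however, need the courage to share their venture ideas fully and publicly.
Set up a standalone page on your blog that can be commented on and linked to, and publish your application form there. For the format, you can refer to the open letter Amy wrote, or the open social enterprise plan 小石 wrote.
From the links above we can see that after Amy shared her plan, Ken offered very good suggestions, which in turn helped Amy organize her thinking.
Compared with a commercial business plan, a social venture plan focuses more on solving a social problem than on investors’ returns, so it can resonate with and draw support from a much wider audience. Learning to make good use of online tools to share your social venture plan is the first step a social entrepreneur needs to take from good to great.
If you cannot learn to share, you cannot learn how to build a community. If you cannot build a community, your social venture plan will end up as a one-person show.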