http://blog.arabx.com.au/?p=1053
The 20-second summary of the Scaling MySQL - Up or Out? session from our panel of experts at the 2008 MySQL Conference and Expo.
* Paul Tuckfield from YouTube -- The answer to everything is replication, you just have to rephrase the question.
* Jeff Rothschild from Facebook -- Memory, the source of all problems is your developers.
* Domas from Wikipedia -- You should be afraid that a 10-minute structural change may answer detailed problems.
* Farhan Mashraqi from Fotolog -- Architect properly; the most optimized schema may not be enough. What is the cost of serving the data, not just the time to run the SQL?
* John Allspaw from Flickr -- There is nothing more permanent than a temporary solution.
* Matt Ingenthron from Sun -- Performance tuning/scaling is key.
* Monty Taylor from MySQL -- You have to know what's happening on every piece of your technology stack.
http://www.paragon-cs.com/wordpress/?p=144
Scaling MySQL - Up or Out? Panel @ UC
April 16th, 2008 | Category: MySQL
I would recommend that you download the video of this!! Sheeri posted it here.
The numbers in parentheses are Alexa rankings.
Moderator - Kaj Arno
(1317) Monty Taylor - MySQL
(905) Matt Ingenthron - Sun
(39) John Allspaw - Flickr
(13) Frank Mash - Fotolog
(9) Domas Mituzas - Wikipedia
(6) Jeff Rothschild - Facebook
(2) Paul Tuckfield - YouTube
Question One: Number of MySQL servers
MySQL one master/three slaves
Sun four servers
Flickr 166
Fotolog 37
Wikipedia
Facebook 1,800 (900m/900s)
YouTube
Question Two: Number of MySQL DBAs
MySQL 1/10th
Sun 1.5
Flickr 0 (normally 1)
Fotolog 1
Wikipedia Technical Team
Facebook 2
YouTube 3
Question Three: Number of Web Servers
MySQL 2
Sun 160
Flickr 244
Fotolog 70
Wikipedia
Facebook 10,000
YouTube
Question Four: Number of Memcached servers
MySQL 2
Sun 8
Flickr 14
Fotolog 40
Wikipedia 79
Facebook 805
YouTube
Question Five: Version of MySQL
MySQL 5.1.23-rc
Sun 5.0.21
Flickr 5.0.51
Fotolog 4.11
Wikipedia 4.4
Facebook 5.0.44
YouTube 5.0.24
Question Six: Operating System on Server
MySQL Fedora
Sun OpenSolaris
Flickr Linux
Fotolog Solaris 10
Wikipedia Fedora/Ubuntu
Facebook Fedora/RHEL
YouTube SuSE 9
Question Seven: What happens if a server fails?
Flickr - Federated setup for failover. Can lose any one side of the shard.
Wikipedia - if a master fails they replace with slave
Facebook - archive binlogs, promote slave
Fotolog - mount snapshots?
YouTube - SAN; shards with a master and multiple slaves, so they promote slaves
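Several panelists mention promoting a slave when a master fails. As a rough illustration only, here is a minimal sketch of one way to choose which slave to promote: pick the healthy one that has applied the most of the master's binlog. The `Replica` class, the host names, and the binlog coordinates are hypothetical; none of the panelists described their actual procedure at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    binlog_file: str   # master binlog file this replica has executed up to
    binlog_pos: int    # position within that file
    healthy: bool = True


def pick_promotion_candidate(replicas):
    """Return the healthy replica that has applied the most of the master's binlog."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy replica available to promote")
    # Binlog file names end in a numeric suffix (e.g. mysql-bin.000042),
    # so compare (suffix, position) pairs.
    return max(
        healthy,
        key=lambda r: (int(r.binlog_file.rsplit(".", 1)[1]), r.binlog_pos),
    )


# Hypothetical replicas of a failed master.
replicas = [
    Replica("db2", "mysql-bin.000041", 9120),
    Replica("db3", "mysql-bin.000042", 310),
    Replica("db4", "mysql-bin.000042", 885, healthy=False),
]
print(pick_promotion_candidate(replicas).name)  # db3
```

The unhealthy replica is skipped even though it is furthest ahead, which is the trade-off this sketch assumes: availability of the candidate matters more than a few extra events.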
Question Eight: What is Their Crucial Scaling Technology
Facebook doesn't use SAN - they do use RAID 10 with 2.5″ drives
Fotolog -- UltraSPARC T1 (excellent master), UltraSPARC T2 (excellent slave) -- uses SAN
This was interesting to me. Frank (Fotolog) said they use a SAN to keep things manageable (only two DBAs, with the second one just hired). Facebook says they don't use SAN because they didn't want to limit themselves.
Next the discussion turned to power. Opinions varied quite a bit, with YouTube pretty much dismissing power concerns. Of course, Frank from Fotolog then pointed out that when they (Fotolog) want to expand in a datacenter, the datacenter has to get Google's approval... hmmm, no wonder Google isn't worried about it. Fotolog and Facebook were very much in favor of power savings. I think there is more to it than just saving a little power: you also get cooling and space (if smaller, of course) savings.
http://venublog.com/2008/04/16/notes-from-scaling-mysql-up-or-out/
Here are the quick notes from the session Scaling MySQL - Up or Out?, moderated by Kaj Arno as part of today's keynote.
Here is the list of panelists, ordered by Alexa ranking.
- Monty Taylor (MySQL)
- Matt Ingenthron (Sun)
- John Allspaw (Flickr)
- Farhan Mashraqi (Fotolog)
- Domas Mituzas (Wikipedia)
- Jeff Rothschild (Facebook)
- Paul Tuckfield (YouTube)
Here is the list of questions and answers from panelists:
| | How many servers | Number of DBAs | How many web servers | Number of caching servers | Version of MySQL | Language, platform | Operating System |
|---|---|---|---|---|---|---|---|
| MySQL | 1 M, 3 S | 1/10 | 2 | 2 | 5.1.23 | Perl, PHP and bash | Linux Fedora |
| Sun | 2 clustered, 2 individual | 1.5 | 160+ | 8 | 5.0.21 | Lots of stuff (Java mostly) | OpenSolaris |
| Flickr | 166 | At present 0 | 244 | 14 | 5.0.51 | PHP and some Java | Linux |
| Fotolog | 140 databases on 37 instances | 10 instances a DBA | 70 | 40 (2 on each, 80 total) | 4.11 and 4.4 | PHP, 90% Java | Solaris 10 |
| Wikipedia | 20 | None, but everybody is kind of a DBA | 70+200 | 40 (2 on each, 80 total) | | PHP, C++, Python | Fedora / Ubuntu |
| Facebook | 30000 (1800 db servers) | 2 | 1200 | 805 | 5.0.44 with relay log corruption patch | PHP, Python, C++ and Erlang | Fedora / RHEL |
| YouTube | I can’t say | 3 | I can’t say | I can’t say | 5.0.24 | Python | SuSE 9 |
Few more misc questions ...
Number of times re-architected ?
- MySQL: 2 - 1 time slave, 1 time memcached
- Sun: depends on the site (many times over the years)
- Flickr: Ummm...2.5 (various clusters federated)
- Fotolog: many cached replacements (about to do one change now)
- Wikipedia: Never (Spaghetti)
- Facebook: Every Tuesday, continual
- Youtube: Pretty continual, 2-3 times (replication, sharding, federation)
What happens if a server fails? What actions will you generally take?
- Flickr: All of our 7 are federated as pairs of servers; we can lose any one side of a shard, we can lose boxes. Traffic normally goes to either side of the shard; after a failure it goes to one side, and we get another one (very transparent to the user). For offline, sites are affected
- Wikipedia: Users shout at them on IRC, then they get it fixed in seconds
- Facebook: one of 1800-1900 will always fail, so just operate well; minor impact, with data going away for a while... we restore from binlog and start the server quickly, or promote a slave to master, among a number of ways
- Fotolog: we simply mount the snapshots to different servers and get
- YouTube: SAN etc., very important data... recover the server, mirrored disk... a mirrored hard drive is crucial
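The Flickr answer above describes shards held by pairs of servers, where traffic can go to either side and falls back to the surviving side after a failure. A minimal sketch of that idea, assuming a simple modulo mapping from user ID to shard (the `ShardPair` class, `shard_for` function, and host names are hypothetical, not Flickr's actual code):

```python
import random


class ShardPair:
    """Two servers holding the same shard; either side can serve traffic."""

    def __init__(self, side_a, side_b):
        self.sides = {side_a: True, side_b: True}  # host -> is it up?

    def mark_down(self, host):
        self.sides[host] = False

    def mark_up(self, host):
        self.sides[host] = True

    def pick(self, rng=random):
        """Choose any live side; with one side down, all traffic goes to the other."""
        live = [host for host, up in self.sides.items() if up]
        if not live:
            raise RuntimeError("both sides of the shard are down")
        return rng.choice(live)


def shard_for(user_id, shards):
    # Assumed mapping: user ID modulo the number of shard pairs.
    return shards[user_id % len(shards)]


shards = [ShardPair("s0a", "s0b"), ShardPair("s1a", "s1b")]
pair = shard_for(7, shards)   # 7 % 2 == 1, so the second pair
pair.mark_down("s1a")         # simulate losing one side of the shard
print(pair.pick())            # s1b
```

Because each pair tracks only its own hosts, a failure in one shard does not change routing for any other shard.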
Any recommendation of scaling technology that you wanted to bring
- Fotolog: UltraSPARC-T1 (excellent master, multi threaded) and UltraSPARC-T2 for slave (single threaded)
- Wikipedia: good network switch
- Facebook: cheap switch; we don't use SAN; neatly partitioned, so they scale independently and fail independently
- MySQL: cluster very sad
Server virtualization ?
- nobody uses at this time
- ETL cluster, we may run more than one in the future (facebook)
Anything to worry about at present?
- Facebook: app design is the key to using resources; data center power supply and consumption
- Fotolog: Google has to approve it for our power (cut app servers by 1/2 by moving from PHP to Java)
- YouTube: not at all
Any recommendations or lessons for DBAs?
- better you know what the systems are, then you can
- performance and scaling: take them seriously
- nothing more permanent than temp solutions (if you don't know when you will fail, then you will)
- architect properly at the start: schema, cost of serving data
- 10 min biggest architectural change
- memory, resources