It's Not Rocket Science, But It's Our Work
To most people, Twitter's system architecture shortcomings are nowhere near as interesting as rocket science or exploring Mars, and we're certainly not as exciting. However, folks who use Twitter get frustrated when the service is slow or down, which is why we have been trying lately to be more communicative about engineering and operations details.
The TechCrunch blog is particularly interested in behind-the-scenes details about Twitter engineering and systems because the folks who work there use Twitter and also have a very large audience of technology fans. Earlier this evening, TechCrunch posted some specific technology questions for us on their blog so we thought it appropriate to answer them here on our blog.
Before we share our answers, it's important to note one very big piece of information: We are currently taking a new approach to the way Twitter functions technically with the help of a recently enhanced staff of amazing systems engineers formerly of Google, IBM, and other high-profile technology companies added to our core team. Our answers below refer to how Twitter has worked historically—we know it is not correct and we're changing that.
Q: Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
A: We currently use one database for writes with multiple slaves for read queries. As many know, replication of MySQL is no easy task, so we've brought in MySQL experts to help us with that immediately. We've also ordered new machines and failover infrastructure to handle emergencies.
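As a rough illustration of the one-writer, many-readers arrangement described in this answer, here is a minimal Python sketch of read/write routing. The `ReplicatedRouter` class, the server names, and the round-robin policy are all assumptions made for the example; nothing here is taken from Twitter's actual code.

```python
import itertools


class ReplicatedRouter:
    """Send writes to a single primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over replicas; with no replicas, everything goes to the primary.
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def target_for(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._cycle is not None:
            return next(self._cycle)
        return self.primary


# Hypothetical server names, used only for illustration.
router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2"])
routes = [
    router.target_for("SELECT * FROM updates WHERE user_id = 1"),
    router.target_for("INSERT INTO updates (user_id, body) VALUES (1, 'hi')"),
    router.target_for("select * from updates"),
]
print(routes)
```

Note that a router like this does nothing about failover: if the single write server goes down, writes stop until something promotes a replacement, which is the gap the question points at.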
Q: Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
A: We've mitigated much of this issue by using memcached, as many sites do, to minimize our reliance on a database. Our new architecture will move our reliance to a simple, elegant filesystem-based approach, rather than a collection of databases. Until then, we are adding replication to handle the current growth and stresses, but we don't plan on ever relying on a massive number of databases in the future.
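The way a cache such as memcached reduces database load can be sketched with the common cache-aside pattern. This is a minimal Python example under stated assumptions: a plain dictionary stands in for a memcached client, and `load_timeline` is a hypothetical placeholder for a slow database query. It is not a description of how Twitter's caching is actually built.

```python
class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}  # stand-in for a memcached client (assumption)
        self._load_from_db = load_from_db
        self.db_hits = 0  # counts how often the database is actually queried

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: query the database once and remember the result.
        self.db_hits += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after a write changes it."""
        self._cache.pop(key, None)


def load_timeline(user_id):
    # Hypothetical slow query, invented for this example.
    return [f"update from a friend of user {user_id}"]


timelines = CacheAside(load_timeline)
first = timelines.get(42)   # miss: goes to the database
second = timelines.get(42)  # hit: served from the cache
timelines.invalidate(42)
third = timelines.get(42)   # miss again after invalidation
print(timelines.db_hits)
```

Three reads cost only two database queries here; on a read-heavy site the ratio is far more favourable, which is why a small number of database machines can go further than the raw count suggests.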
Q: Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fails?
A: There's a lot of necessary handholding and tweaking of our current system. Nevertheless, we're growing our operations team to meet ongoing challenges.
Q: Is that why most of your major outages can be traced to periods of time when [a system administrator] was there to sit and monitor the system?
A: There are a number of reasons for our past outages: everything from faulty process, environment, and configuration to just plain load. Our system must be designed for peaks; currently we're tightly coupled, which means that massive traffic on one part affects all. We're addressing this by breaking the stack into small, lightweight pieces which are designed for failure.
Q: Given the record-beating outages Twitter saw [recently], is anyone there capable of keeping Twitter live?
A: Of course, this is our work. Our growing team is collectively rolling up our sleeves to build a utility-class system. We're all focused on designing something that persists and becomes the background.
Q: How long will it be until you are able to undo the damage [you] caused to Twitter and the community?
A: We're working extremely hard to keep the service stable and performing, as well as architecting a system that stands the test of time. We'd love to be able to tell you exactly how long this will take, but it's no easy task. It will take time, time well spent.
The folks at TechCrunch singled out a former employee of Twitter by name in their questions but Twitter is a team—we share responsibility for our victories as well as our mistakes. At the scale we're working, the tiniest detail matters. If the Mars Lander is off by a fraction, it burns up. A minor localized change on Twitter can have a systemic impact—good or bad. We're working on a better architecture. In the meantime, we're looking for ways we can optimize and extend our current architecture's runway. Thank you for being patient while we do our work and thanks for using Twitter.
—Jack Dorsey and Biz Stone

93 Comments:
Thanks for the quick answers. I really love a company that is willing to answer most of its users' questions pretty fast. Nice work, guys.
Thanks for the response. Certainly an improvement, communication-wise.
great post and well done for not sinking to the same level.
some good answers there, in particular I've discussed with friends the advantages of a filesystem approach (not everyone agreed with me).
cheers again :)
Jack,
No amount of downtime will make me personally stop using Twitter anytime soon.
I look forward to seeing you refactor the infrastructure and have every bit of confidence that you will succeed.
Alex
There must be a mole feeding TechCrunch this information. Seriously though, thanks for the post. Other hopeful web players should take this as an example: always build the initial architecture as if you are expecting millions of users, no matter how far-fetched it may seem.
Oh Snap. Take that Michael
Guys, thank you for being so responsive in the midst of these issues. The prompt and open communication is appreciated.
Hi Twitter Team,
I have a question:
Did you ever imagine Twitter would be such a great hit while designing the application?
(Scalability issue)
Did you consider factors like scalability, reliability, maintainability, etc.?
with Regards,
Ajay
Thanks for the information. I know you are trying real hard to keep the system up. This service is so valuable that it is worth the wait. Thank you for providing such a service.
Great post. There is something going on here that is being missed and could be truly groundbreaking. Twitter's problems are in a way the community's problems, so there have been a lot of very vocal exchanges these past few days. But it is this exchange that can make it better and drive innovation. I wish that everyone would follow this model of transparency, however difficult swallowing criticism might be. In the end we all want a good product, so please keep this exchange going!!
Just wanted to point out, for other readers of this post, the update from the TC crew:
"Twitter continues to be annoyingly and constructively responsive to criticism."
Classy work @jack and @biz. (And credit to Michael for noticing it too.)
Thanks for your very professional response in spite of the unseemly public debate. It is your product to make a success. I know you'll get it right.
hey, thanks for this response - much appreciated. you guys are doing a great job. LOVE twitter, even through growing pains.
i'm a great fan of twitter. while i don't understand the technical side, i hope you as a team will solve what needs to be solved.
best,
kenjimori
Guys - these smaller intervals of downtime won't make me leave Twitter. When we use cell phones and go through a tunnel, we lose signal, and even with the device in hand we cannot use it for what it's meant to do. It's the same kind of technology drawback: it's there, but it could be improved. I'm sure you guys are on top of it. Keep Twittering.
*** APPLAUSE ***
Nice job on your responses to Mike's rant.
I am sure preparing your response was not easy, but you've done a great job of addressing each item.
I am very pleased that you are strategically selecting the best and brightest IT core team to re-architect the new twitter ver 2.0!
I believe you guys have a terrific product and if these key infrastructure, database and application changes are handled correctly, you will continue to grow and improve twitter to new levels of success!
Keep your head up and continue to engage smart people who will help twitter re-surface from the ashes with a new product.
Thank you for communicating and improving twitter!!
@smbeebe
susan beebe
Personally, I wouldn't have even honored that link-bait from Assington with a response, but the response you posted was just brilliant. You made yourselves look great whilst making him look like even more of an ass simultaneously. Cheers and good luck in your continued work!
Very well and up to the point response. I like the way Twitter team is communicating with the community.
I hope Michael will stop writing rude comments about Twitter and Blaine.
After all mistakes happen....
Twitter Fan......
I feel for Twitter. When you build a hatchback for your friends to go to the beach in, and suddenly a bus-load of their friends comes too (and one or two of them are really really fat) and you hadn't put in any contingency plan.... well I can imagine the screams for a systems architect were loud and painful.
Hang in there, I have played many a MMORPG and used many a service that has gone through the same teething pains that came with being popular.
I'm a fan, outages and all. Thanks for the good work - and a measured professional response to the little boy poking the den of hornets to get a reaction.
I love twitter and it has become a staple in my networking and recent popularity. I want to say it is much appreciated that you talk with the community and make people aware of your feats to improve twitter.
Great post! I have no doubts about Twitter becoming even better
*applause*
It's not necessarily the answers we'd all love to hear, but it's answers - and you guys are getting this 'public facing' thing down better and better.
Hang in there.
Excellent post, this kind of transparency and insight will do you well in the future.
I strongly suggest either moving Twitter to a cloud-type system like 3Tera AppLogic, or, even better, getting out of the systems management game entirely and running it on a cloud service like EC2 or Mosso or somesuch. It's pretty obvious that systems design and operations aren't your forte - nor should they be. You should be concentrating on improving your application functionality, not dinking around with basic sysadmin tasks.
Kudos for your efforts. I love this: " we don't plan on ever relying on a massive number of databases in the future.". Bravo and keep the good work.
if one could buy Twitter stock, now would be a good time. Looking forward to the upgrades.
The service is free and people complain when it's down an hour here and there.
Never mind, we love Twitter; I personally don't mind if there's some downtime.
Keep up the good work.
Round of applause
Well done guys
Excellent response, best of luck with the work. Your fans aren't going anywhere no matter how frustrated we appear at times.
Nicely done, I would have just called out Michael Arrington for the fuckwit that he is.
Biz, can you comment on whether IM/XMPP and the 'Track' function is being baked in to your new messaging-based architecture? It's the combination of micropublishing, following and the ability to track keywords into a live stream that is the real future potential of Twitter.
And by the way, thanks for starting to engage in conversation with the community. It's relieving a lot of pressure around the service issues.
/cgerrish
After reading this post, I have the impression that Mike Arrington was correct. Right?
Thanks for sharing this. As for complaints about damage to the Twitter community, the phrase "drama queen" comes to mind.
Twitter rocks.
Sure it has its off days - and they're annoying at the time - but to expect a utility-class, highly volatile application that hit a tipping point and exploded to be scalable and stable without some growing pains shows what cloud-cuckoo land some "experts" live in.
The biggest challenge I ever faced was architecting a very dynamic site that had to handle 6.5m PIs a day. That's probably a rounding error for your stats - and you have API users to contend with as well. I can begin to imagine the pain.
I can't imagine a single utility that takes criticism and, rather than plodding on doing its thing because it is a utility, opens up and takes the high ground with transparency and community engagement. Once again the team proves they are paradigm shifters.
In a year's time we'll all look back on this storm in a teacup and wonder what the fuss was all about. By then the "fail whale" will be framed and hanging in a museum somewhere :)
Excellent response to TC rant, and much appreciated. The only reason we complain so much about Twitter outages is that we LOVE the app and can find nothing else that comes close to doing what Twitter does. Because many users don't know about the Twitter blog, it would be helpful to link to new posts like this one on Twitter itself. THANK YOU.
Your increased openness about your technical problems is much appreciated.
I think it's great you guys are responding to questions about the service, but really, you don't need to lower yourselves to responding to TechCrunch's whinier questions - "undo the damage caused to the community?" I mean, christ, are they 12 years old?
There was no "damage" to the community. I bet you have increased, not decreased your userbase overall.
Again, great answers to some very important, and also very childish complaints (complaints and questions worded like from a petulant child, for a service that is provided FREE)
radical transparency, well done
I can't believe it
Why the hell do you spend time answering those questions?
You're not a public service & your service is free!
I use Twitter all the time; when it's down, I don't like it. But you don't owe me any explanation, nor do you owe one to TechCrunch!
Of course you're working on improvements, your survival depends on it.
Do your work & don't feel that TechCrunch has the right to require answers to their questions.
Replies like these really deserve an applause. Keep going Team Twitter!
Well, for the past 30 years or so, my work has been communications, and I can tell you for a fact that good information drives away bad, and it's difficult to leave a service where you know you (the 1.6m users) have created such a "happy problem." Keep working; I'm happy as long as I hear from you and I'm not left wondering if you don't care about me.
Put track on your list of things that must be done. I know it's tough, but do it!
Namaste,
still loco
I really can't see any advantage in a filesystem-based approach. I strongly recommend a relational database approach and - just as a suggestion - a clustered index on the datetime field in the table with the updates.
Carry on, you'll make the right decision(s).
Damage to the community? Are they talking seriously?
*in best rob schneider voice*
You can doooo it!
great post--i'm really liking the new transparency and openness.
good luck with the changes! i can't wait!
I'm not using Twitter much, since the IM interface is down and I don't have a cell phone. Using it through the web is pointless, to me: If I'm going to load a browser, navigate to a page, etc etc, I'll just go post a longer piece on a more traditional blog.
But hey, when it comes back, I'll be here waiting for it,
Much more impressed with recent communications. That's all the majority of those who are frustrated were asking for.
Thank you for providing us with information about what the issues you are facing are, rather than holding your cards too close.
<3 Twitter.
Very nicely done.
Thanks Twitter.
I want my money back!
It's just not worth the Twitter subscription fees!
Oh wait... never mind.
Very well said. I really like the way you've accepted your shortcomings. Also good to hear that you are actually looking long term: "we don't plan on ever relying on a massive number of databases in the future." Thanks for the response... it really helps the users have faith.
I'd love to hear more details on how that file based approach works.
I wish you the best of luck!
You are delivering us a unique service.
I quit reading the TechCrunch blog a while ago. They use their criticism to draw people's attention, but they do not create anything new except asking annoying questions.
Keep up the good work.
Lol. Love the transparency, but I still can't even log in!
Great post.
Many of the issues that Twitter currently has were encountered at Bloomberg. The Bloomberg Terminal and infrastructure have faced similar problems over the years, and Bloomberg has 60 operators working 24x7 to keep the infrastructure running. Unlike a free service like Twitter, Bloomberg can afford massive E25Ks (it has 50+), IBM e690s (40+), Hitachi Thunders, EMC storage, etc. And even then, they still incur financial backend processing delays, delayed messaging, incorrect data feeds, etc. BTW, the Bloomberg infrastructure is written mostly in Fortran and C++ using tons of Berkeley DB, some MySQL, some Informix and a few Oracle databases.
Classy leadership to match a classy product - I wish you all the luck and success in the world, thank you.
Dude. They might as well have asked you if you're still beating your wife. Jeez. Those are some ruthless questions.
Guys and gals, hang in there. Hang in there. As a developer that has been crunched under the wheel my fair share of times, my heart goes out to you. Just hang in there.
You have an amazing service that a lot of people have connected with, so take the outrage and scathing questions as a backhanded compliment. You have something people love and are attached to, and emotional connections can sometimes breed irrational reactions.
Just hang in there.
@HellaSound
This was a truly great post. It makes me much more confident that Twitter is here to stay, which makes me willing to deal with the growing pains for a while longer.
I have to say, that database architecture is shocking. It seems like there must not have been much growth forecasting done. However, it seems you have some really interesting problems to solve and I hope you share those solutions as they happen.
Thanks!
Wow. That reply was brilliant. If only other tech companies took a page from your book about communications.
It is so refreshing to see a company cast aside the director-of-corporate-communications style of email/memo that is so typical these days.
Many Kudos.
Arjun
Ironic: I can't get to TechCrunch right now, it seems their server is taking too long to respond.
Replication in MySQL is dead simple.
I'm not sure what sort of problems you're having but I've had 3 MySQL servers in multi-master replication and one as a read only slave since 08/2004 w/ no outages.
These servers act as AAA servers w/ Radiator Radius.
Uptime: 127 days 18 hours 44 min 17 sec
Average queries per second: 13.402
Backups are also a snap, I use the slave server to do my backups (and long running report queries).
You write lock, flush, lvm snapshot, unlock.
The only time the system has been offline is during mysql upgrades.
Individual servers have been down for OS updates but Radius and MySQL have always been available.
Very interesting read.
Darlene McCord
thanks for the hard work, and please don't let the snarky ones get you down.
jeffs at rit
It is unfortunate that by merit of being in the web 2.0 space you actually have to give credence to professional trolls like Arrington that would be ignored in any other business. Faced with that unfortunate reality, you handled it with extreme class. A++ would tweet again.
Wow.. who's the jerk that wrote the questions? How about a little bit of tact next time?
Ironically, the "utility-class system" http://gu.st is down.
I'm not so sure Twitter's downtime is that much different from any other web service. The biggest difference is that Twitter broadcasts to me that it's down — making it obvious that it's down — it's in my face.
My suggestion? Well, improve performance, of course, but stop sending out messages to twitter apps that let me know that the server returned some unknown error, or that I've surpassed 70 requests an hour. Just don't update my feed when that happens. What am I going to miss between now and the next 60 seconds anyway? The world won't end if I only get an update 58 times an hour instead of 60.
You want to improve community perception of Twitter? Stop broadcasting that you're down so often.
I find Twitter an interesting addition to notify followers of changes to my web site. The performance is still a bit too inconsistent for me to depend on its availability. I'm however posting notices on Twitter and looking forward to you getting the performance issues straightened out.
Thanks for the frank and fearless appraisal. It may not be rocket science, but I bet a lot of rocket scientists use it daily :) (I know programmers do :)) )
I believe you are giving up on the database layer too early. Even with the expertise of the ex-Google, IBM employees it will take months to build the FS based solution and to re-architect the whole application to support that. Add to that the pains of running any analysis (which you can run very easily on mysql right now) that you want to run on it.
I am currently solving problems similar to what you are facing and based on the load projections we did - the numbers just don't add up.
My work prior to my current one was crawling a billion pages and running spam detection and ranking over the corpus, where we used the FS approach. Just as I wouldn't use MySQL for the latter, the FS approach is overkill and a painful process for the former.
Just my 2 cents.
Shows respect to the community, and big pride to answer such brutal questions. It's your baby, and we get it... Keep up the good work.
Let's see... free service, massive growth in popularity (I am using it for three of my university courses to keep in touch with students + a personal account), integrates with Facebook, oh, did I mention it is a free service (which I would gladly pay for to be able to have multiple accounts set to a single email account).
Forget the critics, fix the problems, and laugh all the way to the bank.
I thought I posted this yesterday... you should include this link to my satirical video that helps explain the scaling issues:
http://www.youtube.com/watch?v=93dGW_hDuQ0
thanks!
Unbelievable transparency, keep up the good work. I love twitter and what they stand for!
Well done.
Thanks for the 511!!! I needed that :D
I know Twitter has been beaten up a lot lately. Some justified. Some not so.
Let me just say, as someone who's been in IT (eng, ops, arch) for the past 15+ years, I recognize that what you guys are doing isn't easy.
While we all may strive for five 9's, the reality is Murphy tends to limit us to four (or even three!)
Know that what you are doing is appreciated.
Twitter has become an invaluable communication tool for me, and I thank you for providing a valuable service.
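For readers unfamiliar with the "nines" shorthand in the comment above, the downtime each level allows is simple arithmetic. The sketch below is only an illustration; the function name is made up for this example and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines):
    """Downtime per year permitted by an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines = 99.9% up = 0.001 down
    return MINUTES_PER_YEAR * unavailability


budget = {n: round(downtime_minutes_per_year(n), 2) for n in (3, 4, 5)}
for nines, minutes in budget.items():
    print(f"{nines} nines: {minutes} minutes of downtime per year")
```

Five nines leaves a little over five minutes a year, four nines under an hour, and three nines just under nine hours.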
This is an interesting discussion. It is also bold on the part of Twitter to be upfront and forthright about the challenges being faced.
Thought it might help to have a visual reference to put this discussion in context and therefore drew up Twitter Architecture
Comments, suggestions for change welcome.
If further discussion can use this visual it might help as to which area is being discussed.
Disclaimer: I don't work at Twitter and this is a simple and abstract view of how that service may be constructed
"I strongly suggest either moving Twitter to a cloud-type system like 3Tera AppLogic, or, even better, getting out of the systems management game entirely and running it on a cloud service like EC2 or Mosso or somesuch."
Hahah yeah, well you need to research that type of technology a bit more. None of those will provide you with a scalable MySQL architecture any different than they are currently using, with the same faults. You do realize that EC2 is Xen virtualization instances, right? So running MySQL on EC2 they would still have the same master/slave architecture to deal with. Better to let them work on and understand their needs rather than the latest trend of recommending outsourcing your tools to the infamous "cloud" computing sets. They work well for specific tasks but not all tasks.
Why don't you guys switch over from MySQL to Microsoft's or Oracle's database? I see that you guys had another database going down today. MS and Oracle can provide you with better solutions dude!
I appreciate you guys for being so bold in answering questions. Good Job!
Impressive!
I'm so glad that you're being so open to the community, and standing up against those who cash in on readership, rather than focus on accurate content.
It's such a relief to see the new infrastructure plans, and look forward to hearing about it more in the future.
great read. good to see that twitter is open to their community and the communities they have helped others create through twitter.
I don't know if scientists will find water on Mars, but one thing I know: your technology and machines are very crazy. I liked it
Very interesting post. "Designed for peaks" is mentioned - that is one of the biggest challenges: how do you define a peak? And then, is that really enough? You don't want to push your systems close to 100% at peak times - take peak, double it and add a little, split it across two hot sides; then if one side goes down, you can cope reasonably well at least. That's the route I'd be going down, normally running at 50% on each hot side. This, I guess, applies to non-database systems really; databases are a significant challenge - good to hear you're taking on experts in this area.
From my experience, I'd be dropping MySQL soon though - no matter what people say, it isn't a production-capable solution under high load. Sybase would be my second-place solution, but DB2 would be the ideal for ultra high load.
Good to see the communication though, and I don't think a few outages are going to do you that much harm. Keep up the hard work - twitter rocks!
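The sizing rule in the comment above (take peak, double it, add a little, split across two hot sides) can be checked with a few lines of arithmetic. This is a sketch under assumptions: "add a little" is read here as a 10% margin, and the peak figure of 1,000 requests per second is invented for the example.

```python
def two_sided_plan(peak_rps, margin=0.1):
    """Size two hot sides so that either one can carry the full peak alone."""
    total = 2 * peak_rps * (1 + margin)  # "double it and add a little"
    per_side = total / 2                 # split across two hot sides
    return {
        "per_side_capacity": per_side,
        # Both sides healthy: each carries half of the peak.
        "utilization_both_up": (peak_rps / 2) / per_side,
        # One side lost: the survivor carries the whole peak.
        "utilization_one_down": peak_rps / per_side,
    }


plan = two_sided_plan(1000, margin=0.1)  # hypothetical peak of 1,000 req/s
for name, value in plan.items():
    print(f"{name}: {value:.3f}")
```

With these numbers each side runs a little under 50% at peak when both are up, and the surviving side stays under 100% if the other fails, which matches the commenter's description.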
Thanks for the transparency. I love twitter and will be patient while you go through growing pains. I don't think anyone could have anticipated how quickly it would rise in usage. I know I am shocked at how much my own usage has risen ;) This service has proven to be much more valuable than I would have imagined.
About the failover and load balance of MySQL:
Do you know the open-source solution SQL Relay?
—Jack Dorsey and Biz Stone
A great "team" but with ONE name only ...
Thanks for the honest answers to these questions. Best of luck with the new architecture :)
Good of you to do this. I don't think anyone was "damaged" by being unable to use a free service from time to time.
Thanks guys.
Thanks for this post. I haven't signed up for Twitter yet, so I'm still researching. I enjoy and am impressed with what I am uncovering and will probably dive in soon.
Please reconsider your Filesystem approach. There are some good and robust features with it, but it is also a nightmare to make it transaction safe and run backups/restore.