1. Requirement
1.1 Use Case
As a user,
- I can create my profile.
- I can post a tweet.
- I can list the tweets I posted before.
- I can follow / unfollow other users.
- I can list my followers and my followees.
- I can see others' tweets in my home timeline.
- I can see my own tweets in my profile timeline.
- I can read a tweet (unlike Facebook, you don't need to click into the message).
- I can delete my tweets, so that no one can see them any more.
NOTE: In an interview, you'd better list the common use cases, but pick one or two key ones to go in depth on. Time slips away quickly when you explain a design, and you want to show both the breadth and the depth of your knowledge.
1.2 Volume
Ask your interviewer for the TPS, unless they ask you to estimate it.
A useful way is to start from MAU or DAU based on the US population and the relative size of each use case.
It took me 20 minutes to think through the TPS of each use case below. That is absolutely not feasible in an interview, so in reality we should estimate just the reads and writes of one or two use cases.
Your interviewer would like to see how you estimate, not an accurate number, so I intentionally don't go too high on the DAU. On one hand, you may trap yourself in an over-challenging problem; on the other hand, we can start with the US market first, and if we still have time, think about scaling to the EU and the Far East by replication.
- MAU: ~20% of the US population - 60 Million
- DAU: 1/3 of MAU - 20 Million[^1]
- TPS
| API | Peak TPS | Reason |
|---|---|---|
| Show timeline (read tweets) | 10k | Assume every DAU visits twice a day, each visit with one refresh: 20e6 * 4 / (24 * 60 * 60) ≈ 1000. To cover peak hours and peak events, we give it a 10x buffer. |
| Post a tweet | 100 | 1% of read. |
| Comment a tweet | 1k | 10x the post rate. |
| Delete a tweet | 10 | Rare |
| List my tweets | 10 | Rare |
| Follow a user | 100 | 1% of read |
| Unfollow a user | 10 | Rare |
| List my followers | 100 | Same as follow |
| List my followees | 100 | Same as list followers |
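The read-TPS arithmetic in the table can be sketched as a back-of-envelope calculation (the inputs are the document's assumptions, not measured data):

```python
# Back-of-envelope peak read TPS for the home timeline.
SECONDS_PER_DAY = 24 * 60 * 60

dau = 20_000_000        # 1/3 of 60M MAU
reads_per_user = 4      # two visits a day, each visit plus one refresh
peak_buffer = 10        # headroom for peak hours and peak events

avg_read_tps = dau * reads_per_user / SECONDS_PER_DAY
peak_read_tps = avg_read_tps * peak_buffer

print(round(avg_read_tps))   # 926, rounded to ~1000 in the table
print(round(peak_read_tps))  # 9259, rounded to ~10k
```

The write-side rows then fall out as fractions of this read rate (post = 1% of read, comments = 10x posts, and so on).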
- Storage
  - 600 Million entries in the user table (assume MAU is 10% of all registered users)
  - 315 Million tweets per year (average post TPS is ~10, i.e. 10% of the 100 peak post TPS. 10 * 365 * 24 * 3600 ≈ 315 M)
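The same storage numbers, spelled out (again using the document's own assumptions):

```python
# Back-of-envelope storage estimation.
mau = 60_000_000
total_users = mau * 10              # assume MAU is 10% of registered users

avg_post_tps = 10                   # 10% of the 100 peak post TPS
tweets_per_year = avg_post_tps * 365 * 24 * 3600

print(total_users)       # 600000000 -> 600M user rows
print(tweets_per_year)   # 315360000 -> ~315M tweets/year
```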
2. API
Each use case can be exposed as a RESTful API.
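One hypothetical mapping from use cases to endpoints might look like this (the paths are illustrative, not Twitter's real API):

```python
# Illustrative REST endpoints, one per use case from section 1.1.
ENDPOINTS = {
    "create profile":   "POST   /users",
    "post a tweet":     "POST   /users/{user_id}/tweets",
    "list my tweets":   "GET    /users/{user_id}/tweets",
    "delete a tweet":   "DELETE /users/{user_id}/tweets/{tweet_id}",
    "follow a user":    "POST   /users/{user_id}/followees/{target_id}",
    "unfollow a user":  "DELETE /users/{user_id}/followees/{target_id}",
    "list followers":   "GET    /users/{user_id}/followers",
    "list followees":   "GET    /users/{user_id}/followees",
    "home timeline":    "GET    /users/{user_id}/timeline",
}

for use_case, endpoint in ENDPOINTS.items():
    print(f"{use_case:18s} {endpoint}")
```

Note that nouns name resources and the HTTP verb carries the action, so follow/unfollow share one resource path.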
3. Storage Choice
3.1 RDBMS
Both the user table and the tweets table are too big for an RDBMS without sharding. Usually a single MySQL table performs well with fewer than ~1M rows.
We can do sharding like this:
- Partition the user table by user id.
- Partition the tweets table by tweet id.
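The routing the proxy layer would do can be sketched as simple hash-based partitioning (the shard counts and hash choice here are illustrative assumptions):

```python
import hashlib

# Illustrative shard counts; real deployments pick these from capacity planning.
NUM_USER_SHARDS = 16
NUM_TWEET_SHARDS = 64

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a partition key to a shard index."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The proxy routes every query by its partition key:
user_shard = shard_for("user:12345", NUM_USER_SHARDS)
tweet_shard = shard_for("tweet:987654321", NUM_TWEET_SHARDS)
```

Note how plain modulo hashing illustrates the rescaling pain discussed below: changing `num_shards` remaps almost every key, which is why consistent hashing is usually preferred.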
Sharding an RDBMS is painful and error-prone:
- You need a proxy layer to route requests.
- You cannot access data by columns other than the partition key.
- Rescaling is difficult and may require taking the whole system offline.
- You cannot join tables or select rows with a flexible WHERE clause across shards.
Therefore an RDBMS is not considered scalable here.
3.2 NoSQL
NoSQL is a good fit for this use case: no complex joins, no transactions, and eventual consistency is enough.
HBase, DynamoDB, MongoDB, and even Redis would all work.
I'll talk about my schema design using HBase, DynamoDB and Redis respectively in the next few articles.
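As a taste of how any of these stores could serve the home timeline, here is a minimal fan-out-on-write sketch, with in-memory structures standing in for the NoSQL store (the names and the 800-entry cap are illustrative assumptions, not the schema the later articles use):

```python
from collections import defaultdict, deque

followers = defaultdict(set)    # user_id -> set of follower ids
timelines = defaultdict(deque)  # user_id -> recent tweet ids, newest first
TIMELINE_CAP = 800              # keep only the most recent entries per user

def follow(follower_id: int, followee_id: int) -> None:
    followers[followee_id].add(follower_id)

def post_tweet(author_id: int, tweet_id: int) -> None:
    # Fan out: push the new tweet id into every follower's home timeline.
    for follower_id in followers[author_id]:
        timeline = timelines[follower_id]
        timeline.appendleft(tweet_id)
        while len(timeline) > TIMELINE_CAP:
            timeline.pop()  # evict the oldest entry

follow(2, 1)          # user 2 follows user 1
post_tweet(1, 1001)   # user 1 posts a tweet
print(list(timelines[2]))  # [1001]
```

Fan-out-on-write makes timeline reads a single key lookup at the cost of heavier writes, which matches our estimate that reads outnumber posts roughly 100:1.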
[^1]: Worldwide DAU in Q4 2018: Facebook 1,520M; Snap 186M; Twitter 126M. Reference: https://www.vox.com/2019/2/7/18215204/twitter-daily-active-users-dau-snapchat-q4-earnings
