This afternoon, when the smoke brother and colleagues lined up in the toilet and waited for the pit (there were fewer people). Imagine a scene where I am lined up while holding a mobile phone sister. A colleague in front of me took a text message and turned around to chat with me.
So, we began to discuss the implementation of the following short links . After clicking on the short link, we will jump to the real address. Let’s discuss the implementation principle!


Demand origin

Here to talk about why you need a short link? This is simple, for example, when everyone sends Weibo or twitter

If the URL is too long, obviously fewer keywords can be written!

For example, if the text message is too long, then a text message will be split into two hairs, which is a waste of money!

Therefore, the use of short links, not only save resources, but also very beautiful!

Request process

First, let’s look at the short link of Dangdang. http://xxx.yyy/zzz
It consists of two parts
http://xxx.yyy: the domain name address of the short link system

zzz: request parameter. After
requesting the http://xxx.yyy/zzz address, the return status is as follows.

So, what happened after the address was click?

Here, the slag smoke will be a mouthful. In the short link system shown above, the returned status can be 301 or 302, but the Dangdang network uses 301.
Here I want to say, everyone should understand the 30Xstate, in the HTTP protocol, represents the state of the redirect.
What does 301 stand for?
301 represents a permanent redirect. What do you mean?
For GET requests, the 301 jump will default to the browser cache. That is to say, after the user accesses a short link for the first time, if the server returns a 301 status code, the user will access the same short link address multiple times in the subsequent time, and the browser will directly request the jump address instead of going short. Take it on the link system!

The advantage of this is obvious, reducing server pressure, but not counting the number of clicks on short link addresses.

What does 302 stand for?
302 represents a temporary orientation. What do you mean?
For GET requests, the 302 jump is not cached by the browser by default unless the browser cache is implied by Cache-Control or Expires in the HTTP response. Therefore, each time the user accesses the same short link address, the browser will go to the short link system.

The advantage of doing this is that it counts the number of times a short address has been clicked. But the pressure on the server has grown.

Algorithm principle

Doing this is not impossible, there are two shortcomings you have to evaluate can not accept!

  • (1) If the data is large, such as tens of billions, your url address is still too long
  • (2) Your data has regularity, and others can traverse your jump address with a simple script!

In order to solve the above two shortcomings, we add a column to store the key value. In order to shorten the length of the id, we can generally do this. Since our short links are a total of 62 characters from az, AZ and 0-9, you can choose. Therefore, we can convert the decimal id to a 62-digit number, for example, 201314 can be converted to Qn0.
Algorithm is as follows

private static final String BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

public static String toBase62(long num) {
    StringBuilder sb = new StringBuilder();
    do {
        int i = (int) (num % 62);
        num /= 62;
    } while (num > 0);

    return sb.reverse().toString();

In addition, we need to introduce a global ticker that always returns the global auto increment. Equivalently, our short link system first requests the global auto-increment ID, and then converts the global auto-increment ID into a 62-ary number as the key.

Next, solve the second problem! The data has regular problems. After all, after you convert to 62, it only solves the problem that the data is too long, and the data regularity problem is still not solved.
Therefore, we need to introduce a random algorithm. Then at this point, your consideration is whether you want to reverse the global id value according to the key value! To choose a different random algorithm!
(1) If you don’t want to launch the global ID
OK, use a shuffling algorithm to scramble the calculated value. For example, decimal 201314 can be converted to Qn0. Then use the shuffling algorithm to return one of n0Q, Q0n…. But there will be a chance of conflict, just wash a few times.
(2) I hope to launch the global ID
OK, and then convert it to a binary number after getting the number Qn0. Then insert a random value in the fixed digit, fifth digit, tenth digit…(etc.).
As for how to reverse the push is also very simple, you get the short link key, remove the fixed digits, and then convert to decimal.

Speaking of this, the logic of how the key is generated is basically clear. Then, when the user clicks on the short link, for example, the address, the short link system parses the key to nXR, and returns the url corresponding to the nXR according to the unique index.

Detail optimization

(1) Sub-library table
If the system is placed on the public network, it is for everyone to use. It is recommended to go to the sub-library table, the amount of data is over 10 million is very easy. Here is a question, take the self-incrementing id given by the global sender to do the fragmentation, or use the converted key to do the fragmentation key.

Obviously, it is easier to use the converted key as the shard key. If you use ID as a shard key, there are two problems!
(1) The key requested by the user needs to be inverse calculated to calculate the ID, and then according to the ID, go to the corresponding table to find the response time.
(2) Depending on the chosen random algorithm, the key may not be able to extrapolate back to the ID value. In this case, you can only check each table, which is slower.

So using the key to do the sharding key is a good fit. After getting the KEY requested by the user, directly locate the corresponding table and take the url out.

(2) Separation of reading and writing
This system obviously reads far more than writing. It is recommended to consider doing read-write separation.

(3) Introducing the cache
Assume that we are at a time. After pushing the text message of the SMS link to the mobile phone. Obviously, the amount of requests for this short link will increase significantly in the next period of time. There is no need to go to the database query every time, so you can introduce the redis cache.

(4) The global numbering device
ca n’t use other algorithms . This is just a globally unique ID. You have to estimate the performance impact of using other algorithms. And using other algorithms, will not cause the generated generated ID to be too regular.

(5) Anti-attack
Prepare for malicious attacks to prevent the value of the self-incrementing ID from being completely consumed.


I haven’t written this design article for a long time. After all, it was my colleagues and I queued up in the toilet and chatted there. In the end, because the smoke brother was too focused and talked about the technology with his colleagues, when I replied to the sister paper, I had no following, so I was solitary!

I hope everyone will gain something!

Orignal link: