Interesting Redis and MySQL Questions I’ve Run Into in Interviews

2021/08/04

Preface

Today I interviewed with another company I’m pretty interested in. Besides the usual basic questions, there were a few scenario/design questions that I found really interesting—stuff that actually feels usable in real projects. The business at my current company isn’t that sensitive to real-time requirements, so I haven’t seriously dug into some of the “special tricks” in middleware. During the discussion I didn’t answer particularly well—I mostly just shared my own thoughts. So I came back and deliberately summarized and studied it a bit. Honestly, it’s really interesting.

How to prevent inventory oversell (field value going negative) / idempotency for duplicate requests

How do you ensure a field value in MySQL never becomes negative?

This is a pretty common consideration in business systems. Generally, competition over shared resources is handled by locking at the server layer. When I rolled up my sleeves ready to talk about distributed locks, the interviewer added a constraint: you can’t handle it in code—purely from the MySQL layer.

My first instinct was to set the field to unsigned, so it can’t become negative. If concurrent requests try to decrement it below zero, MySQL will throw an exception and the update fails, then you catch it in code and handle it. That probably wasn’t what the interviewer wanted, so he rephrased it:

How to prevent multiple submissions of the same request—for example, you only want to reduce inventory by 100, but due to duplicate requests it gets reduced twice by 100?

When I rolled up my sleeves again to talk about HTTP idempotency, the constraint was still: solve it at the DB layer. I thought carefully and had no idea, so I just admitted I didn’t really know.

After I got back, I looked it up and found that InnoDB has a transactional row-level lock: for update. I’m guessing the interviewer was trying to guide me toward that. Unfortunately our business rarely uses transactions, so I genuinely didn’t know this area—no excuses.

After understanding it, I realized for update can solve a lot of problems. Later I can try it in our business. for update is a pessimistic lock inside a transaction. “Pessimistic locking” is a concurrency-control idea: when data is about to be modified, the lock assumes conflicts will happen, so it keeps exclusive access to the resource. InnoDB’s lock granularity can go down to the row level. for update is the API; pessimistic row locks and transaction isolation are the underlying principles.

With these two, this problem becomes easy: inside a transaction, query the current inventory, then subtract 100. The update statement should carry the quantity you just queried. Pseudocode:

num = select num from someTable where id = 1
update someTable set num = $num - 100 where id = 1 and num = $num

In most internet companies the isolation level is Repeatable Read, so in this scenario other transactions’ modifications won’t affect the select result. When executing the update, if another transaction has locked this record, the update will block for a bit, waiting for the lock-holding transaction to release it. If this update eventually gets the lock, it will execute—but it will find num is no longer equal, so it won’t modify the record. Because for the select, you’re reading “snapshot data”—historical data—which is exactly the “repeatable” part.

By the same logic, the original “inventory going negative” problem can also be handled this way. After digging into it, it’s actually pretty interesting. I also often do internal MySQL tech sharing, but because my projects use transactions relatively little, I didn’t know many of these characteristics—definitely a bit lazy on my part.

How Redis implements multi-field sorting

How do you implement a leaderboard while ensuring performance?

When I saw “ensure performance”, I knew they wanted me to say Redis zset. But to be rigorous, I still said it depends on the business scenario—some scenarios are better with MySQL, some don’t need Redis at all, and for very large datasets you might only be able to do offline computation (I’m currently in a data department, and we’re very good at offline computation—if you want to talk about that, I’m not sleepy anymore). The interviewer hinted: don’t go off-topic, just use Redis. So it’s easy to think of using the sorted set data structure. The key can be something like xxxrank:zset, the member can be userId, and the score can be the ranking weight. Since score supports the atomic incrBy operation, it’s indeed a good fit. I haven’t built a Redis leaderboard requirement in production, so this idea is mostly from books or learning from seniors. The follow-up questions completely stunned me.

If the leaderboard needs multiple dimensions for sorting, similar to MySQL order by with multiple fields, how do you do it?

Honestly, I had no idea. I thought for a while and admitted it. After looking it up, I realized I was really too green—Redis is seriously fun. It turns out you can still do it with zset. Even though zset can only sort by score, that doesn’t mean it’s limited to a single field sort. You can merge multiple fields into the score. Since most sortable fields are numeric, you can leverage numeric representation and base/radix properties to achieve multi-field sorting.

For example, sorting by two fields: score and time. When writing the score into the zset, you can “concatenate” them: use the score as the integer part, and convert time to a timestamp as the fractional part (watch out for precision).

image-20210804175014820

The integer part of the score is the points, and the fractional part after the dot is the seconds timestamp

You can even define your own score rules to support more fields. That requires controlling radix positions and lengths when writing—put higher-priority fields into higher-order digits.

Redis data structures really have a lot of tricks. In our project we have a zset use case that I find pretty interesting, so I’ll share it here.

The requirement is to remind users: when a user clicks “start timing”, it means they’re studying; when they click “end timing”, it means they’re no longer studying. I need to send a push notification to users who have studied for 2 hours (Jiguang handles this, so no need to worry), and if the user hasn’t “left the mic”, I need to remind them every two hours (roughly telling them to take a break and not study too hard).

There are many ways to solve it. Since it’s time-based, you definitely need scheduled jobs. The dumbest way is to scan the table: the table must have a start time field, then diff the start time and current time to see whether it’s greater than the 2-hour millisecond timestamp. Even if the start time is indexed, the overhead is still big. And scheduled jobs have some risk, plus the precision isn’t great. Most importantly, if after two hours the user hasn’t ended studying and they continue to the fourth hour, you need to remind them again. If you do this by scanning the table, you need extra logic for the repeated reminders.

The most elegant way is to use a delay queue: when they “get on the mic” push into the delay queue, when they “get off the mic” pop from the delay queue. In the delay queue callback, each time you check whether they’re still on the mic; if yes, push into the next round of the delay queue. But I was lazy at the time—new business, not sure if we’d have many such requirements later, and building it was a bit annoying (though we did it later using Redis key expiration callbacks—will share next time).

So what I ended up doing was using a zset. When they get on the mic, put data into the zset: member stores userId, and score stores current millisecond timestamp + 2 hours in milliseconds. When they get off the mic, delete that entry.

Then I run a scheduled job to call an API. Inside the API, I use rangeByScore to scan entries between the current timestamp and the timestamp from 2 minutes ago (the scheduled job runs once per minute; each time it scans the previous two minutes, leaving one minute of fault tolerance). The resulting userIds are the ones that need to be reminded now. After scanning, I load them into memory, and I immediately open a pipeline to write back and update the score to current millisecond timestamp + 2 hours in milliseconds (for the next cycle’s 4-hour reminder).

How Redis implements a leaderboard

How do you implement a leaderboard with many fields?

This question is coupled with the one above, but since the focus is different I split it out. My initial design was: member stores userId, and score stores the weight. Then use zrevrange to fetch, and query the DB by userId to assemble the corresponding data. Then the interviewer said: if it’s 100 people, looping and querying the table 100 times is a bit expensive—honestly I hadn’t thought of that….. I described my approach in more detail: for 100 userIds, you can use an IN clause in a single SQL and one JDBC call to get it done.

The leaderboard has many fields—if you don’t want the overhead of querying the table again, how do you do it with Redis?

Maybe I didn’t understand the interviewer’s point at the beginning. After he explained it, I got it—this one I couldn’t do….. I thought about it: with multiple fields, maybe they want to push toward using a hash, but hashes are hard to sort. Or store serialized JSON in the zset member? In the end I just admitted I didn’t know. After I got back, I also couldn’t find a good solution. I’m throwing this question out here—if anyone has good ideas, I’d love to hear them.

Epilogue

Sometimes chatting with a good interviewer can teach you a lot. Usually good interviewers use scenarios from their own company to evaluate candidates. Sometimes you run into scenarios outside your own business. Even if you don’t figure it out on the spot, with the interviewer’s guidance or by looking things up afterward, you can deepen your understanding.

In my view, many scenarios are actually open-ended. For a given scenario, you can often solve it with many different middleware options, and sometimes one approach is a better choice in that specific context. But because the interviewer wants to test your understanding of a particular middleware, they constrain the scope. For example, I often get asked about cache vs DB consistency. That question itself is all about trade-offs—there’s no silver bullet. I’ll tell them how we handle it in production and some approaches I know, but I also think those approaches don’t fit our project’s current traffic and concurrency. They’re not only complicated, they might even backfire.

References

  1. Meituan Tech Team - The relationship between transaction isolation levels and locks in InnoDB

  2. Why-ge’s leaderboard design approach

All articles in this blog, unless otherwise stated, are licensed under @Oreoft . Please indicate the source when reprinting!

Table of Contents