Websocket transport reliability (Socket.io data loss during reconnection)
Asked 07 September, 2021
Viewed 496 times
  • 65
Votes

Used

NodeJS, Socket.io

Problem

Imagine there are 2 users, U1 and U2, connected to an app via Socket.io. The sequence of events is the following:

  1. U1 completely loses Internet connection (e.g. switches the Internet off).
  2. U2 sends a message to U1.
  3. U1 does not receive the message yet, because his Internet is down.
  4. The server detects U1's disconnection by heartbeat timeout.
  5. U1 reconnects to Socket.io.
  6. U1 never receives the message from U2. I guess it is lost at Step 4.

Possible explanation

I think I understand why it happens:

  • At Step 4 the server kills the socket instance, and with it the queue of messages for U1.
  • Moreover, at Step 5 U1 and the server create a new connection (the old one is not reused), so even if the message were still queued, the previous connection is gone anyway.
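
To make this explanation concrete, here is a toy model of it. This is only an illustration of the reasoning above, not how Socket.io is actually implemented; `ToyConnection` and `ToyServer` are made-up names.

```javascript
// Toy model (NOT Socket.io internals): every connection owns its own outgoing buffer.
class ToyConnection {
  constructor(userId) {
    this.userId = userId;
    this.online = true;
    this.buffer = [];    // packets written while the network is down
    this.delivered = []; // packets the client actually received
  }
  send(packet) {
    if (this.online) this.delivered.push(packet);
    else this.buffer.push(packet);
  }
}

class ToyServer {
  constructor() {
    this.connections = new Map(); // userId -> ToyConnection
  }
  connect(userId) {
    const conn = new ToyConnection(userId); // always a fresh object, nothing is reused
    this.connections.set(userId, conn);
    return conn;
  }
  heartbeatTimeout(userId) {
    // Step 4: the connection is dropped together with its buffer.
    this.connections.delete(userId);
  }
  emitTo(userId, packet) {
    const conn = this.connections.get(userId);
    if (conn) conn.send(packet);
  }
}

const server = new ToyServer();
const first = server.connect('U1');
first.online = false;                 // Step 1: U1 loses the Internet
server.emitTo('U1', 'hello from U2'); // Steps 2-3: message sits in the old buffer
server.heartbeatTimeout('U1');        // Step 4
const second = server.connect('U1');  // Step 5: brand new connection
console.log(second.delivered.length); // Step 6: nothing was carried over
```

The message only ever existed in the buffer of the first connection, so anything that should survive a reconnect has to be stored outside the connection object.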

Need help

How can I prevent this kind of data loss? I have to use heartbeats, because I do not want people to hang in the app forever. I must also keep the possibility to reconnect, because when I deploy a new version of the app I want zero downtime.

P.S. The thing I call a "message" is not just a text message I can store in a database, but a valuable system message whose delivery must be guaranteed, otherwise the UI breaks.

Thanks!


Addition 1

I do already have a user account system. Moreover, my application is already complex. Adding offline/online statuses won't help, because I already have this kind of stuff. The problem is different.

Check out Step 2. At this step we technically cannot say that U1 has gone offline; he just loses connection for, let's say, 2 seconds, probably because of bad Internet. So U2 sends him a message, but U1 doesn't receive it because the Internet is still down for him (Step 3). Step 4 is needed to detect offline users; let's say the timeout is 60 seconds. Eventually, in another 10 seconds, the Internet connection for U1 is back up and he reconnects to Socket.io. But the message from U2 is lost in space, because on the server U1 was disconnected by the timeout.

That is the problem: I want 100% delivery.


Solution

  1. Store each emit (emit name and data) in a per-user object {}, keyed by a random emitID. Send the emit.
  2. Confirm the emit on the client side (send a confirmation back to the server with the emitID).
  3. If confirmed, delete the entry identified by emitID from {}.
  4. If the user reconnects, check {} for this user and loop through it, executing Step 1 for each entry in {}.
  5. On disconnect and/or connect, flush {} for the user if necessary.
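// Server
const pendingEmits = {}; // emitID -> { name, data }, kept until the client confirms

// 'reconnection' is assumed to be an app-level event sent by the client after it reconnects
socket.on('reconnection', () => resendAllPendingEmits());
socket.on('confirm', (emitID) => { delete pendingEmits[emitID]; });

// Client
socket.on('something', ({ emitID, data }) => {
    // handle data here, then acknowledge receipt to the server
    socket.emit('confirm', emitID);
});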
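
The five steps above can be sketched without any network code. This is a minimal sketch, not a drop-in implementation: `PendingEmits` and the `send` callback are hypothetical names (in a real app `send` would wrap `socket.emit`), and a counter replaces the random emitID only to keep the example deterministic.

```javascript
// Per-user outbox that outlives any single socket (hypothetical helper, not Socket.io API).
class PendingEmits {
  constructor() {
    this.byUser = new Map(); // userId -> Map(emitID -> { name, data })
    this.nextId = 0;
  }
  // Step 1: remember the emit, then try to send it.
  emit(userId, name, data, send) {
    const emitID = `e${++this.nextId}`; // assumption: any unique ID works here
    if (!this.byUser.has(userId)) this.byUser.set(userId, new Map());
    this.byUser.get(userId).set(emitID, { name, data });
    send(name, { emitID, data });
    return emitID;
  }
  // Steps 2-3: the client confirmed, so the entry can be dropped.
  confirm(userId, emitID) {
    const pending = this.byUser.get(userId);
    if (pending) pending.delete(emitID);
  }
  // Step 4: after a reconnect, resend everything that is still unconfirmed.
  resendAll(userId, send) {
    const pending = this.byUser.get(userId) || new Map();
    for (const [emitID, { name, data }] of pending) send(name, { emitID, data });
  }
  // Step 5: forget the user's pending emits when they are no longer needed.
  flush(userId) {
    this.byUser.delete(userId);
  }
}

// Usage with fake sockets instead of a real connection.
const outbox = new PendingEmits();
const received = [];
const deadSocket = () => {}; // U1 is offline, the packet goes nowhere
const liveSocket = (name, payload) => received.push({ name, ...payload });

outbox.emit('U1', 'message', 'hello', deadSocket);       // lost on the wire, kept in the outbox
outbox.resendAll('U1', liveSocket);                      // U1 reconnected
received.forEach((p) => outbox.confirm('U1', p.emitID)); // client confirms
console.log(received.length, outbox.byUser.get('U1').size);
```

Note that this gives at-least-once delivery: if a confirmation is lost, the same emitID is resent, so the client should ignore emitIDs it has already handled.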

Solution 2 (kinda)

Added 1 Feb 2020.

While this is not really a solution for Websockets, someone may still find it handy. We migrated from Websockets to SSE + Ajax. SSE lets a client open a persistent TCP connection and receive messages from the server in real time. To send messages from the client to the server, simply use Ajax. There are disadvantages like latency and overhead, but SSE is more reliable in this scenario: it runs over a regular HTTP (TCP) connection, and the browser's EventSource reconnects automatically, sending a Last-Event-ID header that lets the server replay events missed while the client was offline.

Since we use Express, we use this library for SSE: https://github.com/dpskvn/express-sse, but you can choose whichever fits you.

SSE is not supported in IE and in legacy (pre-Chromium) Edge versions, so you would need a polyfill there: https://github.com/Yaffle/EventSource.
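
For anyone going the SSE route: when an EventSource reconnects, it sends the ID of the last event it received in a Last-Event-ID request header, so the server can replay whatever was missed. Below is a rough sketch of that replay logic. `SseLog` and `toSseFrame` are hypothetical helpers written for this example, not part of express-sse, and the log is kept in memory only.

```javascript
// In-memory event log that supports replay after a reconnect (hypothetical helper).
class SseLog {
  constructor() {
    this.events = [];
    this.lastId = 0;
  }
  append(name, data) {
    const event = { id: ++this.lastId, name, data };
    this.events.push(event);
    return event;
  }
  // Events the client missed, given the Last-Event-ID header value it sent.
  // A missing or invalid header means "replay everything".
  since(lastEventId) {
    const last = Number(lastEventId) || 0;
    return this.events.filter((e) => e.id > last);
  }
}

// Serialize one event in the text/event-stream wire format.
function toSseFrame({ id, name, data }) {
  return `id: ${id}\nevent: ${name}\ndata: ${JSON.stringify(data)}\n\n`;
}

const log = new SseLog();
log.append('message', { text: 'first' });
log.append('message', { text: 'second' });

// The client reconnects and reports that it last saw event 1.
const replay = log.since('1').map(toSseFrame).join('');
console.log(replay);
```

In an HTTP handler you would read `req.headers['last-event-id']`, respond with `Content-Type: text/event-stream`, write the replayed frames first and then keep the response open for new events.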

8 Answers