Deadlock Websocket

Description

Reproducible deadlocks are occurring in a scenario that involves Websocket-based XMPP clients.

This problem is reproducible in 4.7.2, ruling out this being a duplicate of .

The reporter has been able to reproduce the deadlock in these configurations (all but the first).

Openfire version          | Connection Type | Stream Management | Deadlock?
4.7.2                     | BOSH            | enabled           | no
4.7.2                     | Websocket       | enabled           | yes
4.7.2                     | Websocket       | disabled          | yes
4.7.1 (possibly OF-2444?) | BOSH            | enabled           | yes
4.7.1 (possibly OF-2444?) | Websocket       | disabled          | yes

This is how the problem is reproduced:

The tests are run automatically by logging in a number of users on the web frontend and sending some messages into chatrooms within a short time.

I have a first function to log in a user in the web frontend via puppeteer. Then the user is able to join a multi user chatroom and publish multiple messages with a delay.

At the end the user logs out and the function ends. 

Then I have a second function to generate the meta data for the first function.

  • Which user to log in with. (Pool of 13 users available, random selection)

  • Which chatroom to join. (Pool of 13 chatrooms available, random selection)

  • How many messages to send and with what delay. (20 messages per chatroom, delay 2 seconds)

The second function is also able to generate the meta data for multiple calls of the first function at once and with a delay. (10 – 10 – 10; delay 21 seconds)

The second function is called in a cron job each minute.
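
The metadata generation described above can be sketched roughly as follows. This is not the reporter's code (which is not attached, and which presumably is JavaScript, given the use of puppeteer); the class and field names are made up for illustration. It also assumes that "10 – 10 – 10; delay 21 seconds" means three batches of ten sessions each, started 21 seconds apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical description of one simulated client session. */
final class SessionPlan {
    final int userIndex;           // which of the pooled users logs in
    final int roomIndex;           // which of the pooled chatrooms is joined
    final int messageCount;        // how many messages are published
    final int messageDelaySeconds; // pause between two messages
    final int startOffsetSeconds;  // when the session starts, relative to the cron tick

    SessionPlan(int userIndex, int roomIndex, int messageCount,
                int messageDelaySeconds, int startOffsetSeconds) {
        this.userIndex = userIndex;
        this.roomIndex = roomIndex;
        this.messageCount = messageCount;
        this.messageDelaySeconds = messageDelaySeconds;
        this.startOffsetSeconds = startOffsetSeconds;
    }
}

/** Hypothetical generator for the sessions started by one cron invocation. */
final class LoadPlanGenerator {
    static final int USER_POOL = 13;
    static final int ROOM_POOL = 13;
    static final int MESSAGES_PER_ROOM = 20;
    static final int MESSAGE_DELAY_SECONDS = 2;
    // Assumed reading of "10 – 10 – 10; delay 21 seconds".
    static final int BATCHES = 3;
    static final int SESSIONS_PER_BATCH = 10;
    static final int BATCH_DELAY_SECONDS = 21;

    static List<SessionPlan> generate(Random random) {
        List<SessionPlan> plans = new ArrayList<>();
        for (int batch = 0; batch < BATCHES; batch++) {
            for (int i = 0; i < SESSIONS_PER_BATCH; i++) {
                plans.add(new SessionPlan(
                        random.nextInt(USER_POOL),   // random user from the pool
                        random.nextInt(ROOM_POOL),   // random chatroom from the pool
                        MESSAGES_PER_ROOM,
                        MESSAGE_DELAY_SECONDS,
                        batch * BATCH_DELAY_SECONDS));
            }
        }
        return plans;
    }
}
```

Under these assumptions, every cron tick starts 30 short-lived sessions drawn from only 13 users, so the same account is likely to log in and out several times in quick succession.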

Environment

None

Attachments

2

Activity


Guus der Kinderen 
July 27, 2022 at 12:28 PM
(edited)

I have reproduced the deadlock using the Tsung framework, using the configuration that I’m attaching to this comment. For this to work, support for websocket XMPP framing needs to be added to Tsung. My reproduction worked using this modification of Tsung:

Alternatively, the solution as introduced by can be used.

Both approaches work to reproduce a deadlock in roughly 90% of the attempts.

Guus der Kinderen 
July 27, 2022 at 12:26 PM

It seems that the event of processing a <close> received from the client is causing issues, possibly when the client immediately closes the socket, making it unavailable to deliver the response.

I believe that I’m seeing two paths: one that causes Jetty internally to call the @OnWebSocketClose annotated method, and one that causes Jetty to call the @OnWebSocketError annotated method. Both implementations in our XmppWebSocket class try to close the session/socket.

Jetty invokes these methods synchronously to the processing of the inbound data. This causes the 'close' to occur under guard of mutexes that are intended only to guard processing of inbound data. Because closing a socket is quite a different operation, it attempts to acquire similar mutexes of its own, which leads to deadlocks.
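
To illustrate the general pattern (not the actual Openfire or Jetty locks), here is a minimal sketch of a lock-order inversion. The two locks and the class name are hypothetical stand-ins: one thread holds an "inbound" lock and then wants a "close" lock, while another thread holds the "close" lock and wants the "inbound" one. The sketch uses tryLock with a timeout so that it terminates; with a plain lock() call both threads would block forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderDemo {
    // Hypothetical stand-ins for a mutex guarding inbound data and one guarding close.
    static final ReentrantLock inboundLock = new ReentrantLock();
    static final ReentrantLock closeLock = new ReentrantLock();

    /** Takes 'first', then tries to take 'second' while the other thread does the opposite. */
    static boolean acquireInOrder(ReentrantLock first, ReentrantLock second,
                                  CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A plain second.lock() would never return at this point.
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            // Keep holding 'first' until the other thread has finished its attempt too.
            bothTried.countDown();
            bothTried.await();
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    /** Returns, per thread, whether the second lock could be acquired. */
    static boolean[] run() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        Thread inbound = new Thread(() ->
                acquired[0] = acquireInOrder(inboundLock, closeLock, bothHoldFirst, bothTried));
        Thread closer = new Thread(() ->
                acquired[1] = acquireInOrder(closeLock, inboundLock, bothHoldFirst, bothTried));
        inbound.start();
        closer.start();
        try {
            inbound.join();
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired;
    }
}
```

Neither thread can obtain its second lock, because each one is held by the other thread. This is the shape of problem that arises when a close operation runs while a lock meant for inbound processing is still held.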

Guus der Kinderen 
July 19, 2022 at 4:55 PM

At first glance, I would think that the VirtualConnection#close() call, which can happen under guard of the StreamManager mutex, is a factor here. It looks like that call synchronously invokes event listeners, which then start all kinds of processing again, including the bits that deadlock.

Possibly, those event handlers need to be invoked asynchronously?
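
A rough sketch of what asynchronous invocation could look like is given below. It is not how VirtualConnection is implemented, nor the fix that was eventually applied; the class, its methods and the use of a single-threaded executor are all assumptions made for illustration. The point is that close() only flips state under its own lock and hands the listeners to another thread, so they never run on the calling thread while that thread still holds its mutexes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical connection-like object that notifies close listeners asynchronously. */
final class AsyncCloseNotifier {
    private final List<Runnable> closeListeners = new CopyOnWriteArrayList<>();
    // A single thread keeps listener invocations in registration order.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    private final Object stateLock = new Object();
    private boolean closed;

    void addCloseListener(Runnable listener) {
        closeListeners.add(listener);
    }

    /** Marks the object as closed and schedules the listeners; returns without waiting for them. */
    void close() {
        synchronized (stateLock) {
            if (closed) {
                return; // already closed, do not notify twice
            }
            closed = true;
        }
        // Listeners run on the dispatcher thread, not on the thread that called close().
        for (Runnable listener : closeListeners) {
            dispatcher.execute(listener);
        }
    }

    /** Stops the dispatcher and waits for pending listeners; returns false on timeout or interrupt. */
    boolean shutdownAndAwait(long timeoutMillis) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is that callers can no longer assume that all listeners have completed by the time close() returns.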

Fixed


Created July 19, 2022 at 4:51 PM
Updated July 28, 2022 at 3:03 PM
Resolved July 28, 2022 at 3:03 PM