Openfire

Unresponsive clients cause Openfire to run out of memory

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 3.6.4
  • Fix Version/s: 3.7.0 beta
  • Component/s: Connection Manager, Core
  • Labels:
    None
  • Acceptance Test - Add?:
    No

Description

Openfire runs out of memory if unresponsive clients are connected.

Community discussion at http://www.igniterealtime.org/community/message/196900#196900

Activity

Guus der Kinderen added a comment -

I'm wondering if this is related to the fix of JM-1066. The crux of that issue was moving the MINA idle strategy from BOTH_IDLE to READER_IDLE. The 'both' strategy requires both the reader and writer to be idle. Instead of detecting only reader idle, I think we should detect reader or writer idle.

Looking at the MINA code, this appears to be achievable by setting the idle strategy twice: once for READER_IDLE, and once for WRITER_IDLE (which, combined, is significantly different from BOTH_IDLE).
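To make the difference concrete, here is a toy model of the three checks. It is only a sketch of the semantics described above, not MINA's actual implementation; the class and method names are made up for illustration, and times are plain seconds.

```java
/** Toy model of idle detection. Hypothetical names; not MINA's real code. */
final class IdleModel {

    /** READER_IDLE-style check: nothing was read from the client for idleSeconds. */
    static boolean readerIdle(long now, long lastRead, long idleSeconds) {
        return now - lastRead >= idleSeconds;
    }

    /** WRITER_IDLE-style check: nothing was written to the client for idleSeconds. */
    static boolean writerIdle(long now, long lastWrite, long idleSeconds) {
        return now - lastWrite >= idleSeconds;
    }

    /** BOTH_IDLE-style check: no traffic in either direction. */
    static boolean bothIdle(long now, long lastRead, long lastWrite, long idleSeconds) {
        return readerIdle(now, lastRead, idleSeconds)
                && writerIdle(now, lastWrite, idleSeconds);
    }

    /** Reader and writer timers set separately: either one can trip. */
    static boolean eitherIdle(long now, long lastRead, long lastWrite, long idleSeconds) {
        return readerIdle(now, lastRead, idleSeconds)
                || writerIdle(now, lastWrite, idleSeconds);
    }

    public static void main(String[] args) {
        long now = 100;
        long idleSeconds = 30;
        // Client last sent us data at t=10; the server last wrote to it at t=95.
        System.out.println(bothIdle(now, 10, 95, idleSeconds));   // false
        System.out.println(eitherIdle(now, 10, 95, idleSeconds)); // true
    }
}
```

In this model a session that is silent on the read side but still being written to never counts as BOTH_IDLE, while separate reader and writer timers would flag it.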

Guus der Kinderen added a comment -

Wonder if this does the trick:

Index: src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java
===================================================================
--- src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java	(revision 11321)
+++ src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java	(working copy)
@@ -85,6 +85,7 @@
         int idleTime = getMaxIdleTime();
         if (idleTime > 0) {
             session.setIdleTime(IdleStatus.READER_IDLE, idleTime);
+            session.setIdleTime(IdleStatus.WRITER_IDLE, idleTime);
         }
     }
Guus der Kinderen added a comment -

We shouldn't forget to apply any fix to the connectionmanager code too.

wroot added a comment -

Maybe the affects version should be 3.6.4 and the fix version 3.6.5?

Guus der Kinderen added a comment -

Ah, right. I simply picked the first Jira version that was still open (someone should close it...). I tend to reserve the 'fix version' for those instances where we know in which version the problem was introduced.

Guus der Kinderen added a comment -

A user reports catastrophic failure when the above patch is applied: all clients that have not been sent any data for idleTime seconds are disconnected. Instead, we'd like to disconnect only clients that have been sent data which hasn't been delivered to them.

Guus der Kinderen added a comment -

Luckily, MINA also allows setting timeout values on write actions (instead of detecting idle states).

Index: src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java
===================================================================
--- src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java	(revision 11321)
+++ src/java/org/jivesoftware/openfire/nio/ConnectionHandler.java	(working copy)
@@ -85,6 +85,7 @@
         int idleTime = getMaxIdleTime();
         if (idleTime > 0) {
             session.setIdleTime(IdleStatus.READER_IDLE, idleTime);
+            session.setWriteTimeout(idleTime);
         }
     }
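
As a rough illustration of why a write timeout behaves differently from a writer-idle check, here is a toy model. It assumes the timeout only applies while a write is still pending; the names are hypothetical and this is not MINA's code.

```java
/** Toy comparison of writer-idle and write-timeout checks. Hypothetical names. */
final class WriteTimeoutModel {

    /** WRITER_IDLE-style check: trips whenever nothing was written recently. */
    static boolean writerIdle(long now, long lastWrite, long timeoutSeconds) {
        return now - lastWrite >= timeoutSeconds;
    }

    /**
     * Write-timeout-style check (assumed semantics): trips only if data is
     * still waiting to be written and no write has completed within the timeout.
     */
    static boolean writeTimedOut(long now, long lastWrite, boolean writePending,
                                 long timeoutSeconds) {
        return writePending && now - lastWrite >= timeoutSeconds;
    }

    public static void main(String[] args) {
        long now = 100;
        long timeoutSeconds = 30;

        // Healthy but quiet client: nothing to send since t=10, nothing queued.
        System.out.println(writerIdle(now, 10, timeoutSeconds));           // true
        System.out.println(writeTimedOut(now, 10, false, timeoutSeconds)); // false

        // Unresponsive client: data has been queued, last successful write at t=10.
        System.out.println(writeTimedOut(now, 10, true, timeoutSeconds));  // true
    }
}
```

Under these assumptions the quiet client survives, and only the client with undelivered data is dropped.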
Guus der Kinderen added a comment -

The user reporting the initial problem now reports that the cause of his problem is an oversized memory configuration. His JVM was using swap to fill the memory requirement. I've downscaled the impact of this bug.

Guus der Kinderen added a comment -

I'll leave the timeout patch in the code. I recently discovered StalledSessionsFilter, which provides similar functionality: it kills idle sessions based on traffic to be sent. My patch uses a timeout; the filter uses the number of bytes queued to trigger the 'stalled' condition. Both solutions will work nicely together.
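
A small sketch of how the two triggers complement each other. The names, the cap, and the numbers are invented for illustration; this is not the real StalledSessionsFilter code.

```java
/** Toy model combining a write timeout with a queued-bytes cap. Hypothetical names. */
final class StallModel {

    /** Timeout trigger: a write is pending and nothing has gone out for too long. */
    static boolean stalledByTimeout(long secondsSinceLastWrite, boolean writePending,
                                    long timeoutSeconds) {
        return writePending && secondsSinceLastWrite >= timeoutSeconds;
    }

    /** Backlog trigger: too many bytes are queued for the client. */
    static boolean stalledByBacklog(long queuedBytes, long capBytes) {
        return queuedBytes > capBytes;
    }

    /** Close the session if either trigger fires. */
    static boolean shouldClose(long secondsSinceLastWrite, long queuedBytes,
                               long timeoutSeconds, long capBytes) {
        boolean writePending = queuedBytes > 0;
        return stalledByTimeout(secondsSinceLastWrite, writePending, timeoutSeconds)
                || stalledByBacklog(queuedBytes, capBytes);
    }

    public static void main(String[] args) {
        long timeoutSeconds = 30;
        long capBytes = 1_000_000; // illustrative cap only

        // Small backlog that has been stuck for a long time: only the timeout fires.
        System.out.println(shouldClose(120, 2_000, timeoutSeconds, capBytes));    // true

        // Large backlog that built up within seconds: only the byte cap fires.
        System.out.println(shouldClose(5, 5_000_000, timeoutSeconds, capBytes));  // true

        // Nothing queued: the session is left alone however quiet it is.
        System.out.println(shouldClose(600, 0, timeoutSeconds, capBytes));        // false
    }
}
```

The timeout catches a small backlog that never drains, and the byte cap catches a backlog that grows faster than the timeout can elapse.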

Matteo Castelli added a comment -

I would not decrease the priority of this bug. We have been running Openfire for about 2 years now, and on the weekend we had huge issues due to someone deciding to connect with the new version of the Empathy client (Ubuntu default). I tried all the possible memory configurations (from the default one to 1GB of heap space, also increasing the server's memory to compensate for the increase in Java usage), but nothing helped. As soon as I blocked the IP address of the user connecting via Empathy, I could put back the standard 128MB of heap space, but of course it's just a temporary solution.

This is a very critical bug for us and it's probably going to become very critical for more people as soon as users upgrade to the newer Ubuntu.

wroot added a comment -

Matteo, this ticket is not related to the Empathy issue, and it is already closed. Check OF-82.

Matteo Castelli added a comment -

Wroot,
are you sure that the two issues are not linked? If you read the community discussion linked to this issue http://www.igniterealtime.org/community/message/196900#196900 it seems to affect lots of people after starting to use Empathy. Perhaps the two issues should be merged.

Thanks,
matteo

Guus der Kinderen added a comment -

Although the issues were raised in the same thread, they are indeed very different issues. Wroot is right: we are tracking the Empathy-related issue in OF-82.

