Fixed
Assignee: Daryl Herzmann
Reporter: wroot
Priority: Major
Created June 2, 2011 at 3:41 AM
Updated March 1, 2024 at 10:00 AM
Resolved June 12, 2011 at 9:15 PM
A UTF-8 character that fits in a single Java char is encoded in 1 to 3 bytes (standard UTF-8 uses at most 4 bytes per code point; the original design allowed up to 6), see http://en.wikipedia.org/wiki/UTF-8 .
From here on, I assume a character that takes 3 bytes.
Openfire uses MINA NIO to process the network stream and implements an XMPPDecoder to decode bytes into Strings/stanzas.
When a ByteBuffer is decoded, it may end in the middle of a character: of the last character's three bytes, only one or two may have arrived, and the character is then incomplete.
Where the buffer ends is effectively random, and the longer the run of 3-byte characters in the input, the more likely it is that one of them gets split.
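To make the split concrete, here is a minimal sketch (not Openfire code; the class and method names are made up for illustration) that decodes a prefix of a byte array the way a one-shot Charset.decode call would. When the prefix cuts a 3-byte character short, the truncated bytes come back as the replacement character U+FFFD instead of the real character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Openfire.
class SplitCharDemo {
    // Decodes only the first `len` bytes, as if the rest had not arrived yet.
    // Charset.decode always replaces malformed or truncated input with U+FFFD.
    static String decodePrefix(byte[] bytes, int len) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes, 0, len)).toString();
    }

    public static void main(String[] args) {
        // 'a' (1 byte) followed by U+4E2D (3 bytes: E4 B8 AD)
        byte[] bytes = "a\u4e2d".getBytes(StandardCharsets.UTF_8);
        for (int len = 2; len <= bytes.length; len++) {
            String s = decodePrefix(bytes, len);
            boolean damaged = s.indexOf('\uFFFD') >= 0;
            System.out.println(len + " bytes received, contains U+FFFD: " + damaged);
        }
    }
}
```

With 2 or 3 of the 4 bytes available the result contains U+FFFD; only the full 4 bytes decode cleanly.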
Let's look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):
Charset encoder = Charset.forName(charset);
CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
char lastChar = buf[readByte-1];
if (lastChar >= 0xfff0) { // treated as incomplete, so position-1 and readByte-1
byteBuffer.position(byteBuffer.position()-1); //error
readByte--; //error
}
The code above does not handle an incomplete UTF-8 character correctly.
If a character is 3 bytes long, either one or two of its bytes may be sitting at the end of the ByteBuffer.
If one byte of the incomplete character has arrived, the ByteBuffer's position should be moved back by 1. If two bytes have arrived, it should be moved back by 2.
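The rule above can be sketched as a small helper that looks at the raw bytes rather than at the decoded chars. This is only an illustration under the assumption that the received bytes are available as an array; the class and method names are hypothetical and do not exist in Openfire.

```java
// Hypothetical helper, not part of Openfire.
class Utf8Tail {
    // Returns how many bytes at the end of buf[0..len) belong to a UTF-8
    // character whose remaining bytes have not arrived yet (0 if none).
    // This is the distance the buffer position would have to be moved back.
    static int incompleteTailLength(byte[] buf, int len) {
        // The lead byte of an incomplete sequence is at most 3 bytes from the end.
        for (int back = 1; back <= 3 && back <= len; back++) {
            int b = buf[len - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte: keep looking for the lead byte
            }
            int expected;
            if (b < 0x80) expected = 1;                 // ASCII
            else if ((b & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return 0;                              // invalid lead byte: leave it to the decoder
            return back < expected ? back : 0;
        }
        // The last 3 bytes are all continuation bytes: either a complete
        // 4-byte character or malformed input, so there is nothing to wait for.
        return 0;
    }

    public static void main(String[] args) {
        byte[] oneOfThree = {0x61, (byte) 0xE4};
        byte[] twoOfThree = {0x61, (byte) 0xE4, (byte) 0xB8};
        System.out.println(incompleteTailLength(oneOfThree, oneOfThree.length));
        System.out.println(incompleteTailLength(twoOfThree, twoOfThree.length));
    }
}
```

The point of the sketch is that the rewind distance depends on the lead byte, so a fixed step of 1 cannot be right for every case.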
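So if the position is moved back by 1 while two bytes of the character have arrived, the first of them is consumed, only the character's last two bytes are left for the next decode, and they are replaced by two "FD" replacement characters (U+FFFD).
Conversely, if the position were moved back by 2 while only one byte has arrived, the next decode would see 4 bytes instead of the character's 3 and produce one extra "FD" followed by the character.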
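One way to avoid guessing the rewind distance at all is to let a stateful java.nio.charset.CharsetDecoder do the bookkeeping. The following is a sketch of that alternative, not the fix that was applied in Openfire; the class name and the decodeAvailable method are assumptions made for the example. When decode is called with endOfInput set to false, the bytes of a trailing incomplete character are left unconsumed in the input buffer, so the caller can compact the buffer and append the next chunk.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical class, not part of Openfire.
class IncrementalUtf8 {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Decodes as many complete characters as possible. Bytes of a trailing
    // incomplete character stay between position and limit of `in`.
    String decodeAvailable(ByteBuffer in) {
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        decoder.decode(in, out, false); // false: more input may follow
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u4e2d".getBytes(StandardCharsets.UTF_8); // 61 E4 B8 AD
        IncrementalUtf8 d = new IncrementalUtf8();
        ByteBuffer buf = ByteBuffer.allocate(16);

        buf.put(bytes, 0, 3);                   // 'a' plus 2 of the 3 bytes
        buf.flip();
        String first = d.decodeAvailable(buf);  // only 'a' is decoded
        buf.compact();                          // keep the 2 pending bytes
        buf.put(bytes, 3, 1);                   // the final byte arrives
        buf.flip();
        String second = d.decodeAvailable(buf); // now the full character
        System.out.println(first + second);
    }
}
```

In this sketch no position arithmetic is needed: after the first call the two pending bytes are still in the buffer, and the second call decodes the character once its last byte has arrived.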