XMPPDecoder has a decode problem for UTF-8

Description

A UTF-8 character (a char in Java) is usually encoded in 1 to 3 bytes (the original UTF-8 design allowed up to 6 bytes; the current standard, RFC 3629, allows at most 4); see http://en.wikipedia.org/wiki/UTF-8 .

From now on, I assume a character that takes 3 bytes.

Openfire uses MINA NIO to process the network stream, and implements an XMPPDecoder to decode bytes into Strings/stanzas.

When a ByteBuffer is decoded, it may end with an incomplete character. For example, the last few bytes of the ByteBuffer may hold one, two, or all three bytes of a character; if only 1 or 2 bytes are there, the character is incomplete. This incomplete state happens at random. If the input contains long runs of 3-byte characters, the probability increases significantly.
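The effect can be reproduced outside Openfire. The following is a minimal sketch, not Openfire code: the class name SplitCharDemo and the helper decodePrefix are made up for illustration. It decodes only the first bytes of a 3-byte character, as if the remaining bytes had not yet arrived from the network.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class SplitCharDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Decodes only the first `length` bytes, as if the rest were still in flight.
    // Charset.decode() always replaces malformed input, so a truncated
    // character comes back as U+FFFD instead of raising an error.
    static String decodePrefix(byte[] bytes, int length) {
        return UTF8.decode(ByteBuffer.wrap(bytes, 0, length)).toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "\u4e2d".getBytes(UTF8); // one character, 3 bytes: E4 B8 AD
        for (int n = 1; n <= bytes.length; n++) {
            String decoded = decodePrefix(bytes, n);
            boolean damaged = decoded.indexOf('\uFFFD') >= 0;
            System.out.println(n + " byte(s) available, replacement char present: " + damaged);
        }
    }
}
```

With 1 or 2 bytes available the decoded text contains the replacement character; only with all 3 bytes does the original character come back.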

Let's look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):

Charset encoder = Charset.forName(charset);
CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
char lastChar = buf[readByte-1];
if (lastChar >= 0xfff0) { // treats the last char as incomplete, then position-1, readByte-1
    byteBuffer.position(byteBuffer.position()-1); // error: always rewinds exactly 1 byte
    readByte--; // error
}

The above code does not properly handle an incomplete UTF-8 character.

If a character takes 3 bytes, only one or two of its bytes may be present at the end of the ByteBuffer.

If one byte of the incomplete character is present, the ByteBuffer's position should be moved back by 1. If two bytes are present, the position should be moved back by 2.
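How far to move back can be derived from the bytes themselves, because a UTF-8 lead byte announces the length of its sequence. The sketch below is only an illustration under that assumption; Utf8Tail and incompleteTailLength are hypothetical names, not part of Openfire or MINA.

```java
class Utf8Tail {
    // Returns how many bytes at the end of buf[0..len) belong to a UTF-8
    // sequence whose remaining bytes have not arrived yet (0 if none).
    static int incompleteTailLength(byte[] buf, int len) {
        // A sequence has at most 4 bytes, so an incomplete one shows at most 3.
        for (int back = 1; back <= 3 && back <= len; back++) {
            int b = buf[len - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte: keep scanning backwards for the lead byte
            }
            int expected;
            if ((b & 0x80) == 0x00) expected = 1;      // ASCII
            else if ((b & 0xE0) == 0xC0) expected = 2; // lead byte of a 2-byte sequence
            else if ((b & 0xF0) == 0xE0) expected = 3; // lead byte of a 3-byte sequence
            else if ((b & 0xF8) == 0xF0) expected = 4; // lead byte of a 4-byte sequence
            else return 0;                             // invalid lead byte: nothing to wait for
            return back < expected ? back : 0;
        }
        return 0; // no lead byte within the last 3 bytes: the tail is complete or invalid
    }
}
```

The return value is the number of bytes the position would have to be moved back by, instead of the fixed 1 used in the code above.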

So, if the position is moved back by 1 while two bytes of the incomplete character are present, the next decode sees only the last two bytes of the 3-byte character, and they are replaced with two "FD".

Or, if the position is moved back by 2 while only one byte is present, the next decode sees 4 bytes instead of 3, which produces one extra "FD" followed by the character itself.
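One way to avoid guessing the rewind distance is to let java.nio's CharsetDecoder do the bookkeeping: when decode() is called with endOfInput set to false, an incomplete trailing sequence is left unread in the input buffer. The following is a sketch of that idea, not the patch that was committed for this issue; the class StreamingUtf8Decoder and its pending field are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StreamingUtf8Decoder {
    // REPLACE keeps genuinely invalid bytes from stalling the stream.
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of an incomplete trailing character, carried over to the next chunk.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    String decode(byte[] chunk) {
        // Prepend the leftover bytes from the previous call to the new chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk);
        in.flip();

        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());

        // endOfInput = false: a truncated sequence at the end stays unread in `in`.
        decoder.decode(in, out, false);

        // Keep whatever was not consumed for the next call.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }
}
```

Calling decode() once per received chunk returns only complete characters; a character split across two chunks appears in the result of the second call, regardless of whether 1, 2, or 3 of its bytes arrived first.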

Environment

None

Attachments

1
  • 02 Jun 2011, 03:41 AM

Activity

Liyu Wang August 15, 2011 at 6:13 AM

Actually, the patch fixes the UTF-8 bug.
The exception in my previous test case is thrown by
the more() function in MXParser.java. If you comment out
that more() function, everything works fine.

Can anyone explain to me the purpose of checking the
range of the char like that in more()?

Liyu Wang August 10, 2011 at 5:03 AM

𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢

Liyu Wang August 10, 2011 at 4:53 AM

try if you can survive 𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢

Daryl Herzmann June 12, 2011 at 9:15 PM

r12472

wroot June 12, 2011 at 10:17 AM

As Java 7 is coming this July, maybe we should already drop Java 5 support? Looking at this poll, I see that the majority is voting for the drop: http://community.igniterealtime.org/polls/1025

Fixed

Created June 2, 2011 at 3:41 AM
Updated March 1, 2024 at 10:00 AM
Resolved June 12, 2011 at 9:15 PM
