-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[MNG-8241] Handle non-BMP characters when comparing versions #2071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
bd99835
d62e852
d1a091c
d35cfac
2b726bf
dc12b85
2a1888b
bd44a92
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -222,6 +222,33 @@ void testLeadingZeroes() { | |
checkVersionsOrder("0.2", "1.0.7"); | ||
} | ||
|
||
@Test | ||
void testDigitGreaterThanNonAscii() { | ||
ComparableVersion c1 = new ComparableVersion("1"); | ||
ComparableVersion c2 = new ComparableVersion("é"); | ||
assertTrue(c1.compareTo(c2) > 0, "expected " + "1" + " > " + "\uD835\uDFE4"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The messages is wrong There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
assertTrue(c2.compareTo(c1) < 0, "expected " + "\uD835\uDFE4" + " < " + "1"); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and that one too There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed |
||
} | ||
|
||
@Test | ||
void testDigitGreaterThanNonBmpCharacters() { | ||
ComparableVersion c1 = new ComparableVersion("1"); | ||
// MATHEMATICAL SANS-SERIF DIGIT TWO | ||
ComparableVersion c2 = new ComparableVersion("\uD835\uDFE4"); | ||
assertTrue(c1.compareTo(c2) > 0, "expected " + "1" + " > " + "\uD835\uDFE4"); | ||
assertTrue(c2.compareTo(c1) < 0, "expected " + "\uD835\uDFE4" + " < " + "1"); | ||
} | ||
|
||
@Test | ||
void testNonAsciiDigits() { | ||
// These should be treated the same as non-digit characters | ||
ComparableVersion c1 = new ComparableVersion("1"); | ||
// ArabicIndicNine | ||
ComparableVersion c2 = new ComparableVersion("\u0669"); | ||
assertTrue(c1.compareTo(c2) > 0, "expected " + "1" + " > " + "\u0669"); | ||
assertTrue(c2.compareTo(c1) < 0, "expected " + "\u0669" + " < " + "1"); | ||
} | ||
|
||
@Test | ||
void testGetCanonical() { | ||
// MNG-7700 | ||
|
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why not. It seems to me that later, the string will be parsed using
Integer.parseInt
,Long.parseLong
, ornew BigInteger()
, and AFAIK, they all leverage theCharacter.digit(char, int)
which should support everything that is considered a digit byCharacter.isDigit(char)
(but notCharacter.isDigit(int)
).I think this has the same effect as we're only considering radix-10 when parsing.
An alternative to support supplemental characters would be to normalise those in a buffer while they are read in this loop and pass on the buffer rather than a substring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like the following maybe ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, just to be clear you prefer the alternate approach of treating all Unicode digits as version numbers? As long as they're all base 10 this should work. Looking at Unicode details, there are some non-base 10 digits. They have a different Unicode character class, so I need to check what Character.isDigit is doing; i..e whether it's checking for decimal digits or all digits. If it is checking for all digits, I need to check the character class directly. To be specific we want De, possibly Di, but not Nu character classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, looks like Character.isDigit is checking specifically for decimal digits, not non-decimal digits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran into a problem with that approach. It treats different versions of digits as the same. E.g. 7 is the same as Arabic-Indic digit 7 even though they are different characters.
I can see an argument for normalizing all these digits to ASCII digits for Maven purposes, but that would touch a lot of different places in the code and ecosystem including things like file names and artifact URLs in Maven Central.
I'm going to think about this a little more, but for the moment it feels safer to treat all the non-ASCII digits as distinct non-numeric characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only supporting ASCII digits sounds fine to me too.