Konqueror file sorting bug #252

Closed
opened 2 years ago by VinceR · 29 comments
VinceR commented 2 years ago
Collaborator

I have discovered what I consider to be a bug in Konqueror file sorting. Assume that the LANG or LC_ALL environment variables are correctly set to match the user's locale and that the overriding environment variable LC_COLLATE is unset. When sorting filenames with the "Case Insensitive Sort" option unchecked, the resulting sort order is not the locale-specific one; rather it is by Unicode (formerly ASCII) code point.

Using a contrived example with LANG="en_US.utf8" and single-character file names, consider these sort orders:

  1. ! 0 9 : @ A Z [ ` a z { ~   # Current order with "Case Insensitive Sort" unchecked
  2. ! : @ [ ` { ~ 0 9 a A z Z   # Expected order with "Case Insensitive Sort" unchecked
  3. ! : @ [ ` { ~ 0 9 A a Z z   # Current order when "Case Insensitive Sort" checked

Sort order # 1 does not honor locale collation. It would only be appropriate if LC_COLLATE="C". Interestingly, sort order # 3 does honor locale collation albeit with the case-folding side effect of changing "a<A" to "A<a".

Looking at the source code, this buggy behavior seems to start with code in libkonq/konqsettings.cpp at the end of KonqFMSettings::init(). In trying to address KDE bug report #40131 (https://bugs.kde.org/show_bug.cgi?id=40131), an attempt is made to determine whether or not TQString::localeAwareCompare() is case sensitive for the current locale. If it is not, TQString::compare() is used later in KonqFMSettings::caseSensitiveCompare(). The problems with this are 3-fold:

  1. The original bug report seems to be invalid. Apparently locale collation awareness was introduced with KDE (or Qt?) 3.0, and that confused users who expected "Case Insensitive Sort" to necessarily result in ASCII code sort order. It was not a bug but a new feature.
  2. The fix attempts to determine if TQString::localeAwareCompare() is truly case-sensitive using the invalid test TQString("a").localeAwareCompare("B") > 0. In the en_US locale, which indeed IS case-sensitive, this test fails because TQString("a").localeAwareCompare("B") < 0. I suspect (but cannot prove) that all locale collations are case-sensitive, but if there are some that are not, a better test would be TQString("a").localeAwareCompare("A") != 0.
  3. If TQString::localeAwareCompare() is judged not to be case-sensitive, it would be better to revert to a case-sensitive comparison instead of falling back to TQString::compare(), as that would produce a result closer to correct. To allow users to actually see a Unicode code point ordering, a new sort option (e.g. "Ignore Locale Collation") should be introduced that will always use TQString::compare() instead of TQString::localeAwareCompare().

I will create a PR to address this.
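
To illustrate point 2, here is a minimal sketch in plain C++ rather than TQt. `toyLocaleCompare` is a hypothetical ASCII-only stand-in for an en_US-style collation (a < A < b < B), and `existingCheck` / `proposedCheck` are illustrative names, not functions from the Konqueror source. It only shows why comparing "a" with "B" says nothing about case sensitivity, while comparing "a" with "A" does.

```cpp
#include <cctype>
#include <functional>
#include <string>

// Comparator type: negative, zero, or positive, like TQString::compare().
using Compare = std::function<int(const std::string&, const std::string&)>;

// Plain code point comparison (stand-in for TQString::compare()).
int codepointCompare(const std::string& a, const std::string& b) {
    return a.compare(b);
}

// Hypothetical toy collation for ASCII text that mimics the en_US ordering
// described above: letters are ordered alphabetically first, and lowercase
// sorts before uppercase only as a tie-breaker (a < A < b < B).
int toyLocaleCompare(const std::string& a, const std::string& b) {
    std::string fa, fb;
    for (unsigned char c : a) fa += static_cast<char>(std::tolower(c));
    for (unsigned char c : b) fb += static_cast<char>(std::tolower(c));
    if (int primary = fa.compare(fb)) return primary;
    // Tie-break: reversed code point order puts 'a' (97) before 'A' (65).
    return b.compare(a);
}

// The existing check: does "a" sort after "B"?
bool existingCheck(const Compare& cmp) { return cmp("a", "B") > 0; }

// The check suggested in this report: are "a" and "A" distinguished at all?
bool proposedCheck(const Compare& cmp) { return cmp("a", "A") != 0; }
```

With the toy collation, `existingCheck` reports "not case sensitive" even though "a" and "A" compare as different, whereas `proposedCheck` reports "case sensitive" for both comparators.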

VinceR commented 2 years ago
Poster
Collaborator

Created pull request #253 (https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/pulls/253)
Owner

@VinceR
thanks, I will take a look at some point next week. Have a few pending things to do and I need to queue this one up :-(

Owner

Hi @VinceR,
I had a look at this. I am no expert, but here is my opinion.

I think first of all we need to be clear on what we expect from case insensitive and case sensitive order.

Referring to your example above, in my opinion sort order # 2 is wrong. If "case insensitive sort" is unchecked, it means we are doing case sensitive sorting, so all the capital letters should be either before or after lower case letters, depending on the selected locale. So something like A Z a z or a z A Z, but not A a Z z or a A z Z.

Sort order # 3 is case insensitive, so A a Z z or a A z Z seems appropriate.

Sort order # 1 seems more correct compared to # 2. A question here could be where letters should sort relative to symbols (before, after, mixed) and for that I think there will always be some dependency on Unicode values.
Looking at # 3, I would say that a case sensitive sorting should give something like ! : @ [ ` { ~ 0 9 a z A Z or ! : @ [ ` { ~ 0 9 A Z a z. Here the positions of letters relative to other non-alpha characters are the same as in # 3 but the relative order of letters is affected by the case in use.

I am interested in understanding why you were expecting sort order # 2 for case sensitive sorting.

@SlavekB: if you can also share your thoughts on this point it would be good, just to hear a different opinion as well.

VinceR commented 2 years ago
Poster
Collaborator

> I am interested in understanding why you were expecting sort order # 2 for case sensitive sorting.

This is the actual sort order provided with the en_US.utf8 locale. This is the order returned by TQString::localeAwareCompare and strcoll functions. This ordering is not arbitrary but standardized by ICU. Here is what I like about it:

  1. It collates alphabetic characters together, including case variants and accented variants.
  2. It has a consistent general ordering: special/punctuation characters sort before numerals which sort before alphabetic characters.
  3. It has one more benefit that affects pull request #196 which may render it moot. I will describe that later in another message.

Contrast that with ASCII (now Unicode code point) ordering. In that, you have some special characters, followed by numerals, followed by some other special characters, followed by upper case (English) alphabetic characters, followed by some more special characters, followed by lower case alphabetic characters, followed by even more special characters, followed by a whole bunch of other characters. That's the order that is returned by the TQString::compare and strcmp functions. Since that is an ordering that some people might WANT to see, I am proposing creating a new option "Ignore Locale Collation" that will provide that view regardless of the user's locale.

What about "case-INsensitive" comparison? I believe it was developed to make what I call ASCII ordering a bit more friendly: grouping (English) alphabetic character case variants together. Personally, I find that feature to be moot, given that you get that and many more advantages by using the locale's default collation. However, since people still expect that option, I am proposing retaining the "Case Insensitive Sort" option.

With my proposed code modifications, there are 4 possible collations: "Ignore Locale Collation" (on/off) x "Case Insensitive Sort" (on/off). I offer an opinion (as comments in the code) as to the real usefulness of certain combinations.

I am currently running the modified code to make sure there are no hidden gotchas.
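
The four combinations can be sketched as follows. This is only an illustration in standard C++, not the code from the pull request: `SortOptions`, `asciiLower` and `compareNames` are hypothetical names, `std::strcoll` stands in for TQString::localeAwareCompare(), `std::strcmp` for TQString::compare(), and the lowercasing is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical settings mirroring the two checkboxes discussed above.
struct SortOptions {
    bool ignoreLocaleCollation;  // proposed new option
    bool caseInsensitiveSort;    // existing option
};

// ASCII-only lowercasing; a simplified stand-in for TQString::lower().
std::string asciiLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Compares two names according to the selected combination of options.
int compareNames(const std::string& a, const std::string& b, const SortOptions& opt) {
    const std::string lhs = opt.caseInsensitiveSort ? asciiLower(a) : a;
    const std::string rhs = opt.caseInsensitiveSort ? asciiLower(b) : b;
    if (opt.ignoreLocaleCollation) {
        return std::strcmp(lhs.c_str(), rhs.c_str());  // code point (byte) order
    }
    return std::strcoll(lhs.c_str(), rhs.c_str());     // order of the current LC_COLLATE
}
```

Note that `std::strcoll` only follows the user's locale after the program has called `std::setlocale`; until then it behaves like `std::strcmp`.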

VinceR commented 2 years ago
Poster
Collaborator

I've been digging in the weeds a bit too much but I thought I'd post this information as a reference. Below are examples of single character orderings for a variety of scenarios. The first 3 are identical. The last 2 language-specific examples differ in some minor but interesting ways (e.g. location of 'y' in the alphabet). They also share some strange idiosyncrasies (e.g. location of fraction characters in the sequence).

Of course, language-specific sorting is a bit more complicated in that characters immediately before or after a given character in a string may affect overall sorting order. Thankfully all of the programming to do that sorting / comparing has already been done.

I've attached the fun little bash script that helped generate these sequences.
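
For readers without the attachment, the same kind of sequence can be produced with a short C++ sketch. This is not the bash script mentioned above, just an assumed equivalent: `collationOrder` is a hypothetical helper that sorts single-character strings with `std::wcscoll` under a given LC_COLLATE, skipping the control characters U+007F..U+009F.

```cpp
#include <algorithm>
#include <clocale>
#include <cwchar>
#include <string>
#include <vector>

// Returns the characters U+0021..U+0200 (without the control characters
// U+007F..U+009F) as single-character strings, sorted by the collation rules
// of `localeName`. Returns an empty vector if that locale is not installed.
// Side effect: changes the process-wide LC_COLLATE setting.
std::vector<std::wstring> collationOrder(const char* localeName) {
    std::vector<std::wstring> chars;
    if (std::setlocale(LC_COLLATE, localeName) == nullptr) {
        return chars;  // locale unavailable on this system
    }
    for (wchar_t c = 0x21; c <= 0x200; ++c) {
        if (c >= 0x7F && c <= 0x9F) continue;  // skip control characters
        chars.push_back(std::wstring(1, c));
    }
    std::stable_sort(chars.begin(), chars.end(),
                     [](const std::wstring& a, const std::wstring& b) {
                         return std::wcscoll(a.c_str(), b.c_str()) < 0;
                     });
    return chars;
}
```

Calling `collationOrder("C")` should reproduce plain code point order; locale names such as "en_US.utf8" only work where that locale has been generated.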


Unicode codepoint order

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀ

LC_COLLATE='C'

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀ

LC_COLLATE='POSIX'

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰǱǲǳǴǵǶǷǸǹǺǻǼǽǾǿȀ

LC_COLLATE='en_US.utf8'

!"#%&'()*+,-./:;<=>?@[\]^_`{|}~ ¡¦§¨©«¬®¯°±´¶·¸»¿×÷¤¢$£¥01¹½¼2²3³¾456789aAªáÁàÀăĂâÂǎǍåÅǻǺäÄǟǞãÃǡǠąĄāĀȀæÆǽǼǣǢbBƀƁƃƂcCćĆĉĈčČċĊçÇƈƇdDďĎđĐðÐǳǲǱǆǅǄƉƊƌƋeEéÉèÈĕĔêÊěĚëËėĖęĘēĒǝƎƏƐfFƒƑgGǵǴğĞĝĜǧǦġĠģĢǥǤƓƔƣƢhHĥĤħĦƕǶiIíÍìÌĭĬîÎǐǏïÏĩĨİįĮīĪĳĲıƗƖjJĵĴǰkKǩǨķĶƙƘlLĺĹľĽļĻłŁŀĿǉǈǇƚƛmMnNńŃǹǸňŇñÑņŅǌǋǊƝƞŋŊoOºóÓòÒŏŎôÔǒǑöÖőŐõÕøØǿǾǫǪǭǬōŌơƠœŒƆƟpPƥƤqQĸrRŕŔřŘŗŖƦsSśŚŝŜšŠşŞſßƩƪtTťŤţŢƾŧŦƫƭƬƮuUúÚùÙŭŬûÛǔǓůŮüÜǘǗǜǛǚǙǖǕűŰũŨųŲūŪưƯƜƱvVƲwWŵŴxXyYýÝŷŶÿŸƴƳzZźŹžŽżŻƍƶƵƷǯǮƹƸƺþÞƿǷƻƨƧƽƼƅƄŉǀǁǂǃµ

LC_COLLATE='lt_LT.utf8'

!"#%&'()*+,-./:;<=>?@[\]^_`{|}~ ¡¦§¨©«¬®¯°±´¶·¸»¿×÷¤¢$£¥01¹½¼2²3³¾456789aAªáÁàÀăĂâÂǎǍåÅǻǺäÄǟǞãÃǡǠāĀȀæÆǽǼǣǢąĄbBƀƁƃƂcCćĆĉĈċĊçÇƈƇčČdDďĎđĐðÐǳǲǱǆǅǄƉƊƌƋeEéÉèÈĕĔêÊěĚëËēĒǝƎƏƐęĘėĖfFƒƑgGǵǴğĞĝĜǧǦġĠģĢǥǤƓƔƣƢhHĥĤħĦƕǶiIíÍìÌĭĬîÎǐǏïÏĩĨİīĪĳĲıƗƖįĮyYýÝŷŶÿŸjJĵĴǰkKǩǨķĶƙƘlLĺĹľĽļĻłŁŀĿǉǈǇƚƛmMnNńŃǹǸňŇñÑņŅǌǋǊƝƞŋŊoOºóÓòÒŏŎôÔǒǑöÖőŐõÕøØǿǾǫǪǭǬōŌơƠœŒƆƟpPƥƤqQĸrRŕŔřŘŗŖƦsSśŚŝŜşŞſßƩƪšŠtTťŤţŢƾŧŦƫƭƬƮuUúÚùÙŭŬûÛǔǓůŮüÜǘǗǜǛǚǙǖǕűŰũŨưƯƜƱųŲūŪvVƲwWŵŴxXƴƳzZźŹżŻƍƶƵžŽƷǯǮƹƸƺþÞƿǷƻƨƧƽƼƅƄŉǀǁǂǃµ
VinceR commented 2 years ago
Poster
Collaborator

Returning to the issue at hand, this is where I see the problem.

Code from libkonq/konqsettings.cpp (around line 122):

/// true if TQString::localeAwareCompare is case sensitive (it usually isn't, when LC_COLLATE is set)
d->localeAwareCompareIsCaseSensitive = TQString( "a" ).localeAwareCompare( "B" ) > 0; // see #40131

This is an absurd test that comes to the wrong conclusion, since it returns false for en_US.utf8 where "a" always sorts before "B". This has negative consequences downstream.

Code from libkonq/konqsettings.cpp (around line 172):

int KonqFMSettings::caseSensitiveCompare( const TQString& a, const TQString& b ) const
{
  if ( d->localeAwareCompareIsCaseSensitive ) {
    return a.localeAwareCompare( b );
  }
  else // can't use localeAwareCompare, have to fallback to normal TQString compare
    return a.compare( b );
}

Code from konqueror/listview/konq_listviewitems.cpp (around line 306)

if ( m_pListViewWidget->caseInsensitiveSort() )
  return text( col ).lower().localeAwareCompare( k->text( col ).lower() );
else {
  return m_pListViewWidget->m_pSettings->caseSensitiveCompare( text( col ), k->text( col ) );

As a result, there is no user-friendly locale-aware file sorting when "Case Insensitive Sort" is unchecked - just plain old character code order.

VinceR commented 2 years ago
Poster
Collaborator

> > I am interested in understanding why you were expecting sort order # 2 for case sensitive sorting.
>
> This is the actual sort order provided with the en_US.utf8 locale. This is the order returned by TQString::localeAwareCompare and strcoll functions. This ordering is not arbitrary but standardized by ICU. Here is what I like about it:
>
>   1. It collates alphabetic characters together, including case variants and accented variants.
>   2. It has a consistent general ordering: special/punctuation characters sort before numerals which sort before alphabetic characters.
>   3. It has one more benefit that affects pull request #196 which may render it moot. I will describe that later in another message.

What I was referring to in item # 3 of the above "what I like" list is described in my comment in PR #196 (https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/pulls/196#issuecomment-17680).

Sorry about spamming the list with all of these messages at one time, but I wanted to get this out there before my attention is drawn elsewhere.

Owner

Hi Vince,
thanks for the detailed description of the various sorting methods. As a recap for myself, here are all possible valid ways to sort items:

  1. not locale aware: items are sorted based on ASCII/Unicode values. Does case-awareness really make sense here, since the order depends on the character value?

  2. locale aware
    case sensitive or insensitive sorting as per your first comment in this issue.

I think the idea to have both locale aware and locale unaware is good, as you mentioned. I would though reverse the default choices:

  1. default order is the current locale unaware order (people are used to it).
  2. option "use locale awareness sorting" will trigger locale aware comparison
  3. option "use case sensitive sorting" (to be enabled in case of locale aware comparison) will enable the respective order based on char case.

I find "negative options" more difficult to work with: you need to think about what they do. For example, with "case insensitive option unchecked" you need to think to get that it does case sensitive sorting.

If my understanding is correct, should we first fix this and then go back to #196?

@SlavekB: what do you think about the various sorting method proposed by Vince?

Owner

Thank you for the good explanation of reasons for locale awareness × locale unaware. Obviously, both options are valid and it is good to give the user choice. For example, I am always dissatisfied when files starting with Č or Š are sorted without locale awareness after Z instead of after C or S.

If I understand correctly, case sensitive and case insensitive are valid for both variants – locale awareness as well as locale unaware.

Owner

> MicheleC: Does case-awareness really make sense here, since the order depends on the character value?

> SlavekB: If I understand correctly, case sensitive and case insensitive are valid for both variants – locale awareness as well as locale unaware.

Looks like @VinceR needs to give us one more round of explanations :-)

VinceR commented 2 years ago
Poster
Collaborator

> > MicheleC: Does case-awareness really make sense here, since the order depends on the character value?
>
> > SlavekB: If I understand correctly, case sensitive and case insensitive are valid for both variants – locale awareness as well as locale unaware.
>
> Looks like @VinceR needs to give us one more round of explanations :-)

Indeed he does!


> I find "negative options" more difficult to work with: you need to think about what they do. For example, with "case insensitive option unchecked" you need to think to get that it does case sensitive sorting.

I must confess that this has driven me nuts too and undoubtedly I must have used the wrong terminology somewhere along the line. The option "Case Insensitive Sorting" is one we have inherited from the KDE code. Changing it (along with associated variables and logic) would need to be done carefully. For purposes of our discussion, let's temporarily use the new term "Case Merging" to describe the operation where all uppercase characters are converted to their lower case equivalents before conducting a comparison of 2 names. Also, we can use Locale Aware Sorting to describe a choice of sorting comparison algorithm that many years ago did not exist.

Combination 1a: [-] Locale Aware Sorting [-] Case Merging

Sort by Unicode code point value and accept that lower and uppercase English (ASCII code < 127) letters will not be adjacent. Resulting order for a sample of single character file names:
= A B ^ a b ~ Ā ā
This is a legacy combination that should be offered.

Combination 1b: [-] Locale Aware Sorting [X] Case Merging

Sort by Unicode code point value after using the lower() function to ensure that lower and uppercase English (ASCII code < 127) letters are adjacent. Resulting order:
= ^ a A b B ~ Ā ā
This is what used to be done to make ASCII sorting more friendly. This is a legacy combination that should be retained since people will still expect it. But note the discrepancy wherein 'a' < 'A' while 'Ā' < 'ā'. Maybe the implementation could be improved to eliminate the discrepancy. Would using upper() instead of lower() make a difference? That's an exercise left for the student ... maybe this student :)

Combination 2a: [X] Locale Aware Sorting [-] Case Merging

Sort using your locale's specific collation. Resulting order for 'en_US':
= ^ ~ a A ā Ā b B
That brings the extended latin-1 alphabet together and puts the ASCII special characters before the alphabetics. This collation is based on data in file /usr/share/i18n/locales/en_US which inherits /usr/share/i18n/locales/iso14651_t1_common. As you can see, I've been spending WAY too much time on this! You may be interested in taking a look at Unicode Collation Algorithm. Then again, you may not :)

Combination 2b: [X] Locale Aware Sorting [X] Case Merging

Like combination 2a but first using lower(). Resulting order:
= ^ ~ a A Ā ā b B
Well that certainly had an effect but not a very pleasing one. It has the same discrepancy as that noted with combination 1b.
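
As a rough illustration of combinations 1a and 1b, here is a standard C++ sketch restricted to ASCII names. `sortByCodepoint` is a hypothetical helper, not Konqueror code, and its lowercasing does not handle characters such as 'Ā'.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Sorts names by code point. With caseMerging enabled, ASCII letters are
// lowercased before comparing (combination 1b); otherwise the names are
// compared as they are (combination 1a).
std::vector<std::string> sortByCodepoint(std::vector<std::string> names, bool caseMerging) {
    auto key = [caseMerging](const std::string& s) -> std::string {
        if (!caseMerging) return s;
        std::string folded = s;
        for (char& c : folded) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return folded;
    };
    // stable_sort keeps the input order of names whose keys compare equal.
    std::stable_sort(names.begin(), names.end(),
                     [&key](const std::string& a, const std::string& b) {
                         return key(a) < key(b);
                     });
    return names;
}
```

Because "a" and "A" have the same key under case merging, their relative order in this sketch depends only on the input order, so it may differ from the 'a' before 'A' order shown for combination 1b.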


> As a recap for myself, here are all possible valid ways to sort items:
>
>   1. not locale aware: items are sorted based on ASCII/Unicode values. Does case-awareness really make sense here, since the order depends on the character value?
>   2. locale aware. case sensitive or insensitive sorting as per your first comment in this issue.

So here's my opinion: The whole point of case merging was to make ASCII sort order more like a natural alphabetical order. You already get that with locale-aware sorting. Therefore I propose

  • Forget Combination 2b - it will either appear to do nothing or worse, illustrate the discrepancy of how lower-upper case variants are sorted.
  • I personally would have no problem with also ignoring Combination 1b, thereby doing away with the notion of case sensitivity altogether. I do worry how users would respond to that as the option has been around forever.
  • Combinations 1a & 2a should definitely be offered as options.

If we want to present 3 options (instead of 2 or 4), is there a way to implement such a selection in a menu item or would a separate dialog box be required?

Worth noting - KDE's konqueror presents 3 sorting options named as follows:

  • "Alphabetical, case sensitive" seems to equate to combination 1a.
  • "Alphabetical, case insensitive" is similar to and a bit better than combination 1b in that 'A' < 'a' just as 'Ā' < 'ā'.
  • "Natural" corresponds most closely to combination 2a but it's kind of different: ^ = ~ A a Ā ā B b. Looks like they were trying to "add value" to normal locale-aware sorting. Or maybe the Qt 5 implementation of QString::localeAwareCompare() did that.

To re-state the reported problem using above scenarios:
With LC_COLLATE unset and LANG="en_US.utf8":

  • Option "Case Insensitive Sort" unchecked results in Combination 1a but I think it should have resulted in Combination 2a. I got ASCII instead of en_US locale sorting.
  • Option "Case Insensitive Sort" checked results in Combination 2b. I got en_US locale sorting but with the rather pointless case merging variation.

Please note that the 3rd example sort sequence in my original message was screwed up for reasons unknown.

Collaborator

Hi @VinceR ,
this is excellent reading.

I wonder why it is discussed only in the context of file sorting. I read that most sorting can be done in the "TQt Template Library" for various types of lists.
It would be great if there were one standard method of sorting for all applications, or if it were at least configurable.

But whatever you do, please avoid the kind of sorting I see now in FF, when I am attaching a file to webmail ... the sorting is fine, only it does not take into account the type, so I have directories and files mixed up. I prefer the konqueror way, where it lists directories first and then files.

VinceR commented 2 years ago
Poster
Collaborator

> Hi @VinceR ,
> this is excellent reading.

I am hoping that MicheleC and SlavekB are finding it so :)

> I wonder why it is discussed only in the context of file sorting. I read that most sorting can be done in the "TQt Template Library" for various types of lists.
> It would be great if there were one standard method of sorting for all applications, or if it were at least configurable.

I think (but don't know) that TDE uses a standard sorting algorithm for all sorting that originates in the TQt template library (see tdelibs/tdecore/ksortablevaluelist.h). It is worth noting that current QT has done away with a dedicated "QT Template Library", preferring to rely on the allegedly more efficient sort algorithms from the standard C++ template library.

However every sort algorithm needs a comparison function (implicit or explicit) and that is what is under consideration here. That function could differ between use cases depending on objective. My current concern is the comparison function used in konqueror sorting, especially as it pertains to file names.
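To make the split between sort algorithm and comparison function concrete, here is a minimal sketch in standard C++. `NameCompare`, `sortNames` and `byteCompare` are hypothetical names, and `std::sort` merely stands in for whatever algorithm the TQt template library uses.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Three-way comparison: negative, zero or positive.
using NameCompare = std::function<int(const std::string&, const std::string&)>;

// The algorithm (std::sort) stays the same; only `cmp` changes.
void sortNames(std::vector<std::string>& names, const NameCompare& cmp)
{
    std::sort(names.begin(), names.end(),
              [&cmp](const std::string& a, const std::string& b) {
                  return cmp(a, b) < 0;  // adapt three-way result to "less than"
              });
}

// One possible comparison function: raw byte order.
int byteCompare(const std::string& a, const std::string& b)
{
    return a.compare(b);
}
```

Swapping `byteCompare` for a locale-aware or case-merging function changes the resulting order without touching the sorting code, which is why the discussion here concentrates on the comparison function alone.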

> But whatever you do, please avoid the kind of sorting I see now in FF, when I am attaching a file to webmail ... the sorting is fine, only it does not take into account the type, so I have directories and files mixed up. I prefer the konqueror way, where it lists directories first and then files.

I will offer you some good news and bad news. The bad news is that it is now possible (in 14.1 per recent commit) to reproduce this kind of sorting behavior for konqueror listview. The good news is that it is completely configurable via the new option "Group Directories First".

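As an illustration of what an option like "Group Directories First" means for the comparison step, here is a hedged sketch in standard C++. `Entry` and `entryLess` are hypothetical and do not reflect Konqueror's actual item classes or the code in the commit mentioned above.

```cpp
#include <string>

// Hypothetical list entry; Konqueror's real item classes differ.
struct Entry {
    std::string name;
    bool isDir;
};

// "Less than" predicate for a sort. With groupDirsFirst set, every
// directory precedes every file; otherwise only the names decide.
bool entryLess(const Entry& a, const Entry& b, bool groupDirsFirst)
{
    if (groupDirsFirst && a.isDir != b.isDir)
        return a.isDir;
    return a.name < b.name;  // placeholder name order (byte-wise)
}
```

With the option off, directories and files interleave purely by name, which is the mixed listing described above. With it on, the type check runs first and the name comparison only breaks ties within each group.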
VinceR commented 2 years ago
Poster
Collaborator

So it didn't take too long to notice that I got messed up on negative options: the resulting orders for combinations 2a & 2b were mistakenly reversed.


> Combination 2a: [X] Locale Aware Sorting [-] Case Merging
>
> Sort using your locale's specific collation. Resulting order for 'en_US':
> = ^ ~ a A ā Ā b B

Corrected Combination 2a: [X] Locale Aware Sorting [-] Case Merging

Sort using your locale's specific collation. Resulting order for 'en_US':
= ^ ~ a A Ā ā b B


> Combination 2b: [X] Locale Aware Sorting [X] Case Merging
>
> Like combination 2a but first using lower(). Resulting order:
> = ^ ~ a A Ā ā b B

Corrected Combination 2b: [X] Locale Aware Sorting [X] Case Merging

Like combination 2a but first using lower(). Resulting order:
= ^ ~ a A ā Ā b B


Here is my updated opinion: Combination 2b does seem to add value to Combination 2a in that all lower case characters sort next to and just before their uppercase versions. Let's keep all 4 sort combinations even though some seem less useful than others. That is what I have done in the associated pull request #253 (https://mirror.git.trinitydesktop.org/gitea/TDE/tdebase/pulls/253), albeit with those negative options.
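How two independent checkboxes yield the four combinations can be sketched as follows. This is not the code from the pull request: `SortOptions` and `compareNames` are hypothetical names, the standard C++ facets stand in for the TQString functions, and the lowering step handles single-byte characters only.

```cpp
#include <locale>
#include <string>

// Hypothetical option pair, named after the working terms of this thread.
struct SortOptions {
    bool localeAware;
    bool caseMerging;
};

// Compare two names under one of the four combinations (1a, 1b, 2a, 2b).
// Returns a negative, zero or positive value.
int compareNames(std::string a, std::string b,
                 const SortOptions& opt, const std::locale& loc)
{
    if (opt.caseMerging) {
        // Lower each char with the locale's ctype facet (single-byte only).
        const auto& ct = std::use_facet<std::ctype<char>>(loc);
        if (!a.empty()) ct.tolower(&a[0], &a[0] + a.size());
        if (!b.empty()) ct.tolower(&b[0], &b[0] + b.size());
    }
    if (opt.localeAware) {
        const auto& coll = std::use_facet<std::collate<char>>(loc);
        return coll.compare(a.data(), a.data() + a.size(),
                            b.data(), b.data() + b.size());
    }
    return a.compare(b);  // code unit order
}
```

The two flags are applied in sequence, so each can be toggled without affecting how the other is handled. The actual orders for the locale-aware combinations depend on the installed locale data and are not reproduced by this sketch.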

Next steps as I see it:

  1. Agree that the issue reported indeed represents a bug.
  2. Decide terminology for current and new sort options, implement them in the user interface and, if we are feeling up to it, in the variable names.
  3. Decide what the defaults should be. The most user-friendly defaults would be Combination 2b. But if we want to reproduce the current defaults + buggy behavior, the defaults should be Combination 1a.
  4. Decide if we want to present sort option configuration as a separate dialog box. There may be a case for doing this since we would have options for "Group Hidden First", "Group Directories First", "Locale Aware Sorting", "Case Merging" and who knows what in the future.

Since I already started a PR for this, I will continue working on it. I might need some help with step 4 as I really have no experience with user interface programming.

Owner

Hi @VinceR,
it is indeed very interesting reading, and I am definitely learning something from it. A bit overwhelming given the amount of information that we have to digest, but definitely a good one.

Keeping all 4 options seems appropriate: it offers the user the most choice and it still requires only two menu option entries.
I also like the steps you listed in your previous comment, so we should proceed with that. Point 4 can perhaps be left as the last point, and we first focus on getting the sorting right and in an efficient way.

Re terminology: "locale aware sorting" is fine with me. "case insensitive sorting" and "case merging" are not very clear. I would rather use something like "group case sorting" or something that implies that lower, upper and special case are grouped or combined together. What do you think of that?

Re default: I think switching to sorting 2b (as per your last comment) seems reasonable, since users can always switch to other sorting orders if needed.

Re tdebase#196: I take it that it will no longer be necessary if we implement this.

@SlavekB: please add your own comments as well before we proceed.

VinceR commented 2 years ago
Poster
Collaborator

> Hi @VinceR,
> it is indeed very interesting reading, and I am definitely learning something from it. A bit overwhelming given the amount of information that we have to digest, but definitely a good one.
>
> Keeping all 4 options seems appropriate: it offers the user the most choice and it still requires only two menu option entries.
> I also like the steps you listed in your previous comment, so we should proceed with that. Point 4 can perhaps be left as the last point, and we first focus on getting the sorting right and in an efficient way.
>
> Re terminology: "locale aware sorting" is fine with me. "case insensitive sorting" and "case merging" are not very clear. I would rather use something like "group case sorting" or something that implies that lower, upper and special case are grouped or combined together. What do you think of that?

I have to admit that I am at a loss to come up with a proper name for the case-munging option. Wait ... how about "Alphabetic Case Munging"?

> Re default: I think switching to sorting 2b (as per your last comment) seems reasonable, since users can always switch to other sorting orders if needed.

The only thing is that users for the most part will probably not even notice a difference between 2a & 2b and may wonder about that. Our example with single-character filenames revealed the difference only because we contrived it to do so.

> Re tdebase#196: I take it that it will no longer be necessary if we implement this.

That is my belief. Assuming that LC_COLLATE is unset and LANG is set to something other than C or POSIX, you can already see the effect with the current "unfixed" konqueror by selecting "Case Insensitive Sorting". Look at your home directory: dotfiles and un-dotfiles are right next to each other. I even created a nonsensically named file "====lesshist" and it sorted just before .lesshist.

> @SlavekB: please add your own comments as well before we proceed.

VinceR commented 2 years ago
Poster
Collaborator

> I have to admit that I am at a loss to come up with a proper name for the case-munging option. Wait ... how about "Alphabetic Case Munging"?

Or maybe not. Unfortunately the term munging seems to have been given other meanings by un-well people :(

Owner

"Munging" is definitely cryptic, I guess most users may not even know the word.

> The only thing is that users for the most part will probably not even notice a difference between 2a & 2b and may wonder about that. Our example with single-character filenames revealed the difference only because we contrived it to do so.

That's a good point. 3 options are probably more sensible than 4, but will require more coding. If you are happy to do that, I am ok with 3 options too.

Owner

@VinceR
what is the next step on this issue? Do you need to do more software changes or are we supposed to review PR #253?

VinceR commented 2 years ago
Poster
Collaborator

MicheleC,

Sorry for going dark for so long but I have been climbing a rather steep learning curve to get to next steps. I've made what I think is really good progress but I need to finish up and test my new stuff (live, on the system I actually work on).

Thanks for your patience,
Vince

Owner

No worries @VinceR, I didn't mean to rush you. I just wanted to make sure I wasn't holding you up. Take your time and we will continue the discussion when you are ready (I am also under very heavy load for the next couple of weeks)

VinceR commented 2 years ago
Poster
Collaborator

Whew, that took a lot longer than I wanted. I just pushed new code, see PR #253. I am currently testing / using the new code on my main computer.

Owner

Hi @VinceR,
apologies for the delayed reply. I am going through a very busy period at work. I will have a look at this during the coming weekend and give feedback.
Thanks for the good work so far.

Owner

> Hi @VinceR,
> apologies for the delayed reply. I am going through a very busy period at work. I will have a look at this during the coming weekend and give feedback.
> Thanks for the good work so far.

Nope, this weekend I didn't make it. Will go through it during the week though.

VinceR commented 2 years ago
Poster
Collaborator

MicheleC,

Not a problem, I sure know what it's like to get swamped. There's no urgency here:

  • From the perspective of the bug in question: people have been living with it since 2004, probably assuming that that's just the way things are supposed to work.
  • From the perspective of the new UI "features" that are introduced in the PR: well those are new things that can wait indefinitely.

I am happy to report that I have been actively testing / using the PR code for several weeks and have not experienced any problems.

MicheleC added this to the R14.1.0 release milestone 2 years ago
Owner

PR #253 has been merged, so we can close this issue.

MicheleC closed this issue 2 years ago
Owner

> I'll keep Issue 252 open and add some notes about what remaining things need to be done in order to fully resolve it.

Reopened as per comment from @VinceR in #253.

@VinceR, please add any note that still pertains to this issue. I thought it was resolved by #253 but I guess I missed something :-)

MicheleC reopened this issue 2 years ago
VinceR commented 2 years ago
Poster
Collaborator

While PR # 253 did resolve this issue for Konqueror listview, it did not address it for iconview. Before we close this issue, I wanted to add my notes regarding what needs to be accomplished for iconview.


Konqueror iconview will need to be updated to gain access to the features that were introduced for listview in PR # 253. After that, the following iconview code changes will need to be made:

Replace the code in libkonq/tdefileivi.cpp with a call to the newly introduced function stringCompare():

KFileIVI::compare()
     ...
     if ( view->caseInsensitiveSort() )
          return key().localeAwareCompare( i->key() );
     else
          return view->m_pSettings->caseSensitiveCompare( key(), i->key() );

Delete remaining obsolete code in libkonq/konq_settings.cpp:

struct KonqFMSettingsPrivate
{
    ...
    bool localeAwareCompareIsCaseSensitive;
    ...
}
void KonqFMSettings::init( TDEConfig * config )
{
    ...
    /// true if TQString::localeAwareCompare is case sensitive (it usually isn't, when LC_COLLATE is set)
    d->localeAwareCompareIsCaseSensitive = TQString( "a" ).localeAwareCompare( "B" ) > 0; // see #40131
}
int KonqFMSettings::caseSensitiveCompare( const TQString& a, const TQString& b ) const
{
    if ( d->localeAwareCompareIsCaseSensitive ) {
        return a.localeAwareCompare( b );
    }
    else // can't use localeAwareCompare, have to fallback to normal TQString compare
        return a.compare( b );
}
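Why the probe in KonqFMSettings::init() does not measure case sensitivity can be shown with a small standard C++ sketch. The function names are hypothetical and `std::collate` stands in for TQString::localeAwareCompare(). `suggestedProbe` mirrors the alternative test proposed in the opening comment of this issue.

```cpp
#include <locale>
#include <string>

// Three-way collation compare for `loc`.
int collateCompare(const std::string& a, const std::string& b,
                   const std::locale& loc)
{
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// Probe used by the listing above: true when "a" sorts after "B".
bool legacyProbe(const std::locale& loc)
{
    return collateCompare("a", "B", loc) > 0;
}

// Probe suggested in the opening comment: true when "a" and "A" differ.
bool suggestedProbe(const std::locale& loc)
{
    return collateCompare("a", "A", loc) != 0;
}
```

In the "C" locale the legacy probe is true simply because 'B' (66) precedes 'a' (97) in code point order. Per the original report, the same expression is negative under en_US even though that collation does distinguish case, so the legacy probe really detects code point ordering and not case sensitivity.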

We can defer the iconview changes to a future issue & PR, so I will go ahead and close this issue.

VinceR closed this issue 2 years ago
Owner

Ok, thanks for the further feedback and again for the good work! :-)

Reference: TDE/tdebase#252