Why does sort say that ɛ = e?











ɛ ("Latin epsilon") is a letter used in certain African languages, usually to represent the vowel sound in English "bed". In Unicode it's encoded as U+025B, very distinct from everyday e.



However, if I sort the following:



eb
ed
ɛa
ɛc


It seems that sort considers ɛ and e equivalent:



ɛa
eb
ɛc
ed


What's going on here? And is there a way to make ɛ and e distinct for sorting purposes?










sort locale unicode






edited Oct 26 at 17:35 by jimmij
asked Oct 26 at 16:32 by Draconis

  • Sorting rules are called "collation", if that helps your googling.
    – BlueRaja - Danny Pflughoeft
    Oct 26 at 21:28

  • Try putting a number of ea lines mixed with ɛa lines in a text file and sorting it. You will see that it always sorts ea before ɛa. So no, they are not considered equal.
    – Bakuriu
    Oct 27 at 9:21

  • Might be an obvious point, but I haven't seen it suggested explicitly yet: if you are sorting words in $(certain_african_language), the natural thing to do is to set the locale to $(certain_african_language).
    – Federico Poloni
    Oct 28 at 9:13

  • @FedericoPoloni A very good point! Unfortunately I haven't been able to find any locale made for this language.
    – Draconis
    Oct 28 at 15:30

  • @GermánBouzas This is specifically "Latin epsilon", a form designed to fit in with the Latin alphabet. They look pretty much the same, but Latin epsilon is U+025B, while Greek epsilon is U+03B5.
    – Draconis
    Nov 1 at 13:22
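To confirm that e, Latin ɛ and Greek ε really are different characters, whatever a locale's collation rules say, you can dump their UTF-8 bytes. A minimal sketch, assuming the script is saved as UTF-8 and the POSIX od utility is available; the helper name utf8_bytes is made up for illustration:

```sh
#!/bin/sh
# Print the UTF-8 bytes of a few look-alike characters.
# Assumes this script itself is saved in UTF-8.
utf8_bytes() {
  # od -An drops the offset column, -tx1 prints each byte in hex;
  # tr removes od's padding so the result looks like "c99b".
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

for c in e ɛ ε; do
  printf '%s: %s\n' "$c" "$(utf8_bytes "$c")"
done
# e: 65
# ɛ: c99b
# ε: ceb5
```

The byte sequences differ, so any "equality" seen in sort's output comes from the collation rules, not from the data.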














3 Answers
Accepted answer:










No, it doesn't consider them equivalent; they just have the same primary weight, so to a first approximation they sort the same.



If you look at /usr/share/i18n/locales/iso14651_t1_common (used as the basis for most locales) on a GNU system (here with glibc 2.27), you'll see:



<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E


e, ɛ and E have the same primary weight; e and E also have the same secondary weight, and only the third weight differentiates them.



When comparing strings, sort (or rather the standard libc strcoll() function it uses to compare strings) starts by comparing the primary weights of all characters, and only moves on to the secondary weights if the strings are equal with respect to their primary weights (and so on with the other weights).



That's why case seems to be ignored in the sorting order to a first approximation: Ab sorts between aa and ac, but Ab can sort before or after ab depending on the language rules (some languages, like British English, have <MIN> before <CAP>; others, like Estonian, have <CAP> before <MIN>).



If e had the same sorting order as ɛ, printf '%s\n' e ɛ | sort -u would return only one line. But as <BAS> sorts before <PCL>, e alone sorts before ɛ. Likewise, eɛe sorts after EEE (decided at the secondary weight) even though EEE sorts after eee (for which we need to go up to the third weight).
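To make the level-by-level comparison concrete, here is a toy sketch. It is not how glibc implements strcoll(): the weights are a hypothetical simplification of the three iso14651_t1_common lines quoted above, and only ASCII letters plus ɛ are handled. Each word is turned into a primary, a secondary and a tertiary key, and a plain byte-order sort compares those keys one level at a time (the function names are made up for illustration):

```sh
#!/bin/sh
# Toy three-level collation: an illustration, NOT glibc's strcoll().
# Hypothetical weights, loosely modelled on iso14651_t1_common:
#   primary:   e, E and ɛ all become "e"; other letters their lowercase
#   secondary: ASCII letter = 1 (<BAS>), ɛ = 2 (<PCL>)
#   tertiary:  lowercase and ɛ = 1 (<MIN>), uppercase = 2 (<CAP>)
# Only ASCII letters and ɛ are handled; words must not contain "|".
export LC_ALL=C   # sed, tr and sort below work on raw bytes

# Turn each input word into "primary|secondary|tertiary|word".
collation_keys() {
  while IFS= read -r word; do
    p=$(printf '%s\n' "$word" | sed 's/ɛ/e/g' | tr 'A-Z' 'a-z')
    s=$(printf '%s\n' "$word" | sed 's/ɛ/2/g; s/[A-Za-z]/1/g')
    t=$(printf '%s\n' "$word" | sed 's/ɛ/1/g; s/[a-z]/1/g; s/[A-Z]/2/g')
    printf '%s|%s|%s|%s\n' "$p" "$s" "$t" "$word"
  done
}

# Compare primary keys first; on a tie use secondary, then tertiary.
toy_sort() {
  collation_keys | sort -t'|' -k1,1 -k2,2 -k3,3 | cut -d'|' -f4
}

printf '%s\n' eb ed ɛa ɛc | toy_sort       # ɛa eb ɛc ed (one per line)
printf '%s\n' eɛe EEE eee e ɛ | toy_sort   # e ɛ eee EEE eɛe (one per line)
```

The first call reproduces the ordering from the question. In the second, e comes before ɛ and eee before EEE, while eɛe lands after EEE because the secondary level is consulted before the tertiary one.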



Now, if I run the following on my system with glibc 2.27:



sed -n 's/\(.*;[^[:blank:]]*\).*/\1/p' /usr/share/i18n/locales/iso14651_t1_common |
sort -k2 | uniq -Df1


You'll notice that there are quite a few characters that have been defined with the exact same 4 weights. In particular, our ɛ has the same weights as:



<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE
<U025B> <e>;<PCL>;<MIN>;IGNORE
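To see what the sed | sort | uniq pipeline does without the full file, here is a sketch that runs the same commands on a small sample assembled from the entries quoted in this answer (an excerpt, not the real file). LC_ALL=C is set only to make the output order reproducible, and uniq -D is a GNU extension; dup_weights is a made-up name:

```sh
#!/bin/sh
export LC_ALL=C   # reproducible byte-order sorting

# Excerpt of iso14651_t1_common entries quoted in this answer (not the full file)
sample='<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E
<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE'

dup_weights() {
  # sed keeps everything up to the last ";field", dropping the "# ..." comment;
  # sort -k2 groups lines by their weights (field 2 onwards);
  # uniq -Df1 skips the code point field and prints every line whose
  # weights are shared with another line.
  sed -n 's/\(.*;[^[:blank:]]*\).*/\1/p' | sort -k2 | uniq -Df1
}

printf '%s\n' "$sample" | dup_weights
# <U01DD> <e>;<PCL>;<MIN>;IGNORE
# <U0259> <e>;<PCL>;<MIN>;IGNORE
# <U025B> <e>;<PCL>;<MIN>;IGNORE
```

e and E are not reported because their weights differ from each other and from the <PCL> group.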


And sure enough:



$ printf '%s\n' $'\u01DD' $'\u0259' $'\u025B' | sort -u
ǝ
$ expr ɛ = ǝ
1


That can be seen as a bug in GNU libc locales. On most other systems, locales make sure that all distinct characters end up with distinct sorting orders. With GNU locales it gets even worse: thousands of characters have no defined sorting order and end up sorting the same, which causes all sorts of problems (like breaking comm and join, or making ls output and globs non-deterministic), hence the recommendation to use LC_ALL=C to work around those issues.
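A minimal sketch of that workaround for comm, which expects both inputs to be sorted in the same order it uses to compare them: running every step with LC_ALL=C guarantees byte order, in which distinct characters never compare equal. The function name common_lines and the sample data are made up for illustration, and mktemp, while not POSIX, is available on GNU and BSD systems:

```sh
#!/bin/sh
# Print the lines common to two unsorted files. Every step runs with
# LC_ALL=C so that sort and comm agree on the same byte-order collation.
common_lines() {
  workdir=$(mktemp -d) || return 1
  LC_ALL=C sort "$1" > "$workdir/a"
  LC_ALL=C sort "$2" > "$workdir/b"
  LC_ALL=C comm -12 "$workdir/a" "$workdir/b"
  rm -rf "$workdir"
}

demo=$(mktemp -d) || exit 1
printf '%s\n' eb ɛa ed > "$demo/list1"   # hypothetical sample data
printf '%s\n' ɛc ed ɛa > "$demo/list2"
common_lines "$demo/list1" "$demo/list2"
# ed
# ɛa
rm -rf "$demo"
```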



As noted by @ninjalj in the comments, glibc 2.28, released in August 2018, came with some improvements on that front, though AFAICS there are still some characters or collating elements defined with identical sorting orders. For example, on Ubuntu 18.10 with glibc 2.28 and in an en_GB.UTF-8 locale:



$ expr $'L\ub7' = $'L\u387'
1


(why would U+00B7 be considered equivalent to U+0387 only when combined with L/l?!).



And:



$ perl -lC -e 'for($i=0; $i<0x110000; $i++) {$i = 0xe000 if $i == 0xd800; print chr($i)}' | sort > all-chars-sorted
$ uniq -d all-chars-sorted | wc -l
4
$ uniq -D all-chars-sorted | wc -l
1061355


(That's still over 1 million characters, or 95% of the Unicode range (down from 98% in 2.27), sorting the same as other characters because their sorting order is not defined.)



See also:




  • What does "LC_ALL=C" do?

  • Generate the collating order of a string

  • What is the difference between "sort -u" and "sort | uniq"?






  • This is exactly what I was looking for! For completeness, what does <PCL> stand for? The others seem to be Capital, Minuscule, and Basic?
    – Draconis
    Oct 26 at 19:51

  • @Draconis, collating-symbol <PCL> # 16 particulier/peculiar
    – Stéphane Chazelas
    Oct 26 at 20:59

  • Indeed, if we put a bunch of ea and ɛa lines mixed together in a file, we see that sort sorts all the eas before the ɛas.
    – Bakuriu
    Oct 27 at 9:22

  • From glibc 2.28, the codepoint should be used as a fallback for a 4th level weight, see sourceware.org/git/… sourceware.org/bugzilla/show_bug.cgi?id=14095
    – ninjalj
    Oct 27 at 10:47

  • @cat, sorry, I meant strcoll(), see edit.
    – Stéphane Chazelas
    Oct 27 at 21:31


















Answer:













man sort:



   ***  WARNING  ***  The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.


So, try: LC_ALL=C sort file.txt
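For instance, with the question's input: in the C locale strings are compared byte by byte, and ɛ (U+025B) is encoded in UTF-8 as 0xC9 0x9B, whose first byte is greater than the single byte 0x65 used for e, so every ɛ word ends up after every e word. A minimal sketch, assuming the script is saved as UTF-8:

```sh
#!/bin/sh
# Sort the question's sample using raw byte values (C locale).
printf '%s\n' eb ed ɛa ɛc | LC_ALL=C sort
# eb
# ed
# ɛa
# ɛc
```

Keep in mind that byte order is not a linguistic order: it also puts every uppercase ASCII letter before every lowercase one, and all non-ASCII characters after all of ASCII.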






  • That works! But why does the default locale consider these completely separate codepoints to be the same? I'm curious why this happens.
    – Draconis
    Oct 26 at 16:36

  • @Draconis What is "the default locale"?
    – Kamil Maciorowski
    Oct 26 at 16:39

  • @KamilMaciorowski An empty value of the environment variable; I'm not sure what locale that corresponds to.
    – Draconis
    Oct 26 at 16:44

  • @Draconis If LC_ALL is empty, sort may use other LC_* variables, LANG or some configuration files.
    – NieDzejkob
    Oct 26 at 19:43

  • LC_COLLATE is the string-sort-specific one, LANG is the extra-general one.
    – ShadowRanger
    Oct 27 at 3:16
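A quick way to see the precedence these comments describe: a non-empty LC_ALL overrides LC_COLLATE, which in turn overrides LANG, and an empty LC_ALL is ignored. A sketch only; en_US.UTF-8 is just an example name, and where it isn't installed everything falls back to the C locale anyway, so the output is the same but demonstrates less:

```sh
#!/bin/sh
# Which variable decides the collation sort uses?

# A non-empty LC_ALL wins over LC_COLLATE: byte order, prints A B a b
printf '%s\n' b A a B | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort

# An empty LC_ALL is ignored, so LC_COLLATE=C wins over LANG: A B a b again
printf '%s\n' b A a B | LC_ALL= LC_COLLATE=C LANG=en_US.UTF-8 sort
```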


















Answer:













The character ɛ is not equal to e, but some locales group these characters close together when collating. The reasons are language-specific, and sometimes historical or even political. For example, most people would probably expect "€uro" to come close to "Europe" in a dictionary.



To see which collation you are currently using, run locale; locale -a lists the locales available on the system. To change the collation, say to C, for a single sorting run, use LC_COLLATE=C sort file (provided LC_ALL isn't set, as it overrides LC_COLLATE). Finally, to see how different locales sort your file, try:



for loc in $(locale -a)
do
  echo "____${loc}____"
  # A non-empty LC_ALL would override LC_COLLATE, so blank it for this command
  LC_ALL= LC_COLLATE="$loc" sort file
done


Pipe the result through grep (or another filtering tool) to choose the locale that fits your needs.






  • This is a wonderful explanation, but the symbols seem to be considered identical, not just close together.
    – Draconis
    Oct 26 at 17:47

  • No, they're not considered identical. Add a plain ea line to the file, and with sort -u you will get both ea and ɛa in the output. The best strategy with collation is to avoid it (export LC_COLLATE=C). Otherwise, many ugly things will happen (e.g. /tmp/[a-z] in bash will match /tmp/a and /tmp/A but not /tmp/Z).
    – mosvy
    Oct 26 at 18:13

  • @mosvy Huh, interesting… so they are considered the same for ordering purposes but not for uniqueness purposes?
    – Draconis
    Oct 26 at 18:55

  • They're not considered the same. See here for an explanation.
    – mosvy
    Oct 26 at 19:03

  • @ninjalj, that may be fixed in the glibc fnmatch() and regexp ranges, but not in shells like bash that implement their ranges themselves using strcoll(). ksh93 never had the problem because its range implementation, while using strcoll(), also checks the case of the range ends and only matches lowercase characters if both ends are lowercase. zsh ranges don't have the issue as they're based on code points, not strcoll().
    – Stéphane Chazelas
    Oct 28 at 8:07













Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f477998%2fwhy-does-sort-say-that-%25c9%259b-e%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
68
down vote



accepted










No, it doesn't consider them as equivalent, they just have the same primary weight. So that, in first approximation, they sort the same.



If you look at /usr/share/i18n/locales/iso14651_t1_common (as used as basis for most locales) on a GNU system (here with glibc 2.27), you'll see:



<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E


e, ɛ and E have the same primary weight, e and E same secondary weight, only the third weight differentiates them.



When comparing strings, sort (the strcoll() standard libc function is uses to compare strings) starts by comparing the primary weights of all characters, and only go for the second weight if the strings are equal with the primary weights (and so on with the other weights).



That's how case seems to be ignored in the sorting order in first approximation. Ab sorts between aa and ac, but Ab can sort before or after ab depending on the language rule (some languages have <MIN> before <CAP> like in British English, some <CAP> before <MIN> like in Estonian).



If e had the same sorting order as ɛ, printf '%sn' e ɛ | sort -u would return only one line. But as <BAS> sorts before <PCL>, e alone sorts before ɛ. eɛe sorts after EEE (at the secondary weight) even though EEE sorts after eee (for which we need to go up to the third weight).



Now if on my system with glibc 2.27, I run:



sed -n 's/(.*;[^[:blank:]]*).*/1/p' /usr/share/i18n/locales/iso14651_t1_common |
sort -k2 | uniq -Df1


You'll notice that there are quite a few characters that have been defined with the exact same 4 weights. In particular, our ɛ has the same weights as:



<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE
<U025B> <e>;<PCL>;<MIN>;IGNORE


And sure enough:



$ printf '%sn' $'u01DD' $'u0259' $'u025B' | sort -u
ǝ
$ expr ɛ = ǝ
1


That can be seen as a bug of GNU libc locales. On most other systems, locales make sure all different characters have different sorting order in the end. On GNU locales, it gets even worse, as there are thousands of characters that don't have a sorting order and end up sorting the same, causing all sorts of problems (like breaking comm, join, ls or globs having non-deterministic orders...), hence the recommendation of using LC_ALL=C to work around those issues.



As noted by @ninjalj in comments, glibc 2.28 released in August 2018 came with some improvements on that front though AFAICS, there are still some characters or collating elements defined with identical sorting order. On Ubuntu 18.10 with glibc 2.28 and in a en_GB.UTF-8 locale.



$ expr $'Lub7' = $'Lu387'
1


(why would U+00B7 be considered equivalent as U+0387 only when combined with L/l?!).



And:



$ perl -lC -e 'for($i=0; $i<0x110000; $i++) {$i = 0xe000 if $i == 0xd800; print chr($i)}' | sort > all-chars-sorted
$ uniq -d all-chars-sorted | wc -l
4
$ uniq -D all-chars-sorted | wc -l
1061355


(still over 1 million characters (95% of the Unicode range, down from 98% in 2.27) sorting the same as other characters as their sorting order is not defined).



See also:




  • What does "LC_ALL=C" do?

  • Generate the collating order of a string

  • What is the difference between "sort -u" and "sort | uniq"?






share|improve this answer



















  • 3




    This is exactly what I was looking for! For completeness, what does <PCL> stand for? The others seem to be Capital, Miniscule, and Basic?
    – Draconis
    Oct 26 at 19:51






  • 3




    @Draconis, collating-symbol <PCL> # 16 particulier/peculiar
    – Stéphane Chazelas
    Oct 26 at 20:59










  • Indeed if we put a bunch of ea and ɛa mixed together in a file we see that sort sorts all eas before ɛas.
    – Bakuriu
    Oct 27 at 9:22








  • 2




    From glibc 2.28, the codepoint should be used as a fallback for a 4th level weight, see sourceware.org/git/… sourceware.org/bugzilla/show_bug.cgi?id=14095
    – ninjalj
    Oct 27 at 10:47








  • 1




    @cat, sorry, I meant strcoll(), see edit.
    – Stéphane Chazelas
    Oct 27 at 21:31















up vote
68
down vote



accepted










No, it doesn't consider them as equivalent, they just have the same primary weight. So that, in first approximation, they sort the same.



If you look at /usr/share/i18n/locales/iso14651_t1_common (as used as basis for most locales) on a GNU system (here with glibc 2.27), you'll see:



<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E


e, ɛ and E have the same primary weight, e and E same secondary weight, only the third weight differentiates them.



When comparing strings, sort (the strcoll() standard libc function is uses to compare strings) starts by comparing the primary weights of all characters, and only go for the second weight if the strings are equal with the primary weights (and so on with the other weights).



That's how case seems to be ignored in the sorting order in first approximation. Ab sorts between aa and ac, but Ab can sort before or after ab depending on the language rule (some languages have <MIN> before <CAP> like in British English, some <CAP> before <MIN> like in Estonian).



If e had the same sorting order as ɛ, printf '%sn' e ɛ | sort -u would return only one line. But as <BAS> sorts before <PCL>, e alone sorts before ɛ. eɛe sorts after EEE (at the secondary weight) even though EEE sorts after eee (for which we need to go up to the third weight).



Now if on my system with glibc 2.27, I run:



sed -n 's/(.*;[^[:blank:]]*).*/1/p' /usr/share/i18n/locales/iso14651_t1_common |
sort -k2 | uniq -Df1


You'll notice that there are quite a few characters that have been defined with the exact same 4 weights. In particular, our ɛ has the same weights as:



<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE
<U025B> <e>;<PCL>;<MIN>;IGNORE


And sure enough:



$ printf '%sn' $'u01DD' $'u0259' $'u025B' | sort -u
ǝ
$ expr ɛ = ǝ
1


That can be seen as a bug of GNU libc locales. On most other systems, locales make sure all different characters have different sorting order in the end. On GNU locales, it gets even worse, as there are thousands of characters that don't have a sorting order and end up sorting the same, causing all sorts of problems (like breaking comm, join, ls or globs having non-deterministic orders...), hence the recommendation of using LC_ALL=C to work around those issues.



As noted by @ninjalj in comments, glibc 2.28 released in August 2018 came with some improvements on that front though AFAICS, there are still some characters or collating elements defined with identical sorting order. On Ubuntu 18.10 with glibc 2.28 and in a en_GB.UTF-8 locale.



$ expr $'Lub7' = $'Lu387'
1


(why would U+00B7 be considered equivalent as U+0387 only when combined with L/l?!).



And:



$ perl -lC -e 'for($i=0; $i<0x110000; $i++) {$i = 0xe000 if $i == 0xd800; print chr($i)}' | sort > all-chars-sorted
$ uniq -d all-chars-sorted | wc -l
4
$ uniq -D all-chars-sorted | wc -l
1061355


(still over 1 million characters (95% of the Unicode range, down from 98% in 2.27) sorting the same as other characters as their sorting order is not defined).



See also:




  • What does "LC_ALL=C" do?

  • Generate the collating order of a string

  • What is the difference between "sort -u" and "sort | uniq"?






share|improve this answer



















  • 3




    This is exactly what I was looking for! For completeness, what does <PCL> stand for? The others seem to be Capital, Miniscule, and Basic?
    – Draconis
    Oct 26 at 19:51






  • 3




    @Draconis, collating-symbol <PCL> # 16 particulier/peculiar
    – Stéphane Chazelas
    Oct 26 at 20:59










  • Indeed if we put a bunch of ea and ɛa mixed together in a file we see that sort sorts all eas before ɛas.
    – Bakuriu
    Oct 27 at 9:22








  • 2




    From glibc 2.28, the codepoint should be used as a fallback for a 4th level weight, see sourceware.org/git/… sourceware.org/bugzilla/show_bug.cgi?id=14095
    – ninjalj
    Oct 27 at 10:47








  • 1




    @cat, sorry, I meant strcoll(), see edit.
    – Stéphane Chazelas
    Oct 27 at 21:31













up vote
68
down vote



accepted







up vote
68
down vote



accepted






No, it doesn't consider them as equivalent, they just have the same primary weight. So that, in first approximation, they sort the same.



If you look at /usr/share/i18n/locales/iso14651_t1_common (as used as basis for most locales) on a GNU system (here with glibc 2.27), you'll see:



<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E


e, ɛ and E have the same primary weight, e and E same secondary weight, only the third weight differentiates them.



When comparing strings, sort (the strcoll() standard libc function is uses to compare strings) starts by comparing the primary weights of all characters, and only go for the second weight if the strings are equal with the primary weights (and so on with the other weights).



That's how case seems to be ignored in the sorting order in first approximation. Ab sorts between aa and ac, but Ab can sort before or after ab depending on the language rule (some languages have <MIN> before <CAP> like in British English, some <CAP> before <MIN> like in Estonian).



If e had the same sorting order as ɛ, printf '%sn' e ɛ | sort -u would return only one line. But as <BAS> sorts before <PCL>, e alone sorts before ɛ. eɛe sorts after EEE (at the secondary weight) even though EEE sorts after eee (for which we need to go up to the third weight).



Now if on my system with glibc 2.27, I run:



sed -n 's/(.*;[^[:blank:]]*).*/1/p' /usr/share/i18n/locales/iso14651_t1_common |
sort -k2 | uniq -Df1


You'll notice that there are quite a few characters that have been defined with the exact same 4 weights. In particular, our ɛ has the same weights as:



<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE
<U025B> <e>;<PCL>;<MIN>;IGNORE


And sure enough:



$ printf '%sn' $'u01DD' $'u0259' $'u025B' | sort -u
ǝ
$ expr ɛ = ǝ
1


That can be seen as a bug of GNU libc locales. On most other systems, locales make sure all different characters have different sorting order in the end. On GNU locales, it gets even worse, as there are thousands of characters that don't have a sorting order and end up sorting the same, causing all sorts of problems (like breaking comm, join, ls or globs having non-deterministic orders...), hence the recommendation of using LC_ALL=C to work around those issues.



As noted by @ninjalj in comments, glibc 2.28 released in August 2018 came with some improvements on that front though AFAICS, there are still some characters or collating elements defined with identical sorting order. On Ubuntu 18.10 with glibc 2.28 and in a en_GB.UTF-8 locale.



$ expr $'Lub7' = $'Lu387'
1


(why would U+00B7 be considered equivalent as U+0387 only when combined with L/l?!).



And:



$ perl -lC -e 'for($i=0; $i<0x110000; $i++) {$i = 0xe000 if $i == 0xd800; print chr($i)}' | sort > all-chars-sorted
$ uniq -d all-chars-sorted | wc -l
4
$ uniq -D all-chars-sorted | wc -l
1061355


(still over 1 million characters (95% of the Unicode range, down from 98% in 2.27) sorting the same as other characters as their sorting order is not defined).



See also:




  • What does "LC_ALL=C" do?

  • Generate the collating order of a string

  • What is the difference between "sort -u" and "sort | uniq"?






share|improve this answer














No, it doesn't consider them as equivalent, they just have the same primary weight. So that, in first approximation, they sort the same.



If you look at /usr/share/i18n/locales/iso14651_t1_common (as used as basis for most locales) on a GNU system (here with glibc 2.27), you'll see:



<U0065> <e>;<BAS>;<MIN>;IGNORE # 259 e
<U025B> <e>;<PCL>;<MIN>;IGNORE # 287 ɛ
<U0045> <e>;<BAS>;<CAP>;IGNORE # 577 E


e, ɛ and E have the same primary weight, e and E same secondary weight, only the third weight differentiates them.



When comparing strings, sort (the strcoll() standard libc function is uses to compare strings) starts by comparing the primary weights of all characters, and only go for the second weight if the strings are equal with the primary weights (and so on with the other weights).



That's how case seems to be ignored in the sorting order in first approximation. Ab sorts between aa and ac, but Ab can sort before or after ab depending on the language rule (some languages have <MIN> before <CAP> like in British English, some <CAP> before <MIN> like in Estonian).



If e had the same sorting order as ɛ, printf '%sn' e ɛ | sort -u would return only one line. But as <BAS> sorts before <PCL>, e alone sorts before ɛ. eɛe sorts after EEE (at the secondary weight) even though EEE sorts after eee (for which we need to go up to the third weight).



Now if on my system with glibc 2.27, I run:



sed -n 's/(.*;[^[:blank:]]*).*/1/p' /usr/share/i18n/locales/iso14651_t1_common |
sort -k2 | uniq -Df1


You'll notice that there are quite a few characters that have been defined with the exact same 4 weights. In particular, our ɛ has the same weights as:



<U01DD> <e>;<PCL>;<MIN>;IGNORE
<U0259> <e>;<PCL>;<MIN>;IGNORE
<U025B> <e>;<PCL>;<MIN>;IGNORE


And sure enough:



$ printf '%sn' $'u01DD' $'u0259' $'u025B' | sort -u
ǝ
$ expr ɛ = ǝ
1


That can be seen as a bug of GNU libc locales. On most other systems, locales make sure all different characters have different sorting order in the end. On GNU locales, it gets even worse, as there are thousands of characters that don't have a sorting order and end up sorting the same, causing all sorts of problems (like breaking comm, join, ls or globs having non-deterministic orders...), hence the recommendation of using LC_ALL=C to work around those issues.



As noted by @ninjalj in comments, glibc 2.28 released in August 2018 came with some improvements on that front though AFAICS, there are still some characters or collating elements defined with identical sorting order. On Ubuntu 18.10 with glibc 2.28 and in a en_GB.UTF-8 locale.



$ expr $'Lub7' = $'Lu387'
1


(why would U+00B7 be considered equivalent as U+0387 only when combined with L/l?!).



And:



$ perl -lC -e 'for($i=0; $i<0x110000; $i++) {$i = 0xe000 if $i == 0xd800; print chr($i)}' | sort > all-chars-sorted
$ uniq -d all-chars-sorted | wc -l
4
$ uniq -D all-chars-sorted | wc -l
1061355


(That still leaves over 1 million characters, 95% of the Unicode range (down from 98% in 2.27), sorting the same as other characters because their sorting order is not defined.)
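As a quick sanity check of the 95% figure: the perl loop prints every code point from U+0000 to U+10FFFF except the 2048 surrogates (U+D800 to U+DFFF). Ignoring small details such as the extra empty line printed for U+000A, the percentage works out as follows (a back-of-the-envelope sketch using the count reported above).

```shell
# Code points printed by the perl loop: U+0000..U+10FFFF minus the
# 0x800 (2048) surrogates U+D800..U+DFFF
total=$(( 0x110000 - 0x800 ))
# Line count reported by "uniq -D all-chars-sorted | wc -l" above (glibc 2.28)
dup=1061355
echo "total code points: $total"
# Integer (truncated) percentage of those that sort the same as another one
echo "sorting the same: $(( dup * 100 / total ))%"
```

This prints total code points: 1112064 and sorting the same: 95%.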



See also:




  • What does "LC_ALL=C" do?

  • Generate the collating order of a string

  • What is the difference between "sort -u" and "sort | uniq"?















edited Nov 8 at 21:53

























answered Oct 26 at 19:38









Stéphane Chazelas

293k54547888












  • 3




    This is exactly what I was looking for! For completeness, what does <PCL> stand for? The others seem to be Capital, Minuscule, and Basic?
    – Draconis
    Oct 26 at 19:51






  • 3




    @Draconis, collating-symbol <PCL> # 16 particulier/peculiar
    – Stéphane Chazelas
    Oct 26 at 20:59










  • Indeed, if we put a bunch of ea and ɛa lines mixed together in a file, we see that sort sorts all eas before ɛas.
    – Bakuriu
    Oct 27 at 9:22








  • 2




    From glibc 2.28, the code point should be used as a fallback for a 4th-level weight; see sourceware.org/git/… sourceware.org/bugzilla/show_bug.cgi?id=14095
    – ninjalj
    Oct 27 at 10:47








  • 1




    @cat, sorry, I meant strcoll(), see edit.
    – Stéphane Chazelas
    Oct 27 at 21:31


























up vote
15
down vote













man sort:



   ***  WARNING  ***  The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.


So, try: LC_ALL=C sort file.txt
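To see why the C locale separates them, here is the question's sample (fed in the order the UTF-8 locale produced) sorted with LC_ALL=C. Assuming the input is UTF-8 encoded, ɛ (U+025B) is the two bytes 0xC9 0x9B, both greater than the single byte 0x65 of e, so a byte-wise comparison puts every ɛ word after every e word.

```shell
# Sort by raw bytes: prints eb, ed, ɛa, ɛc (one per line)
printf '%s\n' ɛa eb ɛc ed | LC_ALL=C sort

# The UTF-8 bytes behind each letter: c9 9b for ɛ, 65 for e
printf '%s' ɛ | od -An -tx1
printf '%s' e | od -An -tx1
```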
















answered Oct 26 at 16:35









Ipor Sircer

10k11023












  • 1




    That works! But why does the default locale consider these completely separate codepoints to be the same? I'm curious why this happens.
    – Draconis
    Oct 26 at 16:36










  • @Draconis What is "the default locale"?
    – Kamil Maciorowski
    Oct 26 at 16:39










  • @KamilMaciorowski An empty value of the environment variable; I'm not sure what locale that corresponds to.
    – Draconis
    Oct 26 at 16:44






  • 3




    @Draconis if LC_ALL is empty, sort may use other LC_* variables, LANG or some configuration files.
    – NieDzejkob
    Oct 26 at 19:43






  • 1




    LC_COLLATE is the string-sort-specific one, LANG is the extra-general one.
    – ShadowRanger
    Oct 27 at 3:16
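The lookup order described in these comments can be sketched as a shell function. This illustrates the POSIX precedence (LC_ALL, then LC_COLLATE, then LANG, skipping unset or empty variables); effective_collate is a hypothetical name, and falling back to C assumes the implementation-defined default is the C locale, as it is with glibc. In practice, running locale shows the resolved values directly.

```shell
# effective_collate: print the locale name that governs collation,
# following the precedence LC_ALL > LC_COLLATE > LANG, where unset and
# empty variables are skipped. "C" stands in for the implementation-defined
# default (an assumption that holds for glibc).
effective_collate() {
  if [ -n "${LC_ALL-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE-}" ]; then
    printf '%s\n' "$LC_COLLATE"
  elif [ -n "${LANG-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' C
  fi
}
```

So with an empty LC_ALL, as in Draconis's case, the answer comes from LC_COLLATE if it is set, otherwise from LANG.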
























up vote
8
down vote













The character ɛ is not equal to e, but some locales may group these characters close together when collating. The reasons for this are language-specific, but can also be historical or even political. For example, most people would probably expect €uro to sort close to Europe in a dictionary.



Anyway, to see which collation you are currently using, run locale; locale -a gives you the list of locales available on the system; and to change the collation (say, to C) for just one sorting run, use LC_COLLATE=C sort file. Finally, to see how different locales sort your file, try:



for loc in $(locale -a)
do echo ____"${loc}"____
LC_COLLATE="$loc" sort file
done


Pipe the result to a grep-like tool to choose the locale that fits your needs.
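Instead of eyeballing the output, a variant of the loop above can print only the locales that give the order you want. This is a sketch: locales_with_order is a hypothetical helper that reads candidate locale names on stdin and compares the sorted file with a file holding the expected order. It sets LC_ALL rather than LC_COLLATE, since an exported LC_ALL would otherwise take precedence over LC_COLLATE.

```shell
# locales_with_order FILE EXPECTED: read candidate locale names on stdin
# (for example from locale -a) and print those in which sorting FILE
# yields exactly the contents of EXPECTED.
locales_with_order() {
  file=$1 expected=$2
  want=$(cat -- "$expected")
  while IFS= read -r loc; do
    # LC_ALL so that no globally exported LC_ALL can override the choice,
    # and so that LC_CTYPE stays consistent with the collating locale
    got=$(LC_ALL=$loc sort -- "$file")
    if [ "$got" = "$want" ]; then
      printf '%s\n' "$loc"
    fi
  done
}
```

Usage: locale -a | locales_with_order file expected.txt, where expected.txt (a hypothetical file name) holds the lines in the order you want, for instance with every e word before every ɛ word.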






answered Oct 26 at 17:34









jimmij

30.2k867102
















  • This is a wonderful explanation, but the symbols seem to be considered identical, not just close together.
    – Draconis
    Oct 26 at 17:47






  • 1




    No, they're not considered identical. Add a plain ea line to the file, and with sort -u you will get both ea and ɛa in the output. The best strategy with collation is to avoid it (export LC_COLLATE=C). Otherwise, many ugly things will happen (e.g. /tmp/[a-z] in bash will match /tmp/a and /tmp/A but not /tmp/Z).
    – mosvy
    Oct 26 at 18:13










  • @mosvy Huh, interesting…so they are considered the same for ordering purposes but not for uniqueness purposes?
    – Draconis
    Oct 26 at 18:55










  • They're not considered the same. See here for an explanation.
    – mosvy
    Oct 26 at 19:03






  • 1




    @ninjalj, that may be fixed in the glibc fnmatch() and regexp ranges, but not in shells like bash that implement their ranges themselves using strcoll(). ksh93 never had the problem because its range implementation, while it uses strcoll(), also checks the case of the range ends and, if both ends are lowercase, only matches lowercase characters. zsh ranges don't have the issue as they're based on code points, not strcoll().
    – Stéphane Chazelas
    Oct 28 at 8:07








































 
