Difference between [0-9], [[:digit:]] and \d

In the Wikipedia article on Regular expressions, it seems that [[:digit:]] = [0-9] = \d.

What are the circumstances where they are not equal? What is the difference?

After some research, I think one difference is that the bracket expression [:expr:] is locale dependent.

  • Doesn't the Wikipedia article that you linked to answer your question? Different regular expression processors/engines support different syntaxes for character classes (among other things).
    – igal
    Jan 2 at 3:34

  • @igal The wiki says there is a difference but doesn't give much detail. I'm asking for the details, something like what isaac and thrig said. I'm interested in how they differ in grep, sed, awk... whether the GNU version or not.
    – harbinn
    Jan 2 at 7:01

Tags: regular-expression, wildcards

edited Jan 2 at 14:28 by muru
asked Jan 2 at 3:01 by harbinn

4 Answers

Yes, it is [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).

In most programming languages (where it is supported), \d is identical to [[:digit:]].

\d is less common than [[:digit:]] (it is not in POSIX, but it is in GNU grep -P).

There are many digits in Unicode, for example:

0123456789 # Hindu-Arabic numerals
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI

All of which may be included in [[:digit:]] or \d.

By contrast, [0-9] generally matches only the ASCII digits 0123456789.
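The same contrast can be reproduced outside Perl. This is a minimal sketch in Python 3, assuming the standard re module, where \d in a str pattern matches any Unicode decimal digit while [0-9] matches only the ASCII ones. The sample string and variable names are illustrative and not part of the original answer.

```python
import re

# Build an illustrative sample from code points to avoid copy/paste issues:
# ASCII digits, ARABIC-INDIC digits (U+0660..) and DEVANAGARI digits (U+0966..).
ascii_part = "0123456789"
arabic_indic = "".join(chr(0x0660 + i) for i in range(10))
devanagari = "".join(chr(0x0966 + i) for i in range(10))
sample = " ".join([ascii_part, arabic_indic, devanagari])

# In a str pattern, \d matches any character in Unicode category Nd.
unicode_digits = re.findall(r"\d", sample)

# An explicit range only covers the code points U+0030..U+0039.
ascii_digits = re.findall(r"[0-9]", sample)

print(len(unicode_digits))    # 30
print("".join(ascii_digits))  # 0123456789
```

Other engines may behave differently, so this only illustrates the general point that \d and [0-9] are not interchangeable.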





In many languages (Perl, Java, Python, C), [[:digit:]] (and \d) can take on an extended meaning. For example, this Perl code will match all the digits from above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'

$ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which is equivalent to selecting all characters that have the Unicode property Nd (Number, decimal digit):

$ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which grep could reproduce (the specific version of PCRE may have a different internal list of numeric code points than Perl):

$ echo "$a" | grep -oP '\p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९


Change it to [0-9] to see:

$ echo "$a" | grep -o '[0-9]\+'
0123456789


POSIX



For the specific POSIX BRE or ERE:

\d is not supported (it is not in POSIX, but it is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] has no possible misinterpretations; [[:digit:]] is available in more utilities and commonly means only [0123456789]; \d is supported by few utilities.

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (it might be codepoint order, collation order, or something else).



shells



Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):



$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'
۹ ߀߁߂߃߄߅߆߇߈߉ ९


And that is a sure source of bugs waiting to happen.






edited Nov 23 at 21:57
answered Jan 2 at 3:44 by Isaac

  • In practice on POSIX systems (iswctype() and BRE/ERE/wildcards in POSIX utilities), [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard.
    – Stéphane Chazelas
    May 15 at 19:39

  • I wasn't aware that perl's \d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)\d' or grep -Po '(*UCP)[[:digit:]]' for classes to be based on Unicode properties.
    – Stéphane Chazelas
    May 15 at 19:46

  • I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
    – Stéphane Chazelas
    May 15 at 20:38


This depends on how you define a digit. [0-9] tends to be just the ASCII digits (or possibly, on a system that is neither ASCII nor a superset of ASCII, the same 10 digits with different bit representations, as in EBCDIC). \d, on the other hand, could either be just the plain digits (old versions of Perl, or modern versions of Perl with the /a regular expression flag enabled) or it could be a Unicode match of \p{Digit}, which is a rather larger set of digits than what [0-9] or /\d/a match.

$ perl -E 'say "match" if 42 =~ m/\d/'
match
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/'
match
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/a'
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/[0-9]/'
$

See perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.
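As an illustration of "consult the documentation for the language in question", here is a rough Python 3 counterpart of the Perl checks. It assumes the standard re module, whose re.ASCII flag plays a role similar to Perl's /a for \d. The variable names are made up for this sketch.

```python
import re

# U+09EA is BENGALI DIGIT FOUR, the code point used in the Perl examples.
bengali_four = "\u09ea"

# Default str patterns are Unicode-aware, so \d accepts the Bengali digit.
matches_default = re.search(r"\d", bengali_four) is not None

# With re.ASCII, \d is restricted to the ASCII digits 0-9.
matches_ascii_flag = re.search(r"\d", bengali_four, flags=re.ASCII) is not None

# An explicit [0-9] range never matches the Bengali digit.
matches_range = re.search(r"[0-9]", bengali_four) is not None

print(matches_default, matches_ascii_flag, matches_range)  # True False False
```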



But wait, there's more! The locale may also vary what \d matches, so \d could match fewer digits than the complete Unicode set, and (hopefully, usually) also includes [0-9]. This is similar to the difference in C between isdigit(3) ([0-9]) and isnumber(3) ([0-9] plus whatever else from the locale).



There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'
4
$ perl -MUnicode::UCD=num -E 'say num("\N{U+09EA}")'
4
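Other languages offer comparable calls. As a small sketch, assuming Python 3 and its standard unicodedata module, the numeric value of the same Bengali digit can be obtained like this (the variable name is illustrative):

```python
import unicodedata

# U+09EA, the same code point as in the Perl example.
bengali_four = "\u09ea"

# Look up the character's Unicode name and its decimal digit value.
digit_name = unicodedata.name(bengali_four)
digit_value = unicodedata.decimal(bengali_four)

# int() also accepts Unicode decimal digits, not only ASCII ones.
parsed_value = int(bengali_four)

print(digit_name)    # BENGALI DIGIT FOUR
print(digit_value)   # 4
print(parsed_value)  # 4
```

That int() accepts such input is worth keeping in mind when a \d match is later converted to a number.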





  • I think isnumber() is a BSD thing, at least based on the man page it seems so.
    – ilkkachu
    Jan 2 at 18:06

  • I do have something of a BSD bias, yes.
    – thrig
    Jan 2 at 19:18

  • The /a flag is a specific limiter that reduces the list of Unicode digits matched: "…the /a modifier can be used to force \d to match just the ASCII 0 through 9." As such, it forces \d to match exactly [0-9] and nothing else.
    – Isaac
    Jun 4 at 22:16


The different meanings of [0-9], [[:digit:]] and \d are presented in other answers. Here I would like to add the differences between regex engine implementations.

           [[:digit:]]    \d
grep -E    ✓              ×
grep -P    ✓              ✓
sed        ✓              ×
sed -E     ✓              ×

So [[:digit:]] always works, while \d depends on the tool. grep's manual mentions that [[:digit:]] is just 0-9 in the C locale.

PS1: If you know more, please expand the table.

PS2: GNU grep 3.1 and GNU sed 4.4 were used for the test.
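The table covers grep and sed only. As one data point outside those tools, Python's built-in re module does not treat [[:digit:]] as a POSIX class. This sketch assumes Python 3 and shows how the pattern is parsed instead. The variable names are illustrative, and the warning handling is there only because recent versions emit a FutureWarning for this pattern.

```python
import re
import warnings

# Python's re reads "[[:digit:]]" as the character set "[[:digit:]"
# (the characters [ : d i g t) followed by a literal "]".
# Newer versions warn about a "possible nested set"; silence that here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    posix_style = re.compile(r"[[:digit:]]")

# A real digit does not match...
matches_digit = posix_style.search("7") is not None

# ...but one of the set's letters followed by "]" does.
matches_letters = posix_style.search("d]") is not None

print(matches_digit, matches_letters)  # False True
```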






  • 1) There are many versions of grep and sed, with the biggest difference probably between the GNU versions vs. others. This answer might be more useful if it mentioned which version of grep and sed it refers to. Or what the source of that table is, for that matter. 2) That table might as well be transcribed to text, since it doesn't contain anything that requires it to be an image.
    – ilkkachu
    Jan 2 at 14:01

  • @ilkkachu 1) The latest GNU grep 3.1 and GNU sed 4.4 were used for the test. 2) I don't know how to create a table. It seems that @muru has converted the table to a pretty text form.
    – harbinn
    Jan 2 at 14:39

  • @harbinn Please edit that into your answer.
    – Dan D.
    Jan 3 at 4:56

  • @DanD. The version info has been added. Thanks for your attention.
    – harbinn
    Jan 4 at 0:43

  • Note that the Python built-in re module does not support [[:digit:]], but the add-on library regex does support it, so I would niggle a little at the "always works". It always works in POSIX-compliant situations.
    – Steve Barnes
    Jan 5 at 19:08


The theoretical differences have already been pretty well explained in the other answers, so it remains to explain the practical differences.



Here are some of the more common use cases for matching a digit:





One-shot data extraction



Often, when you want to crunch some numbers, the numbers themselves are in an awkwardly formatted text file. You want to extract them for use in your program. You can probably tell the number format (by looking at the file) and your current locale, so it's OK to use any of the forms, as long as it gets the job done. \d requires the fewest keystrokes, so it's very commonly used.



Input sanitizing



You have some untrusted user input (maybe from a web form), and you need to make certain it doesn't contain any surprises. Maybe you want to store it in a numeric field in a database, or use as a parameter to a shell command to run on a server. In this case, you really want [0-9], since it's the most restrictive and predictable one.



Data validation



You have a bit of data that you are not going to use for anything "dangerous", but it would be nice to know if it's a number. For example, your program allows the user to input an address, and you want to highlight a possible typo if the input doesn't contain a house number. In this case, you probably want to be as broad as possible, so [[:digit:]] is the way to go.





Those would seem to be the three most common use cases for digit matching. If you think I missed an important one, please drop a comment.
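To make the input sanitizing case concrete, here is a minimal sketch in Python 3, assuming the standard re module. The helper name is_safe_number and the sample inputs are hypothetical and only serve the illustration.

```python
import re

# Strict pattern: one or more ASCII digits and nothing else.
ASCII_NUMBER = re.compile(r"[0-9]+")


def is_safe_number(text: str) -> bool:
    """Accept only strings made entirely of the ASCII digits 0-9."""
    return ASCII_NUMBER.fullmatch(text) is not None


# "123" written with ARABIC-INDIC digits (U+0661..U+0663).
arabic_indic_123 = "\u0661\u0662\u0663"

ascii_ok = is_safe_number("123")
arabic_ok = is_safe_number(arabic_indic_123)

# The broader \d+ would have let the non-ASCII digits through.
arabic_ok_with_backslash_d = re.fullmatch(r"\d+", arabic_indic_123) is not None

print(ascii_ok, arabic_ok, arabic_ok_with_backslash_d)  # True False True
```

Whether the broader match is a problem depends on what consumes the value afterwards; a database column or shell command may not interpret non-ASCII digits the way the regex engine does.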






  • Nice job. Are there related security problems, such as ReDoS or others?
    – frams
    Jan 4 at 0:56











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f414226%2fdifference-between-0-9-digit-and-d%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























4 Answers
4






active

oldest

votes








4 Answers
4






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
36
down vote













Yes, it is [[:digit:]] ~ [0-9] ~ d (where ~ means aproximate).

In most programming languages (where it is supported) d[[:digit:]] (identical).

The d is less common than [[:digit:]] (not in POSIX but it is in GNU grep -P).



There are many digits in UNICODE, for example:



123456789 # Hindu-Arabic Arabic numerals
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI



All of which may be included in [[:digit:]] or d.



Instead, [0-9] is generally only the ASCII digits 0123456789.





There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:



$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'

$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:



$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):



$ echo "$a" | grep -oP 'p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९


Change it to [0-9] to see:



$ echo "$a" | grep -o '[0-9]+'
0123456789


POSIX



For the specific POSIX BRE or ERE:

The d is not supported (not in POSIX but is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in C locale all [0-9], [0123456789], d and [[:digit:]] mean exactly the same. The [0123456789] has no possible misinterpretations, [[:digit:]] is available in more utilities and it is common to mean only [0123456789]. The d is supported by few utilities.



As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).



shells



Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):



$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'
۹ ߀߁߂߃߄߅߆߇߈߉ ९


And that is a sure source of bugs waiting to happen.






share|improve this answer























  • In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
    – Stéphane Chazelas
    May 15 at 19:39










  • I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
    – Stéphane Chazelas
    May 15 at 19:46












  • I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
    – Stéphane Chazelas
    May 15 at 20:38

















up vote
36
down vote













Yes, it is [[:digit:]] ~ [0-9] ~ d (where ~ means aproximate).

In most programming languages (where it is supported) d[[:digit:]] (identical).

The d is less common than [[:digit:]] (not in POSIX but it is in GNU grep -P).



There are many digits in UNICODE, for example:



123456789 # Hindu-Arabic Arabic numerals
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI



All of which may be included in [[:digit:]] or d.



Instead, [0-9] is generally only the ASCII digits 0123456789.





There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:



$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'

$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:



$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):



$ echo "$a" | grep -oP 'p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९


Change it to [0-9] to see:



$ echo "$a" | grep -o '[0-9]+'
0123456789


POSIX



For the specific POSIX BRE or ERE:

The d is not supported (not in POSIX but is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in C locale all [0-9], [0123456789], d and [[:digit:]] mean exactly the same. The [0123456789] has no possible misinterpretations, [[:digit:]] is available in more utilities and it is common to mean only [0123456789]. The d is supported by few utilities.



As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).



shells



Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):



$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'
۹ ߀߁߂߃߄߅߆߇߈߉ ९


And that is a sure source of bugs waiting to happen.






share|improve this answer























  • In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
    – Stéphane Chazelas
    May 15 at 19:39










  • I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
    – Stéphane Chazelas
    May 15 at 19:46












  • I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
    – Stéphane Chazelas
    May 15 at 20:38















up vote
36
down vote










up vote
36
down vote









Yes, it is [[:digit:]] ~ [0-9] ~ d (where ~ means aproximate).

In most programming languages (where it is supported) d[[:digit:]] (identical).

The d is less common than [[:digit:]] (not in POSIX but it is in GNU grep -P).



There are many digits in UNICODE, for example:



123456789 # Hindu-Arabic Arabic numerals
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI



All of which may be included in [[:digit:]] or d.



Instead, [0-9] is generally only the ASCII digits 0123456789.





There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:



$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'

$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:



$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):



$ echo "$a" | grep -oP 'p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९


Change it to [0-9] to see:



$ echo "$a" | grep -o '[0-9]+'
0123456789


POSIX



For the specific POSIX BRE or ERE:

The d is not supported (not in POSIX but is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in C locale all [0-9], [0123456789], d and [[:digit:]] mean exactly the same. The [0123456789] has no possible misinterpretations, [[:digit:]] is available in more utilities and it is common to mean only [0123456789]. The d is supported by few utilities.



As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).



shells



Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):



$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'
۹ ߀߁߂߃߄߅߆߇߈߉ ९


And that is a sure source of bugs waiting to happen.






share|improve this answer














Yes, it is [[:digit:]] ~ [0-9] ~ d (where ~ means aproximate).

In most programming languages (where it is supported) d[[:digit:]] (identical).

The d is less common than [[:digit:]] (not in POSIX but it is in GNU grep -P).



There are many digits in UNICODE, for example:



123456789 # Hindu-Arabic Arabic numerals
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI



All of which may be included in [[:digit:]] or d.



Instead, [0-9] is generally only the ASCII digits 0123456789.





There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:



$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'

$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:



$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९


Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):



$ echo "$a" | grep -oP 'p{Nd}+'
0123456789
٠١٢٣٤٥٦٧٨٩
۰۱۲۳۴۵۶۷۸۹
߀߁߂߃߄߅߆߇߈߉
०१२३४५६७८९


Change it to [0-9] to see:



$ echo "$a" | grep -o '[0-9]+'
0123456789


POSIX



For the specific POSIX BRE or ERE:

The d is not supported (not in POSIX but is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in C locale all [0-9], [0123456789], d and [[:digit:]] mean exactly the same. The [0123456789] has no possible misinterpretations, [[:digit:]] is available in more utilities and it is common to mean only [0123456789]. The d is supported by few utilities.



As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).



shells



Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):



$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'
۹ ߀߁߂߃߄߅߆߇߈߉ ९


And that is a sure source of bugs waiting to happen.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 23 at 21:57

























answered Jan 2 at 3:44









Isaac

9,91111445




9,91111445












  • In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
    – Stéphane Chazelas
    May 15 at 19:39










  • I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
    – Stéphane Chazelas
    May 15 at 19:46












  • I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
    – Stéphane Chazelas
    May 15 at 20:38




















  • In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
    – Stéphane Chazelas
    May 15 at 19:39










  • I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
    – Stéphane Chazelas
    May 15 at 19:46












  • I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
    – Stéphane Chazelas
    May 15 at 20:38


















In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39




In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39












I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46






I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46














I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38



















This depends on how you define a digit. [0-9] tends to be just the ASCII digits (or possibly the same ten digits in an encoding that is neither ASCII nor a superset of ASCII, such as EBCDIC, where they have different bit representations). \d, on the other hand, could either be just the plain digits (old versions of Perl, or modern versions of Perl with the /a regular expression flag enabled), or it could be a Unicode match of \p{Digit}, which is a rather larger set of digits than [0-9] or a /\d/a match.



$ perl -E 'say "match" if 42 =~ m/\d/'
match
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/'
match
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/a'
$ perl -E 'say "match" if "\N{U+09EA}" =~ m/[0-9]/'
$


See perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.
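The same distinction shows up outside Perl. As a minimal sketch, assuming Python 3 and its standard re module (which this answer did not test), \d in a str pattern matches any Unicode decimal digit unless the re.ASCII flag is set, while [0-9] stays ASCII-only. The helper name classify and its dictionary keys are hypothetical, chosen only for this illustration.

```python
import re

BENGALI_FOUR = "\u09ea"  # BENGALI DIGIT FOUR, the character used in the Perl transcript


def classify(ch):
    """Report which of three digit patterns match the single character ch."""
    return {
        # \d on a str pattern: any Unicode decimal digit
        "unicode_d": re.fullmatch(r"\d", ch) is not None,
        # \d with re.ASCII: restricted to 0-9, comparable to Perl's /a flag
        "ascii_d": re.fullmatch(r"\d", ch, flags=re.ASCII) is not None,
        # explicit range: the ten ASCII digits only
        "range_0_9": re.fullmatch(r"[0-9]", ch) is not None,
    }


if __name__ == "__main__":
    for ch in ("4", BENGALI_FOUR):
        print(repr(ch), classify(ch))
```

Other languages may behave differently again, so the advice to check the relevant documentation still applies.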



But wait, there's more! The locale may also vary what \d matches, so \d could match fewer digits than the complete Unicode set, though it (hopefully, usually) still includes [0-9]. This is similar to the difference in C between isdigit(3) ([0-9]) and isnumber(3) ([0-9] plus whatever else the locale adds).



There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:



$ perl -MUnicode::UCD=num -E 'say num(4)'
4
$ perl -MUnicode::UCD=num -E 'say num("\N{U+09EA}")'
4
$
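For comparison, a rough Python equivalent might look like the following. It assumes the standard unicodedata module; digit_value is a hypothetical helper name, not part of any library.

```python
import unicodedata


def digit_value(ch):
    """Return the decimal value of a single Unicode decimal digit, or None."""
    # unicodedata.decimal returns the supplied default (None here)
    # for characters that are not decimal digits.
    return unicodedata.decimal(ch, None)


if __name__ == "__main__":
    print(digit_value("4"))
    print(digit_value("\u09ea"))  # BENGALI DIGIT FOUR
    print(digit_value("x"))
```

Python's built-in int() also accepts such digits, so the lookup is only needed when a single character has to be inspected.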





  • I think isnumber() is a BSD thing, at least based on the man page it seems so
    – ilkkachu
    Jan 2 at 18:06










  • I do have something of a BSD bias, yes
    – thrig
    Jan 2 at 19:18










  • The /a flag is a specific limiter that reduces the list of Unicode digits matched: "…the /a modifier can be used to force \d to match just the ASCII 0 through 9." As such, it forces \d to match exactly and only [0-9].
    – Isaac
    Jun 4 at 22:16















edited Jan 2 at 15:18

























answered Jan 2 at 3:42









thrig














The different meanings of [0-9], [[:digit:]] and \d are covered in other answers. Here I would like to add the differences in support between regex engine implementations.



             [[:digit:]]   \d
grep -E      ✓             ×
grep -P      ✓             ✓
sed          ✓             ×
sed -E       ✓             ×


So [[:digit:]] always works, while \d depends on the tool. grep's manual mentions that [[:digit:]] is just 0-9 in the C locale.



PS1: If you know more, please expand the table.



PS2: GNU grep 3.1 and GNU sed 4.4 were used for testing.






  • 1) There are many versions of grep and sed, with the biggest difference probably between the GNU versions vs. others. This answer might be more useful if it mentioned which version of grep and sed it refers to. Or what the source of that table is, for that matter. 2) that table might as well be transcribed to text, since it doesn't contain anything that requires it to be an image
    – ilkkachu
    Jan 2 at 14:01










  • @ilkkachu 1) The latest GNU grep 3.1 and GNU sed 4.4 were used for testing. 2) I don't know how to create a table. It seems that @muru has converted the table to a pretty text form.
    – harbinn
    Jan 2 at 14:39










  • @harbinn Please edit that into your answer.
    – Dan D.
    Jan 3 at 4:56










  • @DanD. Version info added. Thanks for the attention.
    – harbinn
    Jan 4 at 0:43








  • Note that Python's built-in re module does not support [[:digit:]], but the add-on regex library does, so I would niggle a little at the "always works". It always works in POSIX-compliant situations.
    – Steve Barnes
    Jan 5 at 19:08















edited Jan 4 at 0:40

























answered Jan 2 at 13:45









harbinn














The theoretical differences have already been pretty well explained in the other answers, so it remains to explain the practical differences.



Here are some of the more common use cases for matching a digit:





One-shot data extraction



Often, when you want to crunch some numbers, the numbers themselves are in an awkwardly formatted text file. You want to extract them for use in your program. You can probably tell the number format (by looking at the file) and your current locale, so it's OK to use any of the forms, as long as it gets the job done. \d requires the fewest keystrokes, so it's very commonly used.



Input sanitizing



You have some untrusted user input (maybe from a web form), and you need to make certain it doesn't contain any surprises. Maybe you want to store it in a numeric field in a database, or use it as a parameter to a shell command to run on a server. In this case, you really want [0-9], since it's the most restrictive and predictable one.
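As an illustration of the sanitizing case, here is a minimal Python sketch (Python is chosen only for the example, and the names ASCII_NUMBER and is_plain_ascii_number are hypothetical). It uses fullmatch with [0-9]+ so that only ASCII digits are accepted, with nothing before or after them.

```python
import re

# One or more ASCII digits and nothing else.
ASCII_NUMBER = re.compile(r"[0-9]+")


def is_plain_ascii_number(value):
    """Accept only non-empty strings made entirely of the ASCII digits 0-9."""
    return ASCII_NUMBER.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("12345", "12\n", "\u09ea\u09e8", "12; rm -rf /", ""):
        print(repr(candidate), is_plain_ascii_number(candidate))
```

fullmatch is used rather than anchoring with ^ and $ because, in Python, $ also matches just before a trailing newline, which is exactly the kind of surprise this use case tries to avoid.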



Data validation



You have a bit of data that you are not going to use for anything "dangerous", but it would be nice to know if it's a number. For example, your program allows the user to input an address, and you want to highlight a possible typo if the input doesn't contain a house number. In this case, you probably want to be as broad as possible, so [[:digit:]] is the way to go.





Those would seem to be the three most common use cases for digit matching. If you think I missed an important one, please drop a comment.






  • Nice job. Is there a related security problem, such as ReDoS or others?
    – frams
    Jan 4 at 0:56















answered Jan 3 at 7:18









Bass
