Difference between [0-9], [[:digit:]] and \d

In the Wikipedia article on Regular expressions, it seems that [[:digit:]] = [0-9] = \d.

Under what circumstances are they not equal? What is the difference?

After some research, I think one difference is that a bracket expression such as [:expr:] is locale-dependent.

edited Jan 2 at 14:28

muru

35.2k582155

asked Jan 2 at 3:01

harbinn

32729

3

Doesn't the Wikipedia article that you linked to answer your question? Different regular expression processors/engines support different syntaxes for character classes (among other things).
– igal
Jan 2 at 3:34

@igal wiki says there is difference but doesn't give much detail. I'm asking the detail, something like isaac, thrig said. I'm pretty interested in their difference in grep, sed, awk... whether GNU version or not.
– harbinn
Jan 2 at 7:01

regular-expression wildcards

4 Answers

Yes, [[:digit:]] ~ [0-9] ~ \d (where ~ means approximately equal).

In most programming languages where both are supported, \d ≡ [[:digit:]] (identical).

\d is less widely available than [[:digit:]]: it is not in POSIX, but GNU grep -P supports it.

There are many digits in Unicode, for example:

0123456789 # Hindu-Arabic numerals (ASCII)
٠١٢٣٤٥٦٧٨٩ # ARABIC-INDIC
۰۱۲۳۴۵۶۷۸۹ # EXTENDED ARABIC-INDIC/PERSIAN
߀߁߂߃߄߅߆߇߈߉ # NKO DIGIT
०१२३४५६७८९ # DEVANAGARI

All of which may be included in [[:digit:]] or \d.

In contrast, [0-9] generally means only the ASCII digits 0123456789.

In many languages (Perl, Java, Python, C), [[:digit:]] (and \d) can take on an extended meaning. For example, this Perl code matches all the digits listed above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'



$ echo "$a" | perl -C -pe 's/[^\d]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

This is equivalent to selecting all characters in the Unicode general category Nd (Number, decimal digit):

$ echo "$a" | perl -C -pe 's/[^\p{Nd}]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

grep can reproduce this (the specific version of PCRE may have a different internal list of numeric code points than Perl):

$ echo "$a" | grep -oP '\p{Nd}+'

0123456789

٠١٢٣٤٥٦٧٨٩

۰۱۲۳۴۵۶۷۸۹

߀߁߂߃߄߅߆߇߈߉

०१२३४५६७८९

Change the pattern to [0-9] and only the ASCII digits match:

$ echo "$a" | grep -o '[0-9]\+'

0123456789
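
For comparison, the same experiment can be sketched in Python, one of the languages mentioned above. This is an illustration added here, not part of the original answer. It assumes Python 3's standard re module, where \d in a str pattern matches Unicode decimal digits. The names digit_zeros, sample, unicode_runs and ascii_runs are made up for the example, and the sample string is built from code points so that it does not depend on how the text was copied.

```python
import re
import unicodedata

# Code point of the digit zero in each script from the list above.
digit_zeros = {
    "ASCII": 0x0030,
    "ARABIC-INDIC": 0x0660,
    "EXTENDED ARABIC-INDIC": 0x06F0,
    "NKO": 0x07C0,
    "DEVANAGARI": 0x0966,
}

# Rebuild the equivalent of the shell variable $a: five runs of ten digits.
sample = " ".join(
    "".join(chr(zero + i) for i in range(10)) for zero in digit_zeros.values()
)

# \d in a str pattern matches decimal digits from any script.
unicode_runs = re.findall(r"\d+", sample)

# [0-9] is an explicit code point range, so only the ASCII run matches.
ascii_runs = re.findall(r"[0-9]+", sample)

# Check that everything \d matched here is in general category Nd.
all_nd = all(
    unicodedata.category(ch) == "Nd" for run in unicode_runs for ch in run
)

print(len(unicode_runs))  # 5
print(ascii_runs)         # ['0123456789']
print(all_nd)             # True
```

The split mirrors the grep runs above: the Unicode-aware class finds all five runs, while the explicit range finds one.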

POSIX

For the specific case of POSIX BRE or ERE:

\d is not supported (it is not in POSIX, but it is in GNU grep -P).
[[:digit:]] is required by POSIX to correspond to the digit character class, which in turn is required by ISO C to be the characters 0 through 9 and nothing else. So only in the C locale do [0-9], [0123456789], \d and [[:digit:]] all mean exactly the same thing. [0123456789] cannot be misinterpreted; [[:digit:]] is available in more utilities and commonly means only [0123456789]; \d is supported by few utilities.

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

shells

Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):

$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'

  ۹ ߀߁߂߃߄߅߆߇߈߉ ९

And that is a sure source of bugs waiting to happen.

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's \d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)\d' or grep -Po '(*UCP)[[:digit:]]' for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38


This depends on how you define a digit. [0-9] tends to be just the ASCII ones (or possibly something that is neither ASCII nor a superset of ASCII but has the same 10 digits with different bit representations, as in EBCDIC). \d, on the other hand, could either be just the plain digits (old versions of Perl, or modern versions of Perl with the /a regular expression flag enabled) or it could be a Unicode match of \p{Digit}, which is a rather larger set of digits than what [0-9] or /\d/a match.

$ perl -E 'say "match" if 42 =~ m/\d/'

match

$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/'

match

$ perl -E 'say "match" if "\N{U+09EA}" =~ m/\d/a'

$ perl -E 'say "match" if "\N{U+09EA}" =~ m/[0-9]/'

$

See perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.
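
As an example of "the language in question", here is a rough Python analogue of the Perl one-liners. It is a sketch, not something taken from the Perl documentation. It assumes Python 3's re module, where \d in a str pattern is Unicode-aware by default and the re.ASCII flag narrows it to [0-9], much as /a does in Perl. U+09EA is BENGALI DIGIT FOUR, the same character used above; the variable names are made up for the example.

```python
import re

bengali_four = "\u09ea"  # BENGALI DIGIT FOUR, as in the Perl examples

# Default str-pattern behaviour: \d is Unicode-aware.
default_match = re.search(r"\d", bengali_four) is not None

# re.ASCII restricts \d to the ASCII digits, comparable to Perl's /a.
ascii_match = re.search(r"\d", bengali_four, re.ASCII) is not None

# An explicit range never matches the Bengali digit.
range_match = re.search(r"[0-9]", bengali_four) is not None

print(default_match, ascii_match, range_match)  # True False False
```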

But wait, there's more! The locale may also vary what \d matches, so \d could match fewer digits than the complete Unicode set, and (hopefully, usually) still includes [0-9]. This is similar to the difference in C between isdigit(3) ([0-9]) and isnumber(3) ([0-9] plus whatever else from the locale).

There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'

4

$ perl -MUnicode::UCD=num -E 'say num("\N{U+09EA}")'

4

$
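
Python offers comparable calls, shown here as a small sketch rather than as an equivalent of Unicode::UCD. It assumes the standard unicodedata module and relies on the built-in int() accepting Unicode decimal digits; the variable names are made up for the example.

```python
import unicodedata

bengali_four = "\u09ea"  # BENGALI DIGIT FOUR

# Look up the decimal value recorded in the Unicode character database.
value_from_database = unicodedata.decimal(bengali_four)

# int() also accepts decimal digits from scripts other than ASCII.
value_from_int = int(bengali_four)

print(value_from_database, value_from_int)  # 4 4
```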

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is a specific limiter that reduces the list of Unicode digits to match: "…the /a modifier can be used to force \d to match just the ASCII 0 through 9." As such, it forces \d to match exactly the same characters as [0-9] and nothing else.
– Isaac
Jun 4 at 22:16


The different meanings of [0-9], [[:digit:]] and \d are presented in other answers. Here I would like to add the differences between regex engine implementations.

            [[:digit:]]    \d
grep -E          ✓          ×
grep -P          ✓          ✓
sed              ✓          ×
sed -E           ✓          ×

So [[:digit:]] always works, while \d depends on the tool. grep's manual mentions that [[:digit:]] is just 0-9 in the C locale.

PS1: If you know more, please expand the table.

PS2: GNU grep 3.1 and GNU sed 4.4 were used for testing.
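
One engine outside this table where [[:digit:]] does not behave as a digit class is Python's built-in re module. The sketch below is an added illustration, based on CPython's re as it behaves at the time of writing. It shows that re reads [[:digit:]] as an ordinary character set containing "[", ":", "d", "i", "g", "t", followed by a literal "]". The helper name matches is hypothetical, and the warnings filter is there because re may emit a FutureWarning about a possible nested set for this pattern.

```python
import re
import warnings


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    with warnings.catch_warnings():
        # Silence the "possible nested set" FutureWarning that re may
        # raise when it sees "[[" at the start of a character set.
        warnings.simplefilter("ignore")
        return re.search(pattern, text) is not None


# The POSIX class is not recognised, so a digit does not match it.
posix_class_on_digit = matches(r"[[:digit:]]", "5")

# \d is supported and matches the digit.
backslash_d_on_digit = matches(r"\d", "5")

# What the pattern really means in re: one of "[:digt" followed by "]".
posix_class_on_literal = matches(r"[[:digit:]]", "d]")

print(posix_class_on_digit, backslash_d_on_digit, posix_class_on_literal)
```

In Python code, \d (optionally with re.ASCII) or an explicit [0-9] is the portable choice.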

edited Jan 4 at 0:40

answered Jan 2 at 13:45

harbinn

32729

2

1) There are many versions of grep and sed, with the biggest difference probably between the GNU versions vs. others. This answer might be more useful if it mentioned which version of grep and sed it refers to. Or what the source of that table is, for that matter. 2) that table might as well be transcribed to text, since it doesn't contain anything that requires it to be an image
– ilkkachu
Jan 2 at 14:01

@ilkkachu 1) The latest GNU grep 3.1 and GNU 4.4 were used for testing. 2) I don't know how to create a table. It seems that @muru has converted the table to a pretty text form.
– harbinn
Jan 2 at 14:39

@harbinn Please edit that into your answer.
– Dan D.
Jan 3 at 4:56

@DanD. the version info added. thx for attention
– harbinn
Jan 4 at 0:43

1

Note that Python's built-in re module does not support [[:digit:]], but the add-on regex library does, so I would niggle a little at "always works". It always works in POSIX-compliant situations.
– Steve Barnes
Jan 5 at 19:08


The theoretical differences have already been pretty well explained in the other answers, so it remains to explain the practical differences.

Here are some of the more common use cases for matching a digit:

One-shot data extraction

Often, when you want to crunch some numbers, the numbers themselves are in an awkwardly formatted text file, and you want to extract them for use in your program. You can probably tell the number format (by looking at the file) and you know your current locale, so it's OK to use any of the forms, as long as it gets the job done. \d requires the fewest keystrokes, so it's very commonly used.

Input sanitizing

You have some untrusted user input (maybe from a web form), and you need to make certain it doesn't contain any surprises. Maybe you want to store it in a numeric field in a database, or use as a parameter to a shell command to run on a server. In this case, you really want [0-9], since it's the most restrictive and predictable one.

Data validation

You have a bit of data that you are not going to use for anything "dangerous", but it would be nice to know if it's a number. For example, your program allows the user to input an address, and you want to highlight a possible typo if the input doesn't contain a house number. In this case, you probably want to be as broad as possible, so [[:digit:]] is the way to go.
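
The contrast between strict sanitizing and lenient validation can be sketched in Python. This is only an illustration under assumptions: the function names is_safe_number and mentions_a_number are made up, and \d stands in for the broad class because Python's built-in re module has no [[:digit:]].

```python
import re

# Strict pattern for untrusted input: ASCII digits only.
ASCII_NUMBER = re.compile(r"[0-9]+")


def is_safe_number(text: str) -> bool:
    """Strict check: the whole string must consist of ASCII digits."""
    return ASCII_NUMBER.fullmatch(text) is not None


def mentions_a_number(text: str) -> bool:
    """Lenient check: the text contains a decimal digit from any script."""
    return re.search(r"\d", text) is not None


print(is_safe_number("42"))              # plain ASCII number
print(is_safe_number("\u0664\u0662"))    # Arabic-Indic digits are rejected
print(is_safe_number("42; rm -rf /"))    # trailing junk is rejected
print(mentions_a_number("Main Street \u0664\u0662"))
print(mentions_a_number("Main Street"))
```

fullmatch is used for the strict check so that the pattern has to cover the entire input, not just a prefix of it.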

Those would seem to be the three most common use cases for digit matching. If you think I missed an important one, please drop a comment.

answered Jan 3 at 7:18

Bass

21113

Nice job. Are there related security problems, such as ReDoS or others?
– frams
Jan 4 at 0:56

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f414226%2fdifference-between-0-9-digit-and-d%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
36
down vote

There are many digits in UNICODE, for example:

All of which may be included in [[:digit:]] or d.

Instead, [0-9] is generally only the ASCII digits 0123456789.

There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'



$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:

$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):

$ echo "$a" | grep -oP 'p{Nd}+'

0123456789

٠١٢٣٤٥٦٧٨٩

۰۱۲۳۴۵۶۷۸۹

߀߁߂߃߄߅߆߇߈߉

०१२३४५६७८९

Change it to [0-9] to see:

$ echo "$a" | grep -o '[0-9]+'

0123456789

POSIX

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

shells

Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):

$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'

  ۹ ߀߁߂߃߄߅߆߇߈߉ ९

And that is a sure source of bugs waiting to happen.

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38

add a comment |

up vote
36
down vote

There are many digits in UNICODE, for example:

All of which may be included in [[:digit:]] or d.

Instead, [0-9] is generally only the ASCII digits 0123456789.

There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'



$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:

$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):

$ echo "$a" | grep -oP 'p{Nd}+'

0123456789

٠١٢٣٤٥٦٧٨٩

۰۱۲۳۴۵۶۷۸۹

߀߁߂߃߄߅߆߇߈߉

०१२३४५६७८९

Change it to [0-9] to see:

$ echo "$a" | grep -o '[0-9]+'

0123456789

POSIX

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

shells

Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):

$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'

  ۹ ߀߁߂߃߄߅߆߇߈߉ ९

And that is a sure source of bugs waiting to happen.

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38

add a comment |

up vote
36
down vote

There are many digits in UNICODE, for example:

All of which may be included in [[:digit:]] or d.

Instead, [0-9] is generally only the ASCII digits 0123456789.

There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'



$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:

$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):

$ echo "$a" | grep -oP 'p{Nd}+'

0123456789

٠١٢٣٤٥٦٧٨٩

۰۱۲۳۴۵۶۷۸۹

߀߁߂߃߄߅߆߇߈߉

०१२३४५६७८९

Change it to [0-9] to see:

$ echo "$a" | grep -o '[0-9]+'

0123456789

POSIX

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

shells

Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):

$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'

  ۹ ߀߁߂߃߄߅߆߇߈߉ ९

And that is a sure source of bugs waiting to happen.

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

There are many digits in UNICODE, for example:

All of which may be included in [[:digit:]] or d.

Instead, [0-9] is generally only the ASCII digits 0123456789.

There are many languages: Perl, Java, Python, C. In which [[:digit:]] (and d) calls for an extended meaning. For example, this perl code will match all the digits from above:

$ a='0123456789 ٠١٢٣٤٥٦٧٨٩ ۰۱۲۳۴۵۶۷۸۹ ߀߁߂߃߄߅߆߇߈߉ ०१२३४५६७८९'



$ echo "$a" | perl -C -pe 's/[^d]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which is equivalent to select all characters that have the Unicode properties of Numeric and digits:

$ echo "$a" | perl -C -pe 's/[^p{Nd}]//g;' ; echo

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९

Which grep could reproduce (the specific version of pcre may have a diferent internal list of numeric code points than Perl):

$ echo "$a" | grep -oP 'p{Nd}+'

0123456789

٠١٢٣٤٥٦٧٨٩

۰۱۲۳۴۵۶۷۸۹

߀߁߂߃߄߅߆߇߈߉

०१२३४५६७८९

Change it to [0-9] to see:

$ echo "$a" | grep -o '[0-9]+'

0123456789

POSIX

As for [0-9], the meaning of range expressions is only defined by POSIX in the C locale; in other locales it might be different (might be codepoint order or collation order or something else).

shells

Some implementations may understand a range to be something different than plain ASCII order (ksh93 for example):

$ LC_ALL=en_US.utf8 ksh -c 'a="'"$a"'";echo "${a//[0-9]}"'

  ۹ ߀߁߂߃߄߅߆߇߈߉ ९

And that is a sure source of bugs waiting to happen.

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

edited Nov 23 at 21:57

answered Jan 2 at 3:44

Isaac

9,91111445

answered Jan 2 at 3:44

Isaac

9,91111445

answered Jan 2 at 3:44

Isaac

9,91111445

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38

add a comment |

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38

In practice on POSIX systems, iswctype() and BRE/ERE/wildcards in POSIX utilities, [0-9] and [[:digit:]] match on 0123456789 only. And that will be made explicit in the next revision of the standard
– Stéphane Chazelas
May 15 at 19:39

I wasn't aware that perl's d in Unicode mode matched on decimal digits from other scripts. Thanks for that. With PCRE, see (*UCP) as in GNU grep -Po '(*UCP)d' or grep -Po '(*UCP)[[:digit:]] for classes to be based on Unicode properties.
– Stéphane Chazelas
May 15 at 19:46

I agree that the [:digit:] syntax would suggest that you want to use localization, that is whatever the user considers as being a digit. I never use [:digit:] because in practice that's the same as [0-9] and in any case, invariably I want to match on 0123456789, I never mean to match on ٠١٢٣٤٥٦٧٨٩, and I can't think of a use case where one would want to match on a decimal digit in any script with POSIX utilities. See also the current discussion about [:blank:] on the zsh ML. Those character classes are a bit of a mess.
– Stéphane Chazelas
May 15 at 20:38

add a comment |

up vote
12
down vote

$ perl -E 'say "match" if 42 =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/a'

$ perl -E 'say "match" if "N{U+09EA}" =~ m/[0-9]/'

$

perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.

There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'

4

$ perl -MUnicode::UCD=num -E 'say num("N{U+09EA}")'

4

$

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is an specific limiter to reduce the list of Unicode digits to match only …the /a modifier can be used to force d to match just the ASCII 0 through 9.. As such, it is forcing to match exactly the same and only [0-9].
– Isaac
Jun 4 at 22:16

add a comment |

up vote
12
down vote

$ perl -E 'say "match" if 42 =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/a'

$ perl -E 'say "match" if "N{U+09EA}" =~ m/[0-9]/'

$

perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.

There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'

4

$ perl -MUnicode::UCD=num -E 'say num("N{U+09EA}")'

4

$

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is an specific limiter to reduce the list of Unicode digits to match only …the /a modifier can be used to force d to match just the ASCII 0 through 9.. As such, it is forcing to match exactly the same and only [0-9].
– Isaac
Jun 4 at 22:16

add a comment |

up vote
12
down vote

$ perl -E 'say "match" if 42 =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/a'

$ perl -E 'say "match" if "N{U+09EA}" =~ m/[0-9]/'

$

perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.

There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'

4

$ perl -MUnicode::UCD=num -E 'say num("N{U+09EA}")'

4

$

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

$ perl -E 'say "match" if 42 =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/'

match

$ perl -E 'say "match" if "N{U+09EA}" =~ m/d/a'

$ perl -E 'say "match" if "N{U+09EA}" =~ m/[0-9]/'

$

perldoc perlrecharclass for more information, or consult the documentation for the language in question to see how it behaves.

There may be calls that can be made to obtain the value of the digit, even if it is not [0-9]:

$ perl -MUnicode::UCD=num -E 'say num(4)'

4

$ perl -MUnicode::UCD=num -E 'say num("N{U+09EA}")'

4

$

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

edited Jan 2 at 15:18

answered Jan 2 at 3:42

thrig

23.8k12955

answered Jan 2 at 3:42

thrig

23.8k12955

answered Jan 2 at 3:42

thrig

23.8k12955

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is an specific limiter to reduce the list of Unicode digits to match only …the /a modifier can be used to force d to match just the ASCII 0 through 9.. As such, it is forcing to match exactly the same and only [0-9].
– Isaac
Jun 4 at 22:16

add a comment |

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is an specific limiter to reduce the list of Unicode digits to match only …the /a modifier can be used to force d to match just the ASCII 0 through 9.. As such, it is forcing to match exactly the same and only [0-9].
– Isaac
Jun 4 at 22:16

I think isnumber() is a BSD thing, at least based on the man page it seems so
– ilkkachu
Jan 2 at 18:06

I do have something of a BSD bias, yes
– thrig
Jan 2 at 19:18

The /a flag is an specific limiter to reduce the list of Unicode digits to match only …the /a modifier can be used to force d to match just the ASCII 0 through 9.. As such, it is forcing to match exactly the same and only [0-9].
– Isaac
Jun 4 at 22:16

add a comment |

up vote
4
down vote

Different meaning of [0-9], [[:digit:]] and d are presented in other answers. Here I would like to add differences in implementation of regex engine.

            [[:digit:]]    d

grep -E               ✓     ×

grep -P               ✓     ✓

sed                   ✓     ×

sed -E                ✓     ×

So [[:digit:]] always works, d depends. In grep's manual it's mentioned that [[:digit:]] is just 0-9 in the C locale.

PS1: If you know more, please expand the table.

PS2: GNU grep 3.1 and GNU 4.4 is used for test.

edited Jan 4 at 0:40

answered Jan 2 at 13:45

harbinn

32729

2

1) There are many versions of grep and sed, with the biggest difference probably between the GNU versions vs. others. This answer might be more useful if it mentioned which version of grep and sed it refers to. Or what the source of that table is, for that matter. 2) that table might as well be transcribed to text, since it doesn't contain anything that requires it to be an image
– ilkkachu
Jan 2 at 14:01

@ilkkachu 1) The latest GNU grep 3.1 and GNU sed 4.4 were used for the test. 2) I don't know how to create a table. It seems that @muru has converted the table to a pretty text form.
– harbinn
Jan 2 at 14:39

@harbinn Please edit that into your answer.
– Dan D.
Jan 3 at 4:56

@DanD. The version info has been added. Thanks for your attention.
– harbinn
Jan 4 at 0:43

Note that Python's built-in re module does not support [[:digit:]], but the add-on library regex does support it, so I would niggle a little at the "always works". It always works in POSIX-compliant situations.
– Steve Barnes
Jan 5 at 19:08
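
The point about Python's built-in re module can be checked with a small sketch. The helper name `matches` is made up for illustration, and the behaviour shown is what I would expect from the CPython versions I know of, where [[:digit:]] is read as an ordinary character set followed by a literal "]" rather than as a POSIX class:

```python
import re
import warnings


def matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches anywhere in `text` (hypothetical helper)."""
    # Recent Python versions emit a FutureWarning ("Possible nested set") for
    # a pattern such as [[:digit:]]; silence it to keep the demonstration quiet.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return re.search(pattern, text) is not None


# In `re`, [[:digit:]] is NOT a POSIX class. It is parsed as the set
# "[", ":", "d", "i", "g", "t" followed by a literal "]".
posix_style = r"[[:digit:]]"

print(matches(posix_style, "5"))   # False: the digit is not matched
print(matches(posix_style, "d]"))  # True: "d" from the set, then "]"
print(matches(r"[0-9]", "5"))      # True
print(matches(r"\d", "5"))         # True
```

So the pattern compiles without an error but quietly matches something else, which is arguably worse than being rejected outright.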

up vote
3
down vote

The theoretical differences have already been pretty well explained in the other answers, so it remains to explain the practical differences.

Here are some of the more common use cases for matching a digit:

One-shot data extraction

Input sanitizing

Data validation

Those would seem to be the three most common use cases for digit matching. If you think I missed an important one, please drop a comment.
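
As a rough illustration of why the choice matters for data validation, here is a minimal sketch in Python (chosen only as an example engine; the names `is_valid_number`, `ASCII_ONLY`, `UNICODE_DIGITS` and `ASCII_VIA_FLAG` are made up for this example). It assumes Python 3 str patterns, where \d matches any Unicode decimal digit unless re.ASCII is given:

```python
import re

ASCII_ONLY = re.compile(r"[0-9]+")
UNICODE_DIGITS = re.compile(r"\d+")            # any Unicode decimal digit
ASCII_VIA_FLAG = re.compile(r"\d+", re.ASCII)  # \d restricted to [0-9]


def is_valid_number(text: str, pattern: re.Pattern) -> bool:
    """Return True if the whole string consists of digits per `pattern`."""
    return pattern.fullmatch(text) is not None


# ARABIC-INDIC DIGIT ONE, TWO, THREE
arabic_indic = "\u0661\u0662\u0663"

print(is_valid_number("123", ASCII_ONLY))             # True
print(is_valid_number(arabic_indic, UNICODE_DIGITS))  # True
print(is_valid_number(arabic_indic, ASCII_ONLY))      # False
print(is_valid_number(arabic_indic, ASCII_VIA_FLAG))  # False
```

If whatever consumes the validated value only understands ASCII digits, the explicit [0-9] (or an ASCII-only flag) is probably the safer choice.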

answered Jan 3 at 7:18

Bass

Nice job. Are there related security problems, such as ReDoS or others?
– frams
Jan 4 at 0:56
