Remove all html special chars
Before saving user input to the database, I am using the function below to strip all HTML special characters. All my users will always be using the English language.
Option 1
    function clean($var) {
        $regEx = "/[^a-zA-Z0-9 -_]/";
        $var = preg_replace($regEx, "", $var);
        return $var;
    }
I want users to be able to store only:
- Letters (a to z) (Case Insensitive)
- Numbers (0 to 9)
- Space, Dash, Underscore
Is the above function good for this job, or should I be using a more efficient or built-in PHP function?
This is how I am using the function:
    $userInput = htmlspecialchars(clean($userInput));
Option 2
    function h($str_to_encode = "") {
        // preg_replace() replaces every HTML entity with an empty string
        return preg_replace("/&#?[a-z0-9]{2,8};/i", "", htmlspecialchars($str_to_encode));
    }
Regex from: https://stackoverflow.com/a/657670/4050261
php html regex comparative-review
edited Feb 2 '18 at 14:56 by 200_success
asked Feb 2 '18 at 9:56 by Adarsh
"All my users will always be using the English language." doesn't imply that only the letters from a-z are valid. Take a look at this: English words with diacritics.
– insertusernamehere
Feb 2 '18 at 10:32
You haven't told us much about why you're doing this, but I get the sense that this is almost certainly the wrong thing to do.
– 200_success
Feb 2 '18 at 15:03
@200_success, since it is a small project, want to reduce functionality for better security.
– Adarsh
Feb 2 '18 at 19:42
2 Answers
Your regex `[^a-zA-Z0-9 -_]` matches everything that is not `a` to `z`, `A` to `Z`, `0` to `9`, or space to `_`. That last range includes every character between hex 20 (space) and hex 5F (`_`), for example `!`, `"`, `#`, `$`, `%` and many others. In a character class, `-` must be escaped or placed at the beginning or at the end, like:

    [^a-zA-Z0-9 \-_]
    [^a-zA-Z0-9 _-]
    [^-a-zA-Z0-9 _]
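To see the effect of the unescaped hyphen, here is a minimal sketch that runs the original pattern and one corrected pattern over the same string. The function names `cleanBroken` and `cleanFixed` and the sample input are illustrative only, not from the question.

```php
<?php
// Original pattern: " -_" is parsed as a range from space (0x20) to "_" (0x5F),
// so punctuation such as ! < > falls inside the allowed set and is kept.
function cleanBroken($input)
{
    return preg_replace('/[^a-zA-Z0-9 -_]/', '', $input);
}

// Hyphen moved to the end of the class, where it is a literal character.
function cleanFixed($input)
{
    return preg_replace('/[^a-zA-Z0-9 _-]/', '', $input);
}

$sample = 'a<b>!{c}-d_e';

echo cleanBroken($sample), PHP_EOL; // prints "a<b>!c-d_e" (only { and } are removed)
echo cleanFixed($sample), PHP_EOL;  // prints "abc-d_e"
```

The broken version lets `<` and `>` through, which is exactly what the question was trying to prevent.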
That said, you can simplify a bit: `[a-zA-Z0-9_]` can be written as `\w` (depending on locale), so your regex becomes `[^\w -]`.
If you want to be Unicode compatible, use:

    [^\pL\pN_ -]

where `\pL` stands for any letter in any language and `\pN` for any numeric character.
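A minimal sketch of the Unicode-aware variant, assuming the input is UTF-8. The function name `cleanUnicode` is hypothetical, and the `u` modifier is an addition here: in PHP it is what makes PCRE treat the pattern and subject as UTF-8 rather than as single bytes.

```php
<?php
// Hypothetical helper: keeps letters and numbers from any script,
// plus underscore, space and hyphen.
// With the "u" modifier, preg_replace() returns null if $input is not valid UTF-8.
function cleanUnicode($input)
{
    return preg_replace('/[^\pL\pN_ -]+/u', '', $input);
}

// "\u{00E9}" is e-acute and "\u{00EF}" is i-diaeresis (PHP 7+ escape syntax).
$sample = "caf\u{00E9} na\u{00EF}ve <b>";

echo cleanUnicode($sample), PHP_EOL; // accented letters survive; < and > are removed
```

This addresses the point raised in the comments about English words with diacritics, at the cost of accepting letters from every script.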
answered Feb 2 '18 at 13:05 by Toto
You can also put the hyphen after a range or a shorthand character class: `[^a-z-A-Z0-9 _]`, `[^a-zA-Z-0-9 _]`, `[^\pL-\pN_ ]`. It's ugly, but possible...
– Casimir et Hippolyte
Feb 3 '18 at 20:52
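As a quick check of the first form mentioned in this comment, the sketch below uses an illustrative input string; only the pattern comes from the comment.

```php
<?php
// In "[^a-z-A-Z0-9 _]" the hyphen directly after the range a-z cannot start
// another range, so PCRE reads it as a literal "-".
$pattern = '/[^a-z-A-Z0-9 _]/';
$result = preg_replace($pattern, '', 'user-name_01!?');

echo $result, PHP_EOL; // prints "user-name_01"
```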
Similar to Toto's answer, I recommend:

    function clean($var) {
        return preg_replace("~[^\w -]+~", "", $var);
    }
This will replace all occurrences of one or more consecutive forbidden characters.
Adding the "one or more" (`+`) quantifier means longer potential matches and fewer total replacements. In other words, imagine a carton of a dozen eggs on the ground. If the task is to pick up 12 eggs, you could squat 12 times, picking them up one at a time, or squat once and pick up the carton.
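The difference in the number of replacements can be observed with the optional fifth argument of `preg_replace()`, which receives the replacement count. This is a small sketch with an illustrative input; the variable names are not from the answer.

```php
<?php
$input = 'a!!!b';

// Without "+": every forbidden character is a separate match.
$single = preg_replace('~[^\w -]~', '', $input, -1, $countSingle);

// With "+": a run of forbidden characters is a single match.
$grouped = preg_replace('~[^\w -]+~', '', $input, -1, $countGrouped);

var_dump($single === $grouped);                    // bool(true): same output either way
echo $countSingle, ' vs ', $countGrouped, PHP_EOL; // prints "3 vs 1"
```

The result is identical; only the amount of work per call changes.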
I have eliminated the unnecessary inclusion of "single-use variables" as there is no benefit in retaining them for readability.
Following this custom function call, the call to `htmlspecialchars()` is useless because there won't be any characters left to convert.
On the other hand, if you wanted to call `htmlspecialchars_decode()` prior to `clean()`, there is reasonable logic to that decision, but it depends on the input that you are expecting.
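Both points can be checked with a short sketch. The sample string is an assumption about what already-encoded input might look like; `clean()` is the function recommended above.

```php
<?php
function clean($var) {
    return preg_replace('~[^\w -]+~', '', $var);
}

$raw = 'Tom &amp; Jerry';

$cleaned = clean($raw);

// After clean() no &, <, > or quote remains, so htmlspecialchars() changes nothing.
var_dump(htmlspecialchars($cleaned) === $cleaned); // bool(true)

// Cleaning the raw text leaves the entity name "amp" behind as ordinary letters...
echo $cleaned, PHP_EOL; // prints "Tom amp Jerry"

// ...while decoding first turns "&amp;" into "&", which clean() then removes.
echo clean(htmlspecialchars_decode($raw)), PHP_EOL; // prints "Tom  Jerry" (two spaces)
```

Whether the decode step is worth it depends on whether encoded entities can reach this code at all.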
edited 11 mins ago
answered 16 mins ago by mickmackusa