What is the point in hashing a value?
I apologise if this is not the right place for this question... I didn't want to ask on Stack Overflow or Code Review, as it would be closed in minutes as "too broad".
A client of mine is writing an API that takes a piece of personally identifying information as the parameter in the URL, and has asked that the value be hashed using SHA512.
Normally when I deal with anything that involves personal information in this way, I'd encrypt it using a shared private key... but I'm really fuzzy on the whole idea of hashing.
My understanding of hashing was effectively...
- take the input value and create a hash from it
- when checking the value, create a hash from the new input value and compare them
What I'm struggling to understand (and I don't want to ask the client and show my ignorance) is how does the client take the hashed value and turn it back into the original input value?
And in particular, if the client can convert the hash back to the original input, what is stopping anybody else doing it?
And if other people can convert it back, what's the point in hashing it in the first place?
Update
After speaking to the client, the answer (as many of you guessed) is that they are storing the hash of the PII in their database, and doing a match-search against the value I will be sending through.
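The match-search described in the update can be sketched roughly as follows. This is only an illustration under assumptions: the identifiers, the `records_by_hash` table, and the `lookup` helper are made up for the example and are not the client's actual schema or API.

```python
import hashlib


def sha512_hex(value: str) -> str:
    """Return the SHA-512 digest of a string as 128 hex characters."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


# Hypothetical server-side table: stored digest -> record.
# The identifiers below are invented for the example.
records_by_hash = {
    sha512_hex("AB123456C"): {"customer_id": 1},
    sha512_hex("ZY987654A"): {"customer_id": 2},
}


def lookup(hashed_param: str):
    """Match the digest received in the URL against the stored digests."""
    return records_by_hash.get(hashed_param)


# The caller hashes the PII before sending it; the server never reverses it.
match = lookup(sha512_hex("AB123456C"))
miss = lookup(sha512_hex("AB123456D"))
```

Nothing is ever converted back: both sides compute the same digest from the same input and compare digests.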
hash encryption
4
What sized PII? If it's a social security number, then I would be concerned that there are only 1,000,000,000 social security numbers, and it would be trivial for an attacker to generate the SHA512 of all of them in a brute force attack.
– Cort Ammon
Dec 14 at 17:14
2
So the client can have what the client wants. That's why they're the client. But there may be some value in pushing back and pointing out that hashing only slows an attacker down by a few seconds, and questioning whether it does what is needed. (There's a good chance "what is needed" is to satisfy something a lawyer told them to do)
– Cort Ammon
Dec 14 at 17:40
3
Doing a quick check, SHA512 hashes can be computed at roughly 100M hashes/sec. That means 10 seconds to hash through the 1 billion SSNs (I don't know how many digits your UK code has). Those 10 seconds of work can also be done offline if need be, and the results probably stored on a thumb drive. Those things are getting so big these days!
– Cort Ammon
Dec 14 at 17:42
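The brute-force concern in the comment above can be illustrated with a small sketch. It assumes a made-up 5-digit identifier so that it runs quickly; the `brute_force` helper is hypothetical, and the 100M hashes/sec rate is simply the figure quoted in the comment, not a measurement.

```python
import hashlib


def sha512_hex(value: str) -> str:
    """Return the SHA-512 digest of a string as hex."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


def brute_force(target_hex: str, digits: int):
    """Try every zero-padded number with `digits` digits; return the match or None."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha512_hex(candidate) == target_hex:
            return candidate
    return None


# A made-up 5-digit identifier stands in for the real PII.
leaked = sha512_hex("04217")
recovered = brute_force(leaked, 5)

# Back-of-envelope time for the full 9-digit space at the quoted rate.
seconds_for_all_ssns = 10 ** 9 / 100_000_000
```

The hash is never inverted here; the input space is just small enough to enumerate.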
2
You are entirely right to think that there is a bad smell here. This sounds like a client who believes that hashing, crypto, and so on, are magic dust that you sprinkle throughout your program and it becomes "secure". Were I in this situation I would ask to see a threat model that clearly describes the resource protected, the threat it is protected from, and the reason why the proposed mitigation is appropriate.
– Eric Lippert
Dec 14 at 21:45
1
@CortAmmon, if implemented properly, the hash should be secure against a brute force attack. For example, some extra strings such as a 128-bit random salt can be appended to the SSN before hashing them together. Now that you have more than $10^{40}$ samples, there is hardly any chance to get back the original string even with the most powerful computer today in billions of years.
– Apass.Jack
Dec 15 at 0:32
edited yesterday
asked Dec 14 at 14:14
freefaller
3 Answers
14 votes, accepted
The purpose of a hash in this scenario is to be able to uniquely identify an entity. It's not strictly unique, only probabilistically unique.
Hashes are not reversible functions, so your client can't recover the data that was hashed. (It could be guessed by brute force, and perhaps by some known attacks on the hash if the type/format of the data is known, but in principle it is not reversible.)
So process A (a system or organization) can work with the facts about the entity uniquely identified by the hash without knowing the personal identity value. Process A can then pass the processed information back to process B, which knows the personal identity value and can combine the information from process A with the information it already has to carry out further processing.
The advantage in this case, from a security perspective, is that you have fewer security issues, as process A (system or organization) never has access to the personal identity value. Even if A is hacked, the personal data is safe*.
1
Thanks for your answer. Based on that, my best guess is that my client is going to hash/compare every value they have in their system to the hash I provide
– freefaller
Dec 14 at 14:47
Thanks for the increased explanation, that helps my understanding
– freefaller
Dec 14 at 14:58
Is there a footnote on its way for the last paragraph? The claim there has a big hole in it, demonstrated by several recent attacks in which attackers have intercepted the personal data after the user has entered it, before it's hashed.
– David Richerby
Dec 14 at 15:07
1
@DavidRicherby by that mark I meant that there are more security considerations/variables to take into account. That can't be known without looking at the whole scenario. Indeed, learning the personal data would take some form of attack, e.g. a dictionary attack. There are some techniques, like salting and combining several attributes, that could be used against it.
– Koenig Lear
Dec 14 at 15:27
3 votes
The whole point of (cryptographically secure) hashing is that you can't recover the original from the hash.
Technically, it's impossible anyway, since there are only $2^{512}$ different values the hash can take, but there are many more than $2^{512}$ possible documents. This means many different documents will produce the same hash value. However, $2^{512}$ is a huge number, so it's very difficult to generate a document that will have a specific hash value.
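As a small illustration of the counting argument above: a SHA-512 digest is always 64 bytes, however long the input is, so inputs must outnumber digests. The variable names are just for this sketch.

```python
import hashlib

# One byte of input and a million bytes of input...
short_digest = hashlib.sha512(b"a").digest()
long_digest = hashlib.sha512(b"a" * 1_000_000).digest()

# ...both produce exactly 64 bytes (512 bits) of output.
digest_bits = 8 * len(short_digest)
possible_values = 2 ** digest_bits
```

Since `possible_values` is finite and the set of possible documents is not, distinct documents sharing a digest must exist, even though nobody can feasibly find one for SHA-512.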
Thanks for your answer. Based on that, my best guess is that my client is going to hash/compare every value they have in their system to the hash I provide
– freefaller
Dec 14 at 14:47
2
@freefaller Sounds like it. Make sure you're not only hashing the data, though: unless you add a nonce, the effect is just to replace "If the password is MySecurePassword123, let them access the account" with "If the hashed password is 98349581747129874, let them access the account" and that's just a different plaintext password.
– David Richerby
Dec 14 at 14:50
Yes, absolutely agree. I'm trying to get hold of the client to discuss it, as it doesn't sound like the right approach to me, but they might have their reasons. Thanks again
– freefaller
Dec 14 at 14:52
1
@PatJ Fair point -- I changed it to just "possible documents".
– David Richerby
Dec 14 at 17:21
1
@freefaller They can store the hashes in the database along with the data, and index that. That will allow them to find the value that corresponds to the hash quickly.
– Barmar
Dec 15 at 3:12
1 vote
Besides being used to obfuscate sensitive information, hashes are also used in data lookups where the expected use case is random access, rather than sequential reading.
After all, while computers can often spread data out evenly if designed to, humans tend to generate some very clumpy data. For example, Smith is the most popular last name in the US, so finding a specific person with that name is harder than finding someone with a less common last name, if your data is indexed on last names.
The ideal hash algorithms for creating these types of hashmaps are not cryptographically secure; simply great at spreading data across the available key space, and fast to use. Any security, such as being unable to re-assemble the plaintext from the hash value, is secondary.
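Now, in this particular case, SHA-512 (part of the SHA-2 family) is one of the worst options to use for indexing. It does far more work per input than the hash functions designed for hashmaps, because it has to resist deliberate attack. In cryptographic hashing, that cost buys security... but it is wasted when creating hashmaps. (SHA-2 is not slow on purpose in the way password-hashing functions such as bcrypt are; it is simply much slower than a non-cryptographic hash.) The keyspace is also huge, well beyond what anyone would want to use in their index, and like all hash algorithms, SHA-512 does not guarantee to avoid all collisions, so the software will need to recognize collisions and still be able to serve up the appropriate data.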
That is, data indexed on a hash does not use the hash alone as its key. The hashmap's key includes the plaintext portion of the key, as well... The hash portion just gets you in close to the data that you want, and it's usually trivial to pick the exact data from there, rather than a sequential scan through millions of Smiths.
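A toy sketch of that idea, assuming a deliberately weak hash so that collisions are easy to trigger; the `BucketTable` class is hypothetical and is not how any particular database or language runtime implements its index.

```python
class BucketTable:
    """Toy hashmap: the hash picks a bucket, the plaintext key picks the entry."""

    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key: str) -> int:
        # Deliberately weak hash (sum of bytes) so collisions are easy to see.
        return sum(key.encode("utf-8")) % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str):
        # Only the one bucket is scanned, and the plaintext key decides the match.
        for stored_key, value in self.buckets[self._index(key)]:
            if stored_key == key:
                return value
        return None


table = BucketTable()
table.put("ab", 1)
table.put("ba", 2)  # same bytes, same sum, so the same bucket as "ab"
```

Both keys land in the same bucket, yet each lookup still returns the right value because the stored plaintext key is compared as well.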
From a security standpoint, this question raises a couple of "smells." SHA-2 is used primarily for message digests, as part of a step to make sure that a message has not been tampered with while in transit. (Of course, you need a way to keep the digest from being tampered with, as well, which is why digital signatures require public key cryptography, and unencrypted HMACs are often given through side channels.)
Using any hash without random salt opens your data up to rainbow table attacks... These are typically only useful for short plaintexts, such as passwords, social security numbers, telephone numbers, and names. Since the hashes are being shared in the URL, it sounds like your client is most definitely not using random salts.
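A minimal sketch of what a random salt changes, assuming a made-up 9-digit value; the `salted_sha512` helper and the choice to prepend the salt are illustrative, not a recommendation for the client's API.

```python
import hashlib
import os


def salted_sha512(value: str, salt: bytes) -> str:
    """Hash salt + value; the salt must be kept to recompute the digest later."""
    return hashlib.sha512(salt + value.encode("utf-8")).hexdigest()


# Two independent 128-bit salts for the same made-up value.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

digest_a = salted_sha512("123456789", salt_a)
digest_b = salted_sha512("123456789", salt_b)
unsalted = hashlib.sha512(b"123456789").hexdigest()
```

The same input now yields unrelated digests, so a table precomputed over unsalted values no longer matches. Note that whoever needs to verify a value also needs its salt, and an attacker who learns the salt can still enumerate a small input space.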
A random salt could be implemented in this case, but if done wrong would be susceptible to replay attacks. There is nothing sensitive about sharing the salt, just as there's nothing sensitive about the hash value itself, outside of brute force attempts. However, once a request has been made with a particular salt/plaintext combination, additional requests for that resource using that combination should be avoided.
The most secure way, though, and the current industry standard, is to do away with hashes entirely in the URL, and use a robust authentication and authorization system. Note that authentication and authorization are two distinct things, even though they're related, and should be treated as such. Use TLS. Redirect all non-secure HTTP requests to HTTPS, always. (Certs are free. There is no excuse. Even for your static-HTML-only blog; use TLS.)
And finally, if you need to get a resource where the URI shouldn't reveal anything about the content, use a nonce. Make sure you throw it away when it's been used... and make sure you've authenticated them AND checked their authorization when the user is using the nonce.
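One way to picture the single-use nonce described above is the sketch below. It assumes an in-memory store and a made-up resource identifier; the `NonceStore` class is hypothetical, and the authentication and authorization checks that must surround it are not shown.

```python
import secrets


class NonceStore:
    """Hands out random single-use tokens that map to a resource."""

    def __init__(self):
        self._pending = {}

    def issue(self, resource_id: str) -> str:
        # The token is random, so the URL reveals nothing about the content.
        nonce = secrets.token_urlsafe(32)
        self._pending[nonce] = resource_id
        return nonce

    def redeem(self, nonce: str):
        # pop() throws the nonce away on first use; later attempts get None.
        return self._pending.pop(nonce, None)


store = NonceStore()
token = store.issue("report-42")  # made-up resource identifier
first = store.redeem(token)
second = store.redeem(token)
```

Here `first` is the resource identifier and `second` is `None`, which is the "throw it away when it's been used" behaviour.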
You can't just sprinkle hashes around an application and call it secure.
Many thanks for your comments. One minor point, I'm not sure where you get your free certificates (other than self-signed ones), but outside academia we have to pay for each one of ours
– freefaller
Dec 15 at 8:46
1
@freefaller Let's Encrypt: letsencrypt.org
– Ghedipunk
yesterday
edited Dec 14 at 14:49
answered Dec 14 at 14:37
Koenig Lear
15613
15613
New contributor
New contributor
1
Thanks for your answer. Based on that, my best guess is that my client is going to hash/compare every value they have in their system to the hash I provide
– freefaller
Dec 14 at 14:47
Thanks for the increased explanation, that helps my understanding
– freefaller
Dec 14 at 14:58
Is there a footnote on its way for the last paragraph? The claim there has a big hole in it, demonstrated by several recent attacks in which attackers have intercepted the personal data after the user has entered it, before it's hashed.
– David Richerby
Dec 14 at 15:07
1
@DavidRicherby by that mark I meant that there are more security considerations/variables to take into account. That can't be known without looking at the whole scenario. Indeed knowing the personal data with be some form of attack e.g. a dictionary attack. There are some techniques like salting and and combining several attributes that could be used against it.
– Koenig Lear
Dec 14 at 15:27
up vote
3
down vote
The whole point of (cryptographically secure) hashing is that you can't recover the original from the hash.
Technically, it's impossible anyway, since there are only $2^{512}$ different values the hash can take, but there are many more than $2^{512}$ possible documents. This means many different documents will produce the same hash value. However, $2^{512}$ is a huge number, so it's very difficult to generate a document that will have a specific hash value.
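The counting argument can be checked directly. The sketch below, in Python, hashes a one-byte and a one-megabyte input (both arbitrary choices for illustration) and compares the number of possible 65-byte inputs with the number of possible digests.

```python
import hashlib

short_doc = b"a"
long_doc = b"a" * 1_000_000  # a much longer "document"

short_digest = hashlib.sha512(short_doc).digest()
long_digest = hashlib.sha512(long_doc).digest()

# Both digests are 64 bytes (512 bits), whatever the input length.
digest_bits = (len(short_digest) * 8, len(long_digest) * 8)

# Number of distinct 65-byte inputs vs. number of possible digests:
# more inputs than outputs, so some inputs must share a digest.
inputs_of_65_bytes = 256 ** 65
possible_digests = 2 ** 512
```

Since `256 ** 65` equals $2^{520}$, inputs of just 65 bytes already outnumber the digests, so collisions must exist even though nobody is expected to find one by chance.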
edited Dec 14 at 17:21
answered Dec 14 at 14:37
David Richerby
Thanks for your answer. Based on that, my best guess is that my client is going to hash/compare every value they have in their system to the hash I provide
– freefaller
Dec 14 at 14:47
2
@freefaller Sounds like it. Make sure you're not only hashing the data, though: unless you add a nonce, you're just replacing "If the password is MySecurePassword123, let them access the account" with "If the hashed password is 98349581747129874, let them access the account" and that's just a different plaintext password.
– David Richerby
Dec 14 at 14:50
Yes, absolutely agree. I'm trying to get hold of the client to discuss it, as it doesn't sound like the right approach to me, but they might have their reasons. Thanks again
– freefaller
Dec 14 at 14:52
1
@PatJ Fair point -- I changed it to just "possible documents".
– David Richerby
Dec 14 at 17:21
1
@freefaller They can store the hashes in the database along with the data, and index that. That will allow them to find the value that corresponds to the hash quickly.
– Barmar
Dec 15 at 3:12
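A rough sketch of that indexed lookup, using Python's built-in `sqlite3` module. The table name, column names and sample rows are hypothetical; the client's real schema is not known.

```python
import hashlib
import sqlite3

def sha512_hex(value: str) -> str:
    return hashlib.sha512(value.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (pii TEXT, pii_hash TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_people_hash ON people (pii_hash)")

# The hash is computed once, when the row is stored.
for pii, name in [("AB123456C", "Alice"), ("ZY987654A", "Bob")]:
    conn.execute(
        "INSERT INTO people VALUES (?, ?, ?)", (pii, sha512_hex(pii), name)
    )

def find_by_hash(incoming_hash: str):
    """Look up the row whose stored hash matches the hash from the URL."""
    return conn.execute(
        "SELECT pii, name FROM people WHERE pii_hash = ?", (incoming_hash,)
    ).fetchone()

match = find_by_hash(sha512_hex("AB123456C"))
```

With the hash stored and indexed, the server does not need to re-hash every value on each request; it does a single indexed equality match.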
up vote
1
down vote
Besides being used to obfuscate sensitive information, hashes are also used in data lookups where the expected use case is random access, rather than sequential reading.
After all, while computers can often spread data out evenly if designed to, humans tend to generate some very clumpy data. For example, Smith is the most popular last name in the US, so finding a specific person with that name is harder than finding someone with a less common last name, if your data is indexed on last names.
The ideal hash algorithms for creating these types of hashmaps are not cryptographically secure; they are simply fast to compute and good at spreading data evenly across the available key space. Any security, such as being unable to reassemble the plaintext from the hash value, is secondary.
Now, in this particular case, SHA-2 512 is one of the worst options to use. SHA-2 is far slower than the non-cryptographic hashes designed for hash tables; that cost buys collision and preimage resistance, which matters in cryptographic hashing but is wasted effort in a hashmap. The keyspace is also huge, well beyond what anyone would want to use in their index, and, like all hash algorithms, SHA-2 does not guarantee to avoid all collisions, so the software will need to recognize collisions and still be able to serve up the appropriate data.
That is, data indexed on a hash does not use the hash alone as its key. The hashmap's key includes the plaintext portion of the key, as well... The hash portion just gets you in close to the data that you want, and it's usually trivial to pick the exact data from there, rather than a sequential scan through millions of Smiths.
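As a rough illustration of a key made of "hash plus plaintext", here is a toy bucketed map in Python. The class and its method names are invented for this sketch; real hash maps, including Python's own `dict`, are considerably more refined.

```python
class BucketMap:
    """Toy hash map: the hash picks a bucket, the key picks the entry."""

    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # hash() is Python's built-in, non-cryptographic hash.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # same key: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly a collision)

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:      # plaintext comparison resolves collisions
                return value
        raise KeyError(key)

people = BucketMap(bucket_count=2)   # tiny table, so collisions are certain
for surname in ["Smith", "Jones", "Nguyen", "Garcia"]:
    people.put(surname, len(surname))
```

With four keys in two buckets at least two keys must collide, yet every lookup still returns the right value because the stored plaintext key is compared inside the bucket.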
From a security standpoint, this question raises a couple of "smells." SHA-2 is used primarily for message digests, as part of a step to make sure that a message has not been tampered with in transit. (Of course, you need a way to keep the digest from being tampered with, as well, which is why digital signatures require public key cryptography, and unencrypted HMACs are often given through side channels.)
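To make the tamper-detection point concrete, here is a small sketch using Python's standard `hmac` module. The key and message are placeholders; how the two sides come to share the key is outside the scope of the example.

```python
import hashlib
import hmac

shared_key = b"example-key-known-to-both-sides"   # hypothetical secret
message = b"amount=100&to=alice"

# Sender computes a keyed digest (HMAC-SHA512) over the message.
tag = hmac.new(shared_key, message, hashlib.sha512).hexdigest()

def is_authentic(received_message: bytes, received_tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(shared_key, received_message, hashlib.sha512).hexdigest()
    return hmac.compare_digest(expected, received_tag)

ok = is_authentic(message, tag)
tampered = is_authentic(b"amount=900&to=alice", tag)
```

A plain SHA-512 digest would not be enough here, because anyone who alters the message can recompute the digest as well. The key is what stops that.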
Using any hash without random salt opens your data up to rainbow table attacks... These are typically only useful for short plaintexts, such as passwords, social security numbers, telephone numbers, and names. Since the hashes are being shared in the URL, it sounds like your client is most definitely not using random salts.
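The sketch below shows the idea with a plain precomputed lookup table over hypothetical 4-digit codes. A real rainbow table uses hash chains to trade time for storage, so this is a simplification of the same attack, not an implementation of one.

```python
import hashlib
import secrets

def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()

# Attacker precomputes hashes for a small, guessable input space
# (here: hypothetical 4-digit codes).
precomputed = {sha512_hex(f"{n:04d}".encode()): f"{n:04d}" for n in range(10_000)}

victim_code = b"4821"
unsalted = sha512_hex(victim_code)
recovered = precomputed.get(unsalted)          # direct lookup succeeds

# With a random per-value salt the same table no longer matches.
salt = secrets.token_bytes(16)
salted = sha512_hex(salt + victim_code)
recovered_salted = precomputed.get(salted)     # not in the table
```

The salt only defeats precomputation. An attacker who learns the salt can still try all 10,000 codes against that one value, so a salt does not rescue a tiny input space on its own.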
A random salt could be implemented in this case, but if done wrong would be susceptible to replay attacks. There is nothing sensitive about sharing the salt, just as there's nothing sensitive about the hash value itself, outside of brute force attempts. However, once a request has been made with a particular salt/plaintext combination, additional requests for that resource using that combination should be avoided.
The most secure way, though, and the current industry standard, is to do away with hashes entirely in the URL, and use a robust authentication and authorization system. Note that authentication and authorization are two distinct things, even though they're related, and should be treated as such. Use TLS. Redirect all non-secure HTTP requests to HTTPS, always. (Certs are free. There is no excuse. Even for your static-HTML-only blog; use TLS.)
And finally, if you need to get a resource where the URI shouldn't reveal anything about the content, use a nonce. Make sure you throw it away once it's been used, and make sure you've authenticated the user AND checked their authorization when they present the nonce.
You can't just sprinkle hashes around an application and call it secure.
answered Dec 15 at 0:23
Ghedipunk
Many thanks for your comments. One minor point, I'm not sure where you get your free certificates (other than self-signed ones), but outside academia we have to pay for each one of ours
– freefaller
Dec 15 at 8:46
1
@freefaller Let's Encrypt: letsencrypt.org
– Ghedipunk
yesterday
4
What sized PII? If it's a social security number, then I would be concerned that there are only 1,000,000,000 social security numbers, and it would be trivial for an attacker to generate the SHA512 of all of them in a brute force attack.
– Cort Ammon
Dec 14 at 17:14
2
So the client can have what the client wants. That's why they're the client. But there may be some value in pushing back and pointing out that hashing only slows an attacker down by a few seconds, and questioning whether it does what is needed. (There's a good chance "what is needed" is to satisfy something a lawyer told them to do)
– Cort Ammon
Dec 14 at 17:40
3
Doing a quick check, SHA512 can be computed at roughly 100M hashes/sec. That means 10 seconds to hash through the 1 billion SSNs (I don't know how many digits your UK code has). Those 10 seconds can also be spent offline if need be, and the results probably stored on a thumb drive. Those things are getting so big these days!
– Cort Ammon
Dec 14 at 17:42
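The brute-force idea in this comment can be sketched as follows, scaled down to a hypothetical 6-digit identifier so that it finishes quickly in pure Python. The identifier format is an assumption for illustration only.

```python
import hashlib
from typing import Optional

def sha512_hex(value: str) -> str:
    return hashlib.sha512(value.encode("ascii")).hexdigest()

def brute_force(target_hash: str, digits: int) -> Optional[str]:
    """Try every zero-padded number with the given digit count."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha512_hex(candidate) == target_hash:
            return candidate
    return None

# Hypothetical 6-digit identifier; only its hash is "leaked".
leaked_hash = sha512_hex("042317")
found = brute_force(leaked_hash, digits=6)
```

Nothing is reversed here. The attacker simply hashes every possible input until one matches, which is practical whenever the set of valid inputs is small.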
2
You are entirely right to think that there is a bad smell here. This sounds like a client who believes that hashing, crypto, and so on, are magic dust that you sprinkle throughout your program and it becomes "secure". Were I in this situation I would ask to see a threat model that clearly describes the resource protected, the threat it is protected from, and the reason why the proposed mitigation is appropriate.
– Eric Lippert
Dec 14 at 21:45
1
@CortAmmon, if implemented properly, the hash should be secure against a brute force attack. For example, an extra string such as a 128-bit random salt can be appended to the SSN before hashing them together. Now that you have more than $10^{40}$ samples, there is hardly any chance of getting back the original string, even with the most powerful computer today, in billions of years.
– Apass.Jack
Dec 15 at 0:32
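The size of that search space can be worked out directly. The arithmetic below, in Python, reuses the figures quoted in this thread (one billion SSNs, roughly 100M hashes per second) and assumes the attacker does not know the salt; the second calculation shows what changes if the salt is known.

```python
ssn_count = 10 ** 9            # nine-digit numbers, the figure used in this thread
salt_count = 2 ** 128          # possible 128-bit salts
search_space = ssn_count * salt_count

hashes_per_second = 100_000_000        # rate quoted in an earlier comment
seconds_per_year = 365 * 24 * 3600

years_unknown_salt = search_space // (hashes_per_second * seconds_per_year)
# If the salt is known, only the SSNs need to be tried:
seconds_known_salt = ssn_count // hashes_per_second
```

The estimate of billions of years therefore holds only while the salt stays secret. If the salt is sent or stored alongside the hash, the search drops back to about ten seconds per value.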