Show others how I hear myself
So, I've been thinking about this. We all know that our own voice sounds different to us than it does to everyone else. It is easy to find out how others hear us: record yourself and listen to the recording.
But what about the other way around?
Is there a way to transform our voice so that others can hear us as we perceive our own voice? I find it quite an interesting question. Sadly, I couldn't find anything on the web after a couple of Google searches. Has nobody thought about this, or is it impossible for some reason that I'm not seeing?
Any leads on this would be appreciated :).
signal-analysis audio transform
asked 14 hours ago
Kevin Fiegenbaum
5
You could make the problem as easy as possible: make a recording of your speech that, when you listen to it through headphones, sounds the same as your speech sounds to you when you speak in an anechoic chamber. Not sure how to do that.
– Olli Niemitalo
14 hours ago
2
I just wanted to propose exactly that. However, is it really necessary to exclude the influence of the room? The directivity of your voice as a sound source is surely a factor, but I think this method will probably work quite well if the recording is done in the same place as where the "adjustment procedure" takes place.
– applesoup
14 hours ago
2 Answers
It is not impossible, but it is not going to be a walk in the park either.
What you would be trying to do is add to the voice signal those vibrations that are delivered to the ear via the bones and are not accessible to anyone else.
But doing this accurately is easier said than done.
Sound propagation through a medium depends very much on the medium's stiffness and density. Sound travels at ~1500 m/s in water, and with less dissipation than in air (~340 m/s). Bone is far stiffer than air, so sound should travel faster through bone. This means that "your" sound begins to excite your ears first, followed by the sound that you perceive via the "normal" air channel. In reality, bone has an internal structure that might affect the way different frequencies pass through it, but at the range of frequencies we are talking about, perhaps we can treat it as an equivalent solid. This can only be approximated, both because any attempt at measurement would have to be invasive and because hearing is subjective.
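To make the two-path idea concrete, here is a minimal sketch that mixes a slightly delayed air path with a low-passed "bone" path. Everything in it is an illustrative assumption rather than a measured property of anyone's skull: the function names, the one-pole low-pass standing in for bone, and the values of `air_delay`, `bone_gain` and `alpha`.

```python
def one_pole_lowpass(x, alpha):
    """Smooth x with y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def own_voice_sketch(x, air_delay=2, bone_gain=0.5, alpha=0.1):
    """Mix a delayed air path with a low-passed 'bone' path.

    air_delay is in samples; bone_gain and alpha are made-up values
    that you would have to tune by ear.
    """
    bone = one_pole_lowpass(x, alpha)                       # arrives first
    air = [0.0] * air_delay + list(x[:len(x) - air_delay])  # arrives later
    return [a + bone_gain * b for a, b in zip(air, bone)]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(own_voice_sketch(impulse, air_delay=1, bone_gain=1.0, alpha=0.5))
```

Feeding in an impulse shows the two contributions separately: the decaying tail is the assumed bone path, and the single delayed spike is the air path.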
Hearing, or the perception of sound, is a HUGE source of difficulty here. The outer ear (the visible bit), the canal and the inner mechanism work together in very complicated ways. This is the subject of psychoacoustics. One example of this complex processing is phantom tones, where the brain fills in things that are supposed to be there. The brain may also have developed ways of isolating the self-generated signal that are not yet accessible to us.
But a simplistic (simplistic!) way to witness the difference between hearing your own voice as the speaker and hearing it from outside is this:
Record a short and simple word (e.g. "Fishbone", a word that has both low frequencies (b, o, n) and high frequencies (F, sh, i, e)) with a bit of silence, and loop it through an equaliser into your headphones. Start playback and synchronise your own utterance of the word with the recording (so, something like "Fishbone...Fishbone...Fishbone..."). Now fiddle with the equaliser until what you hear and what you utter are reasonably similar.
At that point, the settings on the equaliser would represent the differences between the sound as recorded and the sound as you perceive it. Theoretically, any other speech passed through that equaliser would simulate how it arrives at your ears, as if you had generated it with a source inside your body.
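Once you have settled on equaliser settings, applying them to other speech is just filtering. As a rough sketch, here is a two-band equaliser in plain Python. The two-band split, the one-pole crossover and the name `two_band_eq` are assumptions made to keep the example short; a real graphic equaliser has many more bands and better filters.

```python
def two_band_eq(x, low_gain_db, high_gain_db, alpha=0.1):
    """Apply separate gains (in dB) to the low and high band of x.

    The bands are split with a one-pole low-pass (an assumption made
    for brevity); alpha in (0, 1] sets the crossover, larger is higher.
    """
    g_low = 10.0 ** (low_gain_db / 20.0)
    g_high = 10.0 ** (high_gain_db / 20.0)
    out, low = [], 0.0
    for sample in x:
        low += alpha * (sample - low)  # low band: smoothed signal
        high = sample - low            # high band: what the smoothing removed
        out.append(g_low * low + g_high * high)
    return out


if __name__ == "__main__":
    # Hypothetical settings found by ear: boost lows by 6 dB, cut highs by 3 dB.
    clip = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    print(two_band_eq(clip, low_gain_db=6.0, high_gain_db=-3.0))
```

With both gains at 0 dB the two bands add back up to the input, so the function only changes the sound by as much as the settings ask for.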
Hope this helps.
answered 13 hours ago
A_A
1
It's probably impossible due to individual differences in perception and the impossibility of quantifying that subjectivity. Yet the differences could be minor, much as every 1000 uF capacitor produced is actually slightly different...
– Fat32
6 hours ago
1
@Fat32 I could not decide on the impossibility because technically it could be possible to quantify / measure the contribution of the second channel, which is established through the bones, and via reasonable assumptions come up with some approximation. It is like asking what a medical condition feels like, which is totally different from the "patient" perspective. That would be a better approximation than just EQ. But at the point of perception, yes, right now it would be impossible to suggest the definitive "filter" that would transform the sound clip as requested.
– A_A
5 hours ago
Restated another way: given that the exact same physical stimulus is created at the cochleas of two distinct individuals, they will (probably) have two different perceptions, and what they actually hear is (afaik) a self-experience that is closed to external inquiry of any sort, mathematical included... That being said, the fact that humans can communicate acoustically is a result of the discrete nature of language.
– Fat32
4 hours ago
The most practical attempt that I am aware of is by Won and Berger (2005). They simultaneously recorded vocalizations at the mouth with a microphone and on the skull with a homemade vibrometer. They then estimated the relevant transfer functions with linear predictive coding and cepstral smoothing.
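To give a feel for what "estimating a transfer function" between the two recordings means, here is a deliberately simpler stand-in: a plain ratio of DFT magnitudes, bin by bin. This is not the LPC and cepstral-smoothing procedure from the paper, and the names `dft`, `magnitude_ratio`, `mic` and `skull` are my own assumptions for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, fine for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_ratio(mic, skull, floor=1e-12):
    """Per-bin |skull| / |mic|: a crude estimate of the relative gain
    of the skull recording with respect to the microphone recording."""
    if len(mic) != len(skull):
        raise ValueError("both recordings must have the same length")
    mic_spec, skull_spec = dft(mic), dft(skull)
    # floor guards against dividing by bins where the mic has no energy
    return [abs(s) / max(abs(m), floor) for m, s in zip(mic_spec, skull_spec)]


if __name__ == "__main__":
    mic = [1.0, 0.0, 0.0, 0.0]    # toy "microphone" frame
    skull = [0.5, 0.5, 0.0, 0.0]  # toy "skull" frame: a two-tap average
    print(magnitude_ratio(mic, skull))
```

In the toy frames the "skull" signal is a two-tap average of the "microphone" signal, so the ratio falls from 1 at the lowest bin towards 0 at the highest. A real estimate would average over many frames and smooth the result.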
answered 10 hours ago
StrongBad