How should I approach reverse engineering this text encoding?


























So I'm trying to hack the translation from the PS4 version of a game into the Vita version. The script files were conveniently uncompressed, and I was able to drop them in and have it working without a hitch - great!



However, various other message files and quest summaries and the like are not so convenient.



Here's a comparison of two files in a hex editor:
[image: hex editor comparison of the two files]



At first I thought it was a simple byte-pair compression, but there's obviously more to it than that, because if you look at the places which correspond to "Let's go!" and "Let's do this!" they don't start with the same string of characters at all.



I uploaded the PS4/Vita versions of a file with rather more readable text:
https://www.dropbox.com/s/bw0nvexyi9ww2be/hm%20vita?dl=0
https://www.dropbox.com/s/sk74zadvndc8v9t/hm%20ps4?dl=0



Going through it and looking for common and recurring words, I found this:



Goblin Thief
´0Ö0ê0ó0·0ü0Õ0
B430 D630 EA30 F330 B730 FC30 D530

Goblin Thief Archer
´0Ö0ê0ó0·0ü0Õ0¢0ü0Á0ã0ü0
B430 D630 EA30 F330 B730 FC30 D530 A230 FC30 C130 E330 FC30

Ancient Grief
¨0ó0·0§0ó0È0°0ê0ü0Õ0
A830 F330 B730 A730 F330 C830 B030 EA30 FC30 D530

Grief Screamer
°0ê0ü0Õ0¹0¯0ê0ü0Þ0ü0
B030 EA30 FC30 D530 B930 AF30 EA30 FC30 DE30 FC30

Grief
EA30 FC30 D530

Thief
B730 FC30 D530


So you can see that FC30 D530 is "ief".
But then I look for more occurrences of "gr":



Deep Grudge
Ç0£0ü0×0°0é0Ã0¸0
C730 A330 FC30 D730 B030 E930 C330 B830


And you don't see the EA30 that starts off "Grief".



I have a feeling FC30 could be some kind of switch byte, either an upper case indication or possibly marking the use of some kind of lookup table? It's also interesting that all the lines which are just objectives/boss names have the --30 structure, but some of the descriptive passages don't seem to.



The additional problem, of course, is that the uncompressed English text from the PS4 version won't be a 100% perfect match for the text from the Vita version -- that's the whole point, after all! So even when I look at the name "Deep Grudge" and notice that I don't see anything which looks like it would correspond to the "Gr" from "Grief", I can't be certain that they didn't change the name in the PS4 version.



Does anyone have any suggestions on how I should be approaching this? If I'm right and there's actually some kind of compressed lookup table business going on, then it might be effectively impossible to reverse, right?










  • I can't help but notice Grief Screamer has an extra B030 at the start that isn't present in the Grief line
    – corsiKa
    Dec 10 at 16:15
















encodings






asked Dec 10 at 12:10









Celandine Crane

1 Answer
The way you can learn this encoding is to study Japanese. :) From the first line of your text file diff, we can see that this on the left:



1131 3210 3130 3330 1021 0000 0000 0000  .12.1030.!......


Translates to this on the right:



1100 3100 3200 1000 3100 3000 3300 3000  ..1.2....1.0.3.0.
1000 01ff 0000 0000 0000 0000 0000 0000 .................


This is a very strong hint that the file on the left is using an 8-bit encoding and that the one on the right is using a 16-bit encoding. The digits are translated in a very straightforward way, but the exclamation point ("!") is 0x21 in ASCII or UTF-8, whereas the bytes 01 ff on the right are U+FF01, the Unicode "fullwidth exclamation mark", stored low byte first. In other words, the file on the right is UTF-16 in little-endian byte order (UTF-16LE), which for characters like these is the same thing as little-endian UCS-2.
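If you want to check that reading yourself, here is a minimal sketch in Python. The byte strings are copied from the dumps above; the variable names are just mine, and the choice of "utf-16-le" is the assumption being tested.

```python
import unicodedata

# Bytes copied from the right-hand (16-bit) dump: the digits "1" and "2",
# and the two bytes that stand where "!" appears in the left-hand file.
digits_16bit = bytes.fromhex("3100 3200")
bang_16bit = bytes.fromhex("01ff")

# Assumption under test: the 16-bit file is UTF-16 with the low byte first.
digits = digits_16bit.decode("utf-16-le")
bang = bang_16bit.decode("utf-16-le")

print(digits)  # 12
print(f"U+{ord(bang):04X}", unicodedata.name(bang))

# The same characters in the 8-bit file are plain ASCII bytes.
print(bytes.fromhex("3132 21").decode("ascii"))
```

If the byte order were the other way around, 01 ff would decode to U+01FF instead, which is a Latin letter and makes no sense in this context.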



So with the hint about encoding we can see that the hex you've identified as "Goblin Thief"



b430 d630 ea30 f330 b730 fc30 d530


is rendered as ゴブリンシーフ which is the Katakana representation of the English words "Goblin Thief". It's common for Japanese speakers to render foreign words into phonetic equivalents with Katakana. So taking it syllable-by-syllable we have:



ゴ  go
ブ bu
リ ri
ン n
シ shi
ー (extended vowel)
フ fu


So if we say it aloud, "go bu ri n shi-i fu" sounds like "Goblin Thief" as pronounced by a native Japanese speaker. You can try Google Translate to experiment a bit more and to see and hear what this sounds like.
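The same decoding can be applied to the other hex strings from the question. This is only a sketch: the helper name decode_utf16le_dump is made up for illustration, and it assumes the strings really are UTF-16LE as argued above.

```python
def decode_utf16le_dump(hex_words: str) -> str:
    """Decode hex bytes, written as they appear in a hex editor, as UTF-16LE."""
    return bytes.fromhex(hex_words).decode("utf-16-le")


# Hex strings copied from the question.
samples = {
    "Goblin Thief": "B430 D630 EA30 F330 B730 FC30 D530",
    "Ancient Grief": "A830 F330 B730 A730 F330 C830 B030 EA30 FC30 D530",
    "Deep Grudge": "C730 A330 FC30 D730 B030 E930 C330 B830",
}

decoded = {name: decode_utf16le_dump(h) for name, h in samples.items()}
for name, text in decoded.items():
    print(f"{name}: {text}")
```

Note that FC30 is simply U+30FC, the long-vowel mark ー, which is why it turns up all over the place. It is not a switch byte or a lookup-table marker.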






  • Can you switch around your hex words to the correct endianness 30b4 30d6 ..? That way they form correct Unicode values for Japanese.
    – usr2564301
    Dec 10 at 14:21






  • As noted in the answer, these are Unicode values that just happen to be encoded as little-endian UCS-2, which for these characters is the same as UTF-16LE
    – Edward
    Dec 10 at 14:27






  • Wait, this is the bloody Japanese file? That's odd. I need to dig through the package and see if I can track down the actual English one then... I know for a fact that some of the files I extracted are the English ones because they have actual plaintext in them, which is why substituting them worked. Plus I'm 99% certain the package I have is from the SCEA version of the game. Hmmm. Thanks ever so much, that explains a lot that was driving me nuts!
    – Celandine Crane
    Dec 10 at 14:30








  • Follow-up note for anyone who was curious: the English text is actually stored in a completely separate file that's unique to the PS4 version of the game, and it is presumably read to override the Japanese strings which exist in the base files. God only knows if there's any way to convince the Vita build to read from there.
    – Celandine Crane
    Dec 10 at 20:07










  • In other words, there is no 'special encoding' to reverse engineer...
    – user202729
    Dec 13 at 8:01











answered Dec 10 at 14:03









Edward
