Combine two files into a single file with combined columns

I need to combine two files into a single file with all columns from both files.

Here are my example files.

File 1

chr   loc  T1  C1
chr1  100  2   3
chr1  200  3   4
chr2  100  1   4
chr2  400  3   1

File 2

chr   loc  T2  C2
chr1  100  1   2
chr1  300  4   1
chr2  100  7   5
chr2  500  1   9

The output file should look like this:

chr   loc  T1  C1  T2  C2
chr1  100  2   3   1   2
chr1  200  3   4   0   0
chr1  300  0   0   4   1
chr2  100  1   4   7   5
chr2  400  3   1   0   0
chr2  500  0   0   1   9

edited Nov 24 at 20:12 by Rui F Ribeiro
asked Jul 16 '15 at 20:52 by Naresh DJ

shell text-processing join

1 Answer (accepted)

join -a1 -a2 -e 0 -o 0,1.2,1.3,2.2,2.3 \
    <(sed -E 's/[[:space:]]+/_/' file1 | sort) \
    <(sed -E 's/[[:space:]]+/_/' file2 | sort) |
sed 's/_/ /' |
column -t |
LC_ALL=C sort

chr   loc  T1  C1  T2  C2
chr1  100  2   3   1   2
chr1  200  3   4   0   0
chr1  300  0   0   4   1
chr2  100  1   4   7   5
chr2  400  3   1   0   0
chr2  500  0   0   1   9

The trickiest part here is the reason for sed: join can only join on a single field, and here the join criterion is the first two fields. So we have to combine those fields into a single word: I replace the first sequence of whitespace with an underscore, so join sees chr1_100, chr1_200, etc.
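To see what that key-building step does on its own, here is a minimal sketch. The helper name make_key is made up for illustration, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do).

```shell
# Hypothetical helper: replace the first run of whitespace on each line
# with an underscore, so the first two fields become a single join key.
make_key() {
    sed -E 's/[[:space:]]+/_/'
}

printf 'chr1    100 2   3\n' | make_key
# prints: chr1_100 2   3
```

Only the first run of whitespace is replaced, so the remaining columns stay separate fields.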

join requires its input files to be sorted.

I use process substitution so that join can read the output of the sed | sort pipelines as if they were files.

Then another sed call to undo the combined field, and then column to make it pretty.

By default, join uses the first field of each file as the key field.

By default, join does an inner join: only keys present in both files are printed. The -a1 and -a2 options enable the full outer join we want. The -e option provides the default value for missing fields, and we need the -o option to specify which fields to print.
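The difference between the default inner join and the full outer join can be sketched on two tiny files. The file names left and right and their contents are invented for this illustration; only the join options come from the command above.

```shell
# Two tiny, already-sorted sample files (made up for illustration).
tmp=$(mktemp -d)
printf 'a 1\nb 2\n'   > "$tmp/left"
printf 'b 20\nc 30\n' > "$tmp/right"

# Default inner join: only the key present in both files survives.
inner=$(join "$tmp/left" "$tmp/right")

# Full outer join: -a1/-a2 keep unpaired lines, -o fixes the column
# layout, and -e 0 fills the fields that have no value.
outer=$(join -a1 -a2 -e 0 -o 0,1.2,2.2 "$tmp/left" "$tmp/right")

printf 'inner:\n%s\n\nouter:\n%s\n' "$inner" "$outer"
rm -rf "$tmp"
```

Here the inner join yields the single line "b 2 20", while the outer join also keeps "a 1 0" and "c 0 30", with 0 standing in for the missing side.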

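You can also use awk:

awk '
    # Build the two-field key for every line of both files.
    {key = $1 OFS $2}
    # First file: remember its two value columns.
    NR == FNR {f1[key] = $3; f2[key] = $4; next}
    # Second file, key not seen in file1: pad the file1 columns with zeros.
    !(key in f1) {print $1, $2, 0, 0, $3, $4; next}
    # Key in both files: print the merged line and mark it as handled.
    {print key, f1[key], f2[key], $3, $4; delete f1[key]}
    # Keys only in file1: pad the file2 columns with zeros.
    END {for (key in f1) print key, f1[key], f2[key], 0, 0}
' file1 file2 | LC_ALL=C sort

chr loc T1 C1 T2 C2
chr1 100 2 3 1 2
chr1 200 3 4 0 0
chr1 300 0 0 4 1
chr2 100 1 4 7 5
chr2 400 3 1 0 0
chr2 500 0 0 1 9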

edited Jul 17 '15 at 14:01
answered Jul 16 '15 at 21:13 by glenn jackman

join is still greek to me. I've never yet climbed that mountain.
– mikeserv
Jul 16 '15 at 21:15

Hi @glenn jackman, it's not giving the output you mentioned. Instead, it gives output like this: chr1 100 2 100 1 chr1 100 2 300 4 chr1 200 3 100 1 chr1 200 3 300 4 chr2 100 1 100 7 chr2 100 1 500 1 chr2 400 3 100 7 chr2 400 3 500 1 chr loc T1 loc T2
– Naresh DJ
Jul 16 '15 at 21:40

@NareshDJ - you're getting the header on the last line because of your locale; add LANG=C or LC_ALL=C before the last sort e.g. ... | LANG=C sort
– don_crissti
Jul 17 '15 at 15:36
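
The locale effect described in that comment can be checked with a small sketch. The two sample lines are taken from the question's data; the variable names are made up for illustration.

```shell
# Sample lines in the wrong order (header last), as in the comment above.
lines='chr1 100 2 3
chr loc T1 C1'

# In the C locale, sort compares bytes, and a space (0x20) sorts
# before the digit 1 (0x31), so "chr loc" comes before "chr1 100".
sorted=$(printf '%s\n' "$lines" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

With LC_ALL=C the header line comes out first. Other locales may ignore whitespace when collating, which is how the header can end up last.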
