Organizing data by a header
I have two CSV files. File 1 contains the header; file 2 contains the data.
File 1 format (file1.csv):
id,abc,xyz,aaa,bbb,ccc
File 2 format (file2.csv):
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
Now I have to match the data in the source column of file2.csv against the header of file1.csv, so that the output looks like this:
id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500
Tags: awk, sed, csv, text-formatting, gawk
Could you please expand the question: what is your approach so far? What have you tried, and where are you stuck? – maulinglawns, Oct 17 '18 at 12:39
Import the data into an SQL table, then export it in your format. – MichalSv, Oct 17 '18 at 13:29
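For what it's worth, that SQL route can be sketched with sqlite3. This is a minimal illustration, assuming the column set from file1.csv is fixed and known in advance; the conditional-aggregation pivot and the table name "data" are my own choices, not something given in the thread:

sqlite3 <<'EOF'
.mode csv
.headers on
.nullvalue null
.import file2.csv data
-- pivot: one MAX(CASE ...) per header column from file1.csv
SELECT id,
       MAX(CASE WHEN source='abc' THEN value END) AS abc,
       MAX(CASE WHEN source='xyz' THEN value END) AS xyz,
       MAX(CASE WHEN source='aaa' THEN value END) AS aaa,
       MAX(CASE WHEN source='bbb' THEN value END) AS bbb,
       MAX(CASE WHEN source='ccc' THEN value END) AS ccc
FROM data GROUP BY id;
EOF

In csv mode, .import creates the table from the file's header row, and .nullvalue null prints the missing cells as the literal null the question asks for.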
asked Oct 17 '18 at 12:10 by prathima; edited Oct 17 '18 at 13:28 by user314914
3 Answers
can't ... resist ... line noise ...
perl -F, -slE'if(@ARGV){say;@h=@F}elsif($.>1){$d{$F[0]}->@{@h}=($F[0],("null")x@h)unless$d{$F[0]};$d{$F[0]}{$F[1]}=$F[2]}close ARGV if eof}END{say$d{$_}->@{@h}for keys%d' -- -,=, file{1,2}.csv
or a (somewhat) more sensible one-liner:
perl -F, -lane '
if (@ARGV) {print; @sources = @F[1..$#F]} # the first file
elsif ($. > 1) { # the 2nd file, minus the header
$data{$F[0]}->@{@sources} = ("null") x @sources unless $data{$F[0]};
$data{$F[0]}{$F[1]} = $F[2];
}
close ARGV if eof; # reset $. for each new file
} END {
$, = ",";
print $_, $data{$_}->@{@sources} for sort keys %data
' file1.csv file2.csv
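(The unbalanced-looking braces in both one-liners are deliberate: -n wraps the code in while (<>) { ... }, so the stray } closes that implicit loop and the wrapper's own trailing brace closes the END block. That is my reading of the trick, not something spelled out in the answer.)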
or, as a standalone script, "combine.pl":
#!/usr/bin/env perl
use v5.22;
use Path::Tiny; # convenience module from CPAN
# read the header from the first file
my $file = path("file1.csv");
my @lines = $file->lines;
my $header = $lines[0];
chomp $header;
my @sources = split /,/, $header;
# read the data from the second file
$file = path("file2.csv");
chomp( @lines = $file->lines );
shift @lines; # ignore the header
my %data;
for my $line (@lines) {
my ($id, $source, $value) = split /,/, $line, 3;
if (not exists $data{$id}) {
# initialize the output data for a new id
$data{$id}->@{ @sources } = ($id, ("null") x scalar(@sources));
}
# and store this value
$data{$id}{$source} = $value;
}
# output the results
say $header;
$, = ",";
for my $id (sort {$a <=> $b} keys %data) {
say $data{$id}->@{@sources};
}
then run: perl combine.pl > output.csv
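(One portability note from me: the ->@{...} postfix-dereference hash slices used throughout were experimental before perl 5.24, and on 5.20/5.22 they additionally need use feature 'postderef', so run these with a reasonably recent perl.)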
answered Oct 17 '18 at 17:19 by glenn jackman (community wiki)
... and the inevitable awk proposal:
awk -F, '
function PRT() {printf "%d", ID # print first field: ID
for (i=2; i<=MAX; i++) printf ",%s",PF[i]?PF[i]:"null" # print populated fields in sequence, "null" if empty
printf ORS # line feed
}
NR==FNR {print # in first file; print header
for (MAX=n=split ($0, T); n; n--) COL[T[n]] = n # split header and find column No. per header field
next # no further processing of this line
}
FNR > 2 && # skip printout for first ID (in second line of file2)
$1 != ID {PRT() # print if new ID found
split ("", PF) # empty the print array
}
{ID = $1 # retain ID
PF[COL[$2]] = $3 # collect col values into respective column
}
END {PRT() # print last IDs record
}
' file[12] # the shell's pattern matching expands to the respective files
id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500
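A caveat worth adding (mine, not from the original answer): this script assumes all rows for a given id are grouped together in file2.csv, since a record is flushed as soon as a new id appears. If the ids may be interleaved, re-grouping the body of the file first restores that assumption, e.g.:

{ head -n1 file2.csv; tail -n +2 file2.csv | sort -t, -k1,1n; } > file2.sorted.csv

(file2.sorted.csv is just a hypothetical name for the re-grouped copy; the header is kept out of the sort.)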
answered Oct 17 '18 at 21:08 by RudiC
This is a classic "unmelt" (or "untidy") operation.
Using Miller (http://johnkerl.org/miller/doc) and starting from
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
you can run
mlr --csv reshape -s source,value then unsparsify input.csv
and get:
id,abc,xyz,aaa,bbb,ccc
1,100,200,,,
2,,,300,400,500
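Note that unsparsify fills the missing cells with empty strings rather than the literal null the question asks for. If memory serves, unsparsify also accepts a fill value, along the lines of the following (verify --fill-with against your mlr version's help output):

mlr --csv reshape -s source,value then unsparsify --fill-with "null" input.csv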
answered Dec 31 '18 at 10:11 by aborruso
add a comment |
it's a classical unmelt or untidy operation.
Using Miller (http://johnkerl.org/miller/doc) and starting from
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
you can run
mlr --csv reshape -s source,value then unsparsify input.csv
And have
id,abc,xyz,aaa,bbb,ccc
1,100,200,,,
2,,,300,400,500
add a comment |
it's a classical unmelt or untidy operation.
Using Miller (http://johnkerl.org/miller/doc) and starting from
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
you can run
mlr --csv reshape -s source,value then unsparsify input.csv
And have
id,abc,xyz,aaa,bbb,ccc
1,100,200,,,
2,,,300,400,500
add a comment |
it's a classical unmelt or untidy operation.
Using Miller (http://johnkerl.org/miller/doc) and starting from
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
you can run
mlr --csv reshape -s source,value then unsparsify input.csv
And have
id,abc,xyz,aaa,bbb,ccc
1,100,200,,,
2,,,300,400,500
it's a classical unmelt or untidy operation.
Using Miller (http://johnkerl.org/miller/doc) and starting from
id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500
you can run
mlr --csv reshape -s source,value then unsparsify input.csv
And have
id,abc,xyz,aaa,bbb,ccc
1,100,200,,,
2,,,300,400,500
answered Dec 31 '18 at 10:11
aborrusoaborruso
1619
1619
add a comment |
add a comment |