Organizing data by a header

I have two CSV files.

File 1 contains the header.
File 2 contains the data.

file 1 format: file1.csv

id,abc,xyz,aaa,bbb,ccc

file 2 format: file2.csv

id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500

Now I have to match the data in the source column of file2.csv against the header in file1.csv, and the output should look like this:

id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500

Tags: awk sed csv text-formatting gawk

asked Oct 17 '18 at 12:10 by prathima
edited Oct 17 '18 at 13:28 by user314914

  • Could you please expand the question with: What is your approach thus far? What have you tried, and where are you stuck?
    – maulinglawns
    Oct 17 '18 at 12:39

  • Import the data into an SQL table, then export it in your format (a sketch of this approach follows below).
    – MichalSv
    Oct 17 '18 at 13:29
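
For completeness, here is a minimal sketch of that SQL route using sqlite3, assuming it is available; the scratch table name (t) and the hard-coded column list are illustrative, not part of the original suggestion:

# Load file2.csv into a scratch in-memory SQLite table, then pivot the
# source/value pairs into columns. ".import" creates table "t" using the
# CSV header row for column names; ".nullvalue null" prints missing
# cells as the literal string "null".
sqlite3 <<'EOF'
.mode csv
.import file2.csv t
.headers on
.nullvalue null
SELECT id,
       MAX(CASE WHEN source = 'abc' THEN value END) AS abc,
       MAX(CASE WHEN source = 'xyz' THEN value END) AS xyz,
       MAX(CASE WHEN source = 'aaa' THEN value END) AS aaa,
       MAX(CASE WHEN source = 'bbb' THEN value END) AS bbb,
       MAX(CASE WHEN source = 'ccc' THEN value END) AS ccc
FROM t
GROUP BY id
ORDER BY id;
EOF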

3 Answers

can't ... resist ... line noise ...



perl -F, -slE'if(@ARGV){say;@h=@F}elsif($.>1){$d{$F[0]}->@{@h}=($F[0],("null")x@h)unless$d{$F[0]};$d{$F[0]}{$F[1]}=$F[2]}close ARGV if eof}END{say$d{$_}->@{@h}for keys%d' -- -,=, file{1,2}.csv


or a (somewhat) more sensible one-liner



perl -F, -lane '
    if (@ARGV) {print; @sources = @F[1..$#F]}    # the first file
    elsif ($. > 1) {                             # the 2nd file, minus the header
        $data{$F[0]}->@{@sources} = ("null") x @sources unless $data{$F[0]};
        $data{$F[0]}{$F[1]} = $F[2];
    }
    close ARGV if eof;                           # reset $. for each new file
} END {
    $, = ",";
    print $_, $data{$_}->@{@sources} for sort keys %data
' file1.csv file2.csv


or, this is "combine.pl"



#!/usr/bin/env perl
use v5.22;
use Path::Tiny;    # convenience module from CPAN

# read the header from the first file
my $file = path("file1.csv");
my @lines = $file->lines;
my $header = $lines[0];
chomp $header;
my @sources = split /,/, $header;

# read the data from the second file
$file = path("file2.csv");
chomp( @lines = $file->lines );
shift @lines;    # ignore the header
my %data;
for my $line (@lines) {
    my ($id, $source, $value) = split /,/, $line, 3;
    if (not exists $data{$id}) {
        # initialize the output data for a new id
        $data{$id}->@{ @sources } = ($id, ("null") x scalar(@sources));
    }
    # and store this value
    $data{$id}{$source} = $value;
}

# output the results
say $header;
$, = ",";
for my $id (sort {$a <=> $b} keys %data) {
    say $data{$id}->@{@sources};
}


then: perl combine.pl > output.csv

answered Oct 17 '18 at 17:19 by glenn jackman (community wiki)

    ... and the inevitable awk proposal:



awk -F, '
function PRT() {
    printf "%d", ID                                     # print first field: ID
    for (i=2; i<=MAX; i++) printf ",%s", PF[i] ? PF[i] : "null"   # print populated fields in sequence, "null" if empty
    printf ORS                                          # line feed
}

NR==FNR {                                               # in the first file: print the header
    print
    for (MAX=n=split ($0, T); n; n--) COL[T[n]] = n     # split the header and find the column No. per header field
    next                                                # no further processing of this line
}

FNR > 2 &&                                              # skip the printout for the first ID (in the second line of file2)
$1 != ID {
    PRT()                                               # print when a new ID is found
    split ("", PF)                                      # empty the print array
}

{
    ID = $1                                             # retain the ID
    PF[COL[$2]] = $3                                    # collect the values into their respective columns
}

END {
    PRT()                                               # print the last ID's record
}
' file[12]                                              # the shell's "pattern matching" expands the respective files
id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500

(Note that this relies on the rows in file2.csv being grouped by id, as in the sample: a record is printed whenever the id changes.)

answered Oct 17 '18 at 21:08 by RudiC

It's a classical unmelt or untidy operation.



      Using Miller (http://johnkerl.org/miller/doc) and starting from




      id,source,value
      1,abc,100
      1,xyz,200
      2,aaa,300
      2,bbb,400
      2,ccc,500



      you can run



      mlr --csv reshape -s source,value then unsparsify input.csv


and you get




      id,abc,xyz,aaa,bbb,ccc
      1,100,200,,,
      2,,,300,400,500
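
Note that unsparsify leaves the missing cells empty rather than filling them with the literal null shown in the question. Assuming a Miller version recent enough to ship the fill-empty verb, those cells can be filled in the same pipeline:

mlr --csv reshape -s source,value then unsparsify then fill-empty -v null input.csv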

answered Dec 31 '18 at 10:11 by aborruso