Unix script to join data from different rows into one single row

We have a large tsv file where the data for a single row are splitted into different rows with new line delimiter.

We need to join them together based on the tab counts.

For eg: If suppose the total tab count for a single row is 995 , the data is split in between as follows,

Row Number  Tab Count

Row 1       660

Row 2       0

Row 3       300         

Row 5       20

Row 6       15

Total       995

N.B The above row split is not consitent and varies.

I want to add the tab counts and once we get 995 as total , need to join the data from different rows into one single row.

We have the below command to join the lines based on new line delimiter.

paste -sd 'n' inputfile > output file

I want to know,

If we can get the tab counts for different rows

Add the tab counts to get sum of 995

Once the sum is achieved , whichever tab counts were added from those rows , need to be joined into one single row.

Please let me know if this can be achieved using shell script.

Thanks.!

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

It sounds like you are trying to repair a file with newlines in their data fields so that each line has a predefined number of columns. Is the data not properly quoted? If it is, it may still be a perfectly valid CSV file with tabs as field delimiters. CSV is allowed to contain newlines within quoted fields and a CSV parser will be able to handle the data.
– Kusalananda
Dec 11 at 6:43

This is a TSV file we are working on and the data is not quoted and only separated by tabs and terminating with new line
– v.rajan
Dec 11 at 6:46

Sorry, your description doesn't make any sense to me. Have you combined the raw data and the desired result into on file above? And what is the single row then supposed to look like?
– tink
Dec 11 at 7:39

add a comment |

We have a large tsv file where the data for a single row are splitted into different rows with new line delimiter.

We need to join them together based on the tab counts.

For eg: If suppose the total tab count for a single row is 995 , the data is split in between as follows,

Row Number  Tab Count

Row 1       660

Row 2       0

Row 3       300         

Row 5       20

Row 6       15

Total       995

N.B The above row split is not consitent and varies.

I want to add the tab counts and once we get 995 as total , need to join the data from different rows into one single row.

We have the below command to join the lines based on new line delimiter.

paste -sd 'n' inputfile > output file

I want to know,

If we can get the tab counts for different rows

Add the tab counts to get sum of 995

Once the sum is achieved , whichever tab counts were added from those rows , need to be joined into one single row.

Please let me know if this can be achieved using shell script.

Thanks.!

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

It sounds like you are trying to repair a file with newlines in their data fields so that each line has a predefined number of columns. Is the data not properly quoted? If it is, it may still be a perfectly valid CSV file with tabs as field delimiters. CSV is allowed to contain newlines within quoted fields and a CSV parser will be able to handle the data.
– Kusalananda
Dec 11 at 6:43

This is a TSV file we are working on and the data is not quoted and only separated by tabs and terminating with new line
– v.rajan
Dec 11 at 6:46

Sorry, your description doesn't make any sense to me. Have you combined the raw data and the desired result into on file above? And what is the single row then supposed to look like?
– tink
Dec 11 at 7:39

add a comment |

We have a large tsv file where the data for a single row are splitted into different rows with new line delimiter.

We need to join them together based on the tab counts.

For eg: If suppose the total tab count for a single row is 995 , the data is split in between as follows,

Row Number  Tab Count

Row 1       660

Row 2       0

Row 3       300         

Row 5       20

Row 6       15

Total       995

N.B The above row split is not consitent and varies.

I want to add the tab counts and once we get 995 as total , need to join the data from different rows into one single row.

We have the below command to join the lines based on new line delimiter.

paste -sd 'n' inputfile > output file

I want to know,

If we can get the tab counts for different rows

Add the tab counts to get sum of 995

Once the sum is achieved , whichever tab counts were added from those rows , need to be joined into one single row.

Please let me know if this can be achieved using shell script.

Thanks.!

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

We have a large tsv file where the data for a single row are splitted into different rows with new line delimiter.

We need to join them together based on the tab counts.

For eg: If suppose the total tab count for a single row is 995 , the data is split in between as follows,

Row Number  Tab Count

Row 1       660

Row 2       0

Row 3       300         

Row 5       20

Row 6       15

Total       995

N.B The above row split is not consitent and varies.

I want to add the tab counts and once we get 995 as total , need to join the data from different rows into one single row.

We have the below command to join the lines based on new line delimiter.

paste -sd 'n' inputfile > output file

I want to know,

If we can get the tab counts for different rows

Add the tab counts to get sum of 995

Once the sum is achieved , whichever tab counts were added from those rows , need to be joined into one single row.

Please let me know if this can be achieved using shell script.

Thanks.!

shell-script text-processing

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

edited Dec 11 at 10:47

Jeff Schaller

38.5k1053125

asked Dec 11 at 6:31

v.rajan

asked Dec 11 at 6:31

v.rajan

asked Dec 11 at 6:31

v.rajan

It sounds like you are trying to repair a file with newlines in their data fields so that each line has a predefined number of columns. Is the data not properly quoted? If it is, it may still be a perfectly valid CSV file with tabs as field delimiters. CSV is allowed to contain newlines within quoted fields and a CSV parser will be able to handle the data.
– Kusalananda
Dec 11 at 6:43

This is a TSV file we are working on and the data is not quoted and only separated by tabs and terminating with new line
– v.rajan
Dec 11 at 6:46

Sorry, your description doesn't make any sense to me. Have you combined the raw data and the desired result into on file above? And what is the single row then supposed to look like?
– tink
Dec 11 at 7:39

add a comment |

It sounds like you are trying to repair a file with newlines in their data fields so that each line has a predefined number of columns. Is the data not properly quoted? If it is, it may still be a perfectly valid CSV file with tabs as field delimiters. CSV is allowed to contain newlines within quoted fields and a CSV parser will be able to handle the data.
– Kusalananda
Dec 11 at 6:43

This is a TSV file we are working on and the data is not quoted and only separated by tabs and terminating with new line
– v.rajan
Dec 11 at 6:46

Sorry, your description doesn't make any sense to me. Have you combined the raw data and the desired result into on file above? And what is the single row then supposed to look like?
– tink
Dec 11 at 7:39

It sounds like you are trying to repair a file with newlines in their data fields so that each line has a predefined number of columns. Is the data not properly quoted? If it is, it may still be a perfectly valid CSV file with tabs as field delimiters. CSV is allowed to contain newlines within quoted fields and a CSV parser will be able to handle the data.
– Kusalananda
Dec 11 at 6:43

This is a TSV file we are working on and the data is not quoted and only separated by tabs and terminating with new line
– v.rajan
Dec 11 at 6:46

Sorry, your description doesn't make any sense to me. Have you combined the raw data and the desired result into on file above? And what is the single row then supposed to look like?
– tink
Dec 11 at 7:39

add a comment |

2 Answers
2

active

oldest

votes

As always with these type of questions, it would be better to correct the process that creates the data in the first place rather than appending a post-processing stage to the process. Having said that, here's what you can do.

$ cat file

1       2

3

1       2       3

1

2

3

$ awk -v w=3 -f script.awk file

1       2       3

1       2       3

1       2       3

This awk script will collect the tab-delimited fields from the input until a preset number of fields have been collected. It will then output these collected fields as its own line before continuing reading from the input.

The number of fields in the output is given by the value of w, which is passed on the command line as shown above. Note that this is the number of fields, not the number of tab characters.

BEGIN { OFS = FS = "t" }



function output_line () {

    # a function that outputs the nf elements in the array a

    # separated by OFS (tab) and terminated by ORS (newline)



    for (j = 1; j < nf; ++j)

        printf("%s%s", a[j], OFS)



    printf("%s%s", a[nf], ORS)

}



{

    # a:  an array of fields that we want to output together

    # nf: the length of that array



    # just add each field to the a array

    for (i = 1; i <= NF; ++i) {

        a[++nf] = $i



        # if enough has been read, output the collected data

        if (nf == w) {

            output_line()

            nf = 0

        }

    }

}



END {

    # output any data remaining in a

    if (nf > 0)

        output_line()

}

This is the same as

tr 't' 'n' <file | paste - - -

for my small example. In your case, you could use the awk script above with -v w=996, or you could type the tr+paste command with 996 dashes.

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

add a comment |

Would continued reading lines until reaching field count help? From another post:

awk -F't' '

        {while (NF<996) {getline X

                         $0 = $0 FS X

                        }

        }

1

' file

answered Dec 11 at 10:20

RudiC

3,9641312

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f487272%2funix-script-to-join-data-from-different-rows-into-one-single-row%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

$ cat file

1       2

3

1       2       3

1

2

3

$ awk -v w=3 -f script.awk file

1       2       3

1       2       3

1       2       3

The number of fields in the output is given by the value of w, which is passed on the command line as shown above. Note that this is the number of fields, not the number of tab characters.

BEGIN { OFS = FS = "t" }



function output_line () {

    # a function that outputs the nf elements in the array a

    # separated by OFS (tab) and terminated by ORS (newline)



    for (j = 1; j < nf; ++j)

        printf("%s%s", a[j], OFS)



    printf("%s%s", a[nf], ORS)

}



{

    # a:  an array of fields that we want to output together

    # nf: the length of that array



    # just add each field to the a array

    for (i = 1; i <= NF; ++i) {

        a[++nf] = $i



        # if enough has been read, output the collected data

        if (nf == w) {

            output_line()

            nf = 0

        }

    }

}



END {

    # output any data remaining in a

    if (nf > 0)

        output_line()

}

This is the same as

tr 't' 'n' <file | paste - - -

for my small example. In your case, you could use the awk script above with -v w=996, or you could type the tr+paste command with 996 dashes.

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

add a comment |

$ cat file

1       2

3

1       2       3

1

2

3

$ awk -v w=3 -f script.awk file

1       2       3

1       2       3

1       2       3

The number of fields in the output is given by the value of w, which is passed on the command line as shown above. Note that this is the number of fields, not the number of tab characters.

BEGIN { OFS = FS = "t" }



function output_line () {

    # a function that outputs the nf elements in the array a

    # separated by OFS (tab) and terminated by ORS (newline)



    for (j = 1; j < nf; ++j)

        printf("%s%s", a[j], OFS)



    printf("%s%s", a[nf], ORS)

}



{

    # a:  an array of fields that we want to output together

    # nf: the length of that array



    # just add each field to the a array

    for (i = 1; i <= NF; ++i) {

        a[++nf] = $i



        # if enough has been read, output the collected data

        if (nf == w) {

            output_line()

            nf = 0

        }

    }

}



END {

    # output any data remaining in a

    if (nf > 0)

        output_line()

}

This is the same as

tr 't' 'n' <file | paste - - -

for my small example. In your case, you could use the awk script above with -v w=996, or you could type the tr+paste command with 996 dashes.

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

add a comment |

$ cat file

1       2

3

1       2       3

1

2

3

$ awk -v w=3 -f script.awk file

1       2       3

1       2       3

1       2       3

The number of fields in the output is given by the value of w, which is passed on the command line as shown above. Note that this is the number of fields, not the number of tab characters.

BEGIN { OFS = FS = "t" }



function output_line () {

    # a function that outputs the nf elements in the array a

    # separated by OFS (tab) and terminated by ORS (newline)



    for (j = 1; j < nf; ++j)

        printf("%s%s", a[j], OFS)



    printf("%s%s", a[nf], ORS)

}



{

    # a:  an array of fields that we want to output together

    # nf: the length of that array



    # just add each field to the a array

    for (i = 1; i <= NF; ++i) {

        a[++nf] = $i



        # if enough has been read, output the collected data

        if (nf == w) {

            output_line()

            nf = 0

        }

    }

}



END {

    # output any data remaining in a

    if (nf > 0)

        output_line()

}

This is the same as

tr 't' 'n' <file | paste - - -

for my small example. In your case, you could use the awk script above with -v w=996, or you could type the tr+paste command with 996 dashes.

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

$ cat file

1       2

3

1       2       3

1

2

3

$ awk -v w=3 -f script.awk file

1       2       3

1       2       3

1       2       3

The number of fields in the output is given by the value of w, which is passed on the command line as shown above. Note that this is the number of fields, not the number of tab characters.

BEGIN { OFS = FS = "t" }



function output_line () {

    # a function that outputs the nf elements in the array a

    # separated by OFS (tab) and terminated by ORS (newline)



    for (j = 1; j < nf; ++j)

        printf("%s%s", a[j], OFS)



    printf("%s%s", a[nf], ORS)

}



{

    # a:  an array of fields that we want to output together

    # nf: the length of that array



    # just add each field to the a array

    for (i = 1; i <= NF; ++i) {

        a[++nf] = $i



        # if enough has been read, output the collected data

        if (nf == w) {

            output_line()

            nf = 0

        }

    }

}



END {

    # output any data remaining in a

    if (nf > 0)

        output_line()

}

This is the same as

tr 't' 'n' <file | paste - - -

for my small example. In your case, you could use the awk script above with -v w=996, or you could type the tr+paste command with 996 dashes.

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

edited Dec 11 at 8:11

answered Dec 11 at 7:38

Kusalananda

121k16228372

answered Dec 11 at 7:38

Kusalananda

121k16228372

answered Dec 11 at 7:38

Kusalananda

121k16228372

add a comment |

Would continued reading lines until reaching field count help? From another post:

awk -F't' '

        {while (NF<996) {getline X

                         $0 = $0 FS X

                        }

        }

1

' file

answered Dec 11 at 10:20

RudiC

3,9641312

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

add a comment |

Would continued reading lines until reaching field count help? From another post:

awk -F't' '

        {while (NF<996) {getline X

                         $0 = $0 FS X

                        }

        }

1

' file

answered Dec 11 at 10:20

RudiC

3,9641312

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

add a comment |

Would continued reading lines until reaching field count help? From another post:

awk -F't' '

        {while (NF<996) {getline X

                         $0 = $0 FS X

                        }

        }

1

' file

answered Dec 11 at 10:20

RudiC

3,9641312

Would continued reading lines until reaching field count help? From another post:

awk -F't' '

        {while (NF<996) {getline X

                         $0 = $0 FS X

                        }

        }

1

' file

answered Dec 11 at 10:20

RudiC

3,9641312

answered Dec 11 at 10:20

RudiC

3,9641312

answered Dec 11 at 10:20

RudiC

3,9641312

answered Dec 11 at 10:20

RudiC

3,9641312

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

add a comment |

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

The issue with this may be that you overshoot. It depends on whether the input data always contains a newline at the true record boundaries or not.
– Kusalananda
Dec 11 at 10:26

Absolutely. I take from the question that we have always an exact count match. Additional care needs to be taken if not.
– RudiC
Dec 11 at 10:33

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Unix & Linux Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrtjryk