Speed up script that determines if all columns in a row are the same or not

I need to speed up a script that essentially determines whether or not all the "columns" for each row are the same, then writes a new file containing either one of the identical elements, or a "no_match". The file is comma delimited, consists of around 15,000 rows, and contains varying numbers of "columns".

For example:

1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59

Writes a new file:

1-69
no_match
1-46
no_match
6-1
5-51
4-59

The second and fourth rows are replaced with no_match because they contain non-identical columns.

Here is my far from elegant script:

#!/bin/bash

ind=$1 #file in
num=`wc -l "$ind"|cut -d' ' -f1` #number of lines in 'file in'
echo "alleles" > same_alleles.txt #new file to write to

#loop over every line of 'file in'
for (( i =2; i <= "$num"; i++));do
    #take first column of row being looped over (string to check match of other columns with)
    match=`awk "FNR=="$i" {print}" "$ind"|cut -d, -f1`
    #counts how many matches there are in the looped row
    match_num=`awk "FNR=="$i" {print}" "$ind"|grep -o "$match"|wc -l|cut -d' ' -f1`
    #counts number of commas in each looped row
    comma_num=`awk "FNR=="$i" {print}" "$ind"|grep -o ","|wc -l|cut -d' ' -f1`
    #number of columns in each row
    tot_num=$((comma_num + 1))
    #writes one of the identical elements if all contents of row are identical, or writes "no_match" otherwise
    if [ "$tot_num" == "$match_num" ]; then
            echo $match >> same_alleles.txt
    else
            echo "no_match" >> same_alleles.txt
    fi
done

#END

Currently, the script takes around 11 min to do all ~15,000 rows. I'm not really sure how to speed this up (I'm honestly surprised I could even get it to work). Any time knocked off would be fantastic. Below is a smaller excerpt of 100 rows that could be used:

allele
4-39
1-46,1-46,1-46
4-39
4-4,4-4,4-4,4-4
3-23,3-23,3-23
3-21,3-21
4-34,4-34
3-33
4-4,4-4,4-4
4-59,4-59
3-23,3-23,3-23
1-45
1-46,1-46
3-23,3-23,3-23
4-61
1-8
3-7
4-4
4-59,4-59,4-59
1-18,1-18
3-21,3-21
3-23,3-23,3-23
3-23,3-23,3-23
3-30,3-30-3
4-39,4-39
4-61
2-70
4-38-2,4-38-2
1-69,1-69,1-69,1-69,1-69
1-69
4-59,4-59,4-59,4-61,4-61,4-61
1-46,1-46
4-59,4-59,4-59,4-61,4-61,4-61
6-1,6-1
5-51,5-51
4-59,4-59
1-18
3-7
1-69
4-30-4
4-39
1-69
1-69
4-39
3-23,3-23,3-23
4-39
2-5
3-30-3
4-59,4-59,4-59
3-21,3-21
4-59,4-59
3-9
4-59,4-59,4-59
4-31,4-31
1-46,1-46
1-46,1-46,1-46
5-51,5-51
3-48
4-31,4-31
3-7
4-61
4-59,4-59,4-59,4-61,4-61,4-61
4-38-2,4-38-2
3-21,3-21
1-69,1-69,1-69
3-23,3-23,3-23
4-59,4-59
3-48
3-48
1-46,1-46
3-23,3-23,3-23
3-30-3,3-30-3
1-46,1-46,1-46
3-64
3-73,3-73
4-4
1-18
3-7
1-46,1-46
1-3
4-61
2-70
4-59,4-59
5-51,5-51
3-49,3-49
4-4,4-4,4-4
4-31,4-31
1-69
1-69,1-69,1-69
4-39
3-21,3-21
3-33
3-9
3-48
4-59,4-59
4-59,4-59
4-39,4-39
3-21,3-21
1-18

My script takes ~ 7 sec to complete this.

edited Dec 16 at 4:29

Rui F Ribeiro

asked Aug 6 at 19:11

Johnny

shell-script text-processing scripting arithmetic

4 Answers

$ awk -F, '{ for (i=2; i<=NF; ++i) if ($i != $1) { print "no_match"; next } print $1 }' file

1-69
no_match
1-46
no_match
6-1
5-51
4-59

I'm sorry, but I did not even look at your code, there was too much going on. When you find yourself calling awk three times in the body of a loop on the same data, you will have to look at other ways to do it more efficiently. Also, if you involve awk, you don't need grep and cut as awk would easily be able to do their tasks (which are not needed in this case though).

The awk script above reads a comma-delimited line at a time and compares each field with the first field. If any of the tests fails, the string no_match is printed and the script continues with the next line. If the loop finishes (without finding a mismatch), the first field is printed.

As a script:

#!/usr/bin/awk -f

BEGIN { FS = "," }

{
    for (i=2; i<=NF; ++i)
        if ($i != $1) {
            print "no_match"
            next
        }

    print $1
}

FS is the input field separator, also settable with the -F option on the command line. awk will split each line on this character to create the fields.

NF is the number of fields in the current record ("columns on the line").

$i refers to the i:th field in the current record, where i may be a variable or a constant (as in $1).
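
To make FS, NF and $i concrete, here is a small sketch that prints what awk sees for one record. The function name show_fields is made up for this illustration and is not part of the answer's script.

```shell
# Print the field count, the first field, and the last field of each record.
# show_fields is a hypothetical helper name used only for this illustration.
show_fields() {
    awk -F, '{ printf "NF=%d first=%s last=%s\n", NF, $1, $NF }'
}

printf '4-59,4-59,4-61\n' | show_fields
# prints: NF=3 first=4-59 last=4-61
```

Here -F, plays the same role as FS = "," in the BEGIN block, and $NF is simply $i with i set to the number of fields.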

See also: Why is using a shell loop to process text considered bad practice?

DRY variation:

#!/usr/bin/awk -f

BEGIN { FS = "," }

{
    output = $1

    for (i=2; i<=NF; ++i)
        if ($i != output) {
            output = "no_match"
            break
        }

    print output
}
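
The script in the question skips the first line of its input and writes an "alleles" header to same_alleles.txt. If that behaviour is wanted, a possible sketch is to wrap the DRY variation in a shell function. The function name same_alleles and the header handling are assumptions based on the question's script, not something the answer above specifies.

```shell
# Sketch: one awk pass over the file, replacing the first line with "alleles".
# same_alleles is a hypothetical name; $1 is the input file, as in the question.
same_alleles() {
    awk -F, '
        NR == 1 { print "alleles"; next }   # header line, as written by the question script
        {
            output = $1
            for (i = 2; i <= NF; ++i)
                if ($i != output) {
                    output = "no_match"
                    break
                }
            print output
        }
    ' "$1"
}

# Usage (file names are placeholders):
#   same_alleles input.csv > same_alleles.txt
```

Because awk reads the file once, the work grows linearly with the number of rows, instead of starting three awk processes per row.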

edited Aug 6 at 19:56

answered Aug 6 at 19:15

Kusalananda

Awk is a full programming language. You already use it. But don't use it just for simple tasks with multiple invocations per line, use it for the whole task. Use the field delimiter in awk, don't use cut. Do the full processing in awk.

awk -F',' '
{
  eq=1;
  for (i = 2; i <= NF; i++)
    if ($1 != $i)
      eq=0;
  print eq ? $1 : "no_match";
}
' "$1"
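
One side effect of comparing whole fields is worth a sketch. As far as I can tell, the grep -o count in the question counts substring hits, so a row such as 3-30,3-30-3 (which appears in the sample excerpt) would be reported as identical, while a field-by-field comparison reports it as different. The two helper names below are made up for this illustration.

```shell
# Count substring hits of the first field, the way a grep -o pipeline does.
count_substring_hits() {
    first=${1%%,*}                      # text before the first comma
    printf '%s\n' "$1" | grep -o -e "$first" | wc -l | tr -d ' '
}

# Count fields that are exactly equal to the first field.
count_equal_fields() {
    printf '%s\n' "$1" |
        awk -F, '{ n = 0; for (i = 1; i <= NF; ++i) if ($i == $1) ++n; print n }'
}

count_substring_hits '3-30,3-30-3'   # prints 2, which equals the column count
count_equal_fields '3-30,3-30-3'     # prints 1, so the row is not all-identical
```

So the awk versions are not only faster; on rows like this one they may also give a different, and arguably more correct, answer than the original script.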

answered Aug 6 at 19:21

RalfFriedl

With perl List::MoreUtils, by evaluating the distinct / uniq elements in scalar context:

perl -MList::MoreUtils=distinct -F, -lne '
  print( (distinct @F) > 1 ? "no_match" : $F[0])
' example

1-69
no_match
1-46
no_match
6-1
5-51
4-59

edited Aug 6 at 20:34

answered Aug 6 at 20:29

steeldriver

You could also do this with sed, as shown:

sed -e '
    s/^\([^,]*\)\(,\1\)*$/\1/;t
    s/.*/NOMATCH/
' input.csv

Here we rely on the regex to repeat the first field (through the \1 backreference) all the way to the end of the line. If it can do so, the line is replaced with the first field and t skips the rest of the script; otherwise the second command replaces the line with NOMATCH.

Explanation:

This is what goes on in my head when I see this problem:

Think of the comma-separated fields as stones of different colors, and picture whether they can be arranged in a row as repetitions of the first stone, each repetition prefixed with a comma.

Something like:

STONEA ,STONEA ,STONEA ,STONEA ... all the way to end of line

Now in terms of regex terminology, it becomes:

^ (STONEA) (,\1) (,\1) (,\1) ... all the way to end of line

^ (STONEA) (,\1)* $

Output:

1-69
NOMATCH
1-46
NOMATCH
6-1
5-51
4-59
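
For reference, here is a sketch of the same backreference idea wrapped in a shell function, written with the backslashes that basic regular expressions need and using the c command instead of a second s command. The function name all_same_sed is made up, and the output string is changed to no_match to match the question; both are assumptions of this sketch.

```shell
# Sketch: keep the first field if every field repeats it, else print no_match.
# all_same_sed is a hypothetical name; it filters standard input.
# The text after c\ is deliberately not indented so no stray spaces are printed.
all_same_sed() {
    sed '
        s/^\([^,]*\)\(,\1\)*$/\1/
        t
        c\
no_match
'
}

# Usage (file name is a placeholder):
#   all_same_sed < input.csv
```

The t command branches to the end of the script when the substitution succeeded, so c only runs for rows whose fields differ.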

edited Aug 7 at 3:08

answered Aug 7 at 2:55

Rakesh Sharma

consider c for command two rather than s - should be nominally quicker still. smart, though. – mikeserv Aug 7 at 3:31

@mikeserv Thank you mike for your gracious words. I feel delighted. – Rakesh Sharma Aug 8 at 5:08
