Handling commas in string values in a CSV file [duplicate]
This question already has an answer here:
Is there a robust command line tool for processing CSV files? (17 answers)

I have a comma-separated file with numeric and string columns. The string columns are quoted and may contain commas between the quotes. How do I identify the columns with FS=","?
Sample record:
"prabhat,kumar",19,2000,"bangalore,India"
In awk the fields should be:
$1 = "prabhat,kumar"
$2 = 19
$3 = 2000
$4 = "bangalore,India"

Setting FS="," is what creates the problem.
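To see the breakage concretely, here is a minimal demo (any POSIX awk will do): with FS="," the quoted fields are split at their embedded commas, so the sample record yields six fields instead of four.

```shell
# Naive FS="," splitting breaks quoted fields: the sample record has
# 4 logical fields, but awk sees 6 because the commas inside the
# quotes also act as separators.
printf '%s\n' '"prabhat,kumar",19,2000,"bangalore,India"' |
awk -F, '{print NF; print $1; print $2}'
# prints:
# 6
# "prabhat
# kumar"
```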
Input is:
"prabhat,kumar",19,2000,"bangalore,India","ABC,DEF","GHI",123,"KLM","NOP,QRS"
"prabhat,kumar",19,2000,"bangalore,India","ABC,DEF","GHI",123,"KLM","NOP,QRS"
Output should be:
"prabhat,kumar"|19|2000|"bangalore,India"|"ABC,DEF"|"GHI"|123|"KLM"|"NOP,QRS"
"prabhat,kumar"|19|2000|"bangalore,India"|"ABC,DEF"|"GHI"|123|"KLM"|"NOP,QRS"
Code I am trying:
awk -F"," '{for(i=1;i<=NF;i++){if(i%NF==0){ORS="\n"} {if($i ~ /^"/ || $i ~ /"$/) {a=a OFS $i;j++;{if(j%2==0){sub(/^,/,X,a); print a;j=0;a=""}}} else {print $i}}} {ORS="|"}}' ORS="|" OFS=, p.txt
text-processing awk csv
marked as duplicate by Gilles, Michael Homer, jasonwryan, Ramesh, cuonglm Aug 12 '14 at 3:29
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
And that is why you should use a proper library that parses CSV (and XML and HTML etc.) rather than trying to write your own. Take a look at CSVfix.
– HBruijn, Aug 11 '14 at 12:04
asked Aug 11 '14 at 11:53 by prabhat diwaker; edited Aug 17 '14 at 12:18 by Peter Mortensen
6 Answers
First of all, you should use a proper CSV parser. For example, in Perl, you can use Text::CSV:

Install cpanm (if you use Perl, you'll thank me later):

$ sudo apt-get install cpanminus

If you're not on a Debian-based system, you should be able to install it using your distribution's package manager.

Install the Text::CSV module:

$ sudo cpanm Text::CSV
Parse your file:

$ perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){
print "1:$row->[0], 2:$row->[1], 3:$row->[2], 4:$row->[3]"}' < file.csv
1:prabhat,kumar, 2:19, 3:2000, 4:bangalore,India

As you can see above, the first field is $row->[0], the second is $row->[1], and so on.
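The same module can also produce the pipe-delimited output the question actually asks for; a sketch (it assumes Text::CSV is installed, and note that the quotes are gone from the output because the parser strips them while reading):

```shell
# Sketch: parse each line with Text::CSV, then re-join the fields
# with "|". Quoting is removed by the parser, so the output fields
# come out unquoted.
printf '%s\n' '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -MText::CSV -lne '
  BEGIN { $csv = Text::CSV->new({ binary => 1 }) }
  print join "|", $csv->fields if $csv->parse($_)'
# prints: prabhat,kumar|19|2000|bangalore,India
```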
That was the correct way. A simpler, but dirty, hack is to replace any quoted comma with another character, run awk normally, and finally switch the substitutes back to commas. I am using ### here, but you can use anything you are sure will never appear in one of your fields.

$ sed -r 's/("[^",]+),([^",]+")/\1###\2/g' file.csv |
awk -F, '{print $1,$3}' | sed 's/###/,/g'
"prabhat,kumar" 2000
answered Aug 11 '14 at 13:09, edited Nov 10 '14 at 14:49 – terdon♦

Your script has a problem if the quoted field doesn't contain a comma: echo '"prabhat kumar",19,2000,"bangalore,India"' | sed -r 's/("[^"]+),([^"]+")/\1###\2/g' prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal, Aug 11 '14 at 16:56

First field in the above should be "prabhat kumar" (sorry).
– Doug O'Neal, Aug 11 '14 at 17:03

I cut the wrong line: echo '"prabhat kumar",19,2000,"bangalore,India"' | sed -r 's/("[^"]+),([^"]+")/\1###\2/g' prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal, Aug 11 '14 at 17:04

@DougO'Neal Yeah, I see the problem, thanks. I don't have time to fix it right now, but as I said, that's just a dirty hack; a proper parser should be used instead.
– terdon♦, Aug 11 '14 at 17:07

OP asked for awk.
– Andrew, Aug 15 '14 at 19:37
If you have GNU awk:

$ awk -v FPAT='[^,]*|"[^"]*"' '{ gsub(/^"|"$/, "", $1); gsub(/^"|"$/, "", $4); print $1 $4 }'
prabhat,kumarbangalore,India

The output format is a little ugly as I've only printed $1 and $4 next to each other; I'm sure you can alter it to your taste. If you need to retain the double quotes around the fields, remove both gsub() calls.
Explanation:

Normally, awk separates the fields in a record using the content of the FS (Field Separator) variable, which defaults to any whitespace (tab, space and newline). The separator tells awk where a field ends. In a CSV file a field ends at a comma (pass -v FS=, to awk), but of course, in examples like yours this is too simple and it breaks.

Alternatively, GNU awk's FPAT (Field Pattern) variable defines what a field is. Instead of telling awk where a field ends, you give a pattern that matches a whole field. For the CSV in your example this is [^,]*|"[^"]*", which breaks down as follows:
- Characters that are not commas ([^,]), as many times as possible (*): everything between two commas is a field.
- Or (|):
- A double quote (") followed by non-quote characters ([^"]) as many times as possible (*), followed by a closing double quote ("): everything within double quotes, including commas, counts as one field.
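Putting FPAT to work on the full input from the question gives exactly the pipe-separated output asked for; a sketch (requires GNU awk; the assignment $1 = $1 simply forces awk to rebuild the record using OFS):

```shell
# Requires GNU awk (gawk) for FPAT. Each field is either a run of
# non-commas or a quoted string; $1 = $1 rebuilds the record with OFS="|".
printf '%s\n' '"prabhat,kumar",19,2000,"bangalore,India","ABC,DEF","GHI",123,"KLM","NOP,QRS"' |
gawk -v FPAT='[^,]*|"[^"]*"' -v OFS='|' '{ $1 = $1; print }'
# prints: "prabhat,kumar"|19|2000|"bangalore,India"|"ABC,DEF"|"GHI"|123|"KLM"|"NOP,QRS"
```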
answered Aug 11 '14 at 13:11, edited Apr 29 '18 at 22:32 – garethTheRed
Ruby is handy for CSV parsing:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
Note that there are no quotes in the output. This is because none of the fields contain the field separator. If you need quotes, you can forcibly quote all fields (even integers):
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|",:force_quotes=>true)' file
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
answered Aug 11 '14 at 18:27 – glenn jackman

OP asked for awk.
– Andrew, Aug 15 '14 at 19:38
You can use perl instead:

$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -F',(?![[:alpha:]])' -anle 'print "$F[0] $F[3]"'
"prabhat,kumar" "bangalore,India"

answered Aug 11 '14 at 12:07 – cuonglm
Clever! However, it will fail for cases where there is a non-letter after the comma, for example prabhat, (kumar), or a letter after a non-quoted comma: "prabhat,kumar",A 19,
– terdon♦, Aug 11 '14 at 12:53

@terdon: Yes, of course; that is by design. Your answer using Text::CSV is better for this job.
– cuonglm, Aug 11 '14 at 14:06
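terdon's failure case is easy to reproduce; a sketch (assumes perl is available): an unquoted separator comma that happens to be followed by a letter is not split, so the first two fields get merged.

```shell
# The lookahead ',(?![[:alpha:]])' only splits on commas NOT followed by
# a letter, so the unquoted comma before "A" below is wrongly kept
# inside $F[0].
printf '%s\n' '"prabhat,kumar",A 19,2000' |
perl -F',(?![[:alpha:]])' -anle 'print "0:$F[0] 1:$F[1]"'
# prints: 0:"prabhat,kumar",A 19 1:2000
```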
This worked for me:

$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
awk -F, '{print $1,$2,$3,$4,$5,$6}' |
awk -F'"' '{print $2,$3,$4}' | awk '{print $1","$2,$3,$4,$5","$6}'
please add an explanation to your answer
– Networker, Aug 11 '14 at 13:54

This will only work if you know the format beforehand. What if it is the 3rd field that's quoted and not the 1st?
– terdon♦, Aug 11 '14 at 14:08

OP gave a specific format; there is no need nor expectation for the format to change: "Last,First",Age,Year,"City,Country". Bangalore is a city in Karnataka, so state is undefined in the data. The first and fourth fields are text, so expect quotes. The second and third fields are integers and would/should never be quoted. Understand the data and avoid overcomplicating the problem.
– Andrew, Aug 15 '14 at 19:34
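Andrew's reading of the fixed format can be condensed into one POSIX awk invocation: split on the double quote instead, which makes $2 the name, $3 the unquoted middle (",19,2000,") and $4 the city. A sketch, assuming exactly the four-field layout from the question:

```shell
# With FS='"', the fixed layout splits as:
#   $2 = prabhat,kumar   $3 = ,19,2000,   $4 = bangalore,India
# Trim the stray commas off $3, then print everything pipe-separated.
printf '%s\n' '"prabhat,kumar",19,2000,"bangalore,India"' |
awk -F'"' '{ gsub(/^,|,$/, "", $3); split($3, n, ","); print $2 "|" n[1] "|" n[2] "|" $4 }'
# prints: prabhat,kumar|19|2000|bangalore,India
```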
Probably qsv (quote-separated values) would be more your speed? This sed wraps the line in commas, then repeatedly folds each comma into an adjacent quote and quotes the unquoted fields, so every field ends up delimited by a double quote:

sed 's/.*/,&,/;:t
s/,"/"/;s/",/"/;s/,\([^"]*\),/"\1/;tt
' <<DATA
"prabhat,kumar",19,2000,"bangalore,India"
DATA

OUTPUT:

"prabhat,kumar"19"2000"bangalore,India"
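Once the record is quote-separated like this, plain awk can address the fields with -F'"'; a sketch combining the two steps:

```shell
# Convert comma-separated-with-quotes into quote-separated, then let any
# POSIX awk split on the double quote: $2 and $5 are the two text fields.
printf '%s\n' '"prabhat,kumar",19,2000,"bangalore,India"' |
sed 's/.*/,&,/;:t
s/,"/"/;s/",/"/;s/,\([^"]*\),/"\1/;tt' |
awk -F'"' '{print $2, $5}'
# prints: prabhat,kumar bangalore,India
```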
add a comment |
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
First of all, you should use a proper CSV parser. For example, in Perl, you can use Text::CSV
:
Install
cpanm
(if you use Perl, you'll thank me later)
$ sudo apt-get install cpanminus
If you're not on a Debian based system, you should be able to install it using your distribution's package manager.
Install the
Text::CSV
module
$ sudo cpanm Text::CSV
Parse your file
$ perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){
print "1:$row->[0], 2:$row->[1], 3:$row->[2], 4:$row->[3]"}' < file.csv
1:prabhat,kumar, 2:19, 3:2000, 4:bangalore,India
As you can see above, the 1st field is
$row->[0]
, the second$row->[1]
etc.
That was the correct way. A simpler, but dirty, hack would be to replace any quoted commas with another character. Then, use awk
normally and finally, switch them back to commas again. I am using ###
here but you can use anything you are sure will never be in one of your fields.
$ sed -r 's/("[^",]+),([^",]+")/1###2/g' file.csv |
awk -F, '{print $1,$3}' | sed 's/###/,/g'
"prabhat,kumar" 2000
Your script has a problem if the quoted field doesn't contain a comma.echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
First field in the above should be"prabhat kumar"
- sorry
– Doug O'Neal
Aug 11 '14 at 17:03
I cut the wrong line:echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
|
show 1 more comment
First of all, you should use a proper CSV parser. For example, in Perl, you can use Text::CSV
:
Install
cpanm
(if you use Perl, you'll thank me later)
$ sudo apt-get install cpanminus
If you're not on a Debian based system, you should be able to install it using your distribution's package manager.
Install the
Text::CSV
module
$ sudo cpanm Text::CSV
Parse your file
$ perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){
print "1:$row->[0], 2:$row->[1], 3:$row->[2], 4:$row->[3]"}' < file.csv
1:prabhat,kumar, 2:19, 3:2000, 4:bangalore,India
As you can see above, the 1st field is
$row->[0]
, the second$row->[1]
etc.
That was the correct way. A simpler, but dirty, hack would be to replace any quoted commas with another character. Then, use awk
normally and finally, switch them back to commas again. I am using ###
here but you can use anything you are sure will never be in one of your fields.
$ sed -r 's/("[^",]+),([^",]+")/1###2/g' file.csv |
awk -F, '{print $1,$3}' | sed 's/###/,/g'
"prabhat,kumar" 2000
Your script has a problem if the quoted field doesn't contain a comma.echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
First field in the above should be"prabhat kumar"
- sorry
– Doug O'Neal
Aug 11 '14 at 17:03
I cut the wrong line:echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
|
show 1 more comment
First of all, you should use a proper CSV parser. For example, in Perl, you can use Text::CSV
:
Install
cpanm
(if you use Perl, you'll thank me later)
$ sudo apt-get install cpanminus
If you're not on a Debian based system, you should be able to install it using your distribution's package manager.
Install the
Text::CSV
module
$ sudo cpanm Text::CSV
Parse your file
$ perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){
print "1:$row->[0], 2:$row->[1], 3:$row->[2], 4:$row->[3]"}' < file.csv
1:prabhat,kumar, 2:19, 3:2000, 4:bangalore,India
As you can see above, the 1st field is
$row->[0]
, the second$row->[1]
etc.
That was the correct way. A simpler, but dirty, hack would be to replace any quoted commas with another character. Then, use awk
normally and finally, switch them back to commas again. I am using ###
here but you can use anything you are sure will never be in one of your fields.
$ sed -r 's/("[^",]+),([^",]+")/1###2/g' file.csv |
awk -F, '{print $1,$3}' | sed 's/###/,/g'
"prabhat,kumar" 2000
First of all, you should use a proper CSV parser. For example, in Perl, you can use Text::CSV
:
Install
cpanm
(if you use Perl, you'll thank me later)
$ sudo apt-get install cpanminus
If you're not on a Debian based system, you should be able to install it using your distribution's package manager.
Install the
Text::CSV
module
$ sudo cpanm Text::CSV
Parse your file
$ perl -MText::CSV -le '
$csv = Text::CSV->new({binary=>1});
while ($row = $csv->getline(STDIN)){
print "1:$row->[0], 2:$row->[1], 3:$row->[2], 4:$row->[3]"}' < file.csv
1:prabhat,kumar, 2:19, 3:2000, 4:bangalore,India
As you can see above, the 1st field is
$row->[0]
, the second$row->[1]
etc.
That was the correct way. A simpler, but dirty, hack would be to replace any quoted commas with another character. Then, use awk
normally and finally, switch them back to commas again. I am using ###
here but you can use anything you are sure will never be in one of your fields.
$ sed -r 's/("[^",]+),([^",]+")/1###2/g' file.csv |
awk -F, '{print $1,$3}' | sed 's/###/,/g'
"prabhat,kumar" 2000
edited Nov 10 '14 at 14:49
answered Aug 11 '14 at 13:09
terdon♦terdon
129k32253428
129k32253428
Your script has a problem if the quoted field doesn't contain a comma.echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
First field in the above should be"prabhat kumar"
- sorry
– Doug O'Neal
Aug 11 '14 at 17:03
I cut the wrong line:echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
|
show 1 more comment
Your script has a problem if the quoted field doesn't contain a comma.echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
First field in the above should be"prabhat kumar"
- sorry
– Doug O'Neal
Aug 11 '14 at 17:03
I cut the wrong line:echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints"prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
Your script has a problem if the quoted field doesn't contain a comma.
echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
Your script has a problem if the quoted field doesn't contain a comma.
echo '"prabhat kumar",19,2000,"bangalore,India"'|sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 16:56
First field in the above should be
"prabhat kumar"
- sorry– Doug O'Neal
Aug 11 '14 at 17:03
First field in the above should be
"prabhat kumar"
- sorry– Doug O'Neal
Aug 11 '14 at 17:03
I cut the wrong line:
echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
I cut the wrong line:
echo '"prabhat kumar",19,2000,"bangalore,India"'| sed -r 's/("[^"]+),([^"]+")/1###2/g'
prints "prabhat kumar",19###2000,"bangalore,India"
– Doug O'Neal
Aug 11 '14 at 17:04
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
@DougO'Neal yeah, I see the problem, thanks. I don't have time to fix it right now but anyway, as I said, that's just a dirty hack and a proper parser should be used instead.
– terdon♦
Aug 11 '14 at 17:07
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
OP asked for awk
– Andrew
Aug 15 '14 at 19:37
|
show 1 more comment
If you have GNU awk
:
$ awk -vFPAT='[^,]*|"[^"]*"' '{ gsub("^"|"$","",$1); gsub("^"|"$","",$4); print $1 $4} '
prabhat,kumarbangalore,India
The output format is a little ugly as I've only printed $1
and $4
next to each other - I'm sure you can alter it to your taste.
If you need to retain the double quotes around the fields, remove both gsub();
functions.
Explanation:
Normally, awk
separates fields in a record by the content of the FS
(Field Separator) variable, which defaults to any whitespace (tab, space and newline). The separator tells awk
where a record ends. In a csv
file, a record ends at a comma (passed to awk
as -vFS=,
), but of course, in examples similar to yours, this is too simple and it breaks.
Alternatively, the FPAT
(Field Pattern) defines the record within awk
. Instead of telling awk
where the record ends, you create a definition that encompasses a whole record. In the complex csv
of your example, this is [^,]*|"[^"]*"
This breaks down as follows:-
- Characters that are not commas (
[^,]
) as many times as possible (*
). Everything between two commas is a field. - Or (
|
) - A single double-quote (
"
) followed by no double-quotes ([^"]
) as many times as possible (*
) followed by a single double-quote ("
). Everything within double-quotes (including a comma) counts as one field.
add a comment |
If you have GNU awk
:
$ awk -vFPAT='[^,]*|"[^"]*"' '{ gsub("^"|"$","",$1); gsub("^"|"$","",$4); print $1 $4} '
prabhat,kumarbangalore,India
The output format is a little ugly as I've only printed $1
and $4
next to each other - I'm sure you can alter it to your taste.
If you need to retain the double quotes around the fields, remove both gsub();
functions.
Explanation:
Normally, awk
separates fields in a record by the content of the FS
(Field Separator) variable, which defaults to any whitespace (tab, space and newline). The separator tells awk
where a record ends. In a csv
file, a record ends at a comma (passed to awk
as -vFS=,
), but of course, in examples similar to yours, this is too simple and it breaks.
Alternatively, the FPAT
(Field Pattern) defines the record within awk
. Instead of telling awk
where the record ends, you create a definition that encompasses a whole record. In the complex csv
of your example, this is [^,]*|"[^"]*"
This breaks down as follows:-
- Characters that are not commas (
[^,]
) as many times as possible (*
). Everything between two commas is a field. - Or (
|
) - A single double-quote (
"
) followed by no double-quotes ([^"]
) as many times as possible (*
) followed by a single double-quote ("
). Everything within double-quotes (including a comma) counts as one field.
add a comment |
If you have GNU awk
:
$ awk -vFPAT='[^,]*|"[^"]*"' '{ gsub("^"|"$","",$1); gsub("^"|"$","",$4); print $1 $4} '
prabhat,kumarbangalore,India
The output format is a little ugly as I've only printed $1
and $4
next to each other - I'm sure you can alter it to your taste.
If you need to retain the double quotes around the fields, remove both gsub();
functions.
Explanation:
Normally, awk
separates fields in a record by the content of the FS
(Field Separator) variable, which defaults to any whitespace (tab, space and newline). The separator tells awk
where a record ends. In a csv
file, a record ends at a comma (passed to awk
as -vFS=,
), but of course, in examples similar to yours, this is too simple and it breaks.
Alternatively, the FPAT
(Field Pattern) defines the record within awk
. Instead of telling awk
where the record ends, you create a definition that encompasses a whole record. In the complex csv
of your example, this is [^,]*|"[^"]*"
This breaks down as follows:-
- Characters that are not commas (
[^,]
) as many times as possible (*
). Everything between two commas is a field. - Or (
|
) - A single double-quote (
"
) followed by no double-quotes ([^"]
) as many times as possible (*
) followed by a single double-quote ("
). Everything within double-quotes (including a comma) counts as one field.
If you have GNU awk
:
$ awk -vFPAT='[^,]*|"[^"]*"' '{ gsub("^"|"$","",$1); gsub("^"|"$","",$4); print $1 $4} '
prabhat,kumarbangalore,India
The output format is a little ugly as I've only printed $1
and $4
next to each other - I'm sure you can alter it to your taste.
If you need to retain the double quotes around the fields, remove both gsub();
functions.
Explanation:
Normally, awk
separates fields in a record by the content of the FS
(Field Separator) variable, which defaults to any whitespace (tab, space and newline). The separator tells awk
where a record ends. In a csv
file, a record ends at a comma (passed to awk
as -vFS=,
), but of course, in examples similar to yours, this is too simple and it breaks.
Alternatively, the FPAT
(Field Pattern) defines the record within awk
. Instead of telling awk
where the record ends, you create a definition that encompasses a whole record. In the complex csv
of your example, this is [^,]*|"[^"]*"
This breaks down as follows:-
- Characters that are not commas (
[^,]
) as many times as possible (*
). Everything between two commas is a field. - Or (
|
) - A single double-quote (
"
) followed by no double-quotes ([^"]
) as many times as possible (*
) followed by a single double-quote ("
). Everything within double-quotes (including a comma) counts as one field.
edited Apr 29 '18 at 22:32
Jeff Schaller
39.2k1054125
39.2k1054125
answered Aug 11 '14 at 13:11
garethTheRedgarethTheRed
24.1k36180
24.1k36180
add a comment |
add a comment |
Ruby is handy for CSV parsing:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
Note that there are no quotes in the outputs. This is because none of the fields contain the field separator. If you need quotes, you can forcibly quote all fields (even integers):
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|",:force_quotes=>true)' file
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
add a comment |
Ruby is handy for CSV parsing:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
Note that there are no quotes in the outputs. This is because none of the fields contain the field separator. If you need quotes, you can forcibly quote all fields (even integers):
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|",:force_quotes=>true)' file
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
add a comment |
Ruby is handy for CSV parsing:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
Note that there are no quotes in the outputs. This is because none of the fields contain the field separator. If you need quotes, you can forcibly quote all fields (even integers):
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|",:force_quotes=>true)' file
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
Ruby is handy for CSV parsing:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
prabhat,kumar|19|2000|bangalore,India|ABC,DEF|GHI|123|KLM|NOP,QRS
Note that there are no quotes in the outputs. This is because none of the fields contain the field separator. If you need quotes, you can forcibly quote all fields (even integers):
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|",:force_quotes=>true)' file
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
"prabhat,kumar"|"19"|"2000"|"bangalore,India"|"ABC,DEF"|"GHI"|"123"|"KLM"|"NOP,QRS"
answered Aug 11 '14 at 18:27
glenn jackmanglenn jackman
50.5k570108
50.5k570108
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
add a comment |
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
OP asked for awk
– Andrew
Aug 15 '14 at 19:38
add a comment |
You can use perl
instead:
$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -F',(?![[:alpha:]])' -anle 'print "$F[0] $F[3]"'
"prabhat,kumar" "bangalore,India"
1
Clever! However, it will fail for cases where there is a non letter after the comma. For exampleprabhat, (kumar)
or a letter after a non-quoted comma:"prabhat,kumar",A 19,
– terdon♦
Aug 11 '14 at 12:53
@terdon: Yes, of course, as it's desired. Your answer usingText:CSV
is better for this job.
– cuonglm
Aug 11 '14 at 14:06
add a comment |
You can use perl
instead:
$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -F',(?![[:alpha:]])' -anle 'print "$F[0] $F[3]"'
"prabhat,kumar" "bangalore,India"
1
Clever! However, it will fail for cases where there is a non letter after the comma. For exampleprabhat, (kumar)
or a letter after a non-quoted comma:"prabhat,kumar",A 19,
– terdon♦
Aug 11 '14 at 12:53
@terdon: Yes, of course, as it's desired. Your answer usingText:CSV
is better for this job.
– cuonglm
Aug 11 '14 at 14:06
add a comment |
You can use perl
instead:
$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -F',(?![[:alpha:]])' -anle 'print "$F[0] $F[3]"'
"prabhat,kumar" "bangalore,India"
You can use perl
instead:
$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
perl -F',(?![[:alpha:]])' -anle 'print "$F[0] $F[3]"'
"prabhat,kumar" "bangalore,India"
answered Aug 11 '14 at 12:07
cuonglmcuonglm
102k23202302
102k23202302
1
Clever! However, it will fail for cases where there is a non letter after the comma. For exampleprabhat, (kumar)
or a letter after a non-quoted comma:"prabhat,kumar",A 19,
– terdon♦
Aug 11 '14 at 12:53
@terdon: Yes, of course, as it's desired. Your answer usingText:CSV
is better for this job.
– cuonglm
Aug 11 '14 at 14:06
add a comment |
1
Clever! However, it will fail for cases where there is a non letter after the comma. For exampleprabhat, (kumar)
or a letter after a non-quoted comma:"prabhat,kumar",A 19,
– terdon♦
Aug 11 '14 at 12:53
@terdon: Yes, of course, as it's desired. Your answer usingText:CSV
is better for this job.
– cuonglm
Aug 11 '14 at 14:06
1
1
Clever! However, it will fail for cases where there is a non-letter after the comma, for example prabhat, (kumar), or a letter after a non-quoted comma: "prabhat,kumar",A 19,
– terdon♦
Aug 11 '14 at 12:53
@terdon: Yes, of course; that is as desired. Your answer using Text::CSV is better for this job.
– cuonglm
Aug 11 '14 at 14:06
This worked for me:
$ echo '"prabhat,kumar",19,2000,"bangalore,India"' |
awk -F, '{print $1,$2,$3,$4,$5,$6}' |
awk -F'"' '{print $2,$3,$4}' |
awk '{print $1","$2,$3,$4,$5","$6}'
prabhat,kumar 19 2000 bangalore,India
please add an explanation to your answer
– Networker
Aug 11 '14 at 13:54
This will only work if you know the format beforehand. What if it is the 3rd field that's quoted and not the 1st?
– terdon♦
Aug 11 '14 at 14:08
The OP gave a specific format, with no need or expectation for the format to change: "Last,First",Age,Year,"City,Country". Bangalore is a city in Karnataka, so state is undefined in the data. The first and fourth fields are text, so expect quotes; the second and third fields are integers and would/should never be quoted. Understand the data and avoid over-complicating the problem.
– Andrew
Aug 15 '14 at 19:34
edited Aug 11 '14 at 14:07 by terdon♦; answered Aug 11 '14 at 13:26 by Andrew
Probably a qsv would be more your speed?
sed 's/.*/,&,/;:t
s/,"/"/;s/",/"/;s/,\([^"]*\),/"\1/;tt
' <<DATA
"prabhat,kumar",19,2000,"bangalore,India"
DATA
OUTPUT:
"prabhat,kumar"19"2000"bangalore,India"
edited Aug 11 '14 at 22:17; answered Aug 11 '14 at 21:46 by mikeserv
And that is why you should use a proper library that parses CSV (and XML, HTML, etc.) rather than trying to write your own. Take a look at CSVfix.
– HBruijn
Aug 11 '14 at 12:04
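HBruijn's advice can be followed without leaving the shell; a minimal sketch using Python's standard-library csv module (assuming python3 is available — the re-quoting rule "quote anything non-numeric" is my assumption, chosen to match the OP's desired output):

```shell
echo '"prabhat,kumar",19,2000,"bangalore,India","ABC,DEF","GHI",123,"KLM","NOP,QRS"' |
python3 -c '
import csv, sys
# csv.reader understands quoted fields, so embedded commas survive the split
for row in csv.reader(sys.stdin):
    # re-quote the non-numeric fields and join with "|", as in the desired output
    print("|".join(f if f.isdigit() else "\"" + f + "\"" for f in row))
'
# -> "prabhat,kumar"|19|2000|"bangalore,India"|"ABC,DEF"|"GHI"|123|"KLM"|"NOP,QRS"
```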